From: Mark Walters
Date: Sun, 2 Feb 2014 18:21:28 +0000 (+0000)
Subject: Re: [PATCH v2 2/7] cli: refactor reply from guessing
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=894e6b95a722c924ca714d9619ce03c05e2f0c00;p=notmuch-archives.git

Re: [PATCH v2 2/7] cli: refactor reply from guessing
---

diff --git a/cc/3c219170ea74f94b20bdb9013578b58d432f91 b/cc/3c219170ea74f94b20bdb9013578b58d432f91
new file mode 100644
index 000000000..ae32d82bc
--- /dev/null
+++ b/cc/3c219170ea74f94b20bdb9013578b58d432f91
@@ -0,0 +1,331 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 30B40431FBD
+ for ; Sun, 2 Feb 2014 10:23:59 -0800 (PST)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -1.098
+X-Spam-Level:
+X-Spam-Status: No, score=-1.098 tagged_above=-999 required=5
+ tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_FROM=0.001,
+ NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id uMsOckJtkBkL for ;
+ Sun, 2 Feb 2014 10:23:52 -0800 (PST)
+Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])
+ (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id 88DF4431FBC
+ for ; Sun, 2 Feb 2014 10:23:52 -0800 (PST)
+Received: from smtp.qmul.ac.uk ([138.37.6.40])
+ by mail2.qmul.ac.uk with esmtp (Exim 4.71)
+ (envelope-from )
+ id 1WA1hu-0001h9-EX; Sun, 02 Feb 2014 18:23:50 +0000
+Received: from 93-97-24-31.zone5.bethere.co.uk ([93.97.24.31] helo=localhost)
+ by smtp.qmul.ac.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.71)
+ (envelope-from )
+ id 1WA1h0-0006D9-Ps; Sun, 02 Feb 2014 18:22:55 +0000
+From: Mark Walters
+To: Jani Nikula , notmuch@notmuchmail.org
+Subject: Re: [PATCH v2 2/7] cli: refactor reply from guessing
+In-Reply-To:
+
+References:
+
+User-Agent: Notmuch/0.15.2+484~gfb59956 (http://notmuchmail.org) Emacs/23.4.1
+ (x86_64-pc-linux-gnu)
+Date: Sun, 02 Feb 2014 18:21:28 +0000
+Message-ID: <87vbwxz87r.fsf@qmul.ac.uk>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+X-Sender-Host-Address: 93.97.24.31
+X-QM-Geographic: According to ripencc,
+ this message was delivered by a machine in Britain (UK) (GB).
+X-QM-SPAM-Info: Sender has good ham record. :)
+X-QM-Body-MD5: 80499a0cc79b7c6c53c87dbaf1e23162 (of first 20000 bytes)
+X-SpamAssassin-Score: 0.0
+X-SpamAssassin-SpamBar: /
+X-SpamAssassin-Report: The QM spam filters have analysed this message to
+ determine if it is
+ spam. We require at least 5.0 points to mark a message as spam.
+ This message scored 0.0 points. Summary of the scoring:
+ * 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
+ provider * (markwalters1009[at]gmail.com)
+ * 0.0 AWL AWL: From: address is in the auto white-list
+X-QM-Scan-Virus: ClamAV says the message is clean
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Sun, 02 Feb 2014 18:23:59 -0000
+
+On Sat, 30 Nov 2013, Jani Nikula wrote:
+> The guess_from_received_header() function had grown quite big. Chop it
+> up into smaller functions.
+>
+> No functional changes.
+> ---
+> notmuch-reply.c | 178 +++++++++++++++++++++++++++++++++-----------------------
+> 1 file changed, 105 insertions(+), 73 deletions(-)
+>
+> diff --git a/notmuch-reply.c b/notmuch-reply.c
+> index 9d6f843..ca41405 100644
+> --- a/notmuch-reply.c
+> +++ b/notmuch-reply.c
+> @@ -369,78 +369,44 @@ add_recipients_from_message (GMimeMessage *reply,
+> return from_addr;
+> }
+>
+> +/*
+> + * Look for the user's address in " for " in the
+> + * received headers.
+> + *
+> + * Return the address that was found, if any, and NULL otherwise.
+> + */
+> static const char *
+> -guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message)
+> +guess_from_received_for (notmuch_config_t *config, const char *received)
+> {
+> - const char *addr, *received, *by;
+> - char *mta,*ptr,*token;
+> - char *domain=NULL;
+> - char *tld=NULL;
+> - const char *delim=". \t";
+> - size_t i;
+> -
+> - const char *to_headers[] = {
+> - "Envelope-to",
+> - "X-Original-To",
+> - "Delivered-To",
+> - };
+> -
+> - /* sadly, there is no standard way to find out to which email
+> - * address a mail was delivered - what is in the headers depends
+> - * on the MTAs used along the way. So we are trying a number of
+> - * heuristics which hopefully will answer this question.
+> -
+> - * We only got here if none of the users email addresses are in
+> - * the To: or Cc: header. From here we try the following in order:
+> - * 1) check for an Envelope-to: header
+> - * 2) check for an X-Original-To: header
+> - * 3) check for a Delivered-To: header
+> - * 4) check for a (for ) clause in Received: headers
+> - * 5) check for the domain part of known email addresses in the
+> - * 'by' part of Received headers
+> - * If none of these work, we give up and return NULL
+> - */
+
+I like having the logic laid out in a comment as above so would prefer
+to see something similar included (that is points 1-6) but I am happy to
+be overruled.
+
+> - for (i = 0; i < ARRAY_SIZE (to_headers); i++) {
+> - const char *tohdr = notmuch_message_get_header (message, to_headers[i]);
+> -
+> - /* Note: tohdr potentially contains a list of email addresses. */
+> - addr = user_address_in_string (tohdr, config);
+> - if (addr)
+> - return addr;
+> - }
+> + const char *ptr;
+>
+> - /* We get the concatenated Received: headers and search from the
+> - * front (last Received: header added) and try to extract from
+> - * them indications to which email address this message was
+> - * delivered.
+> - * The Received: header is special in our get_header function
+> - * and is always concatenated.
+> - */
+> - received = notmuch_message_get_header (message, "received");
+> - if (received == NULL)
+> + ptr = strstr (received, " for ");
+> + if (! ptr)
+> return NULL;
+>
+> - /* First we look for a " for " in the received
+> - * header
+> - */
+> - ptr = strstr (received, " for ");
+> + return user_address_in_string (ptr, config);
+> +}
+>
+> - /* Note: ptr potentially contains a list of email addresses. */
+> - addr = user_address_in_string (ptr, config);
+> - if (addr)
+> - return addr;
+> -
+> - /* Finally, we parse all the " by MTA ..." headers to guess the
+> - * email address that this was originally delivered to.
+> - * We extract just the MTA here by removing leading whitespace and
+> - * assuming that the MTA name ends at the next whitespace.
+> - * We test for *(by+4) to be non-'\0' to make sure there's
+> - * something there at all - and then assume that the first
+> - * whitespace delimited token that follows is the receiving
+> - * system in this step of the receive chain
+> - */
+> - by = received;
+> - while((by = strstr (by, " by ")) != NULL) {
+> +/*
+> + * Parse all the " by MTA ..." parts in received headers to guess the
+> + * email address that this was originally delivered to.
+> + *
+> + * Extract just the MTA here by removing leading whitespace and
+> + * assuming that the MTA name ends at the next whitespace. Test for
+> + * *(by+4) to be non-'\0' to make sure there's something there at all
+> + * - and then assume that the first whitespace delimited token that
+> + * follows is the receiving system in this step of the receive chain.
+> + *
+> + * Return the address that was found, if any, and NULL otherwise.
+> + */
+> +static const char *
+> +guess_from_received_by (notmuch_config_t *config, const char *received)
+> +{
+> + const char *addr;
+> + const char *by = received;
+> + char *domain, *tld, *mta, *ptr, *token;
+> +
+> + while ((by = strstr (by, " by ")) != NULL) {
+> by += 4;
+> if (*by == '\0')
+> break;
+> @@ -454,7 +420,7 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
+> * as domain and tld.
+> */
+> domain = tld = NULL;
+> - while ((ptr = strsep (&token, delim)) != NULL) {
+> + while ((ptr = strsep (&token, ". \t")) != NULL) {
+> if (*ptr == '\0')
+> continue;
+> domain = tld;
+> @@ -462,13 +428,13 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
+> }
+>
+> if (domain) {
+> - /* Recombine domain and tld and look for it among the configured
+> - * email addresses.
+> - * This time we have a known domain name and nothing else - so
+> - * the test is the other way around: we check if this is a
+> - * substring of one of the email addresses.
+> + /* Recombine domain and tld and look for it among the
+> + * configured email addresses. This time we have a known
+> + * domain name and nothing else - so the test is the other
+> + * way around: we check if this is a substring of one of
+> + * the email addresses.
+> */
+> - *(tld-1) = '.';
+> + *(tld - 1) = '.';
+>
+> addr = string_in_user_address (domain, config);
+> if (addr) {
+> @@ -482,6 +448,63 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
+> return NULL;
+> }
+>
+> +/*
+> + * Get the concatenated Received: headers and search from the front
+> + * (last Received: header added) and try to extract from them
+> + * indications to which email address this message was delivered.
+> + *
+> + * The Received: header is special in our get_header function and is
+> + * always concatenated.
+> + *
+> + * Return the address that was found, if any, and NULL otherwise.
+> + */
+> +static const char *
+> +guess_from_received_header (notmuch_config_t *config,
+> + notmuch_message_t *message)
+> +{
+> + const char *received, *addr;
+> +
+> + received = notmuch_message_get_header (message, "received");
+> + if (! received)
+> + return NULL;
+> +
+> + addr = guess_from_received_for (config, received);
+> + if (! addr)
+> + addr = guess_from_received_by (config, received);
+> +
+> + return addr;
+> +}
+> +
+> +/*
+> + * Try to find user's email address in one of the extra To-like
+> + * headers, such as Envelope-To, X-Original-To, and
+> + * Delivered-To.
+> + *
+> + * Return the address that was found, if any, and NULL otherwise.
+> + */
+
+I would prefer to replace the "extra To-like headers, such as ..." by
+something more explicit: eg "extra To-like headers: Envelope-To,
+X-Original-To, and Delivered-To (searched in that order)"
+
+
+> +static const char *
+> +from_from_to_headers (notmuch_config_t *config, notmuch_message_t *message)
+
+I am not keen on this name, but I am not sure I have a better
+suggestion.
+
+Best wishes
+
+Mark
+
+> +{
+> + size_t i;
+> + const char *tohdr, *addr;
+> + const char *to_headers[] = {
+> + "Envelope-to",
+> + "X-Original-To",
+> + "Delivered-To",
+> + };
+> +
+> + for (i = 0; i < ARRAY_SIZE (to_headers); i++) {
+> + tohdr = notmuch_message_get_header (message, to_headers[i]);
+> +
+> + /* Note: tohdr potentially contains a list of email addresses. */
+> + addr = user_address_in_string (tohdr, config);
+> + if (addr)
+> + return addr;
+> + }
+> +
+> + return NULL;
+> +}
+> +
+> static GMimeMessage *
+> create_reply_message(void *ctx,
+> notmuch_config_t *config,
+> @@ -508,6 +531,15 @@ create_reply_message(void *ctx,
+> from_addr = add_recipients_from_message (reply, config,
+> message, reply_all);
+>
+> + /*
+> + * Sadly, there is no standard way to find out to which email
+> + * address a mail was delivered - what is in the headers depends
+> + * on the MTAs used along the way. So we are trying a number of
+> + * heuristics which hopefully will answer this question.
+> + */
+> + if (from_addr == NULL)
+> + from_addr = from_from_to_headers (config, message);
+> +
+> if (from_addr == NULL)
+> from_addr = guess_from_received_header (config, message);
+>
+> --
+> 1.8.4.2
+>
+> _______________________________________________
+> notmuch mailing list
+> notmuch@notmuchmail.org
+> http://notmuchmail.org/mailman/listinfo/notmuch