Return-Path:
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 169C54196F2
	for ; Mon, 17 May 2010 00:56:42 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -0.001
X-Spam-Level:
X-Spam-Status: No, score=-0.001 tagged_above=-999 required=5
	tests=[BAYES_20=-0.001] autolearn=ham
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id q6GKhVqnjwsv for ;
	Mon, 17 May 2010 00:56:31 -0700 (PDT)
Received: from max.feld.cvut.cz (max.feld.cvut.cz [147.32.192.36])
	by olra.theworths.org (Postfix) with ESMTP id 334124196F0
	for ; Mon, 17 May 2010 00:56:31 -0700 (PDT)
Received: from localhost (unknown [192.168.200.4])
	by max.feld.cvut.cz (Postfix) with ESMTP id B52EC19F33E1;
	Mon, 17 May 2010 09:56:29 +0200 (CEST)
X-Virus-Scanned: IMAP AMAVIS
Received: from max.feld.cvut.cz ([192.168.200.1])
	by localhost (styx.feld.cvut.cz [192.168.200.4]) (amavisd-new, port 10044)
	with ESMTP id 9HHT-Kdl6kxV; Mon, 17 May 2010 09:56:28 +0200 (CEST)
Received: from imap.feld.cvut.cz (imap.feld.cvut.cz [147.32.192.34])
	by max.feld.cvut.cz (Postfix) with ESMTP id 26F0119F334D;
	Mon, 17 May 2010 09:56:28 +0200 (CEST)
Received: from steelpick.2x.cz (k335-30.felk.cvut.cz [147.32.86.30])
	(Authenticated sender: sojkam1)
	by imap.feld.cvut.cz (Postfix) with ESMTPSA id F30C515C062;
	Mon, 17 May 2010 09:56:27 +0200 (CEST)
Received: from wsh by steelpick.2x.cz with local (Exim 4.71)
	(envelope-from ) id 1ODvBb-0006xe-Kw;
	Mon, 17 May 2010 09:56:27 +0200
From: Michal Sojka
To: Igor Shenderovich , notmuch@notmuchmail.org
Subject: Re: utf-8 in author field
In-Reply-To:
References:
User-Agent: Notmuch/0.3.1-33-g594021b (http://notmuchmail.org) Emacs/23.1.1
	(x86_64-pc-linux-gnu)
Date: Mon, 17 May 2010 09:56:27 +0200
Message-ID: <87vdanrmfo.fsf@steelpick.2x.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 May 2010 07:56:42 -0000

On Fri, 14 May 2010, Igor Shenderovich wrote:
> Hello all,
>
> I'm using the latest version of notmuch (cloned from git on May 13), but I
> can't handle with utf-8 symbols in the authors field. For example, I have a
> letter with the field
>
> "authors":
> "=?UTF-8?B?Z3JpZmZvbiAtINCa0L7QvNC80LXQvdGC0LDRgNC40Lkg0LIg0JbQlg==?=",
>
> (got it from usual emacs interface).
>
> However, the body of this letter is pretty readable (it also contains some
> utf-8 characters).
>
> What should one do to see the true list of authors?

Hi,

I see the same thing when headers are not properly encoded according to
RFC 2047. The violation I commonly see is of section 5, paragraph (3),
the sentence "An 'encoded-word' MUST NOT appear within a
'quoted-string'". That is, the encoded word is enclosed in double
quotes.

I guess the "problem" is not specific to notmuch; all users of the
gmime library must be affected.

I use the following patch for notmuch to sanitize headers from a
popular mailing list server in the Czech Republic:

Cheers,
Michal

From: Michal Sojka
Subject: Fix broken headers from pandora.cz

---
 lib/message-file.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/lib/message-file.c b/lib/message-file.c
index 7722832..abfedc1 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -42,6 +42,7 @@ struct _notmuch_message_file {
     int broken_headers;
     int good_headers;
     size_t header_size; /* Length of full message header in bytes. */
+    notmuch_bool_t pandora_cz_quirk;
 
     /* Parsing state */
     char *line;
@@ -324,7 +325,40 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 	else
 	    match = (strcasecmp (header, header_desired) == 0);
 
+	if (strstr(message->value.str, "=40pandora=2Ecz=29") ||
+	    strstr(message->value.str, "@pandora.cz") ||
+	    message->pandora_cz_quirk)
+	{
+	    char *quote = message->value.str;
+	    message->pandora_cz_quirk = TRUE;
+	    if (*quote == '"') {
+		int len = strlen(quote);
+		bcopy(quote+1, quote, len);
+		quote = strchr(quote, '"');
+		if (quote) {
+		    len = strlen(quote);
+		    bcopy(quote+1, quote, len);
+		}
+	    }
+	}
+
 	decoded_value = g_mime_utils_header_decode_text (message->value.str);
+
+	if (message->pandora_cz_quirk &&
+	    strcasecmp (header, "From") == 0)
+	{
+	    /* remove "(@pandora.cz)" */
+	    char *langle = strchr(decoded_value, '<');
+	    if (langle) {
+		char *comment = langle - 2;
+		if (comment > decoded_value && *comment == ')')
+		    while (comment > decoded_value && *comment != '(')
+			comment--;
+		if (comment > decoded_value)
+		    bcopy(langle, comment, strlen(langle)+1);
+	    }
+	}
+
 	header_sofar = (char *)g_hash_table_lookup (message->headers, header);
 	/* we treat the Received: header special - we want to concat ALL of
 	 * the Received: headers we encounter.
-- 
tg: (417274d..) t/Fix-broken-headers-from-pandora.cz (depends on: master)
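
For anyone who wants to try the two string fixes outside of notmuch, here is a minimal standalone sketch of the same idea. It is not the patch itself: the function names are hypothetical, it uses memmove instead of bcopy, and the comment removal is deliberately stricter than in the patch (it only acts when the text right before '<' is ") "). It assumes the header value is a writable NUL-terminated string.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of notmuch or GMime: true if the value
 * begins with a double quote directly followed by "=?", i.e. an
 * RFC 2047 encoded-word that was placed inside a quoted-string. */
bool
starts_with_quoted_encoded_word (const char *value)
{
    return strncmp (value, "\"=?", 3) == 0;
}

/* Hypothetical helper: remove a leading '"' and the next '"' after it,
 * in place. Returns true if a leading quote was removed. */
bool
strip_first_quote_pair (char *value)
{
    char *close;

    if (value[0] != '"')
        return false;

    /* Shift everything left by one byte. Copying strlen (value) bytes
     * starting at value + 1 includes the terminating NUL. */
    memmove (value, value + 1, strlen (value));

    close = strchr (value, '"');
    if (close)
        memmove (close, close + 1, strlen (close));

    return true;
}

/* Hypothetical helper: drop a "(...)" comment that sits directly before
 * " <" in an already decoded From value, in place. Returns true if the
 * value was changed. */
bool
remove_comment_before_angle (char *value)
{
    char *langle = strchr (value, '<');
    char *open;

    /* Need at least ") " plus one more byte in front of '<'. */
    if (langle == NULL || langle - value < 3)
        return false;
    if (langle[-1] != ' ' || langle[-2] != ')')
        return false;

    /* Walk back from ')' to the matching '(' without leaving the buffer. */
    open = langle - 2;
    while (open > value && *open != '(')
        open--;
    if (*open != '(')
        return false;

    /* Move "<addr>" (and the NUL) over the comment. */
    memmove (open, langle, strlen (langle) + 1);
    return true;
}
```

In this sketch strip_first_quote_pair would run on the raw header value before it is handed to g_mime_utils_header_decode_text, and remove_comment_before_angle on the decoded From value afterwards, which is the same order the patch uses.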