From: Serge Z <triumhiz@yandex.ru>
User-Agent: alot/0.21+
To: notmuch@notmuchmail.org
References: <877gzd5axk.fsf@steelpick.2x.cz>
 <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz>
 <20120224042925.2870.87924@localhost> <874nug67il.fsf@steelpick.2x.cz>
In-Reply-To: <874nug67il.fsf@steelpick.2x.cz>
Message-ID: <20120224075700.13214.28221@localhost>
Subject: Re: [PATCH] test: Add test for searching of uncommonly encoded
Date: Fri, 24 Feb 2012 11:57:00 +0400

Quoting Michal Sojka (2012-02-24 11:00:02)
>On Fri, 24 Feb 2012, Serge Z wrote:
>>
>> Quoting Michal Sojka (2012-02-24 04:33:15)
>> >Emails that are encoded differently than as ASCII or UTF-8 are not
>> >indexed properly by notmuch. It is not possible to search for non-ASCII
>> >words within those messages.
>>
>> Ok. But we can preprocess each incoming message right after 'getmail' to
>> convert it from html to text and to utf8 encoding. One solution is to
>> create a separate script for this and make gmail pipe all messages to
>> this script, then to notmuch. But it would be better if maildir contains
>> original messages only, so the question is: can we make the notmuch
>> indexing engine index the preprocessed message while maildir will
>> contain the original message - as it [...]
>
>I'm not a big fan of adding a "preprocessor". First, I think that both
>reasons you mention are actually bugs and it would be better to fix them
>for everybody than requiring each user to configure some preprocessor.
>Second, depending on what and how would your preprocessor do, the
>initial mail indexing could be way slower, which is also nothing that [...]
>
>Do you have any other use case for the preprocessor besides utf8 and
>html->text conversions?

Well, I don't want to add an external preprocessor either.

This may be considered an architectural decision: the search engine
should not access messages directly, but through a preprocessing layer
that handles different encodings in the body and headers, RFC 2047-encoded
headers (if these are not handled yet), etc.

Anyway, IMHO it would be nice to have this solution contained in a
separate library, which would be useful for notmuch clients as well as
for other mail indexing engines. Or we should look for an existing library.
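
To make the idea of such a preprocessing layer a bit more concrete, here
is a minimal sketch in Python. It assumes the standard-library email
package does all the decoding; the function name index_text and the
sample message are hypothetical and are not part of notmuch. It only
normalizes headers and body to Unicode, it does not convert html to text.

```python
import email
from email import policy


def index_text(raw_message: bytes) -> dict:
    """Hypothetical helper: return Unicode text for a raw mail message."""
    # With policy.default, RFC 2047 encoded-words in headers are decoded
    # when the header is accessed.
    msg = email.message_from_bytes(raw_message, policy=policy.default)

    # Prefer a text/plain part; fall back to text/html if nothing else exists.
    part = msg.get_body(preferencelist=("plain", "html"))
    # get_content() undoes the Content-Transfer-Encoding and decodes the
    # payload using the charset declared in Content-Type.
    body = part.get_content() if part is not None else ""

    return {
        "subject": str(msg["Subject"] or ""),
        "from": str(msg["From"] or ""),
        "body": body,
    }


# Made-up sample: RFC 2047 subject in UTF-8, 8bit body in koi8-r.
sample = (
    b"From: Test <test@example.org>\r\n"
    b"Subject: =?utf-8?q?=D1=82=D0=B5=D1=81=D1=82?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=koi8-r\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
) + "привет\r\n".encode("koi8-r")

result = index_text(sample)
print(result["subject"], result["body"].strip())
```

The point of the sketch is that the maildir keeps the original bytes
untouched, while the indexer only ever sees the decoded strings returned
by the layer.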