From: Serge Z
User-Agent: alot/0.21+
To: notmuch@notmuchmail.org
References: <877gzd5axk.fsf@steelpick.2x.cz> <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz> <20120224042925.2870.87924@localhost> <874nug67il.fsf@steelpick.2x.cz>
In-Reply-To: <874nug67il.fsf@steelpick.2x.cz>
Message-ID: <20120224075700.13214.28221@localhost>
Subject: Re: [PATCH] test: Add test for searching of uncommonly encoded messages
Date: Fri, 24 Feb 2012 11:57:00 +0400
List-Id: "Use and development of the notmuch mail system."

Quoting Michal Sojka (2012-02-24 11:00:02)
>On Fri, 24 Feb 2012, Serge Z wrote:
>>
>> Quoting Michal Sojka (2012-02-24 04:33:15)
>> >Emails that are encoded differently than as ASCII or UTF-8 are not
>> >indexed properly by notmuch. It is not possible to search for non-ASCII
>> >words within those messages.
>>
>> Ok. But we can preprocess each incoming message right after 'getmail' to
>> convert it from html to text and to utf8 encoding. One solution is to
>> create a separate script for this and make getmail pipe all messages to
>> this script, and then to notmuch. But it would be better if the maildir
>> contains original messages only, so the question is: can we make the
>> notmuch indexing engine index the preprocessed message while the maildir
>> contains the original message, as it was obtained?
>
>Hi,
>
>I'm not a big fan of adding a "preprocessor". First, I think that both
>reasons you mention are actually bugs, and it would be better to fix them
>for everybody than to require each user to configure some preprocessor.
>Second, depending on what your preprocessor does and how, the
>initial mail indexing could be way slower, which is also not something
>people want.
>
>Do you have any other use case for the preprocessor besides utf8 and
>html->text conversions?
>
>Cheers,
>-Michal

Well, I don't want to add an external preprocessor either.

This may be considered an architectural decision: the search engine should
not access messages directly, but through some preprocessing layer that
would handle different encodings in the body and headers, RFC 2047-encoded
headers (if this is not handled yet), etc.

Anyway, IMHO it would be nice to put this solution in a separate library
that would be useful for notmuch clients as well as other mail indexing
engines. Or we should look for an existing library.
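
To make the idea of a preprocessing layer more concrete, here is a minimal sketch, assuming Python's standard `email` package stands in for the "existing library". The names `indexable_text` and `html_to_text` are hypothetical and are not part of notmuch. The sketch only illustrates the shape of such a layer: the maildir file stays untouched, and the indexer receives Unicode text built from it.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Crude HTML-to-text conversion: keep text nodes, collapse whitespace.

    Text inside <script> or <style> elements is kept too, so a real layer
    would need something more careful.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def indexable_text(raw_message):
    """Hypothetical entry point: raw message bytes -> (headers, body).

    Both results are Unicode strings. The raw bytes are only read, so the
    original message on disk is never modified.
    """
    # policy.default decodes RFC 2047 encoded words in headers on access.
    msg = message_from_bytes(raw_message, policy=policy.default)
    headers = {name: str(msg.get(name, "")) for name in ("From", "To", "Subject")}

    # Prefer a text/plain part and fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return headers, ""

    # get_content() undoes the transfer encoding and decodes the declared
    # charset; an unknown charset name raises LookupError (not handled here).
    body = part.get_content()
    if part.get_content_type() == "text/html":
        body = html_to_text(body)
    return headers, body
```

An indexer built this way would read each file from the maildir, pass its bytes to `indexable_text`, and index the returned strings instead of the raw bytes. Whether the extra parsing slows initial indexing noticeably would have to be measured.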