From: Michal Sojka
To: Serge Z, notmuch@notmuchmail.org
Subject: Re: [PATCH] test: Add test for searching of uncommonly encoded messages
In-Reply-To: <20120224042925.2870.87924@localhost>
References: <877gzd5axk.fsf@steelpick.2x.cz> <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz>
 <20120224042925.2870.87924@localhost>
User-Agent: Notmuch/0.11.1+210~g5c2fc0a (http://notmuchmail.org) Emacs/23.3.1 (x86_64-pc-linux-gnu)
Date: Fri, 24 Feb 2012 08:00:02 +0100
Message-ID: <874nug67il.fsf@steelpick.2x.cz>
List-Id: "Use and development of the notmuch mail system."

On Fri, 24 Feb 2012, Serge Z wrote:
>
> Quoting Michal Sojka (2012-02-24 04:33:15)
> >Emails that are encoded differently than as ASCII or UTF-8 are not
> >indexed properly by notmuch. It is not possible to search for non-ASCII
> >words within those messages.
>
> Ok. But we can preprocess each incoming message right after 'getmail' to
> convert it from html to text and to utf8 encoding. One solution is to create a
> separate script for this and make gmail pipe all messages to this script, and
> then to notmuch. But it would be better if maildir contains original messages
> only, so the question is: can we make notmuch indexing engine to index
> preprocessed message while maildir will contain original message - as it was
> obtained?

Hi,

I'm not a big fan of adding a "preprocessor". First, I think that both
reasons you mention are actually bugs, and it would be better to fix them
for everybody than to require each user to configure some preprocessor.
Second, depending on what your preprocessor does and how it does it, the
initial mail indexing could be much slower, which is also not something
people want.

Do you have any other use case for the preprocessor besides utf8 and
html->text conversions?

Cheers,
-Michal