From: Jani Nikula
To: Jesse Rosenthal, Daniel Schoepe, Justus Winter <4winter@informatik.uni-hamburg.de>, Philippe LeCavalier, notmuch@notmuchmail.org
Subject: Re: nomuch_addresses.py
In-Reply-To: <87boosjgd9.fsf@jhu.edu>
References: <87r4xur3rv.fsf@plc.plecavalier.com> <87fweamenf.fsf@schoepe.localhost> <20120221091509.8534.59492@thinkbox.jade-hamburg.de> <87zkccjnst.fsf@schoepe.localhost> <87boosjgd9.fsf@jhu.edu>
User-Agent: Notmuch/0.11.1+222~ga47a98c (http://notmuchmail.org) Emacs/23.1.1 (i686-pc-linux-gnu)
Date: Wed, 22 Feb 2012 13:07:35 +0000
Message-ID: <871upn6mp4.fsf@nikula.org>

On Tue, 21 Feb 2012 11:33:38 -0500, Jesse Rosenthal wrote:
> On Tue, 21 Feb 2012 14:53:06 +0100, Daniel Schoepe wrote:
> > On Tue, 21 Feb 2012 09:15:09 -0000, Justus Winter <4winter@informatik.uni-hamburg.de> wrote:
> > The reason I mentioned nottoomuch-addresses at all, is that completion
> > itself is _a lot_ faster (at least for me), compared to
> > addrlookup. According to the wiki, notmuch-addresses.py is even slower
> > than addrlookup, so I thought (and still think) that it was worth
> > mentioning. Of course, one could rewrite the database-generation part in
> > python using the bindings, but I personally don't think it's that
> > necessary.
>
> I'm not sure what speed comparisons were being used -- I think it was
> Sebastian comparing vala to python.
> In any case, using
> notmuch_addresses.py to look up a common prefix ("Jes") on a slowish
> computer takes 0.2 seconds. So I'm not sure if the speed is all that
> much of an issue. It might be a question of cache temperature, though --
> it'll probably take longer the first time you run it. Still, even trying
> something out on a cold cache, it seems to be about a second.

The speed comparisons between vanilla notmuch_addresses.py and
nottoomuch-addresses.sh are going to be flawed in that they do different
things. It's comparing apples and oranges.

notmuch_addresses.py looks for matches in the recipients of mails the
user has sent. Nothing else. notmuch_addresses.py filters out multiple
names for one email address using a popularity contest.

AFAICT nottoomuch-addresses.sh scans all the addresses in all the
mails. It has no logic for filtering out multiple names for one email
address, and just returns all matches.

Personally I would like to have the best of both worlds, and I'm using a
modified notmuch_addresses.py that matches all the mails I have, and
cleans up the duplicate results. Unfortunately that does take a toll on
performance, taking about a second on my system for typical searches,
cache hot, while nottoomuch-addresses.sh takes less than a tenth of a
second. It is enough to be annoying, I'm afraid.

Even so, it's not a fair comparison because notmuch_addresses.py wasn't
designed with this in mind, and nottoomuch-addresses.sh maintains its
own database and does less.

One just needs to pick the tool that fits the needs best.

BR,
Jani.
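
For illustration, the "popularity contest" idea mentioned above could be
sketched roughly as follows. This is not the actual notmuch_addresses.py
code. It assumes the recipient header values have already been pulled out
of the mail store as plain strings, and the function names and the
example.org addresses are made up for the example.

```python
from collections import Counter, defaultdict
from email.utils import formataddr, getaddresses


def popular_names(headers):
    """Map each email address to the display name seen most often.

    `headers` is a list of raw To/Cc header values (hypothetical input;
    a real tool would read these from the mail database).
    """
    counts = defaultdict(Counter)
    for name, addr in getaddresses(headers):
        if addr:
            # Addresses are compared case-insensitively.
            counts[addr.lower()][name] += 1
    # The "popularity contest": keep only the most frequent name.
    return {addr: names.most_common(1)[0][0] for addr, names in counts.items()}


def complete(prefix, headers):
    """Return one formatted match per address for the given prefix."""
    prefix = prefix.lower()
    best = popular_names(headers)
    return sorted(
        formataddr((name, addr))
        for addr, name in best.items()
        if name.lower().startswith(prefix) or addr.startswith(prefix)
    )


if __name__ == "__main__":
    sample = [
        "Jesse Rosenthal <jr@example.org>, Jani <jani@example.org>",
        "Jesse Rosenthal <jr@example.org>",
        "Jesse R <jr@example.org>",
    ]
    for match in complete("jes", sample):
        print(match)
```

Returning every (name, address) pair without the counting step would
correspond to the "just returns all matches" behaviour. The extra pass
over all addresses is where the deduplicating variant spends its time.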