From: Thomas Schwinge
To: notmuch@notmuchmail.org
Cc: Austin Clements
Subject: Austin's custom query parser: folder/directory searching, some numbers
Date: Thu, 27 Oct 2011 13:12:46 +0200
Message-ID: <87mxcmbscx.fsf@kepler.schwinge.homeip.net>
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.3.1 (i486-pc-linux-gnu)
Hi!

As I already mentioned on IRC (I still have to polish and publish this...), I recently merged Austin's custom query parser into my local tree, mainly (for now) for its exact folder/directory searching capabilities.

Austin had published this work several months ago, and Carl in the meantime had implemented his own folder: searches. Now, there was a conflict about which to use: they have different semantics, and Carl's are inadequate for my use case (not rooted, for example). On IRC, Carl recently proposed the most pragmatic solution: if we can't agree on either his folder: semantics or Austin's strict filename matching, then just have both of them.

So I have now arranged for having both Carl's folder: (with its ``weak'' mail folder semantics) and Austin's directory: (with its ``hard'' directory/filename matching semantics), and on top of the latter I implemented rdirectory:, which extends directory: by recursive matching. This works really nicely.

IRC, freenode, #notmuch, 2011-09-30:

    tschwinge: Before you get in too deep I should point out that there's a (not unsurmountable) flaw in the folder handling. Because it expands to all of the desired dir-entry terms, it can chew up a huge amount of memory (~50K per matched file, IIRC).

After importing several GNU mailing lists' archives yesterday, I have now done some measurements: it is between 20 and 30 KiB per file, ranging from 26 KiB for a 9000-file hierarchy to 21 KiB for a 23000-file hierarchy (I assume the non-linearity is mostly due to notmuch's baseline resident size, etc.).
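The difference between directory: and rdirectory:, and why the per-file expansion Austin describes scales badly, can be shown with a toy model. This is a hypothetical sketch, not the code in Austin's parser: it only models the matching semantics described above (rooted exact match vs. rooted recursive match, not Carl's folder:), and it assumes a query is answered by expanding to one term per matching file path, so the expanded query grows with the number of matched files. The function names and example paths are made up for illustration.

```python
from posixpath import dirname


def matches_directory(path: str, directory: str) -> bool:
    """Hard, rooted exact match: the file lies directly in `directory`."""
    return dirname(path) == directory.rstrip("/")


def matches_rdirectory(path: str, directory: str) -> bool:
    """Rooted recursive match: the file lies in `directory` or any subdirectory."""
    # Appending "/" keeps "import/GNU" from also matching "import/GNUX/...".
    return path.startswith(directory.rstrip("/") + "/")


def expand_per_file(paths: list[str], directory: str, recursive: bool = False) -> list[str]:
    """Model of per-file expansion: one query term per matching file path.

    The returned list (and so the memory of the expanded query) grows
    linearly with the number of matched files.
    """
    match = matches_rdirectory if recursive else matches_directory
    return [p for p in paths if match(p, directory)]
```

At the roughly 21 KiB per matched file measured above, expanding a 23000-file hierarchy this way costs on the order of 23000 × 21 KiB ≈ 470 MiB.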
And, of course:

    $ find ~/Mail-schwinge.name-thomas/import/GNU/2011-04-03/ -type f | wc -l
    276010
    $ notmuch search --output=files -- rdirectory:import/GNU/2011-04-03 | grep -F import/GNU/2011-04-03 | wc -l
    0
    $ echo "${PIPESTATUS[@]}"
    137 1 0
    $ dmesg | grep notmuch
    [3797089.224252] notmuch invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
    [3797089.224282] notmuch cpuset=/ mems_allowed=0
    [3797089.224290] Pid: 586, comm: notmuch Not tainted 3.0.0-1-686-pae #1
    [3797089.232081] [ 586] 1000 586 310693 257874 0 0 0 notmuch
    [3797089.232081] Out of memory: Kill process 586 (notmuch) score 697 or sacrifice child
    [3797089.232081] Killed process 586 (notmuch) total-vm:1242772kB, anon-rss:1031492kB, file-rss:4kB

:-) (But this is no problem for me; I don't need to do such coarse-grained matching.)

    tschwinge: The solution is probably to add folder terms to messages (but as one, unsplit term, unlike in cworth's approach) and expand on those so that the space is bounded by the number of matched folders, rather than files. That would also make it quite easy to do arbitrary glob matching.

(These would now be directory terms.) This suggestion still stands. (But I'm not working on it at the moment.)

Grüße,
 Thomas
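Austin's suggestion (one unsplit directory term per message, with query expansion over those terms) might look roughly like the following toy model. It is a sketch under stated assumptions, not notmuch or Xapian code: the DirectoryIndex class and its methods are hypothetical, a plain dict stands in for the database's term list, file ids stand in for messages, and each term is assumed to be the file's directory path relative to the mail root. What it illustrates is that expand_recursive and expand_glob iterate over distinct directory terms, so the expanded query is bounded by the number of matched folders rather than the number of matched files.

```python
import fnmatch
from bisect import bisect_left, insort
from collections import defaultdict
from itertools import islice
from posixpath import dirname


class DirectoryIndex:
    """Toy inverted index keyed by one unsplit directory term per file."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # directory term -> set of file ids
        self.terms = []  # distinct directory terms, kept sorted

    def add(self, file_id: int, path: str) -> None:
        term = dirname(path)  # whole directory path, not split into components
        if term not in self.postings:
            insort(self.terms, term)
        self.postings[term].add(file_id)

    def expand_recursive(self, directory: str) -> list:
        """Directory terms equal to `directory` or below it (rdirectory:-style)."""
        d = directory.rstrip("/")
        start = bisect_left(self.terms, d)
        out = []
        # All terms sharing the string prefix `d` are contiguous in sorted order.
        for term in islice(self.terms, start, None):
            if not term.startswith(d):
                break
            if term == d or term.startswith(d + "/"):
                out.append(term)  # skips siblings such as "import/GNU-old"
        return out

    def expand_glob(self, pattern: str) -> list:
        """Directory terms matching a shell glob (fnmatch's '*' also matches '/')."""
        return [t for t in self.terms if fnmatch.fnmatchcase(t, pattern)]

    def files(self, terms: list) -> set:
        """Union of the posting lists of the expanded directory terms."""
        result = set()
        for term in terms:
            result |= self.postings.get(term, set())
        return result
```

The recursive case is a prefix scan over the sorted terms; glob matching visits every distinct directory term, which is still bounded by the number of folders, not files. Whether '*' should cross '/' boundaries is a design choice this sketch does not settle.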