From: Tomi Ollila
Date: Sat, 11 Jun 2016 17:09:28 +0000 (+0300)
Subject: Re: [PATCH] WIP: regexp matching in 'subject' and 'from'
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=dedf19def81c424501aa1b3c080a96fb7e73aa51;p=notmuch-archives.git

Re: [PATCH] WIP: regexp matching in 'subject' and 'from'
---
diff --git a/f8/7b2c4fcafd458c75c3c1c3371d4daf87d2973d b/f8/7b2c4fcafd458c75c3c1c3371d4daf87d2973d
new file mode 100644
index 000000000..c1175240e
--- /dev/null
+++ b/f8/7b2c4fcafd458c75c3c1c3371d4daf87d2973d
@@ -0,0 +1,120 @@
+From: Tomi Ollila
+To: Gaute Hope, David Bremner,
+ Austin Clements
+Cc: notmuch
+Subject: Re: [PATCH] WIP: regexp matching in 'subject' and 'from'
+In-Reply-To: <1465662533-astroid-3-6vuqm3zu54-1296@strange>
+References: <1465265149-7174-1-git-send-email-david@tethera.net>
+ <1465525688-30913-1-git-send-email-david@tethera.net>
+ <1465547660-astroid-0-nudmv20lbk-1296@strange>
+ <87a8itxpu7.fsf@zancas.localnet>
+ <1465662533-astroid-3-6vuqm3zu54-1296@strange>
+User-Agent: Notmuch/0.22+42~gafaa8cf (https://notmuchmail.org) Emacs/24.5.1
+ (x86_64-unknown-linux-gnu)
+
+On Sat, Jun 11 2016, Gaute Hope wrote:
+
+> David Bremner writes on June 10, 2016 13:09:
+>> Gaute Hope writes:
+>>
+>>>
+>>> Cool!
+>>>
+>>> Would it break a lot of things if you just replace the original prefix?
+>>
+>> It would change the matching behaviour. I guess there are people that
+>> like the current "sloppy" matching of from: and subject:. In my
+>> not-very-scientific tests, it is 5 to 10 times slower to do a regexp
+>> search, which makes sense because it is effectively post-processing
+>> the results from Xapian. At least on my system it seems fast enough to
+>> be usable interactively, but that is a pretty shocking performance
+>> regression. And I know there are people with more mail on slower
+>> systems.
+>
+> Maybe we could check whether the search string contains a regexp and
+> decide whether to pre-process it on that basis? I think that would make
+> the interface more user-friendly. You'd just always use search, whether
+> or not you decide that you need to put in some regexp.
+
+You probably wanted to suggest that the command-line handling in notmuch
+goes through the search terms and potentially modifies them before giving
+them to Xapian to chew on... I think this is deliberately avoided (*); it
+would get out of hand so easily (even if we could agree on a syntax)...
+
+(*) There is some optimization done before feeding the query to Xapian,
+but that does not affect the interface (i.e.
+it could be dropped and none of the users' expectations would be
+broken...)
+
+What one can do is write one's own wrapper around notmuch. I have one
+that was written long before notmuch got date: searches; it mangles
+e.g. 5h.. to 1234567890.. (**) and logs search and show queries.
+(**) I should change that to use date:... instead (i.e. date: queries
+without the date: prefix). I "suggested" subject:/one's own subject re
+search w// slashes/, which one could pretty easily add to the wrapper...
+
+Tomi
+
+>
+>>
+>>> Could it be made to work on the message body?
+>>
+>> See Austin's previous reply for the details, but basically no; these
+>> "values" index in terms of whole strings, while the body is indexed by
+>> terms (roughly, words). In principle we could add a value slot for the
+>> body, but I think that would at least double the size of the database
+>> (maybe more).
+>>
+>
+> I would rather have double the db and be able to wildcard the beginning
+> of terms. If it is not too much maintenance overhead, it might be made
+> optional?
+>
+>
+> Regards, Gaute
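Tomi's wrapper is only described in prose in the thread. The following is a minimal Python sketch of the two query rewrites discussed, not his actual script. It turns a bare relative range such as 5h.. into an absolute <epoch>.. range (mirroring the 1234567890.. example), and it pulls a subject:/regex/ term out of the query so the regex can be applied afterwards as a post-filter on the results. That post-filtering step is the kind of post-processing of Xapian results that makes regexp search slower. The function names (mangle_term, split_query, post_filter), the m/h/d/w unit suffixes, and the lack of any escaping rules for slashes inside the pattern are assumptions made for this sketch; actually invoking notmuch is left out.

```python
from __future__ import annotations

import re
import time
from typing import Iterable, List, Optional, Tuple

# Hypothetical unit suffixes for relative ranges such as "5h.."; the email
# only shows the "h" case, the others are assumptions for this sketch.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}

# A bare, open-ended relative range, e.g. "5h.." (five hours ago until now).
RELATIVE_RANGE = re.compile(r"^(\d+)([mhdw])\.\.$")

# The proposed "subject:/regex/" syntax; everything between the outer
# slashes is taken as the pattern (no escaping rules are handled here).
SUBJECT_REGEX = re.compile(r"^subject:/(.*)/$")


def mangle_term(term: str, now: int) -> str:
    """Turn e.g. "5h.." into an absolute "<epoch>.." range; keep other terms."""
    match = RELATIVE_RANGE.match(term)
    if match is None:
        return term
    count, unit = int(match.group(1)), match.group(2)
    return f"{now - count * UNIT_SECONDS[unit]}.."


def split_query(
    terms: Iterable[str], now: Optional[int] = None
) -> Tuple[List[str], Optional[re.Pattern[str]]]:
    """Split a query into terms for notmuch/Xapian and an optional subject regex.

    The regex term is removed from the query so that it can be applied
    afterwards as a post-filter on whatever the remaining terms match.
    """
    if now is None:
        now = int(time.time())
    passed: List[str] = []
    subject_re: Optional[re.Pattern[str]] = None
    for term in terms:
        match = SUBJECT_REGEX.match(term)
        if match is not None:
            subject_re = re.compile(match.group(1))
        else:
            passed.append(mangle_term(term, now))
    return passed, subject_re


def post_filter(
    subjects: Iterable[str], subject_re: Optional[re.Pattern[str]]
) -> List[str]:
    """Keep only the subjects the regex matches (the slower, post-Xapian step)."""
    if subject_re is None:
        return list(subjects)
    return [s for s in subjects if subject_re.search(s)]


# Example query, with "now" fixed at this message's Date (2016-06-11 17:09:28 UTC).
terms, rx = split_query(["tag:inbox", "5h..", r"subject:/^Re: \[PATCH\]/"], now=1465664968)
print(terms)  # ['tag:inbox', '1465646968..']
print(post_filter(["Re: [PATCH] WIP: regexp matching", "Cool!"], rx))
# ['Re: [PATCH] WIP: regexp matching']
```

Keeping the regex out of the Xapian query lets the ordinary terms narrow the result set first, so the per-message regex check only runs on what remains. As Tomi notes, a newer wrapper would emit date: terms rather than bare timestamp ranges.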