From: Mark Walters
Date: Thu, 30 Oct 2014 09:00:13 +0000 (+0000)
Subject: Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=0e5293b7042048a4fcc7664eb417b6be7c9227c8;p=notmuch-archives.git

Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
---

diff --git a/b3/0279c109cff97f7726618d85bcfa2874401261 b/b3/0279c109cff97f7726618d85bcfa2874401261
new file mode 100644
index 000000000..38de0892f
--- /dev/null
+++ b/b3/0279c109cff97f7726618d85bcfa2874401261
@@ -0,0 +1,116 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 1F9AE431FC7
+ for ; Thu, 30 Oct 2014 02:00:46 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -1.098
+X-Spam-Level:
+X-Spam-Status: No, score=-1.098 tagged_above=-999 required=5
+ tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_FROM=0.001,
+ NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id OCE1qIamr9hA for ;
+ Thu, 30 Oct 2014 02:00:38 -0700 (PDT)
+Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6])
+ (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
+ (No client certificate requested)
+ by olra.theworths.org (Postfix) with ESMTPS id 0BEA2431FB6
+ for ; Thu, 30 Oct 2014 02:00:38 -0700 (PDT)
+Received: from smtp.qmul.ac.uk ([138.37.6.40])
+ by mail2.qmul.ac.uk with esmtp (Exim 4.71)
+ (envelope-from )
+ id 1XjlaZ-0002vG-94; Thu, 30 Oct 2014 09:00:35 +0000
+Received: from 5751dfa2.skybroadband.com ([87.81.223.162] helo=localhost)
+ by smtp.qmul.ac.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.71)
+ (envelope-from )
+ id 1XjlaY-0004TR-VY; Thu, 30 Oct 2014 09:00:15 +0000
+From: Mark Walters
+To: Tomi Ollila , Michal Sojka ,
+ notmuch@notmuchmail.org
+Subject: Re: [PATCH v4 5/6] cli: search: Add configurable way to
+ filter out duplicate addresses
+In-Reply-To:
+References: <1414421455-3037-1-git-send-email-sojkam1@fel.cvut.cz>
+ <1414421455-3037-6-git-send-email-sojkam1@fel.cvut.cz>
+ <87egtqug4t.fsf@qmul.ac.uk>
+User-Agent: Notmuch/0.18.1+86~gef5e66a (http://notmuchmail.org) Emacs/23.4.1
+ (x86_64-pc-linux-gnu)
+Date: Thu, 30 Oct 2014 09:00:13 +0000
+Message-ID: <87r3xq9bky.fsf@qmul.ac.uk>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+X-Sender-Host-Address: 87.81.223.162
+X-QM-Geographic: According to ripencc,
+ this message was delivered by a machine in Britain (UK) (GB).
+X-QM-SPAM-Info: Sender has good ham record. :)
+X-QM-Body-MD5: a0b2b2c2538659214970f37fd0d3d080 (of first 20000 bytes)
+X-SpamAssassin-Score: -0.1
+X-SpamAssassin-SpamBar: /
+X-SpamAssassin-Report: The QM spam filters have analysed this message to
+ determine if it is
+ spam. We require at least 5.0 points to mark a message as spam.
+ This message scored -0.1 points.
+ Summary of the scoring:
+ * 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
+ provider * (markwalters1009[at]gmail.com)
+ * -0.1 AWL AWL: From: address is in the auto white-list
+X-QM-Scan-Virus: ClamAV says the message is clean
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Thu, 30 Oct 2014 09:00:46 -0000
+
+On Thu, 30 Oct 2014, Tomi Ollila wrote:
+> On Thu, Oct 30 2014, Mark Walters wrote:
+>
+>> On Mon, 27 Oct 2014, Michal Sojka wrote:
+>>> This adds an algorithm to filter out duplicate addresses from address
+>>> outputs (sender, receivers). The algorithm can be configured with
+>>> --filter-by command line option.
+>>>
+>>> The code here is an extended version of a patch from Jani Nikula.
+>>
+>> Hi
+>>
+>> As this is getting into the more controversial bike shedding region I
+>> wonder if it would be worth splitting this into 2 patches: the first
+>> could do the default dedupe based on name/address and the second could
+>> do add the filter-by options.
+>>
+>> I think the default deduping is obviously worth doing but I am not sure
+>> about the rest. In any case I think the default deduping could go in
+>> pre-freeze but I would recommend the rest is left until after.
+>
+> I can agree with that, but there is one hard thing to resolve:
+> "naming things"(*)
+>
+> (*) http://martinfowler.com/bliki/TwoHardThings.html
+>
+> With all rest ignored (sorry no time to work on this in more detail now),
+> this default deduping could be done with single argument '--unique'...
+
+In this case I am suggesting that to start with the default deduping is
+unconditionally done and that there is no command line argument. We can
+decide on other filter options, possibly including a completely
+unfiltered list (*), later.
+
+Best wishes
+
+Mark
+
+(*) Personally I don't really see a use case for the unfiltered list but
+others may disagree.
+