From: Mark Walters
To: Tomi Ollila, Michal Sojka, notmuch@notmuchmail.org
Subject: Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
References: <1414421455-3037-1-git-send-email-sojkam1@fel.cvut.cz>
 <1414421455-3037-6-git-send-email-sojkam1@fel.cvut.cz>
 <87egtqug4t.fsf@qmul.ac.uk>
User-Agent: Notmuch/0.18.1+86~gef5e66a (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)
Date: Thu, 30 Oct 2014 09:00:13 +0000
Message-ID: <87r3xq9bky.fsf@qmul.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: "Use and development of the notmuch mail system."

On Thu, 30 Oct 2014, Tomi Ollila wrote:
> On Thu, Oct 30 2014, Mark Walters wrote:
>
>> On Mon, 27 Oct 2014, Michal Sojka wrote:
>>> This adds an algorithm to filter out duplicate addresses from address
>>> outputs (sender, receivers). The algorithm can be configured with
>>> --filter-by command line option.
>>>
>>> The code here is an extended version of a patch from Jani Nikula.
>>
>> Hi
>>
>> As this is getting into the more controversial bike shedding region I
>> wonder if it would be worth splitting this into 2 patches: the first
>> could do the default dedupe based on name/address and the second could
>> do add the filter-by options.
>>
>> I think the default deduping is obviously worth doing but I am not sure
>> about the rest. In any case I think the default deduping could go in
>> pre-freeze but I would recommend the rest is left until after.
>
> I can agree with that, but there is one hard thing to resolve:
> "naming things"(*)
>
> (*) http://martinfowler.com/bliki/TwoHardThings.html
>
> With all rest ignored (sorry no time to work on this in more detail now),
> this default deduping could be done with single argument '--unique'...

In this case I am suggesting that to start with the default deduping is
unconditionally done and that there is no command line argument. We can
decide on other filter options, possibly including a completely
unfiltered list (*), later.

Best wishes

Mark

(*) Personally I don't really see a use case for the unfiltered list but
others may disagree.
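
For illustration only, the "default dedupe based on name/address" discussed above could look roughly like the sketch below. It is not the code from the patch series: the function name, the (name, address) tuple shape, and the sample addresses are all hypothetical, and it assumes that two entries count as duplicates only when both the name and the address match exactly, with no case folding.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical representation: (display name, email address).
Address = Tuple[str, str]


def dedupe_addresses(addresses: Iterable[Address]) -> Iterator[Address]:
    """Yield each (name, address) pair once, keeping first-seen order."""
    seen = set()
    for name, addr in addresses:
        key = (name, addr)  # assumption: exact match on both fields
        if key in seen:
            continue  # already emitted this exact pair
        seen.add(key)
        yield (name, addr)


# Made-up sample data, not taken from any real mailbox.
sample = [
    ("Mark Walters", "mark@example.org"),
    ("Mark Walters", "mark@example.org"),  # exact duplicate, dropped
    ("M. Walters", "mark@example.org"),    # same address, different name, kept
]
unique = list(dedupe_addresses(sample))
```

Because the filtering is applied unconditionally, a caller would need no extra option to get the deduplicated list; any looser matching (for example by address only) would be a separate, later decision.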