Date: Sun, 4 Nov 2012 23:28:12 -0500
From: Austin Clements
To: Jani Nikula
Subject: Re: Automatic suppression of non-duplicate messages
Message-ID: <20121105042749.GT15377@mit.edu>
References: <87mwyz3s9d.fsf@star.eba> <87390qxvb4.fsf@maritornes.cs.unb.ca> <87390pf14v.fsf@nikula.org>
In-Reply-To: <87390pf14v.fsf@nikula.org>
Cc: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

Quoth Jani Nikula on Nov 05 at 12:34 am:
> On Sat, 03 Nov 2012, David Bremner wrote:
> > Eirik Byrkjeflot Anonsen writes:
> >
> >> That's not what I see. If I search for a term that only appears in
> >> one of the "copies", none of the copies are included in the search
> >> result.
> >
> > The offending code is at line 1813 of lib/database.cc; the message is
> > only indexed if the message-id is new.
> >
> > It might be sensible to move _notmuch_message_index_file into the other
> > branch of the if, but even if that works fine, something more
> > sophisticated is needed for the call to
> > __notmuch_message_set_header_values; the invariant that each message has
> > a single subject seems reasonable.
> >
> > Offhand I'm not sure of a good method of automatically deciding what is
> > the same message (with e.g. headers and footer text added by a mailing
> > list).
>
> Assuming there was good method, what would you do with two different
> messages that have the same message id? That is the unique id we use to
> identify messages (which should be fine per RFC 5322 and its
> predecessors; we're talking about messages from broken systems here).
>
> It might be helpful to have a configuration option similar to new.tags
> that would define the tags to be assigned to messages with duplicate
> message ids. (This could be done in the
> NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID case near line 516 of
> notmuch-new.c). This could be used to assign a "dupe" tag, for example,
> so the user could do whatever they want in the post-new hook or the user
> interface. A sufficiently clever post-new hook could compare the files
> of a message, and drop the tag or add another, as the case may
> be. Surely not a perfect solution, but keeps the implementation simple.

This would also trigger on message flag changes and folder moves
performed outside of notmuch, since notmuch sees those as a duplicate
message ID followed by a deletion. The only way to do something for
every received message even if it has the same message ID as an
existing message is to do it in whatever delivers mail. Currently, we
don't have a good story for integrating on-delivery operations with
notmuch.
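
To make the "compare the files of a message" step concrete, here is a
minimal Python sketch of the check such a post-new hook might run. It
is an illustration only: the "dupe" tag and the configuration option
that would assign it are proposals from this thread and do not exist,
the function names body_digest and copies_differ are made up, and
treating two copies as "the same message" when their text/plain bodies
match (ignoring all headers and trailing whitespace) is just one
possible definition. Under that definition a file that was merely
moved or re-flagged compares equal, while a copy with an added list
footer does not.

```python
import email
import hashlib
from email import policy


def body_digest(raw: bytes) -> str:
    """Hash only the text/plain parts of a message, ignoring every header."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    digest = hashlib.sha256()
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            # Drop trailing whitespace and line-ending differences so that
            # trivial transport changes do not count as a different body.
            digest.update(b"\n".join(line.rstrip() for line in payload.splitlines()))
    return digest.hexdigest()


def copies_differ(paths) -> bool:
    """Return True if the files sharing one message ID have different bodies."""
    digests = set()
    for path in paths:
        with open(path, "rb") as f:
            digests.add(body_digest(f.read()))
    return len(digests) > 1
```

The list of paths for one message could be taken from the output of
`notmuch search --output=files id:<message-id>`. A hook built on this
would keep the hypothetical tag only when copies_differ returns True,
which filters out the flag-change and folder-move cases described
above, though it still runs after delivery rather than at delivery.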