From: Jani Nikula
To: David Bremner, Eirik Byrkjeflot Anonsen, notmuch@notmuchmail.org
Subject: Re: Automatic suppression of non-duplicate messages
In-Reply-To: <87390qxvb4.fsf@maritornes.cs.unb.ca>
References: <87mwyz3s9d.fsf@star.eba> <87390qxvb4.fsf@maritornes.cs.unb.ca>
User-Agent: Notmuch/0.14+81~g1924356 (http://notmuchmail.org) Emacs/23.4.1 (i686-pc-linux-gnu)
Date: Mon, 05 Nov 2012 00:34:40 +0200
Message-ID: <87390pf14v.fsf@nikula.org>
List-Id: "Use and development of the notmuch mail system."

On Sat, 03 Nov 2012, David Bremner wrote:
> Eirik Byrkjeflot Anonsen writes:
>
>> That's not what I see. If I search for a term that only appears in
>> one of the "copies", none of the copies are included in the search
>> result.
>
> The offending code is at line 1813 of lib/database.cc; the message is
> only indexed if the message-id is new.
>
> It might be sensible to move _notmuch_message_index_file into the other
> branch of the if, but even if that works fine, something more
> sophisticated is needed for the call to
> __notmuch_message_set_header_values; the invariant that each message has
> a single subject seems reasonable.
>
> Offhand I'm not sure of a good method of automatically deciding what is
> the same message (with e.g. headers and footer text added by a mailing
> list).

Assuming there was a good method, what would you do with two different
messages that have the same message id?
That is the unique id we use to identify messages (which should be fine
per RFC 5322 and its predecessors; we're talking about messages from
broken systems here).

It might be helpful to have a configuration option similar to new.tags
that would define the tags to be assigned to messages with duplicate
message ids. (This could be done in the
NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID case near line 516 of
notmuch-new.c). This could be used to assign a "dupe" tag, for example,
so the user could do whatever they want in the post-new hook or the user
interface. A sufficiently clever post-new hook could compare the files
of a message, and drop the tag or add another, as the case may be.

Surely not a perfect solution, but keeps the implementation simple.

BR,
Jani.
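
For illustration only, the comparison step of such a post-new hook might
look roughly like the sketch below. It assumes the proposed option exists
and has applied a "dupe" tag; the function names body_digest and
same_content are made up for this example and are not part of notmuch.
The sketch compares only the normalized text body, so copies that differ
by a mailing list footer would still count as different.

```python
import email
import hashlib
from email import policy


def body_digest(raw: bytes) -> str:
    """Hash the text body of one message file, ignoring all headers.

    Hypothetical helper: runs of whitespace are collapsed so that
    re-wrapped but otherwise identical bodies hash the same.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    text = part.get_content() if part is not None else ""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def same_content(raw_copies) -> bool:
    """Return True if every copy of a message has the same normalized body."""
    return len({body_digest(raw) for raw in raw_copies}) <= 1
```

A hook could read the files listed by "notmuch search --output=files"
for each message carrying the tag, pass their contents to same_content,
and drop the tag when it returns True.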