From: Carl Worth
To: James Westby, notmuch@notmuchmail.org
Date: Mon, 21 Dec 2009 10:33:21 -0800
Subject: Re: [notmuch] [PATCH] Reindex larger files that duplicate ids we have
Message-ID: <87vdg0npn2.fsf@yoom.home.cworth.org>
In-Reply-To: <1261186149-24078-1-git-send-email-jw+debian@jameswestby.net>

Hi James,

I just got to a point in my outstanding rework where I thought it would make sense to pull this patch series in (I'm adding support for storing multiple filenames in a single mail document). I took a closer look at this series, and I think it's still independent of that work, so I'll finish up what I'm doing and then add this series on top later.
But I can at least answer some of the questions you asked for now:

> Does the re-indexing replace the old terms?

Before this patch, there wasn't any "re-indexing" in notmuch. So we'll basically need to think about what we want to do here.

As this patch is written (just calling into the existing _index_file function), the re-indexing only adds new terms and doesn't delete any. That's probably correct. We're using file size as a heuristic that the larger file is a superset of the smaller file, but that doesn't guarantee that the smaller file contains no unique terms. So I'd be extremely hesitant to drop any terms here.

> In the case
> where you had a collision with different text this could
> make a search return mails that don't contain that text.
> I don't think it's a big issue though, even if that is the
> case.

That's correct. As mentioned in a previous thread, this is likely only a big issue in the face of deliberate message-ID spoofing or similar. In that thread we talked about some ideas for mitigating that. But I don't think we need to solve that problem before applying this patch series.

-Carl
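The add-only re-indexing behaviour discussed above can be illustrated with a small sketch. Everything in it is hypothetical: `MailDocument`, `index_terms`, and `reindex_if_larger` are illustrative names, not notmuch's API (notmuch itself is written in C and stores terms in Xapian documents), and a message's terms are modelled as a plain Python set of strings. It shows the two rules from the thread: a file with an already-known message-ID is re-indexed only when it is larger, and re-indexing unions new terms in without deleting any old ones.

```python
from dataclasses import dataclass, field


@dataclass
class MailDocument:
    """Hypothetical stand-in for one indexed message (not notmuch's real schema)."""
    message_id: str
    file_size: int                              # size of the largest file indexed so far
    terms: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Copy so later indexing never mutates the caller's set.
        self.terms = set(self.terms)


def index_terms(doc: MailDocument, new_terms: set[str]) -> set[str]:
    """Add-only indexing: union in new terms, never remove existing ones.

    Returns the terms that were not already present."""
    added = new_terms - doc.terms
    doc.terms |= added
    return added


def reindex_if_larger(doc: MailDocument, file_size: int, terms: set[str]) -> bool:
    """Handle another file carrying a message-ID that is already indexed.

    Re-index only when the new file is strictly larger, on the (unguaranteed)
    assumption that the larger file is a superset of the smaller one.
    Returns True if the document was re-indexed."""
    if file_size <= doc.file_size:
        return False
    index_terms(doc, terms)
    doc.file_size = file_size
    return True


# Usage: first copy indexed, then a larger and a smaller duplicate arrive.
doc = MailDocument("<example@host>", 1200, {"patch", "reindex", "only-in-small"})
reindex_if_larger(doc, 3400, {"patch", "reindex", "quoted", "signature"})  # re-indexed
reindex_if_larger(doc, 800, {"ignored"})                                   # skipped
print(doc.file_size, sorted(doc.terms))
# prints: 3400 ['only-in-small', 'patch', 'quoted', 'reindex', 'signature']
```

Note that "only-in-small" stays searchable after the larger file is indexed. That is the trade-off described above: no unique term from the smaller file is lost, at the cost that a message-ID collision with different text can make a search match a message whose larger file doesn't contain the term.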