Re: [notmuch] RFC: Multiple filenames for email messages

author Carl Worth <cworth@cworth.org>

Sun, 22 Nov 2009 04:21:52 +0000 (05:21 +0100)

committer W. Trevor King <wking@tremily.us>

Fri, 7 Nov 2014 17:35:39 +0000 (09:35 -0800)
diff --git a/92/70149004d4b208ae4d2177bf6258d07a44b09f b/92/70149004d4b208ae4d2177bf6258d07a44b09f
new file mode 100644
index 0000000..30525ba
--- /dev/null
+++ b/92/70149004d4b208ae4d2177bf6258d07a44b09f
@@ -0,0 +1,94 @@
+Return-Path: <cworth@cworth.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id AC85B431FBF;\r
+       Sat, 21 Nov 2009 20:22:05 -0800 (PST)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+       by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+       with ESMTP id OQN3GZ6Fahom; Sat, 21 Nov 2009 20:22:05 -0800 (PST)\r
+Received: from cworth.org (localhost [127.0.0.1])\r
+       by olra.theworths.org (Postfix) with ESMTP id 9C241431FAE;\r
+       Sat, 21 Nov 2009 20:22:04 -0800 (PST)\r
+From: Carl Worth <cworth@cworth.org>\r
+To: Jan Janak <jan@ryngle.com>, Not Much Mail <notmuch@notmuchmail.org>\r
+In-Reply-To: <f35dbb950911211437q34923ee8w14b1ef65a204b09f@mail.gmail.com>\r
+References: <f35dbb950911211437q34923ee8w14b1ef65a204b09f@mail.gmail.com>\r
+Date: Sun, 22 Nov 2009 05:21:52 +0100\r
+Message-ID: <878wdz2nq7.fsf@yoom.home.cworth.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain; charset=us-ascii\r
+Subject: Re: [notmuch] RFC: Multiple filenames for email messages\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.12\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+       <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+       <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sun, 22 Nov 2009 04:22:05 -0000\r
+\r
+On Sat, 21 Nov 2009 23:37:24 +0100, Jan Janak <jan@ryngle.com> wrote:\r
+> The comment of _notmuch_message_set_filename says:\r
+> \r
+>    XXX: We should still figure out if we think it's important to store\r
+>    multiple filenames for email messages with identical message IDs.\r
+...\r
+> I'd like to propose that we store all filenames for email messages in\r
+> the database, not just one per message. I'd be happy to work on it and\r
+> submit a patch if others think that this would be good to have.\r
+\r
+Oh, sure. As soon as we start using filenames for searches, then that\r
+makes a lot of sense.\r
+\r
+Currently, notmuch isn't storing any filename terms for searching, but\r
+it should be (we just need to add a prefix to the table at the top of\r
+lib/database.cc, document it, and then make the indexing stage generate\r
+terms from the filename with that prefix).\r
+\r
+The term generator and query parser should do the right thing, which is\r
+to split the filename into individual terms at each '/', store position\r
+data with each, and then turn a search like:\r
+\r
+       filename:some/filename/segment\r
+\r
+into a phrase search that looks for the terms "some", "filename", and\r
+"segment", each with the filename prefix you choose and each in\r
+sequential position. Note that if you compile notmuch with CFLAGS\r
+including -DDEBUG then you'll see a nice report of the post-parsed query\r
+that's useful for debugging stuff like this.\r
+\r
+The reason for my comment was related to the other use of the filename,\r
+(that is, the only one we're currently using). This is with regard to\r
+querying the database for the actual filename, rather than searching on\r
+it. For this, we don't use terms, but instead use the "data" field of\r
+the document. I was wondering if in the presentation of an email message\r
+it would ever be important to have access to the multiple files.\r
+\r
+Can anyone think of a case where they would need that? That is, a case\r
+where you care about the distinct content of two messages that have the\r
+same message ID?\r
+\r
+I suppose that in the case of getting a message by two paths, (say\r
+through a mailing list and also via CC), one might want to inspect the\r
+different headers in the two versions. So maybe we'll need to break down\r
+and provide this information to the interfaces.\r
+\r
+Also, if we're going to support file deletion well, then I suppose we\r
+really will need to store all the filenames, (so if one disappears we\r
+can still point to the others). Also, we'll need to be able to\r
+accurately update the filename terms when a message disappears, so that\r
+means having all of the complete filenames around.\r
+\r
+So I guess I'm convincing myself that we really should store all the\r
+filenames, and also provide an interface to get a list of filenames for\r
+a message, (but also expect that many users of the API will only want to\r
+look at the first filename in the list).\r
+\r
+-Carl\r
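
The filename search described in the message (split the path at each '/', store a position with each term, and match a query as a phrase of sequential positions) can be illustrated with a small toy index. This is only a sketch under stated assumptions: it does not use Xapian or any notmuch API, and the names PREFIX, filename_terms and ToyIndex are hypothetical, chosen for illustration. The one-position gap between filenames is also an assumption, added so that a phrase cannot match across two files of the same message.

```python
from collections import defaultdict

# Hypothetical term prefix; the message leaves the actual prefix open.
PREFIX = "XFILENAME:"


def filename_terms(path):
    """Split a path at '/' and return (prefixed_term, position) pairs."""
    segments = [s for s in path.split("/") if s]
    return [(PREFIX + seg, pos) for pos, seg in enumerate(segments, start=1)]


class ToyIndex:
    """Toy positional index: term -> message id -> set of positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))
        self._next_base = defaultdict(int)

    def add_filename(self, msg_id, path):
        """Index one filename; a message may be given several filenames."""
        base = self._next_base[msg_id]
        terms = filename_terms(path)
        for term, pos in terms:
            self.postings[term][msg_id].add(base + pos)
        # Assumption: skip one position so that a phrase cannot span
        # the end of one filename and the start of the next.
        self._next_base[msg_id] = base + len(terms) + 1

    def search(self, query):
        """Return message ids whose filenames contain the query segments
        as consecutive terms (a phrase search)."""
        terms = [t for t, _ in filename_terms(query)]
        if not terms:
            return set()
        hits = set()
        for msg_id, starts in self.postings.get(terms[0], {}).items():
            for start in starts:
                if all(
                    start + i in self.postings.get(t, {}).get(msg_id, ())
                    for i, t in enumerate(terms)
                ):
                    hits.add(msg_id)
                    break
        return hits


idx = ToyIndex()
idx.add_filename("msg1", "/home/user/mail/some/filename/segment/1234")
idx.add_filename("msg1", "/home/user/mail/list/5678")
idx.add_filename("msg2", "/home/user/mail/filename/some/segment/9")
print(idx.search("some/filename/segment"))
```

Here "msg1" carries two filenames, as it would if the same message ID arrived both through a list and via CC. The query matches "msg1" only, because "msg2" contains the same three segments in a different order. In a real implementation the positions and phrase matching would come from the term generator and query parser rather than from hand-written code like this.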