From 88c16a43ed40f78056914756bc4172517d818964 Mon Sep 17 00:00:00 2001 From: Mark Walters Date: Wed, 22 Oct 2014 00:10:21 +0100 Subject: [PATCH] Re: [PATCH v2 08/12] lib: Implement ghost-based thread linking --- 22/40912bea97d1cd9142b68456e58b5689237aa6 | 274 ++++++++++++++++++++++ 1 file changed, 274 insertions(+) create mode 100644 22/40912bea97d1cd9142b68456e58b5689237aa6 diff --git a/22/40912bea97d1cd9142b68456e58b5689237aa6 b/22/40912bea97d1cd9142b68456e58b5689237aa6 new file mode 100644 index 000000000..fe77bb5bb --- /dev/null +++ b/22/40912bea97d1cd9142b68456e58b5689237aa6 @@ -0,0 +1,274 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 132F7431FB6 + for ; Tue, 21 Oct 2014 16:10:38 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -1.098 +X-Spam-Level: +X-Spam-Status: No, score=-1.098 tagged_above=-999 required=5 + tests=[DKIM_ADSP_CUSTOM_MED=0.001, FREEMAIL_FROM=0.001, + NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id uV+vVp9B3lMM for ; + Tue, 21 Oct 2014 16:10:30 -0700 (PDT) +Received: from mail2.qmul.ac.uk (mail2.qmul.ac.uk [138.37.6.6]) + (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) + (No client certificate requested) + by olra.theworths.org (Postfix) with ESMTPS id C5058431FAE + for ; Tue, 21 Oct 2014 16:10:29 -0700 (PDT) +Received: from smtp.qmul.ac.uk ([138.37.6.40]) + by mail2.qmul.ac.uk with esmtp (Exim 4.71) + (envelope-from ) + id 1XgiZM-0004l0-3F; Wed, 22 Oct 2014 00:10:26 +0100 +Received: from 5751dfa2.skybroadband.com ([87.81.223.162] helo=localhost) + by smtp.qmul.ac.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.71) + (envelope-from ) + id 1XgiZL-0002hb-E9; Wed, 22 Oct 2014 00:10:23 +0100 +From: Mark Walters +To: Austin Clements , notmuch@notmuchmail.org +Subject: Re: [PATCH v2 08/12] lib: Implement ghost-based thread linking +In-Reply-To: <1412637438-4821-9-git-send-email-aclements@csail.mit.edu> +References: <1412637438-4821-1-git-send-email-aclements@csail.mit.edu> + <1412637438-4821-9-git-send-email-aclements@csail.mit.edu> +User-Agent: Notmuch/0.18.1+86~gef5e66a (http://notmuchmail.org) Emacs/23.4.1 + (x86_64-pc-linux-gnu) +Date: Wed, 22 Oct 2014 00:10:21 +0100 +Message-ID: <87zjcphvdu.fsf@qmul.ac.uk> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +X-Sender-Host-Address: 87.81.223.162 +X-QM-Geographic: According to ripencc, + this message was delivered by a machine in Britain (UK) (GB). +X-QM-SPAM-Info: Sender has good ham record. :) +X-QM-Body-MD5: 9724697ff21306866e8fe63340c902a0 (of first 20000 bytes) +X-SpamAssassin-Score: -0.1 +X-SpamAssassin-SpamBar: / +X-SpamAssassin-Report: The QM spam filters have analysed this message to + determine if it is + spam. We require at least 5.0 points to mark a message as spam. + This message scored -0.1 points. + Summary of the scoring: + * 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail + provider * (markwalters1009[at]gmail.com) + * -0.1 AWL AWL: From: address is in the auto white-list +X-QM-Scan-Virus: ClamAV says the message is clean +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Tue, 21 Oct 2014 23:10:38 -0000 + +On Tue, 07 Oct 2014, Austin Clements wrote: +> From: Austin Clements +> +> This updates the thread linking code to use ghost messages instead of +> user metadata to link messages into threads. +> +> In contrast with the old approach, this is actually correct. +> Previously, thread merging updated only the thread IDs of message +> documents, not thread IDs stored in user metadata. As originally +> diagnosed by Mark Walters [1] and as demonstrated by the broken +> T260-thread-order test, this can cause notmuch to fail to link +> messages even though they're in the same thread. In principle the old +> approach could have been fixed by updating the user metadata thread +> IDs as well, but these are not indexed and hence this would have +> required a full scan of all stored thread IDs. Ghost messages solve +> this problem naturally by reusing the exact same thread ID and message +> ID representation and indexing as regular messages. +> +> Furthermore, thanks to this greater symmetry, ghost messages are also +> algorithmically simpler. We continue to support the old user metadata +> format, so this patch can't delete any code, but when we do remove +> support for the old format, several functions can simply be deleted. +> +> [1] id:8738h7kv2q.fsf@qmul.ac.uk +> --- +> lib/database.cc | 86 +++++++++++++++++++++++++++++++++++++++++++++++++-------- +> 1 file changed, 75 insertions(+), 11 deletions(-) +> +> diff --git a/lib/database.cc b/lib/database.cc +> index c641bcd..fdcc526 100644 +> --- a/lib/database.cc +> +++ b/lib/database.cc +> @@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char *message_id) +> message_id); +> } +> +> +static notmuch_status_t +> +_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch, +> + void *ctx, +> + const char *message_id, +> + const char **thread_id_ret); +> + +> /* Find the thread ID to which the message with 'message_id' belongs. +> * +> * Note: 'thread_id_ret' must not be NULL! +> @@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char *message_id) +> * +> * Note: If there is no message in the database with the given +> * 'message_id' then a new thread_id will be allocated for this +> - * message and stored in the database metadata, (where this same +> + * message ID and stored in the database metadata so that the +> * thread ID can be looked up if the message is added to the database +> - * later). +> + * later. +> */ +> static notmuch_status_t +> _resolve_message_id_to_thread_id (notmuch_database_t *notmuch, +> @@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t *notmuch, +> const char *message_id, +> const char **thread_id_ret) +> { +> + notmuch_private_status_t status; +> + notmuch_message_t *message; +> + +> + if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS)) +> + return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id, +> + thread_id_ret); +> + +> + /* Look for this message (regular or ghost) */ +> + message = _notmuch_message_create_for_message_id ( +> + notmuch, message_id, &status); +> + if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) { +> + /* Message exists */ +> + *thread_id_ret = talloc_steal ( +> + ctx, notmuch_message_get_thread_id (message)); +> + } else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) { +> + /* Message did not exist. Give it a fresh thread ID and +> + * populate this message as a ghost message. */ +> + *thread_id_ret = talloc_strdup ( +> + ctx, _notmuch_database_generate_thread_id (notmuch)); +> + if (! *thread_id_ret) { +> + status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY; +> + } else { +> + status = _notmuch_message_initialize_ghost (message, *thread_id_ret); +> + if (status == 0) +> + /* Commit the new ghost message */ +> + _notmuch_message_sync (message); +> + } +> + } else { +> + /* Create failed. Fall through. */ +> + } +> + +> + notmuch_message_destroy (message); +> + +> + return COERCE_STATUS (status, "Error creating ghost message"); +> +} +> + +> +/* Pre-ghost messages _resolve_message_id_to_thread_id */ +> +static notmuch_status_t +> +_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch, +> + void *ctx, +> + const char *message_id, +> + const char **thread_id_ret) +> +{ +> notmuch_status_t status; +> notmuch_message_t *message; +> string thread_id_string; +> @@ -2007,7 +2056,7 @@ _consume_metadata_thread_id (void *ctx, notmuch_database_t *notmuch, +> } +> } +> +> -/* Given a (mostly empty) 'message' and its corresponding +> +/* Given a blank or ghost 'message' and its corresponding +> * 'message_file' link it to existing threads in the database. +> * +> * The first check is in the metadata of the database to see if we + +There is quite a lot of comment below this and it is not clear to me how +much of it applies to the is_ghost case. In particular does the + + * Finally, we look in the database for existing message that + * reference 'message'. + +still apply? I would have thought that we would already have done that +in the is_ghost case. + +Of course this also applies to the actual code which seems to call +_notmuch_database_link_message_to_children unconditionally. It might be +worth a comment in any case. + +Best wishes + +Mark + + +> @@ -2035,16 +2084,22 @@ _consume_metadata_thread_id (void *ctx, notmuch_database_t *notmuch, +> static notmuch_status_t +> _notmuch_database_link_message (notmuch_database_t *notmuch, +> notmuch_message_t *message, +> - notmuch_message_file_t *message_file) +> + notmuch_message_file_t *message_file, +> + notmuch_bool_t is_ghost) +> { +> void *local = talloc_new (NULL); +> notmuch_status_t status; +> - const char *thread_id; +> + const char *thread_id = NULL; +> +> /* Check if the message already had a thread ID */ +> - thread_id = _consume_metadata_thread_id (local, notmuch, message); +> - if (thread_id) +> - _notmuch_message_add_term (message, "thread", thread_id); +> + if (notmuch->features & NOTMUCH_FEATURE_GHOSTS) { +> + if (is_ghost) +> + thread_id = notmuch_message_get_thread_id (message); +> + } else { +> + thread_id = _consume_metadata_thread_id (local, notmuch, message); +> + if (thread_id) +> + _notmuch_message_add_term (message, "thread", thread_id); +> + } +> +> status = _notmuch_database_link_message_to_parents (notmuch, message, +> message_file, +> @@ -2079,6 +2134,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +> notmuch_message_t *message = NULL; +> notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, ret2; +> notmuch_private_status_t private_status; +> + notmuch_bool_t is_ghost = false; +> +> const char *date, *header; +> const char *from, *to, *subject; +> @@ -2171,12 +2227,20 @@ notmuch_database_add_message (notmuch_database_t *notmuch, +> +> _notmuch_message_add_filename (message, filename); +> +> - /* Is this a newly created message object? */ +> - if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) { +> + /* Is this a newly created message object or a ghost +> + * message? We have to be slightly careful: if this is a +> + * blank message, it's not safe to call +> + * notmuch_message_get_flag yet. */ +> + if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND || +> + (is_ghost = notmuch_message_get_flag ( +> + message, NOTMUCH_MESSAGE_FLAG_GHOST))) { +> _notmuch_message_add_term (message, "type", "mail"); +> + if (is_ghost) +> + /* Convert ghost message to a regular message */ +> + _notmuch_message_remove_term (message, "type", "ghost"); +> +> ret = _notmuch_database_link_message (notmuch, message, +> - message_file); +> + message_file, is_ghost); +> if (ret) +> goto DONE; +> +> -- +> 2.1.0 +> +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch -- 2.26.2