From: Jani Nikula Date: Sat, 10 Jan 2015 12:13:09 +0000 (+0200) Subject: Re: [PATCH] Index Content-Type of attachments with a contenttype prefix X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=7be425976ee26dfa0380a9afa47b0cc72fb8872b;p=notmuch-archives.git Re: [PATCH] Index Content-Type of attachments with a contenttype prefix --- diff --git a/d4/770ea703ce09f95d91a9991d50fec3b5978309 b/d4/770ea703ce09f95d91a9991d50fec3b5978309 new file mode 100644 index 000000000..3133a32c0 --- /dev/null +++ b/d4/770ea703ce09f95d91a9991d50fec3b5978309 @@ -0,0 +1,264 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id E0887431E64 + for ; Sat, 10 Jan 2015 04:13:04 -0800 (PST) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 1.738 +X-Spam-Level: * +X-Spam-Status: No, score=1.738 tagged_above=-999 required=5 + tests=[DNS_FROM_AHBL_RHSBL=2.438, RCVD_IN_DNSWL_LOW=-0.7] + autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id YZd839BIRTqo for ; + Sat, 10 Jan 2015 04:13:01 -0800 (PST) +Received: from mail-wg0-f51.google.com (mail-wg0-f51.google.com + [74.125.82.51]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client + certificate requested) by olra.theworths.org (Postfix) with ESMTPS id + 057B7431FAF for ; Sat, 10 Jan 2015 04:13:01 -0800 + (PST) +Received: by mail-wg0-f51.google.com with SMTP id x12so12235571wgg.10 + for ; Sat, 10 Jan 2015 04:12:59 -0800 (PST) +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20130820; + h=x-gm-message-state:from:to:subject:in-reply-to:references + :user-agent:date:message-id:mime-version:content-type; + bh=qyewTrDxRgBL0ElwA74BVsu5aODFBpuk0lSWb5qt/+Q=; + b=GUtG2CKWj/obDHbhC7hwi3+GjjeMmqCCVYx8lNwnUE0YYR7T2IIB1EJb5bBxF6Vi74 + 
hqQAXbgkF5waLHUOnjugs9K95S2zXcGXAsYPG/wKijbG+QbOTTbSGt8AXVqIWbm15F8v + Z1dg9WQAO1PutC3/5hXVfLVZiBOibeHFQWKTRzyB23DeUOJ9n+/dzJZhP/uUm8yHeuWs + eYrJ6GIk1RcJKfwKXGwlaBNbJUnNU/F3MWIEN1TNaLb4dWv/8nL4QjLn1H3gR4ZyPegE + 0oFfp2J4JCn3QuYRUV1FvafsH42v6T0j50hwUGH5g6/zMVxl5JZV6OvYP/lIpxduaOc1 + A9uQ== +X-Gm-Message-State: + ALoCoQmGwSVpDeUQd2tPiVlc8KhPvTMpw/JX7x1T5Gd4xZb3+dd7KztrIg+ShvJsQ2rSOSgxa1r4 +X-Received: by 10.180.21.225 with SMTP id y1mr13417812wie.42.1420891979749; + Sat, 10 Jan 2015 04:12:59 -0800 (PST) +Received: from localhost (mobile-internet-bcee14-89.dhcp.inet.fi. + [188.238.20.89]) + by mx.google.com with ESMTPSA id r3sm2150348wic.10.2015.01.10.04.12.58 + (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); + Sat, 10 Jan 2015 04:12:59 -0800 (PST) +From: Jani Nikula +To: Todd , notmuch@notmuchmail.org +Subject: Re: [PATCH] Index Content-Type of attachments with a contenttype + prefix +In-Reply-To: <1420849787-4401-1-git-send-email-todd@electricoding.com> +References: <1420849787-4401-1-git-send-email-todd@electricoding.com> +User-Agent: Notmuch/0.19+6~gf2e3d2c (http://notmuchmail.org) Emacs/24.4.1 + (x86_64-pc-linux-gnu) +Date: Sat, 10 Jan 2015 14:13:09 +0200 +Message-ID: <8761ce7s16.fsf@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 10 Jan 2015 12:13:05 -0000 + +On Sat, 10 Jan 2015, Todd wrote: +> I wanted to tag messages with calendar invitations, but couldn't as +> the information wasn't indexed. 
+> +> This patch allows for queries like: +> +> Find calendar invites +> - contenttype:text/calendar or contenttype:application/ics +> +> Find any image attachments +> - contenttype:image +> +> Find all patches +> - contenttype:text/x-patch +> +> +> - Todd +> +> --- +> NEWS | 6 ++++++ +> completion/notmuch-completion.bash | 2 +- +> doc/man7/notmuch-search-terms.rst | 6 ++++++ +> emacs/notmuch.el | 2 +- +> lib/database.cc | 1 + +> lib/index.cc | 5 +++++ +> test/T190-multipart.sh | 32 ++++++++++++++++++++++++++++++++ + +IMO these could be split into several patches. + +> 7 files changed, 52 insertions(+), 2 deletions(-) +> +> diff --git a/NEWS b/NEWS +> index 44e8d05..5f4622c 100644 +> --- a/NEWS +> +++ b/NEWS +> @@ -15,6 +15,12 @@ keyboard shortcuts to saved searches. +> Command-Line Interface +> ---------------------- +> +> +There is a new `contenttype:` search prefix +> + +> + The new `contenttype:` search prefix allows searching for the +> + content-type of attachments, which is now indexed by `notmuch +> + insert`. See the `notmuch-search-terms` manual page for details. +> + + +Admittedly I did not have the time to dig into details, but I think +"attachment" is misleading, as it's really all mime parts, right? + +Will this also index the Content-Type: header of the message itself, +regardless of whether it has mime structure or not? Maybe it should? + +> Stopped `notmuch dump` failing if someone writes to the database +> +> The dump command now takes the write lock when running. 
This +> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash +> index d58dc8b..05b5969 100644 +> --- a/completion/notmuch-completion.bash +> +++ b/completion/notmuch-completion.bash +> @@ -61,7 +61,7 @@ _notmuch_search_terms() +> sed "s|^$path/||" | grep -v "\(^\|/\)\(cur\|new\|tmp\)$" ) ) +> ;; +> *) +> - local search_terms="from: to: subject: attachment: tag: id: thread: folder: path: date:" +> + local search_terms="from: to: subject: attachment: contenttype: tag: id: thread: folder: path: date:" +> compopt -o nospace +> COMPREPLY=( $(compgen -W "${search_terms}" -- ${cur}) ) +> ;; +> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst +> index 1acdaa0..d126ce6 100644 +> --- a/doc/man7/notmuch-search-terms.rst +> +++ b/doc/man7/notmuch-search-terms.rst +> @@ -40,6 +40,8 @@ indicate user-supplied values): +> +> - attachment: +> +> +- contenttype: +> + +> - tag: (or is:) +> +> - id: +> @@ -66,6 +68,10 @@ by including quotation marks around the phrase, immediately following +> The **attachment:** prefix can be used to search for specific filenames +> (or extensions) of attachments to email messages. +> +> +The **contenttype:** prefix can be used to search for specific +> +content-types of attachments to email messages (as specified by the +> +sender). +> + +> For **tag:** and **is:** valid tag values include **inbox** and +> **unread** by default for new messages added by **notmuch new** as well +> as any other tag values added manually with **notmuch tag**. +> diff --git a/emacs/notmuch.el b/emacs/notmuch.el +> index 218486a..702700c 100644 +> --- a/emacs/notmuch.el +> +++ b/emacs/notmuch.el +> @@ -858,7 +858,7 @@ PROMPT is the string to prompt with." 
+> (lexical-let +> ((completions +> (append (list "folder:" "path:" "thread:" "id:" "date:" "from:" "to:" +> - "subject:" "attachment:") +> + "subject:" "attachment:" "contenttype:") +> (mapcar (lambda (tag) +> (concat "tag:" (notmuch-escape-boolean-term tag))) +> (process-lines notmuch-command "search" "--output=tags" "*"))))) +> diff --git a/lib/database.cc b/lib/database.cc +> index 3601f9d..a7a64c9 100644 +> --- a/lib/database.cc +> +++ b/lib/database.cc +> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= { +> { "from", "XFROM" }, +> { "to", "XTO" }, +> { "attachment", "XATTACHMENT" }, +> + { "contenttype", "XCONTENTTYPE"}, +> { "subject", "XSUBJECT"}, + +Is the use of probabilistic prefix intentional? I think it's probably +the right thing to do, but just checking. + +BR, +Jani. + +> }; +> +> diff --git a/lib/index.cc b/lib/index.cc +> index 1a2e63d..c3f7c6b 100644 +> --- a/lib/index.cc +> +++ b/lib/index.cc +> @@ -346,6 +346,11 @@ _index_mime_part (notmuch_message_t *message, +> return; +> } +> +> + GMimeContentType* content_type = g_mime_object_get_content_type(part); +> + if (content_type) { +> + _notmuch_message_gen_terms (message, "contenttype", g_mime_content_type_to_string(content_type)); +> + } +> + +> if (GMIME_IS_MESSAGE_PART (part)) { +> GMimeMessage *mime_message; +> +> diff --git a/test/T190-multipart.sh b/test/T190-multipart.sh +> index 85cbf67..e3270a7 100755 +> --- a/test/T190-multipart.sh +> +++ b/test/T190-multipart.sh +> @@ -104,6 +104,30 @@ Content-Transfer-Encoding: base64 +> 7w0K +> --==-=-=-- +> EOF +> + +> +cat <<EOF > content_types +> +From: Todd +> +To: todd@electricoding.com +> +Subject: odd content types +> +Date: Fri, 05 Jan 2001 15:42:57 +0000 +> +User-Agent: Notmuch/0.5 (http://notmuchmail.org) Emacs/23.3.1 (i486-pc-linux-gnu) +> +Message-ID: <87liy5ap01.fsf@yoom.home.cworth.org> +> +MIME-Version: 1.0 +> +Content-Type: multipart/alternative; boundary="==-=-==" +> + +> +--==-=-== +> +Content-Type: 
application/unique_identifier +> + +> +This is an embedded message, with a multipart/alternative part. +> + +> +--==-=-== +> +Content-Type: text/some_other_identifier +> + +> +This is an embedded message, with a multipart/alternative part. +> + +> +--==-=-==-- +> +EOF +> +cat content_types >> ${MAIL_DIR}/odd_content_type +> notmuch new > /dev/null +> +> test_begin_subtest "--format=text --part=0, full message" +> @@ -727,4 +751,12 @@ test_begin_subtest "html parts included" +> notmuch show --format=json --include-html id:htmlmessage > OUTPUT +> test_expect_equal_json "$(cat OUTPUT)" "$(cat EXPECTED.withhtml)" +> +> +test_begin_subtest "indexes content-type" +> +output=$(notmuch search contenttype:application/unique_identifier | notmuch_search_sanitize) +> +test_expect_equal "$output" "thread:XXX 2001-01-05 [1/1] Todd; odd content types (inbox unread)" +> + +> +output=$(notmuch search contenttype:text/some_other_identifier | notmuch_search_sanitize) +> +test_expect_equal "$output" "thread:XXX 2001-01-05 [1/1] Todd; odd content types (inbox unread)" +> + +> + +> test_done +> -- +> 1.9.1 +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch