From 09ea7bbc4ac859c6076b5fcec8c0d7707d664236 Mon Sep 17 00:00:00 2001 From: Michal Sojka Date: Thu, 9 Oct 2014 12:55:38 +0200 Subject: [PATCH] Re: [PATCH v2 2/4] cli: Extend the search command for --output=addresses and similar --- a0/a323efae2ecb3e8bd2a70e61bd6b9fc52abe77 | 509 ++++++++++++++++++++++ 1 file changed, 509 insertions(+) create mode 100644 a0/a323efae2ecb3e8bd2a70e61bd6b9fc52abe77 diff --git a/a0/a323efae2ecb3e8bd2a70e61bd6b9fc52abe77 b/a0/a323efae2ecb3e8bd2a70e61bd6b9fc52abe77 new file mode 100644 index 000000000..4443199c5 --- /dev/null +++ b/a0/a323efae2ecb3e8bd2a70e61bd6b9fc52abe77 @@ -0,0 +1,509 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id BEED0431FB6 + for ; Thu, 9 Oct 2014 03:55:56 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -2.3 +X-Spam-Level: +X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id V9osyjXzGVic for ; + Thu, 9 Oct 2014 03:55:47 -0700 (PDT) +Received: from max.feld.cvut.cz (max.feld.cvut.cz [147.32.192.36]) + by olra.theworths.org (Postfix) with ESMTP id 73D37431FAF + for ; Thu, 9 Oct 2014 03:55:47 -0700 (PDT) +Received: from localhost (unknown [192.168.200.7]) + by max.feld.cvut.cz (Postfix) with ESMTP id 8F02E5CCF60; + Thu, 9 Oct 2014 12:55:42 +0200 (CEST) +X-Virus-Scanned: IMAP STYX AMAVIS +Received: from max.feld.cvut.cz ([192.168.200.1]) + by localhost (styx.feld.cvut.cz [192.168.200.7]) (amavisd-new, + port 10044) + with ESMTP id NbQiOYUUUiMu; Thu, 9 Oct 2014 12:55:39 +0200 (CEST) +Received: from imap.feld.cvut.cz (imap.feld.cvut.cz [147.32.192.34]) + by max.feld.cvut.cz (Postfix) with ESMTP id 06CAA5CCE7B; + Thu, 9 Oct 2014 12:55:38 +0200 (CEST) +Received: from wsh by steelpick.2x.cz with local (Exim 4.84) + (envelope-from ) + id 1XcBNi-0000TV-MN; Thu, 09 Oct 2014 12:55:38 +0200 +From: Michal Sojka +To: Tomi Ollila , notmuch@notmuchmail.org +Subject: Re: [PATCH v2 2/4] cli: Extend the search command for + --output=addresses and similar +In-Reply-To: +References: <874mvqxrnp.fsf@nikula.org> + <1412542319-20017-1-git-send-email-sojkam1@fel.cvut.cz> + <1412542319-20017-3-git-send-email-sojkam1@fel.cvut.cz> + +User-Agent: Notmuch/0.18.1+101~g56b0ff0 (http://notmuchmail.org) Emacs/24.3.1 + (x86_64-pc-linux-gnu) +Date: Thu, 09 Oct 2014 12:55:38 +0200 +Message-ID: <87zjd51phx.fsf@steelpick.2x.cz> +MIME-Version: 1.0 +Content-Type: text/plain; charset=utf-8 +Content-Transfer-Encoding: quoted-printable +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Thu, 09 Oct 2014 10:55:56 -0000 + +On Mon, Oct 06 2014, Tomi Ollila wrote: +> On Sun, Oct 05 2014, Michal Sojka wrote: +> +>> The new outputs allow printing senders, recipients or both of matching +>> messages. +>> +>> This code based on a patch from Jani Nikula. +> +> OK, IMO... +> +> 1/4 OK +> +> Before 2/4 add support for 'flag' arguments, drop the --output=3Daddresses +> option which is now done as --output=3Dsender --output=3Drecipients + +OK + +> In deduplication comment did not describe the deduplication at all... +> so I looked a bit into the code now... the Default you described was +> that with "John Doe" and "John Doe" +> only one was printed (but not which one). + +I intentionally didn't want to define which one, but I agree that it +might be useful in same cases. It would depend on --sort option and on +the order of addresses in email headers. + +> Secondly, what happens with "Doe, John" and +> "John Doe" ... ah, it is same as *addr* with +> case-insensitive address. +> +> Sorry, but IMO these options are a bit strange. + +My impression is that I did bad job describing the deduplication +algorithm, which is why you don't understand it. Maybe, we can also +change the name of the option to --filter-by, or something like this. + +When thinking about how to best document such an option, it seems that +the user must be aware that this is implemented as flags that are ORed. +Which means that the default should be what was in the previous patch +--unique=3Dnone. + +What about the following? + + ``--filter-flag=3D``\ (**addr**\ \|\ **name**\ \|\ **addrfold**) + + Can be used with ``--output=3Daddresses``, ``--output=3Dsender`` or + ``--output=3Drecipients`` to filter out duplicate addresses. The + filtering algorithm receives a sequence of email addresses and + outputs the same sequence without the addresses that are + considered a duplicate of a previously output address. What is + considered a duplicate depends on the flags given: + + **addr** means that the address part is compared. + Case-sensitivity can be controlled by **addrfold** flag (see + below). For example, the addresses "John Doe " + and "Dr. John Doe " will be considered + duplicate. + + **name** means that the name part is compared in case-sensitive + manner. For example, the addresses "John Doe " + and "John Doe " will be considered duplicate. + + **addrfold** when used with the **addr** flag, the address + comparison is performed in case-insensitive manner. For example, + the addresses "John Doe " and "Dr. John Doe + " will be considered duplicate. + + To specify multiple flags, this option can be given multiple + times. For example, ``--filter-flag=3Dname --filter-flag=3Daddr`` + will print unique case-sensitive combinations of both name and + address parts. + +With this, the previously default behavior would now has to be spelled +as "--filter-flag=3Daddr --filter-flag=3Daddrfold". + +I'm not sure it is wise present such a low-level interface (flags) to +command-line users, but it is hopefully more understandable now. What do +you think? + +> Not to go to choose which one to choose (first, last, most common) instead +> of the suggested options these should be the ones: +> +> 1) "John Doe" and "John Doe" : +> only one printed, but if either were "Dr. John Doe", both of these are pr= +inted +> (this as default). + +According to the above, which could be achieved as --filter-flag=3Dname +--filter-flag=3Daddr --filter-flag=3Daddrfold. + +> 2) same as above, but only make case-insensitive + +case-insensitive is already in 1), you probably mean case-sensitive. + +> address match -- i.e. in the 2 above cases in option 1, print only +> one. + +This would be --filter-flag=3Dname --filter-flag=3Daddr. + +> (and same name but different address to perhaps never been an option...) +> +> I might like to have option that does case-sensitive address match,=20 + +This would be just --filter-flag=3Daddr. + +As a side note, it is interesting, that you mentioned your options as an +enumeration even though they are actually combinations of several on/off +flags. I think that it is more natural for human brains to think in +terms of simple lists than in terms of combinations of flags. That's why +I originally implemented --output=3Daddresses as just another keyword, +rather than requiring the user to specify both sender and receivers. + +Thanks for the review. +-Michal + +> In those cases I don't know the recipient's culture and the email he +> sent to me used format (and not knowing which +> one is the first and which last name (or whatever names these are) -- +> just to reply in same case format in respect... + + +> +> +> Tomi +> +> +>> --- +>> completion/notmuch-completion.bash | 2 +- +>> completion/notmuch-completion.zsh | 3 +- +>> doc/man1/notmuch-search.rst | 22 +++++++- +>> notmuch-search.c | 100 ++++++++++++++++++++++++++++++= +++++--- +>> test/T090-search-output.sh | 64 ++++++++++++++++++++++++ +>> 5 files changed, 182 insertions(+), 9 deletions(-) +>> +>> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-com= +pletion.bash +>> index 0571dc9..c37ddf5 100644 +>> --- a/completion/notmuch-completion.bash +>> +++ b/completion/notmuch-completion.bash +>> @@ -294,7 +294,7 @@ _notmuch_search() +>> return +>> ;; +>> --output) +>> - COMPREPLY=3D( $( compgen -W "summary threads messages files tags" = +-- "${cur}" ) ) +>> + COMPREPLY=3D( $( compgen -W "summary threads messages files tags s= +ender recipients addresses" -- "${cur}" ) ) +>> return +>> ;; +>> --sort) +>> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-comp= +letion.zsh +>> index 67a9aba..bff8fd5 100644 +>> --- a/completion/notmuch-completion.zsh +>> +++ b/completion/notmuch-completion.zsh +>> @@ -52,7 +52,8 @@ _notmuch_search() +>> _arguments -s : \ +>> '--max-threads=3D[display only the first x threads from the search = +results]:number of threads to show: ' \ +>> '--first=3D[omit the first x threads from the search results]:numbe= +r of threads to omit: ' \ +>> - '--sort=3D[sort results]:sorting:((newest-first\:"reverse chronolog= +ical order" oldest-first\:"chronological order"))' +>> + '--sort=3D[sort results]:sorting:((newest-first\:"reverse chronolog= +ical order" oldest-first\:"chronological order"))' \ +>> + '--output=3D[select what to output]:output:((summary threads messag= +es files tags sender recipients addresses))' +>> } +>> +>> _notmuch() +>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst +>> index 90160f2..3447820 100644 +>> --- a/doc/man1/notmuch-search.rst +>> +++ b/doc/man1/notmuch-search.rst +>> @@ -35,7 +35,7 @@ Supported options for **search** include +>> intended for programs that invoke **notmuch(1)** internally. If +>> omitted, the latest supported version will be used. +>> +>> - ``--output=3D(summary|threads|messages|files|tags)`` +>> + ``--output=3D(summary|threads|messages|files|tags|sender|recipients= +|addresses)`` +>> +>> **summary** +>> Output a summary of each thread with any message matching +>> @@ -78,6 +78,26 @@ Supported options for **search** include +>> by null characters (--format=3Dtext0), as a JSON array +>> (--format=3Djson), or as an S-Expression list (--format=3Ds= +exp). +>> +>> + **sender** +>> + Output all addresses from the *From* header that appear on +>> + any message matching the search terms, either one per line +>> + (--format=3Dtext), separated by null characters +>> + (--format=3Dtext0), as a JSON array (--format=3Djson), or as +>> + an S-Expression list (--format=3Dsexp). +>> + +>> + Note: Searching for **sender** should be much faster than +>> + searching for **recipients** or **addresses**, because +>> + sender addresses are cached directly in the database +>> + whereas other addresses need to be fetched from message +>> + files. +>> + +>> + **recipients** +>> + Like **sender** but for addresses from *To*, *Cc* and +>> + *Bcc* headers. +>> + +>> + **addresses** +>> + Like **sender** and **recipients** together. +>> + +>> ``--sort=3D``\ (**newest-first**\ \|\ **oldest-first**) +>> This option can be used to present results in either +>> chronological order (**oldest-first**) or reverse chronological +>> diff --git a/notmuch-search.c b/notmuch-search.c +>> index 5ac2a26..0614f10 100644 +>> --- a/notmuch-search.c +>> +++ b/notmuch-search.c +>> @@ -23,11 +23,14 @@ +>> #include "string-util.h" +>> +>> typedef enum { +>> - OUTPUT_SUMMARY, +>> - OUTPUT_THREADS, +>> - OUTPUT_MESSAGES, +>> - OUTPUT_FILES, +>> - OUTPUT_TAGS +>> + OUTPUT_SUMMARY =3D 1 << 0, +>> + OUTPUT_THREADS =3D 1 << 1, +>> + OUTPUT_MESSAGES =3D 1 << 2, +>> + OUTPUT_FILES =3D 1 << 3, +>> + OUTPUT_TAGS =3D 1 << 4, +>> + OUTPUT_SENDER =3D 1 << 5, +>> + OUTPUT_RECIPIENTS =3D 1 << 6, +>> + OUTPUT_ADDRESSES =3D OUTPUT_SENDER | OUTPUT_RECIPIENTS, +>> } output_t; +>> +>> typedef struct { +>> @@ -220,6 +223,67 @@ do_search_threads (search_options_t *o) +>> return 0; +>> } +>> +>> +static void +>> +print_address_list (const search_options_t *o, InternetAddressList *lis= +t) +>> +{ +>> + InternetAddress *address; +>> + int i; +>> + +>> + for (i =3D 0; i < internet_address_list_length (list); i++) { +>> + address =3D internet_address_list_get_address (list, i); +>> + if (INTERNET_ADDRESS_IS_GROUP (address)) { +>> + InternetAddressGroup *group; +>> + InternetAddressList *group_list; +>> + +>> + group =3D INTERNET_ADDRESS_GROUP (address); +>> + group_list =3D internet_address_group_get_members (group); +>> + if (group_list =3D=3D NULL) +>> + continue; +>> + +>> + print_address_list (o, group_list); +>> + } else { +>> + InternetAddressMailbox *mailbox; +>> + const char *name; +>> + const char *addr; +>> + char *full_address; +>> + +>> + mailbox =3D INTERNET_ADDRESS_MAILBOX (address); +>> + +>> + name =3D internet_address_get_name (address); +>> + addr =3D internet_address_mailbox_get_addr (mailbox); +>> + +>> + if (name && *name) +>> + full_address =3D talloc_asprintf (o->format, "%s <%s>", name, addr); +>> + else +>> + full_address =3D talloc_strdup (o->format, addr); +>> + +>> + if (!full_address) { +>> + fprintf (stderr, "Error: out of memory\n"); +>> + break; +>> + } +>> + o->format->string (o->format, full_address); +>> + o->format->separator (o->format); +>> + +>> + talloc_free (full_address); +>> + } +>> + } +>> +} +>> + +>> +static void +>> +print_address_string (const search_options_t *o, const char *recipients) +>> +{ +>> + InternetAddressList *list; +>> + +>> + if (recipients =3D=3D NULL) +>> + return; +>> + +>> + list =3D internet_address_list_parse_string (recipients); +>> + if (list =3D=3D NULL) +>> + return; +>> + +>> + print_address_list (o, list); +>> +} +>> + +>> static int +>> do_search_messages (search_options_t *o) +>> { +>> @@ -266,11 +330,29 @@ do_search_messages (search_options_t *o) +>> +>> notmuch_filenames_destroy( filenames ); +>> +>> - } else { /* output =3D=3D OUTPUT_MESSAGES */ +>> + } else if (o->output =3D=3D OUTPUT_MESSAGES) { +>> format->set_prefix (format, "id"); +>> format->string (format, +>> notmuch_message_get_message_id (message)); +>> format->separator (format); +>> + } else { +>> + if (o->output & OUTPUT_SENDER) { +>> + const char *addrs; +>> + +>> + addrs =3D notmuch_message_get_header (message, "from"); +>> + print_address_string (o, addrs); +>> + } +>> + +>> + if (o->output & OUTPUT_RECIPIENTS) { +>> + const char *hdrs[] =3D { "to", "cc", "bcc" }; +>> + const char *addrs; +>> + size_t j; +>> + +>> + for (j =3D 0; j < ARRAY_SIZE (hdrs); j++) { +>> + addrs =3D notmuch_message_get_header (message, hdrs[j]); +>> + print_address_string (o, addrs); +>> + } +>> + } +>> } +>> +>> notmuch_message_destroy (message); +>> @@ -370,6 +452,9 @@ notmuch_search_command (notmuch_config_t *config, in= +t argc, char *argv[]) +>> (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY }, +>> { "threads", OUTPUT_THREADS }, +>> { "messages", OUTPUT_MESSAGES }, +>> + { "sender", OUTPUT_SENDER }, +>> + { "recipients", OUTPUT_RECIPIENTS }, +>> + { "addresses", OUTPUT_ADDRESSES }, +>> { "files", OUTPUT_FILES }, +>> { "tags", OUTPUT_TAGS }, +>> { 0, 0 } } }, +>> @@ -461,6 +546,9 @@ notmuch_search_command (notmuch_config_t *config, in= +t argc, char *argv[]) +>> ret =3D do_search_threads (&o); +>> break; +>> case OUTPUT_MESSAGES: +>> + case OUTPUT_SENDER: +>> + case OUTPUT_RECIPIENTS: +>> + case OUTPUT_ADDRESSES: +>> case OUTPUT_FILES: +>> ret =3D do_search_messages (&o); +>> break; +>> diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh +>> index 947d572..5458de1 100755 +>> --- a/test/T090-search-output.sh +>> +++ b/test/T090-search-output.sh +>> @@ -387,6 +387,70 @@ cat <EXPECTED +>> EOF +>> test_expect_equal_file OUTPUT EXPECTED +>> +>> +test_begin_subtest "--output=3Dsender" +>> +notmuch search --output=3Dsender '*' | sort | uniq --count >OUTPUT +>> +cat <EXPECTED +>> + 1 Adrian Perez de Castro +>> + 2 Alex Botero-Lowry +>> + 4 Alexander Botero-Lowry +>> + 1 Aron Griffis +>> + 12 Carl Worth +>> + 1 Chris Wilson +>> + 1 Fran=C3=A7ois Boulogne +>> + 1 Ingmar Vanhassel +>> + 1 Israel Herraiz +>> + 4 Jan Janak +>> + 2 Jjgod Jiang +>> + 7 Keith Packard +>> + 5 Lars Kellogg-Stedman +>> + 5 Mikhail Gusarov +>> + 1 Olivier Berger +>> + 1 Rolland Santimano +>> + 3 Stewart Smith +>> +EOF +>> +test_expect_equal_file OUTPUT EXPECTED +>> + +>> +test_begin_subtest "--output=3Drecipients" +>> +notmuch search --output=3Drecipients '*' | sort | uniq --count >OUTPUT +>> +cat <EXPECTED +>> + 1 Allan McRae +>> + 1 Discussion about the Arch User Repository (AUR) +>> + 1 Keith Packard +>> + 1 Mikhail Gusarov +>> + 2 notmuch +>> + 48 notmuch@notmuchmail.org +>> + 1 olivier.berger@it-sudparis.eu +>> +EOF +>> +test_expect_equal_file OUTPUT EXPECTED +>> + +>> +test_begin_subtest "--output=3Daddresses" +>> +notmuch search --output=3Daddresses '*' | sort | uniq --count >OUTPUT +>> +cat <EXPECTED +>> + 1 Adrian Perez de Castro +>> + 2 Alex Botero-Lowry +>> + 4 Alexander Botero-Lowry +>> + 1 Allan McRae +>> + 1 Aron Griffis +>> + 12 Carl Worth +>> + 1 Chris Wilson +>> + 1 Discussion about the Arch User Repository (AUR) +>> + 1 Fran=C3=A7ois Boulogne +>> + 1 Ingmar Vanhassel +>> + 1 Israel Herraiz +>> + 4 Jan Janak +>> + 2 Jjgod Jiang +>> + 8 Keith Packard +>> + 5 Lars Kellogg-Stedman +>> + 6 Mikhail Gusarov +>> + 1 Olivier Berger +>> + 1 Rolland Santimano +>> + 3 Stewart Smith +>> + 2 notmuch +>> + 48 notmuch@notmuchmail.org +>> + 1 olivier.berger@it-sudparis.eu +>> +EOF +>> +test_expect_equal_file OUTPUT EXPECTED +>> + +>> test_begin_subtest "sanitize output for quoted-printable line-breaks in= + author and subject" +>> add_message "[subject]=3D'two =3D?ISO-8859-1?Q?line=3D0A_subject?=3D +>> headers'" +>> -- +>> 2.1.1 +>> +>> _______________________________________________ +>> notmuch mailing list +>> notmuch@notmuchmail.org +>> http://notmuchmail.org/mailman/listinfo/notmuch -- 2.26.2