From: Michal Sojka <sojkam1@fel.cvut.cz>
To: Tomi Ollila <tomi.ollila@iki.fi>, notmuch@notmuchmail.org
Subject: Re: [PATCH v3 3/4] cli: Extend the search command for
	--output={sender, recipients}
In-Reply-To: <m238arrdzp.fsf@guru.guru-group.fi>
References: <87zjd51phx.fsf@steelpick.2x.cz>
	<1413150093-8383-1-git-send-email-sojkam1@fel.cvut.cz>
	<1413150093-8383-4-git-send-email-sojkam1@fel.cvut.cz>
	<m238arrdzp.fsf@guru.guru-group.fi>
User-Agent: Notmuch/0.18.1+101~g56b0ff0 (http://notmuchmail.org) Emacs/24.3.1
	(x86_64-pc-linux-gnu)
Date: Mon, 13 Oct 2014 23:38:30 +0200
Message-ID: <871tqbejl5.fsf@steelpick.2x.cz>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list

On Mon, Oct 13 2014, Tomi Ollila wrote:
> On Mon, Oct 13 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>
>> The new outputs allow printing senders, recipients or both of matching
>> messages. The --output option is converted from "keyword" argument to
>> "flags" argument, which means that the user can use --output=sender and
>> --output=recipients simultaneously, to print both. Other combinations
>> produce an error.
>>
>> This code based on a patch from Jani Nikula.
>> ---
>>  completion/notmuch-completion.bash |   2 +-
>>  completion/notmuch-completion.zsh  |   3 +-
>>  doc/man1/notmuch-search.rst        |  22 +++++++-
>>  notmuch-search.c                   | 110 ++++++++++++++++++++++++++++++++++---
>>  test/T090-search-output.sh         |  64 +++++++++++++++++++++
>>  5 files changed, 189 insertions(+), 12 deletions(-)
>>
>> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
>> index 0571dc9..cfbd389 100644
>> --- a/completion/notmuch-completion.bash
>> +++ b/completion/notmuch-completion.bash
>> @@ -294,7 +294,7 @@ _notmuch_search()
>>  	    return
>>  	    ;;
>>  	--output)
>> -	    COMPREPLY=( $( compgen -W "summary threads messages files tags" -- "${cur}" ) )
>> +	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients" -- "${cur}" ) )
>>  	    return
>>  	    ;;
>>  	--sort)
>> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
>> index 67a9aba..3e52a00 100644
>> --- a/completion/notmuch-completion.zsh
>> +++ b/completion/notmuch-completion.zsh
>> @@ -52,7 +52,8 @@ _notmuch_search()
>>    _arguments -s : \
>>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
>> -    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))'
>> +    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
>> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
>>  }
>>
>>  _notmuch()
>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
>> index 90160f2..c9d38b1 100644
>> --- a/doc/man1/notmuch-search.rst
>> +++ b/doc/man1/notmuch-search.rst
>> @@ -35,7 +35,7 @@ Supported options for **search** include
>>          intended for programs that invoke **notmuch(1)** internally. If
>>          omitted, the latest supported version will be used.
>>
>> -    ``--output=(summary|threads|messages|files|tags)``
>> +    ``--output=(summary|threads|messages|files|tags|sender|recipients)``
>>
>>          **summary**
>>              Output a summary of each thread with any message matching
>> @@ -78,6 +78,26 @@ Supported options for **search** include
>>              by null characters (--format=text0), as a JSON array
>>              (--format=json), or as an S-Expression list (--format=sexp).
>>
>> +	**sender**
>> +            Output all addresses from the *From* header that appear on
>> +            any message matching the search terms, either one per line
>> +            (--format=text), separated by null characters
>> +            (--format=text0), as a JSON array (--format=json), or as
>> +            an S-Expression list (--format=sexp).
>> +
>> +	    Note: Searching for **sender** should be much faster than
>> +	    searching for **recipients**, because sender addresses are
>> +	    cached directly in the database whereas other addresses
>> +	    need to be fetched from message files.
>> +
>> +	**recipients**
>> +            Like **sender** but for addresses from *To*, *Cc* and
>> +	    *Bcc* headers.
>> +
>> +	This option can be given multiple times to combine different
>> +	outputs. Curently, this is only supported for **sender** and
>> +	**recipients** outputs.
>> +
>>      ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
>>          This option can be used to present results in either
>>          chronological order (**oldest-first**) or reverse chronological
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index 5ac2a26..74588f8 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -23,11 +23,14 @@
>>  #include "string-util.h"
>>
>>  typedef enum {
>> -    OUTPUT_SUMMARY,
>> -    OUTPUT_THREADS,
>> -    OUTPUT_MESSAGES,
>> -    OUTPUT_FILES,
>> -    OUTPUT_TAGS
>> +    OUTPUT_SUMMARY	= 1 << 0,
>> +    OUTPUT_THREADS	= 1 << 1,
>> +    OUTPUT_MESSAGES	= 1 << 2,
>> +    OUTPUT_FILES	= 1 << 3,
>> +    OUTPUT_TAGS		= 1 << 4,
>> +    OUTPUT_SENDER	= 1 << 5,
>> +    OUTPUT_RECIPIENTS	= 1 << 6,
>> +    OUTPUT_ADDRESSES	= OUTPUT_SENDER | OUTPUT_RECIPIENTS,
>
> leftover, like mentioned below (this comment added just before
> sending)

This is needed to suppress a warning about unknown enum value in the
switch statement.

>>  } output_t;
>>
>>  typedef struct {
>> @@ -220,6 +223,67 @@ do_search_threads (search_options_t *o)
>>      return 0;
>>  }
>>
>> +static void
>> +print_address_list (const search_options_t *o, InternetAddressList *list)
>> +{
>> +    InternetAddress *address;
>> +    int i;
>> +
>> +    for (i = 0; i < internet_address_list_length (list); i++) {
>> +	address = internet_address_list_get_address (list, i);
>> +	if (INTERNET_ADDRESS_IS_GROUP (address)) {
>> +	    InternetAddressGroup *group;
>> +	    InternetAddressList *group_list;
>> +
>> +	    group = INTERNET_ADDRESS_GROUP (address);
>> +	    group_list = internet_address_group_get_members (group);
>> +	    if (group_list == NULL)
>> +		continue;
>> +
>> +	    print_address_list (o, group_list);
>> +	} else {
>> +	    InternetAddressMailbox *mailbox;
>> +	    const char *name;
>> +	    const char *addr;
>> +	    char *full_address;
>> +
>> +	    mailbox = INTERNET_ADDRESS_MAILBOX (address);
>> +
>> +	    name = internet_address_get_name (address);
>> +	    addr = internet_address_mailbox_get_addr (mailbox);
>> +
>> +	    if (name && *name)
>> +		full_address = talloc_asprintf (o->format, "%s <%s>", name, addr);
>> +	    else
>> +		full_address = talloc_strdup (o->format, addr);
>> +
>> +	    if (!full_address) {
>
> Apart from minor style issue like space after ! and the leftover ADDRESSES
> parts (w/o that I would not have commented about !<SPC>) the first 3
> patches look pretty good to me. I have not tested those yet.

If you want, I can send another version.

> But we keep to have some disagreement w/ unique/duplicate/filter-by
> handling ;/
>
> I (currently) rest the case of first/last/most common handling to just
> how the --sort=(newest-first|oldest-first) affects the order...
>
> Let's consider the following list of output if no /deduplication/ is done:
>
>   John Doe <john@example.com>
>   Dr. John Doe <john@example.com>
>   John Doe <JOHN@EXAMPLE.COM>
>   John Doe <john@doe.name>
>   Dr. John Doe <john@doe.name>
>   John Doe <JOHN@doe.name.example.com>
>   John Doe <john@doe.name>
>   Dr. John Doe <john@example.com>
>   Dr. John Doe <john@example.com>
>   Dr. John Doe <john@doe.name>
>   John Doe <john@example.com>
>   John Doe <john@doe.name.example.com>
>
> To stir the pool a little more, this could be the output when
> --duplicate=all (the default) is given.

This seems counter-intuitive. A command without --duplicate gives you a
certain number of addresses and when you add --duplicate, you get less
addresses?

Moreover, --duplicate is already used with --output=files and has
different meaning.

> With --duplicate=none the output could be (first match by unique
> case-insensitive address):
>
>   John Doe <john@example.com>
>   John Doe <john@doe.name>
>   John Doe <john@doe.name.example.com>

This seems confusing to me. If I see "duplicate none", I would simply
expect that duplicate lines are removed from the --duplicate=all list
above. Not that just duplicate addresses would be removed.

> (many people may have the same name, but email address is unique per person
> -- therefore I think 'none' limiting that just to John Doe <john@example.com>
> would be too little)

Here I agree that filtering by address part is perhaps most useful.

> and with --duplicate=address
>
>   John Doe <john@example.com>
>   Dr. John Doe <john@example.com>
>   John Doe <john@doe.name>
>   Dr. John Doe <john@doe.name>
>   John Doe <JOHN@doe.name.example.com>

I don't get this. You say --duplicate=address but the output contains
also duplicate names.

> (from this output user can choose how the recipient is to be called
> (like "pseudonyms" mentioned in id:20141010113202.GE28601@TP_L520.localdomain )
> when sending email)
>
> and --duplicate=address-casesensitive
>
>   John Doe <john@example.com>
>   Dr. John Doe <john@example.com>
>   John Doe <JOHN@EXAMPLE.COM>
>   John Doe <john@doe.name>
>   Dr. John Doe <john@doe.name>
>   John Doe <JOHN@doe.name.example.com>
>   John Doe <john@doe.name.example.com>
>
> This reuse of --duplicate was thought out after Jani's irc mention of it.
> This scheme would leave no room tho the filter-by=name suggestion -- for
> completeness that would make this look:
>
>   John Doe <john@example.com>
>   Dr. John Doe <john@example.com>
>
> This doesn't look too bad in this particular case but not having ability to
> see all potential addresses (perhaps the only working address is now
> hidden) isn't not much for general use. Well, maybe for some specific use
> --duplicate=no-unique-addresses could be useful :O
>
> Ok, this took an hour to get written to (w/ some interruptions). Healthy
> criticism appreciated :D

I don't know. You seem to think about this in the opposite way than how
it is implemented. The implementation really filters things out whereas
you specify what not to filter.

My feeling is that if you would implement your proposal, the code would
be more complex than in my patch, because the mapping between command
line options and the actual algorithm would require some extra code. And
in a previous comment, you preferred simplicity.

Hopefully, you consider the above as "healthy" criticism.

Thanks for quick review.
-Michal