--- /dev/null
+Return-Path: <sojkam1@fel.cvut.cz>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id 22336431FCB\r
+ for <notmuch@notmuchmail.org>; Thu, 30 Oct 2014 14:35:08 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -2.3\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id alERxAgzj4jm for <notmuch@notmuchmail.org>;\r
+ Thu, 30 Oct 2014 14:35:00 -0700 (PDT)\r
+Received: from max.feld.cvut.cz (max.feld.cvut.cz [147.32.192.36])\r
+ by olra.theworths.org (Postfix) with ESMTP id 275C0431FC2\r
+ for <notmuch@notmuchmail.org>; Thu, 30 Oct 2014 14:35:00 -0700 (PDT)\r
+Received: from localhost (unknown [192.168.200.7])\r
+ by max.feld.cvut.cz (Postfix) with ESMTP id E0B475CCFD0;\r
+ Thu, 30 Oct 2014 22:34:58 +0100 (CET)\r
+X-Virus-Scanned: IMAP STYX AMAVIS\r
+Received: from max.feld.cvut.cz ([192.168.200.1])\r
+ by localhost (styx.feld.cvut.cz [192.168.200.7]) (amavisd-new,\r
+ port 10044)\r
+ with ESMTP id yqLkr-oU8A41; Thu, 30 Oct 2014 22:34:55 +0100 (CET)\r
+Received: from imap.feld.cvut.cz (imap.feld.cvut.cz [147.32.192.34])\r
+ by max.feld.cvut.cz (Postfix) with ESMTP id 0F2B95CCFCB;\r
+ Thu, 30 Oct 2014 22:34:55 +0100 (CET)\r
+Received: from wsh by steelpick.2x.cz with local (Exim 4.84)\r
+ (envelope-from <sojkam1@fel.cvut.cz>)\r
+ id 1XjxMn-0005tO-5G; Thu, 30 Oct 2014 22:34:49 +0100\r
+From: Michal Sojka <sojkam1@fel.cvut.cz>\r
+To: Mark Walters <markwalters1009@gmail.com>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH v4 5/6] cli: search: Add configurable way to filter\r
+ out duplicate addresses\r
+In-Reply-To: <87egtqug4t.fsf@qmul.ac.uk>\r
+References: <1414421455-3037-1-git-send-email-sojkam1@fel.cvut.cz>\r
+ <1414421455-3037-6-git-send-email-sojkam1@fel.cvut.cz>\r
+ <87egtqug4t.fsf@qmul.ac.uk>\r
+User-Agent: Notmuch/0.18.2+157~ga00d359 (http://notmuchmail.org) Emacs/24.3.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Thu, 30 Oct 2014 22:34:49 +0100\r
+Message-ID: <874mulckcm.fsf@steelpick.2x.cz>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Thu, 30 Oct 2014 21:35:08 -0000\r
+\r
+On Thu, Oct 30 2014, Mark Walters wrote:\r
+> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:\r
+>> This adds an algorithm to filter out duplicate addresses from address\r
+>> outputs (sender, receivers). The algorithm can be configured with\r
+>> --filter-by command line option.\r
+>>\r
+>> The code here is an extended version of a patch from Jani Nikula.\r
+>\r
+> Hi\r
+>\r
+> As this is getting into the more controversial bike shedding region I\r
+> wonder if it would be worth splitting this into 2 patches: the first\r
+> could do the default dedupe based on name/address and the second could\r
+> add the filter-by options.\r
+>\r
+> I think the default deduping is obviously worth doing but I am not sure\r
+> about the rest. In any case I think the default deduping could go in\r
+> pre-freeze but I would recommend the rest is left until after.\r
+\r
+Yes, this makes sense. I'll send v5 in a while.\r
+\r
+>\r
+>> ---\r
+>> completion/notmuch-completion.bash | 6 ++-\r
+>> completion/notmuch-completion.zsh | 3 +-\r
+>> doc/man1/notmuch-search.rst | 38 +++++++++++++++\r
+>> notmuch-search.c | 98 +++++++++++++++++++++++++++++++++++---\r
+>> test/T090-search-output.sh | 87 +++++++++++++++++++++++++++++++++\r
+>> test/T095-search-filter-by.sh | 64 +++++++++++++++++++++++++\r
+>> 6 files changed, 288 insertions(+), 8 deletions(-)\r
+>> create mode 100755 test/T095-search-filter-by.sh\r
+>>\r
+>> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash\r
+>> index cfbd389..6b6d43a 100644\r
+>> --- a/completion/notmuch-completion.bash\r
+>> +++ b/completion/notmuch-completion.bash\r
+>> @@ -305,12 +305,16 @@ _notmuch_search()\r
+>> COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )\r
+>> return\r
+>> ;;\r
+>> + --filter-by)\r
+>> + COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )\r
+>> + return\r
+>> + ;;\r
+>> esac\r
+>> \r
+>> ! $split &&\r
+>> case "${cur}" in\r
+>> -*)\r
+>> - local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="\r
+>> + local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="\r
+>> compopt -o nospace\r
+>> COMPREPLY=( $(compgen -W "$options" -- ${cur}) )\r
+>> ;;\r
+>> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh\r
+>> index 3e52a00..3e535df 100644\r
+>> --- a/completion/notmuch-completion.zsh\r
+>> +++ b/completion/notmuch-completion.zsh\r
+>> @@ -53,7 +53,8 @@ _notmuch_search()\r
+>> '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \\r
+>> '--first=[omit the first x threads from the search results]:number of threads to omit: ' \\r
+>> '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \\r
+>> - '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'\r
+>> + '--output=[select what to output]:output:((summary threads messages files tags sender recipients))' \\r
+>> + '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'\r
+>> }\r
+>> \r
+>> _notmuch()\r
+>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst\r
+>> index b6607c9..84af2da 100644\r
+>> --- a/doc/man1/notmuch-search.rst\r
+>> +++ b/doc/man1/notmuch-search.rst\r
+>> @@ -85,6 +85,9 @@ Supported options for **search** include\r
+>> (--format=text0), as a JSON array (--format=json), or as\r
+>> an S-Expression list (--format=sexp).\r
+>> \r
+>> + Duplicate addresses are filtered out. Filtering can be\r
+>> + configured with the --filter-by option.\r
+>> +\r
+>> Note: Searching for **sender** should be much faster than\r
+>> searching for **recipients**, because sender addresses are\r
+>> cached directly in the database whereas other addresses\r
+>> @@ -151,6 +154,41 @@ Supported options for **search** include\r
+>> prefix. The prefix matches messages based on filenames. This\r
+>> option filters filenames of the matching messages.\r
+>> \r
+>> + ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)\r
+>> +\r
+>> + Can be used with ``--output=sender`` or\r
+>> + ``--output=recipients`` to filter out duplicate addresses. The\r
+>> + filtering algorithm receives a sequence of email addresses and\r
+>> + outputs the same sequence without the addresses that are\r
+>> + considered a duplicate of a previously output address. What is\r
+>> + considered a duplicate depends on how the two addresses are\r
+>> + compared and this can be controlled with the following flags:\r
+>> +\r
+>> + **nameaddr** means that both name and address parts are\r
+>> + compared in a case-sensitive manner. Therefore, only identical\r
+>> + address strings are considered duplicates. This is the\r
+>> + default.\r
+>> +\r
+>> + **name** means that only the name part is compared (in\r
+>> + a case-sensitive manner). For example, the addresses "John Doe\r
+>> + <me@example.com>" and "John Doe <john@doe.name>" will be\r
+>> + considered duplicate.\r
+>> +\r
+>> + **addr** means that only the address part is compared (in\r
+>> + a case-sensitive manner). For example, the addresses "John Doe\r
+>> + <john@example.com>" and "Dr. John Doe <john@example.com>" will\r
+>> + be considered duplicate.\r
+>> +\r
+>> + **addrfold** is like **addr**, but comparison is done in\r
+>> + a case-insensitive manner. For example, the addresses "John Doe\r
+>> + <john@example.com>" and "Dr. John Doe <JOHN@EXAMPLE.COM>" will\r
+>> + be considered duplicate.\r
+>> +\r
+>> + **nameaddrfold** is like **nameaddr**, but address comparison\r
+>> + is done in a case-insensitive manner. For example, the\r
+>> + addresses "John Doe <john@example.com>" and "John Doe\r
+>> + <JOHN@EXAMPLE.COM>" will be considered duplicate.\r
+>> +\r
+>> EXIT STATUS\r
+>> ===========\r
+>> \r
+>> diff --git a/notmuch-search.c b/notmuch-search.c\r
+>> index ce3bfb2..47aa979 100644\r
+>> --- a/notmuch-search.c\r
+>> +++ b/notmuch-search.c\r
+>> @@ -34,6 +34,14 @@ typedef enum {\r
+>> \r
+>> #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)\r
+>> \r
+>> +typedef enum {\r
+>> + FILTER_BY_NAMEADDR = 0,\r
+>> + FILTER_BY_NAME,\r
+>> + FILTER_BY_ADDR,\r
+>> + FILTER_BY_ADDRFOLD,\r
+>> + FILTER_BY_NAMEADDRFOLD,\r
+>> +} filter_by_t;\r
+>> +\r
+>> typedef struct {\r
+>> sprinter_t *format;\r
+>> notmuch_query_t *query;\r
+>> @@ -42,6 +50,7 @@ typedef struct {\r
+>> int offset;\r
+>> int limit;\r
+>> int dupe;\r
+>> + filter_by_t filter_by;\r
+>> } search_options_t;\r
+>> \r
+>> typedef struct {\r
+>> @@ -229,6 +238,52 @@ do_search_threads (search_options_t *opt)\r
+>> return 0;\r
+>> }\r
+>> \r
+>> +/* Returns TRUE iff name and/or addr is considered duplicite. */\r
+>\r
+> A triviality; duplicite should be duplicate\r
+>\r
+>> +static notmuch_bool_t\r
+>> +check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)\r
+>\r
+> I am not sure on style but maybe is_duplicate would be clearer?\r
+\r
+OK\r
+\r
+-Michal\r