--- /dev/null
+Return-Path: <jani@nikula.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by olra.theworths.org (Postfix) with ESMTP id D9727431FD0\r
+ for <notmuch@notmuchmail.org>; Sat, 1 Nov 2014 01:55:18 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -0.7\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5\r
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled\r
+Received: from olra.theworths.org ([127.0.0.1])\r
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id CMz+UlWTVCYw for <notmuch@notmuchmail.org>;\r
+ Sat, 1 Nov 2014 01:55:14 -0700 (PDT)\r
+Received: from mail-wg0-f47.google.com (mail-wg0-f47.google.com\r
+ [74.125.82.47]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client\r
+ certificate requested) by olra.theworths.org (Postfix) with ESMTPS id\r
+ E5994431FAF for <notmuch@notmuchmail.org>; Sat, 1 Nov 2014 01:55:13 -0700\r
+ (PDT)\r
+Received: by mail-wg0-f47.google.com with SMTP id a1so9303595wgh.6\r
+ for <notmuch@notmuchmail.org>; Sat, 01 Nov 2014 01:55:12 -0700 (PDT)\r
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\r
+ d=1e100.net; s=20130820;\r
+ h=x-gm-message-state:from:to:subject:in-reply-to:references\r
+ :user-agent:date:message-id:mime-version:content-type;\r
+ bh=otA/EnOStfjmAXX2XpaHXp3ilREuNPvn2w4VHM82nZ0=;\r
+ b=h1EC9tdRcMHteyP2WDQ0yHAt2UeDe1FFbZzEPpdCROqy2PYK3SSzLGUT0B9YD7T3Mq\r
+ AMDf3P007zEofBowyYaJbyblY+Yrhmt+epPS77RLUtW2rPrHM/FgwxVFEZUAv2znahQC\r
+ BLpQzHOdSF1JbOVQrYbnZE4LZ4BVxgXe0xqlePiGv2bh3iElBpOWZWUlsmIEO+WprPUQ\r
+ K8GZmW8FU3TpecwjTcrkczo4as7tQMij4xT1Jb0VjgJGdR5gVxd93jCFaXt0O/ZCaL37\r
+ p+K+imPVxyOA7WFSgQi/jTjdyygjOfcd+Ey38PefVfHNBXM9RpAM/IiqzMbhWIBANcd/\r
+ jVSw==\r
+X-Gm-Message-State:\r
+ ALoCoQlIBf59mxWN4xNZmMwJh6CWt3qzK6QRYQwx/1Oqkg8y9CsJ4OkUALFUkwvHTuVbtKF1hA7p\r
+X-Received: by 10.194.82.74 with SMTP id g10mr1095434wjy.116.1414832112433;\r
+ Sat, 01 Nov 2014 01:55:12 -0700 (PDT)\r
+Received: from localhost (dsl-hkibrasgw2-58c36d-48.dhcp.inet.fi.\r
+ [88.195.109.48])\r
+ by mx.google.com with ESMTPSA id wl1sm14710640wjb.4.2014.11.01.01.55.11\r
+ for <multiple recipients>\r
+ (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\r
+ Sat, 01 Nov 2014 01:55:11 -0700 (PDT)\r
+From: Jani Nikula <jani@nikula.org>\r
+To: Michal Sojka <sojkam1@fel.cvut.cz>, notmuch@notmuchmail.org\r
+Subject: Re: [PATCH v6 7/7] cli: search: Add --filter-by option to\r
+ configure address filtering\r
+In-Reply-To: <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz>\r
+References: <1414792441-29555-1-git-send-email-sojkam1@fel.cvut.cz>\r
+ <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz>\r
+User-Agent: Notmuch/0.18.2+156~g3cc8ed5 (http://notmuchmail.org) Emacs/24.3.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Sat, 01 Nov 2014 10:55:09 +0200\r
+Message-ID: <87mw8be1w2.fsf@nikula.org>\r
+MIME-Version: 1.0\r
+Content-Type: text/plain\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.13\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Sat, 01 Nov 2014 08:55:19 -0000\r
+\r
+On Fri, 31 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:\r
+> This option configures the criterion for duplicate address\r
+> filtering. Without this option, all unique combinations of name and\r
+> address parts are printed. With it, the output can be filtered\r
+> further, for example to contain only unique address parts.\r
+\r
+This patch finally makes me think we should have a separate 'notmuch\r
+address' command for all of this. We are starting to have two orthogonal\r
+sets of 'notmuch search' options, one set for search and another for\r
+addresses. I regret not following the series and then making the\r
+observation so late.\r
+\r
+BR,\r
+Jani.\r
+\r
+\r
+> ---\r
+> completion/notmuch-completion.bash | 6 +++-\r
+> completion/notmuch-completion.zsh | 3 +-\r
+> doc/man1/notmuch-search.rst | 39 +++++++++++++++++++-\r
+> notmuch-search.c | 53 +++++++++++++++++++++++++--\r
+> test/T095-search-filter-by.sh | 73 ++++++++++++++++++++++++++++++++++++++\r
+> 5 files changed, 169 insertions(+), 5 deletions(-)\r
+> create mode 100755 test/T095-search-filter-by.sh\r
+>\r
+> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash\r
+> index 39cd829..b625b02 100644\r
+> --- a/completion/notmuch-completion.bash\r
+> +++ b/completion/notmuch-completion.bash\r
+> @@ -305,12 +305,16 @@ _notmuch_search()\r
+> COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )\r
+> return\r
+> ;;\r
+> + --filter-by)\r
+> + COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )\r
+> + return\r
+> + ;;\r
+> esac\r
+> \r
+> ! $split &&\r
+> case "${cur}" in\r
+> -*)\r
+> - local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="\r
+> + local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="\r
+> compopt -o nospace\r
+> COMPREPLY=( $(compgen -W "$options" -- ${cur}) )\r
+> ;;\r
+> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh\r
+> index d7e5a5e..c1ccc32 100644\r
+> --- a/completion/notmuch-completion.zsh\r
+> +++ b/completion/notmuch-completion.zsh\r
+> @@ -53,7 +53,8 @@ _notmuch_search()\r
+> '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \\r
+> '--first=[omit the first x threads from the search results]:number of threads to omit: ' \\r
+> '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \\r
+> - '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))'\r
+> + '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))' \\r
+> + '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'\r
+> }\r
+> \r
+> _notmuch()\r
+> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst\r
+> index ec89200..3a5556b 100644\r
+> --- a/doc/man1/notmuch-search.rst\r
+> +++ b/doc/man1/notmuch-search.rst\r
+> @@ -85,7 +85,8 @@ Supported options for **search** include\r
+> (--format=text0), as a JSON array (--format=json), or as\r
+> an S-Expression list (--format=sexp).\r
+> \r
+> - Duplicate addresses are filtered out.\r
+> + Duplicate addresses are filtered out. Filtering can be\r
+> + configured with the --filter-by option.\r
+> \r
+> Note: Searching for **sender** should be much faster than\r
+> searching for **recipients**, because sender addresses are\r
+> @@ -158,6 +159,42 @@ Supported options for **search** include\r
+> prefix. The prefix matches messages based on filenames. This\r
+> option filters filenames of the matching messages.\r
+> \r
+> + ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)\r
+> +\r
+> + Can be used with ``--output=sender`` or\r
+> + ``--output=recipients`` to filter out duplicate addresses. The\r
+> + filtering algorithm receives a sequence of email addresses and\r
+> + outputs the same sequence without the addresses that are\r
+> + considered a duplicate of a previously output address. What is\r
+> + considered a duplicate depends on how the two addresses are\r
+> + compared and this can be controlled with the following\r
+> + keywords:\r
+> +\r
+> + **nameaddr** means that both name and address parts are\r
+> + compared in case-sensitive manner. Therefore, all identical-looking\r
+> + address strings are considered duplicate. This is the\r
+> + default.\r
+> +\r
+> + **name** means that only the name part is compared (in\r
+> + case-sensitive manner). For example, the addresses "John Doe\r
+> + <me@example.com>" and "John Doe <john@doe.name>" will be\r
+> + considered duplicate.\r
+> +\r
+> + **addr** means that only the address part is compared (in\r
+> + case-sensitive manner). For example, the addresses "John Doe\r
+> + <john@example.com>" and "Dr. John Doe <john@example.com>" will\r
+> + be considered duplicate.\r
+> +\r
+> + **addrfold** is like **addr**, but comparison is done in\r
+> + case-insensitive manner. For example, the addresses "John Doe\r
+> + <john@example.com>" and "Dr. John Doe <JOHN@EXAMPLE.COM>" will\r
+> + be considered duplicate.\r
+> +\r
+> + **nameaddrfold** is like **nameaddr**, but address comparison\r
+> + is done in case-insensitive manner. For example, the\r
+> + addresses "John Doe <john@example.com>" and "John Doe\r
+> + <JOHN@EXAMPLE.COM>" will be considered duplicate.\r
+> +\r
+> EXIT STATUS\r
+> ===========\r
+> \r
+> diff --git a/notmuch-search.c b/notmuch-search.c\r
+> index 4b39dfc..a350f06 100644\r
+> --- a/notmuch-search.c\r
+> +++ b/notmuch-search.c\r
+> @@ -35,6 +35,14 @@ typedef enum {\r
+> \r
+> #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS | OUTPUT_COUNT)\r
+> \r
+> +typedef enum {\r
+> + FILTER_BY_NAMEADDR = 0,\r
+> + FILTER_BY_NAME,\r
+> + FILTER_BY_ADDR,\r
+> + FILTER_BY_ADDRFOLD,\r
+> + FILTER_BY_NAMEADDRFOLD,\r
+> +} filter_by_t;\r
+> +\r
+> typedef struct {\r
+> sprinter_t *format;\r
+> notmuch_query_t *query;\r
+> @@ -43,6 +51,7 @@ typedef struct {\r
+> int offset;\r
+> int limit;\r
+> int dupe;\r
+> + filter_by_t filter_by;\r
+> } search_options_t;\r
+> \r
+> typedef struct {\r
+> @@ -231,15 +240,42 @@ do_search_threads (search_options_t *opt)\r
+> return 0;\r
+> }\r
+> \r
+> -/* Returns TRUE iff name and addr is duplicate. */\r
+> +/* Returns TRUE iff name and/or addr is considered duplicate. */\r
+> static notmuch_bool_t\r
+> is_duplicate (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)\r
+> {\r
+> notmuch_bool_t duplicate;\r
+> char *key;\r
+> + gchar *addrfold = NULL;\r
+> mailbox_t *mailbox;\r
+> \r
+> - key = talloc_asprintf (opt->format, "%s <%s>", name, addr);\r
+> + if (opt->filter_by == FILTER_BY_ADDRFOLD ||\r
+> + opt->filter_by == FILTER_BY_NAMEADDRFOLD)\r
+> + addrfold = g_utf8_casefold (addr, -1);\r
+> +\r
+> + switch (opt->filter_by) {\r
+> + case FILTER_BY_NAMEADDR:\r
+> + key = talloc_asprintf (opt->format, "%s <%s>", name, addr);\r
+> + break;\r
+> + case FILTER_BY_NAMEADDRFOLD:\r
+> + key = talloc_asprintf (opt->format, "%s <%s>", name, addrfold);\r
+> + break;\r
+> + case FILTER_BY_NAME:\r
+> + key = talloc_strdup (opt->format, name); /* !name results in !key */\r
+> + break;\r
+> + case FILTER_BY_ADDR:\r
+> + key = talloc_strdup (opt->format, addr);\r
+> + break;\r
+> + case FILTER_BY_ADDRFOLD:\r
+> + key = talloc_strdup (opt->format, addrfold);\r
+> + break;\r
+> + default:\r
+> + INTERNAL_ERROR("invalid --filter-by flags");\r
+> + }\r
+> +\r
+> + if (addrfold)\r
+> + g_free (addrfold);\r
+> +\r
+> if (! key)\r
+> return FALSE;\r
+> \r
+> @@ -523,6 +559,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])\r
+> .offset = 0,\r
+> .limit = -1, /* unlimited */\r
+> .dupe = -1,\r
+> + .filter_by = FILTER_BY_NAMEADDR,\r
+> };\r
+> char *query_str;\r
+> int opt_index, ret;\r
+> @@ -567,6 +604,13 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])\r
+> { NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },\r
+> { NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0 },\r
+> { NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0 },\r
+> + { NOTMUCH_OPT_KEYWORD, &opt.filter_by, "filter-by", 'b',\r
+> + (notmuch_keyword_t []){ { "nameaddr", FILTER_BY_NAMEADDR },\r
+> + { "name", FILTER_BY_NAME },\r
+> + { "addr", FILTER_BY_ADDR },\r
+> + { "addrfold", FILTER_BY_ADDRFOLD },\r
+> + { "nameaddrfold", FILTER_BY_NAMEADDRFOLD },\r
+> + { 0, 0 } } },\r
+> { 0, 0, 0, 0, 0 }\r
+> };\r
+> \r
+> @@ -577,6 +621,11 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])\r
+> if (! opt.output)\r
+> opt.output = OUTPUT_SUMMARY;\r
+> \r
+> + if (opt.filter_by && !(opt.output & OUTPUT_ADDRESS_FLAGS)) {\r
+> + fprintf (stderr, "Error: --filter-by can only be used with address output.\n");\r
+> + return EXIT_FAILURE;\r
+> + }\r
+> +\r
+> switch (format_sel) {\r
+> case NOTMUCH_FORMAT_TEXT:\r
+> opt.format = sprinter_text_create (config, stdout);\r
+> diff --git a/test/T095-search-filter-by.sh b/test/T095-search-filter-by.sh\r
+> new file mode 100755\r
+> index 0000000..15c9f77\r
+> --- /dev/null\r
+> +++ b/test/T095-search-filter-by.sh\r
+> @@ -0,0 +1,73 @@\r
+> +#!/usr/bin/env bash\r
+> +test_description='duplicate address filtering in "notmuch search --output=recipients"'\r
+> +. ./test-lib.sh\r
+> +\r
+> +add_message '[to]="John Doe <foo@example.com>, John Doe <bar@example.com>"'\r
+> +add_message '[to]="\"Doe, John\" <foo@example.com>"' '[cc]="John Doe <Bar@Example.COM>"'\r
+> +add_message '[to]="\"Doe, John\" <foo@example.com>"' '[bcc]="John Doe <Bar@Example.COM>"'\r
+> +\r
+> +test_begin_subtest "--output=recipients"\r
+> +notmuch search --output=recipients "*" >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +John Doe <bar@example.com>\r
+> +"Doe, John" <foo@example.com>\r
+> +John Doe <Bar@Example.COM>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=nameaddr"\r
+> +notmuch search --output=recipients --filter-by=nameaddr "*" >OUTPUT\r
+> +# The same as above\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +John Doe <bar@example.com>\r
+> +"Doe, John" <foo@example.com>\r
+> +John Doe <Bar@Example.COM>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=name"\r
+> +notmuch search --output=recipients --filter-by=name "*" >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +"Doe, John" <foo@example.com>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=addr"\r
+> +notmuch search --output=recipients --filter-by=addr "*" >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +John Doe <bar@example.com>\r
+> +John Doe <Bar@Example.COM>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=addrfold"\r
+> +notmuch search --output=recipients --filter-by=addrfold "*" >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +John Doe <bar@example.com>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold"\r
+> +notmuch search --output=recipients --filter-by=nameaddrfold "*" >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +John Doe <foo@example.com>\r
+> +John Doe <bar@example.com>\r
+> +"Doe, John" <foo@example.com>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold --output=count"\r
+> +notmuch search --output=recipients --filter-by=nameaddrfold --output=count "*" | sort -n >OUTPUT\r
+> +cat <<EOF >EXPECTED\r
+> +1 John Doe <foo@example.com>\r
+> +2 "Doe, John" <foo@example.com>\r
+> +3 John Doe <bar@example.com>\r
+> +EOF\r
+> +test_expect_equal_file OUTPUT EXPECTED\r
+> +\r
+> +test_done\r
+> -- \r
+> 2.1.1\r
+>\r
+> _______________________________________________\r
+> notmuch mailing list\r
+> notmuch@notmuchmail.org\r
+> http://notmuchmail.org/mailman/listinfo/notmuch\r