From: Jani Nikula Date: Sat, 1 Nov 2014 08:55:09 +0000 (+0200) Subject: Re: [PATCH v6 7/7] cli: search: Add --filter-by option to configure address filtering X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=8048a2a7e049e9e4136be1b478a04b72298f8207;p=notmuch-archives.git Re: [PATCH v6 7/7] cli: search: Add --filter-by option to configure address filtering --- diff --git a/f7/3903cf1fc4669d4f8404892869274791931975 b/f7/3903cf1fc4669d4f8404892869274791931975 new file mode 100644 index 000000000..71ba40511 --- /dev/null +++ b/f7/3903cf1fc4669d4f8404892869274791931975 @@ -0,0 +1,381 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id D9727431FD0 + for ; Sat, 1 Nov 2014 01:55:18 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: -0.7 +X-Spam-Level: +X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 + tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id CMz+UlWTVCYw for ; + Sat, 1 Nov 2014 01:55:14 -0700 (PDT) +Received: from mail-wg0-f47.google.com (mail-wg0-f47.google.com + [74.125.82.47]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client + certificate requested) by olra.theworths.org (Postfix) with ESMTPS id + E5994431FAF for ; Sat, 1 Nov 2014 01:55:13 -0700 + (PDT) +Received: by mail-wg0-f47.google.com with SMTP id a1so9303595wgh.6 + for ; Sat, 01 Nov 2014 01:55:12 -0700 (PDT) +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20130820; + h=x-gm-message-state:from:to:subject:in-reply-to:references + :user-agent:date:message-id:mime-version:content-type; + bh=otA/EnOStfjmAXX2XpaHXp3ilREuNPvn2w4VHM82nZ0=; + b=h1EC9tdRcMHteyP2WDQ0yHAt2UeDe1FFbZzEPpdCROqy2PYK3SSzLGUT0B9YD7T3Mq + 
AMDf3P007zEofBowyYaJbyblY+Yrhmt+epPS77RLUtW2rPrHM/FgwxVFEZUAv2znahQC + BLpQzHOdSF1JbOVQrYbnZE4LZ4BVxgXe0xqlePiGv2bh3iElBpOWZWUlsmIEO+WprPUQ + K8GZmW8FU3TpecwjTcrkczo4as7tQMij4xT1Jb0VjgJGdR5gVxd93jCFaXt0O/ZCaL37 + p+K+imPVxyOA7WFSgQi/jTjdyygjOfcd+Ey38PefVfHNBXM9RpAM/IiqzMbhWIBANcd/ + jVSw== +X-Gm-Message-State: + ALoCoQlIBf59mxWN4xNZmMwJh6CWt3qzK6QRYQwx/1Oqkg8y9CsJ4OkUALFUkwvHTuVbtKF1hA7p +X-Received: by 10.194.82.74 with SMTP id g10mr1095434wjy.116.1414832112433; + Sat, 01 Nov 2014 01:55:12 -0700 (PDT) +Received: from localhost (dsl-hkibrasgw2-58c36d-48.dhcp.inet.fi. + [88.195.109.48]) + by mx.google.com with ESMTPSA id wl1sm14710640wjb.4.2014.11.01.01.55.11 + for + (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); + Sat, 01 Nov 2014 01:55:11 -0700 (PDT) +From: Jani Nikula +To: Michal Sojka , notmuch@notmuchmail.org +Subject: Re: [PATCH v6 7/7] cli: search: Add --filter-by option to + configure address filtering +In-Reply-To: <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz> +References: <1414792441-29555-1-git-send-email-sojkam1@fel.cvut.cz> + <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz> +User-Agent: Notmuch/0.18.2+156~g3cc8ed5 (http://notmuchmail.org) Emacs/24.3.1 + (x86_64-pc-linux-gnu) +Date: Sat, 01 Nov 2014 10:55:09 +0200 +Message-ID: <87mw8be1w2.fsf@nikula.org> +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Sat, 01 Nov 2014 08:55:19 -0000 + +On Fri, 31 Oct 2014, Michal Sojka wrote: +> This option allows to configure the criterion for duplicate address +> filtering. Without this option, all unique combinations of name and +> address parts are printed. This option allows to filter the output +> more, for example to only contain unique address parts. 
+ +This patch finally makes me think we should have a separate 'notmuch +address' command for all of this. We are starting to have two orthogonal +sets of 'notmuch search' options, one set for search and another for +addresses. I regret not following the series and then making the +observation so late. + +BR, +Jani. + + +> --- +> completion/notmuch-completion.bash | 6 +++- +> completion/notmuch-completion.zsh | 3 +- +> doc/man1/notmuch-search.rst | 39 +++++++++++++++++++- +> notmuch-search.c | 53 +++++++++++++++++++++++++-- +> test/T095-search-filter-by.sh | 73 ++++++++++++++++++++++++++++++++++++++ +> 5 files changed, 169 insertions(+), 5 deletions(-) +> create mode 100755 test/T095-search-filter-by.sh +> +> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash +> index 39cd829..b625b02 100644 +> --- a/completion/notmuch-completion.bash +> +++ b/completion/notmuch-completion.bash +> @@ -305,12 +305,16 @@ _notmuch_search() +> COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) ) +> return +> ;; +> + --filter-by) +> + COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) ) +> + return +> + ;; +> esac +> +> ! 
$split && +> case "${cur}" in +> -*) +> - local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate=" +> + local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by=" +> compopt -o nospace +> COMPREPLY=( $(compgen -W "$options" -- ${cur}) ) +> ;; +> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh +> index d7e5a5e..c1ccc32 100644 +> --- a/completion/notmuch-completion.zsh +> +++ b/completion/notmuch-completion.zsh +> @@ -53,7 +53,8 @@ _notmuch_search() +> '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \ +> '--first=[omit the first x threads from the search results]:number of threads to omit: ' \ +> '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \ +> - '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))' +> + '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))' \ +> + '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))' +> } +> +> _notmuch() +> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst +> index ec89200..3a5556b 100644 +> --- a/doc/man1/notmuch-search.rst +> +++ b/doc/man1/notmuch-search.rst +> @@ -85,7 +85,8 @@ Supported options for **search** include +> (--format=text0), as a JSON array (--format=json), or as +> an S-Expression list (--format=sexp). +> +> - Duplicate addresses are filtered out. +> + Duplicate addresses are filtered out. Filtering can be +> + configured with the --filter-by option. 
+> +> Note: Searching for **sender** should be much faster than +> searching for **recipients**, because sender addresses are +> @@ -158,6 +159,42 @@ Supported options for **search** include +> prefix. The prefix matches messages based on filenames. This +> option filters filenames of the matching messages. +> +> + ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\) +> + +> + Can be used with ``--output=sender`` or +> + ``--output=recipients`` to filter out duplicate addresses. The +> + filtering algorithm receives a sequence of email addresses and +> + outputs the same sequence without the addresses that are +> + considered a duplicate of a previously output address. What is +> + considered a duplicate depends on how the two addresses are +> + compared and this can be controlled with the following +> + keywords: +> + +> + **nameaddr** means that both name and address parts are +> + compared in a case-sensitive manner. Therefore, all identical +> + address strings are considered duplicate. This is the +> + default. +> + +> + **name** means that only the name part is compared (in a +> + case-sensitive manner). For example, the addresses "John Doe +> + " and "John Doe " will be +> + considered duplicate. +> + +> + **addr** means that only the address part is compared (in a +> + case-sensitive manner). For example, the addresses "John Doe +> + " and "Dr. John Doe " will +> + be considered duplicate. +> + +> + **addrfold** is like **addr**, but comparison is done in a +> + case-insensitive manner. For example, the addresses "John Doe +> + " and "Dr. John Doe " will +> + be considered duplicate. +> + +> + **nameaddrfold** is like **nameaddr**, but address comparison +> + is done in a case-insensitive manner. For example, the +> + addresses "John Doe " and "John Doe +> + " will be considered duplicate. 
+> + +> EXIT STATUS +> =========== +> +> diff --git a/notmuch-search.c b/notmuch-search.c +> index 4b39dfc..a350f06 100644 +> --- a/notmuch-search.c +> +++ b/notmuch-search.c +> @@ -35,6 +35,14 @@ typedef enum { +> +> #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS | OUTPUT_COUNT) +> +> +typedef enum { +> + FILTER_BY_NAMEADDR = 0, +> + FILTER_BY_NAME, +> + FILTER_BY_ADDR, +> + FILTER_BY_ADDRFOLD, +> + FILTER_BY_NAMEADDRFOLD, +> +} filter_by_t; +> + +> typedef struct { +> sprinter_t *format; +> notmuch_query_t *query; +> @@ -43,6 +51,7 @@ typedef struct { +> int offset; +> int limit; +> int dupe; +> + filter_by_t filter_by; +> } search_options_t; +> +> typedef struct { +> @@ -231,15 +240,42 @@ do_search_threads (search_options_t *opt) +> return 0; +> } +> +> -/* Returns TRUE iff name and addr is duplicate. */ +> +/* Returns TRUE iff name and/or addr is considered duplicate. */ +> static notmuch_bool_t +> is_duplicate (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr) +> { +> notmuch_bool_t duplicate; +> char *key; +> + gchar *addrfold = NULL; +> mailbox_t *mailbox; +> +> - key = talloc_asprintf (opt->format, "%s <%s>", name, addr); +> + if (opt->filter_by == FILTER_BY_ADDRFOLD || +> + opt->filter_by == FILTER_BY_NAMEADDRFOLD) +> + addrfold = g_utf8_casefold (addr, -1); +> + +> + switch (opt->filter_by) { +> + case FILTER_BY_NAMEADDR: +> + key = talloc_asprintf (opt->format, "%s <%s>", name, addr); +> + break; +> + case FILTER_BY_NAMEADDRFOLD: +> + key = talloc_asprintf (opt->format, "%s <%s>", name, addrfold); +> + break; +> + case FILTER_BY_NAME: +> + key = talloc_strdup (opt->format, name); /* !name results in !key */ +> + break; +> + case FILTER_BY_ADDR: +> + key = talloc_strdup (opt->format, addr); +> + break; +> + case FILTER_BY_ADDRFOLD: +> + key = talloc_strdup (opt->format, addrfold); +> + break; +> + default: +> + INTERNAL_ERROR("invalid --filter-by flags"); +> + } +> + +> + if (addrfold) +> + g_free 
(addrfold); +> + +> if (! key) +> return FALSE; +> +> @@ -523,6 +559,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[]) +> .offset = 0, +> .limit = -1, /* unlimited */ +> .dupe = -1, +> + .filter_by = FILTER_BY_NAMEADDR, +> }; +> char *query_str; +> int opt_index, ret; +> @@ -567,6 +604,13 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[]) +> { NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 }, +> { NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0 }, +> { NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0 }, +> + { NOTMUCH_OPT_KEYWORD, &opt.filter_by, "filter-by", 'b', +> + (notmuch_keyword_t []){ { "nameaddr", FILTER_BY_NAMEADDR }, +> + { "name", FILTER_BY_NAME }, +> + { "addr", FILTER_BY_ADDR }, +> + { "addrfold", FILTER_BY_ADDRFOLD }, +> + { "nameaddrfold", FILTER_BY_NAMEADDRFOLD }, +> + { 0, 0 } } }, +> { 0, 0, 0, 0, 0 } +> }; +> +> @@ -577,6 +621,11 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[]) +> if (! opt.output) +> opt.output = OUTPUT_SUMMARY; +> +> + if (opt.filter_by && !(opt.output & OUTPUT_ADDRESS_FLAGS)) { +> + fprintf (stderr, "Error: --filter-by can only be used with address output.\n"); +> + return EXIT_FAILURE; +> + } +> + +> switch (format_sel) { +> case NOTMUCH_FORMAT_TEXT: +> opt.format = sprinter_text_create (config, stdout); +> diff --git a/test/T095-search-filter-by.sh b/test/T095-search-filter-by.sh +> new file mode 100755 +> index 0000000..15c9f77 +> --- /dev/null +> +++ b/test/T095-search-filter-by.sh +> @@ -0,0 +1,73 @@ +> +#!/usr/bin/env bash +> +test_description='duplicate address filtering in "notmuch search --output=recipients"' +> +. 
./test-lib.sh +> + +> +add_message '[to]="John Doe , John Doe "' +> +add_message '[to]="\"Doe, John\" "' '[cc]="John Doe "' +> +add_message '[to]="\"Doe, John\" "' '[bcc]="John Doe "' +> + +> +test_begin_subtest "--output=recipients" +> +notmuch search --output=recipients "*" >OUTPUT +> +cat <<EOF >EXPECTED +> +John Doe +> +John Doe +> +"Doe, John" +> +John Doe +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=nameaddr" +> +notmuch search --output=recipients --filter-by=nameaddr "*" >OUTPUT +> +# The same as above +> +cat <<EOF >EXPECTED +> +John Doe +> +John Doe +> +"Doe, John" +> +John Doe +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=name" +> +notmuch search --output=recipients --filter-by=name "*" >OUTPUT +> +cat <<EOF >EXPECTED +> +John Doe +> +"Doe, John" +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=addr" +> +notmuch search --output=recipients --filter-by=addr "*" >OUTPUT +> +cat <<EOF >EXPECTED +> +John Doe +> +John Doe +> +John Doe +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=addrfold" +> +notmuch search --output=recipients --filter-by=addrfold "*" >OUTPUT +> +cat <<EOF >EXPECTED +> +John Doe +> +John Doe +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold" +> +notmuch search --output=recipients --filter-by=nameaddrfold "*" >OUTPUT +> +cat <<EOF >EXPECTED +> +John Doe +> +John Doe +> +"Doe, John" +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold --output=count" +> +notmuch search --output=recipients --filter-by=nameaddrfold --output=count "*" | sort -n >OUTPUT +> +cat <<EOF >EXPECTED +> +1 John Doe +> +2 "Doe, John" +> +3 John Doe +> +EOF +> +test_expect_equal_file OUTPUT EXPECTED +> + +> 
+test_done +> -- +> 2.1.1 +> +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch
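
For readers following the thread, the five --filter-by modes in the quoted patch differ only in which string is used as the deduplication key. The sketch below illustrates that idea in plain C and is not the patch's code: the names make_key, same_under and ascii_fold are hypothetical, malloc stands in for talloc, an ASCII-only lowercase copy stands in for g_utf8_casefold, and a missing name is treated as an empty string in the combined modes. The addresses in the usage note are made up as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the filter_by_t enum from the quoted patch. */
typedef enum {
    BY_NAMEADDR, BY_NAME, BY_ADDR, BY_ADDRFOLD, BY_NAMEADDRFOLD
} filter_by_t;

/* Heap copy of s; caller frees. Returns NULL on allocation failure. */
static char *dup_str (const char *s)
{
    char *out = malloc (strlen (s) + 1);
    if (out)
	strcpy (out, s);
    return out;
}

/* ASCII-only lowercase copy. Assumption: a simplified stand-in for
 * g_utf8_casefold, which the patch uses and which handles full UTF-8. */
static char *ascii_fold (const char *s)
{
    char *out = dup_str (s);
    if (out)
	for (char *p = out; *p; p++)
	    *p = (char) tolower ((unsigned char) *p);
    return out;
}

/* Build the key that decides whether two addresses count as duplicates.
 * Returns a malloc'd string, or NULL (no name in BY_NAME mode, or
 * allocation failure). */
static char *make_key (filter_by_t mode, const char *name, const char *addr)
{
    assert (addr != NULL);

    int fold = (mode == BY_ADDRFOLD || mode == BY_NAMEADDRFOLD);
    char *folded = fold ? ascii_fold (addr) : NULL;
    if (fold && ! folded)
	return NULL;

    const char *a = fold ? folded : addr;
    char *key = NULL;

    switch (mode) {
    case BY_NAME:
	key = name ? dup_str (name) : NULL;
	break;
    case BY_ADDR:
    case BY_ADDRFOLD:
	key = dup_str (a);
	break;
    case BY_NAMEADDR:
    case BY_NAMEADDRFOLD: {
	const char *n = name ? name : "";
	size_t len = strlen (n) + strlen (a) + 4; /* " <", ">", NUL */
	key = malloc (len);
	if (key)
	    snprintf (key, len, "%s <%s>", n, a);
	break;
    }
    }

    free (folded);
    return key;
}

/* Returns 1 if the two addresses produce the same key under mode. */
static int same_under (filter_by_t mode,
		       const char *name1, const char *addr1,
		       const char *name2, const char *addr2)
{
    char *k1 = make_key (mode, name1, addr1);
    char *k2 = make_key (mode, name2, addr2);
    int same = k1 && k2 && strcmp (k1, k2) == 0;

    free (k1);
    free (k2);
    return same;
}
```

With this sketch, same_under (BY_ADDRFOLD, "John Doe", "John@Example.com", "Dr. John Doe", "john@example.com") reports a duplicate, while the same pair under BY_ADDR does not, which is the distinction the **addr** and **addrfold** keywords draw in the man page text.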