From: Jani Nikula
To: Michal Sojka, notmuch@notmuchmail.org
Subject: Re: [PATCH v6 7/7] cli: search: Add --filter-by option to configure address filtering
In-Reply-To: <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz>
References: <1414792441-29555-1-git-send-email-sojkam1@fel.cvut.cz> <1414792441-29555-8-git-send-email-sojkam1@fel.cvut.cz>
User-Agent: Notmuch/0.18.2+156~g3cc8ed5 (http://notmuchmail.org) Emacs/24.3.1 (x86_64-pc-linux-gnu)
Date: Sat, 01 Nov 2014 10:55:09 +0200
Message-ID: <87mw8be1w2.fsf@nikula.org>

On Fri, 31 Oct 2014, Michal Sojka wrote:
> This option allows to configure the criterion for duplicate address
> filtering. Without this option, all unique combinations of name and
> address parts are printed. This option allows to filter the output
> more, for example to only contain unique address parts.

This patch finally makes me think we should have a separate 'notmuch
address' command for all of this. We are starting to have two orthogonal
sets of 'notmuch search' options, one set for search and another for
addresses.

I regret not following the series and then making the observation so
late.

BR,
Jani.

> ---
>  completion/notmuch-completion.bash |  6 +++-
>  completion/notmuch-completion.zsh  |  3 +-
>  doc/man1/notmuch-search.rst        | 39 +++++++++++++++++++-
>  notmuch-search.c                   | 53 +++++++++++++++++++++++++--
>  test/T095-search-filter-by.sh      | 73 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 169 insertions(+), 5 deletions(-)
>  create mode 100755 test/T095-search-filter-by.sh
>
> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
> index 39cd829..b625b02 100644
> --- a/completion/notmuch-completion.bash
> +++ b/completion/notmuch-completion.bash
> @@ -305,12 +305,16 @@ _notmuch_search()
>              COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )
>              return
>              ;;
> +        --filter-by)
> +            COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )
> +            return
> +            ;;
>      esac
>
>      ! $split &&
>      case "${cur}" in
>          -*)
> -            local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="
> +            local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="
>              compopt -o nospace
>              COMPREPLY=( $(compgen -W "$options" -- ${cur}) )
>              ;;
> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
> index d7e5a5e..c1ccc32 100644
> --- a/completion/notmuch-completion.zsh
> +++ b/completion/notmuch-completion.zsh
> @@ -53,7 +53,8 @@ _notmuch_search()
>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
>      '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
> -    '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))'
> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))' \
> +    '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'
>  }
>
>  _notmuch()
> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
> index ec89200..3a5556b 100644
> --- a/doc/man1/notmuch-search.rst
> +++ b/doc/man1/notmuch-search.rst
> @@ -85,7 +85,8 @@ Supported options for **search** include
>          (--format=text0), as a JSON array (--format=json), or as
>          an S-Expression list (--format=sexp).
>
> -        Duplicate addresses are filtered out.
> +        Duplicate addresses are filtered out. Filtering can be
> +        configured with the --filter-by option.
>
>          Note: Searching for **sender** should be much faster than
>          searching for **recipients**, because sender addresses are
> @@ -158,6 +159,42 @@ Supported options for **search** include
>          prefix. The prefix matches messages based on filenames. This
>          option filters filenames of the matching messages.
>
> +    ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)
> +
> +        Can be used with ``--output=sender`` or
> +        ``--output=recipients`` to filter out duplicate addresses. The
> +        filtering algorithm receives a sequence of email addresses and
> +        outputs the same sequence without the addresses that are
> +        considered a duplicate of a previously output address. What is
> +        considered a duplicate depends on how the two addresses are
> +        compared and this can be controlled with the following
> +        keywords:
> +
> +        **nameaddr** means that both name and address parts are
> +        compared in case-sensitive manner. Therefore, all same-looking
> +        address strings are considered duplicate. This is the
> +        default.
> +
> +        **name** means that only the name part is compared (in
> +        case-sensitive manner). For example, the addresses "John Doe
> +        " and "John Doe " will be
> +        considered duplicate.
> +
> +        **addr** means that only the address part is compared (in
> +        case-sensitive manner). For example, the addresses "John Doe
> +        " and "Dr. John Doe " will
> +        be considered duplicate.
> +
> +        **addrfold** is like **addr**, but comparison is done in
> +        case-insensitive manner. For example, the addresses "John Doe
> +        " and "Dr. John Doe " will
> +        be considered duplicate.
> +
> +        **nameaddrfold** is like **nameaddr**, but address comparison
> +        is done in case-insensitive manner. For example, the
> +        addresses "John Doe " and "John Doe
> +        " will be considered duplicate.
> +
>  EXIT STATUS
>  ===========
>
> diff --git a/notmuch-search.c b/notmuch-search.c
> index 4b39dfc..a350f06 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -35,6 +35,14 @@ typedef enum {
>
>  #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS | OUTPUT_COUNT)
>
> +typedef enum {
> +    FILTER_BY_NAMEADDR = 0,
> +    FILTER_BY_NAME,
> +    FILTER_BY_ADDR,
> +    FILTER_BY_ADDRFOLD,
> +    FILTER_BY_NAMEADDRFOLD,
> +} filter_by_t;
> +
>  typedef struct {
>      sprinter_t *format;
>      notmuch_query_t *query;
> @@ -43,6 +51,7 @@ typedef struct {
>      int offset;
>      int limit;
>      int dupe;
> +    filter_by_t filter_by;
>  } search_options_t;
>
>  typedef struct {
> @@ -231,15 +240,42 @@ do_search_threads (search_options_t *opt)
>      return 0;
>  }
>
> -/* Returns TRUE iff name and addr is duplicate. */
> +/* Returns TRUE iff name and/or addr is considered duplicate. */
>  static notmuch_bool_t
>  is_duplicate (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)
>  {
>      notmuch_bool_t duplicate;
>      char *key;
> +    gchar *addrfold = NULL;
>      mailbox_t *mailbox;
>
> -    key = talloc_asprintf (opt->format, "%s <%s>", name, addr);
> +    if (opt->filter_by == FILTER_BY_ADDRFOLD ||
> +        opt->filter_by == FILTER_BY_NAMEADDRFOLD)
> +        addrfold = g_utf8_casefold (addr, -1);
> +
> +    switch (opt->filter_by) {
> +    case FILTER_BY_NAMEADDR:
> +        key = talloc_asprintf (opt->format, "%s <%s>", name, addr);
> +        break;
> +    case FILTER_BY_NAMEADDRFOLD:
> +        key = talloc_asprintf (opt->format, "%s <%s>", name, addrfold);
> +        break;
> +    case FILTER_BY_NAME:
> +        key = talloc_strdup (opt->format, name); /* !name results in !key */
> +        break;
> +    case FILTER_BY_ADDR:
> +        key = talloc_strdup (opt->format, addr);
> +        break;
> +    case FILTER_BY_ADDRFOLD:
> +        key = talloc_strdup (opt->format, addrfold);
> +        break;
> +    default:
> +        INTERNAL_ERROR("invalid --filter-by flags");
> +    }
> +
> +    if (addrfold)
> +        g_free (addrfold);
> +
>      if (! key)
>          return FALSE;
>
> @@ -523,6 +559,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>          .offset = 0,
>          .limit = -1, /* unlimited */
>          .dupe = -1,
> +        .filter_by = FILTER_BY_NAMEADDR,
>      };
>      char *query_str;
>      int opt_index, ret;
> @@ -567,6 +604,13 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>          { NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },
>          { NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0 },
>          { NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0 },
> +        { NOTMUCH_OPT_KEYWORD, &opt.filter_by, "filter-by", 'b',
> +          (notmuch_keyword_t []){ { "nameaddr", FILTER_BY_NAMEADDR },
> +                                  { "name", FILTER_BY_NAME },
> +                                  { "addr", FILTER_BY_ADDR },
> +                                  { "addrfold", FILTER_BY_ADDRFOLD },
> +                                  { "nameaddrfold", FILTER_BY_NAMEADDRFOLD },
> +                                  { 0, 0 } } },
>          { 0, 0, 0, 0, 0 }
>      };
>
> @@ -577,6 +621,11 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>      if (! opt.output)
>          opt.output = OUTPUT_SUMMARY;
>
> +    if (opt.filter_by && !(opt.output & OUTPUT_ADDRESS_FLAGS)) {
> +        fprintf (stderr, "Error: --filter-by can only be used with address output.\n");
> +        return EXIT_FAILURE;
> +    }
> +
>      switch (format_sel) {
>      case NOTMUCH_FORMAT_TEXT:
>          opt.format = sprinter_text_create (config, stdout);
> diff --git a/test/T095-search-filter-by.sh b/test/T095-search-filter-by.sh
> new file mode 100755
> index 0000000..15c9f77
> --- /dev/null
> +++ b/test/T095-search-filter-by.sh
> @@ -0,0 +1,73 @@
> +#!/usr/bin/env bash
> +test_description='duplicate address filtering in "notmuch search --output=recipients"'
> +. ./test-lib.sh
> +
> +add_message '[to]="John Doe , John Doe "'
> +add_message '[to]="\"Doe, John\" "' '[cc]="John Doe "'
> +add_message '[to]="\"Doe, John\" "' '[bcc]="John Doe "'
> +
> +test_begin_subtest "--output=recipients"
> +notmuch search --output=recipients "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +John Doe
> +John Doe
> +"Doe, John"
> +John Doe
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=nameaddr"
> +notmuch search --output=recipients --filter-by=nameaddr "*" >OUTPUT
> +# The same as above
> +cat <<EOF >EXPECTED
> +John Doe
> +John Doe
> +"Doe, John"
> +John Doe
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=name"
> +notmuch search --output=recipients --filter-by=name "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +John Doe
> +"Doe, John"
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=addr"
> +notmuch search --output=recipients --filter-by=addr "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +John Doe
> +John Doe
> +John Doe
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=addrfold"
> +notmuch search --output=recipients --filter-by=addrfold "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +John Doe
> +John Doe
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold"
> +notmuch search --output=recipients --filter-by=nameaddrfold "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +John Doe
> +John Doe
> +"Doe, John"
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold --output=count"
> +notmuch search --output=recipients --filter-by=nameaddrfold --output=count "*" | sort -n >OUTPUT
> +cat <<EOF >EXPECTED
> +1 John Doe
> +2 "Doe, John"
> +3 John Doe
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_done
> -- 
> 2.1.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
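
For readers who want to try the key derivation from the quoted
is_duplicate() hunk outside of notmuch, here is a minimal standalone
sketch. It is not the patch's code: the names key_mode, seen_set,
ascii_fold and make_key are hypothetical, a fixed-size array stands in
for the GHashTable, snprintf() replaces the talloc calls, and ASCII
tolower() is only a rough substitute for g_utf8_casefold(). The idea it
illustrates is the same as in the patch: each mode builds a different
comparison key, and an address counts as a duplicate when its key has
been seen before.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's filter_by_t values. */
enum key_mode {
    KEY_NAMEADDR,
    KEY_NAME,
    KEY_ADDR,
    KEY_ADDRFOLD,
    KEY_NAMEADDRFOLD
};

#define MAX_SEEN 64
#define KEY_LEN 256

/* Assumption: a small linear list replaces the GHashTable of the patch. */
struct seen_set {
    char keys[MAX_SEEN][KEY_LEN];
    size_t count;
};

/* ASCII-only lower-casing; the patch uses g_utf8_casefold() instead. */
void
ascii_fold (char *dst, size_t size, const char *src)
{
    size_t i;

    if (size == 0)
        return;
    for (i = 0; src[i] != '\0' && i + 1 < size; i++)
        dst[i] = (char) tolower ((unsigned char) src[i]);
    dst[i] = '\0';
}

/* Build the comparison key for one address according to the mode. */
void
make_key (enum key_mode mode, const char *name, const char *addr,
          char *key, size_t size)
{
    char folded[KEY_LEN];

    ascii_fold (folded, sizeof (folded), addr);

    switch (mode) {
    case KEY_NAME:
        snprintf (key, size, "%s", name);
        break;
    case KEY_ADDR:
        snprintf (key, size, "%s", addr);
        break;
    case KEY_ADDRFOLD:
        snprintf (key, size, "%s", folded);
        break;
    case KEY_NAMEADDRFOLD:
        snprintf (key, size, "%s <%s>", name, folded);
        break;
    case KEY_NAMEADDR:
    default:
        snprintf (key, size, "%s <%s>", name, addr);
        break;
    }
}

/* Returns 1 if the key was seen before; otherwise records it and returns 0. */
int
is_duplicate (struct seen_set *seen, enum key_mode mode,
              const char *name, const char *addr)
{
    char key[KEY_LEN];
    size_t i;

    make_key (mode, name, addr, key, sizeof (key));

    for (i = 0; i < seen->count; i++)
        if (strcmp (seen->keys[i], key) == 0)
            return 1;

    if (seen->count < MAX_SEEN)
        strcpy (seen->keys[seen->count++], key);

    return 0;
}
```

With made-up addresses (the ones in the quoted patch did not survive in
this copy of the message), feeding "John Doe" / "John@Example.com" and
then "Dr. John Doe" / "john@example.com" through KEY_ADDRFOLD reports
the second as a duplicate, while KEY_ADDR keeps both because the address
parts differ in case.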