From: Mark Walters
To: Austin Clements
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH v2 1/7] cli: allow query to come from stdin
In-Reply-To: <20121124174134.GH4562@mit.edu>
References: <1353763256-32336-1-git-send-email-markwalters1009@gmail.com>
 <1353763256-32336-2-git-send-email-markwalters1009@gmail.com>
 <20121124174134.GH4562@mit.edu>
User-Agent: Notmuch/0.14+81~g9730584 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
Date: Mon, 26 Nov 2012 10:15:06 +0000
Message-ID: <87mwy4smad.fsf@qmul.ac.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

Hi

Many thanks for all the reviews: I have incorporated most of your and
Tomi's suggestions in my latest version.

However, for this patch I wonder whether just using David's batch
tagging would be sufficient. It does mean that I can't construct the
list of possible tag removals correctly for a large query, but I can
just return all tags in this case. I think this is probably an
acceptable trade-off: you don't get the correct list of possible
completions if you are tagging more than 5000 messages at once.

This patch is not very complicated, but it does add another
feature/option to the command line, so if it is not needed I am
inclined not to add it. If people think that being able to do searches
for queries in excess of ARG_MAX (possibly 2MB or so) is useful then we
could add it.

Incidentally, for the tag completions: my view is that the correct
thing is to offer completions (for tag removal) based on what tags the
buffer shows (i.e. what was there when the query was run) rather than
on what tags are actually present now. This would be easy to add if
anyone cared sufficiently.

Any thoughts?

Best wishes

Mark

Austin Clements writes:

> Quoth markwalters1009 on Nov 24 at 1:20 pm:
>> From: Mark Walters
>>
>> After this series there will be times when a caller will want to pass
>> a very large query string to notmuch (eg a list of 10,000 message-ids)
>> and this can exceed the size of ARG_MAX. Hence allow notmuch to take
>> the query from stdin (if the query is -).
>> ---
>>  query-string.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 41 insertions(+), 0 deletions(-)
>>
>> diff --git a/query-string.c b/query-string.c
>> index 6536512..b1fbdeb 100644
>> --- a/query-string.c
>> +++ b/query-string.c
>> @@ -20,6 +20,44 @@
>>
>>  #include "notmuch-client.h"
>>
>> +/* Read a single query string from STDIN, using
>> + * 'ctx' as the talloc owner for all allocations.
>> + *
>> + * This function returns NULL in case of insufficient memory or read
>> + * errors.
>> + */
>> +static char *
>> +query_string_from_stdin (void *ctx)
>> +{
>> +    char *query_string;
>> +    char buf[4096];
>> +    ssize_t remain;
>> +
>> +    query_string = talloc_strdup (ctx, "");
>> +    if (query_string == NULL)
>> +        return NULL;
>> +
>> +    for (;;) {
>> +        remain = read (STDIN_FILENO, buf, sizeof(buf) - 1);
>> +        if (remain == 0)
>> +            break;
>> +        if (remain < 0) {
>> +            if (errno == EINTR)
>> +                continue;
>> +            fprintf (stderr, "Error: reading from standard input: %s\n",
>> +                     strerror (errno));
>
> talloc_free (query_string) ?
>
>> +            return NULL;
>> +        }
>> +
>> +        buf[remain] = '\0';
>> +        query_string = talloc_strdup_append (query_string, buf);
>
> Eliminate the NUL in buf and instead
> talloc_strndup_append (query_string, buf, remain) ?
>
> Should there be some (large) bound on the size of the query string to
> prevent runaway?
>
>> +        if (query_string == NULL)
>
> Technically it would be good to talloc_free the old pointer here, too.
>
>> +            return NULL;
>> +    }
>> +
>> +    return query_string;
>> +}
>> +
>
> This whole approach is O(n^2), which might actually matter for large
> query strings. How about (tested, but only a little):
>
> #define MAX_QUERY_STRING_LENGTH (16 * 1024 * 1024)
>
> /* Read a single query string from STDIN, using 'ctx' as the talloc
>  * owner for all allocations.
>  *
>  * This function returns NULL in case of insufficient memory or read
>  * errors.
>  */
> static char *
> query_string_from_stdin (void *ctx)
> {
>     char *query_string = NULL, *new_qs;
>     size_t pos = 0, end = 0;
>     ssize_t got;
>
>     for (;;) {
>         if (end - pos < 512) {
>             end = MAX(end * 2, 1024);
>             if (end >= MAX_QUERY_STRING_LENGTH) {
>                 fprintf (stderr, "Error: query too long\n");
>                 goto FAIL;
>             }
>             new_qs = talloc_realloc (ctx, query_string, char, end);
>             if (new_qs == NULL)
>                 goto FAIL;
>             query_string = new_qs;
>         }
>
>         got = read (STDIN_FILENO, query_string + pos, end - pos - 1);
>         if (got == 0)
>             break;
>         if (got < 0) {
>             if (errno == EINTR)
>                 continue;
>             fprintf (stderr, "Error: reading from standard input: %s\n",
>                      strerror (errno));
>             goto FAIL;
>         }
>         pos += got;
>     }
>
>     query_string[pos] = '\0';
>     return query_string;
>
>  FAIL:
>     talloc_free (query_string);
>     return NULL;
> }
>
>>  /* Construct a single query string from the passed arguments, using
>>   * 'ctx' as the talloc owner for all allocations.
>>   *
>> @@ -35,6 +73,9 @@ query_string_from_args (void *ctx, int argc, char *argv[])
>>      char *query_string;
>>      int i;
>>
>> +    if ((argc == 1) && (strcmp ("-", argv[0]) == 0))
>> +        return query_string_from_stdin (ctx);
>> +
>>      query_string = talloc_strdup (ctx, "");
>>      if (query_string == NULL)
>>          return NULL;
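
For anyone who wants to try the doubling-buffer idea outside the notmuch
tree, here is a minimal sketch that uses plain realloc instead of talloc.
It is only an illustration of the technique discussed above, not code
from the patch: the name read_all_fd and the max_len parameter are made
up for this example, and the file descriptor is passed in explicitly so
that the function can be exercised with a pipe as well as with stdin.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of notmuch): read everything from 'fd'
 * into a NUL-terminated buffer, doubling the capacity when it fills up
 * so that the total copying cost stays linear in the input size.
 *
 * Returns NULL on read error, on allocation failure, or when the buffer
 * is full and may not grow beyond 'max_len' bytes (a conservative
 * check: input that exactly fills the last allowed buffer is rejected).
 * The caller releases the result with free (). */
static char *
read_all_fd (int fd, size_t max_len)
{
    char *buf = NULL, *new_buf;
    size_t pos = 0, cap = 0, new_cap;
    ssize_t got;

    for (;;) {
        /* Need room for at least one data byte plus the final NUL. */
        if (cap - pos < 2) {
            if (cap == 0)
                new_cap = 1024;
            else if (cap > max_len / 2)
                new_cap = max_len;      /* avoid overflow when doubling */
            else
                new_cap = cap * 2;
            if (new_cap > max_len)
                new_cap = max_len;
            if (new_cap <= cap || new_cap < 2)
                goto fail;              /* limit reached: input too long */

            new_buf = realloc (buf, new_cap);
            if (new_buf == NULL)
                goto fail;
            buf = new_buf;
            cap = new_cap;
        }

        /* Always leave one byte free for the terminating NUL. */
        got = read (fd, buf + pos, cap - pos - 1);
        if (got == 0)
            break;                      /* end of input */
        if (got < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            goto fail;
        }
        pos += (size_t) got;
    }

    assert (pos < cap);
    buf[pos] = '\0';
    return buf;

  fail:
    free (buf);
    return NULL;
}
```

A caller would use it along the lines of
read_all_fd (STDIN_FILENO, 16 * 1024 * 1024), mirroring the 16MB bound
in Austin's suggestion, and report an error if it returns NULL.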