Sender: amdragon@gmail.com
Date: Sat, 14 May 2011 21:37:25 -0400
Subject: Re: storing From and Subject in xapian
From: Austin Clements
To: Istvan Marko
Cc: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

I wonder if a better approach would be to use
notmuch_message_get_header everywhere, rather than introducing
_notmuch_message_get_header_value, and have it simply recognize
headers that can be retrieved directly from the database. Then
library callers could take advantage of this optimization and it
could be trivially extended to other headers in the future.

On Tue, May 3, 2011 at 11:40 PM, Istvan Marko wrote:
> I have been looking at the I/O patterns of "notmuch search" with the
> default output format and noticed that it has to parse the maildir file
> of every matched message to get the From and Subject headers. I figured
> that this must be slowing things down, especially when the files are not
> in the filesystem cache.
>
> So I wanted to see how much difference would it make to have the From
> and Subject stored in xapian to avoid this parsing.
>
> With the attached patch I get a speedup of 2x with cached and almost 10x
> with uncached files for searches with many matches.
>
> The attached patch is only intended as proof of concept. I am not
> familiar with xapian so I wasn't sure if this kind of data should be
> stored as terms, values or data. I went with values simply because I saw
> that message-id and timestamp were already stored that way.
> Perhaps the data type would be more appropriate since the fields are
> not used for searching or sorting. Oh and for some reason I get blank
> Subject for about 1% of the matches.
>
> Is there a downside to this approach? The only one I see is that the
> xapian db size increases by about 1% but to me the speed increase would
> be well worth it.
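
For illustration only, here is a minimal Python sketch of the dispatch
idea in the top-posted reply: a single get_header entry point that
answers from stored values when it can and parses the message file only
as a fallback. It does not use the notmuch or Xapian APIs. The Message
class, the DB_BACKED_HEADERS set, and the dict standing in for values
stored in the database are all hypothetical names chosen for this sketch.

```python
import email
from email import policy

# Hypothetical set of headers assumed to be stored in the database.
# Which headers are actually stored is an assumption for illustration.
DB_BACKED_HEADERS = {"from", "subject", "message-id"}


class Message:
    """Toy message: a dict of stored values plus the raw message bytes."""

    def __init__(self, db_values, raw_bytes):
        # Stand-in for per-document values kept in the database.
        self._db_values = {k.lower(): v for k, v in db_values.items()}
        self._raw = raw_bytes
        self._parsed = None
        self.file_parses = 0  # counts how often the slow path ran

    def _parse_file(self):
        # Slow path: parse the message once and keep the result.
        if self._parsed is None:
            self.file_parses += 1
            self._parsed = email.message_from_bytes(
                self._raw, policy=policy.default
            )
        return self._parsed

    def get_header(self, name):
        key = name.lower()
        # Fast path: the header is one of the stored ones and is present.
        if key in DB_BACKED_HEADERS and key in self._db_values:
            return self._db_values[key]
        # Fallback: any other header, or a stored header that is missing.
        value = self._parse_file()[name]
        return "" if value is None else str(value)


if __name__ == "__main__":
    raw = b"From: a@example.com\nSubject: hello\nX-Test: 1\n\nbody\n"
    msg = Message({"From": "a@example.com", "Subject": "hello"}, raw)
    print(msg.get_header("Subject"), msg.file_parses)
    print(msg.get_header("X-Test"), msg.file_parses)
```

Callers see one function either way, so supporting another stored
header in this sketch means adding its name to DB_BACKED_HEADERS rather
than changing any call site.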