Message-ID: <4E565A61.7040600@fifthhorseman.net>
Date: Thu, 25 Aug 2011 10:21:21 -0400
From: Daniel Kahn Gillmor
To: moabi2000, notmuch
Subject: Re: How does notmuch detect the presence of attachments?

On 08/03/2011 06:01 AM, moabi2000 wrote:
> 1) How does notmuch detect the presence of attachments? I have some
> messages that have attachments (which I can see and open when reading
> the message), but for which the 'attachment' flag is not set (and
> therefore don't show up in a search like "from:myfriend AND
> attachment:pdf"). How can I try to work out what is going on?

According to lib/index.cc (around line 366 in the current version), the
tag "attachment" is added to an e-mail only if one of the MIME parts of
the message has an explicit "Content-Disposition: attachment" MIME
subheader.

So some mail clients may be attaching files with "Content-Disposition:
inline" (i do this sometimes when attaching text/* files) or without a
Content-Disposition: header on the MIME part at all.

Perhaps notmuch could keep a (configurable?) list of Content-Types that
should be tagged with "attachment" no matter what Content-Disposition is
used? I could imagine an initial list like:

 application/pdf
 application/vnd.oasis.opendocument.text
 application/vnd.oasis.opendocument.spreadsheet

Or maybe just any MIME part with "application" as the major Content
type? That would be a relatively easy (though non-general) heuristic to
implement.

Want to take a crack at it?

> 2) Is there an option for notmuch to also index the text of
> attachments (like recoll does, which also uses xapian)? People tend to
> save attachments with really useless filenames (report2.pdf...), what
> I'd like to be able to do is a search like "from:mycolleague AND
> attachment:pdf AND attachmentcontains:ourproject"

This is another great suggestion for improvement, i think.
There is even a comment in the code (around the same part referenced
above) that says:

	  /* XXX: Would be nice to call out to something here to parse
	   * the attachment into text and then index that. */

A generic shim here, with a configurable index that associates
Content-Types with safe convert-to-text functions, would be quite nice.

This would probably be a new section in ~/.notmuch-config,
[textconverters], where each key would be a specific Content-Type and
each value would be a command that takes the file on stdin and produces
plain text to index on stdout, like so:

[textconverters]
application/pdf=pdf2txt /dev/stdin

Starting with an initially empty set of textconverters seems reasonable
and safe to me, and people could set up their own if they're interested.

You'd need to re-index your message store after modifying the config,
though, if you wanted to have pre-existing messages get indexed this
way. Is there a way to tell notmuch to re-index a particular message?

The above proposal isn't implemented at all; i'm just throwing it out
for consideration.
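
To make both ideas a bit more concrete, here is a minimal sketch in
Python using only the standard library's email package. It is not
notmuch's actual indexing code (which lives in lib/index.cc). The names
ALWAYS_ATTACHMENT_TYPES, wants_attachment_tag, extract_text and
load_message are hypothetical, and the converters mapping stands in for
the proposed, unimplemented [textconverters] section.

```python
import email
import shlex
import subprocess
from email.message import Message

# Hypothetical list: Content-Types that get the "attachment" tag no
# matter which Content-Disposition (if any) the MIME part carries.
ALWAYS_ATTACHMENT_TYPES = {
    "application/pdf",
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.spreadsheet",
}


def load_message(path):
    """Parse one message file from disk."""
    with open(path, "rb") as fh:
        return email.message_from_binary_file(fh)


def wants_attachment_tag(msg: Message) -> bool:
    """Return True if any leaf MIME part should trigger the tag."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        # The rule described above: an explicit disposition of "attachment".
        if part.get_content_disposition() == "attachment":
            return True
        # The proposed heuristic: listed types count even when the part
        # is "inline" or has no Content-Disposition header at all.
        if part.get_content_type() in ALWAYS_ATTACHMENT_TYPES:
            return True
    return False


def extract_text(part: Message, converters: dict):
    """Run the converter configured for this part's Content-Type.

    `converters` maps a Content-Type to a command line that reads the
    decoded part on stdin and writes plain text on stdout. Returns None
    when no converter is configured, so an empty mapping indexes nothing.
    """
    command = converters.get(part.get_content_type())
    if command is None:
        return None
    payload = part.get_payload(decode=True) or b""
    result = subprocess.run(
        shlex.split(command),
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

With converters = {"application/pdf": "pdf2txt /dev/stdin"}, calling
extract_text on a PDF part would feed the decoded part to that command
and return whatever it prints. Starting from an empty dict matches the
"initially empty set" default suggested above.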

	--dkg