Re: correct way to search for only PDF attachments
author Carl Worth <cworth@cworth.org>
Tue, 29 Sep 2015 02:00:13 +0000 (19:00 -0700)
committer W. Trevor King <wking@tremily.us>
Sat, 20 Aug 2016 21:49:41 +0000 (14:49 -0700)
0e/b0a353c77c78ebd089632cf1c0ff29f10af3a3 [new file with mode: 0644]

diff --git a/0e/b0a353c77c78ebd089632cf1c0ff29f10af3a3 b/0e/b0a353c77c78ebd089632cf1c0ff29f10af3a3
new file mode 100644 (file)
index 0000000..78e8e67
--- /dev/null
@@ -0,0 +1,113 @@
+Return-Path: <cworth@cworth.org>\r
+X-Original-To: notmuch@notmuchmail.org\r
+Delivered-To: notmuch@notmuchmail.org\r
+Received: from localhost (localhost [127.0.0.1])\r
+ by arlo.cworth.org (Postfix) with ESMTP id F37E46DE0298\r
+ for <notmuch@notmuchmail.org>; Mon, 28 Sep 2015 19:00:15 -0700 (PDT)\r
+X-Virus-Scanned: Debian amavisd-new at cworth.org\r
+X-Spam-Flag: NO\r
+X-Spam-Score: -9\r
+X-Spam-Level: \r
+X-Spam-Status: No, score=-9 tagged_above=-999 required=5 tests=[AM.WBL=-8,\r
+ ALL_TRUSTED=-1, AWL=0.000] autolearn=disabled\r
+Received: from arlo.cworth.org ([127.0.0.1])\r
+ by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)\r
+ with ESMTP id xzXl6xNmLQVl; Mon, 28 Sep 2015 19:00:14 -0700 (PDT)\r
+Received: from wondoo.home.cworth.org (unknown [10.0.0.1])\r
+ (Authenticated sender: cworth)\r
+ by arlo.cworth.org (Postfix) with ESMTPSA id 367596DE01F5;\r
+ Mon, 28 Sep 2015 19:00:14 -0700 (PDT)\r
+Received: from wondoo.home.cworth.org (localhost [IPv6:::1])\r
+ by wondoo.home.cworth.org (Postfix) with ESMTPS id 1DC3114C43AC;\r
+ Mon, 28 Sep 2015 19:00:14 -0700 (PDT)\r
+From: Carl Worth <cworth@cworth.org>\r
+To: Xu Wang <xuwang762@gmail.com>, notmuch@notmuchmail.org\r
+Subject: Re: correct way to search for only PDF attachments\r
+In-Reply-To:\r
+ <CAJhTkNgwX8cmsKfJGV+x7HHMXPNZvXFXO=KZzLvrcWCGrDL=Pg@mail.gmail.com>\r
+References:\r
+ <CAJhTkNgwX8cmsKfJGV+x7HHMXPNZvXFXO=KZzLvrcWCGrDL=Pg@mail.gmail.com>\r
+User-Agent: Notmuch/0.20.2 (http://notmuchmail.org) Emacs/24.5.1\r
+ (x86_64-pc-linux-gnu)\r
+Date: Mon, 28 Sep 2015 19:00:13 -0700\r
+Message-ID: <87vbau9e8i.fsf@wondoo.home.cworth.org>\r
+MIME-Version: 1.0\r
+Content-Type: multipart/signed; boundary="=-=-=";\r
+ micalg=pgp-sha512; protocol="application/pgp-signature"\r
+X-BeenThere: notmuch@notmuchmail.org\r
+X-Mailman-Version: 2.1.18\r
+Precedence: list\r
+List-Id: "Use and development of the notmuch mail system."\r
+ <notmuch.notmuchmail.org>\r
+List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>\r
+List-Archive: <http://notmuchmail.org/pipermail/notmuch/>\r
+List-Post: <mailto:notmuch@notmuchmail.org>\r
+List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>\r
+List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,\r
+ <mailto:notmuch-request@notmuchmail.org?subject=subscribe>\r
+X-List-Received-Date: Tue, 29 Sep 2015 02:00:16 -0000\r
+\r
+--=-=-=\r
+Content-Type: text/plain\r
+\r
+On Mon, Sep 28 2015, Xu Wang wrote:\r
+> I would like to look for all emails from a colleague jongho. I tried:\r
+>\r
+> from:jongho attachment:pdf\r
+>\r
+> which seems to do as I wanted.\r
+\r
+Good. That should work.\r
+\r
+> To understand more, what does the following search for?\r
+>\r
+> from:jongho attachment:.*pdf\r
+\r
+Uhm, probably only strange things. There are some mechanisms for getting\r
+notmuch to emit some debugging information on what the final search\r
+terms end up being, (but I don't recall if they still require\r
+recompilation or not).\r
+\r
+I'm not testing now, but I wouldn't be surprised if that ended up doing\r
+something like searching for a phrase like "attachment pdf" anywhere\r
+within a message. (The Xapian parser can be somewhat unpredictable when\r
+you give it unexpected input.)\r
+\r
+> Also, how does the first one above know that I want only PDF\r
+> attachments and not an attachment called "pdformula.txt" ?\r
+\r
+It doesn't know that you want only PDF attachments. The key part is that\r
+the indexing is performed by breaking text up into individual terms, (at\r
+punctuation boundaries usually). So a search specification like\r
+"attachment:pdf" is searching for things that were indexed with the\r
+"pdf" term within the attachment prefix. So that won't match a filename\r
+like pdformula.txt, (which would be indexed as two terms, "pdformula"\r
+and "txt"), but it would match pdf.ormula.txt, (which would be indexed\r
+as three terms, "pdf", "ormula" and "txt").\r
+\r
+The Xapian documentation can be examined if you want more details.\r
+\r
+-Carl\r
+\r
+--=-=-=\r
+Content-Type: application/pgp-signature; name="signature.asc"\r
+\r
+-----BEGIN PGP SIGNATURE-----\r
+Version: GnuPG v1\r
+\r
+iQIcBAEBCgAGBQJWCfCtAAoJEGACM7qeVNxhS7IP/28s0fs91BSkfOw8+0xMKP2q\r
+JSv4Ze/5bfe+52U4GwKOX53fRVCDAmGz4lIA88GciM0185p0j4jjG6K6u+WfTr9r\r
+cGMAWGGWFZM7UFjK6viVOTu0Y+XzVWxJFFO8nROr368eMQ7cZPNt9VgvNFFT51qa\r
+tulCjt0ImQ1yyLlKPpagv9YJ3UFgp3G9HTr08HvOutb5oSpNtIR9efBkq2M+u+p3\r
+SS9xmWwwCTY0OA6L6K0r5g3FazQrgdIXbldwf7EV64WdLBBcPJjleZGeAqhmHDwk\r
+UYZ6wc1u+2kcKOPafR8UwXSlAKMq8qLv6BcHPFoUaDFxnAvau1dS2w0FTLzmLS5J\r
+OZSBH5CV9Ucyt+X7OjnRCYbiH7Koa6Ov+Bv7GkoyznUiOU9m4YXBlVZSe3YsbhE0\r
+hKZPx/IuDKQXQsmzoE7FWtjjIqhaFAKH7YszO07tC29GCkw7C+VpYSLyZYpP49sc\r
+YMz4/YaQHVCIPLw+0YlHzFuTezrkryVAt2JuRUQgfffILXEZMdQ8dXsmVXU6Tk2S\r
+17ksXff2QJOuaoaYLEhmG3sGH7EHzrJL7LVaEJdnaoVxjqLC+awapJPqhd6OGMg4\r
+Nvu0848m43i3jjyuJfFAtcs23iDh8sxJbxnTCeG+Td0FrrbQpfcX7lDX/0OlRfIk\r
+u1Fzh0FeF5MOdIB70Yv/\r
+=ILAb\r
+-----END PGP SIGNATURE-----\r
+--=-=-=--\r