Return-Path: <prvs=00390df1c=jrosenthal@jhu.edu>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id DF2DD429E20
    for <notmuch@notmuchmail.org>; Sat, 29 Jan 2011 12:09:16 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
    tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id hy4DOvYqfPGp for <notmuch@notmuchmail.org>;
    Sat, 29 Jan 2011 12:09:16 -0800 (PST)
Received: from ipex4.johnshopkins.edu (ipex4.johnshopkins.edu
    [128.220.161.141]) (using TLSv1 with cipher RC4-SHA (128/128 bits))
    (No client certificate requested)
    by olra.theworths.org (Postfix) with ESMTPS id 48690431FB6
    for <notmuch@notmuchmail.org>; Sat, 29 Jan 2011 12:09:16 -0800 (PST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApsEAKsCRE0KoSAO/2dsb2JhbAClbLITiGiFTgSFE4cO
X-IronPort-AV: E=Sophos;i="4.60,397,1291611600"; d="scan'208";a="43728106"
Received: from watt.hwcampus.jhu.edu ([10.161.32.14])
    by ipex4.johnshopkins.edu with ESMTP/TLS/ADH-AES256-SHA;
    29 Jan 2011 15:09:15 -0500
Received: by watt.hwcampus.jhu.edu (Postfix, from userid 502)
    id 12A6F791C73; Sat, 29 Jan 2011 15:09:14 -0500 (EST)
From: Jesse Rosenthal <jrosenthal@jhu.edu>
To: Sebastian Spaeth <Sebastian@SSpaeth.de>, notmuch@notmuchmail.org
Subject: Re: A tool for printing from notmuch
In-Reply-To: <87r5bvmqgy.fsf@SSpaeth.de>
References: <m162t87p3b.fsf@watt.hwcampus.jhu.edu> <87r5bvmqgy.fsf@SSpaeth.de>
User-Agent: Notmuch/0.5-56-g74cb76a (http://notmuchmail.org) Emacs/23.2.1
    (x86_64-apple-darwin)
Date: Sat, 29 Jan 2011 15:09:14 -0500
Message-ID: <m1r5bvmpzp.fsf@watt.hwcampus.jhu.edu>
Content-Type: text/plain; charset=us-ascii
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Sat, 29 Jan 2011 20:09:17 -0000

On Sat, 29 Jan 2011 20:58:53 +0100, Sebastian Spaeth <Sebastian@SSpaeth.de> wrote:
> I prefer to not have dependencies outside the std lib in python, but for
> xml/html parsing, there is really nothing appropriate, it seems.

I agree. And I'll admit I mainly chose BeautifulSoup out of
familiarity. But you really can't count on email html being well-formed
-- just vaguely renderable. And you certainly can't count on it being
xhtml. So the built-in parsers wouldn't be of much help. And, in fact,
if someone pastes a Word doc into Outlook, then the MS-specific tags and
styles will even choke libtidy.
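
To make the well-formedness point concrete, here is a minimal sketch using the standard library's strict XML parser, `xml.etree.ElementTree`. The helper name `parses_as_xml` and both sample strings are made up for illustration. It only shows that an XML parser rejects the kind of unclosed markup that mail clients commonly emit, and says nothing about how any particular tool handles it.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, mismatched tags, stray text, etc.
        return False


# XHTML-style markup: every tag is closed, single root element.
well_formed = "<html><body><p>Hello<br/>world</p></body></html>"

# Illustrative email-style HTML: <br> and <p> are never closed.
sloppy = "<p>Hello<br>world"

print(parses_as_xml(well_formed))
print(parses_as_xml(sloppy))
```

A browser would render both strings without complaint, which is roughly the gap between "well-formed" and "vaguely renderable".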

So BS is the best I could find for this job (putting a title into the
header and a table into the top of the body or html that might or might
not even have a header or a body tag). And it's always available in
Debian/Arch/Fedora/ports/MacPorts.
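
As a rough illustration of the "might or might not even have a header or a body tag" problem, the sketch below only detects which structural tags a message actually contains. It is not the BeautifulSoup approach described above: it swaps in `html.parser.HTMLParser` from the standard library and assumes a modern Python 3, where that parser tolerates sloppy markup. The names `TagPresence` and `structural_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TagPresence(HTMLParser):
    """Record which of <html>, <head>, <body> appear in the input."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in ("html", "head", "body"):
            self.seen.add(tag)


def structural_tags(markup):
    """Return the set of structural tags found in markup."""
    parser = TagPresence()
    parser.feed(markup)
    parser.close()
    return parser.seen


# A bare fragment, as some mail clients send it: no structural tags.
print(structural_tags("<p>Hello<br>world"))

# A document with a body but no head.
print(structural_tags("<HTML><BODY>hi</BODY></HTML>"))
```

Any code that inserts a title or a table has to cope with every combination this reports, including the empty set.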

The alternative, since we're trying to leave the email's html alone, is
to do our business with splits and regexes. But that seems like a bad