From: Jesse Rosenthal
To: Sebastian Spaeth, notmuch@notmuchmail.org
Subject: Re: A tool for printing from notmuch
In-Reply-To: <87r5bvmqgy.fsf@SSpaeth.de>
Date: Sat, 29 Jan 2011 15:09:14 -0500
Hi Sebastian,

On Sat, 29 Jan 2011 20:58:53 +0100, Sebastian Spaeth wrote:
> I prefer to not have dependencies outside the std lib in python, but for
> xml/html parsing, there is really nothing appropriate, it seems.

I agree. And I'll admit I mainly chose BeautifulSoup out of familiarity. But you really can't count on email html being well-formed -- just vaguely renderable. And you certainly can't count on it being xhtml. So the built-in parsers wouldn't be of much help. In fact, if someone pastes a Word doc into Outlook, the MS-specific tags and styles will choke even libtidy.

So BS is the best I could find for this job (putting a title into the head and a table at the top of the body of html that might or might not even have a head or a body tag). And it's always available in Debian/Arch/Fedora/ports/MacPorts.

The alternative, since we're trying to leave the email's html alone, is to do our business with splits and regexes. But that seems like a bad road to head down.

Best,
Jesse
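A minimal sketch of why the standard-library parsers don't help much here, assuming Python 3 and a made-up sample message (`EMAIL_HTML`, `parses_as_xml`, and `TagCounter` are illustrative names, not part of any existing tool). `xml.etree.ElementTree` rejects typical non-well-formed email HTML outright. `html.parser` tolerates it, but it is only a tokenizer: it reports tags as they stream past and builds no tree, so there is no head or body element to insert a title or table into.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample of what mail clients often emit: unclosed <p> and <br>,
# and no <html>, <head>, or <body> tags at all.
EMAIL_HTML = "<p>Hi all,<br>see the notes below<p><b>Thanks!</b>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TagCounter(HTMLParser):
    """Record the start tags reported by the lenient stdlib HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # Called once per start tag; no document tree is ever built.
        self.start_tags.append(tag)


counter = TagCounter()
counter.feed(EMAIL_HTML)
counter.close()

print(parses_as_xml(EMAIL_HTML))  # False: not well-formed XML
print(counter.start_tags)         # ['p', 'br', 'p', 'b']
```

Getting from that token stream to "add a title to the head, put a table at the top of the body" means rebuilding the document structure by hand, which is the work a tree-building, error-tolerant parser like BeautifulSoup does for you.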