Return-Path: <gmn-notmuch@m.gmane.org>
X-Original-To: notmuch@notmuchmail.org
Delivered-To: notmuch@notmuchmail.org
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 38DAD431FBC
    for <notmuch@notmuchmail.org>; Fri, 4 Dec 2009 02:40:15 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id BoCCw-cbJuA0 for <notmuch@notmuchmail.org>;
    Fri, 4 Dec 2009 02:40:14 -0800 (PST)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
    by olra.theworths.org (Postfix) with ESMTP id 74557431FAE
    for <notmuch@notmuchmail.org>; Fri, 4 Dec 2009 02:40:14 -0800 (PST)
Received: from list by lo.gmane.org with local (Exim 4.50) id 1NGVZz-0004DY-HQ
    for notmuch@notmuchmail.org; Fri, 04 Dec 2009 11:40:04 +0100
Received: from ip-118-90-131-115.xdsl.xnet.co.nz ([118.90.131.115])
    by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
    for <notmuch@notmuchmail.org>; Fri, 04 Dec 2009 11:40:03 +0100
Received: from olly by ip-118-90-131-115.xdsl.xnet.co.nz with local (Gmexim
    0.1 (Debian)) id 1AlnuQ-0007hv-00
    for <notmuch@notmuchmail.org>; Fri, 04 Dec 2009 11:40:03 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: notmuch@notmuchmail.org
From: Olly Betts <olly@survex.com>
Date: Fri, 4 Dec 2009 10:36:45 +0000 (UTC)
Message-ID: <loom.20091204T113129-355@post.gmane.org>
References: <1259840063-sup-1478@sam.mediasupervision.de>
    <871vjbh98x.fsf@yoom.home.cworth.org>
    <b8197bcb0912032314n1047a4d6q2b3214c6b16d07cc@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: sea.gmane.org
User-Agent: Loom/3.14 (http://gmane.org/)
X-Loom-IP: 118.90.131.115 (Mozilla/5.0 (X11; U; Linux x86_64; en-GB;
    rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5.5)
Sender: news <news@ger.gmane.org>
Subject: Re: [notmuch] Notmuch's search view sucks
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.12
List-Id: "Use and development of the notmuch mail system."
    <notmuch.notmuchmail.org>
List-Unsubscribe: <http://notmuchmail.org/mailman/options/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=unsubscribe>
List-Archive: <http://notmuchmail.org/pipermail/notmuch>
List-Post: <mailto:notmuch@notmuchmail.org>
List-Help: <mailto:notmuch-request@notmuchmail.org?subject=help>
List-Subscribe: <http://notmuchmail.org/mailman/listinfo/notmuch>,
    <mailto:notmuch-request@notmuchmail.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Dec 2009 10:40:15 -0000

> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
>
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?

http://www.let.rug.nl/vannoord/TextCat/

It looks at n-gram frequencies, and can guess pretty reliably from
even a fairly small amount of text.
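
To give a rough idea of what "looking at n-gram frequencies" can mean, here
is a minimal sketch in Python. It is not TextCat's code: the function names,
the toy training samples, and the rank-based "out-of-place" scoring are all
assumptions made for illustration. It builds a ranked list of the most
common character n-grams for each language and picks the language whose
ranking is closest to that of the text being classified.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Return the most frequent character n-grams (n = 1..max_n), most common first."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(doc_profile))


def guess_language(text, profiles):
    """Return the name of the language profile closest to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Hypothetical toy samples; usable profiles would be built from far more text.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "chases the fox through the fields while the sun is shining",
    "german": "der schnelle braune fuchs springt mit dem faulen hund und dann "
              "jagt der hund den fuchs durch die felder und die sonne scheint",
}
PROFILES = {name: ngram_profile(text) for name, text in SAMPLES.items()}

print(guess_language("the weather is nice and the dogs are in the field", PROFILES))
print(guess_language("das wetter ist gut und die hunde sind auf dem feld", PROFILES))
```

With profiles this small the guesses are not guaranteed to be right; the
point is only that the whole technique fits in a few dozen lines.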

TextCat is in Perl. I don't know if there's a C or C++ implementation
but it isn't a huge piece of code - finding a good technique was the