From: Justus Winter <4winter@informatik.uni-hamburg.de>
To: Patrick Totzke, Suvayu Ali, notmuch@notmuchmail.org
Subject: Re: nbook: a notmuch based address book written in python
Date: Mon, 15 Oct 2012 12:58:30 +0200
Message-ID: <20121015105830.12412.43278@thinkbox.jade-hamburg.de>
In-Reply-To: <20121013165851.29671.29869@brick.lan>

Hi Suvayu :)

welcome to notmuch and python.

Quoting Patrick Totzke (2012-10-13 18:58:51)
> > > And If I look for my own name, this takes over a minute,
> > > eventually dying. This could be an issue with libnotmuch though.
> > > Possibly, your algorithm takes very long and then reads from an initially
> > > opened Database object again, which was invalidated by concurrent writes of other processes..

Hm no, see below.
> > > -------------------------------
> > > [~] time nbook Patrick
> > >
> > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > > Traceback (most recent call last):
> > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > notmuch.errors.NullPointerError
> > > Error in sys.excepthook:
> > > Traceback (most recent call last):
> > >   File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
> > > ImportError: No module named fileutils
> > >
> > > Original exception was:
> > > Traceback (most recent call last):
> > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > notmuch.errors.NullPointerError
> > > nbook Patrick  3.20s user 5.47s system 12% cpu 1:11.65 total
> > > ------------------------------------
> > >
> >
> > Yes someone else pointed this out too. Again I'm not sure how to
> > proceed here. I had a quick look at this last week and it seemed to me
> > the limitation comes from within the python bindings for notmuch. Do
> > you have any ideas?
>
> As mentioned before, I think you invalidate the Database object concurrently
> while your long-running algorithm goes through all messages.
> Xapian doesn't handle concurrent access to the index like a normal™ database would.
> This means you are notified by this error that some changes were detected.
> Maybe the error message should be more telling here though. Teythoon?

The reason for this error is exactly what the error message says: you are opening too many files.
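The limit in question is the per-process cap on open file descriptors. Here is a minimal sketch for inspecting it from Python on Unix with the standard resource module; open_file_limits and count_open_fds are hypothetical helper names, and the /proc/self/fd count only works on Linux:

```python
import os
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process.

    The soft limit is the one `ulimit -n` typically reports; a value equal
    to resource.RLIM_INFINITY means "unlimited". Unix only.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def count_open_fds():
    """Linux only: how many descriptors this process has open, else None."""
    fd_dir = '/proc/self/fd'
    if not os.path.isdir(fd_dir):
        return None
    # Listing the directory briefly opens one descriptor, which is counted too.
    return len(os.listdir(fd_dir))


if __name__ == '__main__':
    soft, hard = open_file_limits()
    print('open file limit: soft=%s hard=%s' % (soft, hard))
    print('currently open: %s' % count_open_fds())
```

Calling count_open_fds() inside a long-running loop is one way to watch the number of open descriptors creep toward the soft limit.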
Check out this limit using ulimit -n:

% ulimit -n
4096

This problem is subtle. Here is a minimal test case:

~~~ snip ~~~
import notmuch

with notmuch.Database() as db:
    query = notmuch.Query(db, 'a').search_messages()
    for msg in query:
        msg.get_header('from')

with notmuch.Database() as db:
    query = notmuch.Query(db, 'a').search_messages()
    for msg in list(query):
        msg.get_header('from')
~~~ snap ~~~

% python test.py
Error opening /home/teythoon/Maildir/.lists.notmuch/cur/1323251462.M53044P18514.thinkbox,S=7306,W=7466:2,: Too many open files
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    msg.get_header('from')
  File "/home/teythoon/.local/lib/python2.7/site-packages/notmuch/message.py", line 237, in get_header
    raise NullPointerError()
notmuch.errors.NullPointerError

Observe that it blows up on line 11, while the first version works. The only difference is that the second version builds a list from the query results. That list holds a reference to every message object, so the garbage collector cannot collect them and thereby close their file handles.

So here's your fix:

~~~ snip ~~~
diff --git a/nbook b/nbook
index 387c71d..b3d4fd6 100755
--- a/nbook
+++ b/nbook
@@ -173,7 +173,7 @@ class AddressHeaders(object):
 # Search
 db = Database()
 query = Query(db, 'from:"{0}" or to:"{0}"'.format(querystr))
-msgs = list(query.search_messages())
+msgs = query.search_messages()
 
 addresses = AddressHeaders(msgs, querystr)
 print addresses
~~~ snap ~~~

A few more comments:

> from notmuch import *

Please avoid * imports; they prevent tools like pyflakes from checking whether you accidentally misspelled any identifiers.

> pyversion = float('%d.%d' % (sys.version_info.major, sys.version_info.minor))
> if pyversion < 2.7:

Converting this to float feels wrong: floats do not order versions correctly, e.g. 2.10 would become 2.1 and compare as older than 2.7.
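To illustrate, here is a small sketch contrasting the float conversion with comparing version tuples; version_as_float and is_new_enough are hypothetical helpers, not part of nbook:

```python
import sys


def version_as_float(major, minor):
    """The float approach from the quoted code: '2.10' becomes 2.1."""
    return float('%d.%d' % (major, minor))


def is_new_enough(version_info, minimum=(2, 7)):
    """Tuple approach: compares (major, minor) element by element.

    Only uses indexing, so it works with the plain tuple that
    sys.version_info is before Python 2.7 as well as the named tuple after.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == '__main__':
    print(version_as_float(2, 10) < 2.7)    # True: 2.10 is treated as 2.1
    print(is_new_enough((2, 10)))           # True
    print(is_new_enough(sys.version_info))  # True on any 2.7+ or 3.x interpreter
```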
Consider comparing version tuples instead, something like:

    if sys.version_info < (2, 7):

This also works on interpreters older than 2.7, where sys.version_info is a plain tuple without the named major and minor attributes.

> print '`nbook\' needs Python 2.7 or higher for argparse'

Note that in py3k print is a function and not a statement, so you need to use parentheses. Consider putting this at the beginning of all your python files to make py2.7 use the new features:

    from __future__ import print_function, absolute_import, unicode_literals

> exit(-1)

exit is not a real builtin function: it is added by the site module as a convenience for the interactive interpreter and is not meant to be used in programs. Use sys.exit instead. Tools like pyflakes can spot this kind of mistake. Also, sys.exit accepts a string as argument, which it prints to stderr before exiting with an error code.

> self.__fromhdr__ += ',' + msg.get_header('from')

Hm, this is somewhat unpythonic. It used to be the case that building strings this way was a lot slower than building a list and then joining it on a delimiter of your choice (e.g. ','.join(from_headers)). This is (was?) because strings are immutable in python, so constantly creating strings just to throw them away in the next iteration puts a lot of pressure on the memory management system. Somewhat recent discussion here:

http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python

> def print_addrs(self, fmtstr='', query=''):
> if '' == fmtstr: fmtstr = '%s %s\n'

Ok, several things here:

* The comparison looks weird: you are using the string constant as the first operand. While this is technically not wrong, it is somewhat unpythonic, because if you read it out loud ("if the empty string is equal to fmtstr") it bends the 1:1 mapping between the semantics of your program and the English sentence. It looks like the C hack of putting the constant first so that a mistyped = instead of == fails to compile, which is unnecessary in python because you cannot use the assignment operator as a value (except for a=b=c=0 style assignments).
* Please don't put multiple statements on one line.
* This can be written shorter and more idiomatically (yay default argument values):

    def print_addrs(self, fmtstr='%s %s\n', query=''):
        [...]

Happy hacking :)
Justus
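The file-handle behaviour behind the list(query) problem can be reproduced without notmuch. This is a minimal sketch, assuming (as the error above suggests) that each message object keeps a file open until it is collected; FakeMessage, fake_search and peak_handles are hypothetical names, not the notmuch bindings' API, and the exact numbers rely on CPython's reference counting:

```python
import os
import tempfile


class FakeMessage(object):
    """Hypothetical stand-in for a notmuch message object: it keeps one file
    open for as long as the object is alive (not the real bindings' code)."""

    currently_open = 0
    peak_open = 0

    def __init__(self, path):
        self._fh = open(path)
        FakeMessage.currently_open += 1
        FakeMessage.peak_open = max(FakeMessage.peak_open,
                                    FakeMessage.currently_open)

    def get_header(self, name):
        # Pretend the first line of the file is the requested header.
        self._fh.seek(0)
        return self._fh.readline().strip()

    def __del__(self):
        # Runs when the last reference goes away (immediately in CPython).
        self._fh.close()
        FakeMessage.currently_open -= 1


def fake_search(path, count):
    """Lazily yield one message at a time, like iterating a query."""
    for _ in range(count):
        yield FakeMessage(path)


def peak_handles(path, count, materialize):
    """Read a header from each message; return the peak number of open files."""
    FakeMessage.peak_open = FakeMessage.currently_open
    msgs = fake_search(path, count)
    if materialize:
        msgs = list(msgs)  # the list keeps every message, and its file, alive
    for msg in msgs:
        msg.get_header('from')
    return FakeMessage.peak_open


def make_sample_file():
    """Write a one-line fake message to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, 'w') as f:
        f.write('From: someone@example.org\n')
    return path


if __name__ == '__main__':
    sample = make_sample_file()
    print('lazy iteration, peak open files: %d' % peak_handles(sample, 200, False))
    print('list(...) first, peak open files: %d' % peak_handles(sample, 200, True))
    os.remove(sample)
```

With lazy iteration at most two messages are alive at once (the current one plus the next one being created); wrapping the iterator in list() keeps all of them, and their files, open at the same time.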