Date: Sat, 15 Dec 2012 01:18:06 -0500
From: Austin Clements
To: "Jason A. Donenfeld"
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch python bindings corrupt db index (was: gmail importer script)
Message-ID: <20121215061806.GF6187@mit.edu>
References: <20121211182638.27237.98903@brick.lan> <20121212204922.GB6187@mit.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-Id: "Use and development of the notmuch mail system."

Quoth Jason A. Donenfeld on Dec 13 at 3:32 pm:
> On Wed, Dec 12, 2012 at 9:49 PM, Austin Clements wrote:
> > There should be no way to corrupt the database at this level through
> > the Xapian API, which means nothing libnotmuch can do (much less users
> > of libnotmuch) should be able to corrupt the database.
> > If you can reproduce the problem, it's probably a serious bug in
> > Xapian, but it could also have been a file system bug or even random
> > file system corruption.
>
> Well that's... troubling.
>
> Patrick: could you please backup and try to reproduce? Otherwise I'll
> assume this was a one-off situation.
>
>
> Austin-- think you could do a quick review of the script to double
> check and confirm I'm not doing anything nefarious?
> http://git.zx2c4.com/gmail-notmuch/tree/gmail-notmuch.py

In theory the only way you could cause corruption besides tickling a
bug would be to access the same database object concurrently from
different threads (since it's not thread-safe), but you don't appear
to be doing that.

I did spot something that could corrupt delivered email, though. The
way you deliver to the Maildir is resilient to process termination,
but not to system failures such as power outages. In particular, you
need to at least os.fsync before the os.link. I'd recommend looking
at Python's mailbox module, which has a robust Maildir delivery
implementation (though it appears it doesn't let you control the file
name, so you probably can't use it directly).
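
For illustration, the delivery order described above could look roughly like the sketch below. It assumes a POSIX system and a Maildir whose tmp/ and new/ subdirectories already exist. The function name `deliver` and its parameters are hypothetical and are not taken from gmail-notmuch.py. The caller chooses the file name, which is the part the mailbox module does not appear to allow. The final directory fsync goes beyond the minimum mentioned above and is included only as an extra precaution.

```python
import os


def deliver(maildir, name, data):
    """Write `data` (bytes) to maildir/new/name and return the final path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # O_EXCL makes the create fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        # Force the message contents to disk BEFORE the file becomes
        # visible in new/. Without this, a power failure could leave a
        # linked but empty or truncated message.
        os.fsync(f.fileno())

    # Make the message visible under its final name, then drop the tmp name.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)

    # Extra precaution (an assumption, not required by the advice above):
    # fsync the directory so the new entry itself survives a crash.
    dir_fd = os.open(os.path.join(maildir, "new"), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    return new_path
```

Called as `deliver("/path/to/Maildir", "some-unique-name", raw_message_bytes)`, this keeps the write, fsync, link ordering: the message only appears in new/ after its contents have been flushed.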