Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 70C7E431FBC for ; Wed, 9 Dec 2009 23:01:41 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id K3UpuR28SnEB for ; Wed, 9 Dec 2009 23:01:40 -0800 (PST) Received: from mout.perfora.net (mout.perfora.net [74.208.4.195]) by olra.theworths.org (Postfix) with ESMTP id 1FA5D431FAE for ; Wed, 9 Dec 2009 23:01:40 -0800 (PST) Received-SPF: pass (mxus2: domain of kanru.info designates 67.220.217.187 as permitted sender) client-ip=67.220.217.187; envelope-from=kanru@kanru.info; helo=cp20.secserverpros.com; Received: from cp20.secserverpros.com (cp20.secserverpros.com [67.220.217.187]) by mx.perfora.net (node=mxus2) with ESMTP (Nemesis) id 0MAwiS-1NAxXD0UVg-00A20H for notmuch@notmuchmail.org; Thu, 10 Dec 2009 02:01:39 -0500 Received: from 61-228-148-113.dynamic.hinet.net ([61.228.148.113] helo=kanru.info) by cp20.secserverpros.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1NId1s-0001y9-3n for notmuch@notmuchmail.org; Thu, 10 Dec 2009 07:01:37 +0000 Received: from kanru (uid 1000) (envelope-from kanru@kanru.info) id 2269 by kanru.info (DragonFly Mail Agent) Thu, 10 Dec 2009 15:00:45 +0800 From: Kan-Ru Chen To: notmuch Date: Thu, 10 Dec 2009 15:00:42 +0800 Message-ID: <87r5r3joth.fsf@anar.kanru.info> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" X-ACL-Warn: { X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - cp20.secserverpros.com X-AntiAbuse: Original Domain - notmuchmail.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - kanru.info X-Source: X-Source-Args: X-Source-Dir: Subject: [notmuch] Patch for xapian defect #250 X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Dec 2009 07:01:41 -0000 --=-=-= Content-Transfer-Encoding: quoted-printable The termlist is already sorted, so this is the patch trying to minimize the modification of database as suggested in the comment and Carl's TODO file. My poor profiling shows not much, but some improvement. *Before* =20 % time notmuch tag +test id:hfntnu+gotv@eGroups.com=20=20=20 MOD_PLISTS: 368 notmuch tag +test id:hfntnu+gotv@eGroups.com 0.05s user 0.03s system 11% c= pu 0.673 total % time notmuch tag -test id:hfntnu+gotv@eGroups.com=20=20=20 MOD_PLISTS: 368 notmuch tag -test id:hfntnu+gotv@eGroups.com 0.06s user 0.01s system 10% c= pu 0.681 total =20 *After* =20 % time notmuch tag +test id:hfntnu+gotv@eGroups.com=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20 MOD_PLIST: 1 notmuch tag +test id:hfntnu+gotv@eGroups.com 0.01s user 0.02s system 6% cp= u 0.436 total % time notmuch tag -test id:hfntnu+gotv@eGroups.com=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20 MOD_PLIST: 1 notmuch tag -test id:hfntnu+gotv@eGroups.com 0.01s user 0.01s system 5% cp= u 0.383 total % time notmuch tag +test tag:notmuch notmuch tag +test tag:notmuch 1.71s user 0.03s system 65% cpu 2.632 total % time notmuch tag -test tag:notmuch notmuch tag -test tag:notmuch 1.61s user 0.02s system 73% cpu 2.204 total % notmuch count tag:notmuch 682 =2D-- flint_database.cc 2009-12-08 13:34:24.790284881 +0800 +++ flint_database.cc 2009-12-10 14:22:14.493653956 +0800 @@ -1188,7 +1188,7 @@ =20 termlist.next(); while (!termlist.at_end()) { =2D string tname =3D termlist.get_termname(); + string tname =3D termlist.get_termname(); position_table.delete_positionlist(did, tname); termcount wdf =3D termlist.get_wdf(); =20 @@ -1278,20 +1278,50 @@ } =20=20=20 if (!modifying || document.internal->terms_modified()) { =2D // FIXME - in the case where there is overlap between the new =2D // termlist and the old termlist, it would be better to compare the =2D // two lists, and make the minimum set of modifications required. =2D // This would lead to smaller changesets for replication, and =2D // probably be faster overall. =2D =2D // First, add entries to remove the postings in the underlying reco= rd. Xapian::Internal::RefCntPtr ptrtothis(th= is); FlintTermList termlist(ptrtothis, did); + Xapian::TermIterator term =3D document.termlist_begin(); + Xapian::TermIterator term_end =3D document.termlist_end(); + flint_doclen_t new_doclen =3D termlist.get_doclength(); + string old_tname, new_tname; +=20=20=20=20=20=20=20=20=20=20=20=20 + total_length -=3D new_doclen; +=20=20=20=20=20=20=20=20=20=20=20=20 + termlist.next(); + while (true) { + bool identical =3D false; + int cmp; + if (termlist.at_end() && term =3D=3D term_end) + break; + if (!termlist.at_end() && term !=3D term_end) { + old_tname =3D termlist.get_termname(); + new_tname =3D *term; + cmp =3D old_tname.compare(new_tname); + + // Check postlist to see whether they are identical + if (cmp =3D=3D 0) { + int new_count =3D term.positionlist_count(); + int old_count =3D termlist.positionlist_count(); + if (old_count =3D=3D new_count) { + PositionIterator it =3D term.positionlist_begin(); + PositionIterator it_end =3D term.positionlist_end(); + PositionIterator old =3D termlist.positionlist_begin(); + if (equal(it, it_end, old)) + identical =3D true; + } + } + } else if (termlist.at_end()) { + cmp =3D 2; + new_tname =3D *term; + } else { + cmp =3D -2; + old_tname =3D termlist.get_termname(); + } =20 =2D termlist.next(); =2D while (!termlist.at_end()) { =2D string tname =3D termlist.get_termname(); + if (cmp < 0) { + const string& tname =3D old_tname; termcount wdf =3D termlist.get_wdf(); + new_doclen -=3D wdf; =20 map >::iterator i; i =3D freq_deltas.find(tname); @@ -1318,58 +1348,62 @@ // Modifying a document we added/modified since the last flush. k->second =3D make_pair('D', 0u); } =2D =2D termlist.next(); =2D } =2D =2D total_length -=3D termlist.get_doclength(); =2D =2D flint_doclen_t new_doclen =3D 0; =2D Xapian::TermIterator term =3D document.termlist_begin(); =2D Xapian::TermIterator term_end =3D document.termlist_end(); =2D for ( ; term !=3D term_end; ++term) { =2D // Calculate the new document length + } else if (!identical) { + const string& tname =3D new_tname; termcount wdf =3D term.get_wdf(); =2D new_doclen +=3D wdf; =2D =2D string tname =3D *term; =2D if (tname.size() > MAX_SAFE_TERM_LENGTH) =2D throw Xapian::InvalidArgumentError("Term too long (> "STRINGIZE(MA= X_SAFE_TERM_LENGTH)"): " + tname); =2D map >::iterator i; =2D i =3D freq_deltas.find(tname); =2D if (i =3D=3D freq_deltas.end()) { =2D freq_deltas.insert(make_pair(tname, make_pair(1, termcount_diff(wd= f)))); =2D } else { =2D ++i->second.first; =2D i->second.second +=3D wdf; =2D } + new_doclen +=3D wdf; +=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 + if (cmp > 0) { + if (tname.size() > MAX_SAFE_TERM_LENGTH) + throw Xapian::InvalidArgumentError("Term too long (> "= STRINGIZE(MAX_SAFE_TERM_LENGTH)"): " + tname); + map >::iter= ator i; + i =3D freq_deltas.find(tname); + if (i =3D=3D freq_deltas.end()) { + freq_deltas.insert(make_pair(tname, make_pair(1, termc= ount_diff(wdf)))); + } else { + ++i->second.first; + i->second.second +=3D wdf; + } + + // Add did to tname's postlist + map > >::iterat= or j; + j =3D mod_plists.find(tname); + if (j =3D=3D mod_plists.end()) { + map > m; + j =3D mod_plists.insert(make_pair(tname, m)).first; + } + map >::iterator k; + k =3D j->second.find(did); + if (k !=3D j->second.end()) { + Assert(k->second.first =3D=3D 'D'); + k->second.first =3D 'M'; + k->second.second =3D wdf; + } else { + j->second.insert(make_pair(did, make_pair('A', wdf))); + } + } + + PositionIterator it =3D term.positionlist_begin(); + PositionIterator it_end =3D term.positionlist_end(); + if (it !=3D it_end) { + position_table.set_positionlist( + did, tname, it, it_end); + } else { + position_table.delete_positionlist(did, tname); + } + } + if (termlist.at_end()) + ++term; + else if (term =3D=3D term_end) + termlist.next(); + else { + if (cmp >=3D 0) + ++term; + if (cmp <=3D 0) + termlist.next(); + } + } =20 =2D // Add did to tname's postlist =2D map > >::iterator j; =2D j =3D mod_plists.find(tname); =2D if (j =3D=3D mod_plists.end()) { =2D map > m; =2D j =3D mod_plists.insert(make_pair(tname, m)).first; =2D } =2D map >::iterator k; =2D k =3D j->second.find(did); =2D if (k !=3D j->second.end()) { =2D Assert(k->second.first =3D=3D 'D'); =2D k->second.first =3D 'M'; =2D k->second.second =3D wdf; =2D } else { =2D j->second.insert(make_pair(did, make_pair('A', wdf))); =2D } =2D =2D PositionIterator it =3D term.positionlist_begin(); =2D PositionIterator it_end =3D term.positionlist_end(); =2D if (it !=3D it_end) { =2D position_table.set_positionlist( =2D did, tname, it, it_end); =2D } else { =2D position_table.delete_positionlist(did, tname); =2D } =2D } LOGLINE(DB, "Calculated doclen for replacement document " << did << "= as " << new_doclen); =20 // Set the termlist =2D-=20 Kan-Ru Chen | http://kanru.info Q: Why are my replies five sentences or less? A: http://five.sentenc.es/ --=-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.13 (GNU/Linux) iQIcBAEBCAAGBQJLIJydAAoJEBsTLgHOxq1G07kP/0CkSSK7vOdEM/vComrNqMGJ wh8XcfZrKqsl+irunqSUKG4g1EDHRfWMheeJYggSzyZvZB4uVCEJkUnHMtIqVb9b olzCOgqqYQL0ImbX06RktX6lF8U3vCkt+xthX2poyn6/wxnwQoBh1uj52mbIaLXo BIyFBK7rNyazD/5+i/OXIB4wUjUXT6xTWt7J9DVK4CPqhS7mxHvdzV8ZAcw8lAOp w46Nr/OqjoBQbHcIz0rw3ZTxl7VCMxXT/NDam6fR2bNgJR+/klbEDAhwwzWw2yMp meWZAzrRGm3tCfD5tirjOAf+CxD/3wY85ThfC1RZt62u0WEv/HzFVFONmGp09lb2 sgaELQOO5yTOpaL6A/bfTmv1bn49elCgcuc1EbgToU5rcjXg1D+5foWQQP5hj/hU xY4mEEa2AfCRCNDEKkSmdi+7v7QBjJudP52twyi2mdZW0M3xdCrX3twKtS7taDC3 FUZ30tceNxf9w26jlQRRXV/jtfi826JR9Yx4E4X25jAABmBDAGq5XHnPU9y9HRzF ykwTWM9DH+yjsTffyFLb+Ce204ihNKBA6ldyDkIYPYDc9YtLow+THCdbguQm2fCU JAkykYkYBpI1sZvCavIq31CCM4uxD29F6SKjm3yXG2Zlfyu2M0aq6swZMGF0WvVM 6n1Y9+6DgjsdI32dJ8h8 =jjvD -----END PGP SIGNATURE----- --=-=-=--