From: Austin Clements
Date: Thu, 24 Oct 2013 21:08:37 +0000 (+2000)
Subject: Re: [PATCH] new: Don't scan unchanged directories with no sub-directories
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=eed5f3edfd1f82e2a1dc3df44ec59ddf0c2283f7;p=notmuch-archives.git

Re: [PATCH] new: Don't scan unchanged directories with no sub-directories
---

diff --git a/29/64936897ab583a429b05e9c22fd69f93a9ee01 b/29/64936897ab583a429b05e9c22fd69f93a9ee01
new file mode 100644
index 000000000..c51cbf017
--- /dev/null
+++ b/29/64936897ab583a429b05e9c22fd69f93a9ee01
@@ -0,0 +1,127 @@
+Return-Path:
+X-Original-To: notmuch@notmuchmail.org
+Delivered-To: notmuch@notmuchmail.org
+Received: from localhost (localhost [127.0.0.1])
+ by olra.theworths.org (Postfix) with ESMTP id 9F89D431FBC
+ for ; Thu, 24 Oct 2013 14:08:49 -0700 (PDT)
+X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
+X-Spam-Flag: NO
+X-Spam-Score: -0.7
+X-Spam-Level:
+X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5
+ tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled
+Received: from olra.theworths.org ([127.0.0.1])
+ by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
+ with ESMTP id HacakvWkBfXv for ;
+ Thu, 24 Oct 2013 14:08:43 -0700 (PDT)
+Received: from dmz-mailsec-scanner-8.mit.edu (dmz-mailsec-scanner-8.mit.edu
+ [18.7.68.37])
+ by olra.theworths.org (Postfix) with ESMTP id 51C7B431FB6
+ for ; Thu, 24 Oct 2013 14:08:43 -0700 (PDT)
+X-AuditID: 12074425-b7f1c8e0000009c7-2f-52698c5a4118
+Received: from mailhub-auth-2.mit.edu ( [18.7.62.36])
+ by dmz-mailsec-scanner-8.mit.edu (Symantec Messaging Gateway) with SMTP
+ id 86.BB.02503.A5C89625; Thu, 24 Oct 2013 17:08:42 -0400 (EDT)
+Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
+ by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id r9OL8f3q027338;
+ Thu, 24 Oct 2013 17:08:41 -0400
+Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91])
+ (authenticated bits=0)
+ (User authenticated as amdragon@ATHENA.MIT.EDU)
+ by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r9OL8dFW030896
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
+ Thu, 24 Oct 2013 17:08:41 -0400
+Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80)
+ (envelope-from )
+ id 1VZS90-0005w3-Vb; Thu, 24 Oct 2013 17:08:39 -0400
+Date: Thu, 24 Oct 2013 17:08:37 -0400
+From: Austin Clements
+To: notmuch@notmuchmail.org
+Subject: Re: [PATCH] new: Don't scan unchanged directories with no
+ sub-directories
+Message-ID: <20131024210837.GH20337@mit.edu>
+References: <1382646822-24556-1-git-send-email-amdragon@mit.edu>
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+In-Reply-To: <1382646822-24556-1-git-send-email-amdragon@mit.edu>
+User-Agent: Mutt/1.5.21 (2010-09-15)
+X-Brightmail-Tracker:
+ H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IRYrdT0Y3qyQwyaN1jZHH95kxmi47bu9kc
+ mDyerbrF7PHx6S2WAKYoLpuU1JzMstQifbsEroxHXSuYCx4KV0zb9JqtgbGXv4uRk0NCwETi
+ 8vzr7BC2mMSFe+vZuhi5OIQE9jFKLF3wAMrZyCixa+MvKOc0k8Sm43+YQVqEBJYwSrzqkQax
+ WQRUJZonfwIbxSagIbFt/3JGEFtEQFpi593ZrF2MHBzMArISr38pgISFBUIk9m86CVbCK6Aj
+ cWDaaiaIkQ4S/X+vsEPEBSVOznzCAmIzC2hJ3Pj3kglijLTE8n8cIGFOAUeJJ0fXg5WICqhI
+ TDm5jW0Co9AsJN2zkHTPQuhewMi8ilE2JbdKNzcxM6c4NVm3ODkxLy+1SNdCLzezRC81pXQT
+ IzioXVR3ME44pHSIUYCDUYmHt+FTepAQa2JZcWXuIUZJDiYlUd6E9swgIb6k/JTKjMTijPii
+ 0pzU4kOMEhzMSiK80/SAcrwpiZVVqUX5MClpDhYlcd5bHPZBQgLpiSWp2ampBalFMFkZDg4l
+ CV6hbqBGwaLU9NSKtMycEoQ0EwcnyHAeoOGPukCGFxck5hZnpkPkTzEqSonz6oI0C4AkMkrz
+ 4HphSecVozjQK8K8hSBVPMCEBdf9CmgwE9DgKUvSQAaXJCKkpBoYrRsX7Mgvnvc2V8fXcf+8
+ U5LiSiYLPn4UrQ9OurpZ6LLLqojij7IXEzf4ef46mLLyoQ/zi3v/WtfIrK/REqvzlfPQjPVc
+ 6DfVSsm2tvxLIbthndGZ4xIblq168TWR5bnfKdYpbxv6DBWUrMrFPng0mW7ady7o4W+LA+aJ
+ n55/1Zjitr5pbqmwEktxRqKhFnNRcSIAHdCTPxUDAAA=
+X-BeenThere: notmuch@notmuchmail.org
+X-Mailman-Version: 2.1.13
+Precedence: list
+List-Id: "Use and development of the notmuch mail system."
+
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+X-List-Received-Date: Thu, 24 Oct 2013 21:08:49 -0000
+
+There might be a problem with this patch. Directory entries that are
+*symlinks* to other directories do not increase the containing
+directory's link count, but we do count them as directories in
+add_files pass 1 and traverse in to them. Hence, if you had a
+directory that contained no sub-directories, but did contain symlinks
+to other directories, we would fail to notice changes in the symlinked
+directories.
+
+We could check if the database thinks there are sub-directories and
+only bail early if the directory is unchanged and *both* the file
+system and the database think there are no sub-directories.
+
+Quoth myself on Oct 24 at 4:33 pm:
+> This can substantially reduce the cost of notmuch new in some
+> situations, such as when the file system cache is cold or when the
+> Maildir is on NFS.
+> ---
+> notmuch-new.c | 20 ++++++++++++++++++++
+> 1 file changed, 20 insertions(+)
+>
+> diff --git a/notmuch-new.c b/notmuch-new.c
+> index faa33f1..364c73a 100644
+> --- a/notmuch-new.c
+> +++ b/notmuch-new.c
+> @@ -323,6 +323,26 @@ add_files (notmuch_database_t *notmuch,
+>      }
+>      db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0;
+>
+> +    /* If the directory is unchanged from our last scan and has no
+> +     * sub-directories, then return without scanning it at all. In
+> +     * some situations, skipping the scan can substantially reduce the
+> +     * cost of notmuch new, especially since the huge numbers of files
+> +     * in Maildirs make scans expensive, but all files live in leaf
+> +     * directories.
+> +     *
+> +     * To check for sub-directories, we borrow a trick from find,
+> +     * kpathsea, and many other UNIX tools: since a directory's link
+> +     * count is the number of sub-directories (specifically, their
+> +     * '..' entries) plus 2 (the link from the parent and the link for
+> +     * '.'). This check is safe even on weird file systems, since
+> +     * file systems that can't compute this will return 0 or 1. This
+> +     * is safe even on *really* weird file systems like HFS+ that
+> +     * mistakenly return the total number of directory entries, since
+> +     * that only inflates the count beyond 2.
+> +     */
+> +    if (directory && fs_mtime == db_mtime && st.st_nlink == 2)
+> +        goto DONE;
+> +
+>      /* If the database knows about this directory, then we sort based
+>       * on strcmp to match the database sorting. Otherwise, we can do
+>       * inode-based sorting for faster filesystem operation. */
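
The stricter condition suggested in the reply (bail out early only when the directory is unchanged and *both* the file system and the database report no sub-directories) could be sketched roughly as below. This is only an illustration, not code from notmuch: can_skip_scan and its db_has_subdirs parameter are hypothetical names, and how the database answer is obtained (perhaps by checking whether notmuch_directory_get_child_directories yields any entries) is an assumption left out of the sketch.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper, not part of notmuch: decide whether a scan of a
 * directory could be skipped entirely.
 *
 *   db_knows_directory  the database already has a record for the directory
 *   fs_mtime, db_mtime  mtime from stat() and mtime stored in the database
 *   fs_nlink            st_nlink from stat(); 2 means no real sub-directories
 *   db_has_subdirs      assumed to be computed by the caller from the
 *                       database's list of child directories
 */
static bool
can_skip_scan (bool db_knows_directory, time_t fs_mtime, time_t db_mtime,
               nlink_t fs_nlink, bool db_has_subdirs)
{
    /* A directory the database has never seen must be scanned. */
    if (! db_knows_directory)
        return false;

    /* A changed mtime means entries were added, removed, or renamed. */
    if (fs_mtime != db_mtime)
        return false;

    /* A link count other than 2 means the file system either reports
     * sub-directories or cannot compute the count; scan to be safe. */
    if (fs_nlink != 2)
        return false;

    /* Symlinks to directories do not raise the link count, but earlier
     * scans recorded them as sub-directories, so trust the database. */
    if (db_has_subdirs)
        return false;

    return true;
}
```

With this shape, the symlink case from the reply corresponds to fs_nlink == 2 together with db_has_subdirs == true, which no longer takes the early exit.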