From: Tomi Ollila Date: Fri, 25 Oct 2013 11:46:21 +0000 (+0300) Subject: Re: [PATCH v2] new: Don't scan unchanged directories with no sub-directories X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=f557331e3164349ac4db7b48f9ff9ef2b0fb6a52;p=notmuch-archives.git Re: [PATCH v2] new: Don't scan unchanged directories with no sub-directories --- diff --git a/d0/82d2c2c80e8237b8adf04727d65cf13a69d4ad b/d0/82d2c2c80e8237b8adf04727d65cf13a69d4ad new file mode 100644 index 000000000..722d38564 --- /dev/null +++ b/d0/82d2c2c80e8237b8adf04727d65cf13a69d4ad @@ -0,0 +1,118 @@ +Return-Path: +X-Original-To: notmuch@notmuchmail.org +Delivered-To: notmuch@notmuchmail.org +Received: from localhost (localhost [127.0.0.1]) + by olra.theworths.org (Postfix) with ESMTP id 61830431FC2 + for ; Fri, 25 Oct 2013 04:46:39 -0700 (PDT) +X-Virus-Scanned: Debian amavisd-new at olra.theworths.org +X-Spam-Flag: NO +X-Spam-Score: 0 +X-Spam-Level: +X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none] + autolearn=disabled +Received: from olra.theworths.org ([127.0.0.1]) + by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id u38zQyKnXA4L for ; + Fri, 25 Oct 2013 04:46:31 -0700 (PDT) +Received: from guru.guru-group.fi (guru.guru-group.fi [46.183.73.34]) + by olra.theworths.org (Postfix) with ESMTP id CF22C431FB6 + for ; Fri, 25 Oct 2013 04:46:30 -0700 (PDT) +Received: from guru.guru-group.fi (localhost [IPv6:::1]) + by guru.guru-group.fi (Postfix) with ESMTP id 364EB100217; + Fri, 25 Oct 2013 14:46:21 +0300 (EEST) +From: Tomi Ollila +To: Austin Clements , notmuch@notmuchmail.org +Subject: Re: [PATCH v2] new: Don't scan unchanged directories with no + sub-directories +In-Reply-To: <1382650739-12438-1-git-send-email-amdragon@mit.edu> +References: <20131024210837.GH20337@mit.edu> + <1382650739-12438-1-git-send-email-amdragon@mit.edu> +User-Agent: Notmuch/0.16+115~g11c2ff5 (http://notmuchmail.org) Emacs/24.3.1 + (x86_64-unknown-linux-gnu) +X-Face: 
HhBM'cA~ +MIME-Version: 1.0 +Content-Type: text/plain +X-BeenThere: notmuch@notmuchmail.org +X-Mailman-Version: 2.1.13 +Precedence: list +List-Id: "Use and development of the notmuch mail system." + +List-Unsubscribe: , + +List-Archive: +List-Post: +List-Help: +List-Subscribe: , + +X-List-Received-Date: Fri, 25 Oct 2013 11:46:39 -0000 + +On Fri, Oct 25 2013, Austin Clements wrote: + +> This can substantially reduce the cost of notmuch new in some +> situations, such as when the file system cache is cold or when the +> Maildir is on NFS. +> --- + +LGTM. The creation and destruction of child directories happens +only if there are symlinks to directories in otherwise leaf directories. + +Tomi + +> +> This should fix the problem with directories containing symlinks to +> other directories, but no actual sub-directories. +> +> notmuch-new.c | 29 +++++++++++++++++++++++++++++ +> 1 file changed, 29 insertions(+) +> +> diff --git a/notmuch-new.c b/notmuch-new.c +> index faa33f1..ba05cb4 100644 +> --- a/notmuch-new.c +> +++ b/notmuch-new.c +> @@ -323,6 +323,35 @@ add_files (notmuch_database_t *notmuch, +> } +> db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0; +> +> + /* If the directory is unchanged from our last scan and has no +> + * sub-directories, then return without scanning it at all. In +> + * some situations, skipping the scan can substantially reduce the +> + * cost of notmuch new, especially since the huge numbers of files +> + * in Maildirs make scans expensive, but all files live in leaf +> + * directories. +> + * +> + * To check for sub-directories, we borrow a trick from find, +> + * kpathsea, and many other UNIX tools: since a directory's link +> + * count is the number of sub-directories (specifically, their +> + * '..' entries) plus 2 (the link from the parent and the link for +> + * '.'). This check is safe even on weird file systems, since +> + * file systems that can't compute this will return 0 or 1. 
This +> + * is safe even on *really* weird file systems like HFS+ that +> + * mistakenly return the total number of directory entries, since +> + * that only inflates the count beyond 2. +> + */ +> + if (directory && fs_mtime == db_mtime && st.st_nlink == 2) { +> + /* There's one catch: pass 1 below considers symlinks to +> + * directories to be directories, but these don't increase the +> + * file system link count. So, only bail early if the +> + * database agrees that there are no sub-directories. */ +> + db_subdirs = notmuch_directory_get_child_directories (directory); +> + if (!notmuch_filenames_valid (db_subdirs)) +> + goto DONE; +> + notmuch_filenames_destroy (db_subdirs); +> + db_subdirs = NULL; +> + } +> + +> /* If the database knows about this directory, then we sort based +> * on strcmp to match the database sorting. Otherwise, we can do +> * inode-based sorting for faster filesystem operation. */ +> -- +> 1.8.4.rc3 +> +> _______________________________________________ +> notmuch mailing list +> notmuch@notmuchmail.org +> http://notmuchmail.org/mailman/listinfo/notmuch
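
The early-exit condition described in the quoted patch comment can be sketched in isolation as below. This is a minimal illustration, not the notmuch code: the names `may_have_real_subdirs` and `can_skip_scan` are hypothetical, and the `db_has_subdirs` flag stands in for the result of the `notmuch_directory_get_child_directories` check in the patch.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper (not part of notmuch): returns true when the
 * link count cannot rule out real sub-directories.  Following the
 * patch comment, a count of exactly 2 (the parent's entry plus '.')
 * means no sub-directory contributes a '..' link; any other value,
 * including 0 or 1 from file systems that do not maintain the count,
 * is treated as "may have sub-directories". */
bool
may_have_real_subdirs (const struct stat *st)
{
    return st->st_nlink != 2;
}

/* Hypothetical helper (not part of notmuch): decide whether a scan of
 * the directory can be skipped.  db_has_subdirs is an assumption that
 * models the database check in the patch; it is needed because a
 * symlink to a directory is treated as a sub-directory by the scan
 * but does not raise the link count. */
bool
can_skip_scan (const struct stat *st, time_t db_mtime, bool db_has_subdirs)
{
    if (st->st_mtime != db_mtime)
        return false;           /* directory changed since the last scan */
    if (may_have_real_subdirs (st))
        return false;           /* link count does not prove it is a leaf */
    return ! db_has_subdirs;    /* symlinked sub-directories known to the db */
}
```

A caller would fill the `struct stat` with `stat()` on the directory path and pass the mtime stored from the previous scan. The conservative direction matters here: every uncertain case falls through to a full scan, so a wrong link count can cost time but should not cause a changed directory to be skipped.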