Return-Path: X-Original-To: notmuch@notmuchmail.org Delivered-To: notmuch@notmuchmail.org Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 394C3431FC3 for ; Sun, 16 Dec 2012 11:11:30 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Rju8Y7CW83jj for ; Sun, 16 Dec 2012 11:11:29 -0800 (PST) Received: from dmz-mailsec-scanner-6.mit.edu (DMZ-MAILSEC-SCANNER-6.MIT.EDU [18.7.68.35]) by olra.theworths.org (Postfix) with ESMTP id 0A360431FAE for ; Sun, 16 Dec 2012 11:11:28 -0800 (PST) X-AuditID: 12074423-b7fcb6d000000927-40-50ce1cdf359b Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-6.mit.edu (Symantec Messaging Gateway) with SMTP id 87.6D.02343.FDC1EC05; Sun, 16 Dec 2012 14:11:27 -0500 (EST) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id qBGJBQZb000674; Sun, 16 Dec 2012 14:11:26 -0500 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id qBGJBMxX024218 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sun, 16 Dec 2012 14:11:23 -0500 (EST) Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1TkJcQ-0001oQ-0X; Sun, 16 Dec 2012 14:11:22 -0500 Date: Sun, 16 Dec 2012 14:11:21 -0500 From: Austin Clements To: david@tethera.net Subject: Re: [PATCH 3/4] perf-test: initial version of memory test infrastructure. Message-ID: <20121216191121.GH6187@mit.edu> References: <1355682208-13729-1-git-send-email-david@tethera.net> <1355682208-13729-4-git-send-email-david@tethera.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1355682208-13729-4-git-send-email-david@tethera.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprJKsWRmVeSWpSXmKPExsUixG6nrntf5lyAQfcVBYuNy34yW9xo7Wa0 uH5zJrMDs8evtrnMHs9W3WL22HLoPXMAcxSXTUpqTmZZapG+XQJXxue1L1kLJgVXzDz8gqWB sdupi5GTQ0LAROL6vF9MELaYxIV769m6GLk4hAT2MUpsvrqHGcLZwCgxYd0SZpAqIYGLTBKd S1MgEksYJZ7eWswGkmARUJV4eGMpO4jNJqAhsW3/ckYQW0RAVOL0pkmsIDazgL3EotmTwGqE BUIk5ndcBKvhFdCW6J+8iRFiQaVE5/dTbBBxQYmTM5+wQPRqSdz49xLoVA4gW1pi+T8OkDCn gJNE9+UNYONFBVQkppzcxjaBUWgWku5ZSLpnIXQvYGRexSibklulm5uYmVOcmqxbnJyYl5da pGuml5tZopeaUrqJERzqLso7GP8cVDrEKMDBqMTDa/H1TIAQa2JZcWXuIUZJDiYlUd4rUucC hPiS8lMqMxKLM+KLSnNSiw8xSnAwK4nwXhUHyvGmJFZWpRblw6SkOViUxHmvpdz0FxJITyxJ zU5NLUgtgsnKcHAoSfC+kgZqFCxKTU+tSMvMKUFIM3FwggznARp+DqSGt7ggMbc4Mx0if4pR l2PW2TtPGYVY8vLzUqXEed+CFAmAFGWU5sHNgaWoV4ziQG8J8z4GqeIBpje4Sa+AljABLVlu cwZkSUkiQkqqgXHDA4G1t5Pn69/4tzzXZsKvmoXrfa5u7TxhqsLbx7gvpmvpw87urIdnOnIq npkuUAtaVyinuMPZu+bM/WfnN7RpTBYJOH2pfe/WD6z39txhKT387N3t7Rcu2WzPNxWp/fHe L89YbPN0reakrGyDr/O2N+/mTRB9diJH71DqBbF1vd7MD3TuK99UYinOSDTUYi4qTgQAiOE/ hywDAAA= Cc: notmuch@notmuchmail.org, David Bremner X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2012 19:11:30 -0000 Quoth david@tethera.net on Dec 16 at 2:23 pm: > From: David Bremner > > The idea is run some code under valgrind --leak-check=full and report > a summary, leaving the user to peruse the log file if they want. > > We go to some lengths to preserve the log files from accidental > overwriting; the full corpus takes about 3 hours to run under valgrind > on my machine. > > The naming of the log directories is probably overkill; I find it nice > to have them sequenced by time. Arguably the mktemp is then overkill, > but I know people will be nervous if it looks like timestamps are > being used for uniqueness. > > One new test is included, to check notmuch new for memory leaks. > --- > performance-test/.gitignore | 1 + > performance-test/M00-new | 14 +++++++++ > performance-test/Makefile.local | 17 +++++++++-- > performance-test/README | 57 +++++++++++++++++++++++++------------ > performance-test/perf-test-lib.sh | 55 ++++++++++++++++++++++++++++------- > 5 files changed, 113 insertions(+), 31 deletions(-) > create mode 100755 performance-test/M00-new > > diff --git a/performance-test/.gitignore b/performance-test/.gitignore > index 6421a9a..f3f9be4 100644 > --- a/performance-test/.gitignore > +++ b/performance-test/.gitignore > @@ -1,3 +1,4 @@ > tmp.*/ > +log.*/ > corpus/ > notmuch.cache.*/ > diff --git a/performance-test/M00-new b/performance-test/M00-new > new file mode 100755 > index 0000000..733e9b0 > --- /dev/null > +++ b/performance-test/M00-new > @@ -0,0 +1,14 @@ > +#!/bin/bash > + > +test_description='notmuch new' > + > +. ./perf-test-lib.sh > + > +# ensure initial 'notmuch new' is run by memory_start > +uncache_database > + > +memory_start > + > +memory_run "notmuch new" "notmuch new" This will run notmuch new twice. Is that intentional? If so, it might be worth a comment. > + > +memory_done > diff --git a/performance-test/Makefile.local b/performance-test/Makefile.local > index 57beb44..357d800 100644 > --- a/performance-test/Makefile.local > +++ b/performance-test/Makefile.local > @@ -4,14 +4,25 @@ dir := performance-test > > include $(dir)/version.sh > > +# these two are just make sure dir is expanded at the right time. > +TIME_TEST_SCRIPT := ${dir}/notmuch-time-test > +MEMORY_TEST_SCRIPT := ${dir}/notmuch-memory-test This is obviously fine, but I don't understand the comment. There are all sorts of places where we have to := assign to get ${dir} right (including TXZFILE file below). Is there something different here? > + > CORPUS_NAME := notmuch-email-corpus-$(PERFTEST_VERSION).tar.xz > TXZFILE := ${dir}/download/${CORPUS_NAME} > SIGFILE := ${TXZFILE}.asc > -TEST_SCRIPT := ${dir}/notmuch-perf-test > DEFAULT_URL := http://notmuchmail.org/releases/${CORPUS_NAME} > > +perf-test: time-test memory-test > + > time-test: setup-perf-test all > - $(TEST_SCRIPT) $(OPTIONS) > + @echo > + $(TIME_TEST_SCRIPT) $(TEST_OPTIONS) Why not use the same OPTIONS variable name as the correctness tests? > + > +memory-test: setup-perf-test all > + @echo > + $(MEMORY_TEST_SCRIPT) $(TEST_OPTIONS) > + > > .PHONY: download-corpus setup-perf-test > > @@ -29,4 +40,4 @@ $(TXZFILE): > download-corpus: > wget -O ${TXZFILE} ${DEFAULT_URL} > > -CLEAN := $(CLEAN) $(dir)/tmp.* $(dir)/corpus $(dir)/notmuch.cache.* > +CLEAN := $(CLEAN) $(dir)/tmp.* $(dir)/log.* $(dir)/corpus $(dir)/notmuch.cache.* > diff --git a/performance-test/README b/performance-test/README > index d1fb6de..7eaf5f7 100644 > --- a/performance-test/README > +++ b/performance-test/README > @@ -1,3 +1,10 @@ > +Performance Tests > +----------------- > + > +This directory contains two kinds of performance tests, time tests, s/performance tests,/performance tests:/ > +and memory tests. The former use gnu time, and the latter use > +valgrind. > + > Pre-requisites > -------------- > > @@ -5,9 +12,10 @@ In addition to having notmuch, you need: > > - gpg > - gnu tar > -- gnu time > +- gnu time (for the time tests). No period? > - xz. Some speedup can be gotten by installing "pixz", but this is > probably only worthwhile if you are debugging the tests. > +- valgrind (for the memory tests) > > Getting set up to run tests: > ---------------------------- > @@ -36,34 +44,47 @@ for a list of mirrors. > Running tests > ------------- > > -The easiest way to run performance tests is to say "make time-test", (or > -simply run the notmuch-time-test script). Either command will run all > -available performance tests. > - > -Alternately, you can run a specific subset of tests by simply invoking > -one of the executable scripts in this directory, (such as ./basic). > -Each test script supports the following arguments > +The easiest way to run performance tests is to say "make perf-test". > +This will run all time and memory tests. Be aware that the memory > +tests are quite time consuming when run on the full corpus, and that > +depending on your interests it may be more sensible to run "make > +time-test" or "make memory-test". You can also invoke one of the > +scripts notmuch-time-test or notmuch-memory-test or run a more > +specific subset of tests by simply invoking one of the executable > +scripts in this directory, (such as ./T00-new). Each test script > +supports the following arguments > > --small / --medium / --large Choose corpus size. > --debug Enable debugging. In particular don't delete > temporary directories. > > +When using the make targets, you can pass arguments to all test > +scripts by defining the make variable TEST_OPTIONS. > + > Writing tests > ------------- > > -Have a look at "T01-dump-restore" for an example. Sourcing > -"perf-test-lib.sh" is mandatory. Utility functions include > +Have a look at "T01-dump-restore" for an example time test and and s/and and/and/ > +"M00-new" for an example memory tests. In both cases sourcing > +"perf-test-lib.sh" is mandatory. > > -- 'add_email_corpus' unpacks a set of messages and adds them to the database. > -- 'cache_database': makes a snapshot of the current database > -- 'uncache_database': forces the next 'add_email_corpus' to rebuild the > - database. > -- 'time_start' unpacks the mail corpus and calls notmuch new if it > +Basics: > + > +- '(time|memory)_start' unpacks the mail corpus and calls notmuch new if it > cannot find a cache of the appropriate corpus. > -- 'time_done' does the cleanup; comment it out or pass --debug to the > +- '(time|memory)_run' runs the command under time or valgrind. Currently > + "memory_run" does not support i/o redirection in the command. > +- '(time|memory)_done' does the cleanup; comment it out or pass --debug to the > script to leave the temporary files around. > > +Utility functions include > + > +- 'add_email_corpus' unpacks a set of messages and tags > +- 'cache_database': makes a snapshot of the current database > +- 'uncache_database': forces the next '(time|memory)_start' to rebuild the > + database. > + > Scripts are run in the order specified in notmuch-perf-test. In the > future this order might be chosen automatically so please follow the > -convention of starting the name with 'T' followed by two digits to > -specify the order. > +convention of starting the name with 'T' or 'M' followed by two digits > +to specify the order. > diff --git a/performance-test/perf-test-lib.sh b/performance-test/perf-test-lib.sh > index 3a7ef2b..79eb2c5 100644 > --- a/performance-test/perf-test-lib.sh > +++ b/performance-test/perf-test-lib.sh > @@ -92,20 +92,52 @@ add_email_corpus () > > } > > -time_start () { > - > - add_email_corpus > - > - print_header > +notmuch_new_with_cache () { > Out of curiosity, why the blank lines at the beginning of functions? > if [ -d $DB_CACHE_DIR ]; then > cp -r $DB_CACHE_DIR ${MAIL_DIR}/.notmuch > else > - time_run 'Initial notmuch new' "notmuch new" > + "$1" 'Initial notmuch new' "notmuch new" > cache_database > fi > } > > +time_start () { > + > + add_email_corpus > + > + print_header > + > + notmuch_new_with_cache time_run > +} > + > +memory_start () { > + > + add_email_corpus > + > + _timestamp=$(printf "%x" $(date +"%s")) You could use 'local' here. > + log_dir=$(mktemp -d "${TEST_DIRECTORY}/log.$(basename $0)-$corpus_size-${_timestamp}-XXXXXX") This really does seem like overkill. I'd say just use timestamps for uniqueness. These tests take a long time, and if you do manage to run the same test twice in less than a second, does it really matter that you have to run it again if you want the first log? How about using ISO 8601 timestamps? They're both compact and more user-friendly than hexadecimal epoch time. local timestamp=$(date +%Y%m%dT%H%M%S) log_dir="${TEST_DIRECTORY}/log.$(basename $0)-$corpus_size-${timestamp}" > + > + notmuch_new_with_cache memory_run > +} > + > +memory_run () { > + test_count=$(($test_count+1)) > + > + log_file=$log_dir/$test_count.log > + > + printf "[ %d ]\t%s\n" $test_count "$1" > + > + valgrind --leak-check=full --log-file="$log_file" $2 > + > + awk '/LEAK SUMMARY/,/suppressed/ { sub(/^==[0-9]*==/," "); print }' "$log_file" > + echo > +} > + > +memory_done () { > + time_done > +} > + > cache_database () { > if [ -d $MAIL_DIR/.notmuch ]; then > cp -r $MAIL_DIR/.notmuch $DB_CACHE_DIR > @@ -119,12 +151,13 @@ uncache_database () { > } > > print_header () { > - printf "[v%4s %6s] Wall(s)\tUsr(s)\tSys(s)\tRes(K)\tIn/Out(512B)\n" \ > - ${PERFTEST_VERSION} ${corpus_size} > + printf "\t\t\tWall(s)\tUsr(s)\tSys(s)\tRes(K)\tIn/Out(512B)\n" > + > } > > time_run () { > printf " %-22s" "$1" > + test_count=$(($test_count+1)) > if test "$verbose" != "t"; then exec 4>test.output 3>&4; fi > if ! eval >&3 "/usr/bin/time -f '%e\t%U\t%S\t%M\t%I/%O' $2" ; then > test_failure=$(($test_failure + 1)) > @@ -144,6 +177,8 @@ time_done () { > > cd -P "$test" || error "Cannot setup test environment" > test_failure=0 > +test_count=0 > > -echo > -echo $(basename "$0"): "Testing ${test_description:-notmuch performance}" > +printf "\n%-55s [%s %s]\n" \ > + "$(basename "$0"): Testing ${test_description:-notmuch performance}" \ > + "${PERFTEST_VERSION}" "${corpus_size}"