From: Karthik Ram
Date: Fri, 13 Sep 2013 07:59:05 +0000 (+0100)
Subject: Updated testing material
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=a252c1ce2f461a0a655561716659f36d8613835c;p=swc-testing-nose.git

Updated testing material

W. Trevor King: I dropped the binary testing/Testing.ppt from the original
c7cdb5c.

Conflicts:
    testing/Testing.ppt
---
diff --git a/testing/README.md b/testing/README.md
old mode 100755
new mode 100644
index 91e15b7..80ddef3
--- a/testing/README.md
+++ b/testing/README.md

# Testing

* * * * *

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed. Well-chosen
tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral **edge cases**.

# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code **every time**, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

- Does your code work?
- **Always?**
- Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
- Does it continue to work after changes are made?
- Does it continue to work after system configurations or libraries
  are upgraded?
- Does it respond properly for a full range of input parameters?
- What about **edge or corner cases**?
- What's the limit on that input parameter?
- How will it affect your
  [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?

## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, and so on?

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g. Monte Carlo methods.

# Where are tests?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total / length
```

Tests could be implemented as runtime **exceptions in the function**:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except:
        print "There was a problem evaluating the number list."
    return total / length
```
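For instance, a quick interactive session (a sketch of what a caller might
see; the exact traceback text depends on your Python version) shows the
runtime test firing:

```python
>>> mean([1, 2, 3, 4])    # 10 / 4 truncates to 2 under Python 2 integer division
2
>>> mean(["a", "b"])      # sum() fails on strings, so the TypeError is re-raised
Traceback (most recent call last):
  ...
TypeError: The number list was not a list of numbers.
```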
Sometimes tests are functions alongside the function definitions
they are testing:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except:
        print "There was a problem evaluating the number list."
    return total / length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

Sometimes they are in an executable independent of the main executable:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except:
        print "There was a problem evaluating the number list."
    return total / length
```

where a different file contains the test module:

```python
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

- **ALWAYS!**
- **EARLY!**
- **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in test coverage,
the percentage of their functions that they feel confident are
comprehensively tested.

# How are tests written?

The type of test that you write is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks within the code that handle the exceptional
behavior arising from such input are called exceptions.
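As a concrete sketch (`convert_to_kelvin` is a hypothetical function, not
part of this lesson's code), such a runtime check might look like:

```python
def convert_to_kelvin(celsius):
    # The value of celsius is unknown until the program runs,
    # so validate it here and raise an exception if it is unphysical.
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero: %r" % (celsius,))
    return celsius + 273.15
```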
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what constitutes a *code unit* is not
formally defined.

To test functions and classes, the interfaces (API) - rather than the
implementation - should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.

**System Tests:** System level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask for the behavior as a
whole. This sort of testing involves comparison with other validated
codes, analytical solutions, etc.

**Regression Tests:** A regression test ensures that new code does not
change existing behavior. If you change the default answer, for example, or
add a new question, you'll need to make sure that missing entries are still
found and fixed.

**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third-party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.

# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally determined analytically,
even if the function being tested isn't.

**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures, as they are not really part of the test themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```

But sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
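In nose - the framework introduced in the next section - per-test fixtures
can be attached with the `with_setup` decorator. A minimal sketch (the list
here is just a stand-in for a more expensive fixture):

```python
from nose.tools import with_setup

fixture = []

def setup_func():
    # build a fresh fixture before each test
    fixture.append("widget")

def teardown_func():
    # clean up so the next test starts from scratch
    del fixture[:]

@with_setup(setup_func, teardown_func)
def test_fixture_has_one_widget():
    assert len(fixture) == 1
```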
* * * * *

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions that are not part of unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)

## Nose Test Syntax

To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```

Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# etc.
```

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

**Example**:

    nosetests test_mean.py
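For instance, tests along these lines might be added (a sketch of one
possible answer; what the "right" behavior is for each case - an exception,
a special value - is a design decision you make first):

```python
from nose.tools import assert_raises

from mean import mean

def test_mean_empty_list():
    # an empty list has no mean; the current implementation
    # ends up dividing by zero
    assert_raises(ZeroDivisionError, mean, [])

def test_mean_strings():
    # summing strings raises TypeError, which mean() re-raises
    assert_raises(TypeError, mean, ["one", "two"])

def test_mean_not_a_list():
    # a bare number is not iterable, so sum() fails
    assert_raises(TypeError, mean, 1)
```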
# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test, then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

---

## Additional test driven development example

*Please try this on your own time*

Say you want to write a fib() function which generates values of the
Fibonacci sequence at given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from pisa import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests, we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```

However, this function still falls over for `n > 2`. Time for more
tests!

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    # finally, some math
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined.

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

# Quality Assurance Exercise

Can you think of other tests to make for the Fibonacci function? I promise there
are at least two.

Implement one new test in test_fib.py, run nosetests, and if it fails, implement
a more robust function for that case.

And thus - finally - we have a robust function together with working
tests!

diff --git a/testing/additional_notes.md b/testing/additional_notes.md
new file mode 100755
index 0000000..91e15b7
--- /dev/null
+++ b/testing/additional_notes.md

# Testing - crib sheet

## Detecting errors

What we know about software development - code reviews work. Fagan (1976)
discovered that a rigorous inspection can remove 60-90% of errors before the
first test is run. M.E. Fagan (1976). [Design and Code inspections to reduce
errors in program development](http://www.mfagan.com/pdfs/ibmfagan.pdf). IBM
Systems Journal 15 (3): pp. 182-211.

What we know about software development - code reviews should be about 60
minutes long. Cohen (2006) discovered that all the value of a code review
comes within the first hour, after which reviewers can become exhausted and
the issues they find become ever more trivial. J. Cohen (2006). [Best Kept
Secrets of Peer Code Review](http://smartbear.com/SmartBear/media/pdfs/best-kept-secrets-of-peer-code-review.pdf).
SmartBear, 2006. ISBN-10: 1599160676. ISBN-13: 978-1599160672.

## Runtime tests

[morse.py](python/morse/morse.py)

    $ python morse.py
    encode
    1 + 2 = 3

`KeyError` is an exception.

The traceback shows Python's exception stack trace.

Runtime tests can make code robust and behave gracefully.

    try:
        print "Encoded is '%s'" % translator.encode(message)
    except KeyError:
        print "The input should be a string of a-z, A-Z, 0-9 or space"

The exception is caught by the `except` block.

The exception can be converted and passed on, e.g. if this were deep within a
function we would not want to print there - better to keep the user interface
separate.

We can also `raise` an exception ourselves, e.g.

    except KeyError:
        raise ValueError("The input should be a string of a-z, A-Z, 0-9 or space")

## Exercise: add runtime test for decode
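One possible solution sketch, mirroring the encode test above (valid morse
input here is dots, dashes and spaces):

    try:
        print "Decoded is '%s'" % translator.decode(message)
    except KeyError:
        print "The input should be morse code - dots, dashes and spaces"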
## Correctness tests

Testing manually works, but is time-consuming and error prone - you might
forget to run a test.

Write down the set of test steps so you won't forget.

Still time-consuming.

    def test(self):
        print "sos is ", self.encode("sos")
        print "... --- ... is ", self.decode("... --- ...")
        print "OK"

Extend the UI.

    while True:
        ...
        elif line == "test":
            print "Testing..."
            translator.test()
            break

Automate the checking.

    def test(self):
        assert "... --- ..." == self.encode("sos")
        assert "sos" == self.decode("... --- ...")
        print "OK"

`assert` checks whether a condition is true and, if not, raises an exception.

Put test functions in a separate file for modularity.

    $ cp morse.py test_morse.py
    $ nano test_morse.py

    from morse import MorseTranslator

    class TestMorseTranslator:

        def test(self):
            translator = MorseTranslator()
            assert "... --- ..." == translator.encode("SOS")
            assert "sos" == translator.decode("... --- ...")
            print "OK"

    if __name__ == "__main__":

        test_translator = TestMorseTranslator()
        test_translator.test()

Remove the test code from `MorseTranslator`.

Run the tests.

    $ python test_morse.py

Modularise the functions.

    def test_encode_sos(self):
        ...
    def test_decode_sos(self):
        ...

    test_translator.test_encode_sos()
    test_translator.test_decode_sos()

Remove duplicated code:

    def __init__(self):
        self.translator = MorseTranslator()

A test function:

* Sets up inputs and expected outputs.
* Runs the function / component on the inputs to get actual outputs.
* Checks that the actual outputs match the expected outputs.

A verbose, but equivalent, version of `test_encode_sos`:

    def test_encode_sos(self):
        expected = "... --- ..."
        actual = self.translator.encode("SOS")
        assert expected == actual

## `nose` - a Python test framework

[nose](https://pypi.python.org/pypi/nose/) automatically finds, runs and reports on tests.

An [xUnit test framework](http://en.wikipedia.org/wiki/XUnit).

`test_` file and function prefix, `Test` class prefix.

    $ nosetests test_morse.py

`.` denotes successful tests.

Remove `__main__`.

    $ nosetests test_morse.py

xUnit test report, a standard format - convert to HTML, present online.

    $ nosetests --with-xunit test_morse.py
    $ cat nosetests.xml

## Exercise: propose some more tests

Consider:

* What haven't we tested for so far?
* Have we covered all possible strings?
* Have we covered all possible arguments?

Examples.

    encode("sos")
    encode("")
    decode("")
    encode("1 + 2 = 3")
    decode("...---...")

## Exercise: implement examples

Tests for illegal arguments.

    def test_encode_illegal(self):
        try:
            self.translator.encode("1 + 2 = 3")
            assert False
        except KeyError:
            assert True

Alternative.

    from nose.tools import assert_raises

    def test_encode_illegal(self):
        assert_raises(KeyError, self.translator.encode, "1 + 2 = 3")

Testing components together:

    assert "sos" == decode(encode("sos"))
    assert "... --- ..." == encode(decode("... --- ..."))

## Testing in practice

Legacy code of 10000s of lines, with many input and output files:

* Run the code on a set of input files.
* Save the output files.
* Refactor the code, e.g. to optimise it or parallelise it.
* Run the code on the set of input files.
* Check that the outputs match the saved outputs.

EPCC and the Colon Cancer Genetics Group (CCGG) of the MRC Human Genetics Unit
at the Western General Hospital, Edinburgh - an
[Oncology](http://www.edikt.org/edikt2/OncologyActivity) project to optimise
and parallelise FORTRAN genetics code.

Continuous integration server, e.g. [Jenkins](http://jenkins-ci.org/) - detect
commit to version control, build, run tests, publish.

[Muon Ion Cooling Experiment](http://www.mice.iit.edu/) (MICE) - Bazaar
version control, Python tests, Jenkins, [published
online](https://micewww.pp.rl.ac.uk/tab/show/maus).

[Apache Hadoop Common Jenkins dashboard](https://builds.apache.org/job/Hadoop-Common-trunk/)

## When 1 + 1 = 2.0000001

Computers don't do floating point arithmetic too well.

    $ python
    >>> expected = 0
    >>> actual = 0.1 + 0.1 + 0.1 - 0.3
    >>> assert expected == actual
    >>> print actual

Compare to within a threshold, or delta: e.g. treat expected and actual as
equal if abs(expected - actual) < 0.0000000000000001.

Thresholds are application-specific.
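Continuing the session above, a hand-rolled threshold check might look like
this (a sketch; 1e-15 is an arbitrary delta):

    >>> delta = 1e-15
    >>> assert abs(expected - actual) < delta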
Python's [decimal](http://docs.python.org/2/library/decimal.html) module
provides decimal floating-point arithmetic functions.

    $ python
    >>> from nose.tools import assert_almost_equal
    >>> assert_almost_equal(expected, actual, 0)
    >>> assert_almost_equal(expected, actual, 10)
    >>> assert_almost_equal(expected, actual, 15)
    >>> assert_almost_equal(expected, actual, 16)

`nose.tools` uses an absolute tolerance: x and y are equal if
abs(x - y) <= delta.

[Numpy](http://www.numpy.org/)'s `numpy.testing` uses a relative tolerance:
abs(x - y) <= delta * max(abs(x), abs(y)).

`assert_allclose(actual_array, expected_array, relative_tolerance, absolute_tolerance)`
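For example (a sketch; `rtol` and `atol` are the keyword names accepted by
`numpy.testing.assert_allclose`):

    >>> import numpy as np
    >>> from numpy.testing import assert_allclose
    >>> expected = np.array([0.0, 1.0, 2.0])
    >>> actual = np.array([1e-9, 1.0, 2.0 + 1e-9])
    >>> assert_allclose(actual, expected, rtol=1e-7, atol=1e-8)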
## When should we test?

* Always!
* Early - don't wait until after we've used the code to generate data for our
  important paper, or given it to someone else to use.
* Often, so that we know that any changes we've made to our code, or to things
  that our code needs (e.g. libraries, configuration files etc.) haven't
  introduced any bugs.

How much is enough?

What we know about software development - we can't test everything. "It is
nearly impossible to test software at the level of 100 percent of its logic
paths", fact 32 in R. L. Glass (2002), Facts and Fallacies of Software
Engineering.

No excuse for testing nothing! Learn by experience, like writing a paper.

Review tests, like code, to avoid tests that:

* Pass when they should fail - false positives.
* Fail when they should pass - false negatives.
* Don't test anything.

Example.

    def test_critical_correctness():
        # TODO - will complete this tomorrow!
        pass

## Summary

Testing:

* Saves time.
* Gives confidence that code does what we want and expect it to.
* Promotes trust that code, and so research, is correct.

Remember [Geoffrey Chang](http://en.wikipedia.org/wiki/Geoffrey_Chang).

Bruce Eckel, [Thinking in Java, 3rd Edition](http://www.mindview.net/Books/TIJ/): "If it's not tested, it's broken".

## Links

* [Software Carpentry](http://software-carpentry.org/)'s online [testing](http://software-carpentry.org/4_0/test/index.html) lectures.
* A discussion on [is it worthwhile to write unit tests for scientific research codes?](http://scicomp.stackexchange.com/questions/206/is-it-worthwhile-to-write-unit-tests-for-scientific-research-codes)
* G. Wilson, D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, K. Huff, I. M. Mitchell, M. Plumbley, B. Waugh, E. P. White, P. Wilson (2012) "[Best Practices for Scientific Computing](http://arxiv.org/abs/1210.0530)", arXiv:1210.0530 [cs.MS].

diff --git a/testing/mean.py b/testing/mean.py
new file mode 100644
index 0000000..7b419d6
--- /dev/null
+++ b/testing/mean.py

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            raise TypeError("The list was not numbers.")
        except:
            print "Something unknown happened with the list."
        return total/length

diff --git a/testing/morse.py b/testing/morse.py
new file mode 100644
index 0000000..16a10a8
--- /dev/null
+++ b/testing/morse.py

    import string
    import sys

    class MorseTranslator:
        """This class can translate to and from morse code."""

        def __init__(self):
            self._letter_to_morse = {
                'a':'.-', 'b':'-...', 'c':'-.-.', 'd':'-..', 'e':'.', 'f':'..-.',
                'g':'--.', 'h':'....', 'i':'..', 'j':'.---', 'k':'-.-', 'l':'.-..',
                'm':'--', 'n':'-.', 'o':'---', 'p':'.--.', 'q':'--.-', 'r':'.-.',
                's':'...', 't':'-', 'u':'..-', 'v':'...-', 'w':'.--', 'x':'-..-',
                'y':'-.--', 'z':'--..',
                '0':'-----', '1':'.----', '2':'..---', '3':'...--', '4':'....-',
                '5':'.....', '6':'-....', '7':'--...', '8':'---..', '9':'----.',
                ' ':'/', '':'' }

            self._morse_to_letter = {}

            for letter in self._letter_to_morse:
                morse = self._letter_to_morse[letter]
                self._morse_to_letter[morse] = letter

        def encode(self, message):
            """This function encodes the passed message into morse,
            and returns the morse code string"""
            morse = []

            for letter in message:
                letter = letter.lower()
                morse.append(self._letter_to_morse[letter])

            return string.join(morse, " ")

        def decode(self, message):
            """This function decodes the passed morse code message
            and returns a string containing the decoded message"""

            english = []

            # Now we cannot read by letter. We know that morse letters are
            # separated by a space, so we split the morse string by spaces
            morse_letters = string.split(message, " ")

            for letter in morse_letters:
                english.append(self._morse_to_letter[letter])

            # Rejoin, but now we don't need to add any spaces
            return string.join(english, "")

    if __name__ == "__main__":

        translator = MorseTranslator()

        while True:
            print "Instruction (encode, decode, quit) :-> ",

            # Read a line from standard input
            line = sys.stdin.readline()
            line = line.rstrip()

            # The first line should be either "encode", "decode"
            # or "quit" to tell us what to do next...
            if line == "encode":
                # read the line to be encoded
                message = sys.stdin.readline().rstrip()

                print "Message is '%s'" % message
                print "Encoded is '%s'" % translator.encode(message)

            elif line == "decode":
                # read the morse to be decoded
                message = sys.stdin.readline().rstrip()

                print "Morse is '%s'" % message
                print "Decoded is '%s'" % translator.decode(message)

            elif line == "quit":
                print "Exiting..."
                break

            else:
                print "Cannot understand '%s'. Instruction should be 'encode', 'decode' or 'quit'." % line

diff --git a/testing/morse_test.py b/testing/morse_test.py
new file mode 100644
index 0000000..ffb59e9
--- /dev/null
+++ b/testing/morse_test.py

    from morse import MorseTranslator

    class TestMorseTranslator:

        def test(self):
            translator = MorseTranslator()
            assert "... --- ..." == translator.encode("SOS")
            assert "sos" == translator.decode("... --- ...")
            print "OK"

    if __name__ == "__main__":

        test_translator = TestMorseTranslator()
        test_translator.test()

diff --git a/testing/test_mean.py b/testing/test_mean.py
new file mode 100644
index 0000000..809686a
--- /dev/null
+++ b/testing/test_mean.py

    from nose.tools import assert_equal, assert_almost_equal, assert_true, \
        assert_false, assert_raises, assert_is_instance

    from mean import mean

    def test_mean1():
        obs = mean([0, 0, 0, 0])
        exp = 0
        assert_equal(obs, exp)

        obs = mean([0, 200])
        exp = 100
        assert_equal(obs, exp)

        obs = mean([0, -200])
        exp = -100
        assert_equal(obs, exp)

        obs = mean([0])
        exp = 0
        assert_equal(obs, exp)

diff --git a/testing/test_prod.jpg b/testing/test_prod.jpg
new file mode 100644
index 0000000..6970075
Binary files /dev/null and b/testing/test_prod.jpg differ