Updated testing material

author Karthik Ram <karthik.ram@gmail.com>

Fri, 13 Sep 2013 07:59:05 +0000 (08:59 +0100)

committer W. Trevor King <wking@tremily.us>

Fri, 8 Nov 2013 03:50:33 +0000 (19:50 -0800)
diff --git a/testing/README.md b/testing/README.md

old mode 100755 (executable)

new mode 100644 (file)

index 91e15b7..80ddef3
--- a/testing/README.md
+++ b/testing/README.md
@@ -1,10 +1,364 @@
-# Testing - crib sheet
+# Testing
  
-## Detecting errors
+* * * * *
  
-What we know about software development - code reviews work. Fagan (1976) discovered that a rigorous inspection can remove 60-90% of errors before the first test is run. M.E., Fagan (1976). [Design and Code inspections to reduce errors in program development](http://www.mfagan.com/pdfs/ibmfagan.pdf). IBM Systems Journal 15 (3): pp. 182-211.
+**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
+Scopatz**
  
-What we know about software development - code reviews should be about 60 minutes long. Cohen (2006) discovered that all the value of a code review comes within the first hour, after which reviewers can become exhausted and the issues they find become ever more trivial. J. Cohen (2006). [Best Kept Secrets of Peer Code Review](http://smartbear.com/SmartBear/media/pdfs/best-kept-secrets-of-peer-code-review.pdf). SmartBear, 2006. ISBN-10: 1599160676. ISBN-13: 978-1599160672.
+![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)
+# What is testing?
+
+Software testing is a process by which one or more expected behaviors
+and results from a piece of software are exercised and confirmed.
+Well-chosen tests will confirm expected code behavior for the extreme
+boundaries of the input domains, output ranges, parametric combinations,
+and other behavioral **edge cases**.
+
+# Why test software?
+
+Unless you write flawless, bug-free, perfectly accurate, fully precise,
+and predictable code **every time**, you must test your code in order to
+trust it enough to answer in the affirmative to at least a few of the
+following questions:
+
+-   Does your code work?
+-   **Always?**
+-   Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
+-   Does it continue to work after changes are made?
+-   Does it continue to work after system configurations or libraries
+    are upgraded?
+-   Does it respond properly for a full range of input parameters?
+-   What about **edge or corner cases**?
+-   What's the limit on that input parameter?
+-   How will it affect your
+    [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?
+
+## Verification
+
+*Verification* is the process of asking, "Have we built the software
+correctly?" That is, is the code bug free, precise, accurate, and
+repeatable?
+
+## Validation
+
+*Validation* is the process of asking, "Have we built the right
+software?" That is, is the code designed in such a way as to produce the
+answers we are interested in, data we want, etc.
+
+## Uncertainty Quantification
+
+*Uncertainty Quantification* is the process of asking, "Given that our
+algorithm may not be deterministic, was our execution within acceptable
+error bounds?" This is particularly important for anything that uses
+random numbers, e.g. Monte Carlo methods.
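
A minimal sketch of such a check, assuming a Monte Carlo estimate of pi as the algorithm under test. The test fixes the random seed so that results are repeatable, and it accepts any estimate within four standard errors of the exact value. The function names, the seed, and the four-standard-error bound are illustrative choices, not part of this lesson's code.

```python
import math
import random


def estimate_pi(n_samples, rng):
    """Hypothetical function under test: estimate pi from the fraction of
    random points in the unit square that fall inside the quarter circle."""
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_error_bound():
    n = 100000
    rng = random.Random(12345)  # fixed seed keeps the test repeatable
    estimate = estimate_pi(n, rng)
    # Each point lands inside with probability p = pi/4, so the
    # estimator's standard error is 4 * sqrt(p * (1 - p) / n).
    p = math.pi / 4
    std_err = 4 * math.sqrt(p * (1 - p) / n)
    # Accept any estimate within 4 standard errors of the exact value.
    assert abs(estimate - math.pi) < 4 * std_err
```

The width of the acceptance band is a judgment call: a narrower band catches smaller biases but fails more often by chance when the seed changes.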
+
+# Where are tests?
+
+Say we have an averaging function:
+
+```python
+def mean(numlist):
+    total = sum(numlist)
+    length = len(numlist)
+    return total/length
+```
+
+Tests could be implemented as runtime **exceptions in the function**:
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except Exception:
+        print("There was a problem evaluating the number list.")
+        raise
+    return total/length
+```
+
+Sometimes tests are functions alongside the function definitions
+they are testing.
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except Exception:
+        print("There was a problem evaluating the number list.")
+        raise
+    return total/length
+
+
+def test_mean():
+    assert mean([0, 0, 0, 0]) == 0
+    assert mean([0, 200]) == 100
+    assert mean([0, -200]) == -100
+    assert mean([0]) == 0
+
+
+def test_floating_mean():
+    assert mean([1, 2]) == 1.5
+```
+
+Sometimes they are in an executable independent of the main executable.
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except Exception:
+        print("There was a problem evaluating the number list.")
+        raise
+    return total/length
+```
+
+In that case, a test module exists in a different file:
+
+```python
+from mean import mean
+
+def test_mean():
+    assert mean([0, 0, 0, 0]) == 0
+    assert mean([0, 200]) == 100
+    assert mean([0, -200]) == -100
+    assert mean([0]) == 0
+
+
+def test_floating_mean():
+    assert mean([1, 2]) == 1.5
+```
+
+# When should we test?
+
+The three right answers are:
+
+-   **ALWAYS!**
+-   **EARLY!**
+-   **OFTEN!**
+
+The longer answer is that testing either before or after your software
+is written will improve your code, but testing after your program is
+used for something important is too late.
+
+If we have a robust set of tests, we can run them before adding
+something new and after adding something new. If the tests give the same
+results (as appropriate), we can have some assurance that we didn't
+break anything. The same idea applies to making changes in your system
+configuration, updating support codes, etc.
+
+Another important feature of testing is that it helps you remember what
+all the parts of your code do. If you are working on a large project
+over three years and you end up with 200 classes, it may be hard to
+remember what the widget class does in detail. If you have a test that
+checks all of the widget's functionality, you can look at the test to
+remember what it's supposed to do.
+
+# Who should test?
+
+In a collaborative coding environment, where many developers contribute
+to the same code base, developers should be responsible individually for
+testing the functions they create and collectively for testing the code
+as a whole.
+
+Professionals often test their code, and take pride in test coverage,
+the percentage of their code that is exercised by their tests.
+
+# How are tests written?
+
+The type of tests that are written is determined by the testing
+framework you adopt. Don't worry, there are a lot of choices.
+
+## Types of Tests
+
+**Exceptions:** Exceptions can be thought of as a type of runtime test.
+They alert the user to exceptional behavior in the code. Often,
+exceptions are related to functions that depend on input that is unknown
+at compile time. Checks that occur within the code to handle exceptional
+behavior that results from this type of input are called Exceptions.
+
+**Unit Tests:** Unit tests are a type of test which test the fundamental
+units of a program's functionality. Often, this is on the class or
+function level of detail. However, what counts as a *code unit* is not
+formally defined.
+
+To test functions and classes, the interfaces (API) - rather than the
+implementation - should be tested. Treating the implementation as a
+black box, we can probe the expected behavior with boundary cases for
+the inputs.
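
To make the black-box idea concrete, here is a sketch of unit tests for a hypothetical `clamp(value, low, high)` function, which is not part of this lesson's code. The tests only call the function and check its outputs, and they concentrate on the boundaries of the input domain: values exactly at `low` and `high`, values just outside them, a degenerate interval, and an invalid one.

```python
def clamp(value, low, high):
    """Hypothetical unit under test: limit value to the interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_clamp_interior():
    assert clamp(5, 0, 10) == 5


def test_clamp_on_boundaries():
    # Values exactly on the boundaries are returned unchanged.
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10


def test_clamp_just_outside():
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10


def test_clamp_degenerate_interval():
    # When low == high, every input maps to that single value.
    assert clamp(3, 7, 7) == 7


def test_clamp_invalid_interval():
    try:
        clamp(1, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for low > high")
```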
+
+**System Tests:** System level tests are intended to test the code as a
+whole. As opposed to unit tests, system tests ask for the behavior as a
+whole. This sort of testing involves comparison with other validated
+codes, analytical solutions, etc.
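
For example, a numerical integration routine can be checked against cases with known closed forms. The sketch below assumes a hand-written composite trapezoidal rule as the code under test (the name `trapezoid` and the tolerances are illustrative) and compares it with two analytical results: the rule is exact for straight lines, and the integral of sin(x) from 0 to pi is exactly 2.

```python
import math


def trapezoid(f, a, b, n):
    """Hypothetical code under test: approximate the integral of f over
    [a, b] with the composite trapezoidal rule on n equal subintervals."""
    h = (b - a) / float(n)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_trapezoid_linear_is_exact():
    # The integral of 3x + 1 over [0, 2] is 8, and the trapezoidal
    # rule integrates straight lines exactly.
    assert abs(trapezoid(lambda x: 3 * x + 1, 0.0, 2.0, 4) - 8.0) < 1e-12


def test_trapezoid_sin_matches_analytical_solution():
    # The integral of sin(x) over [0, pi] is exactly 2. The rule's error
    # shrinks like h**2; with n = 1000 it is on the order of 1e-6.
    approx = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert abs(approx - 2.0) < 1e-5
```

Because the error is expected to fall like h**2, doubling `n` should cut it by roughly a factor of four, which is another property a system test can check.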
+
+**Regression Tests:** A regression test ensures that new code does not
+change existing behavior unintentionally. If you change a default value,
+for example, or add a new feature, you need to make sure that results
+that were correct before are still correct.
+
+**Integration Tests:** Integration tests query the ability of the code
+to integrate well with the system configuration and third party
+libraries and modules. This type of test is essential for codes that
+depend on libraries which might be updated independently of your code or
+when your code might be used by a number of users who may have various
+versions of libraries.
+
+**Test Suites:** Putting a series of unit tests into a collection of
+modules creates a test suite. Typically the suite as a whole is
+executed (rather than each test individually) when verifying that the
+code base still functions after changes have been made.
+
+# Elements of a Test
+
+**Behavior:** The behavior you want to test. For example, you might want
+to test the fun() function.
+
+**Expected Result:** This might be a single number, a range of numbers,
+a new fully defined object, a system state, an exception, etc. When we
+run the fun() function, we expect to generate some fun. If we don't
+generate any fun, the fun() function should fail its test.
+Alternatively, if it does create some fun, the fun() function should
+pass this test. The expected result should be known *a priori*. For
+numerical functions, this result is ideally determined analytically,
+even if the function being tested isn't.
+
+**Assertions:** Require that some conditional be true. If the
+conditional is false, the test fails.
+
+**Fixtures:** Sometimes you have to do some legwork to create the
+objects that are necessary to run one or many tests. These objects are
+called fixtures as they are not really part of the test themselves but
+rather involve getting the computer into the appropriate state.
+
+For example, since fun varies a lot between people, the fun() function
+is a method of the Person class. In order to check the fun function,
+then, we need to create an appropriate Person object on which to run
+fun().
+
+**Setup and teardown:** Creating fixtures is often done in a call to a
+setup function. Deleting them and other cleanup is done in a teardown
+function.
+
+**The Big Picture:** Putting all this together, the testing algorithm is
+often:
+
+```python
+setup()
+test()
+teardown()
+```
+
+Sometimes, however, your tests change the fixtures. If so, it's better
+for the setup() and teardown() functions to run on either side of each
+test. In that case, the testing algorithm should be:
+
+```python
+setup()
+test1()
+teardown()
+
+setup()
+test2()
+teardown()
+
+setup()
+test3()
+teardown()
+```
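
Python's standard `unittest` module follows this per-test pattern: a `setUp` method runs before every test method and a `tearDown` method runs after it, and nose can collect and run such `unittest.TestCase` subclasses. The sketch below assumes a minimal, hypothetical `Person` class whose `fun()` simply counts hobbies; that rule is invented for illustration.

```python
import unittest


class Person(object):
    """Hypothetical class from the text: fun() depends on the person."""

    def __init__(self, name, hobbies):
        self.name = name
        self.hobbies = list(hobbies)

    def fun(self):
        # Illustrative rule only: one unit of fun per hobby.
        return len(self.hobbies)


class TestPersonFun(unittest.TestCase):

    def setUp(self):
        # Fixture: a fresh Person is built before *each* test method.
        self.person = Person("Ada", ["chess", "hiking"])

    def tearDown(self):
        # Cleanup runs after each test method, whether it passed or failed.
        del self.person

    def test_fun(self):
        self.assertEqual(self.person.fun(), 2)

    def test_fun_after_new_hobby(self):
        # This test modifies the fixture; because setUp runs again,
        # other tests still start from the original two hobbies.
        self.person.hobbies.append("painting")
        self.assertEqual(self.person.fun(), 3)
```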
+
+* * * * *
+
+# Nose: A Python Testing Framework
+
+The testing framework we'll discuss today is called nose. However, there
+are several other testing frameworks available in most languages. Most
+notably there is [JUnit](http://www.junit.org/) in Java, which can
+arguably be credited with inventing the testing framework.
+
+## Where do nose tests live?
+
+Nose tests are files whose names begin with `Test-`, `Test_`, `test-`, or
+`test_`. Specifically, these satisfy the testMatch regular expression
+`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
+in the unittest.TestCase subclasses that you create in your code. You
+can also create test functions which are not unittest.TestCase
+subclasses if they are named with the configured testMatch regular
+expression.)
+
+## Nose Test Syntax
+
+To write a nose test, we make assertions.
+
+```python
+assert should_be_true()
+assert not should_not_be_true()
+```
+
+Additionally, nose itself defines a number of assert functions which can
+be used to test more specific aspects of the code base.
+
+```python
+from nose.tools import *
+
+assert_equal(a, b)
+assert_almost_equal(a, b)
+assert_true(a)
+assert_false(a)
+assert_raises(exception, func, *args, **kwargs)
+assert_is_instance(a, b)
+# and many more!
+```
+
+Moreover, numpy offers similar testing functions for arrays:
+
+```python
+from numpy.testing import *
+
+assert_array_equal(a, b)
+assert_array_almost_equal(a, b)
+# etc.
+```
+
+## Exercise: Writing tests for mean()
+
+There are a few tests for the mean() function that we listed in this
+lesson. What are some tests that should fail? Add at least three test
+cases to this set. Edit the `test_mean.py` file which tests the mean()
+function in `mean.py`.
+
+*Hint:* Think about what form your input could take and what you should
+do to handle it. Also, think about the type of the elements in the list.
+What should be done if you pass a list of integers? What if you pass a
+list of strings?
+
+**Example**:
+
+    nosetests test_mean.py
+
+# Test Driven Development
+
+Test driven development (TDD) is a philosophy whereby the developer
+creates code by **writing the tests first**. That is to say you write the
+tests *before* writing the associated code!
+
+This is an iterative process whereby you write a test, then write the
+minimum amount of code to make the test pass. If a new feature is needed,
+another test is written and the code is expanded to meet this new use
+case. This continues until the code does what is needed.
+
+TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
+who diligently follow TDD swear by its effectiveness. This development
+style was put forth most strongly by [Kent Beck in
+2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
+
+---
+# Testing morse.py
  
  ## Runtime tests
  
@@ -93,179 +447,176 @@ Run tests.
  
      $ python test_morse.py
  
-Modularise functions.
-
-    def test_encode_sos(self):
-        ...
-    def test_decode_sos(self):
-        ...
-
-    test_translator.test_encode_sos()
-    test_translator.test_decode_sos()
-
-Remove duplicated code:
-
-    def __init__(self):
-        self.translator = MorseTranslator()
-
-Test function:
-
-* Set up inputs and expected outputs.
-* Runs function / component on inputs to get actual outputs.
-* Checks actual outputs match expected outputs. 
-
-Verbose, but equivalent, version of `test_encode_sos`.
-
-    def test_encode_sos(self):
-        expected = "... --- ..."
-        actual = self.translator.encode("SOS")                     
-        assert expected == actual
-
-## `nose` - a Python test framework
-
-[nose](https://pypi.python.org/pypi/nose/) automatically finds, runs and reports on tests.
-
-[xUnit test framework](http://en.wikipedia.org/wiki/XUnit).
-
-`test_` file and function prefix, `Test` class prefix.
-
-    $ nosetests test_morse.py
-
-`.` denotes successful tests.
-
-Remove `__main__`.
-
-    $ nosetests test_morse.py
  
-xUnit test report, standard format, convert to HTML, present online.
  
-    $ nosetests --with-xunit test_morse.py
-    $ cat nosetests.xml
  
-## Exercise: propose some more tests
  
-Consider:
+---
  
-* What haven't we tested for so far? 
-* Have we covered all possible strings?
-* Have we covered all possible arguments?
+## Additional test driven development example
+*Please try this on your own time*
  
-Examples.
+Say you want to write a fib() function which returns the value of the
+Fibonacci sequence at a given index. You would - of course - start by
+writing the test, possibly testing a single value:
  
-    encode("sos")
-    encode("")
-    decode("")
-    encode("1 + 2 = 3")
-    decode("...---...")
+```python
+from nose.tools import assert_equal
  
-## Exercise: implement examples
+from pisa import fib
  
-Tests for illegal arguments.
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+```
  
-    def test_encode_illegal(self):
-        try:
-            self.translator.encode("1 + 2 = 3")
-            assert False
-        except KeyError:
-            assert True
+You would *then* go ahead and write the actual function:
  
-Alternative.
+```python
+def fib(n):
+    # you snarky so-and-so
+    return 1
+```
  
-    from nose.tools import assert_raises
+And that is it, right?! Well, not quite. This implementation fails for
+most other values. Adding tests, we see that:
  
-    def test_encode_illegal(self):
-        assert_raises(KeyError, self.translator.encode, "1 + 2 = 3")
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
  
-Testing components together:
  
-    assert "sos" == decode(encode("sos"))
-    assert "... --- ..." == encode(decode("... --- ..."))
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
  
-## Testing in practice
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+```
  
-Legacy code of 10000s of lines, with many input and output files,
+This extra test now requires that we bother to implement at least the
+initial values:
  
-* Run code on set of input files.
-* Save output files.
-* Refactor code e.g. to optimise it or parallelise it.
-* Run code on set of input files.
-* Check that outputs match saved outputs. 
+```python
+def fib(n):
+    # a little better
+    if n == 0 or n == 1:
+        return n
+    return 1
+```
  
-EPCC and the Colon Cancer Genetics Group (CCGG) of MRC Human Genetics Unit at the Western General Hospital, Edinburgh - [Oncology](http://www.edikt.org/edikt2/OncologyActivity) project to optimise and parallelise FORTRAN genetics code.
+However, this function still falls over for `2 < n`. Time for more
+tests!
  
-Continuous integration server e.g. [Jenkins](http://jenkins-ci.org/) - detect commit to version control, build, run tests, publish.
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
  
-[Muon Ion Cooling Experiment](http://www.mice.iit.edu/) (MICE) - Bazaar version control, Python tests, Jenkins, [published online](https://micewww.pp.rl.ac.uk/tab/show/maus).
  
-[Apache Hadoop Common Jenkins dashboard](https://builds.apache.org/job/Hadoop-Common-trunk/)
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
  
-## When 1 + 1 = 2.0000001
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
  
-Computers don't do floating point arithmetic too well.
  
-    $ python
-    >>> expected = 0
-    >>> actual = 0.1 + 0.1 + 0.1 - 0.3
-    >>> assert expected == actual
-    >>> print actual
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
  
-Compare to within a threshold, or delta e.g. expected == actual  if expected - actual < 0.0000000000000001.
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
+```
  
-Thresholds are application-specific. 
+At this point, we had better go ahead and try to do the right thing...
  
-Python [decimal](http://docs.python.org/2/library/decimal.html), floating-point arithmetic functions.
+```python
+def fib(n):
+    # finally, some math
+    if n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
+```
  
-    $ python
-    >>> from nose.tools import assert_almost_equal
-    >>> assert_almost_equal(expected, actual, 0)
-    >>> assert_almost_equal(expected, actual, 10)
-    >>> assert_almost_equal(expected, actual, 15)
-    >>> assert_almost_equal(expected, actual, 16)
+Here it becomes very tempting to take an extended coffee break or
+possibly a power lunch. But then you remember those pesky negative
+numbers and floats. Perhaps the right thing to do here is to just be
+undefined.
  
-`nose.testing` uses absolute tolerance: abs(x, y) <= delta
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
  
-[Numpy](http://www.numpy.org/)'s `numpy.testing` uses relative tolerance: abs(x, y) <= delta * (max(abs(x), abs(y)). 
  
-`assert_allclose(actual_array, expected_array, relative_tolerance, absolute_tolerance)`
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
  
-## When should we test?
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
  
-* Always!
-* Early, and not wait till after we've used it to generate data for our important paper, or given it to someone else to use.
-* Often, so that we know that any changes we've made to our code, or to things that our code needs (e.g. libraries, configuration files etc.) haven't introduced any bugs.
  
-How much is enough? 
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
  
-What we know about software development - we can't test everything. It is nearly impossible to test software at the level of 100 percent of its logic paths", fact 32 in R. L. Glass (2002).
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
  
-No excuse for testing nothing! Learn by experience, like writing a paper.
  
-Review tests, like code, to avoid
+def test_fib4():
+    obs = fib(13.37)
+    exp = NotImplemented
+    assert_equal(obs, exp)
  
-* Pass when they should fail, false positives.
-* Fail when they should pass, false negatives.
-* Don't test anything. 
+    obs = fib(-9000)
+    exp = NotImplemented
+    assert_equal(obs, exp)
+```
  
-Example.
+This means that it is time to add the appropriate case to the function
+itself:
  
-    def test_critical_correctness():
-        # TODO - will complete this tomorrow!
-        pass
+```python
+def fib(n):
+    # sequence and you shall find
+    if n < 0 or int(n) != n:
+        return NotImplemented
+    elif n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
+```
  
-## Summary
+# Quality Assurance Exercise
  
-Testing
+Can you think of other tests to make for the Fibonacci function? I promise there
+are at least two.
  
-* Saves time.
-* Gives confidence that code does what we want and expect it to.
-* Promotes trust that code, and so research, is correct.
+Implement one new test in test_fib.py, run nosetests, and if it fails, implement 
+a more robust function for that case.
  
-Remember [Geoffrey Chang](http://en.wikipedia.org/wiki/Geoffrey_Chang)
+And thus - finally - we have a robust function together with working
+tests!
  
-Bruce Eckel, [Thinking in Java, 3rd Edition](http://www.mindview.net/Books/TIJ/),  "If it's not tested, it's broken".
  
-## Links
  
-* [Software Carpentry](http://software-carpentry.org/)'s online [testing](http://software-carpentry.org/4_0/test/index.html) lectures.
-* A discussion on [is it worthwhile to write unit tests for scientific research codes?](http://scicomp.stackexchange.com/questions/206/is-it-worthwhile-to-write-unit-tests-for-scientific-research-codes)
-* G. Wilson, D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, K. Huff, I. M. Mitchell, M. Plumbley, B. Waugh, E. P. White, P. Wilson (2012) "[Best Practices for Scientific Computing](http://arxiv.org/abs/1210.0530)", arXiv:1210.0530 [cs.MS].
diff --git a/testing/additional_notes.md b/testing/additional_notes.md

new file mode 100755 (executable)

index 0000000..91e15b7
--- /dev/null
+++ b/testing/additional_notes.md
@@ -0,0 +1,271 @@
+# Testing - crib sheet
+
+## Detecting errors
+
+What we know about software development - code reviews work. Fagan (1976) discovered that a rigorous inspection can remove 60-90% of errors before the first test is run. M. E. Fagan (1976). [Design and Code inspections to reduce errors in program development](http://www.mfagan.com/pdfs/ibmfagan.pdf). IBM Systems Journal 15 (3): pp. 182-211.
+
+What we know about software development - code reviews should be about 60 minutes long. Cohen (2006) discovered that all the value of a code review comes within the first hour, after which reviewers can become exhausted and the issues they find become ever more trivial. J. Cohen (2006). [Best Kept Secrets of Peer Code Review](http://smartbear.com/SmartBear/media/pdfs/best-kept-secrets-of-peer-code-review.pdf). SmartBear, 2006. ISBN-10: 1599160676. ISBN-13: 978-1599160672.
+
+## Runtime tests
+
+[morse.py](python/morse/morse.py)
+
+    $ python morse.py
+    encode
+    1 + 2 = 3
+
+`KeyError` is an exception.
+
+Traceback shows Python's exception stack trace.
+
+Runtime tests can make code robust and behave gracefully.
+
+    try:
+        print "Encoded is '%s'" % translator.encode(message)
+    except KeyError:
+        print "The input should be a string of a-z, A-Z, 0-9 or space"
+
+Exception is caught by the `except` block.
+
+An exception can be converted and passed on, e.g. if this were deep within a function we would not want to print, so as to keep the UI separate.
+
+Can `raise` an exception e.g.
+
+    except KeyError:
+        raise ValueError("The input should be a string of a-z, A-Z, 0-9 or space")
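
One way to keep the UI separate is to let the translation code raise a descriptive `ValueError` and let only the UI layer decide how to report it. The sketch below is self-contained and uses a tiny, hypothetical subset of the Morse table; `encode` and `report` are illustrative names, not the methods in `morse.py`.

```python
# A tiny subset of the Morse table, for illustration only.
LETTER_TO_MORSE = {"s": "...", "o": "---", " ": "/"}


def encode(message):
    """Translation code: no printing, just a meaningful exception."""
    try:
        return " ".join([LETTER_TO_MORSE[letter] for letter in message.lower()])
    except KeyError:
        # Convert the low-level KeyError into an error describing bad input.
        raise ValueError("The input should be a string of a-z, A-Z, 0-9 or space")


def report(message):
    """UI code: builds the text shown to the user, including errors."""
    try:
        return "Encoded is '%s'" % encode(message)
    except ValueError as err:
        return "Sorry: %s" % err
```

Because `encode` never prints, a test can check its result or its exception directly, while a different UI (command line, web page) can word the error however it likes.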
+
+## Exercise: add runtime test for decode
+
+## Correctness tests
+
+Testing manually works but is time-consuming and error prone - might forget to run a test.
+
+Write down set of test steps so won't forget. 
+
+Still time-consuming.
+
+    def test(self):
+        print "sos is ", self.encode("sos")
+        print "... --- ... is ", self.decode("... --- ...")
+        print "OK"
+
+Extend UI.
+
+    while True:
+        # ... existing if/elif branches that handle the other commands ...
+        elif line == "test":
+            print "Testing..."
+            translator.test()
+            break
+
+Automate checking.
+
+    def test(self):
+        assert "... --- ..." == self.encode("sos")
+        assert "sos" == self.decode("... --- ...")
+        print "OK"
+
+`assert` checks whether condition is true and, if not, raises an exception.
+
+Put test functions in separate file for modularity.
+
+    $ cp morse.py test_morse.py
+    $ nano test_morse.py
+
+    from morse import MorseTranslator
+
+    class TestMorseTranslator:
+
+        def test(self):
+            translator = MorseTranslator()
+            assert "... --- ..." == translator.encode("SOS")
+            assert "sos" == translator.decode("... --- ...")
+            print "OK"
+
+    if __name__ == "__main__":    
+
+        test_translator = TestMorseTranslator()
+        test_translator.test()
+
+Remove test code from `MorseTranslator`.
+
+Run tests.
+
+    $ python test_morse.py
+
+Modularise functions.
+
+    def test_encode_sos(self):
+        ...
+    def test_decode_sos(self):
+        ...
+
+    test_translator.test_encode_sos()
+    test_translator.test_decode_sos()
+
+Remove duplicated code:
+
+    def __init__(self):
+        self.translator = MorseTranslator()
+
+Test function:
+
+* Set up inputs and expected outputs.
+* Run function / component on inputs to get actual outputs.
+* Check actual outputs match expected outputs.
+
+Verbose, but equivalent, version of `test_encode_sos`.
+
+    def test_encode_sos(self):
+        expected = "... --- ..."
+        actual = self.translator.encode("SOS")                     
+        assert expected == actual
+
+## `nose` - a Python test framework
+
+[nose](https://pypi.python.org/pypi/nose/) automatically finds, runs and reports on tests.
+
+[xUnit test framework](http://en.wikipedia.org/wiki/XUnit).
+
+`test_` file and function prefix, `Test` class prefix.
+
+    $ nosetests test_morse.py
+
+`.` denotes successful tests.
+
+Remove `__main__`.
+
+    $ nosetests test_morse.py
+
+xUnit test report, standard format, convert to HTML, present online.
+
+    $ nosetests --with-xunit test_morse.py
+    $ cat nosetests.xml
+
+## Exercise: propose some more tests
+
+Consider:
+
+* What haven't we tested for so far? 
+* Have we covered all possible strings?
+* Have we covered all possible arguments?
+
+Examples.
+
+    encode("sos")
+    encode("")
+    decode("")
+    encode("1 + 2 = 3")
+    decode("...---...")
+
+## Exercise: implement examples
+
+Tests for illegal arguments.
+
+    def test_encode_illegal(self):
+        try:
+            self.translator.encode("1 + 2 = 3")
+            assert False
+        except KeyError:
+            assert True
+
+Alternative.
+
+    from nose.tools import assert_raises
+
+    def test_encode_illegal(self):
+        assert_raises(KeyError, self.translator.encode, "1 + 2 = 3")
+
+Testing components together:
+
+    assert "sos" == decode(encode("sos"))
+    assert "... --- ..." == encode(decode("... --- ..."))
+
+## Testing in practice
+
+Legacy code of 10000s of lines, with many input and output files:
+
+* Run code on set of input files.
+* Save output files.
+* Refactor code e.g. to optimise it or parallelise it.
+* Run code on set of input files.
+* Check that outputs match saved outputs. 
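
These steps can be automated. The sketch below saves a run's output as a JSON reference file the first time and compares later runs against it. `analyse` stands in for the real legacy code, and it, `check_against_reference`, and the JSON format are all hypothetical choices for illustration. Floating-point outputs would normally need a tolerance rather than exact equality.

```python
import json
import os


def analyse(data):
    """Hypothetical stand-in for the legacy code whose output we protect."""
    return {"count": len(data), "total": sum(data), "largest": max(data)}


def check_against_reference(result, reference_path):
    """Return True if result matches the saved reference output.

    On the first run no reference exists yet, so the current result is
    saved and treated as correct; check it by hand before relying on it.
    """
    if not os.path.exists(reference_path):
        with open(reference_path, "w") as handle:
            json.dump(result, handle, sort_keys=True, indent=2)
        return True
    with open(reference_path) as handle:
        reference = json.load(handle)
    return result == reference
```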
+
+EPCC and the Colon Cancer Genetics Group (CCGG) of MRC Human Genetics Unit at the Western General Hospital, Edinburgh - [Oncology](http://www.edikt.org/edikt2/OncologyActivity) project to optimise and parallelise FORTRAN genetics code.
+
+Continuous integration server e.g. [Jenkins](http://jenkins-ci.org/) - detect commit to version control, build, run tests, publish.
+
+[Muon Ion Cooling Experiment](http://www.mice.iit.edu/) (MICE) - Bazaar version control, Python tests, Jenkins, [published online](https://micewww.pp.rl.ac.uk/tab/show/maus).
+
+[Apache Hadoop Common Jenkins dashboard](https://builds.apache.org/job/Hadoop-Common-trunk/)
+
+## When 1 + 1 = 2.0000001
+
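+Most decimal fractions, such as 0.1, cannot be represented exactly as binary floating-point numbers, so arithmetic on them accumulates small rounding errors.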
+
+    $ python
+    >>> expected = 0
+    >>> actual = 0.1 + 0.1 + 0.1 - 0.3
+    >>> assert expected == actual
+    >>> print actual
+
+Compare to within a threshold, or delta, e.g. treat expected and actual as equal if abs(expected - actual) < 0.0000000000000001.
+
+Thresholds are application-specific. 
+
+Python's [decimal](http://docs.python.org/2/library/decimal.html) module provides decimal floating-point arithmetic, in which values such as 0.1 are represented exactly.
+
+    $ python
+    >>> from nose.tools import assert_almost_equal
+    >>> assert_almost_equal(expected, actual, 0)
+    >>> assert_almost_equal(expected, actual, 10)
+    >>> assert_almost_equal(expected, actual, 15)
+    >>> assert_almost_equal(expected, actual, 16)
+
+`nose.tools.assert_almost_equal` uses absolute tolerance: round(abs(x - y), places) == 0, with places = 7 by default.
+
+[Numpy](http://www.numpy.org/)'s `numpy.testing.assert_allclose` combines relative and absolute tolerance: abs(actual - desired) <= atol + rtol * abs(desired).
+
+`assert_allclose(actual, desired, rtol=1e-07, atol=0)`
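
The difference between the two kinds of tolerance matters most near zero and for large values. The sketch below uses only the standard library: `within_absolute` and `within_relative` are illustrative helpers written for this note, while `math.isclose` (Python 3.5 and later) is the standard function, which accepts values when abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol).

```python
import math

expected = 0.0
actual = 0.1 + 0.1 + 0.1 - 0.3  # about 5.55e-17, not exactly 0


def within_absolute(x, y, atol):
    """Absolute tolerance: the difference itself must be small."""
    return abs(x - y) <= atol


def within_relative(x, y, rtol):
    """Relative tolerance: the difference must be small compared with
    the larger of the two magnitudes."""
    return abs(x - y) <= rtol * max(abs(x), abs(y))


# Near zero, a purely relative test fails however close the values are,
# because the allowed difference shrinks towards zero as well.
print(within_absolute(expected, actual, 1e-12))  # True
print(within_relative(expected, actual, 1e-9))   # False

# For large values, a tiny absolute tolerance is unreasonably strict.
big_expected, big_actual = 1e6, 1e6 + 1e-4
print(within_absolute(big_expected, big_actual, 1e-12))  # False
print(within_relative(big_expected, big_actual, 1e-9))   # True

# math.isclose defaults to rel_tol=1e-09 and abs_tol=0.0, so comparisons
# against zero need an explicit abs_tol.
print(math.isclose(expected, actual))                 # False
print(math.isclose(expected, actual, abs_tol=1e-12))  # True
```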
+
+## When should we test?
+
+* Always!
+* Early, rather than waiting until after we've used it to generate data for our important paper, or given it to someone else to use.
+* Often, so that we know that any changes we've made to our code, or to things that our code needs (e.g. libraries, configuration files etc.) haven't introduced any bugs.
+
+How much is enough? 
+
+What we know about software development - we can't test everything. "It is nearly impossible to test software at the level of 100 percent of its logic paths", fact 32 in R. L. Glass (2002).
+
+No excuse for testing nothing! Learn by experience, like writing a paper.
+
+Review tests, like code, to avoid
+
+* Pass when they should fail, false positives.
+* Fail when they should pass, false negatives.
+* Don't test anything. 
+
+Example.
+
+    def test_critical_correctness():
+        # TODO - will complete this tomorrow!
+        pass
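
A placeholder like this is reported as a pass, which hides the fact that nothing is checked. One option, sketched below with the standard `unittest` module, is to mark unfinished tests as skipped so that they stay visible in every test report; `unittest` counts skips separately from passes. The class name is illustrative.

```python
import unittest


class TestCritical(unittest.TestCase):

    @unittest.skip("TODO: not written yet")
    def test_critical_correctness(self):
        pass  # never executed while the skip decorator is present
```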
+
+## Summary
+
+Testing
+
+* Saves time.
+* Gives confidence that code does what we want and expect it to.
+* Promotes trust that code, and so research, is correct.
+
+Remember [Geoffrey Chang](http://en.wikipedia.org/wiki/Geoffrey_Chang)
+
+Bruce Eckel, [Thinking in Java, 3rd Edition](http://www.mindview.net/Books/TIJ/),  "If it's not tested, it's broken".
+
+## Links
+
+* [Software Carpentry](http://software-carpentry.org/)'s online [testing](http://software-carpentry.org/4_0/test/index.html) lectures.
+* A discussion on [is it worthwhile to write unit tests for scientific research codes?](http://scicomp.stackexchange.com/questions/206/is-it-worthwhile-to-write-unit-tests-for-scientific-research-codes)
+* G. Wilson, D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, K. Huff, I. M. Mitchell, M. Plumbley, B. Waugh, E. P. White, P. Wilson (2012) "[Best Practices for Scientific Computing](http://arxiv.org/abs/1210.0530)", arXiv:1210.0530 [cs.MS].
diff --git a/testing/mean.py b/testing/mean.py

new file mode 100644 (file)

index 0000000..7b419d6
--- /dev/null
+++ b/testing/mean.py
@@ -0,0 +1,11 @@
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The list was not numbers.")
+    except Exception:
+        print("Something unknown happened with the list.")
+        raise
+    # float() avoids truncating integer division in Python 2, so mean([1, 2]) is 1.5
+    return float(total) / length
diff --git a/testing/morse.py b/testing/morse.py

new file mode 100644 (file)

index 0000000..16a10a8
--- /dev/null
+++ b/testing/morse.py
@@ -0,0 +1,82 @@
+
+import sys
+
+class MorseTranslator:
+    """This class can translate to and from morse code."""
+    def __init__(self):
+        self._letter_to_morse = {'a':'.-', 'b':'-...', 'c':'-.-.', 'd':'-..', 'e':'.', 'f':'..-.',
+                                 'g':'--.', 'h':'....', 'i':'..', 'j':'.---', 'k':'-.-', 'l':'.-..', 'm':'--',
+                                 'n':'-.', 'o':'---', 'p':'.--.', 'q':'--.-', 'r':'.-.', 's':'...', 't':'-',
+                                 'u':'..-', 'v':'...-', 'w':'.--', 'x':'-..-', 'y':'-.--', 'z':'--..',
+                                 '0':'-----', '1':'.----', '2':'..---', '3':'...--', '4':'....-',
+                                 '5':'.....', '6':'-....', '7':'--...', '8':'---..', '9':'----.',
+                                 ' ':'/', '':'' }
+
+        # Build the reverse mapping (morse -> letter) used by decode()
+        self._morse_to_letter = {}
+
+        for letter in self._letter_to_morse:
+            morse = self._letter_to_morse[letter]
+            self._morse_to_letter[morse] = letter
+
+    def encode(self, message):
+        """This function encodes the passed message into morse,
+           and returns the morse code string"""
+        morse = []
+
+        for letter in message:
+            letter = letter.lower()
+            morse.append(self._letter_to_morse[letter])
+
+        return " ".join(morse)
+
+    def decode(self, message):
+        """This function decodes the passed morse code message
+           and returns a string containing the decoded message"""
+
+        english = []
+
+        # Now we cannot read by letter. We know that morse letters are
+        # separated by a space, so we split the morse string by spaces
+        morse_letters = message.split(" ")
+
+        for letter in morse_letters:
+            english.append(self._morse_to_letter[letter])
+
+        # Rejoin, but now we don't need to add any spaces
+        return "".join(english)
+
+if __name__ == "__main__":
+    translator = MorseTranslator()
+    while True:
+        sys.stdout.write("Instruction (encode, decode, quit) :-> ")
+        sys.stdout.flush()
+
+        # Read a line from standard input; an empty string means end of input
+        line = sys.stdin.readline()
+        if not line:
+            break
+        line = line.rstrip()
+
+        # the first line should be either "encode", "decode"
+        # or "quit" to tell us what to do next...
+        if line == "encode":
+            # read the line to be encoded
+            message = sys.stdin.readline().rstrip()
+
+            print("Message is '%s'" % message)
+            print("Encoded is '%s'" % translator.encode(message))
+
+        elif line == "decode":
+            # read the morse to be decoded
+            message = sys.stdin.readline().rstrip()
+
+            print("Morse is   '%s'" % message)
+            print("Decoded is '%s'" % translator.decode(message))
+
+        elif line == "quit":
+            print("Exiting...")
+            break
+        else:
+            print("Cannot understand '%s'. Instruction should be 'encode', 'decode' or 'quit'." % line)
+
diff --git a/testing/morse_test.py b/testing/morse_test.py

new file mode 100644 (file)

index 0000000..ffb59e9
--- /dev/null
+++ b/testing/morse_test.py
@@ -0,0 +1,14 @@
+from morse import MorseTranslator
+
+class TestMorseTranslator:
+
+    def test(self):
+        translator = MorseTranslator()
+        assert "... --- ..." == translator.encode("SOS")
+        assert "sos" == translator.decode("... --- ...")
+        print("OK")
+
+if __name__ == "__main__":
+
+    test_translator = TestMorseTranslator()
+    test_translator.test()
+\ No newline at end of file
diff --git a/testing/test_mean.py b/testing/test_mean.py

new file mode 100644 (file)

index 0000000..809686a
--- /dev/null
+++ b/testing/test_mean.py
@@ -0,0 +1,21 @@
+from nose.tools import assert_equal, assert_almost_equal, assert_true, \
+    assert_false, assert_raises, assert_is_instance
+
+from mean import mean
+
+def test_mean1():
+    obs = mean([0, 0, 0, 0])
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = mean([0, 200])
+    exp = 100
+    assert_equal(obs, exp)
+
+    obs = mean([0, -200])
+    exp = -100
+    assert_equal(obs, exp)
+
+    obs = mean([0]) 
+    exp = 0
+    assert_equal(obs, exp)
diff --git a/testing/test_prod.jpg b/testing/test_prod.jpg

new file mode 100644 (file)

index 0000000..6970075

Binary files /dev/null and b/testing/test_prod.jpg differ
testing/README.md	[changed mode: 0755->0644]
testing/additional_notes.md	[new file with mode: 0755]
testing/mean.py	[new file with mode: 0644]
testing/morse.py	[new file with mode: 0644]
testing/morse_test.py	[new file with mode: 0644]
testing/test_mean.py	[new file with mode: 0644]
testing/test_prod.jpg	[new file with mode: 0644]