Merge pull request #34 from pipitone/swc_exercises

author Ethan White <ethan@weecology.org>

Sun, 14 Apr 2013 19:25:10 +0000 (12:25 -0700)

committer W. Trevor King <wking@tremily.us>

Fri, 1 Nov 2013 16:39:03 +0000 (09:39 -0700)
author Ethan White <ethan@weecology.org>
Sun, 14 Apr 2013 19:25:10 +0000 (12:25 -0700)
committer W. Trevor King <wking@tremily.us>
Fri, 1 Nov 2013 16:39:03 +0000 (09:39 -0700)
diff --git a/python/testing/Readme.md b/python/testing/Readme.md

new file mode 100644 (file)

index 0000000..8d53b87
--- /dev/null
+++ b/python/testing/Readme.md
@@ -0,0 +1,550 @@
+# Testing
+
+* * * * *
+
+**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
+Scopatz**
+
+![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)
+# What is testing?
+
+Software testing is a process by which one or more expected behaviors
+and results from a piece of software are exercised and confirmed. Well
+chosen tests will confirm expected code behavior for the extreme
+boundaries of the input domains, output ranges, parametric combinations,
+and other behavioral **edge cases**.
+
+# Why test software?
+
+Unless you write flawless, bug-free, perfectly accurate, fully precise,
+and predictable code **every time**, you must test your code in order to
+trust it enough to answer in the affirmative to at least a few of the
+following questions:
+
+-   Does your code work?
+-   **Always?**
+-   Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
+-   Does it continue to work after changes are made?
+-   Does it continue to work after system configurations or libraries
+    are upgraded?
+-   Does it respond properly for a full range of input parameters?
+-   What about **edge or corner cases**?
+-   What's the limit on that input parameter?
+-   How will it affect your
+    [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?
+
+## Verification
+
+*Verification* is the process of asking, "Have we built the software
+correctly?" That is, is the code bug free, precise, accurate, and
+repeatable?
+
+## Validation
+
+*Validation* is the process of asking, "Have we built the right
+software?" That is, is the code designed in such a way as to produce the
+answers we are interested in, data we want, etc.
+
+## Uncertainty Quantification
+
+*Uncertainty Quantification* is the process of asking, "Given that our
+algorithm may not be deterministic, was our execution within acceptable
+error bounds?" This is particularly important for anything which uses
+random numbers, eg Monte Carlo methods.
+
+# Where are tests?
+
+Say we have an averaging function:
+
+```python
+def mean(numlist):
+    total = sum(numlist)
+    length = len(numlist)
+    return total/length
+```
+
+Tests could be implemented as runtime **exceptions in the function**:
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except:
+        print "There was a problem evaluating the number list."
+    return total/length
+```
+
+Sometimes tests they are functions alongside the function definitions
+they are testing.
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except:
+        print "There was a problem evaluating the number list."
+    return total/length
+
+
+def test_mean():
+    assert mean([0, 0, 0, 0]) == 0
+    assert mean([0, 200]) == 100
+    assert mean([0, -200]) == -100
+    assert mean([0]) == 0
+
+
+def test_floating_mean():
+    assert mean([1, 2]) == 1.5
+```
+
+Sometimes they are in an executable independent of the main executable.
+
+```python
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except:
+        print "There was a problem evaluating the number list."
+    return total/length
+```
+
+Where, in a different file exists a test module:
+
+```python
+import mean
+
+def test_mean():
+    assert mean([0, 0, 0, 0]) == 0
+    assert mean([0, 200]) == 100
+    assert mean([0, -200]) == -100
+    assert mean([0]) == 0
+
+
+def test_floating_mean():
+    assert mean([1, 2]) == 1.5
+```
+
+# When should we test?
+
+The three right answers are:
+
+-   **ALWAYS!**
+-   **EARLY!**
+-   **OFTEN!**
+
+The longer answer is that testing either before or after your software
+is written will improve your code, but testing after your program is
+used for something important is too late.
+
+If we have a robust set of tests, we can run them before adding
+something new and after adding something new. If the tests give the same
+results (as appropriate), we can have some assurance that we didn't
+wreak anything. The same idea applies to making changes in your system
+configuration, updating support codes, etc.
+
+Another important feature of testing is that it helps you remember what
+all the parts of your code do. If you are working on a large project
+over three years and you end up with 200 classes, it may be hard to
+remember what the widget class does in detail. If you have a test that
+checks all of the widget's functionality, you can look at the test to
+remember what it's supposed to do.
+
+# Who should test?
+
+In a collaborative coding environment, where many developers contribute
+to the same code base, developers should be responsible individually for
+testing the functions they create and collectively for testing the code
+as a whole.
+
+Professionals often test their code, and take pride in test coverage,
+the percent of their functions that they feel confident are
+comprehensively tested.
+
+# How are tests written?
+
+The type of tests that are written is determined by the testing
+framework you adopt. Don't worry, there are a lot of choices.
+
+## Types of Tests
+
+**Exceptions:** Exceptions can be thought of as type of runtime test.
+They alert the user to exceptional behavior in the code. Often,
+exceptions are related to functions that depend on input that is unknown
+at compile time. Checks that occur within the code to handle exceptional
+behavior that results from this type of input are called Exceptions.
+
+**Unit Tests:** Unit tests are a type of test which test the fundamental
+units of a program's functionality. Often, this is on the class or
+function level of detail. However what defines a *code unit* is not
+formally defined.
+
+To test functions and classes, the interfaces (API) - rather than the
+implementation - should be tested. Treating the implementation as a
+black box, we can probe the expected behavior with boundary cases for
+the inputs.
+
+**System Tests:** System level tests are intended to test the code as a
+whole. As opposed to unit tests, system tests ask for the behavior as a
+whole. This sort of testing involves comparison with other validated
+codes, analytical solutions, etc.
+
+**Regression Tests:** A regression test ensures that new code does
+change anything. If you change the default answer, for example, or add a
+new question, you'll need to make sure that missing entries are still
+found and fixed.
+
+**Integration Tests:** Integration tests query the ability of the code
+to integrate well with the system configuration and third party
+libraries and modules. This type of test is essential for codes that
+depend on libraries which might be updated independently of your code or
+when your code might be used by a number of users who may have various
+versions of libraries.
+
+**Test Suites:** Putting a series of unit tests into a collection of
+modules creates, a test suite. Typically the suite as a whole is
+executed (rather than each test individually) when verifying that the
+code base still functions after changes have been made.
+
+# Elements of a Test
+
+**Behavior:** The behavior you want to test. For example, you might want
+to test the fun() function.
+
+**Expected Result:** This might be a single number, a range of numbers,
+a new fully defined object, a system state, an exception, etc. When we
+run the fun() function, we expect to generate some fun. If we don't
+generate any fun, the fun() function should fail its test.
+Alternatively, if it does create some fun, the fun() function should
+pass this test. The the expected result should known *a priori*. For
+numerical functions, this is result is ideally analytically determined
+even if the function being tested isn't.
+
+**Assertions:** Require that some conditional be true. If the
+conditional is false, the test fails.
+
+**Fixtures:** Sometimes you have to do some legwork to create the
+objects that are necessary to run one or many tests. These objects are
+called fixtures as they are not really part of the test themselves but
+rather involve getting the computer into the appropriate state.
+
+For example, since fun varies a lot between people, the fun() function
+is a method of the Person class. In order to check the fun function,
+then, we need to create an appropriate Person object on which to run
+fun().
+
+**Setup and teardown:** Creating fixtures is often done in a call to a
+setup function. Deleting them and other cleanup is done in a teardown
+function.
+
+**The Big Picture:** Putting all this together, the testing algorithm is
+often:
+
+```python
+setup()
+test()
+teardown()
+```
+
+But, sometimes it's the case that your tests change the fixtures. If so,
+it's better for the setup() and teardown() functions to occur on either
+side of each test. In that case, the testing algorithm should be:
+
+```python
+setup()
+test1()
+teardown()
+
+setup()
+test2()
+teardown()
+
+setup()
+test3()
+teardown()
+```
+
+* * * * *
+
+# Nose: A Python Testing Framework
+
+The testing framework we'll discuss today is called nose. However, there
+are several other testing frameworks available in most language. Most
+notably there is [JUnit](http://www.junit.org/) in Java which can
+arguably attributed to inventing the testing framework.
+
+## Where do nose tests live?
+
+Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
+`test_`. Specifically, these satisfy the testMatch regular expression
+`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
+in the unittest.TestCase subclasses chat you create in your code. You
+can also create test functions which are not unittest.TestCase
+subclasses if they are named with the configured testMatch regular
+expression.)
+
+## Nose Test Syntax
+
+To write a nose test, we make assertions.
+
+```python
+assert should_be_true()
+assert not should_not_be_true()
+```
+
+Additionally, nose itself defines number of assert functions which can
+be used to test more specific aspects of the code base.
+
+```python
+from nose.tools import *
+
+assert_equal(a, b)
+assert_almost_equal(a, b)
+assert_true(a)
+assert_false(a)
+assert_raises(exception, func, *args, **kwargs)
+assert_is_instance(a, b)
+# and many more!
+```
+
+Moreover, numpy offers similar testing functions for arrays:
+
+```python
+from numpy.testing import *
+
+assert_array_equal(a, b)
+assert_array_almost_equal(a, b)
+# etc.
+```
+
+## Exercise: Writing tests for mean()
+
+There are a few tests for the mean() function that we listed in this
+lesson. What are some tests that should fail? Add at least three test
+cases to this set. Edit the `test_mean.py` file which tests the mean()
+function in `mean.py`.
+
+*Hint:* Think about what form your input could take and what you should
+do to handle it. Also, think about the type of the elements in the list.
+What should be done if you pass a list of integers? What if you pass a
+list of strings?
+
+**Example**:
+
+    nosetests test_mean.py
+
+# Test Driven Development
+
+Test driven development (TDD) is a philosophy whereby the developer
+creates code by **writing the tests first**. That is to say you write the
+tests *before* writing the associated code!
+
+This is an iterative process whereby you write a test then write the
+minimum amount code to make the test pass. If a new feature is needed,
+another test is written and the code is expanded to meet this new use
+case. This continues until the code does what is needed.
+
+TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
+who diligently follow TDD swear by its effectiveness. This development
+style was put forth most strongly by [Kent Beck in
+2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
+
+## A TDD Example
+
+Say you want to write a fib() function which generates values of the
+Fibonacci sequence of given indexes. You would - of course - start by
+writing the test, possibly testing a single value:
+
+```python
+from nose.tools import assert_equal
+
+from pisa import fib
+
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+```
+
+You would *then* go ahead and write the actual function:
+
+```python
+def fib(n):
+    # you snarky so-and-so
+    return 1
+```
+
+And that is it right?! Well, not quite. This implementation fails for
+most other values. Adding tests we see that:
+
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+```
+
+This extra test now requires that we bother to implement at least the
+initial values:
+
+```python
+def fib(n):
+    # a little better
+    if n == 0 or n == 1:
+        return n
+    return 1
+```
+
+However, this function still falls over for `2 < n`. Time for more
+tests!
+
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
+
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
+```
+
+At this point, we had better go ahead and try do the right thing...
+
+```python
+def fib(n):
+    # finally, some math
+    if n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
+```
+
+Here it becomes very tempting to take an extended coffee break or
+possibly a power lunch. But then you remember those pesky negative
+numbers and floats. Perhaps the right thing to do here is to just be
+undefined.
+
+```python
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
+
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
+
+
+def test_fib3():
+    obs = fib(13.37)
+    exp = NotImplemented
+    assert_equal(obs, exp)
+
+    obs = fib(-9000)
+    exp = NotImplemented
+    assert_equal(obs, exp)
+```
+
+This means that it is time to add the appropriate case to the function
+itself:
+
+```python
+def fib(n):
+    # sequence and you shall find
+    if n < 0 or int(n) != n:
+        return NotImplemented
+    elif n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
+```
+
+# Quality Assurance Exercise
+
+Can you think of other tests to make for the fibonacci function? I promise there 
+are at least two. 
+
+Implement one new test in test_fib.py, run nosetests, and if it fails, implement 
+a more robust function for that case.
+
+And thus - finally - we have a robust function together with working
+tests!
+
+# Exercise
+
+**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
+define a line segment. Additionally there exists experimental data which
+can be anywhere in the domain. Find the data point which is closest to
+the line segment.
+
+In the `close_line.py` file there are four different implementations
+which all solve this problem. [You can read more about them
+here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
+there are no tests! Please write from scratch a `test_close_line.py`
+file which tests the closest\_data\_to\_line() functions.
+
+*Hint:* you can use one implementation function to test another. Below
+is some sample data to help you get started.
+
+![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)
+> -
+
+```python
+import numpy as np
+
+p1 = np.array([0.0, 0.0])
+p2 = np.array([1.0, 1.0])
+data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
+```
+
diff --git a/python/testing/close_line.py b/python/testing/close_line.py

new file mode 100644 (file)

index 0000000..a6a5c5e
--- /dev/null
+++ b/python/testing/close_line.py
@@ -0,0 +1,69 @@
+import numpy as np
+from scipy.optimize import fmin
+
+#
+# Attempt 1
+#
+
+def point_on_line1(x, p1, p2):
+    y = p1[1] + (x - p1[0])*(p2[1] - p1[1]) / (p2[0] - p1[0])
+    return np.array([x, y])
+
+
+def dist_from_line1(x, pdata, p1, p2):
+    pline = point_on_line1(x, p1, p2)
+    return np.sqrt(np.sum((pline - pdata)**2))
+
+
+def closest_data_to_line1(data, p1, p2):
+    dists = np.empty(len(data), dtype=float)
+    for i, pdata in enumerate(data):
+        x = fmin(dist_from_line1, p1[0], (pdata, p1, p2), disp=False)[0]
+        dists[i] = dist_from_line1(x, pdata, p1, p2)
+    imin = np.argmin(dists)
+    return imin, data[imin]
+
+
+#
+# Attempt 2
+#
+
+def dist_from_line2(pdata, p1, p2):
+    a = np.sqrt(np.sum((p1 - pdata)**2))
+    b = np.sqrt(np.sum((p2 - pdata)**2))
+    c = np.sqrt(np.sum((p2 - p1)**2))
+    h = a * np.sqrt(1.0 - ((a**2 + c**2 - b**2) / (2.0 * a * c))**2)
+    return h
+
+def closest_data_to_line2(data, p1, p2):
+    dists = np.empty(len(data), dtype=float)
+    for i, pdata in enumerate(data):
+        dists[i] = dist_from_line2(pdata, p1, p2)
+    imin = np.argmin(dists)
+    return imin, data[imin]
+
+#
+# Attempt 3
+#
+
+def perimeter3(pdata, p1, p2):
+    a = np.sqrt(np.sum((p1 - pdata)**2))
+    b = np.sqrt(np.sum((p2 - pdata)**2))
+    c = np.sqrt(np.sum((p2 - p1)**2))
+    return (a + b + c)
+
+def closest_data_to_line3(data, p1, p2):
+    peris = np.empty(len(data), dtype=float)
+    for i, pdata in enumerate(data):
+        peris[i] = perimeter3(pdata, p1, p2)
+    imin = np.argmin(peris)
+    return imin, data[imin]
+
+#
+# Attempt 4
+#
+
+def closest_data_to_line4(data, p1, p2):
+    return data[np.argmin(np.sqrt(np.sum((p1 - data)**2, axis=1)) + \
+                np.sqrt(np.sum((p2 - data)**2, axis=1)))]
+
diff --git a/python/testing/evo_sol1.png b/python/testing/evo_sol1.png

new file mode 100644 (file)

index 0000000..cb131bf

Binary files /dev/null and b/python/testing/evo_sol1.png differ
diff --git a/python/testing/mean.py b/python/testing/mean.py

new file mode 100644 (file)

index 0000000..7b419d6
--- /dev/null
+++ b/python/testing/mean.py
@@ -0,0 +1,9 @@
+def mean(numlist):
+    try :
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError :
+        raise TypeError("The list was not numbers.")
+    except :
+        print "Something unknown happened with the list."
+    return total/length
diff --git a/python/testing/test_mean.py b/python/testing/test_mean.py

new file mode 100644 (file)

index 0000000..44c3b7c
--- /dev/null
+++ b/python/testing/test_mean.py
@@ -0,0 +1,26 @@
+from nose.tools import assert_equal, assert_almost_equal, assert_true, \
+    assert_false, assert_raises, assert_is_instance
+
+from mean import mean
+
+def test_mean1():
+    obs = mean([0, 0, 0, 0])
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = mean([0, 200])
+    exp = 100
+    assert_equal(obs, exp)
+
+    obs = mean([0, -200])
+    exp = -100
+    assert_equal(obs, exp)
+
+    obs = mean([0]) 
+    exp = 0
+    assert_equal(obs, exp)
+
+def test_floating_mean1():
+    obs = mean([1, 2])
+    exp = 1.5
+    assert_equal(obs, exp)
diff --git a/python/testing/test_prod.jpg b/python/testing/test_prod.jpg

new file mode 100644 (file)

index 0000000..6970075

Binary files /dev/null and b/python/testing/test_prod.jpg differ
diff --git a/testing/cheat-sheet.md b/testing/cheat-sheet.md

new file mode 100644 (file)

index 0000000..dbc03e0
--- /dev/null
+++ b/testing/cheat-sheet.md
@@ -0,0 +1,50 @@
+Testing Cheat Sheet
+===================
+
+Terminology
+-----------
+
+* A *unit test* acts on an isolated component within an application.
+* A *test fixture* is the input data that a test acts on.
+* The *interface* of a function is its public face, defined by its
+  input and output.
+* The *implementation* of a function is how it gets from the input to
+  the output.
+* A *stub* is a very simple implementation of one function that is
+  used in a test of a different function.
+
+Unit Testing
+------------
+
+* A *normal case* is a test case that reflects what is expected to be
+  typical usage of a function.
+* A *boundary case* is a test case that reflects a less typical but
+  potentially troublesome type of usage.
+
+Exceptions
+----------
+
+* raise
+* catch
+* define
+
+Assertions
+----------
+
+* syntax
+* stops execution
+
+Unittest
+--------
+
+* extending TestCase
+* assertions, e.g. self.assertEquals
+
+Nose
+----
+
+* invocation: nosetests
+* naming conventions: test_*
+* fixtures: setup
+* per-test fixtures with @with_setup decorator
+* assertions, e.g. assert_equal, assert_almost_equal...
diff --git a/testing/outline.md b/testing/outline.md

new file mode 100644 (file)

index 0000000..8a60783
--- /dev/null
+++ b/testing/outline.md
@@ -0,0 +1,7 @@
+Testing Outline
+===============
+
+* Testing: how to save time and write better code
+* Exceptions: how to handle errors and detect bugs
+* Unit tests: how to isolate components of your code and make sure they work
+* Floating-point numbers: how to test results that have limited precision
author	Ethan White <ethan@weecology.org>
	Sun, 14 Apr 2013 19:25:10 +0000 (12:25 -0700)
committer	W. Trevor King <wking@tremily.us>
	Fri, 1 Nov 2013 16:39:03 +0000 (09:39 -0700)
python/testing/Readme.md	[new file with mode: 0644]	patch \| blob
python/testing/close_line.py	[new file with mode: 0644]	patch \| blob
python/testing/evo_sol1.png	[new file with mode: 0644]	patch \| blob
python/testing/mean.py	[new file with mode: 0644]	patch \| blob
python/testing/test_mean.py	[new file with mode: 0644]	patch \| blob
python/testing/test_prod.jpg	[new file with mode: 0644]	patch \| blob
testing/cheat-sheet.md	[new file with mode: 0644]	patch \| blob
testing/outline.md	[new file with mode: 0644]	patch \| blob