From: W. Trevor King
Date: Thu, 27 Jun 2013 11:04:49 +0000 (-0400)
Subject: testing/nose: Restructure to split out examples
X-Git-Url: http://git.tremily.us/?p=swc-testing-nose.git;a=commitdiff_plain;h=bc46273316ff9c36091042a2bd9b035f4daa1e82

testing/nose: Restructure to split out examples
---

diff --git a/testing/nose/README.md b/testing/nose/README.md
new file mode 100644
index 0000000..b1e949a
--- /dev/null
+++ b/testing/nose/README.md
@@ -0,0 +1,52 @@
+# Testing
+
+![image](media/test-in-production.jpg)
+
+# What is testing?
+
+Software testing is a process by which one or more expected behaviors
+and results from a piece of software are exercised and confirmed.
+Well-chosen tests will confirm expected code behavior for the extreme
+boundaries of the input domains, output ranges, parametric combinations,
+and other behavioral **edge cases**.
+
+# Why test software?
+
+Unless you write flawless, bug-free, perfectly accurate, fully precise,
+and predictable code **every time**, you must test your code in order to
+trust it enough to answer in the affirmative to at least a few of the
+following questions:
+
+- Does your code work?
+- **Always?**
+- Does it do what you think it does? ([Patriot Missile Failure][patriot])
+- Does it continue to work after changes are made?
+- Does it continue to work after system configurations or libraries
+  are upgraded?
+- Does it respond properly for a full range of input parameters?
+- What's the limit on that input parameter?
+- What about **edge or corner cases**?
+- How will it affect your [publications][]?
+
+## Verification
+
+*Verification* is the process of asking, "Have we built the software
+correctly?" That is, is the code bug-free, precise, accurate, and
+repeatable?
+
+## Validation
+
+*Validation* is the process of asking, "Have we built the right
+software?" That is, is the code designed in such a way as to produce the
+answers we are interested in, the data we want, etc.?
+
+## Uncertainty Quantification
+
+*Uncertainty quantification* is the process of asking, "Given that our
+algorithm may not be deterministic, was our execution within acceptable
+error bounds?" This is particularly important for anything which uses
+random numbers, e.g., Monte Carlo methods.
+
+
+[patriot]: http://www.ima.umn.edu/~arnold/disasters/patriot.html
+[publications]: http://www.nature.com/news/2010/101013/full/467775a.html
diff --git a/testing/nose/exercises/close-line/README.md b/testing/nose/exercises/close-line/README.md
new file mode 100644
index 0000000..73af9bb
--- /dev/null
+++ b/testing/nose/exercises/close-line/README.md
@@ -0,0 +1,18 @@
+**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
+define a line segment. Additionally, there exists experimental data which
+can be anywhere in the domain. Find the data point which is closest to
+the line segment.
+
+In the [close_line.py][close-line] file there are four different
+implementations which all solve this problem. [You can read more about
+them here][evolution-of-a-solution]. However, there are no tests!
+Please write from scratch a `test_close_line.py` file which tests the
+`closest_data_to_line()` functions.
+
+*Hint:* you can use one implementation function to test another. Below
+is some sample data to help you get started.
+
+![image](../../media/evolution-of-a-solution-1.png)
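+
+For instance, a first test could feed this sample data to two of the
+implementations and require that they agree. The sketch below is only a
+starting point, not part of the exercise files; the import and the
+`closest_data_to_line(data, p1, p2)` signature are assumptions, so
+adjust both to match the names you actually find in
+[close_line.py][close-line].
+
+```python
+import numpy as np
+from numpy.testing import assert_array_almost_equal
+
+# hypothetical names; use the real ones from close_line.py
+from close_line import closest_data_to_line, closest_data_to_line2
+
+
+def test_implementations_agree():
+    p1 = np.array([0.0, 0.0])
+    p2 = np.array([1.0, 1.0])
+    data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
+    # one implementation serves as the oracle for the other
+    obs = closest_data_to_line(data, p1, p2)
+    exp = closest_data_to_line2(data, p1, p2)
+    assert_array_almost_equal(obs, exp)
+```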
+
+[close-line]: close_line.py
+[evolution-of-a-solution]: http://inscight.org/2012/03/31/evolution_of_a_solution/
diff --git a/testing/nose/exercises/close_line.py b/testing/nose/exercises/close-line/close_line.py
similarity index 100%
rename from testing/nose/exercises/close_line.py
rename to testing/nose/exercises/close-line/close_line.py
diff --git a/testing/nose/exercises/close-line/test_close_line.py b/testing/nose/exercises/close-line/test_close_line.py
new file mode 100644
index 0000000..cdea143
--- /dev/null
+++ b/testing/nose/exercises/close-line/test_close_line.py
@@ -0,0 +1,7 @@
+import numpy as np
+
+
+# generate some sample data
+p1 = np.array([0.0, 0.0])
+p2 = np.array([1.0, 1.0])
+data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
diff --git a/testing/nose/exercises/fibonacci/1.1.one/test_fibonacci.py b/testing/nose/exercises/fibonacci/1.1.one/test_fibonacci.py
new file mode 100644
index 0000000..774ab87
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/1.1.one/test_fibonacci.py
@@ -0,0 +1,9 @@
+from nose.tools import assert_equal
+
+from fibonacci import fib
+
+
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
diff --git a/testing/nose/exercises/fibonacci/1.2.one/fibonacci.py b/testing/nose/exercises/fibonacci/1.2.one/fibonacci.py
new file mode 100644
index 0000000..83679d0
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/1.2.one/fibonacci.py
@@ -0,0 +1,3 @@
+def fib(n):
+    # you snarky so-and-so
+    return 1
diff --git a/testing/nose/exercises/fibonacci/1.2.one/test_fibonacci.py b/testing/nose/exercises/fibonacci/1.2.one/test_fibonacci.py
new file mode 120000
index 0000000..63114e3
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/1.2.one/test_fibonacci.py
@@ -0,0 +1 @@
+../1.1.one/test_fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/2.1.zero/fibonacci.py b/testing/nose/exercises/fibonacci/2.1.zero/fibonacci.py
new file mode 120000
index 0000000..5180f22
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/2.1.zero/fibonacci.py
@@ -0,0 +1 @@
+../1.2.one/fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/2.1.zero/test_fibonacci.py b/testing/nose/exercises/fibonacci/2.1.zero/test_fibonacci.py
new file mode 100644
index 0000000..566edb6
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/2.1.zero/test_fibonacci.py
@@ -0,0 +1,19 @@
+from nose.tools import assert_equal
+
+from fibonacci import fib
+
+
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
diff --git a/testing/nose/exercises/fibonacci/2.2.zero/fibonacci.py b/testing/nose/exercises/fibonacci/2.2.zero/fibonacci.py
new file mode 100644
index 0000000..39d8a5d
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/2.2.zero/fibonacci.py
@@ -0,0 +1,5 @@
+def fib(n):
+    # a little better
+    if n == 0 or n == 1:
+        return n
+    return 1
diff --git a/testing/nose/exercises/fibonacci/2.2.zero/test_fibonacci.py b/testing/nose/exercises/fibonacci/2.2.zero/test_fibonacci.py
new file mode 120000
index 0000000..90d7475
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/2.2.zero/test_fibonacci.py
@@ -0,0 +1 @@
+../2.1.zero/test_fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/3.1.natural/fibonacci.py b/testing/nose/exercises/fibonacci/3.1.natural/fibonacci.py
new file mode 120000
index 0000000..1f9bfb4
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/3.1.natural/fibonacci.py
@@ -0,0 +1 @@
+../2.2.zero/fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/3.1.natural/test_fibonacci.py b/testing/nose/exercises/fibonacci/3.1.natural/test_fibonacci.py
new file mode 100644
index 0000000..c1e938e
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/3.1.natural/test_fibonacci.py
@@ -0,0 +1,29 @@
+from nose.tools import assert_equal
+
+from fibonacci import fib
+
+
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
+
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
diff --git a/testing/nose/exercises/fibonacci/3.2.natural/fibonacci.py b/testing/nose/exercises/fibonacci/3.2.natural/fibonacci.py
new file mode 100644
index 0000000..f47cb1c
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/3.2.natural/fibonacci.py
@@ -0,0 +1,6 @@
+def fib(n):
+    # finally, some math
+    if n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
diff --git a/testing/nose/exercises/fibonacci/3.2.natural/test_fibonacci.py b/testing/nose/exercises/fibonacci/3.2.natural/test_fibonacci.py
new file mode 120000
index 0000000..87b2e35
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/3.2.natural/test_fibonacci.py
@@ -0,0 +1 @@
+../3.1.natural/test_fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/4.1.other/fibonacci.py b/testing/nose/exercises/fibonacci/4.1.other/fibonacci.py
new file mode 120000
index 0000000..6dd99ac
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/4.1.other/fibonacci.py
@@ -0,0 +1 @@
+../3.2.natural/fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/4.1.other/test_fibonacci.py b/testing/nose/exercises/fibonacci/4.1.other/test_fibonacci.py
new file mode 100644
index 0000000..318d5dc
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/4.1.other/test_fibonacci.py
@@ -0,0 +1,39 @@
+from nose.tools import assert_equal
+
+from fibonacci import fib
+
+
+def test_fib1():
+    obs = fib(2)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib2():
+    obs = fib(0)
+    exp = 0
+    assert_equal(obs, exp)
+
+    obs = fib(1)
+    exp = 1
+    assert_equal(obs, exp)
+
+
+def test_fib3():
+    obs = fib(3)
+    exp = 2
+    assert_equal(obs, exp)
+
+    obs = fib(6)
+    exp = 8
+    assert_equal(obs, exp)
+
+
+def test_fib4():
+    obs = fib(13.37)
+    exp = NotImplemented
+    assert_equal(obs, exp)
+
+    obs = fib(-9000)
+    exp = NotImplemented
+    assert_equal(obs, exp)
diff --git a/testing/nose/exercises/fibonacci/4.2.other/fibonacci.py b/testing/nose/exercises/fibonacci/4.2.other/fibonacci.py
new file mode 100644
index 0000000..0586344
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/4.2.other/fibonacci.py
@@ -0,0 +1,8 @@
+def fib(n):
+    # sequence and you shall find
+    if n < 0 or int(n) != n:
+        return NotImplemented
+    elif n == 0 or n == 1:
+        return n
+    else:
+        return fib(n - 1) + fib(n - 2)
diff --git a/testing/nose/exercises/fibonacci/4.2.other/test_fibonacci.py b/testing/nose/exercises/fibonacci/4.2.other/test_fibonacci.py
new file mode 120000
index 0000000..2c1d890
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/4.2.other/test_fibonacci.py
@@ -0,0 +1 @@
+../4.1.other/test_fibonacci.py
\ No newline at end of file
diff --git a/testing/nose/exercises/fibonacci/README.md b/testing/nose/exercises/fibonacci/README.md
new file mode 100644
index 0000000..e0c31ee
--- /dev/null
+++ b/testing/nose/exercises/fibonacci/README.md
@@ -0,0 +1,12 @@
+Test-driven development (TDD) of the Fibonacci series. In the first
+part of each phase (1: one, 2: zero, 3: natural, 4: other), we extend
+the tests to be more robust. Then we improve the implementation until
+the tests pass. Repeat as needed ;).
+
+    $ cd 1.1.one                   # add some tests
+    $ nosetests test_fibonacci.py  # fails
+    $ cd ../1.2.one                # add an implementation
+    $ nosetests test_fibonacci.py  # passes
+    $ cd ../2.1.zero               # add better testing
+    $ nosetests test_fibonacci.py  # fails
+    …
diff --git a/testing/nose/exercises/mean/basic/mean.py b/testing/nose/exercises/mean/basic/mean.py
new file mode 100644
index 0000000..56e1263
--- /dev/null
+++ b/testing/nose/exercises/mean/basic/mean.py
@@ -0,0 +1,4 @@
+def mean(numlist):
+    total = sum(numlist)
+    length = len(numlist)
+    return total/length
diff --git a/testing/nose/exercises/mean/embedded-tests/mean.py b/testing/nose/exercises/mean/embedded-tests/mean.py
new file mode 100644
index 0000000..a54b779
--- /dev/null
+++ b/testing/nose/exercises/mean/embedded-tests/mean.py
@@ -0,0 +1,20 @@
+def mean(numlist):
+    try:
+        total = sum(numlist)
+        length = len(numlist)
+    except TypeError:
+        raise TypeError("The number list was not a list of numbers.")
+    except:
+        print "There was a problem evaluating the number list."
+    return total/length
+
+
+def test_mean():
+    assert mean([0, 0, 0, 0]) == 0
+    assert mean([0, 200]) == 100
+    assert mean([0, -200]) == -100
+    assert mean([0]) == 0
+
+
+def test_floating_mean():
+    assert mean([1, 2]) == 1.5
diff --git a/testing/nose/exercises/mean.py b/testing/nose/exercises/mean/exceptions/mean.py
similarity index 76%
rename from testing/nose/exercises/mean.py
rename to testing/nose/exercises/mean/exceptions/mean.py
index 7b419d6..6046fbf 100644
--- a/testing/nose/exercises/mean.py
+++ b/testing/nose/exercises/mean/exceptions/mean.py
@@ -3,7 +3,7 @@ def mean(numlist):
         total = sum(numlist)
         length = len(numlist)
     except TypeError :
-        raise TypeError("The list was not numbers.")
+        raise TypeError("numlist was not a list of numbers.")
     except :
         print "Something unknown happened with the list."
     return total/length
diff --git a/testing/nose/exercises/test_mean.py b/testing/nose/exercises/mean/test_mean.py
similarity index 100%
rename from testing/nose/exercises/test_mean.py
rename to testing/nose/exercises/mean/test_mean.py
diff --git a/testing/nose/instructor.md b/testing/nose/instructor.md
new file mode 100644
index 0000000..1923ea8
--- /dev/null
+++ b/testing/nose/instructor.md
@@ -0,0 +1,242 @@
+# Mean-calculation example
+
+* Basic implementation: [mean.py][basic-mean]
+* Internal exception catching: [mean.py][exception-mean]
+* Embedded tests: [mean.py][embedded-test-mean]
+* Independent tests: [test_mean.py][test-mean]
+
+# When should we test?
+
+Short answers:
+
+- **ALWAYS!**
+- **EARLY!**
+- **OFTEN!**
+
+Long answers:
+
+* Definitely before you do something important with your software
+  (e.g. publishing data generated by your program, launching a
+  satellite that depends on your software, …).
+* Before and after adding something new, to avoid accidental breakage.
+* To help remember (or, with [TDD][], define) what your code actually
+  does.
+
+# Who should test?
+
+* Write tests for the stuff you code, to convince your collaborators
+  that it works.
+* Write tests for the stuff others code, to convince yourself that it
+  works (and will continue to work).
+
+Professionals often test their code, and take pride in their test
+coverage: the percentage of their code that is actually exercised by
+their tests.
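+
+For instance, nose can measure coverage for a test run (a sketch: the
+`--with-coverage` plugin relies on the separate `coverage` package,
+which we assume is installed):
+
+    $ nosetests --with-coverage test_mean.py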
+
+# How are tests written?
+
+The type of tests that you write is determined by the testing
+framework you adopt. Don't worry: there are a lot of choices.
+
+## Types of Tests
+
+**Exceptions:** Exceptions can be thought of as a type of runtime test.
+They alert the user to exceptional behavior in the code. Often,
+exceptions are related to functions that depend on input that is unknown
+at compile time. Checks that occur within the code to handle the
+exceptional behavior resulting from this type of input are called
+exception handlers.
+
+**Unit Tests:** Unit tests are a type of test which exercise the
+fundamental units of a program's functionality. Often, this is on the
+class or function level of detail. However, what counts as a *code
+unit* is not formally defined.
+
+To test functions and classes, the interfaces (APIs) - rather than the
+implementations - should be tested. Treating the implementation as a
+black box, we can probe the expected behavior with boundary cases for
+the inputs.
+
+**System Tests:** System-level tests are intended to test the code as a
+whole. As opposed to unit tests, system tests check the behavior of the
+program as a whole. This sort of testing involves comparison with other
+validated codes, analytical solutions, etc.
+
+**Regression Tests:** A regression test ensures that new code does not
+change existing behavior. If you change the default answer, for example,
+or add a new question, you'll need to make sure that missing entries are
+still found and fixed.
+
+**Integration Tests:** Integration tests query the ability of the code
+to integrate well with the system configuration and third-party
+libraries and modules. This type of test is essential for code that
+depends on libraries which might be updated independently of your code,
+or when your code might be used by a number of users who may have
+various versions of libraries.
+
+**Test Suites:** Putting a series of unit tests into a collection of
+modules creates a test suite. Typically the suite as a whole is
+executed (rather than each test individually) when verifying that the
+code base still functions after changes have been made.
+
+# Elements of a Test
+
+**Behavior:** The behavior you want to test. For example, you might want
+to test the fun() function.
+
+**Expected Result:** This might be a single number, a range of numbers,
+a new fully defined object, a system state, an exception, etc. When we
+run the fun() function, we expect to generate some fun. If we don't
+generate any fun, the fun() function should fail its test.
+Alternatively, if it does create some fun, the fun() function should
+pass this test. The expected result should be known *a priori*. For
+numerical functions, this result is ideally determined analytically
+even if the function being tested isn't.
+
+**Assertions:** Require that some conditional be true. If the
+conditional is false, the test fails.
+
+**Fixtures:** Sometimes you have to do some legwork to create the
+objects that are necessary to run one or many tests. These objects are
+called fixtures, as they are not really part of the test themselves but
+rather involve getting the computer into the appropriate state.
+
+For example, since fun varies a lot between people, the fun() function
+is a method of the Person class. In order to check the fun() function,
+then, we need to create an appropriate Person object on which to run
+fun(), as in the sketch below.
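+
+As a minimal sketch (this `Person` class and its `fun()` method are the
+hypothetical examples from this section, not real library code):
+
+```python
+from nose.tools import assert_true
+
+
+class Person(object):
+    """A hypothetical class whose fun() method we want to test."""
+    def __init__(self, name):
+        self.name = name
+
+    def fun(self):
+        return True
+
+
+def test_person_fun():
+    person = Person('Alice')   # the fixture
+    assert_true(person.fun())  # the assertion about the behavior
+```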
+
+**Setup and teardown:** Creating fixtures is often done in a call to a
+setup function. Deleting them and other cleanup is done in a teardown
+function.
+
+**The Big Picture:** Putting all this together, the testing algorithm is
+often:
+
+```python
+setup()
+test()
+teardown()
+```
+
+But sometimes your tests change the fixtures. If so, it's better for the
+setup() and teardown() functions to occur on either side of each test.
+In that case, the testing algorithm should be:
+
+```python
+setup()
+test1()
+teardown()
+
+setup()
+test2()
+teardown()
+
+setup()
+test3()
+teardown()
+```
+
+* * * * *
+
+# Nose: A Python Testing Framework
+
+The testing framework we'll discuss today is called nose. However, there
+are several other testing frameworks available in most languages. Most
+notably, there is [JUnit](http://www.junit.org/) in Java, which can
+arguably be credited with inventing the testing framework.
+
+## Where do nose tests live?
+
+Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
+`test_`. Specifically, these satisfy the testMatch regular expression
+`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
+in the unittest.TestCase subclasses that you create in your code. You
+can also write test functions outside of unittest.TestCase subclasses
+if they are named to match the configured testMatch regular
+expression.)
+
+## Nose Test Syntax
+
+To write a nose test, we make assertions.
+
+```python
+assert should_be_true()
+assert not should_not_be_true()
+```
+
+Additionally, nose itself defines a number of assert functions which can
+be used to test more specific aspects of the code base.
+
+```python
+from nose.tools import *
+
+assert_equal(a, b)
+assert_almost_equal(a, b)
+assert_true(a)
+assert_false(a)
+assert_raises(exception, func, *args, **kwargs)
+assert_is_instance(a, b)
+# and many more!
+```
+
+Moreover, numpy offers similar testing functions for arrays:
+
+```python
+from numpy.testing import *
+
+assert_array_equal(a, b)
+assert_array_almost_equal(a, b)
+# etc.
+```
+
+## Exercise: Writing tests for mean()
+
+There are a few tests for the mean() function that we listed in this
+lesson. What are some tests that should fail? Add at least three test
+cases to this set. Edit the `test_mean.py` file which tests the mean()
+function in `mean.py`.
+
+*Hint:* Think about what form your input could take and what you should
+do to handle it. Also, think about the type of the elements in the list.
+What should be done if you pass a list of integers? What if you pass a
+list of strings?
+
+**Example**:
+
+    nosetests test_mean.py
+
+# Test Driven Development
+
+Test-driven development (TDD) is a philosophy whereby the developer
+creates code by **writing the tests first**. That is to say you write the
+tests *before* writing the associated code!
+
+This is an iterative process whereby you write a test, then write the
+minimum amount of code to make the test pass. If a new feature is
+needed, another test is written and the code is expanded to meet this
+new use case. This continues until the code does what is needed.
+
+TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
+who diligently follow TDD swear by its effectiveness. This development
+style was put forth most strongly by [Kent Beck in
+2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
+
+For an example of TDD, see [the Fibonacci example][fibonacci] or the
+short sketch below.
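+
+As a minimal sketch of a single iteration (using a hypothetical
+`is_even()` function rather than the Fibonacci code):
+
+```python
+from nose.tools import assert_true, assert_false
+
+
+# 1. Write the test first and run it; it fails, since is_even doesn't
+#    exist yet.
+def test_is_even():
+    assert_true(is_even(2))
+    assert_false(is_even(3))
+
+
+# 2. Then write just enough code to make the test pass, and iterate.
+def is_even(n):
+    return n % 2 == 0
+```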
+
+# Quality Assurance Exercise
+
+Can you think of other tests to make for the fibonacci function? I
+promise there are at least two.
+
+Implement one new test in test_fibonacci.py, run nosetests, and if it
+fails, implement a more robust function for that case.
+
+And thus - finally - we have a robust function together with working
+tests!
+
+
+[basic-mean]: exercises/mean/basic/mean.py
+[exception-mean]: exercises/mean/exceptions/mean.py
+[embedded-test-mean]: exercises/mean/embedded-tests/mean.py
+[test-mean]: exercises/mean/test_mean.py
+[TDD]: http://en.wikipedia.org/wiki/Test-driven_development
+[fibonacci]: exercises/fibonacci