From: W. Trevor King Date: Thu, 27 Jun 2013 12:09:05 +0000 (-0400) Subject: testing: More restructuring for the Nose source X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;p=swc-testing-nose.git testing: More restructuring for the Nose source * Split the general-purpose testing discussion into the testing/ directory, keeping only nose stuff in testing/nose. * Pull example discussions into example-specific READMEs. * Condense testing/nose/instructor.md, and add links to upstream docs. --- diff --git a/testing/README.md b/testing/README.md new file mode 100644 index 0000000..b1e949a --- /dev/null +++ b/testing/README.md @@ -0,0 +1,52 @@ +# Testing + +![image](media/test-in-production.jpg) + +# What is testing? + +Software testing is a process by which one or more expected behaviors +and results from a piece of software are exercised and confirmed. Well +chosen tests will confirm expected code behavior for the extreme +boundaries of the input domains, output ranges, parametric combinations, +and other behavioral **edge cases**. + +# Why test software? + +Unless you write flawless, bug-free, perfectly accurate, fully precise, +and predictable code **every time**, you must test your code in order to +trust it enough to answer in the affirmative to at least a few of the +following questions: + +- Does your code work? +- **Always?** +- Does it do what you think it does? ([Patriot Missile Failure][patriot]) +- Does it continue to work after changes are made? +- Does it continue to work after system configurations or libraries + are upgraded? +- Does it respond properly for a full range of input parameters? +- What's the limit on that input parameter? +- What about **edge or corner cases**? +- How will it affect your [publications][]? + +## Verification + +*Verification* is the process of asking, "Have we built the software +correctly?" That is, is the code bug free, precise, accurate, and +repeatable? + +## Validation + +*Validation* is the process of asking, "Have we built the right +software?" That is, is the code designed in such a way as to produce the +answers we are interested in, data we want, etc. + +## Uncertainty Quantification + +*Uncertainty quantification* is the process of asking, "Given that our +algorithm may not be deterministic, was our execution within acceptable +error bounds?" This is particularly important for anything which uses +random numbers, eg Monte Carlo methods. + + +[patriot]: http://www.ima.umn.edu/~arnold/disasters/patriot.html +[publications]: http://www.nature.com/news/2010/101013/full/467775a.html diff --git a/testing/instructor.md b/testing/instructor.md new file mode 100644 index 0000000..60ad2dd --- /dev/null +++ b/testing/instructor.md @@ -0,0 +1,148 @@ +# When should we test? + +Short answers: + +- **ALWAYS!** +- **EARLY!** +- **OFTEN!** + +Long answers: + +* Definitely before you do something important with your software + (e.g. publishing data generated by your program, launching a + satellite that depends on your software, …). +* Before and after adding something new, to avoid accidental breakage. +* To help remember ([TDD][]: define) what your code actually does. + +# Who should test? + +* Write tests for the stuff you code, to convince your collaborators + that it works. +* Write tests for the stuff others code, to convince yourself that it + works (and will continue to work). + +Professionals often test their code, and take pride in test coverage, +the percent of their functions that they feel confident are +comprehensively tested. 
+
+# How are tests written?
+
+The type of tests you write is determined by the testing framework you
+adopt. Don't worry, there are a lot of choices.
+
+## Types of Tests
+
+**Exceptions:** Exceptions can be thought of as a type of runtime test.
+They alert the user to exceptional behavior in the code. Often,
+exceptions are related to functions that depend on input that is unknown
+at compile time. Checks that occur within the code to handle the
+exceptional behavior resulting from this type of input are called
+exceptions.
+
+**Unit Tests:** Unit tests are a type of test which test the fundamental
+units of a program's functionality. Often, this is at the class or
+function level of detail. However, what constitutes a *code unit* is not
+formally defined.
+
+To test functions and classes, the interfaces (APIs), rather than the
+implementations, should be tested. Treating the implementation as a
+black box, we can probe the expected behavior with boundary cases for
+the inputs.
+
+**System Tests:** System-level tests are intended to test the code as a
+whole. As opposed to unit tests, system tests check the behavior of the
+program as a whole. This sort of testing involves comparison with other
+validated codes, analytical solutions, etc.
+
+**Regression Tests:** A regression test ensures that new code does not
+change existing behavior. If you change the default answer, for example,
+or add a new question, you'll need to make sure that missing entries are
+still found and fixed.
+
+**Integration Tests:** Integration tests check the ability of the code
+to integrate well with the system configuration and third-party
+libraries and modules. This type of test is essential for codes that
+depend on libraries which might be updated independently of your code,
+or when your code might be used by a number of users who may have
+various versions of those libraries.
+
+**Test Suites:** Putting a series of unit tests into a collection of
+modules creates a test suite. Typically the suite as a whole is
+executed (rather than each test individually) when verifying that the
+code base still functions after changes have been made.
+
+# Elements of a Test
+
+**Behavior:** The behavior you want to test. For example, you might want
+to test the fun() function.
+
+**Expected Result:** This might be a single number, a range of numbers,
+a new fully defined object, a system state, an exception, etc. When we
+run the fun() function, we expect to generate some fun. If we don't
+generate any fun, the fun() function should fail its test.
+Alternatively, if it does create some fun, the fun() function should
+pass this test. The expected result should be known *a priori*. For
+numerical functions, this result is ideally determined analytically,
+even if the function being tested isn't.
+
+**Assertions:** Assertions require that some condition be true. If the
+condition is false, the test fails.
+
+**Fixtures:** Sometimes you have to do some legwork to create the
+objects that are necessary to run one or many tests. These objects are
+called fixtures, as they are not really part of the tests themselves but
+rather involve getting the computer into the appropriate state.
+
+For example, since fun varies a lot between people, the fun() function
+is a method of the Person class. In order to check the fun() function,
+then, we need to create an appropriate Person object on which to run
+fun().
+
+**Setup and teardown:** Creating fixtures is often done in a call to a
+setup function. Deleting them and other cleanup is done in a teardown
+function.
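+
+Here is a minimal sketch of that pattern. The `Person` class, its
+`fun()` method, and the values used are illustrative assumptions
+standing in for whatever objects your own tests need; a test runner (or
+you, by hand) would call setup() before a test and teardown() after it:
+
+```python
+# Sketch of a fixture with setup and teardown (hypothetical Person/fun()
+# example matching the discussion above).
+class Person(object):
+    def __init__(self, name):
+        self.name = name
+
+    def fun(self):
+        return 42  # placeholder behavior
+
+
+person = None  # the fixture shared by the tests
+
+
+def setup():
+    global person
+    person = Person('Alice')
+
+
+def teardown():
+    global person
+    person = None
+
+
+def test_fun():
+    assert person.fun() == 42
+```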
+
+**The Big Picture:** Putting all this together, the testing algorithm is
+often:
+
+```python
+setup()
+test()
+teardown()
+```
+
+But sometimes your tests change the fixtures. If so, it's better for
+the setup() and teardown() functions to occur on either side of each
+test. In that case, the testing algorithm should be:
+
+```python
+setup()
+test1()
+teardown()
+
+setup()
+test2()
+teardown()
+
+setup()
+test3()
+teardown()
+```
+
+# Test-Driven Development
+
+[Test-driven development][TDD] (TDD) is a philosophy whereby the
+developer creates code by **writing the tests first**. That is to say,
+you write the tests *before* writing the associated code!
+
+This is an iterative process whereby you write a test, then write the
+minimum amount of code to make the test pass. If a new feature is
+needed, another test is written and the code is expanded to meet this
+new use case. This continues until the code does what is needed.
+
+TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
+who diligently follow TDD swear by its effectiveness. This development
+style was put forth most strongly by [Kent Beck in 2002][KB].
+
+
+[TDD]: http://en.wikipedia.org/wiki/Test-driven_development
+[KB]: http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530
diff --git a/testing/nose/media/test-in-production.jpg b/testing/media/test-in-production.jpg
similarity index 100%
rename from testing/nose/media/test-in-production.jpg
rename to testing/media/test-in-production.jpg
diff --git a/testing/nose/README.md b/testing/nose/README.md
index b1e949a..49596d9 100644
--- a/testing/nose/README.md
+++ b/testing/nose/README.md
@@ -1,52 +1,10 @@
-# Testing
+# Nose: A Python Testing Framework
 
-![image](media/test-in-production.jpg)
+The testing framework we'll discuss today is called [nose][]. However,
+there are several other testing frameworks available in most
+languages. Most notably, there is [JUnit][] in Java, which arguably
+invented the testing framework.
 
-# What is testing?
-
-Software testing is a process by which one or more expected behaviors
-and results from a piece of software are exercised and confirmed. Well
-chosen tests will confirm expected code behavior for the extreme
-boundaries of the input domains, output ranges, parametric combinations,
-and other behavioral **edge cases**.
-
-# Why test software?
-
-Unless you write flawless, bug-free, perfectly accurate, fully precise,
-and predictable code **every time**, you must test your code in order to
-trust it enough to answer in the affirmative to at least a few of the
-following questions:
-
-- Does your code work?
-- **Always?**
-- Does it do what you think it does? ([Patriot Missile Failure][patriot])
-- Does it continue to work after changes are made?
-- Does it continue to work after system configurations or libraries
-  are upgraded?
-- Does it respond properly for a full range of input parameters?
-- What's the limit on that input parameter?
-- What about **edge or corner cases**?
-- How will it affect your [publications][]?
-
-## Verification
-
-*Verification* is the process of asking, "Have we built the software
-correctly?" That is, is the code bug free, precise, accurate, and
-repeatable?
-
-## Validation
-
-*Validation* is the process of asking, "Have we built the right
-software?" That is, is the code designed in such a way as to produce the
-answers we are interested in, data we want, etc.
- -## Uncertainty Quantification - -*Uncertainty quantification* is the process of asking, "Given that our -algorithm may not be deterministic, was our execution within acceptable -error bounds?" This is particularly important for anything which uses -random numbers, eg Monte Carlo methods. - - -[patriot]: http://www.ima.umn.edu/~arnold/disasters/patriot.html -[publications]: http://www.nature.com/news/2010/101013/full/467775a.html +[nose]: https://nose.readthedocs.org/en/latest/ +[JUnit]: http://www.junit.org/ diff --git a/testing/nose/Readme.md b/testing/nose/Readme.md deleted file mode 100644 index d886cdd..0000000 --- a/testing/nose/Readme.md +++ /dev/null @@ -1,551 +0,0 @@ -# Testing - -* * * * * - -**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony -Scopatz** - -![image](media/test-in-production.jpg) - -# What is testing? - -Software testing is a process by which one or more expected behaviors -and results from a piece of software are exercised and confirmed. Well -chosen tests will confirm expected code behavior for the extreme -boundaries of the input domains, output ranges, parametric combinations, -and other behavioral **edge cases**. - -# Why test software? - -Unless you write flawless, bug-free, perfectly accurate, fully precise, -and predictable code **every time**, you must test your code in order to -trust it enough to answer in the affirmative to at least a few of the -following questions: - -- Does your code work? -- **Always?** -- Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html)) -- Does it continue to work after changes are made? -- Does it continue to work after system configurations or libraries - are upgraded? -- Does it respond properly for a full range of input parameters? -- What about **edge or corner cases**? -- What's the limit on that input parameter? -- How will it affect your - [publications](http://www.nature.com/news/2010/101013/full/467775a.html)? - -## Verification - -*Verification* is the process of asking, "Have we built the software -correctly?" That is, is the code bug free, precise, accurate, and -repeatable? - -## Validation - -*Validation* is the process of asking, "Have we built the right -software?" That is, is the code designed in such a way as to produce the -answers we are interested in, data we want, etc. - -## Uncertainty Quantification - -*Uncertainty Quantification* is the process of asking, "Given that our -algorithm may not be deterministic, was our execution within acceptable -error bounds?" This is particularly important for anything which uses -random numbers, eg Monte Carlo methods. - -# Where are tests? - -Say we have an averaging function: - -```python -def mean(numlist): - total = sum(numlist) - length = len(numlist) - return total/length -``` - -Tests could be implemented as runtime **exceptions in the function**: - -```python -def mean(numlist): - try: - total = sum(numlist) - length = len(numlist) - except TypeError: - raise TypeError("The number list was not a list of numbers.") - except: - print "There was a problem evaluating the number list." - return total/length -``` - -Sometimes tests they are functions alongside the function definitions -they are testing. - -```python -def mean(numlist): - try: - total = sum(numlist) - length = len(numlist) - except TypeError: - raise TypeError("The number list was not a list of numbers.") - except: - print "There was a problem evaluating the number list." 
- return total/length - - -def test_mean(): - assert mean([0, 0, 0, 0]) == 0 - assert mean([0, 200]) == 100 - assert mean([0, -200]) == -100 - assert mean([0]) == 0 - - -def test_floating_mean(): - assert mean([1, 2]) == 1.5 -``` - -Sometimes they are in an executable independent of the main executable. - -```python -def mean(numlist): - try: - total = sum(numlist) - length = len(numlist) - except TypeError: - raise TypeError("The number list was not a list of numbers.") - except: - print "There was a problem evaluating the number list." - return total/length -``` - -Where, in a different file exists a test module: - -```python -import mean - -def test_mean(): - assert mean([0, 0, 0, 0]) == 0 - assert mean([0, 200]) == 100 - assert mean([0, -200]) == -100 - assert mean([0]) == 0 - - -def test_floating_mean(): - assert mean([1, 2]) == 1.5 -``` - -# When should we test? - -The three right answers are: - -- **ALWAYS!** -- **EARLY!** -- **OFTEN!** - -The longer answer is that testing either before or after your software -is written will improve your code, but testing after your program is -used for something important is too late. - -If we have a robust set of tests, we can run them before adding -something new and after adding something new. If the tests give the same -results (as appropriate), we can have some assurance that we didn't -wreak anything. The same idea applies to making changes in your system -configuration, updating support codes, etc. - -Another important feature of testing is that it helps you remember what -all the parts of your code do. If you are working on a large project -over three years and you end up with 200 classes, it may be hard to -remember what the widget class does in detail. If you have a test that -checks all of the widget's functionality, you can look at the test to -remember what it's supposed to do. - -# Who should test? - -In a collaborative coding environment, where many developers contribute -to the same code base, developers should be responsible individually for -testing the functions they create and collectively for testing the code -as a whole. - -Professionals often test their code, and take pride in test coverage, -the percent of their functions that they feel confident are -comprehensively tested. - -# How are tests written? - -The type of tests that are written is determined by the testing -framework you adopt. Don't worry, there are a lot of choices. - -## Types of Tests - -**Exceptions:** Exceptions can be thought of as type of runtime test. -They alert the user to exceptional behavior in the code. Often, -exceptions are related to functions that depend on input that is unknown -at compile time. Checks that occur within the code to handle exceptional -behavior that results from this type of input are called Exceptions. - -**Unit Tests:** Unit tests are a type of test which test the fundamental -units of a program's functionality. Often, this is on the class or -function level of detail. However what defines a *code unit* is not -formally defined. - -To test functions and classes, the interfaces (API) - rather than the -implementation - should be tested. Treating the implementation as a -black box, we can probe the expected behavior with boundary cases for -the inputs. - -**System Tests:** System level tests are intended to test the code as a -whole. As opposed to unit tests, system tests ask for the behavior as a -whole. This sort of testing involves comparison with other validated -codes, analytical solutions, etc. 
- -**Regression Tests:** A regression test ensures that new code does -change anything. If you change the default answer, for example, or add a -new question, you'll need to make sure that missing entries are still -found and fixed. - -**Integration Tests:** Integration tests query the ability of the code -to integrate well with the system configuration and third party -libraries and modules. This type of test is essential for codes that -depend on libraries which might be updated independently of your code or -when your code might be used by a number of users who may have various -versions of libraries. - -**Test Suites:** Putting a series of unit tests into a collection of -modules creates, a test suite. Typically the suite as a whole is -executed (rather than each test individually) when verifying that the -code base still functions after changes have been made. - -# Elements of a Test - -**Behavior:** The behavior you want to test. For example, you might want -to test the fun() function. - -**Expected Result:** This might be a single number, a range of numbers, -a new fully defined object, a system state, an exception, etc. When we -run the fun() function, we expect to generate some fun. If we don't -generate any fun, the fun() function should fail its test. -Alternatively, if it does create some fun, the fun() function should -pass this test. The the expected result should known *a priori*. For -numerical functions, this is result is ideally analytically determined -even if the function being tested isn't. - -**Assertions:** Require that some conditional be true. If the -conditional is false, the test fails. - -**Fixtures:** Sometimes you have to do some legwork to create the -objects that are necessary to run one or many tests. These objects are -called fixtures as they are not really part of the test themselves but -rather involve getting the computer into the appropriate state. - -For example, since fun varies a lot between people, the fun() function -is a method of the Person class. In order to check the fun function, -then, we need to create an appropriate Person object on which to run -fun(). - -**Setup and teardown:** Creating fixtures is often done in a call to a -setup function. Deleting them and other cleanup is done in a teardown -function. - -**The Big Picture:** Putting all this together, the testing algorithm is -often: - -```python -setup() -test() -teardown() -``` - -But, sometimes it's the case that your tests change the fixtures. If so, -it's better for the setup() and teardown() functions to occur on either -side of each test. In that case, the testing algorithm should be: - -```python -setup() -test1() -teardown() - -setup() -test2() -teardown() - -setup() -test3() -teardown() -``` - -* * * * * - -# Nose: A Python Testing Framework - -The testing framework we'll discuss today is called nose. However, there -are several other testing frameworks available in most language. Most -notably there is [JUnit](http://www.junit.org/) in Java which can -arguably attributed to inventing the testing framework. - -## Where do nose tests live? - -Nose tests are files that begin with `Test-`, `Test_`, `test-`, or -`test_`. Specifically, these satisfy the testMatch regular expression -`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them -in the unittest.TestCase subclasses chat you create in your code. You -can also create test functions which are not unittest.TestCase -subclasses if they are named with the configured testMatch regular -expression.) 
- -## Nose Test Syntax - -To write a nose test, we make assertions. - -```python -assert should_be_true() -assert not should_not_be_true() -``` - -Additionally, nose itself defines number of assert functions which can -be used to test more specific aspects of the code base. - -```python -from nose.tools import * - -assert_equal(a, b) -assert_almost_equal(a, b) -assert_true(a) -assert_false(a) -assert_raises(exception, func, *args, **kwargs) -assert_is_instance(a, b) -# and many more! -``` - -Moreover, numpy offers similar testing functions for arrays: - -```python -from numpy.testing import * - -assert_array_equal(a, b) -assert_array_almost_equal(a, b) -# etc. -``` - -## Exercise: Writing tests for mean() - -There are a few tests for the mean() function that we listed in this -lesson. What are some tests that should fail? Add at least three test -cases to this set. Edit the `test_mean.py` file which tests the mean() -function in `mean.py`. - -*Hint:* Think about what form your input could take and what you should -do to handle it. Also, think about the type of the elements in the list. -What should be done if you pass a list of integers? What if you pass a -list of strings? - -**Example**: - - nosetests test_mean.py - -# Test Driven Development - -Test driven development (TDD) is a philosophy whereby the developer -creates code by **writing the tests first**. That is to say you write the -tests *before* writing the associated code! - -This is an iterative process whereby you write a test then write the -minimum amount code to make the test pass. If a new feature is needed, -another test is written and the code is expanded to meet this new use -case. This continues until the code does what is needed. - -TDD operates on the YAGNI principle (You Ain't Gonna Need It). People -who diligently follow TDD swear by its effectiveness. This development -style was put forth most strongly by [Kent Beck in -2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530). - -## A TDD Example - -Say you want to write a fib() function which generates values of the -Fibonacci sequence of given indexes. You would - of course - start by -writing the test, possibly testing a single value: - -```python -from nose.tools import assert_equal - -from pisa import fib - -def test_fib1(): - obs = fib(2) - exp = 1 - assert_equal(obs, exp) -``` - -You would *then* go ahead and write the actual function: - -```python -def fib(n): - # you snarky so-and-so - return 1 -``` - -And that is it right?! Well, not quite. This implementation fails for -most other values. Adding tests we see that: - -```python -def test_fib1(): - obs = fib(2) - exp = 1 - assert_equal(obs, exp) - - -def test_fib2(): - obs = fib(0) - exp = 0 - assert_equal(obs, exp) - - obs = fib(1) - exp = 1 - assert_equal(obs, exp) -``` - -This extra test now requires that we bother to implement at least the -initial values: - -```python -def fib(n): - # a little better - if n == 0 or n == 1: - return n - return 1 -``` - -However, this function still falls over for `2 < n`. Time for more -tests! - -```python -def test_fib1(): - obs = fib(2) - exp = 1 - assert_equal(obs, exp) - - -def test_fib2(): - obs = fib(0) - exp = 0 - assert_equal(obs, exp) - - obs = fib(1) - exp = 1 - assert_equal(obs, exp) - - -def test_fib3(): - obs = fib(3) - exp = 2 - assert_equal(obs, exp) - - obs = fib(6) - exp = 8 - assert_equal(obs, exp) -``` - -At this point, we had better go ahead and try do the right thing... 
- -```python -def fib(n): - # finally, some math - if n == 0 or n == 1: - return n - else: - return fib(n - 1) + fib(n - 2) -``` - -Here it becomes very tempting to take an extended coffee break or -possibly a power lunch. But then you remember those pesky negative -numbers and floats. Perhaps the right thing to do here is to just be -undefined. - -```python -def test_fib1(): - obs = fib(2) - exp = 1 - assert_equal(obs, exp) - - -def test_fib2(): - obs = fib(0) - exp = 0 - assert_equal(obs, exp) - - obs = fib(1) - exp = 1 - assert_equal(obs, exp) - - -def test_fib3(): - obs = fib(3) - exp = 2 - assert_equal(obs, exp) - - obs = fib(6) - exp = 8 - assert_equal(obs, exp) - - -def test_fib3(): - obs = fib(13.37) - exp = NotImplemented - assert_equal(obs, exp) - - obs = fib(-9000) - exp = NotImplemented - assert_equal(obs, exp) -``` - -This means that it is time to add the appropriate case to the function -itself: - -```python -def fib(n): - # sequence and you shall find - if n < 0 or int(n) != n: - return NotImplemented - elif n == 0 or n == 1: - return n - else: - return fib(n - 1) + fib(n - 2) -``` - -# Quality Assurance Exercise - -Can you think of other tests to make for the fibonacci function? I promise there -are at least two. - -Implement one new test in test_fib.py, run nosetests, and if it fails, implement -a more robust function for that case. - -And thus - finally - we have a robust function together with working -tests! - -# Exercise - -**The Problem:** In 2D or 3D, we have two points (p1 and p2) which -define a line segment. Additionally there exists experimental data which -can be anywhere in the domain. Find the data point which is closest to -the line segment. - -In the `close_line.py` file there are four different implementations -which all solve this problem. [You can read more about them -here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However, -there are no tests! Please write from scratch a `test_close_line.py` -file which tests the closest\_data\_to\_line() functions. - -*Hint:* you can use one implementation function to test another. Below -is some sample data to help you get started. - -![image](media/evolution-of-a-solution-1.png) -> - - -```python -import numpy as np - -p1 = np.array([0.0, 0.0]) -p2 = np.array([1.0, 1.0]) -data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]]) -``` - diff --git a/testing/nose/exercises/mean/README.md b/testing/nose/exercises/mean/README.md new file mode 100644 index 0000000..0dd83cf --- /dev/null +++ b/testing/nose/exercises/mean/README.md @@ -0,0 +1,16 @@ +# Writing tests for `mean()` + +There are a few tests for the `mean()` function. What are some tests +that should fail? Add at least three test cases to this set. Edit +the `test_mean.py` file which tests the `mean()` function in +`mean.py`. + +*Hint:* Think about what form your input could take and what you should +do to handle it. Also, think about the type of the elements in the list. +What should be done if you pass a list of integers? What if you pass a +list of strings? 
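+
+For example, one extra case you might add is shown below. The expected
+outcome is an assumption about how a plain `sum()`/`len()` implementation
+behaves on an empty list; adjust the assertion to whatever behavior you
+decide `mean()` should have:
+
+```python
+from nose.tools import assert_raises
+
+from mean import mean
+
+
+def test_empty_list():
+    # A plain sum()/len() implementation divides by zero on [].
+    assert_raises(ZeroDivisionError, mean, [])
+```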
+ + You can test a particular implementation by using `PYTHONPATH`: + + $ PYTHONPATH=basic nosetests test_mean.py + $ PYTHONPATH=exceptions nosetests test_mean.py diff --git a/testing/nose/exercises/mean/test_mean.py b/testing/nose/exercises/mean/test_mean.py index 44c3b7c..08dd251 100644 --- a/testing/nose/exercises/mean/test_mean.py +++ b/testing/nose/exercises/mean/test_mean.py @@ -3,6 +3,7 @@ from nose.tools import assert_equal, assert_almost_equal, assert_true, \ from mean import mean + def test_mean1(): obs = mean([0, 0, 0, 0]) exp = 0 diff --git a/testing/nose/instructor.md b/testing/nose/instructor.md index 1923ea8..a5068ff 100644 --- a/testing/nose/instructor.md +++ b/testing/nose/instructor.md @@ -1,242 +1,59 @@ -# Mean-calculation example +# Test locations -* Basic implementation: [mean.py][basic-mean] -* Internal exception catching: [mean.py][exception-mean] -* Embedded tests: [mean.py][embedded-test-mean] -* Independent tests: [test_mean.py][test-mean] - -# When should we test? - -Short answers: - -- **ALWAYS!** -- **EARLY!** -- **OFTEN!** - -Long answers: - -* Definitely before you do something important with your software - (e.g. publishing data generated by your program, launching a - satellite that depends on your software, …). -* Before and after adding something new, to avoid accidental breakage. -* To help remember ([TDD][]: define) what your code actually does. - -# Who should test? - -* Write tests for the stuff you code, to convince your collaborators - that it works. -* Write tests for the stuff others code, to convince yourself that it - works (and will continue to work). - -Professionals often test their code, and take pride in test coverage, -the percent of their functions that they feel confident are -comprehensively tested. - -# How are tests written? - -The type of tests that are written is determined by the testing -framework you adopt. Don't worry, there are a lot of choices. - -## Types of Tests - -**Exceptions:** Exceptions can be thought of as type of runtime test. -They alert the user to exceptional behavior in the code. Often, -exceptions are related to functions that depend on input that is unknown -at compile time. Checks that occur within the code to handle exceptional -behavior that results from this type of input are called Exceptions. - -**Unit Tests:** Unit tests are a type of test which test the fundamental -units of a program's functionality. Often, this is on the class or -function level of detail. However what defines a *code unit* is not -formally defined. - -To test functions and classes, the interfaces (API) - rather than the -implementation - should be tested. Treating the implementation as a -black box, we can probe the expected behavior with boundary cases for -the inputs. - -**System Tests:** System level tests are intended to test the code as a -whole. As opposed to unit tests, system tests ask for the behavior as a -whole. This sort of testing involves comparison with other validated -codes, analytical solutions, etc. - -**Regression Tests:** A regression test ensures that new code does -change anything. If you change the default answer, for example, or add a -new question, you'll need to make sure that missing entries are still -found and fixed. - -**Integration Tests:** Integration tests query the ability of the code -to integrate well with the system configuration and third party -libraries and modules. 
This type of test is essential for codes that -depend on libraries which might be updated independently of your code or -when your code might be used by a number of users who may have various -versions of libraries. - -**Test Suites:** Putting a series of unit tests into a collection of -modules creates, a test suite. Typically the suite as a whole is -executed (rather than each test individually) when verifying that the -code base still functions after changes have been made. - -# Elements of a Test - -**Behavior:** The behavior you want to test. For example, you might want -to test the fun() function. - -**Expected Result:** This might be a single number, a range of numbers, -a new fully defined object, a system state, an exception, etc. When we -run the fun() function, we expect to generate some fun. If we don't -generate any fun, the fun() function should fail its test. -Alternatively, if it does create some fun, the fun() function should -pass this test. The the expected result should known *a priori*. For -numerical functions, this is result is ideally analytically determined -even if the function being tested isn't. - -**Assertions:** Require that some conditional be true. If the -conditional is false, the test fails. - -**Fixtures:** Sometimes you have to do some legwork to create the -objects that are necessary to run one or many tests. These objects are -called fixtures as they are not really part of the test themselves but -rather involve getting the computer into the appropriate state. - -For example, since fun varies a lot between people, the fun() function -is a method of the Person class. In order to check the fun function, -then, we need to create an appropriate Person object on which to run -fun(). - -**Setup and teardown:** Creating fixtures is often done in a call to a -setup function. Deleting them and other cleanup is done in a teardown -function. - -**The Big Picture:** Putting all this together, the testing algorithm is -often: - -```python -setup() -test() -teardown() -``` - -But, sometimes it's the case that your tests change the fixtures. If so, -it's better for the setup() and teardown() functions to occur on either -side of each test. In that case, the testing algorithm should be: - -```python -setup() -test1() -teardown() - -setup() -test2() -teardown() - -setup() -test3() -teardown() -``` +Nose [looks in the usual places][finding-tests]. -* * * * * +* Nose tests live in files matching `[Tt]est[-_]` +* Nose can find `unittest.TestCase` subclasses. +* Nose also finds functions matching the `testMatch` regular + expression. -# Nose: A Python Testing Framework +# Test syntax -The testing framework we'll discuss today is called nose. However, there -are several other testing frameworks available in most language. Most -notably there is [JUnit](http://www.junit.org/) in Java which can -arguably attributed to inventing the testing framework. - -## Where do nose tests live? - -Nose tests are files that begin with `Test-`, `Test_`, `test-`, or -`test_`. Specifically, these satisfy the testMatch regular expression -`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them -in the unittest.TestCase subclasses chat you create in your code. You -can also create test functions which are not unittest.TestCase -subclasses if they are named with the configured testMatch regular -expression.) - -## Nose Test Syntax - -To write a nose test, we make assertions. 
+Nose tests use assertions ```python assert should_be_true() assert not should_not_be_true() ``` -Additionally, nose itself defines number of assert functions which can -be used to test more specific aspects of the code base. - -```python -from nose.tools import * - -assert_equal(a, b) -assert_almost_equal(a, b) -assert_true(a) -assert_false(a) -assert_raises(exception, func, *args, **kwargs) -assert_is_instance(a, b) -# and many more! -``` - -Moreover, numpy offers similar testing functions for arrays: - -```python -from numpy.testing import * - -assert_array_equal(a, b) -assert_array_almost_equal(a, b) -# etc. -``` - -## Exercise: Writing tests for mean() - -There are a few tests for the mean() function that we listed in this -lesson. What are some tests that should fail? Add at least three test -cases to this set. Edit the `test_mean.py` file which tests the mean() -function in `mean.py`. - -*Hint:* Think about what form your input could take and what you should -do to handle it. Also, think about the type of the elements in the list. -What should be done if you pass a list of integers? What if you pass a -list of strings? +There are lots of assertion helpers in `nose.tools` +([docs][nose-assertions]), which [exports][nose-assertion-export] the +[unittest assertions][unittest-assertions] in [PEP 8][pep8] syntax +(`assert_equal` rather than `assertEqual`). There are more assertion +helpers in `numpy.testing` ([docs][NumPy-assertions]) for arrays and +numerical comparisons. -**Example**: +# Basic nose - nosetests test_mean.py +Writing tests for `mean()`: -# Test Driven Development - -Test driven development (TDD) is a philosophy whereby the developer -creates code by **writing the tests first**. That is to say you write the -tests *before* writing the associated code! - -This is an iterative process whereby you write a test then write the -minimum amount code to make the test pass. If a new feature is needed, -another test is written and the code is expanded to meet this new use -case. This continues until the code does what is needed. - -TDD operates on the YAGNI principle (You Ain't Gonna Need It). People -who diligently follow TDD swear by its effectiveness. This development -style was put forth most strongly by [Kent Beck in -2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530). +* Basic implementation: [mean.py][basic-mean] +* Internal exception catching: [mean.py][exception-mean] +* Embedded tests: [mean.py][embedded-test-mean] +* Independent tests: [test_mean.py][test-mean] -For an example of TDD, see [the Fibonacci example][fibonacci]. +# Test-driven development -# Quality Assurance Exercise +We have [a Fibonacci example][fibonacci] with a series of increasingly +detailed tests and implementations. -Can you think of other tests to make for the fibonacci function? I promise there -are at least two. +# Quality assurance -Implement one new test in test_fib.py, run nosetests, and if it fails, implement -a more robust function for that case. +Can you think of other tests to make for the Fibonacci function? I +promise there are at least two. -And thus - finally - we have a robust function together with working -tests! +Implement one new test in `test_fibonacci.py`, run `nosetests`, and if +it fails, implement a more robust function for that case. 
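+
+As a sketch of the shape such a test might take (the `fibonacci` module
+name is an assumption; import `fib` from wherever the exercise defines
+it, and this particular check is only an illustration, not necessarily
+one of the two promised cases):
+
+```python
+from nose.tools import assert_equal
+
+from fibonacci import fib  # hypothetical module name; match the exercise file
+
+
+def test_fib_larger_value():
+    # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
+    assert_equal(fib(10), 55)
+```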
+[finding-tests]: https://nose.readthedocs.org/en/latest/finding_tests.html +[nose-assertions]: https://nose.readthedocs.org/en/latest/testing_tools.html#testing-tools +[nose-assertion-export]: https://github.com/nose-devs/nose/blob/master/nose/tools/trivial.py#L33 +[unittest-assertions]: http://docs.python.org/2/library/unittest.html#assert-methods +[pep8]: http://www.python.org/dev/peps/pep-0008/ +[NumPy-assertions]: http://docs.scipy.org/doc/numpy/reference/routines.testing.html#asserts [basic-mean]: exercises/mean/basic/mean.py [exception-mean]: exercises/mean/exceptions/mean.py [embedded-test-mean]: exercises/embedded-tests/mean.py [test-mean]: exercises/test_mean.py -[TDD]: http://en.wikipedia.org/wiki/Test-driven_development [fibonacci]: exercises/fibonacci