testing/nose/instructor.md

   1 # Mean-calculation example
   2
   3 * Basic implementation: [mean.py][basic-mean]
   4 * Internal exception catching: [mean.py][exception-mean]
   5 * Embedded tests: [mean.py][embedded-test-mean]
   6 * Independent tests: [test_mean.py][test-mean]
   7
   8 # When should we test?
   9
  10 Short answers:
  11
  12 -   **ALWAYS!**
  13 -   **EARLY!**
  14 -   **OFTEN!**
  15
  16 Long answers:
  17
  18 * Definitely before you do something important with your software
  19   (e.g. publishing data generated by your program, launching a
  20   satellite that depends on your software, …).
  21 * Before and after adding something new, to avoid accidental breakage.
  22 * To help remember ([TDD][]: define) what your code actually does.
  23
  24 # Who should test?
  25
  26 * Write tests for the stuff you code, to convince your collaborators
  27   that it works.
  28 * Write tests for the stuff others code, to convince yourself that it
  29   works (and will continue to work).
  30
  31 Professionals often test their code, and take pride in test coverage,
  32 the percent of their functions that they feel confident are
  33 comprehensively tested.
  34
  35 # How are tests written?
  36
  37 The type of tests that are written is determined by the testing
  38 framework you adopt. Don't worry, there are a lot of choices.
  39
  40 ## Types of Tests
  41
  42 **Exceptions:** Exceptions can be thought of as type of runtime test.
  43 They alert the user to exceptional behavior in the code. Often,
  44 exceptions are related to functions that depend on input that is unknown
  45 at compile time. Checks that occur within the code to handle exceptional
  46 behavior that results from this type of input are called Exceptions.
  47
  48 **Unit Tests:** Unit tests are a type of test which test the fundamental
  49 units of a program's functionality. Often, this is on the class or
  50 function level of detail. However what defines a *code unit* is not
  51 formally defined.
  52
  53 To test functions and classes, the interfaces (API) - rather than the
  54 implementation - should be tested. Treating the implementation as a
  55 black box, we can probe the expected behavior with boundary cases for
  56 the inputs.
  57
  58 **System Tests:** System level tests are intended to test the code as a
  59 whole. As opposed to unit tests, system tests ask for the behavior as a
  60 whole. This sort of testing involves comparison with other validated
  61 codes, analytical solutions, etc.
  62
  63 **Regression Tests:** A regression test ensures that new code does
  64 change anything. If you change the default answer, for example, or add a
  65 new question, you'll need to make sure that missing entries are still
  66 found and fixed.
  67
  68 **Integration Tests:** Integration tests query the ability of the code
  69 to integrate well with the system configuration and third party
  70 libraries and modules. This type of test is essential for codes that
  71 depend on libraries which might be updated independently of your code or
  72 when your code might be used by a number of users who may have various
  73 versions of libraries.
  74
  75 **Test Suites:** Putting a series of unit tests into a collection of
  76 modules creates, a test suite. Typically the suite as a whole is
  77 executed (rather than each test individually) when verifying that the
  78 code base still functions after changes have been made.
  79
  80 # Elements of a Test
  81
  82 **Behavior:** The behavior you want to test. For example, you might want
  83 to test the fun() function.
  84
  85 **Expected Result:** This might be a single number, a range of numbers,
  86 a new fully defined object, a system state, an exception, etc. When we
  87 run the fun() function, we expect to generate some fun. If we don't
  88 generate any fun, the fun() function should fail its test.
  89 Alternatively, if it does create some fun, the fun() function should
  90 pass this test. The the expected result should known *a priori*. For
  91 numerical functions, this is result is ideally analytically determined
  92 even if the function being tested isn't.
  93
  94 **Assertions:** Require that some conditional be true. If the
  95 conditional is false, the test fails.
  96
  97 **Fixtures:** Sometimes you have to do some legwork to create the
  98 objects that are necessary to run one or many tests. These objects are
  99 called fixtures as they are not really part of the test themselves but
 100 rather involve getting the computer into the appropriate state.
 101
 102 For example, since fun varies a lot between people, the fun() function
 103 is a method of the Person class. In order to check the fun function,
 104 then, we need to create an appropriate Person object on which to run
 105 fun().
 106
 107 **Setup and teardown:** Creating fixtures is often done in a call to a
 108 setup function. Deleting them and other cleanup is done in a teardown
 109 function.
 110
 111 **The Big Picture:** Putting all this together, the testing algorithm is
 112 often:
 113
 114 ```python
 115 setup()
 116 test()
 117 teardown()
 118 ```
 119
 120 But, sometimes it's the case that your tests change the fixtures. If so,
 121 it's better for the setup() and teardown() functions to occur on either
 122 side of each test. In that case, the testing algorithm should be:
 123
 124 ```python
 125 setup()
 126 test1()
 127 teardown()
 128
 129 setup()
 130 test2()
 131 teardown()
 132
 133 setup()
 134 test3()
 135 teardown()
 136 ```
 137
 138 * * * * *
 139
 140 # Nose: A Python Testing Framework
 141
 142 The testing framework we'll discuss today is called nose. However, there
 143 are several other testing frameworks available in most language. Most
 144 notably there is [JUnit](http://www.junit.org/) in Java which can
 145 arguably attributed to inventing the testing framework.
 146
 147 ## Where do nose tests live?
 148
 149 Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
 150 `test_`. Specifically, these satisfy the testMatch regular expression
 151 `[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
 152 in the unittest.TestCase subclasses chat you create in your code. You
 153 can also create test functions which are not unittest.TestCase
 154 subclasses if they are named with the configured testMatch regular
 155 expression.)
 156
 157 ## Nose Test Syntax
 158
 159 To write a nose test, we make assertions.
 160
 161 ```python
 162 assert should_be_true()
 163 assert not should_not_be_true()
 164 ```
 165
 166 Additionally, nose itself defines number of assert functions which can
 167 be used to test more specific aspects of the code base.
 168
 169 ```python
 170 from nose.tools import *
 171
 172 assert_equal(a, b)
 173 assert_almost_equal(a, b)
 174 assert_true(a)
 175 assert_false(a)
 176 assert_raises(exception, func, *args, **kwargs)
 177 assert_is_instance(a, b)
 178 # and many more!
 179 ```
 180
 181 Moreover, numpy offers similar testing functions for arrays:
 182
 183 ```python
 184 from numpy.testing import *
 185
 186 assert_array_equal(a, b)
 187 assert_array_almost_equal(a, b)
 188 # etc.
 189 ```
 190
 191 ## Exercise: Writing tests for mean()
 192
 193 There are a few tests for the mean() function that we listed in this
 194 lesson. What are some tests that should fail? Add at least three test
 195 cases to this set. Edit the `test_mean.py` file which tests the mean()
 196 function in `mean.py`.
 197
 198 *Hint:* Think about what form your input could take and what you should
 199 do to handle it. Also, think about the type of the elements in the list.
 200 What should be done if you pass a list of integers? What if you pass a
 201 list of strings?
 202
 203 **Example**:
 204
 205     nosetests test_mean.py
 206
 207 # Test Driven Development
 208
 209 Test driven development (TDD) is a philosophy whereby the developer
 210 creates code by **writing the tests first**. That is to say you write the
 211 tests *before* writing the associated code!
 212
 213 This is an iterative process whereby you write a test then write the
 214 minimum amount code to make the test pass. If a new feature is needed,
 215 another test is written and the code is expanded to meet this new use
 216 case. This continues until the code does what is needed.
 217
 218 TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
 219 who diligently follow TDD swear by its effectiveness. This development
 220 style was put forth most strongly by [Kent Beck in
 221 2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
 222
 223 For an example of TDD, see [the Fibonacci example][fibonacci].
 224
 225 # Quality Assurance Exercise
 226
 227 Can you think of other tests to make for the fibonacci function? I promise there
 228 are at least two.
 229
 230 Implement one new test in test_fib.py, run nosetests, and if it fails, implement
 231 a more robust function for that case.
 232
 233 And thus - finally - we have a robust function together with working
 234 tests!
 235
 236
 237 [basic-mean]: exercises/mean/basic/mean.py
 238 [exception-mean]: exercises/mean/exceptions/mean.py
 239 [embedded-test-mean]: exercises/embedded-tests/mean.py
 240 [test-mean]: exercises/test_mean.py
 241 [TDD]: http://en.wikipedia.org/wiki/Test-driven_development
 242 [fibonacci]: exercises/fibonacci