# Testing

* * * * *

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](media/test-in-production.jpg)

# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed.
Well-chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral **edge cases**.

# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code **every time**, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

-   Does your code work?
-   **Always?**
-   Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
-   Does it continue to work after changes are made?
-   Does it continue to work after system configurations or libraries
    are upgraded?
-   Does it respond properly for a full range of input parameters?
-   What about **edge or corner cases**?
-   What's the limit on that input parameter?
-   How will it affect your
    [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?

## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything that uses
random numbers, e.g., Monte Carlo methods.
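
As a rough illustration of checking a non-deterministic result against an error bound, here is a minimal sketch. The `estimate_pi()` helper, the seed, and the tolerance of 0.05 are all assumptions made up for this example; they are not part of the lesson's code.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Monte Carlo estimate of pi (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Sample a point in the unit square and check whether it falls
        # inside the quarter circle of radius 1.
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_tolerance():
    # A fixed seed makes the run repeatable. The tolerance is an assumed
    # "acceptable error bound", chosen generously for 100,000 samples.
    obs = estimate_pi(100000, seed=42)
    exp = math.pi
    assert abs(obs - exp) < 0.05
```

The test does not ask for an exact answer, only for one inside the stated bound, which is the question uncertainty quantification poses.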

# Where are tests?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total / length
```

Tests could be implemented as runtime **exceptions in the function**:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise  # re-raise so we never reach the return with unset names
    return total / length
```

Sometimes tests are functions alongside the function definitions
they are testing.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise  # re-raise so we never reach the return with unset names
    return total / length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

Sometimes they are in an executable independent of the main executable.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise  # re-raise so we never reach the return with unset names
    return total / length
```

A test module then lives in a different file:

```python
# Import the function itself, not just the module that contains it.
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

-   **ALWAYS!**
-   **EARLY!**
-   **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
wreck anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in test coverage,
the percent of their functions that they feel confident are
comprehensively tested.

# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called exceptions.

**Unit Tests:** Unit tests are a type of test that exercises the
fundamental units of a program's functionality. Often, this is on the
class or function level of detail. However, what constitutes a *code
unit* is not formally defined.

To test functions and classes, the interfaces (API) - rather than the
implementation - should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
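
A minimal sketch of such black-box boundary testing follows. The `clamp()` function is a hypothetical example invented for this illustration; the tests only use its interface and probe values at, just inside, and just outside the interval limits.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high] (hypothetical)."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_clamp_boundaries():
    assert clamp(5, 0, 10) == 5    # comfortably inside
    assert clamp(0, 0, 10) == 0    # exactly on the lower boundary
    assert clamp(10, 0, 10) == 10  # exactly on the upper boundary
    assert clamp(-1, 0, 10) == 0   # just below the lower boundary
    assert clamp(11, 0, 10) == 10  # just above the upper boundary
    assert clamp(3, 3, 3) == 3     # degenerate interval of one point


def test_clamp_rejects_inverted_interval():
    # The documented behavior for an invalid interval is an exception.
    try:
        clamp(1, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```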

**System Tests:** System level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask for the behavior as a
whole. This sort of testing involves comparison with other validated
codes, analytical solutions, etc.

**Regression Tests:** A regression test ensures that new code does not
change anything. If you change the default answer, for example, or add a
new question, you'll need to make sure that missing entries are still
found and fixed.
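
One common way to write a regression test is to record outputs from a trusted earlier version and compare against them after every change. The sketch below assumes a made-up `scale_scores()` function and made-up reference values, purely for illustration.

```python
def scale_scores(scores):
    """Rescale scores so the largest becomes 1.0 (hypothetical function)."""
    top = max(scores)
    return [s / float(top) for s in scores]


# Reference outputs, assumed to have been recorded from a previous,
# trusted version of the code.
REFERENCE = {
    (1, 2, 4): [0.25, 0.5, 1.0],
    (5, 5): [1.0, 1.0],
}


def test_scale_scores_regression():
    # Any change in behavior for these inputs makes the test fail.
    for inputs, expected in REFERENCE.items():
        assert scale_scores(list(inputs)) == expected
```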

**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.

# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally analytically determined
even if the function being tested isn't.

**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures, as they are not really part of the tests themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun() function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```

But, sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
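
The per-test pattern can be sketched with the standard library's `unittest` module, whose `setUp()` and `tearDown()` methods run around every test method. The `Person` class, its `fun_had` counter, and the name "Ada" are hypothetical stand-ins for the fixture described above, not code from this lesson.

```python
import unittest


class Person(object):
    """Hypothetical stand-in for the Person class described above."""

    def __init__(self, name):
        self.name = name
        self.fun_had = 0

    def fun(self):
        # Each call generates one more unit of fun.
        self.fun_had += 1
        return self.fun_had


class TestPersonFun(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: build a fresh fixture.
        self.person = Person("Ada")

    def tearDown(self):
        # Runs after every test method: release the fixture.
        self.person = None

    def test_fun_generates_fun(self):
        # This test changes the fixture by calling fun().
        self.assertEqual(self.person.fun(), 1)

    def test_fixture_is_fresh(self):
        # Passes only because setUp() rebuilt the Person for this test.
        self.assertEqual(self.person.fun_had, 0)
```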

* * * * *

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available for most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)

## Nose Test Syntax

To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```
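
To show why the more specific assertions are useful, here is a small sketch that uses only the standard library. It relies on the `unittest.TestCase` methods `assertAlmostEqual` and `assertRaises`, which, as far as we know, are the counterparts of nose's `assert_almost_equal` and `assert_raises`. The test class name is made up for this example.

```python
import unittest


class TestSpecificAssertions(unittest.TestCase):
    def test_almost_equal(self):
        # 0.1 + 0.2 is not exactly 0.3 in binary floating point, so an
        # exact comparison fails while an approximate one succeeds.
        self.assertNotEqual(0.1 + 0.2, 0.3)
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_raises(self):
        # Pass the callable and its arguments separately so that the
        # exception is raised inside assertRaises, not before it.
        self.assertRaises(ZeroDivisionError, divmod, 1, 0)
        self.assertRaises(ValueError, int, "not a number")
```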

Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# etc.
```

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

**Example**:

    nosetests test_mean.py

# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

## A TDD Example

Say you want to write a fib() function which generates values of the
Fibonacci sequence of given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from pisa import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```

However, this function still falls over for `2 < n`. Time for more
tests!

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    # finally, some math
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined.

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


# Named test_fib4 so that it does not silently replace test_fib3 above.
def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

# Quality Assurance Exercise

Can you think of other tests to make for the Fibonacci function? I promise there
are at least two.

Implement one new test in test_fib.py, run nosetests, and if it fails, implement
a more robust function for that case.

And thus - finally - we have a robust function together with working
tests!

# Exercise

**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.

In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write from scratch a `test_close_line.py`
file which tests the closest\_data\_to\_line() functions.

*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](media/evolution-of-a-solution-1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```