testing/nose/Readme.md

   1 # Testing
   2
   3 * * * * *
   4
   5 **Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
   6 Scopatz**
   7
   8 ![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)
   9 # What is testing?
  10
Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed.
Well-chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral **edge cases**.
  16
  17 # Why test software?
  18
  19 Unless you write flawless, bug-free, perfectly accurate, fully precise,
  20 and predictable code **every time**, you must test your code in order to
  21 trust it enough to answer in the affirmative to at least a few of the
  22 following questions:
  23
  24 -   Does your code work?
  25 -   **Always?**
  26 -   Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
  27 -   Does it continue to work after changes are made?
  28 -   Does it continue to work after system configurations or libraries
  29     are upgraded?
  30 -   Does it respond properly for a full range of input parameters?
  31 -   What about **edge or corner cases**?
  32 -   What's the limit on that input parameter?
  33 -   How will it affect your
  34     [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?
  35
  36 ## Verification
  37
  38 *Verification* is the process of asking, "Have we built the software
  39 correctly?" That is, is the code bug free, precise, accurate, and
  40 repeatable?
  41
  42 ## Validation
  43
  44 *Validation* is the process of asking, "Have we built the right
  45 software?" That is, is the code designed in such a way as to produce the
  46 answers we are interested in, data we want, etc.
  47
  48 ## Uncertainty Quantification
  49
*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g., Monte Carlo methods.
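As a rough illustration of checking a non-deterministic algorithm against
error bounds, the sketch below estimates pi with a Monte Carlo method and
asserts that the estimate lands within a tolerance. The function name
`estimate_pi`, the sample count, the seed, and the tolerance are all
assumptions chosen for this example.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # private generator, so seeding stays local
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi():
    # With 100000 samples the standard error is roughly 0.005, so a
    # tolerance of 0.05 is a deliberately generous bound.
    obs = estimate_pi(100000, seed=42)
    assert abs(obs - math.pi) < 0.05
```

Fixing the seed makes the test repeatable, while the tolerance records how
much statistical error we are willing to accept.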
  54
  55 # Where are tests?
  56
  57 Say we have an averaging function:
  58
```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    # float() avoids integer (floor) division under Python 2
    return float(total) / length
```
  65
  66 Tests could be implemented as runtime **exceptions in the function**:
  67
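```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report any other problem, then re-raise so the return below
        # never runs with total or length undefined.
        print("There was a problem evaluating the number list.")
        raise
    # float() avoids integer (floor) division under Python 2
    return float(total) / length
```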
  79
Sometimes tests are functions alongside the function definitions
they are testing.
  82
```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report any other problem, then re-raise so the return below
        # never runs with total or length undefined.
        print("There was a problem evaluating the number list.")
        raise
    # float() avoids integer (floor) division under Python 2
    return float(total) / length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
 105
 106 Sometimes they are in an executable independent of the main executable.
 107
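```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report any other problem, then re-raise so the return below
        # never runs with total or length undefined.
        print("There was a problem evaluating the number list.")
        raise
    # float() avoids integer (floor) division under Python 2
    return float(total) / length
```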
 119
A test module then lives in a different file:
 121
```python
# Import the function itself; "import mean" would bind the module,
# and calling a module raises a TypeError.
from mean import mean


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
 135
 136 # When should we test?
 137
 138 The three right answers are:
 139
 140 -   **ALWAYS!**
 141 -   **EARLY!**
 142 -   **OFTEN!**
 143
 144 The longer answer is that testing either before or after your software
 145 is written will improve your code, but testing after your program is
 146 used for something important is too late.
 147
If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.
 153
 154 Another important feature of testing is that it helps you remember what
 155 all the parts of your code do. If you are working on a large project
 156 over three years and you end up with 200 classes, it may be hard to
 157 remember what the widget class does in detail. If you have a test that
 158 checks all of the widget's functionality, you can look at the test to
 159 remember what it's supposed to do.
 160
 161 # Who should test?
 162
 163 In a collaborative coding environment, where many developers contribute
 164 to the same code base, developers should be responsible individually for
 165 testing the functions they create and collectively for testing the code
 166 as a whole.
 167
 168 Professionals often test their code, and take pride in test coverage,
 169 the percent of their functions that they feel confident are
 170 comprehensively tested.
 171
 172 # How are tests written?
 173
 174 The type of tests that are written is determined by the testing
 175 framework you adopt. Don't worry, there are a lot of choices.
 176
 177 ## Types of Tests
 178
**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called exceptions.
 184
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, there is no formal definition of what
constitutes a *code unit*.
 189
 190 To test functions and classes, the interfaces (API) - rather than the
 191 implementation - should be tested. Treating the implementation as a
 192 black box, we can probe the expected behavior with boundary cases for
 193 the inputs.
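A minimal sketch of such black-box boundary testing follows. The `clamp()`
function is a hypothetical example, not part of this lesson's code; the
tests only use its interface and probe values on and around the limits.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def test_clamp_boundaries():
    assert clamp(5, 0, 10) == 5     # interior value is unchanged
    assert clamp(0, 0, 10) == 0     # exactly on the lower bound
    assert clamp(10, 0, 10) == 10   # exactly on the upper bound
    assert clamp(-1, 0, 10) == 0    # just below the lower bound
    assert clamp(11, 0, 10) == 10   # just above the upper bound
```

None of these assertions depend on how `clamp()` is implemented, so the
implementation can change without the tests needing to.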
 194
 195 **System Tests:** System level tests are intended to test the code as a
 196 whole. As opposed to unit tests, system tests ask for the behavior as a
 197 whole. This sort of testing involves comparison with other validated
 198 codes, analytical solutions, etc.
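As a small-scale illustration of comparing against an analytical solution,
the sketch below checks a numerical integrator against an integral whose
value is known exactly. The `trapezoid()` function, the step count, and the
tolerance are assumptions made for this example; a real system test would
exercise a whole code rather than one routine.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f from a to b with n trapezoids of equal width."""
    h = (b - a) / float(n)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_trapezoid_against_analytical():
    # The integral of sin(x) from 0 to pi is exactly 2.
    obs = trapezoid(math.sin, 0.0, math.pi, 1000)
    exp = 2.0
    assert abs(obs - exp) < 1e-5
```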
 199
**Regression Tests:** A regression test ensures that new code does not
change existing behavior. If you change the default answer, for example,
or add a new question, you'll need to make sure that missing entries are
still found and fixed.
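One common way to write a regression test is to record outputs from a
trusted version of the code and compare later runs against them. The
sketch below assumes a hypothetical `square_all()` function and a
hand-recorded `REFERENCE` table; both are illustrative names only.

```python
def square_all(values):
    """Return a list with every input value squared."""
    return [v * v for v in values]


# Outputs recorded from a version of the code that was trusted.
REFERENCE = {
    (1, 2, 3): [1, 4, 9],
    (): [],
    (-2, 0.5): [4, 0.25],
}


def test_regression():
    for args, expected in REFERENCE.items():
        # Any change in behavior shows up as a mismatch here.
        assert square_all(list(args)) == expected
```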
 204
 205 **Integration Tests:** Integration tests query the ability of the code
 206 to integrate well with the system configuration and third party
 207 libraries and modules. This type of test is essential for codes that
 208 depend on libraries which might be updated independently of your code or
 209 when your code might be used by a number of users who may have various
 210 versions of libraries.
 211
**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
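For illustration, here is a sketch of assembling and running a suite with
the standard library's `unittest` module, which nose builds on. The test
classes and the `build_suite()` and `run_suite()` helpers are hypothetical
names for this example.

```python
import unittest


class TestAddition(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(1 + 1, 2)


class TestStrings(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("nose".upper(), "NOSE")


def build_suite():
    """Collect the tests from both classes into one suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestAddition))
    suite.addTests(loader.loadTestsFromTestCase(TestStrings))
    return suite


def run_suite():
    """Run the whole suite at once and return the result object."""
    return unittest.TextTestRunner(verbosity=0).run(build_suite())
```

Calling `run_suite()` executes every collected test and reports a single
pass or fail summary for the whole collection.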
 216
 217 # Elements of a Test
 218
 219 **Behavior:** The behavior you want to test. For example, you might want
 220 to test the fun() function.
 221
**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally analytically determined
even if the function being tested isn't.
 230
 231 **Assertions:** Require that some conditional be true. If the
 232 conditional is false, the test fails.
 233
**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the tests themselves but
rather involve getting the computer into the appropriate state.
 238
 239 For example, since fun varies a lot between people, the fun() function
 240 is a method of the Person class. In order to check the fun function,
 241 then, we need to create an appropriate Person object on which to run
 242 fun().
 243
 244 **Setup and teardown:** Creating fixtures is often done in a call to a
 245 setup function. Deleting them and other cleanup is done in a teardown
 246 function.
 247
 248 **The Big Picture:** Putting all this together, the testing algorithm is
 249 often:
 250
 251 ```python
 252 setup()
 253 test()
 254 teardown()
 255 ```
 256
 257 But, sometimes it's the case that your tests change the fixtures. If so,
 258 it's better for the setup() and teardown() functions to occur on either
 259 side of each test. In that case, the testing algorithm should be:
 260
 261 ```python
 262 setup()
 263 test1()
 264 teardown()
 265
 266 setup()
 267 test2()
 268 teardown()
 269
 270 setup()
 271 test3()
 272 teardown()
 273 ```
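The per-test pattern above is what `unittest.TestCase` provides through its
`setUp()` and `tearDown()` methods. In the sketch below, the `Person` class
and its `fun()` method are hypothetical stand-ins for the ones described
earlier; their behavior is an assumption made for illustration.

```python
import unittest


class Person(object):
    """Hypothetical stand-in for the Person class mentioned above."""

    def __init__(self, name):
        self.name = name
        self.fun_had = 0

    def fun(self):
        self.fun_had += 1
        return self.fun_had


class TestPerson(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: build a fresh fixture.
        self.person = Person("Ada")

    def tearDown(self):
        # Runs after every test method: discard the fixture.
        self.person = None

    def test_fun_generates_fun(self):
        self.assertEqual(self.person.fun(), 1)

    def test_fun_accumulates(self):
        self.person.fun()
        self.assertEqual(self.person.fun(), 2)
```

Both tests modify the fixture, yet each passes regardless of execution
order because `setUp()` rebuilds the `Person` object every time.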
 274
 275 * * * * *
 276
 277 # Nose: A Python Testing Framework
 278
The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.
 283
 284 ## Where do nose tests live?
 285
Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. Test
functions that are not part of a unittest.TestCase subclass are also
collected if their names match the configured testMatch regular
expression.)
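To see which file names the pattern quoted above would select, it can be
tried directly with Python's `re` module. This is only a sketch:
`looks_like_test()` is a hypothetical helper, and the pattern nose actually
uses depends on its configuration and may differ from the one shown here.

```python
import re

# The pattern quoted above; an installed nose may be configured differently.
TEST_MATCH = re.compile(r"[Tt]est[-_]")


def looks_like_test(filename):
    """Return True if filename begins with Test-, Test_, test-, or test_."""
    # re.match only matches at the start of the string.
    return TEST_MATCH.match(filename) is not None
```

With this pattern `test_mean.py` is selected, while `mean.py` and
`testing.py` are not.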
 293
 294 ## Nose Test Syntax
 295
 296 To write a nose test, we make assertions.
 297
 298 ```python
 299 assert should_be_true()
 300 assert not should_not_be_true()
 301 ```
 302
Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.
 305
 306 ```python
 307 from nose.tools import *
 308
 309 assert_equal(a, b)
 310 assert_almost_equal(a, b)
 311 assert_true(a)
 312 assert_false(a)
 313 assert_raises(exception, func, *args, **kwargs)
 314 assert_is_instance(a, b)
 315 # and many more!
 316 ```
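The helpers in `nose.tools` closely mirror the assert methods on
`unittest.TestCase`, spelled in lowercase with underscores. The sketch
below uses the standard library methods so that it runs without nose
installed; treat the one-to-one correspondence as an assumption to check
against your nose version. The `divide()` function is hypothetical.

```python
import unittest


def divide(a, b):
    """Hypothetical function used to demonstrate assertRaises."""
    return float(a) / b


class TestAssertMethods(unittest.TestCase):
    def test_equal(self):
        self.assertEqual(2 + 2, 4)

    def test_almost_equal(self):
        # By default the difference is rounded to 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_raises(self):
        # The callable and its arguments are passed separately.
        self.assertRaises(ZeroDivisionError, divide, 1, 0)

    def test_is_instance(self):
        self.assertIsInstance(divide(3, 2), float)
```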
 317
 318 Moreover, numpy offers similar testing functions for arrays:
 319
 320 ```python
 321 from numpy.testing import *
 322
 323 assert_array_equal(a, b)
 324 assert_array_almost_equal(a, b)
 325 # etc.
 326 ```
 327
 328 ## Exercise: Writing tests for mean()
 329
 330 There are a few tests for the mean() function that we listed in this
 331 lesson. What are some tests that should fail? Add at least three test
 332 cases to this set. Edit the `test_mean.py` file which tests the mean()
 333 function in `mean.py`.
 334
 335 *Hint:* Think about what form your input could take and what you should
 336 do to handle it. Also, think about the type of the elements in the list.
 337 What should be done if you pass a list of integers? What if you pass a
 338 list of strings?
 339
 340 **Example**:
 341
 342     nosetests test_mean.py
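As one possible starting point, tests can also assert that bad input is
rejected. The sketch below uses a simplified `mean()` defined inline so the
example is self-contained; the version in `mean.py` may raise different
exceptions, so treat the expected exception types as assumptions.

```python
def mean(numlist):
    """Simplified stand-in for the mean() function in mean.py."""
    return float(sum(numlist)) / len(numlist)


def test_mean_of_strings_raises():
    try:
        mean(["1", "2"])
    except TypeError:
        pass  # expected: strings cannot be summed with numbers
    else:
        raise AssertionError("expected a TypeError")


def test_mean_of_empty_list_raises():
    try:
        mean([])
    except ZeroDivisionError:
        pass  # expected: an empty list has no mean
    else:
        raise AssertionError("expected a ZeroDivisionError")
```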
 343
 344 # Test Driven Development
 345
 346 Test driven development (TDD) is a philosophy whereby the developer
 347 creates code by **writing the tests first**. That is to say you write the
 348 tests *before* writing the associated code!
 349
This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.
 354
 355 TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
 356 who diligently follow TDD swear by its effectiveness. This development
 357 style was put forth most strongly by [Kent Beck in
 358 2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
 359
 360 ## A TDD Example
 361
 362 Say you want to write a fib() function which generates values of the
 363 Fibonacci sequence of given indexes. You would - of course - start by
 364 writing the test, possibly testing a single value:
 365
 366 ```python
 367 from nose.tools import assert_equal
 368
 369 from pisa import fib
 370
 371 def test_fib1():
 372     obs = fib(2)
 373     exp = 1
 374     assert_equal(obs, exp)
 375 ```
 376
 377 You would *then* go ahead and write the actual function:
 378
 379 ```python
 380 def fib(n):
 381     # you snarky so-and-so
 382     return 1
 383 ```
 384
And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests we see that:
 387
 388 ```python
 389 def test_fib1():
 390     obs = fib(2)
 391     exp = 1
 392     assert_equal(obs, exp)
 393
 394
 395 def test_fib2():
 396     obs = fib(0)
 397     exp = 0
 398     assert_equal(obs, exp)
 399
 400     obs = fib(1)
 401     exp = 1
 402     assert_equal(obs, exp)
 403 ```
 404
 405 This extra test now requires that we bother to implement at least the
 406 initial values:
 407
 408 ```python
 409 def fib(n):
 410     # a little better
 411     if n == 0 or n == 1:
 412         return n
 413     return 1
 414 ```
 415
 416 However, this function still falls over for `2 < n`. Time for more
 417 tests!
 418
 419 ```python
 420 def test_fib1():
 421     obs = fib(2)
 422     exp = 1
 423     assert_equal(obs, exp)
 424
 425
 426 def test_fib2():
 427     obs = fib(0)
 428     exp = 0
 429     assert_equal(obs, exp)
 430
 431     obs = fib(1)
 432     exp = 1
 433     assert_equal(obs, exp)
 434
 435
 436 def test_fib3():
 437     obs = fib(3)
 438     exp = 2
 439     assert_equal(obs, exp)
 440
 441     obs = fib(6)
 442     exp = 8
 443     assert_equal(obs, exp)
 444 ```
 445
At this point, we had better go ahead and try to do the right thing...
 447
 448 ```python
 449 def fib(n):
 450     # finally, some math
 451     if n == 0 or n == 1:
 452         return n
 453     else:
 454         return fib(n - 1) + fib(n - 2)
 455 ```
 456
 457 Here it becomes very tempting to take an extended coffee break or
 458 possibly a power lunch. But then you remember those pesky negative
 459 numbers and floats. Perhaps the right thing to do here is to just be
 460 undefined.
 461
```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


# Named test_fib4 so that it does not silently replace test_fib3 above
def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```
 498
 499 This means that it is time to add the appropriate case to the function
 500 itself:
 501
 502 ```python
 503 def fib(n):
 504     # sequence and you shall find
 505     if n < 0 or int(n) != n:
 506         return NotImplemented
 507     elif n == 0 or n == 1:
 508         return n
 509     else:
 510         return fib(n - 1) + fib(n - 2)
 511 ```
 512
 513 # Quality Assurance Exercise
 514
Can you think of other tests to make for the Fibonacci function? I promise there
are at least two.

Implement one new test in `test_fib.py`, run nosetests, and if it fails, implement
a more robust function for that case.
 520
 521 And thus - finally - we have a robust function together with working
 522 tests!
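One property the tests above do not exercise is speed: the recursive
version recomputes the same values many times, so larger indexes become
slow. The sketch below is one possible iterative variant that returns the
same results for the tested inputs; it is offered as an illustration, not
as the lesson's intended solution.

```python
def fib(n):
    """Iterative Fibonacci; undefined for negative or non-integer n."""
    if n < 0 or int(n) != n:
        return NotImplemented
    a, b = 0, 1
    for _ in range(int(n)):
        # Advance one step along the sequence.
        a, b = b, a + b
    return a


def test_fib_large():
    # The recursive version would need billions of calls to reach this.
    obs = fib(50)
    exp = 12586269025
    assert obs == exp
```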
 523
 524 # Exercise
 525
 526 **The Problem:** In 2D or 3D, we have two points (p1 and p2) which
 527 define a line segment. Additionally there exists experimental data which
 528 can be anywhere in the domain. Find the data point which is closest to
 529 the line segment.
 530
 531 In the `close_line.py` file there are four different implementations
 532 which all solve this problem. [You can read more about them
 533 here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
 534 there are no tests! Please write from scratch a `test_close_line.py`
 535 file which tests the closest\_data\_to\_line() functions.
 536
 537 *Hint:* you can use one implementation function to test another. Below
 538 is some sample data to help you get started.
 539
 540 ![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)
 542
 543 ```python
 544 import numpy as np
 545
 546 p1 = np.array([0.0, 0.0])
 547 p2 = np.array([1.0, 1.0])
 548 data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
 549 ```
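Besides comparing implementations against each other, expected values
worked out by hand make a useful independent check. The sketch below is
pure Python and does not use `close_line.py`; `distance_to_segment()` is a
hypothetical helper written only for this illustration.

```python
import math


def distance_to_segment(point, p1, p2):
    """Distance from a 2D point to the segment between p1 and p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: p1 and p2 coincide.
        return math.hypot(point[0] - p1[0], point[1] - p1[1])
    # Fraction along the segment of the projection, clamped to [0, 1].
    t = ((point[0] - p1[0]) * dx + (point[1] - p1[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    nearest = (p1[0] + t * dx, p1[1] + t * dy)
    return math.hypot(point[0] - nearest[0], point[1] - nearest[1])


def test_distance_to_segment():
    p1, p2 = (0.0, 0.0), (1.0, 1.0)
    # A point on the segment is at distance zero.
    assert distance_to_segment((0.5, 0.5), p1, p2) == 0.0
    # Beyond the end of the segment, the nearest point is p2.
    assert abs(distance_to_segment((2.0, 1.0), p1, p2) - 1.0) < 1e-12
    # Perpendicular offset from the midpoint.
    assert abs(distance_to_segment((0.0, 1.0), p1, p2) - math.sqrt(0.5)) < 1e-12
```

Note that in the sample data, `[0.25, 0.5]` and `[1.0, 0.75]` are
equidistant from the segment (about 0.177), so a test on that data should
not assume which of the two is returned.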
 550