testing/Readme.md

[Back To Debugging](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/) - [Forward To Documentation](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/)

* * * * *

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed.
Well-chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral edge cases.

# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code every time, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

-   Does your code work?
-   Always?
-   Does it do what you think it does?
-   Does it continue to work after changes are made?
-   Does it continue to work after system configurations or libraries
    are upgraded?
-   Does it respond properly for a full range of input parameters?
-   What about edge or corner cases?
-   What's the limit on that input parameter?
-   How will it affect your
    [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?

## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g. Monte Carlo methods.
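
As a rough illustration of testing within error bounds, the sketch
below checks a Monte Carlo estimate of pi against a tolerance rather
than an exact value. The `estimate_pi()` function, the seed, the sample
count, and the tolerance are all assumptions made for this example (and
it assumes Python 3); they are not part of the lesson materials.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # a private generator keeps the run repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_bounds():
    # The estimate is random, so compare against a tolerance, not equality.
    # With 100000 samples one standard deviation is roughly 0.005,
    # so 0.05 is a deliberately loose bound.
    obs = estimate_pi(100000, seed=42)
    assert abs(obs - math.pi) < 0.05
```

Fixing the seed makes the test repeatable, while the tolerance states
how much error is considered acceptable.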

# Where are tests?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total/length
```

Tests could be implemented as runtime exceptions in the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when the elements are not numbers
        print("The number list was not a list of numbers.")
        raise
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

Sometimes tests are functions that sit alongside the function
definitions they are testing.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when the elements are not numbers
        print("The number list was not a list of numbers.")
        raise
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

Sometimes they are in an executable independent of the main executable.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when the elements are not numbers
        print("The number list was not a list of numbers.")
        raise
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

The test module then lives in a different file:

```python
# import the function itself, not just the module that contains it
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

-   **ALWAYS!**
-   **EARLY!**
-   **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
wreck anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code and take pride in test coverage,
the percentage of their functions that they feel confident are
comprehensively tested.

# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called Exceptions.

**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what constitutes a *code unit* is not
formally defined.

To test functions and classes, the interfaces (API) - rather than the
implementation - should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
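
A minimal sketch of black-box unit tests that probe boundary cases is
shown below. The `clamp()` function is a hypothetical unit invented for
this illustration; the tests only use its interface and never look at
how it is implemented.

```python
def clamp(value, low, high):
    """Hypothetical unit under test: restrict value to the range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_clamp_interior():
    assert clamp(5, 0, 10) == 5


def test_clamp_boundaries():
    # Values exactly on the edges must be returned unchanged.
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10


def test_clamp_outside():
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10


def test_clamp_bad_range():
    # Invalid input should raise rather than return a silent wrong answer.
    try:
        clamp(5, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Each test covers one region of the input domain: the interior, the
edges, the outside, and an invalid combination of parameters.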

**System Tests:** System level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask for the behavior as a
whole. This sort of testing involves comparison with other validated
codes, analytical solutions, etc.

**Regression Tests:** A regression test ensures that new code does not
change anything. If you change the default answer, for example, or add a
new question, you'll need to make sure that missing entries are still
found and fixed.
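
One common way to write a regression test is to record the output of a
trusted run and compare later runs against it. The sketch below assumes
a hypothetical `trapezoid()` integration routine and a reference value
worked out by hand for this example; neither comes from the lesson
materials.

```python
def trapezoid(f, a, b, n):
    """Hypothetical routine: integrate f over [a, b] using n trapezoids."""
    h = (b - a) / float(n)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def square(x):
    return x * x


# Output recorded from a trusted run (here: computed by hand).
REFERENCE = 0.34375


def test_trapezoid_regression():
    # The test asks "did the answer change?", not "is the answer exact?".
    obs = trapezoid(square, 0.0, 1.0, 4)
    assert abs(obs - REFERENCE) < 1e-12
```

Note that the reference value is not the exact integral (1/3). A
regression test only detects that behavior has changed; whether the
recorded behavior is right is a question for verification.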

**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
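
The idea of running a suite as a whole can be sketched with Python's
standard `unittest` module, used here only as a stand-in for whichever
framework you adopt. The two test functions are placeholders invented
for this example, and the code assumes Python 3.

```python
import io
import unittest


def test_addition():
    assert 1 + 1 == 2


def test_string_upper():
    assert "nose".upper() == "NOSE"


def build_suite():
    """Collect the individual test functions into a single suite."""
    suite = unittest.TestSuite()
    for func in (test_addition, test_string_upper):
        suite.addTest(unittest.FunctionTestCase(func))
    return suite


def run_suite():
    # Run everything at once and report whether the whole suite passed.
    stream = io.StringIO()  # capture the runner's report instead of printing it
    result = unittest.TextTestRunner(stream=stream).run(build_suite())
    return result.testsRun, result.wasSuccessful()
```

Calling `run_suite()` returns how many tests ran and whether all of
them passed, which is the single answer you want after making a change.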

# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally analytically determined
even if the function being tested isn't.

**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the tests themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```

But, sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
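
The per-test pattern can be sketched with the standard library's
`unittest.TestCase`, whose `setUp()` and `tearDown()` methods run around
every test method. The `Person` class below is a hypothetical stand-in
for the one described in the text; its attributes and the behavior of
`fun()` are assumptions made for illustration.

```python
import unittest


class Person(object):
    """Hypothetical class from the text: fun() reports how much fun was had."""

    def __init__(self, name):
        self.name = name
        self.fun_had = 0

    def fun(self):
        self.fun_had += 1
        return self.fun_had


class TestPerson(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: build a fresh fixture.
        self.person = Person("Ada")

    def tearDown(self):
        # Runs after every test method: discard the fixture.
        self.person = None

    def test_fun_generates_fun(self):
        self.assertEqual(self.person.fun(), 1)

    def test_fun_accumulates(self):
        # This test changes the fixture, which is why setUp runs per test.
        self.person.fun()
        self.assertEqual(self.person.fun(), 2)
```

Because each test gets its own `Person`, the two tests pass regardless
of the order in which they run.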

* * * * *

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests live in files whose names begin with `Test-`, `Test_`,
`test-`, or `test_`. Specifically, these satisfy the testMatch regular
expression `[Tt]est[-_]`. (You can also teach nose to find tests by
declaring them in the unittest.TestCase subclasses that you create in
your code. You can also create test functions which are not
unittest.TestCase subclasses if they are named to match the configured
testMatch regular expression.)
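
To see which file names the pattern quoted above would pick up, here is
a small sketch using Python's `re` module. The helper name
`looks_like_test_file()` is made up for this example, and the pattern is
the simplified one from the text; the expression nose is actually
configured with may differ.

```python
import re

# Simplified pattern quoted in the text; nose's configured default may differ.
TEST_MATCH = re.compile(r"[Tt]est[-_]")


def looks_like_test_file(filename):
    """Return True if the file name begins with Test-, Test_, test- or test_."""
    # re.match only matches at the start of the string.
    return TEST_MATCH.match(filename) is not None
```

For instance, `test_mean.py` matches, while `mean.py` and `testing.py`
do not.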

## Nose Test Syntax

To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```

Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# etc.
```
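
The "almost equal" assertions exist because floating point results
rarely match exactly. The sketch below shows the underlying idea with
the standard library's `math.isclose` (Python 3.5 or later). It is a
stand-in chosen for this example, not the nose or numpy implementation,
and its tolerance convention differs from theirs.

```python
import math


def test_exact_comparison_is_fragile():
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point.
    assert 0.1 + 0.2 != 0.3


def test_approximate_comparison():
    # Compare within a tolerance instead of demanding exact equality.
    assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)
```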

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

**Example**:

    nosetests test_mean.py

# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

## A TDD Example

Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indices. You would - of course - start by
writing the test, possibly testing a single value:

```python
# assert_equal lives in nose.tools, not in the top-level nose package
from nose.tools import assert_equal

from pisa import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```

However, this function still falls over for `2 < n`. Time for more
tests!

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    # finally, some math
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined.

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


# needs its own name, otherwise it would silently replace test_fib3 above
def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

And thus - finally - we have a robust function together with working
tests!
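
One payoff of having these tests is that the implementation can later be
changed with some confidence. As a sketch (this rewrite is not part of
the original lesson), here is an iterative fib() that avoids the
repeated recursive calls and is checked against the same expectations as
the tests above, using plain `assert` so it runs without nose.

```python
def fib(n):
    # Same contract as before: undefined for negatives and non-integers.
    if n < 0 or int(n) != n:
        return NotImplemented
    previous, current = 0, 1
    for _ in range(int(n)):
        previous, current = current, previous + current
    return previous


def test_fib_same_expectations():
    # Values taken from the tests written during the TDD walkthrough.
    assert fib(0) == 0
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(3) == 2
    assert fib(6) == 8
    assert fib(13.37) is NotImplemented
    assert fib(-9000) is NotImplemented
```

If the rewrite had broken any of the earlier cases, the existing tests
would say so immediately.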

# Exercise

**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.

In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write from scratch a `test_close_line.py`
file which tests the closest\_data\_to\_line() functions.

*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```