[Back To Debugging](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/debugging) - [Forward To Documentation](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/documentation)

# Testing

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony Scopatz**

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

# What is testing?
Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed. Well
chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral edge cases.
# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code every time, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

- Does your code work?
- Does it do what you think it does?
- Does it continue to work after changes are made?
- Does it continue to work after system configurations or libraries
  are upgraded?
- Does it respond properly for a full range of input parameters?
- What about edge or corner cases?
- What's the limit on that input parameter?
- How will it affect your
  [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?
## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?
## Uncertainty Quantification
*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g., Monte Carlo methods.
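For instance, a minimal sketch of such a check (the `monte_carlo_pi()` estimator here is hypothetical, invented for illustration) asserts that a stochastic result lands within loose error bounds rather than matching an exact value:

```python
import random

def monte_carlo_pi(n):
    # Estimate pi by sampling n points uniformly in the unit square
    # and counting the fraction that land inside the quarter circle.
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

def test_monte_carlo_pi():
    # The result is stochastic, so test against error bounds,
    # not exact equality.
    est = monte_carlo_pi(100000)
    assert abs(est - 3.14159265) < 0.05
```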
# Where do tests live?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total / length
```

Tests could be implemented as runtime exceptions in the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length
```
Sometimes tests are functions alongside the function definitions they
are testing:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0

def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
Sometimes they are in an executable independent of the main executable:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length
```

where a test module exists in a different file:
```python
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0

def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
# When should we test?
The three right answers are:

- **ALWAYS!**
- **EARLY!**
- **OFTEN!**
The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.
If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.
Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.
# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in their test
coverage: the percentage of their functions that they feel confident are
comprehensively tested.
# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.
## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks within the code that handle this kind of
exceptional behavior are called exceptions.
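As a minimal sketch, here is a variation on the mean() function used throughout this lesson, rewritten to raise an exception rather than print:

```python
def mean(numlist):
    # Input arrives at runtime, so check it at runtime and raise
    # an exception that describes the problem.
    if len(numlist) == 0:
        raise ValueError("Cannot take the mean of an empty list.")
    return sum(numlist) / float(len(numlist))
```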
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what defines a *code unit* is not
strictly defined.

To test functions and classes, the interfaces (API), rather than the
implementation, should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
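For example, a few boundary-case probes through the interface alone might look like the following sketch (assuming the mean() function from earlier lives in `mean.py`):

```python
from mean import mean  # assumes the mean() function above is in mean.py

def test_mean_boundaries():
    # Only the interface is exercised; no knowledge of how mean()
    # is implemented is required.
    assert mean([0]) == 0                  # smallest sensible input
    assert mean([-1, 1]) == 0              # mixed signs cancel
    assert mean([1e300, 1e300]) == 1e300   # large magnitudes survive
```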
**System Tests:** System-level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask about the behavior of
the code as a whole. This sort of testing involves comparison with other
validated codes, analytical solutions, etc.
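A sketch of the idea (`simulate_range()` is a hypothetical whole-program entry point, not code from this lesson): compare the program's end-to-end answer against a known analytical solution.

```python
import math

def test_projectile_range():
    # Analytical range of a projectile with no air resistance:
    # R = v**2 * sin(2*theta) / g
    v, theta, g = 10.0, math.pi / 4.0, 9.81
    exp = v**2 * math.sin(2.0 * theta) / g
    obs = simulate_range(v, theta)  # hypothetical simulation entry point
    assert abs(obs - exp) < 1e-6
```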
**Regression Tests:** A regression test ensures that new code does not
change anything. If you change the default answer, for example, or add a
new question, you'll need to make sure that missing entries are still
handled correctly.
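One way this might look in practice (the saved-output file name is illustrative): record a trusted answer once, then keep checking new code against it.

```python
from mean import mean  # the mean() function from earlier in this lesson

def test_mean_regression():
    # 'expected_mean.txt' holds the answer recorded from a trusted
    # earlier version of the code (illustrative file name).
    obs = mean([1.0, 2.0, 3.0, 4.0])
    with open('expected_mean.txt') as f:
        exp = float(f.read())
    assert obs == exp
```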
**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third-party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code,
or when your code might be used by a number of users who may have
various versions of libraries.
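A minimal sketch of this idea: check our own mean() against whatever version of numpy happens to be installed, so a library upgrade that changes behavior makes the test fail.

```python
import numpy as np

from mean import mean  # the mean() function from earlier in this lesson

def test_mean_against_numpy():
    # If a numpy upgrade ever disagrees with our code, this fails.
    numlist = [1.0, 2.0, 3.0]
    assert abs(mean(numlist) - np.mean(numlist)) < 1e-12
```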
**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
## Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.
**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally determined analytically
even if the function being tested isn't.
**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.
**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the test themselves but
rather involve getting the computer into the appropriate state.
For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun() function,
then, we need to create an appropriate Person object on which to run
it.
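A sketch of that fixture (this Person class is hypothetical, invented for illustration):

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def fun(self):
        # A stand-in measure of fun, for illustration only.
        return len(self.name)

def test_fun():
    # The Person object is the fixture: not what we are testing,
    # but required to run the test at all.
    person = Person("Fezzik")
    assert person.fun() > 0
```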
**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is:

```python
setup()
test()
teardown()
```
But sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
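As a sketch of that pattern spelled out in plain Python (using the hypothetical Person fixture from above; testing frameworks like nose automate this bookkeeping for you):

```python
def setup():
    # Build a fresh fixture before each test.
    global person
    person = Person("Fezzik")

def teardown():
    # Clean up so the next test starts from scratch.
    global person
    del person

def test_fun():
    assert person.fun() > 0

# The per-test algorithm described above, done by hand:
for test in [test_fun]:
    setup()
    test()
    teardown()
```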
# Nose: A Python Testing Framework
The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notable is [JUnit](http://www.junit.org/) in Java, which can arguably
be credited with inventing the testing framework.
## Where do nose tests live?
Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)
## Nose Test Syntax

To write a nose test, we make assertions:

```python
assert should_be_true()
assert not should_not_be_true()
```
Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base:

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```
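For instance, a few of these in action (a minimal sketch assuming a variant of the mean() function that raises TypeError on bad input rather than printing):

```python
from nose.tools import assert_equal, assert_almost_equal, assert_raises

from mean import mean  # assuming mean() raises TypeError for bad input

def test_mean_with_nose_tools():
    assert_equal(mean([0, 200]), 100)
    assert_almost_equal(mean([1.0, 2.0]), 1.5)
    assert_raises(TypeError, mean, ["one", "two"])
```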
Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# and many more!
```
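For example, floating point arithmetic makes exact array comparison brittle, which is where the almost-equal variant earns its keep:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

def test_elementwise_sum():
    # 0.1 + 0.2 != 0.3 exactly in floating point, so compare
    # to a tolerance instead of with ==.
    obs = np.array([0.1]) + np.array([0.2])
    exp = np.array([0.3])
    assert_array_almost_equal(obs, exp)
```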
## Exercise: Writing tests for mean()
There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.
*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

When you're ready, run your tests with:

```bash
nosetests test_mean.py
```
# Test Driven Development
Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!
This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.
TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
Say you want to write a fib() function which generates values of the
Fibonacci sequence at given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from fib1 import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```
You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```
And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding more tests, we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```
This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    if n == 0 or n == 1:
        return n
    return 1
```
However, this function still falls over for `2 < n`. Time for more
tests:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)

def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```
At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```
Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)

def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)

def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```
This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```
And thus - finally - we have a robust function together with working
tests!

## Exercise: Writing tests for close_line()
**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally, there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.
In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write from scratch a `test_close_line.py`
file which tests the `closest_data_to_line()` functions.
*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```
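To get you started, here is one possible skeleton (the imported implementation names are hypothetical; substitute whichever functions actually appear in `close_line.py`):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

# Hypothetical names; use the functions actually defined in close_line.py.
from close_line import closest_data_to_line, closest_data_to_line2

def test_implementations_agree():
    p1 = np.array([0.0, 0.0])
    p2 = np.array([1.0, 1.0])
    data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
    # One implementation serves as the oracle for another.
    obs = closest_data_to_line2(data, p1, p2)
    exp = closest_data_to_line(data, p1, p2)
    assert_array_almost_equal(obs, exp)
```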