python/testing/Readme.md

[Back To Debugging](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/debugging) - [Forward To Documentation](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/documentation)

* * * * *

**Presented By Katy Huff**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed.
Well-chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral **edge cases**.

# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code **every time**, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

-   Does your code work?
-   **Always?**
-   Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
-   Does it continue to work after changes are made?
-   Does it continue to work after system configurations or libraries
    are upgraded?
-   Does it respond properly for a full range of input parameters?
-   What about **edge or corner cases**?
-   What's the limit on that input parameter?
-   How will it affect your
    [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?

## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything that uses
random numbers, e.g. Monte Carlo methods.
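
As a rough illustration of testing within error bounds, here is a minimal
sketch of a Monte Carlo estimate of pi together with a tolerance-based
test. The names `estimate_pi` and `test_estimate_pi_within_tolerance`, the
sample count, and the tolerance are all assumptions made for this
example, not part of the lesson's code.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # Count the points that land inside the quarter circle.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_tolerance():
    # The standard error of this estimator is sqrt(pi * (4 - pi) / n),
    # about 0.0052 for n = 100000, so 0.05 is a generous bound.
    estimate = estimate_pi(100000, seed=42)
    assert abs(estimate - math.pi) < 0.05
```

Seeding the generator makes a failure reproducible, while the tolerance
acknowledges that the answer is only expected to be close, not exact.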

# Where are tests?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total/length
```

Tests could be implemented as runtime **exceptions in the function**:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report anything unexpected, then re-raise so we never reach
        # the return statement with total and length undefined.
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

Sometimes tests are functions alongside the function definitions
they are testing.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report anything unexpected, then re-raise so we never reach
        # the return statement with total and length undefined.
        print("There was a problem evaluating the number list.")
        raise
    return total/length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

Sometimes they are in an executable independent of the main executable.
For example, the function lives in `mean.py`:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        # Report anything unexpected, then re-raise so we never reach
        # the return statement with total and length undefined.
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

A test module then exists in a different file, `test_mean.py`:

```python
# Import the function itself; a bare "import mean" would bind the
# module, and calling a module raises a TypeError.
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

-   **ALWAYS!**
-   **EARLY!**
-   **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
wreck anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in test coverage,
the percent of their functions that they feel confident are
comprehensively tested.

# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called exceptions.

**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what constitutes a *code unit* is not
formally defined.

To test functions and classes, the interfaces (API) - rather than the
implementation - should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
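
To make the black-box idea concrete, here is a small sketch that probes
only the interface of a function at its boundaries. The `clamp()`
function is a hypothetical unit invented for this example; the tests
never look at how it is implemented.

```python
def clamp(value, low, high):
    """Hypothetical unit under test: restrict value to [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


def test_clamp_boundaries():
    # Probe just below, at, and just above each boundary.
    assert clamp(-1, 0, 10) == 0
    assert clamp(0, 0, 10) == 0
    assert clamp(1, 0, 10) == 1
    assert clamp(9, 0, 10) == 9
    assert clamp(10, 0, 10) == 10
    assert clamp(11, 0, 10) == 10


def test_clamp_degenerate_interval():
    # An interval of zero width is still a valid input.
    assert clamp(5, 3, 3) == 3


def test_clamp_rejects_inverted_interval():
    # An inverted interval should be reported, not silently accepted.
    try:
        clamp(5, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Because the tests depend only on inputs and outputs, the body of
`clamp()` could be rewritten without touching them.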

**System Tests:** System level tests are intended to test the code as a
whole. As opposed to unit tests, system tests check the behavior of the
complete program. This sort of testing involves comparison with other
validated codes, analytical solutions, etc.
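
Comparison with an analytical solution might look like the sketch
below. The `trapezoid()` integrator is a stand-in for a whole program
and is an assumption of this example; a real system test would drive
your actual code from input to final answer in the same way.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with n equal-width trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_against_analytical_solution():
    # The integral of sin(x) from 0 to pi is exactly 2.
    obs = trapezoid(math.sin, 0.0, math.pi, 1000)
    exp = 2.0
    # The tolerance reflects the discretization error of the method.
    assert abs(obs - exp) < 1e-5
```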

**Regression Tests:** A regression test ensures that new code does not
change existing behavior. If you change the default answer, for example,
or add a new question, you'll need to make sure that missing entries are
still found and fixed.
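
One common way to write a regression test is to record output from a
version of the code you trust and compare later runs against it. The
sketch below assumes a hypothetical `growth_curve()` model and a
hand-checked `REFERENCE` list; neither comes from the lesson's code.

```python
def growth_curve(steps, rate=0.5):
    """Hypothetical model: a population growing by a fixed rate each step."""
    population = 1.0
    history = []
    for _ in range(steps):
        population *= 1.0 + rate
        history.append(population)
    return history


# Output recorded from an earlier run that was checked by hand.
# In practice this would usually live in a file under version control.
REFERENCE = [1.5, 2.25, 3.375, 5.0625]


def test_growth_curve_regression():
    obs = growth_curve(4)
    assert len(obs) == len(REFERENCE)
    for o, r in zip(obs, REFERENCE):
        assert abs(o - r) < 1e-12
```

If a later change alters any of these values, the test fails and you
have to decide whether the change was intended.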

**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
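
As a sketch of what running a suite as a whole can look like, the
example below uses the standard library's `unittest` module rather than
nose. The two test classes and the helper names `build_suite()` and
`run_suite()` are made up for illustration.

```python
import unittest


class TestAddition(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(1 + 1, 2)


class TestStrings(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("nose".upper(), "NOSE")


def build_suite():
    """Collect the tests from both classes into a single suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestAddition))
    suite.addTests(loader.loadTestsFromTestCase(TestStrings))
    return suite


def run_suite():
    """Run every collected test and return the result object."""
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(build_suite())
```

Calling `run_suite()` executes both tests in one go; the returned
result reports how many tests ran and whether all of them passed.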

# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally analytically determined
even if the function being tested isn't.

**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the tests themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```

But, sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
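
The per-test setup and teardown pattern can be sketched with the
standard library's `unittest` module, which calls `setUp()` before and
`tearDown()` after every test method. The `Person` class below, with
its `hobbies` list and the rule that `fun()` counts hobbies, is entirely
hypothetical and only serves as a fixture for this example.

```python
import unittest


class Person(object):
    """Hypothetical class; fun() is a method because fun varies by person."""

    def __init__(self, name):
        self.name = name
        self.hobbies = []

    def fun(self):
        # A person with no hobbies generates no fun.
        return len(self.hobbies)


class TestPersonFun(unittest.TestCase):
    def setUp(self):
        # Fixture: runs before every test method, so each test starts
        # from the same state.
        self.person = Person("Ada")
        self.person.hobbies.append("testing")

    def tearDown(self):
        # Cleanup: runs after every test method, even a failing one.
        self.person = None

    def test_fun_is_generated(self):
        self.assertEqual(self.person.fun(), 1)

    def test_fixture_can_be_modified(self):
        # This change does not leak into other tests because setUp()
        # rebuilds the fixture each time.
        self.person.hobbies.append("debugging")
        self.assertEqual(self.person.fun(), 2)
```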

* * * * *

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)

## Nose Test Syntax

To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```
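
To see why the more specific assertions are useful, here is a sketch
that runs without nose installed. It uses the `unittest.TestCase`
methods `assertAlmostEqual`, `assertRaises`, and `assertIsInstance`; as
far as we know the nose functions above mirror these under PEP 8 style
names, but treat that mapping as an assumption and check it against
your nose version.

```python
import unittest


class TestAssertStyles(unittest.TestCase):
    def test_almost_equal(self):
        # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
        # so an exact comparison would fail here.
        self.assertNotEqual(0.1 + 0.2, 0.3)
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_raises(self):
        # Passing the callable and its arguments separately lets the
        # framework catch the exception for us.
        self.assertRaises(ZeroDivisionError, divmod, 1, 0)

    def test_is_instance(self):
        self.assertIsInstance(3.0, float)
```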

Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# etc.
```

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

**Example**:

    nosetests test_mean.py

# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

## A TDD Example

Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from pisa import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```

However, this function still falls over for `2 < n`. Time for more
tests!

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    # finally, some math
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined.

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


# Named test_fib4 so that it does not silently replace test_fib3 above.
def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

# Quality Assurance Exercise

Can you think of other tests to make for the Fibonacci function? I promise there
are at least two.

Implement one new test in `test_fib.py`, run nosetests, and if it fails, implement
a more robust function for that case.

And thus - finally - we have a robust function together with working
tests!

# Exercise

**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.

In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write from scratch a `test_close_line.py`
file which tests the closest\_data\_to\_line() functions.

*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```
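
If you want an independent oracle to compare the implementations
against, a brute-force version is one option. The sketch below is plain
Python and is not taken from `close_line.py`; the names
`distance_to_segment()` and `closest_reference()` and the
`(index, distance)` return value are assumptions, so adapt them to
whatever the real functions return.

```python
import math


def distance_to_segment(p1, p2, point):
    """Distance from point to the segment p1-p2 (2D or 3D coordinates)."""
    seg = [b - a for a, b in zip(p1, p2)]
    rel = [c - a for a, c in zip(p1, point)]
    seg_len2 = sum(s * s for s in seg)
    if seg_len2 == 0.0:
        t = 0.0  # p1 and p2 coincide
    else:
        # Projection parameter, clamped so we stay on the segment.
        t = sum(r * s for r, s in zip(rel, seg)) / seg_len2
        t = max(0.0, min(1.0, t))
    nearest = [a + t * s for a, s in zip(p1, seg)]
    return math.sqrt(sum((c - n) ** 2 for c, n in zip(point, nearest)))


def closest_reference(p1, p2, data):
    """Brute-force oracle: return (index, distance) of the closest point."""
    distances = [distance_to_segment(p1, p2, d) for d in data]
    best = min(range(len(data)), key=distances.__getitem__)
    return best, distances[best]
```

With the sample data above, the second and third points work out to the
same distance from the segment, so a test that compares distances is
safer than one that compares which point was picked.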

# Building a Library of Code you Trust

Suppose we’re going to be dealing a lot with these animal count files,
and doing many different kinds of analysis with them. In the
introduction to Python lesson we wrote a function that reads these files
but it’s stuck off in an IPython notebook. We could copy and paste it
into a new notebook every time we want to use it but that gets tedious
and makes it difficult to add features to the function. The ideal
solution would be to keep the function in one spot and use it over and
over again from many different places. Python modules to the rescue!

We’re going to move beyond the IPython notebook. Most Python code is
stored in `.py` files and then used in other `.py` files where it
has been pulled in using an `import` statement. Today we’ll show you
how to do that.

## Exercises

### Exercise 1

Make a new text file called `animals.py`. Copy the file reading
function from yesterday’s IPython notebook into the file and modify it
so that it returns the columns of the file as lists (instead of printing
certain lines).

### Exercise 2

We’re going to make a function to calculate the mean of all the values
in a list, but we’re going to write the tests for it first. Make a new
text file called `test_animals.py`. Make a function called
`test_mean` that runs your theoretical mean function through several
tests.

### Exercise 3

Write the mean function in `animals.py` and verify that it passes your
tests.

### Exercise 4

Write tests for a function that will take a file name and animal name as
arguments, and return the average number of animals per sighting.

### Exercise 5

Write a function that takes a file name and animal name and returns the
average number of animals per sighting. Make sure it passes your tests.