5-Testing/Readme.md

[Back To Debugging](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/) - [Forward To Documentation](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/)

* * * * *

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](http://memecreator.net/the-most-interesting-man-in-the-world/showimage.php/169/I-don't-always-test-my-code-But-when-I-do-I-do-it-in-production.jpg)
# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed.
Well-chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral edge cases.

# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code every time, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

-   Does your code work?
-   Always?
-   Does it do what you think it does?
-   Does it continue to work after changes are made?
-   Does it continue to work after system configurations or libraries
    are upgraded?
-   Does it respond properly for a full range of input parameters?
-   What about edge or corner cases?
-   What's the limit on that input parameter?

## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g., Monte Carlo methods.

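As a minimal sketch of what such a check might look like, the example below estimates pi with a Monte Carlo method and asserts that the estimate lies within a tolerance of the known value. The function name `estimate_pi`, the sample count, the seed, and the tolerance are all illustrative assumptions rather than part of this lesson's code.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)  # a private generator keeps the run reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_tolerance():
    # The tolerance is an assumed bound, chosen to be much wider than the
    # statistical error expected for this many samples.
    estimate = estimate_pi(100000, seed=42)
    assert abs(estimate - math.pi) < 0.05


test_estimate_pi_within_tolerance()
```

Passing a seed keeps the test repeatable. Without one, a tolerance that is too tight will occasionally fail by chance alone.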
# Where are tests?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total/length
```

Tests could be implemented as runtime exceptions in the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when an element is not a number
        print("The number list was not a list of numbers.")
        raise  # re-raise so the caller sees the failure
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

Sometimes tests are functions alongside the function definitions
they are testing.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when an element is not a number
        print("The number list was not a list of numbers.")
        raise  # re-raise so the caller sees the failure
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

Sometimes they are in an executable independent of the main executable.

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        # sum() raises TypeError when an element is not a number
        print("The number list was not a list of numbers.")
        raise  # re-raise so the caller sees the failure
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total/length
```

The test module then lives in a different file:

```python
# import the function itself, not just the module that contains it
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

-   **ALWAYS!**
-   **EARLY!**
-   **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't break
anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in test coverage,
the percent of their functions that they feel confident are
comprehensively tested.

# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called Exceptions.

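A minimal sketch of such a runtime check follows. The `checked_mean` function is a hypothetical variant of the `mean()` function from earlier, and the choice of `ValueError` for an empty list is an assumption made for illustration.

```python
def checked_mean(numlist):
    """Average a list of numbers, refusing input that cannot be averaged."""
    if len(numlist) == 0:
        # Runtime check: the contents of numlist are only known when the code runs.
        raise ValueError("cannot take the mean of an empty list")
    return sum(numlist) / float(len(numlist))


try:
    checked_mean([])
except ValueError as err:
    message = str(err)  # the caller decides how to react to the failure
```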
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what constitutes a *code unit* is not
formally defined.

To test functions and classes, the interfaces (API) - rather than the
implementation - should be tested. Treating the implementation as a black
box, we can probe the expected behavior with boundary cases for the
inputs.

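As a sketch of black-box testing, the checks below call a `mean()` function only through its interface and probe a few boundary inputs. The particular boundary cases, and the expectation that an empty list raises `ZeroDivisionError`, are assumptions tied to this simple implementation.

```python
def mean(numlist):
    """Same interface as the mean() function used earlier in this lesson."""
    return sum(numlist) / float(len(numlist))


def test_mean_boundaries():
    assert mean([7]) == 7        # smallest valid input: a single element
    assert mean([-5, 5]) == 0    # values that cancel exactly
    assert mean([1, 2]) == 1.5   # result that is not a whole number
    # An empty list has no mean; this implementation divides by zero.
    try:
        mean([])
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("expected ZeroDivisionError for an empty list")


test_mean_boundaries()
```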
**System Tests:** System level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask for the behavior as a
whole. This sort of testing involves comparison with other validated
codes, analytical solutions, etc.

**Regression Tests:** A regression test ensures that new code does
not change anything that already worked. If you change the default
answer, for example, or add a new question, you'll need to make sure
that missing entries are still found and fixed.

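One way to sketch a regression test is to record outputs from a version of the code that was known to work and compare against them after every change. Both the `celsius_to_fahrenheit` function and the recorded values below are hypothetical, chosen only to illustrate the pattern.

```python
# Outputs recorded from a version of the code that was known to work.
KNOWN_GOOD = {0: 32.0, 100: 212.0, -40: -40.0}


def celsius_to_fahrenheit(celsius):
    """Hypothetical function under test."""
    return celsius * 9.0 / 5.0 + 32.0


def test_regression():
    # Any change that alters a previously recorded answer fails here.
    for celsius, expected in KNOWN_GOOD.items():
        assert celsius_to_fahrenheit(celsius) == expected


test_regression()
```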
**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.

# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally analytically determined
even if the function being tested isn't.

**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the tests themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```

But sometimes your tests change the fixtures. If so, it's better for the
setup() and teardown() functions to run on either side of each test. In
that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```

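The per-test pattern can be sketched with Python's standard `unittest` module, which calls `setUp()` before and `tearDown()` after every test method. The `Person` class here is hypothetical: the lesson only says that fun() is a method of Person, so its attributes and return value are assumptions made for illustration.

```python
import unittest


class Person(object):
    """Hypothetical class whose fun() method counts how much fun was had."""

    def __init__(self, name):
        self.name = name
        self.fun_had = 0

    def fun(self):
        self.fun_had += 1
        return self.fun_had


class TestPersonFun(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: build a fresh fixture.
        self.person = Person("Ada")

    def tearDown(self):
        # Runs after every test method: drop the fixture.
        self.person = None

    def test_first_fun(self):
        self.assertEqual(self.person.fun(), 1)

    def test_fun_accumulates(self):
        # This test changes the fixture, so it must not leak into other tests.
        self.person.fun()
        self.assertEqual(self.person.fun(), 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPersonFun)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```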
* * * * *

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)

## Nose Test Syntax

To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```

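For readers without nose installed, roughly the same checks can be sketched with the assertion methods of the standard library's `unittest.TestCase`. Pairing each method with the nose function of similar name is an assumption of this sketch, and the test class and the values checked are invented for illustration.

```python
import unittest


class TestAssertionStyles(unittest.TestCase):
    def test_equal(self):
        self.assertEqual(2 + 2, 4)

    def test_almost_equal(self):
        # Floating point sums rarely match exactly; by default the
        # comparison rounds the difference to 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_true_and_false(self):
        self.assertTrue([1])   # a non-empty list is truthy
        self.assertFalse([])   # an empty list is falsy

    def test_raises(self):
        # Calling int("abc") is expected to raise ValueError.
        self.assertRaises(ValueError, int, "abc")

    def test_is_instance(self):
        self.assertIsInstance([1, 2], list)


style_suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAssertionStyles)
style_result = unittest.TextTestRunner(verbosity=0).run(style_suite)
```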
Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# etc.
```

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

**Example**:

    nosetests test_mean.py

# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

## A TDD Example

Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
# assert_equal lives in the nose.tools submodule
from nose.tools import assert_equal

from pisa import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```

However, this function still falls over for `2 < n`. Time for more
tests!

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    # finally, some math
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined.

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


# a distinct name, so this test does not replace test_fib3 above
def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

And thus - finally - we have a robust function together with working
tests!
 516 And thus - finally - we have a robust function together with working
 517 tests!