`Back To Debugging`_ - `Forward To Documentation`_

.. _Back To Debugging: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/
.. _Forward To Documentation: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/

-----------

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony Scopatz**

.. image:: http://s3.amazonaws.com/inscight/img/blog/evo_sol1.png

http://memecreator.net/the-most-interesting-man-in-the-world/showimage.php/169/I-don%27t-always-test-my-code-But-when-I-do-I-do-it-in-production.jpg


What is testing?
================
Software testing is a process by which one or more expected behaviors and
results from a piece of software are exercised and confirmed. Well-chosen
tests will confirm expected code behavior for the extreme boundaries of the
input domains, output ranges, parametric combinations, and other behavioral
edge cases.

Why test software?
==================
Unless you write flawless, bug-free, perfectly accurate, fully precise, and
predictable code every time, you must test your code in order to trust it
enough to answer in the affirmative to at least a few of the following questions:

* Does your code work?
* Always?
* Does it do what you think it does?
* Does it continue to work after changes are made?
* Does it continue to work after system configurations or libraries are upgraded?
* Does it respond properly for a full range of input parameters?
* What about edge or corner cases?
* What's the limit on that input parameter?

Verification
************
*Verification* is the process of asking, "Have we built the software correctly?"
That is, is the code bug-free, precise, accurate, and repeatable?

Validation
**********
*Validation* is the process of asking, "Have we built the right software?"
That is, is the code designed in such a way as to produce the answers we are
interested in, the data we want, etc.?

Uncertainty Quantification
**************************
*Uncertainty Quantification* is the process of asking, "Given that our algorithm
may not be deterministic, was our execution within acceptable error bounds?"  This
is particularly important for anything which uses random numbers, e.g., Monte Carlo methods.
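
As a rough sketch of what such a check might look like, the hypothetical
``estimate_pi`` function below (not part of this lesson's code) computes a seeded
Monte Carlo estimate and tests it against an error bound rather than for exact
equality. The sample count and the tolerance are illustrative assumptions.

.. code-block:: python

    import math
    import random


    def estimate_pi(nsamples, seed=0):
        """Estimate pi by sampling points in the unit square (hypothetical example)."""
        rng = random.Random(seed)  # a fixed seed keeps the test repeatable
        inside = 0
        for _ in range(nsamples):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                inside += 1
        return 4.0 * inside / nsamples


    def test_estimate_pi():
        # compare against an acceptable error bound, not an exact value
        assert abs(estimate_pi(100000) - math.pi) < 0.05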


Where are tests?
================
Say we have an averaging function:

.. code-block:: python

    def mean(numlist):
        total = sum(numlist)
        length = len(numlist)
        return total / length

Tests could be implemented as runtime exceptions in the function:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total / length


Sometimes tests are functions alongside the function definitions they are testing:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total / length


    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0


    def test_floating_mean():
        assert mean([1, 2]) == 1.5

Sometimes they are in an executable independent of the main executable.
Here, ``mean.py`` holds the function:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total / length


A separate file, ``test_mean.py``, then holds the test module:

.. code-block:: python

    # import the function itself, not just the module that contains it
    from mean import mean

    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0


    def test_floating_mean():
        assert mean([1, 2]) == 1.5

When should we test?
====================
The three right answers are:

* **ALWAYS!**
* **EARLY!**
* **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is used for
something important is too late.

If we have a robust set of tests, we can run them before adding something new and after
adding something new. If the tests give the same results (as appropriate), we can have
some assurance that we didn't break anything. The same idea applies to making changes in
your system configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what all the parts
of your code do. If you are working on a large project over three years and you end up
with 200 classes, it may be hard to remember what the widget class does in detail. If
you have a test that checks all of the widget's functionality, you can look at the test
to remember what it's supposed to do.

Who should test?
================
In a collaborative coding environment, where many developers contribute to the same code base,
developers should be responsible individually for testing the functions they create and
collectively for testing the code as a whole.

Professionals often test their code and take pride in test coverage, the percentage
of their functions that they feel confident are comprehensively tested.

How are tests written?
======================
The types of tests that are written are determined by the testing framework you adopt.
Don't worry, there are a lot of choices.

Types of Tests
**************
**Exceptions:** Exceptions can be thought of as a type of runtime test. They alert
the user to exceptional behavior in the code. Often, exceptions are related to
functions that depend on input that is unknown at compile time. Checks that occur
within the code to handle exceptional behavior that results from this type of input
are called exceptions.

**Unit Tests:** Unit tests are a type of test which tests the fundamental units of a
program's functionality. Often, this is on the class or function level of detail.
However, what constitutes a *code unit* is not formally defined.

To test functions and classes, the interfaces (API), rather than the implementation, should
be tested.  Treating the implementation as a black box, we can probe the expected behavior
with boundary cases for the inputs.
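
A minimal sketch of black-box boundary testing follows. The ``clamp`` function is
a hypothetical unit invented for this illustration; the tests only use its
interface (arguments in, value out) and probe values on, just outside, and inside
the boundaries.

.. code-block:: python

    def clamp(x, lo, hi):
        """Hypothetical function under test: restrict x to the range [lo, hi]."""
        return max(lo, min(x, hi))


    def test_clamp_boundaries():
        # values exactly on the boundaries are returned unchanged
        assert clamp(0, 0, 10) == 0
        assert clamp(10, 0, 10) == 10
        # values just outside the boundaries are pulled back in
        assert clamp(-1, 0, 10) == 0
        assert clamp(11, 0, 10) == 10
        # a value in the interior passes through
        assert clamp(5, 0, 10) == 5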

**System Tests:** System-level tests are intended to test the code as a whole. As opposed
to unit tests, system tests ask for the behavior as a whole. This sort of testing involves
comparison with other validated codes, analytical solutions, etc.

**Regression Tests:**  A regression test ensures that new code does not change existing
behavior. If you change the default answer, for example, or add a new question, you'll need to
make sure that missing entries are still found and fixed.
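
One common way to write a regression test is to compare current output against
results recorded from an earlier, trusted run. The sketch below assumes a simple
``mean`` function like the one in this lesson; the ``REFERENCE`` table is
hypothetical data made up for illustration.

.. code-block:: python

    def mean(numlist):
        return sum(numlist) / len(numlist)


    # results recorded from a previous, trusted version of the code (hypothetical)
    REFERENCE = {
        (0, 200): 100.0,
        (1, 2): 1.5,
        (-3, 3): 0.0,
    }


    def test_mean_regression():
        # every recorded input must still produce the recorded output
        for inputs, expected in REFERENCE.items():
            assert mean(list(inputs)) == expected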

**Integration Tests:** Integration tests query the ability of the code to integrate
well with the system configuration and third party libraries and modules. This type
of test is essential for codes that depend on libraries which might be updated
independently of your code or when your code might be used by a number of users
who may have various versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of modules creates
a test suite.  Typically the suite as a whole is executed (rather than each test individually)
when verifying that the code base still functions after changes have been made.

Elements of a Test
==================
**Behavior:** The behavior you want to test. For example, you might want to test the fun()
function.

**Expected Result:** This might be a single number, a range of numbers, a new fully defined
object, a system state, an exception, etc.  When we run the fun() function, we expect to
generate some fun. If we don't generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should pass this test.
The expected result should be known *a priori*.  For numerical functions, this
result is ideally analytically determined even if the function being tested isn't.
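
To illustrate an analytically determined expected result, here is a small sketch.
The ``trapezoid`` integrator is a hypothetical function written for this example;
the expected value 1/3 for the integral of x squared over [0, 1] is known from
calculus before any code is run, and the tolerance is an assumption chosen to sit
above the method's discretization error.

.. code-block:: python

    def trapezoid(f, a, b, n=1000):
        """Approximate the integral of f over [a, b] using n trapezoids."""
        h = (b - a) / n
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return total * h


    def test_trapezoid_quadratic():
        obs = trapezoid(lambda x: x ** 2, 0.0, 1.0)
        exp = 1.0 / 3.0  # known analytically, a priori
        assert abs(obs - exp) < 1e-6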

**Assertions:** Require that some conditional be true. If the conditional is false,
the test fails.

**Fixtures:**  Sometimes you have to do some legwork to create the objects that are
necessary to run one or many tests. These objects are called fixtures as they are not really
part of the tests themselves but rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function is a method of
the Person class. In order to check the fun function, then, we need to create an appropriate
Person object on which to run fun().

**Setup and teardown:** Creating fixtures is often done in a call to a setup function.
Deleting them and other cleanup is done in a teardown function.

**The Big Picture:** Putting all this together, the testing algorithm is often:

.. code-block:: python

    setup()
    test()
    teardown()


But sometimes your tests change the fixtures. If so, it's better
for the setup() and teardown() functions to occur on either side of each test. In
that case, the testing algorithm should be:

.. code-block:: python

    setup()
    test1()
    teardown()

    setup()
    test2()
    teardown()

    setup()
    test3()
    teardown()
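
The per-test pattern can be sketched with the standard library's ``unittest``
module, whose ``setUp`` and ``tearDown`` methods run around every test method.
The ``Person`` class here is a hypothetical stand-in for the one described above;
its attributes and the behavior of ``fun()`` are assumptions made for illustration.

.. code-block:: python

    import unittest


    class Person(object):
        """Hypothetical stand-in for the Person class described above."""

        def __init__(self, name):
            self.name = name
            self.fun_had = 0

        def fun(self):
            self.fun_had += 1
            return self.fun_had


    class TestPersonFun(unittest.TestCase):

        def setUp(self):
            # runs before every test method: build a fresh fixture
            self.person = Person("Ada")

        def tearDown(self):
            # runs after every test method: clean up the fixture
            del self.person

        def test_fun_generates_fun(self):
            self.assertEqual(self.person.fun(), 1)

        def test_fun_accumulates(self):
            self.person.fun()
            self.assertEqual(self.person.fun(), 2)

Because each test gets its own ``Person``, the second test is unaffected by the
fun generated in the first.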


----------------------------------------------------------

Nose: A Python Testing Framework
================================
The testing framework we'll discuss today is called `nose`_.  However, there are several
other testing frameworks available in most languages.  Most notably there is `JUnit`_
in Java, which can arguably be credited with inventing the testing framework.

.. _nose: http://readthedocs.org/docs/nose/en/latest/
.. _JUnit: http://www.junit.org/

Where do nose tests live?
*************************
Nose tests are files that begin with ``Test-``, ``Test_``, ``test-``, or ``test_``.
Specifically, these satisfy the testMatch regular expression ``[Tt]est[-_]``.
(You can also teach nose to find tests by declaring them in the unittest.TestCase
subclasses that you create in your code. You can also create test functions which
are not unittest.TestCase subclasses if they are named with the configured
testMatch regular expression.)
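
To see which file names the pattern quoted above would pick up, it can be tried
directly with Python's ``re`` module. This is only a sketch: the helper
``looks_like_test`` is hypothetical, and the pattern is copied from this lesson
rather than from nose's own configuration, which may differ.

.. code-block:: python

    import re

    # pattern as quoted in this lesson; nose's actual default may differ
    TEST_MATCH = re.compile(r"[Tt]est[-_]")


    def looks_like_test(filename):
        """Return True if filename begins with the pattern above."""
        return TEST_MATCH.match(filename) is not None


    for name in ["test_mean.py", "Test-mean.py", "mean.py", "testing.py"]:
        print(name, looks_like_test(name))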

Nose Test Syntax
****************
To write a nose test, we make assertions.

.. code-block:: python

    assert should_be_true()
    assert not should_not_be_true()

Additionally, nose itself defines a number of assert functions which can be used to
test more specific aspects of the code base.

.. code-block:: python

    from nose.tools import *

    assert_equal(a, b)
    assert_almost_equal(a, b)
    assert_true(a)
    assert_false(a)
    assert_raises(exception, func, *args, **kwargs)
    assert_is_instance(a, b)
    # and many more!
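
Approximate assertions such as ``assert_almost_equal`` matter because
floating-point arithmetic carries round-off error. The sketch below illustrates
the idea with the standard library's ``math.isclose`` as a stand-in (an assumption
made so that the example runs without nose installed); it is not nose's own
implementation.

.. code-block:: python

    import math


    def test_float_sum():
        obs = 0.1 + 0.2
        exp = 0.3
        assert obs != exp              # exact comparison fails due to round-off
        assert math.isclose(obs, exp)  # tolerance-based comparison passes

For floating-point results, a tolerance-based comparison is usually the safer choice.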

Moreover, numpy offers similar testing functions for arrays:

.. code-block:: python

    from numpy.testing import *

    assert_array_equal(a, b)
    assert_array_almost_equal(a, b)
    # etc.

Exercise: Writing tests for mean()
**********************************
There are a few tests for the mean() function that we listed in this lesson.
What are some tests that should fail? Add at least three test cases to this set.
Edit the ``test_mean.py`` file which tests the mean() function in ``mean.py``.

*Hint:* Think about what form your input could take and what you should do to handle it.
Also, think about the type of the elements in the list. What should be done if you pass
a list of integers? What if you pass a list of strings?

**Example**::

    nosetests test_mean.py

Test Driven Development
=======================
Test driven development (TDD) is a philosophy whereby the developer creates code by
**writing the tests first**.  That is to say you write the tests *before* writing the
associated code!

This is an iterative process whereby you write a test then write the minimum amount of
code to make the test pass.  If a new feature is needed, another test is written and
the code is expanded to meet this new use case.  This continues until the code does
what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It).  People who diligently
follow TDD swear by its effectiveness.  This development style was put forth most
strongly by `Kent Beck in 2002`_.

.. _Kent Beck in 2002: http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530

A TDD Example
*************
Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indexes.  You would - of course - start
by writing the test, possibly testing a single value:

.. code-block:: python

    # the assert helpers live in nose.tools, not in the top-level nose package
    from nose.tools import assert_equal

    from pisa import fib

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)

You would *then* go ahead and write the actual function:

.. code-block:: python

    def fib(n):
        # you snarky so-and-so
        return 1

And that is it, right?!  Well, not quite.  This implementation fails for
most other values.  Adding tests we see that:

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)

This extra test now requires that we bother to implement at least the initial values:

.. code-block:: python

    def fib(n):
        # a little better
        if n == 0 or n == 1:
            return n
        return 1

However, this function still falls over for ``2 < n``.  Time for more tests!

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)


    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)

At this point, we had better go ahead and try to do the right thing...

.. code-block:: python

    def fib(n):
        # finally, some math
        if n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)

Here it becomes very tempting to take an extended coffee break or possibly a
power lunch.  But then you remember those pesky negative numbers and floats.
Perhaps the right thing to do here is to just be undefined.

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)


    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)


    # named test_fib4 so that it does not shadow test_fib3 above
    def test_fib4():
        obs = fib(13.37)
        exp = NotImplemented
        assert_equal(obs, exp)

        obs = fib(-9000)
        exp = NotImplemented
        assert_equal(obs, exp)

This means that it is time to add the appropriate case to the function itself:

.. code-block:: python

    def fib(n):
        # sequence and you shall find
        if n < 0 or int(n) != n:
            return NotImplemented
        elif n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)

And thus - finally - we have a robust function together with working tests!