`Back To Debugging`_ - `Forward To Documentation`_

.. _Back To Debugging: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/
.. _Forward To Documentation: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/

-----------

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony Scopatz**

.. image:: http://memecreator.net/the-most-interesting-man-in-the-world/showimage.php/169/I-don%27t-always-test-my-code-But-when-I-do-I-do-it-in-production.jpg


What is testing?
================
Software testing is a process by which one or more expected behaviors and
results from a piece of software are exercised and confirmed. Well chosen
tests will confirm expected code behavior for the extreme boundaries of the
input domains, output ranges, parametric combinations, and other behavioral
edge cases.

Why test software?
==================
Unless you write flawless, bug-free, perfectly accurate, fully precise, and
predictable code every time, you must test your code in order to trust it
enough to answer in the affirmative to at least a few of the following questions:

* Does your code work?
* Always?
* Does it do what you think it does?
* Does it continue to work after changes are made?
* Does it continue to work after system configurations or libraries are upgraded?
* Does it respond properly for a full range of input parameters?
* What about edge or corner cases?
* What's the limit on that input parameter?

Verification
************
*Verification* is the process of asking, "Have we built the software correctly?"
That is, is the code bug-free, precise, accurate, and repeatable?

Validation
**********
*Validation* is the process of asking, "Have we built the right software?"
That is, is the code designed in such a way as to produce the answers we are
interested in, the data we want, etc.?

Uncertainty Quantification
**************************
*Uncertainty Quantification* is the process of asking, "Given that our algorithm
may not be deterministic, was our execution within acceptable error bounds?"  This
is particularly important for anything that uses random numbers, e.g., Monte Carlo methods.
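
As a rough illustration of checking a stochastic result against an error bound,
the sketch below estimates pi by Monte Carlo sampling and asserts that the estimate
lands within a tolerance. The function name ``estimate_pi``, the sample count, the
seed, and the tolerance are all assumptions chosen for this example.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so a seed makes the run repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def test_estimate_pi_within_bounds():
    # With 100,000 samples the standard error is roughly 0.005, so the
    # tolerance of 0.05 used here is an assumed, deliberately generous bound.
    estimate = estimate_pi(100000, seed=42)
    assert abs(estimate - math.pi) < 0.05
```

Fixing the seed keeps the test repeatable; the tolerance, rather than an exact
expected value, is what encodes the acceptable error.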


Where are tests?
================
Say we have an averaging function:

.. code-block:: python

    def mean(numlist):
        total = sum(numlist)
        length = len(numlist)
        return total/length

Tests could be implemented as runtime exceptions in the function:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total/length


Sometimes tests are functions alongside the function definitions they are testing.

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total/length


    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0


    def test_floating_mean():
        assert mean([1, 2]) == 1.5

Sometimes they are in an executable independent of the main executable.

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            # sum() raises TypeError when the elements are not numbers
            print("The number list was not a list of numbers.")
            raise
        except Exception:
            print("There was a problem evaluating the number list.")
            raise
        return total/length


A test module then exists in a different file:

.. code-block:: python

    # import the function itself, not just the module that contains it
    from mean import mean

    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0


    def test_floating_mean():
        assert mean([1, 2]) == 1.5

When should we test?
====================
The three right answers are:

* **ALWAYS!**
* **EARLY!**
* **OFTEN!**

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is used for
something important is too late.

If we have a robust set of tests, we can run them before adding something new and after
adding something new. If the tests give the same results (as appropriate), we can have
some assurance that we didn't break anything. The same idea applies to making changes in
your system configuration, updating support codes, etc.

Another important feature of testing is that it helps you remember what all the parts
of your code do. If you are working on a large project over three years and you end up
with 200 classes, it may be hard to remember what the widget class does in detail. If
you have a test that checks all of the widget's functionality, you can look at the test
to remember what it's supposed to do.

Who should test?
================
In a collaborative coding environment, where many developers contribute to the same code base,
developers should be responsible individually for testing the functions they create and
collectively for testing the code as a whole.

Professionals often test their code, and take pride in test coverage, the percent
of their functions that they feel confident are comprehensively tested.

How are tests written?
======================
The type of tests that are written is determined by the testing framework you adopt.
Don't worry, there are a lot of choices.

Types of Tests
**************
**Exceptions:** Exceptions can be thought of as a type of runtime test. They alert
the user to exceptional behavior in the code. Often, exceptions are related to
functions that depend on input that is unknown at compile time. Checks that occur
within the code to handle exceptional behavior that results from this type of input
are called exceptions.
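
A minimal sketch of an exception acting as a runtime check, together with a test
that expects it. The ``checked_mean`` name and the choice of ``ValueError`` for an
empty list are assumptions made for this illustration, not part of the lesson's
``mean.py``.

```python
def checked_mean(numlist):
    """Return the arithmetic mean, raising ValueError on an empty list."""
    if len(numlist) == 0:
        # The input is only known at run time, so the check lives in the code.
        raise ValueError("cannot take the mean of an empty list")
    return sum(numlist) / float(len(numlist))


def test_checked_mean_rejects_empty_list():
    try:
        checked_mean([])
    except ValueError:
        return  # the expected exception was raised, so the test passes
    raise AssertionError("checked_mean([]) should have raised ValueError")
```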

**Unit Tests:** Unit tests are a type of test which test the fundamental units of a
program's functionality. Often, this is on the class or function level of detail.
However, what constitutes a *code unit* is not formally defined.

To test functions and classes, the interfaces (API) - rather than the implementation - should
be tested.  Treating the implementation as a black box, we can probe the expected behavior
with boundary cases for the inputs.
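
To make the black-box idea concrete, here is a sketch of unit tests that probe an
averaging function only through its interface, using boundary inputs. The ``mean``
below is a stand-in written for this example, and the particular boundary cases are
assumptions about what is worth probing.

```python
def mean(numlist):
    """Stand-in implementation; the tests below never look inside it."""
    return sum(numlist) / float(len(numlist))


def test_mean_single_element():
    # Smallest valid input: the mean of one value is that value.
    assert mean([7]) == 7


def test_mean_symmetric_values_cancel():
    # Large values symmetric about zero should average to exactly zero.
    assert mean([-1e6, 1e6]) == 0


def test_mean_many_identical_values():
    # A long list of one repeated value should return that value.
    assert mean([5] * 1000) == 5
```

Because the tests only call ``mean()`` and compare results, the implementation
could be swapped out without touching them.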

**System Tests:** System level tests are intended to test the code as a whole. As opposed
to unit tests, system tests ask for the behavior as a whole. This sort of testing involves
comparison with other validated codes, analytical solutions, etc.

**Regression Tests:**  A regression test ensures that new code does not change existing behavior.
If you change the default answer, for example, or add a new question, you'll need to
make sure that missing entries are still found and fixed.

**Integration Tests:** Integration tests query the ability of the code to integrate
well with the system configuration and third party libraries and modules. This type
of test is essential for codes that depend on libraries which might be updated
independently of your code or when your code might be used by a number of users
who may have various versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of modules creates
a test suite.  Typically the suite as a whole is executed (rather than each test individually)
when verifying that the code base still functions after changes have been made.
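
As one possible illustration of running a suite as a whole, the sketch below uses
the standard library's ``unittest`` module (swapped in here so the example has no
third-party dependencies). The class names, the stand-in ``mean``, and the
``run_suite`` helper are hypothetical.

```python
import unittest


def mean(numlist):
    """Stand-in for the function under test."""
    return sum(numlist) / float(len(numlist))


class TestMeanIntegers(unittest.TestCase):
    def test_zeros(self):
        self.assertEqual(mean([0, 0, 0, 0]), 0)


class TestMeanFloats(unittest.TestCase):
    def test_fractional_result(self):
        self.assertEqual(mean([1, 2]), 1.5)


def run_suite():
    """Collect both test classes into one suite and run it as a whole."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestMeanIntegers))
    suite.addTests(loader.loadTestsFromTestCase(TestMeanFloats))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling ``run_suite()`` returns a result object whose ``wasSuccessful()`` method
reports whether every collected test passed.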

Elements of a Test
==================
**Behavior:** The behavior you want to test. For example, you might want to test the fun()
function.

**Expected Result:** This might be a single number, a range of numbers, a new fully defined
object, a system state, an exception, etc.  When we run the fun() function, we expect to
generate some fun. If we don't generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should pass this test.
The expected result should be known *a priori*.  For numerical functions, this
result is ideally analytically determined even if the function being tested isn't.

**Assertions:** Assertions require that some condition be true. If the condition is false,
the test fails.

**Fixtures:**  Sometimes you have to do some legwork to create the objects that are
necessary to run one or many tests. These objects are called fixtures as they are not really
part of the tests themselves but rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function is a method of
the Person class. In order to check the fun function, then, we need to create an appropriate
Person object on which to run fun().

**Setup and teardown:** Creating fixtures is often done in a call to a setup function.
Deleting them and other cleanup is done in a teardown function.

**The Big Picture:** Putting all this together, the testing algorithm is often:

.. code-block:: python

    setup()
    test()
    teardown()


But, sometimes it's the case that your tests change the fixtures. If so, it's better
for the setup() and teardown() functions to occur on either side of each test. In
that case, the testing algorithm should be:

.. code-block:: python

    setup()
    test1()
    teardown()

    setup()
    test2()
    teardown()

    setup()
    test3()
    teardown()
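
The per-test pattern can be sketched in plain Python as follows. The ``Person``
class, its ``fun()`` method, the ``fixture`` dictionary, and the ``run_all``
driver are all hypothetical stand-ins invented for this illustration; a real
framework would do the driving for you.

```python
class Person(object):
    """Hypothetical fixture class: each call to fun() generates more fun."""

    def __init__(self, name):
        self.name = name
        self.fun_had = 0

    def fun(self):
        self.fun_had += 1
        return self.fun_had


fixture = {}


def setup():
    # Build a fresh Person before every test.
    fixture["person"] = Person("Ada")


def teardown():
    # Throw the fixture away so no state leaks into the next test.
    fixture.clear()


def test_fun_generates_fun():
    assert fixture["person"].fun() == 1


def test_fun_accumulates():
    person = fixture["person"]
    person.fun()
    assert person.fun() == 2


def run_all():
    for test in (test_fun_generates_fun, test_fun_accumulates):
        setup()
        try:
            test()
        finally:
            teardown()  # runs even when the test fails
```

Both tests modify the ``Person`` they are given, so sharing one fixture between
them would make the outcome depend on the order in which they run.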

----------------------------------------------------------

Nose: A Python Testing Framework
================================
The testing framework we'll discuss today is called `nose`_.  However, there are several
other testing frameworks available in most languages.  Most notably there is `JUnit`_
in Java, which can arguably be credited with inventing the testing framework.

.. _nose: http://readthedocs.org/docs/nose/en/latest/
.. _JUnit: http://www.junit.org/

Where do nose tests live?
*************************
Nose tests are files that begin with ``Test-``, ``Test_``, ``test-``, or ``test_``.
Specifically, these satisfy the testMatch regular expression ``[Tt]est[-_]``.
(You can also teach nose to find tests by declaring them in the unittest.TestCase
subclasses that you create in your code. You can also create test functions which
are not unittest.TestCase subclasses if they are named to match the configured
testMatch regular expression.)
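
To see which file names the pattern quoted above would pick up, here is a small
sketch using Python's ``re`` module. It applies only the simplified expression
from this text; the pattern nose actually uses by default may differ, and the
``looks_like_test_file`` helper is hypothetical.

```python
import re

# The simplified pattern quoted in the text, not nose's own configuration.
TEST_MATCH = re.compile(r"[Tt]est[-_]")


def looks_like_test_file(filename):
    """Return True if the name begins with Test-, Test_, test-, or test_."""
    # re.match only matches at the start of the string.
    return TEST_MATCH.match(filename) is not None
```

Under this pattern ``test_mean.py`` and ``Test-fib.py`` qualify, while
``mean.py`` and ``testing.py`` do not.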

Nose Test Syntax
****************
To write a nose test, we make assertions.

.. code-block:: python

    assert should_be_true()
    assert not should_not_be_true()

Additionally, nose itself defines a number of assert functions which can be used to
test more specific aspects of the code base.

.. code-block:: python

    from nose.tools import *

    assert_equal(a, b)
    assert_almost_equal(a, b)
    assert_true(a)
    assert_false(a)
    assert_raises(exception, func, *args, **kwargs)
    assert_is_instance(a, b)
    # and many more!
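
The reason an "almost equal" assertion exists can be sketched without any
framework. The ``almost_equal`` helper below is a hypothetical stand-in that
rounds the difference to a number of decimal places; it is assumed, not
confirmed, to resemble what ``assert_almost_equal`` does internally.

```python
def almost_equal(a, b, places=7):
    """Return True if a and b agree once their difference is rounded to `places` decimals."""
    return round(abs(a - b), places) == 0


def test_float_sum():
    total = 0.1 + 0.2
    # Binary floating point cannot represent these values exactly,
    # so an exact comparison against 0.3 is too strict.
    assert total != 0.3
    assert almost_equal(total, 0.3)
```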

Moreover, numpy offers similar testing functions for arrays:

.. code-block:: python

    from numpy.testing import *

    assert_array_equal(a, b)
    assert_array_almost_equal(a, b)
    # etc.

Exercise: Writing tests for mean()
**********************************
There are a few tests for the mean() function that we listed in this lesson.
What are some tests that should fail? Add at least three test cases to this set.
Edit the ``test_mean.py`` file which tests the mean() function in ``mean.py``.

*Hint:* Think about what form your input could take and what you should do to handle it.
Also, think about the type of the elements in the list. What should be done if you pass
a list of integers? What if you pass a list of strings?

**Example**::

    nosetests test_mean.py

Test Driven Development
=======================
Test driven development (TDD) is a philosophy whereby the developer creates code by
**writing the tests first**.  That is to say you write the tests *before* writing the
associated code!

This is an iterative process whereby you write a test then write the minimum amount
of code to make the test pass.  If a new feature is needed, another test is written and
the code is expanded to meet this new use case.  This continues until the code does
what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It).  People who diligently
follow TDD swear by its effectiveness.  This development style was put forth most
strongly by `Kent Beck in 2002`_.

.. _Kent Beck in 2002: http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530

A TDD Example
*************
Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indexes.  You would - of course - start
by writing the test, possibly testing a single value:

.. code-block:: python

    # the assert helpers live in nose.tools
    from nose.tools import assert_equal

    from pisa import fib

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)

You would *then* go ahead and write the actual function:

.. code-block:: python

    def fib(n):
        # you snarky so-and-so
        return 1

And that is it, right?!  Well, not quite.  This implementation fails for
most other values.  Adding tests we see that:

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)

This extra test now requires that we bother to implement at least the initial values:

.. code-block:: python

    def fib(n):
        # a little better
        if n == 0 or n == 1:
            return n
        return 1

However, this function still falls over for ``2 < n``.  Time for more tests!

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)


    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)

At this point, we had better go ahead and try to do the right thing...

.. code-block:: python

    def fib(n):
        # finally, some math
        if n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)

Here it becomes very tempting to take an extended coffee break or possibly a
power lunch.  But then you remember those pesky negative numbers and floats.
Perhaps the right thing to do here is to just be undefined.

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)


    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)


    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)


    # named test_fib4 so that it does not shadow test_fib3 above
    def test_fib4():
        obs = fib(13.37)
        exp = NotImplemented
        assert_equal(obs, exp)

        obs = fib(-9000)
        exp = NotImplemented
        assert_equal(obs, exp)

This means that it is time to add the appropriate case to the function itself:

.. code-block:: python

    def fib(n):
        # sequence and you shall find
        if n < 0 or int(n) != n:
            return NotImplemented
        elif n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)

And thus - finally - we have a robust function together with working tests!
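
For a quick check that does not depend on any test framework, the same cases can
be gathered into a table and run with plain ``assert`` statements. This is only a
sketch: the ``check_fib`` helper and the table layout are assumptions, while
``fib`` repeats the final version from this section.

```python
def fib(n):
    # final version from the walkthrough
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)


def check_fib():
    """Run every (input, expected) pair from the walkthrough and count them."""
    cases = [
        (0, 0), (1, 1), (2, 1), (3, 2), (6, 8),
        (13.37, NotImplemented), (-9000, NotImplemented),
    ]
    for n, exp in cases:
        obs = fib(n)
        assert obs == exp, "fib(%r) gave %r, expected %r" % (n, obs, exp)
    return len(cases)
```

Keeping the cases in one table makes it cheap to add the next input that
occurs to you before changing the function again.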