`Back To Debugging`_ - `Forward To Documentation`_

.. _Back To Debugging: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/
.. _Forward To Documentation: https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/
**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony Scopatz**

.. image:: https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg
What is testing?
================

Software testing is a process by which one or more expected behaviors and
results from a piece of software are exercised and confirmed. Well chosen
tests will confirm expected code behavior for the extreme boundaries of the
input domains, output ranges, parametric combinations, and other behavioral
edge cases.

Unless you write flawless, bug-free, perfectly accurate, fully precise, and
predictable code every time, you must test your code in order to trust it
enough to answer in the affirmative to at least a few of the following questions:
* Does your code work?
* Does it do what you think it does?
* Does it continue to work after changes are made?
* Does it continue to work after system configurations or libraries are upgraded?
* Does it respond properly for a full range of input parameters?
* What about edge or corner cases?
* What's the limit on that input parameter?
* How will it affect your `publications`_?

.. _publications: http://www.nature.com/news/2010/101013/full/467775a.html
Verification
************
*Verification* is the process of asking, "Have we built the software correctly?"
That is, is the code bug-free, precise, accurate, and repeatable?
Validation
**********
*Validation* is the process of asking, "Have we built the right software?"
That is, is the code designed in such a way as to produce the answers we are
interested in, the data we want, etc.?
Uncertainty Quantification
**************************
*Uncertainty Quantification* is the process of asking, "Given that our algorithm
may not be deterministic, was our execution within acceptable error bounds?" This
is particularly important for anything which uses random numbers, e.g. Monte Carlo
methods.
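To make this concrete, here is a minimal sketch of an uncertainty-aware test. The ``estimate_pi()`` function, the seed, and the tolerance below are our own illustration, not part of the lesson: the point is that a stochastic result is only required to land within error bounds, never to match an exact value.

```python
import random

def estimate_pi(n_samples, seed=42):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # seeding makes the run reproducible
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_samples

def test_estimate_pi():
    # Assert a tolerance, not exact equality: the algorithm is stochastic.
    assert abs(estimate_pi(100000) - 3.141592653589793) < 0.05

test_estimate_pi()
```

Seeding the generator also makes any failure reproducible, which is half the battle when debugging stochastic code.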
Say we have an averaging function:

.. code-block:: python

    def mean(numlist):
        total = sum(numlist)
        length = len(numlist)
        return total / length
Tests could be implemented as runtime exceptions in the function:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            print("The number list was not a list of numbers.")
            return None
        except Exception:
            print("There was a problem evaluating the number list.")
            return None
        return total / length
Sometimes tests are functions that live alongside the function definitions they are testing:

.. code-block:: python

    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            print("The number list was not a list of numbers.")
            return None
        except Exception:
            print("There was a problem evaluating the number list.")
            return None
        return total / length

    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0

    def test_floating_mean():
        assert mean([1, 2]) == 1.5
Sometimes they are in an executable that is independent of the main executable:

.. code-block:: python

    # mean.py
    def mean(numlist):
        try:
            total = sum(numlist)
            length = len(numlist)
        except TypeError:
            print("The number list was not a list of numbers.")
            return None
        except Exception:
            print("There was a problem evaluating the number list.")
            return None
        return total / length
Where, in a different file, there exists a test module:

.. code-block:: python

    # test_mean.py
    from mean import mean

    def test_mean():
        assert mean([0, 0, 0, 0]) == 0
        assert mean([0, 200]) == 100
        assert mean([0, -200]) == -100
        assert mean([0]) == 0

    def test_floating_mean():
        assert mean([1, 2]) == 1.5
When should we test?
====================
The three right answers are:

* **Always!**
* **Early!**
* **Often!**
The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is used for
something important is too late.
If we have a robust set of tests, we can run them before adding something new and after
adding something new. If the tests give the same results (as appropriate), we can have
some assurance that we didn't break anything. The same idea applies to making changes in
your system configuration, updating support codes, etc.
Another important feature of testing is that it helps you remember what all the parts
of your code do. If you are working on a large project over three years and you end up
with 200 classes, it may be hard to remember what the widget class does in detail. If
you have a test that checks all of the widget's functionality, you can look at the test
to remember what it's supposed to do.
In a collaborative coding environment, where many developers contribute to the same code base,
developers should be responsible individually for testing the functions they create and
collectively for testing the code as a whole.

Professionals often test their code, and take pride in test coverage: the percent
of their functions that they feel confident are comprehensively tested.
How are tests written?
======================
The type of tests that are written is determined by the testing framework you adopt.
Don't worry, there are a lot of choices.
**Exceptions:** Exceptions can be thought of as a type of runtime test. They alert
the user to exceptional behavior in the code. Often, exceptions are related to
functions that depend on input that is unknown at compile time. Checks that occur
within the code to handle exceptional behavior resulting from this type of input
are called exceptions.
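A minimal sketch of such a runtime check, reusing the lesson's ``mean()`` example (the exact guard shown here is our own addition):

```python
def mean(numlist):
    # Runtime check: the input cannot be known until the program actually runs.
    if not all(isinstance(x, (int, float)) for x in numlist):
        raise TypeError("The number list was not a list of numbers.")
    return sum(numlist) / len(numlist)

# Callers can then respond to the exceptional behavior explicitly.
try:
    mean([1, 2, "three"])
except TypeError as err:
    print(err)
```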
**Unit Tests:** Unit tests are a type of test which test the fundamental units of a
program's functionality. Often, this is on the class or function level of detail.
However, what constitutes a *code unit* is not formally defined.

To test functions and classes, the interfaces (APIs) - rather than the implementations - should
be tested. Treating the implementation as a black box, we can probe the expected behavior
with boundary cases for the inputs.
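For instance, a black-box unit test suite for a hypothetical ``clamp()`` function (our own example, not from the lesson) probes the interface at and around its boundaries without assuming anything about the implementation:

```python
def clamp(value, low, high):
    """Constrain value to the closed interval [low, high]."""
    return max(low, min(value, high))

def test_clamp_inside():
    assert clamp(5, 0, 10) == 5

def test_clamp_boundaries():
    # Boundary cases: the edges of the valid input domain.
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10

def test_clamp_outside():
    # Inputs beyond the boundaries must be pulled back to the edges.
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10
```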
**System Tests:** System-level tests are intended to test the code as a whole. As opposed
to unit tests, system tests ask about the behavior of the program as a whole. This sort of
testing involves comparison with other validated codes, analytical solutions, etc.
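As a sketch of comparing against an analytical solution (the toy integrator below is our own example), a system test can require the whole computation to reproduce a known closed-form answer:

```python
def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with the composite trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total

def test_against_analytic_solution():
    # The analytic answer: the integral of x**2 on [0, 1] is exactly 1/3.
    obs = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
    assert abs(obs - 1.0 / 3.0) < 1e-5

test_against_analytic_solution()
```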
**Regression Tests:** A regression test ensures that new code does not change existing
behavior. If you change the default answer, for example, or add a new question, you'll
need to make sure that missing entries are still found and fixed.
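One common pattern (our own sketch, not the lesson's code) is to capture known-good output from a trusted version of the code and assert that later versions still reproduce it:

```python
def word_counts(text):
    """Count how many times each word appears in a string."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# Known-good result captured from an earlier, trusted version of the code.
EXPECTED = {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}

def test_word_counts_regression():
    assert word_counts("The cat sat on the mat") == EXPECTED

test_word_counts_regression()
```

If a later change alters this output, the regression test fails and forces you to decide whether the change was intended.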
**Integration Tests:** Integration tests query the ability of the code to integrate
well with the system configuration and third-party libraries and modules. This type
of test is essential for codes that depend on libraries which might be updated
independently of your code, or when your code might be used by a number of users
who may have various versions of libraries.
**Test Suites:** Putting a series of unit tests into a collection of modules creates
a test suite. Typically the suite as a whole is executed (rather than each test individually)
when verifying that the code base still functions after changes have been made.
Elements of a Test
******************
**Behavior:** The behavior you want to test. For example, you might want to test
the fun() function.
**Expected Result:** This might be a single number, a range of numbers, a new fully defined
object, a system state, an exception, etc. When we run the fun() function, we expect to
generate some fun. If we don't generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should pass this test.
The expected result should be known *a priori*. For numerical functions, this result is
ideally determined analytically even if the function being tested isn't.
**Assertions:** Require that some conditional be true. If the conditional is false,
the test fails.
**Fixtures:** Sometimes you have to do some legwork to create the objects that are
necessary to run one or many tests. These objects are called fixtures, as they are not
really part of the tests themselves but rather involve getting the computer into the
appropriate state.

For example, since fun varies a lot between people, the fun() function is a method of
the Person class. In order to check the fun() function, then, we need to create an
appropriate Person object on which to run fun().
**Setup and teardown:** Creating fixtures is often done in a call to a setup function.
Deleting them and other cleanup is done in a teardown function.
**The Big Picture:** Putting all this together, the testing algorithm is often:

.. code-block:: python

    setup()
    test()
    teardown()
But, sometimes it's the case that your tests change the fixtures. If so, it's better
for the setup() and teardown() functions to occur on either side of each test. In
that case, the testing algorithm should be:

.. code-block:: python

    setup()
    test1()
    teardown()

    setup()
    test2()
    teardown()
----------------------------------------------------------
Nose: A Python Testing Framework
================================
The testing framework we'll discuss today is called `nose`_. However, there are several
other testing frameworks available in most languages. Most notable is `JUnit`_
in Java, which can arguably be credited with inventing the testing framework.

.. _nose: http://readthedocs.org/docs/nose/en/latest/
.. _JUnit: http://www.junit.org/
Where do nose tests live?
*************************
Nose tests are files that begin with ``Test-``, ``Test_``, ``test-``, or ``test_``.
Specifically, these satisfy the testMatch regular expression ``[Tt]est[-_]``.
(You can also teach nose to find tests by declaring them in the unittest.TestCase
subclasses that you create in your code. You can also create test functions which
are not unittest.TestCase subclasses if they are named with the configured
testMatch regular expression.)
To write a nose test, we make assertions:

.. code-block:: python

    assert should_be_true()
    assert not should_not_be_true()
Additionally, nose itself defines a number of assert functions which can be used to
test more specific aspects of the code base:

.. code-block:: python

    from nose.tools import *

    assert_equal(a, b)
    assert_almost_equal(a, b)
    assert_true(a)
    assert_false(a)
    assert_raises(exception, func, *args, **kwargs)
    assert_is_instance(a, b)
Moreover, numpy offers similar testing functions for arrays:

.. code-block:: python

    from numpy.testing import *

    assert_array_equal(a, b)
    assert_array_almost_equal(a, b)
Exercise: Writing tests for mean()
**********************************
There are a few tests for the mean() function that we listed in this lesson.
What are some tests that should fail? Add at least three test cases to this set.
Edit the ``test_mean.py`` file which tests the mean() function in ``mean.py``.

*Hint:* Think about what form your input could take and what you should do to handle it.
Also, think about the type of the elements in the list. What should be done if you pass
a list of integers? What if you pass a list of strings?

Run your tests with:

.. code-block:: bash

    nosetests test_mean.py
Test Driven Development
=======================
Test driven development (TDD) is a philosophy whereby the developer creates code by
**writing the tests first**. That is to say, you write the tests *before* writing the
code that is being tested.

This is an iterative process whereby you write a test, then write the minimum amount
of code to make the test pass. If a new feature is needed, another test is written and
the code is expanded to meet this new use case. This continues until the code does
everything it needs to do.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People who diligently
follow TDD swear by its effectiveness. This development style was put forth most
strongly by `Kent Beck in 2002`_.

.. _Kent Beck in 2002: http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530
Say you want to write a fib() function which generates values of the
Fibonacci sequence at given indexes. You would - of course - start
by writing the test, possibly testing a single value:

.. code-block:: python

    from nose.tools import assert_equal

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)
You would *then* go ahead and write the actual function:

.. code-block:: python

    def fib(n):
        # you snarky so-and-so
        return 1
And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests, we see that:

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)

    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)
This extra test now requires that we bother to implement at least the initial values:

.. code-block:: python

    def fib(n):
        # a little better now
        if n == 0 or n == 1:
            return n
        else:
            return 1
However, this function still falls over for ``n > 2``. Time for more tests!

.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)

    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)

    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)
At this point, we had better go ahead and try to do the right thing...

.. code-block:: python

    def fib(n):
        if n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)
Here it becomes very tempting to take an extended coffee break or possibly a
power lunch. But then you remember those pesky negative numbers and floats.
Perhaps the right thing to do here is to just be undefined:
.. code-block:: python

    def test_fib1():
        obs = fib(2)
        exp = 1
        assert_equal(obs, exp)

    def test_fib2():
        obs = fib(0)
        exp = 0
        assert_equal(obs, exp)

        obs = fib(1)
        exp = 1
        assert_equal(obs, exp)

    def test_fib3():
        obs = fib(3)
        exp = 2
        assert_equal(obs, exp)

        obs = fib(6)
        exp = 8
        assert_equal(obs, exp)

    def test_fib4():
        obs = fib(13.37)
        exp = NotImplemented
        assert_equal(obs, exp)

        obs = fib(-9)
        exp = NotImplemented
        assert_equal(obs, exp)
This means that it is time to add the appropriate case to the function itself:

.. code-block:: python

    def fib(n):
        # sequence and you shall find
        if n < 0 or int(n) != n:
            return NotImplemented
        elif n == 0 or n == 1:
            return n
        else:
            return fib(n - 1) + fib(n - 2)
And thus - finally - we have a robust function together with working tests!
**The Problem:** In 2D or 3D, we have two points (p1 and p2) which define a line segment.
Additionally, there exists experimental data which can be anywhere in the domain.
Find the data point which is closest to the line segment.

In the ``close_line.py`` file there are four different implementations which all
solve this problem. `You can read more about them here.`_ However, there are no tests!
Please write, from scratch, a ``test_close_line.py`` file which tests the
closest_data_to_line() function.

*Hint:* you can use one implementation function to test another. Below is some sample data
to help you get started.
.. image:: https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png
.. code-block:: python

    import numpy as np

    p1 = np.array([0.0, 0.0])
    p2 = np.array([1.0, 1.0])
    data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
.. _You can read more about them here.: http://inscight.org/2012/03/31/evolution_of_a_solution/