5-Testing/Readme.md

   1 [Back To
   2 Debugging](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/)
   3 - [Forward To
   4 Documentation](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/)
   5
   6 * * * * *
   7
   8 **Presented By Anthony Scopatz**
   9
  10 **Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
  11 Scopatz**
  12
  13 # What is testing?
  14
  15 Software testing is a process by which one or more expected behaviors
  16 and results from a piece of software are exercised and confirmed. Well
  17 chosen tests will confirm expected code behavior for the extreme
  18 boundaries of the input domains, output ranges, parametric combinations,
  19 and other behavioral edge cases.
  20
  21 # Why test software?
  22
  23 Unless you write flawless, bug-free, perfectly accurate, fully precise,
  24 and predictable code every time, you must test your code in order to
  25 trust it enough to answer in the affirmative to at least a few of the
  26 following questions:
  27
  28 -   Does your code work?
  29 -   Always?
  30 -   Does it do what you think it does?
  31 -   Does it continue to work after changes are made?
  32 -   Does it continue to work after system configurations or libraries
  33     are upgraded?
  34 -   Does it respond properly for a full range of input parameters?
  35 -   What about edge or corner cases?
  36 -   What's the limit on that input parameter?
  37
  38 ## Verification
  39
  40 *Verification* is the process of asking, "Have we built the software
  41 correctly?" That is, is the code bug free, precise, accurate, and
  42 repeatable?
  43
  44 ## Validation
  45
  46 *Validation* is the process of asking, "Have we built the right
  47 software?" That is, is the code designed in such a way as to produce the
  48 answers we are interested in, data we want, etc.
  49
  50 ## Uncertainty Quantification
  51
  52 *Uncertainty Quantification* is the process of asking, "Given that our
  53 algorithm may not be deterministic, was our execution within acceptable
  54 error bounds?" This is particularly important for anything which uses
  55 random numbers, eg Monte Carlo methods.
  56
  57 # Where are tests?
  58
  59 Say we have an averaging function:
  60
  61 ```python
  62 def mean(numlist):
  63     total = sum(numlist)
  64     length = len(numlist)
  65     return total/length
  66 ```
  67
  68 Tests could be implemented as runtime exceptions in the function:
  69
  70 ```python
  71 def mean(numlist):
  72     try:
  73         total = sum(numlist)
  74         length = len(numlist)
  75     except ValueError:
  76         print "The number list was not a list of numbers."
  77     except:
  78         print "There was a problem evaluating the number list."
  79     return total/length
  80 ```
  81
  82 Sometimes tests they are functions alongside the function definitions
  83 they are testing.
  84
  85 ```python
  86 def mean(numlist):
  87     try:
  88         total = sum(numlist)
  89         length = len(numlist)
  90     except ValueError:
  91         print "The number list was not a list of numbers."
  92     except:
  93         print "There was a problem evaluating the number list."
  94     return total/length
  95
  96
  97 def test_mean():
  98     assert mean([0, 0, 0, 0]) == 0
  99     assert mean([0, 200]) == 100
 100     assert mean([0, -200]) == -100
 101     assert mean([0]) == 0
 102
 103
 104 def test_floating_mean():
 105     assert mean([1, 2]) == 1.5
 106 ```
 107
 108 Sometimes they are in an executable independent of the main executable.
 109
 110 ```python
 111 def mean(numlist):
 112     try:
 113         total = sum(numlist)
 114         length = len(numlist)
 115     except ValueError:
 116         print "The number list was not a list of numbers."
 117     except:
 118         print "There was a problem evaluating the number list."
 119     return total/length
 120 ```
 121
 122 Where, in a different file exists a test module:
 123
 124 ```python
 125 import mean
 126
 127   def test_mean():
 128       assert mean([0, 0, 0, 0]) == 0
 129       assert mean([0, 200]) == 100
 130       assert mean([0, -200]) == -100
 131       assert mean([0]) == 0
 132
 133
 134   def test_floating_mean():
 135       assert mean([1, 2]) == 1.5
 136 ```
 137
 138 # When should we test?
 139
 140 **ALWAYS!!!**
 141
 142 The longer answer is that testing either before or after your software
 143 is written will improve your code, but testing after your program is
 144 used for something important is too late.
 145
 146 If we have a robust set of tests, we can run them before adding
 147 something new and after adding something new. If the tests give the same
 148 results (as appropriate), we can have some assurance that we didn'treak
 149 anything. The same idea applies to making changes in your system
 150 configuration, updating support codes, etc.
 151
 152 Another important feature of testing is that it helps you remember what
 153 all the parts of your code do. If you are working on a large project
 154 over three years and you end up with 200 classes, it may be hard to
 155 remember what the widget class does in detail. If you have a test that
 156 checks all of the widget's functionality, you can look at the test to
 157 remember what it's supposed to do.
 158
 159 # Who should test?
 160
 161 In a collaborative coding environment, where many developers contribute
 162 to the same code base, developers should be responsible individually for
 163 testing the functions they create and collectively for testing the code
 164 as a whole.
 165
 166 Professionals often test their code, and take pride in test coverage,
 167 the percent of their functions that they feel confident are
 168 comprehensively tested.
 169
 170 # How are tests written?
 171
 172 The type of tests that are written is determined by the testing
 173 framework you adopt. Don't worry, there are a lot of choices.
 174
 175 ## Types of Tests
 176
 177 **Exceptions:** Exceptions can be thought of as type of runttime test.
 178 They alert the user to exceptional behavior in the code. Often,
 179 exceptions are related to functions that depend on input that is unknown
 180 at compile time. Checks that occur within the code to handle exceptional
 181 behavior that results from this type of input are called Exceptions.
 182
 183 **Unit Tests:** Unit tests are a type of test which test the fundametal
 184 units of a program's functionality. Often, this is on the class or
 185 function level of detail. However what defines a *code unit* is not
 186 formally defined.
 187
 188 To test functions and classes, the interfaces (API) - rather than the
 189 implmentation - should be tested. Treating the implementation as a ack
 190 box, we can probe the expected behavior with boundary cases for the
 191 inputs.
 192
 193 **System Tests:** System level tests are intended to test the code as a
 194 whole. As opposed to unit tests, system tests ask for the behavior as a
 195 whole. This sort of testing involves comparison with other validated
 196 codes, analytical solutions, etc.
 197
 198 **Regression Tests:** A regression test ensures that new code does
 199 change anything. If you change the default answer, for example, or add a
 200 new question, you'll need to make sure that missing entries are still
 201 found and fixed.
 202
 203 **Integration Tests:** Integration tests query the ability of the code
 204 to integrate well with the system configuration and third party
 205 libraries and modules. This type of test is essential for codes that
 206 depend on libraries which might be updated independently of your code or
 207 when your code might be used by a number of users who may have various
 208 versions of libraries.
 209
 210 **Test Suites:** Putting a series of unit tests into a collection of
 211 modules creates, a test suite. Typically the suite as a whole is
 212 executed (rather than each test individually) when verifying that the
 213 code base still functions after changes have been made.
 214
 215 # Elements of a Test
 216
 217 **Behavior:** The behavior you want to test. For example, you might want
 218 to test the fun() function.
 219
 220 **Expected Result:** This might be a single number, a range of numbers,
 221 a new fully defined object, a system state, an exception, etc. When we
 222 run the fun() function, we expect to generate some fun. If we don't
 223 generate any fun, the fun() function should fail its test.
 224 Alternatively, if it does create some fun, the fun() function should
 225 pass this test. The the expected result should known *a priori*. For
 226 numerical functions, this is result is ideally analytically determined
 227 even if the fucntion being tested isn't.
 228
 229 **Assertions:** Require that some conditional be true. If the
 230 conditional is false, the test fails.
 231
 232 **Fixtures:** Sometimes you have to do some legwork to create the
 233 objects that are necessary to run one or many tests. These objects are
 234 called fixtures as they are not really part of the test themselves but
 235 rather involve getting the computer into the appropriate state.
 236
 237 For example, since fun varies a lot between people, the fun() function
 238 is a method of the Person class. In order to check the fun function,
 239 then, we need to create an appropriate Person object on which to run
 240 fun().
 241
 242 **Setup and teardown:** Creating fixtures is often done in a call to a
 243 setup function. Deleting them and other cleanup is done in a teardown
 244 function.
 245
 246 **The Big Picture:** Putting all this together, the testing algorithm is
 247 often:
 248
 249 ```python
 250 setup()
 251 test()
 252 teardown()
 253 ```
 254
 255 But, sometimes it's the case that your tests change the fixtures. If so,
 256 it's better for the setup() and teardown() functions to occur on either
 257 side of each test. In that case, the testing algorithm should be:
 258
 259 ```python
 260 setup()
 261 test1()
 262 teardown()
 263
 264 setup()
 265 test2()
 266 teardown()
 267
 268 setup()
 269 test3()
 270 teardown()
 271 ```
 272
 273 * * * * *
 274
 275 # Nose: A Python Testing Framework
 276
 277 The testing framework we'll discuss today is called nose. However, there
 278 are several other testing frameworks available in most language. Most
 279 notably there is [JUnit](http://www.junit.org/) in Java which can
 280 arguably attributed to inventing the testing framework.
 281
 282 ## Where do nose tests live?
 283
 284 Nose tests are files that begin with Test-, Test\_, test-, or test\_.
 285 Specifically, these satisfy the testMatch regular expression
 286 [Tt]est[-\_]. (You can also teach nose to find tests by declaring them
 287 in the unittest.TestCase subclasses chat you create in your code. You
 288 can also create test functions which are not unittest.TestCase
 289 subclasses if they are named with the configured testMatch regular
 290 expression.)
 291
 292 ## Nose Test Syntax
 293
 294 To write a nose test, we make assertions.
 295
 296 ```python
 297 assert should_be_true()
 298 assert not should_not_be_true()
 299 ```
 300
 301 Additionally, nose itself defines number of assert functions which can
 302 be used to test more specific aspects of the code base.
 303
 304 ```python
 305 from nose.tools import *
 306
 307 assert_equal(a, b)
 308 assert_almost_equal(a, b)
 309 assert_true(a)
 310 assert_false(a)
 311 assert_raises(exception, func, *args, **kwargs)
 312 assert_is_instance(a, b)
 313 # and many more!
 314 ```
 315
 316 Moreover, numpy offers similar testing functions for arrays:
 317
 318 ```python
 319 from numpy.testing import *
 320
 321 assert_array_equal(a, b)
 322 assert_array_almost_equal(a, b)
 323 # etc.
 324 ```
 325
 326 ## Exersize: Writing tests for mean()
 327
 328 There are a few tests for the mean() function that we listed in this
 329 lesson. What are some tests that should fail? Add at least three test
 330 cases to this set. Edit the `test_mean.py` file which tests the mean()
 331 function in `mean.py`.
 332
 333 *Hint:* Think about what form your input could take and what you should
 334 do to handle it. Also, think about the type of the elements in the list.
 335 What should be done if you pass a list of integers? What if you pass a
 336 list of strings?
 337
 338 **Example**:
 339
 340     nosetests test_mean.py
 341
 342 # Test Driven Development
 343
 344 Some people develop code by writing the tests first.
 345
 346 If you write your tests comprehensively enough, the expected behaviors
 347 that you define in your tests will be the necessary and sufficient set
 348 of behaviors your code must perform. Thus, if you write the tests first
 349 and program until the tests pass, you will have written exactly enough
 350 code to perform the behavior your want and no more. Furthermore, you
 351 will have been forced to write your code in a modular enough way to make
 352 testing easy now. This will translate into easier testing well into the
 353 future.
 354
 355 ### An example
 356
 357 The overlap method takes two rectangles (red and blue) and computes the
 358 degree of overlap between them. Save it in overlap.py. A rectangle is
 359 defined as a tuple of tuples: ((x\_lo,y\_lo),(x\_hi),(y\_hi))
 360
 361     def overlap(red, blue):
 362        '''Return overlap between two rectangles, or None.'''
 363
 364        ((red_lo_x, red_lo_y), (red_hi_x, red_hi_y)) = red
 365        ((blue_lo_x, blue_lo_y), (blue_hi_x, blue_hi_y)) = blue
 366
 367        if (red_lo_x >= blue_hi_x) or \
 368           (red_hi_x <= blue_lo_x) or \
 369           (red_lo_y >= blue_hi_x) or \
 370           (red_hi_y <= blue_lo_y):
 371            return None
 372
 373        lo_x = max(red_lo_x, blue_lo_x)
 374        lo_y = max(red_lo_y, blue_lo_y)
 375        hi_x = min(red_hi_x, blue_hi_x)
 376        hi_y = min(red_hi_y, blue_hi_y)
 377        return ((lo_x, lo_y), (hi_x, hi_y))
 378
 379 Now let's create a set of tests for this class. Before we do this, let's
 380 think about *how* we might test this method. How should it work?
 381
 382     from overlap import overlap
 383
 384     def test_empty_with_empty():
 385        rect = ((0, 0), (0, 0))
 386        assert overlap(rect, rect) == None
 387
 388     def test_empty_with_unit():
 389        empty = ((0, 0), (0, 0))
 390        unit = ((0, 0), (1, 1))
 391        assert overlap(empty, unit) == None
 392
 393     def test_unit_with_unit():
 394        unit = ((0, 0), (1, 1))
 395        assert overlap(unit, unit) == unit
 396
 397     def test_partial_overlap():
 398        red = ((0, 3), (2, 5))
 399        blue = ((1, 0), (2, 4))
 400        assert overlap(red, blue) == ((1, 3), (2, 4))
 401
 402 Run your tests.
 403
 404     [rguy@infolab-33 ~/TestExample]$ nosetests
 405     ...F
 406     ======================================================================
 407     FAIL: test_overlap.test_partial_overlap
 408     ----------------------------------------------------------------------
 409     Traceback (most recent call last):
 410       File "/usr/lib/python2.6/site-packages/nose/case.py", line 183, in runTest
 411         self.test(*self.arg)
 412       File "/afs/ictp.it/home/r/rguy/TestExample/test_overlap.py", line 19, in test_partial_overlap
 413         assert overlap(red, blue) == ((1, 3), (2, 4))
 414     AssertionError
 415
 416     ----------------------------------------------------------------------
 417     Ran 4 tests in 0.005s
 418
 419     FAILED (failures=1)
 420
 421 Oh no! Something failed. The failure was on line in this test:
 422
 423     def test_partial_overlap():
 424       red = ((0, 3), (2, 5))
 425       blue = ((1, 0), (2, 4))
 426       assert overlap(red, blue) == ((1, 3), (2, 4))
 427
 428 Can you spot why it failed? Try to fix the method so all tests pass.