[Debugging](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/4-Debugging/) |
[Documentation](https://github.com/thehackerwithin/UofCSCBC2012/tree/master/6-Documentation/)
# Testing

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony
Scopatz**

![image](http://memecreator.net/the-most-interesting-man-in-the-world/showimage.php/169/I-don't-always-test-my-code-But-when-I-do-I-do-it-in-production.jpg)
# What is testing?

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed. Well
chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral edge cases.
# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code every time, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

- Does your code work?
- Does it do what you think it does?
- Does it continue to work after changes are made?
- Does it continue to work after system configurations or libraries
  are upgraded?
- Does it respond properly for a full range of input parameters?
- What about edge or corner cases?
- What's the limit on that input parameter?
## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?
## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?
## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything that uses
random numbers, e.g., Monte Carlo methods.
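For instance, a Monte Carlo estimate should be checked against an error
bound rather than an exact value. Here is a minimal sketch (the
`pi_estimate` helper and its tolerance are illustrative assumptions, not
part of this lesson):

```python
import random

def pi_estimate(n_samples, seed=42):
    """Estimate pi by Monte Carlo: sample points in the unit square
    and count how many land inside the quarter circle."""
    rng = random.Random(seed)  # a fixed seed keeps the test repeatable
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_samples

def test_pi_estimate():
    # assert that the result lies within an acceptable error bound,
    # not that it equals pi exactly
    assert abs(pi_estimate(100000) - 3.141592653589793) < 0.05
```

The key design choice is the tolerance: it should be derived from the
statistical error of the method, not picked to make the test pass.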
# Where should tests live?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total / length
```
Tests could be implemented as runtime exceptions in the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
        return total / length
    except TypeError:
        print("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
```
Sometimes tests are functions that sit alongside the function
definitions they are testing:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
        return total / length
    except TypeError:
        print("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
Sometimes tests live in an executable independent of the main
executable:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
        return total / length
    except TypeError:
        print("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
```
where a test module exists in a different file:

```python
from mean import mean


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
# When should we test?

The three right answers are:

- **ALWAYS!**
- **EARLY!**
- **OFTEN!**
The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.
If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.
Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.
# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in test coverage:
the percent of their functions that they feel confident are
comprehensively tested.
# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.
**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle the
exceptional behavior resulting from this type of input are called
exceptions.
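A minimal sketch of such a runtime check (the `reciprocal` function here
is a made-up example, not part of this lesson):

```python
def reciprocal(x):
    """Return 1/x, raising an exception for input we cannot handle."""
    if x == 0:
        # fail loudly and descriptively instead of dividing by zero
        raise ValueError("x must be nonzero")
    return 1.0 / x
```

Callers that pass bad input get an immediate, descriptive failure rather
than a silently wrong answer.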
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what defines a *code unit* is not
strictly defined.

To test functions and classes, the interfaces (APIs) - rather than the
implementation - should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
**System Tests:** System-level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask about the behavior of
the system as a whole. This sort of testing involves comparison with
other validated codes, analytical solutions, etc.
**Regression Tests:** A regression test ensures that new code does not
change anything. If you change the default answer, for example, or add a
new question, you'll need to make sure that missing entries are still
handled correctly.
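As a sketch (the function and the recorded value here are hypothetical),
a regression test pins current behavior to a result recorded from an
earlier, trusted run:

```python
def default_answer():
    # stand-in for whatever routine produced the trusted output
    return "yes"

def test_default_answer_regression():
    recorded = "yes"  # output captured from a previous, validated run
    assert default_answer() == recorded
```

If a later change alters the output, this test fails and forces you to
decide whether the change was intentional.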
**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third-party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code or
when your code might be used by a number of users who may have various
versions of libraries.
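For example, one might verify assumptions about the interpreter and
library environment directly (a sketch; the specific checks are
illustrative, not from this lesson):

```python
import sys
import statistics

def test_environment():
    # our (hypothetical) code assumes a Python 3 interpreter
    assert sys.version_info[0] >= 3
    # and that the library mean() accepts plain lists of ints
    assert statistics.mean([1, 2, 3]) == 2
```

If a user runs your code with an incompatible interpreter or library
version, this test fails first, with a clear cause.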
**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
# Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.
**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally determined analytically,
even if the function being tested isn't.
**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.
**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures, as they are not really part of the test themselves but
rather involve getting the computer into the appropriate state.
For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
the test.
**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.
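Using the (hypothetical) Person example above, a setup/teardown sketch
might look like this:

```python
class Person(object):
    """Minimal stand-in for the Person class discussed above."""
    def fun(self):
        return "fun"

person = None

def setup():
    # create the fixture before a test runs
    global person
    person = Person()

def teardown():
    # clean up the fixture after the test finishes
    global person
    person = None

def test_fun():
    assert person.fun() == "fun"
```

Frameworks such as nose call setup() and teardown() for you; the test
itself only contains the assertion.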
**The Big Picture:** Putting all this together, the testing algorithm is
often:

```python
setup()
test()
teardown()
```
But, sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.
## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)
To write a nose test, we make assertions:

```python
assert should_be_true()
assert not should_not_be_true()
```
Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base:

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
```
Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
```
## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.
*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

Run the tests with:

```bash
nosetests test_mean.py
```
# Test Driven Development

Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!
This is an iterative process whereby you write a test, then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.
TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
Say you want to write a fib() function which generates values of the
Fibonacci sequence for given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from fib import fib


def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```
You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```
And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests, we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```
This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    # a little better
    if n == 0 or n == 1:
        return n
    return 1
```
However, this function still falls over for `2 < n`. Time for more
tests:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```
At this point, we had better go ahead and try to do the right thing:

```python
def fib(n):
    # the recursive definition
    if n == 0 or n == 1:
        return n
    return fib(n - 1) + fib(n - 2)
```
Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined for those cases:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)


def test_fib4():
    obs = fib(1.5)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-1)
    exp = NotImplemented
    assert_equal(obs, exp)
```
This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```
And thus - finally - we have a robust function together with working
tests!