[Back To Debugging](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/debugging) - [Forward To Documentation](https://github.com/thehackerwithin/boot-camps/tree/2013-01-chicago/documentation)

# Testing

**Presented By Anthony Scopatz**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony Scopatz**

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

# What is testing?
Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed. Well
chosen tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral edge cases.
# Why test software?

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code every time, you must test your code in order to
trust it enough to answer in the affirmative to at least a few of the
following questions:

- Does your code work?
- Does it do what you think it does?
- Does it continue to work after changes are made?
- Does it continue to work after system configurations or libraries
  are upgraded?
- Does it respond properly for a full range of input parameters?
- What about edge or corner cases?
- What's the limit on that input parameter?
- How will it affect your
  [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?
## Verification

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

## Validation

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.?
## Uncertainty Quantification
*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything which uses
random numbers, e.g., Monte Carlo methods.
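For instance, a minimal sketch of such a check (the `monte_carlo_pi()` estimator here is hypothetical, invented for illustration) asserts that a stochastic result lands within loose error bounds rather than matching an exact value:

```python
import random

def monte_carlo_pi(n):
    # Estimate pi by sampling n points uniformly in the unit square
    # and counting the fraction that land inside the quarter circle.
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

def test_monte_carlo_pi():
    # The result is stochastic, so test against error bounds,
    # not exact equality.
    est = monte_carlo_pi(100000)
    assert abs(est - 3.14159265) < 0.05
```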
# Where do tests live?

Say we have an averaging function:

```python
def mean(numlist):
    total = sum(numlist)
    length = len(numlist)
    return total / length
```

Tests could be implemented as runtime exceptions in the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length
```
Sometimes tests are functions alongside the function definitions they
are testing:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0

def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
Sometimes they are in an executable independent of the main executable:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        print "The number list was not a list of numbers."
    except:
        print "There was a problem evaluating the number list."
    return total / length
```

where a test module exists in a different file:
```python
from mean import mean

def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0

def test_floating_mean():
    assert mean([1, 2]) == 1.5
```
# When should we test?
The three right answers are:

- **ALWAYS!**
- **EARLY!**
- **OFTEN!**
The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.
If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating support codes, etc.
Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.
# Who should test?

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code, and take pride in their test
coverage: the percentage of their functions that they feel confident are
comprehensively tested.
# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.
## Types of Tests

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks within the code that handle this kind of
exceptional behavior are called exceptions.
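As a minimal sketch, here is a variation on the mean() function used throughout this lesson, rewritten to raise an exception rather than print:

```python
def mean(numlist):
    # Input arrives at runtime, so check it at runtime and raise
    # an exception that describes the problem.
    if len(numlist) == 0:
        raise ValueError("Cannot take the mean of an empty list.")
    return sum(numlist) / float(len(numlist))
```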
**Unit Tests:** Unit tests are a type of test which test the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what defines a *code unit* is not
strictly defined.

To test functions and classes, the interfaces (API), rather than the
implementation, should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for
the inputs.
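For example, a few boundary-case probes through the interface alone might look like the following sketch (assuming the mean() function from earlier lives in `mean.py`):

```python
from mean import mean  # assumes the mean() function above is in mean.py

def test_mean_boundaries():
    # Only the interface is exercised; no knowledge of how mean()
    # is implemented is required.
    assert mean([0]) == 0                  # smallest sensible input
    assert mean([-1, 1]) == 0              # mixed signs cancel
    assert mean([1e300, 1e300]) == 1e300   # large magnitudes survive
```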
**System Tests:** System-level tests are intended to test the code as a
whole. As opposed to unit tests, system tests ask about the behavior of
the code as a whole. This sort of testing involves comparison with other
validated codes, analytical solutions, etc.
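A sketch of the idea (`simulate_range()` is a hypothetical whole-program entry point, not code from this lesson): compare the program's end-to-end answer against a known analytical solution.

```python
import math

def test_projectile_range():
    # Analytical range of a projectile with no air resistance:
    # R = v**2 * sin(2*theta) / g
    v, theta, g = 10.0, math.pi / 4.0, 9.81
    exp = v**2 * math.sin(2.0 * theta) / g
    obs = simulate_range(v, theta)  # hypothetical simulation entry point
    assert abs(obs - exp) < 1e-6
```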
**Regression Tests:** A regression test ensures that new code does not
change anything. If you change the default answer, for example, or add a
new question, you'll need to make sure that missing entries are still
handled correctly.
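One way this might look in practice (the saved-output file name is illustrative): record a trusted answer once, then keep checking new code against it.

```python
from mean import mean  # the mean() function from earlier in this lesson

def test_mean_regression():
    # 'expected_mean.txt' holds the answer recorded from a trusted
    # earlier version of the code (illustrative file name).
    obs = mean([1.0, 2.0, 3.0, 4.0])
    with open('expected_mean.txt') as f:
        exp = float(f.read())
    assert obs == exp
```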
**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third-party
libraries and modules. This type of test is essential for codes that
depend on libraries which might be updated independently of your code,
or when your code might be used by a number of users who may have
various versions of libraries.
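A minimal sketch of this idea: check our own mean() against whatever version of numpy happens to be installed, so a library upgrade that changes behavior makes the test fail.

```python
import numpy as np

from mean import mean  # the mean() function from earlier in this lesson

def test_mean_against_numpy():
    # If a numpy upgrade ever disagrees with our code, this fails.
    numlist = [1.0, 2.0, 3.0]
    assert abs(mean(numlist) - np.mean(numlist)) < 1e-12
```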
**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.
## Elements of a Test

**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.
**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally determined analytically
even if the function being tested isn't.
**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.
**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures as they are not really part of the test themselves but
rather involve getting the computer into the appropriate state.
For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun() function,
then, we need to create an appropriate Person object on which to run
it.
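A sketch of that fixture (this Person class is hypothetical, invented for illustration):

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def fun(self):
        # A stand-in measure of fun, for illustration only.
        return len(self.name)

def test_fun():
    # The Person object is the fixture: not what we are testing,
    # but required to run the test at all.
    person = Person("Fezzik")
    assert person.fun() > 0
```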
**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is:

```python
setup()
test()
teardown()
```
But sometimes it's the case that your tests change the fixtures. If so,
it's better for the setup() and teardown() functions to occur on either
side of each test. In that case, the testing algorithm should be:

```python
setup()
test1()
teardown()

setup()
test2()
teardown()

setup()
test3()
teardown()
```
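As a sketch of that pattern spelled out in plain Python (using the hypothetical Person fixture from above; testing frameworks like nose automate this bookkeeping for you):

```python
def setup():
    # Build a fresh fixture before each test.
    global person
    person = Person("Fezzik")

def teardown():
    # Clean up so the next test starts from scratch.
    global person
    del person

def test_fun():
    assert person.fun() > 0

# The per-test algorithm described above, done by hand:
for test in [test_fun]:
    setup()
    test()
    teardown()
```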
# Nose: A Python Testing Framework
The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notable is [JUnit](http://www.junit.org/) in Java, which can arguably
be credited with inventing the testing framework.
## Where do nose tests live?
Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)
## Nose Test Syntax

To write a nose test, we make assertions:

```python
assert should_be_true()
assert not should_not_be_true()
```
Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base:

```python
from nose.tools import *

assert_equal(a, b)
assert_almost_equal(a, b)
assert_true(a)
assert_false(a)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
# and many more!
```
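For instance, a few of these in action (a minimal sketch assuming a variant of the mean() function that raises TypeError on bad input rather than printing):

```python
from nose.tools import assert_equal, assert_almost_equal, assert_raises

from mean import mean  # assuming mean() raises TypeError for bad input

def test_mean_with_nose_tools():
    assert_equal(mean([0, 200]), 100)
    assert_almost_equal(mean([1.0, 2.0]), 1.5)
    assert_raises(TypeError, mean, ["one", "two"])
```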
Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
# and many more!
```
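For example, floating point arithmetic makes exact array comparison brittle, which is where the almost-equal variant earns its keep:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

def test_elementwise_sum():
    # 0.1 + 0.2 != 0.3 exactly in floating point, so compare
    # to a tolerance instead of with ==.
    obs = np.array([0.1]) + np.array([0.2])
    exp = np.array([0.3])
    assert_array_almost_equal(obs, exp)
```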
## Exercise: Writing tests for mean()
There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file which tests the mean()
function in `mean.py`.
*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a
list of strings?

When you're ready, run your tests with:

```bash
nosetests test_mean.py
```
# Test Driven Development
Test driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say you write the
tests *before* writing the associated code!
This is an iterative process whereby you write a test then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.
TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).
Say you want to write a fib() function which generates values of the
Fibonacci sequence at given indexes. You would - of course - start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

from fib1 import fib

def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```
You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```
And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding more tests, we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```
This extra test now requires that we bother to implement at least the
initial values:

```python
def fib(n):
    if n == 0 or n == 1:
        return n
    return 1
```
However, this function still falls over for `2 < n`. Time for more
tests:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)

def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```
At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```
Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)

def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)

def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)

def test_fib4():
    obs = fib(13.37)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(-9000)
    exp = NotImplemented
    assert_equal(obs, exp)
```
This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```
And thus - finally - we have a robust function together with working
tests!

## Exercise: Writing tests for close_line()
**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally, there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.
In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write from scratch a `test_close_line.py`
file which tests the `closest_data_to_line()` functions.
*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```
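To get you started, here is one possible skeleton (the imported implementation names are hypothetical; substitute whichever functions actually appear in `close_line.py`):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

# Hypothetical names; use the functions actually defined in close_line.py.
from close_line import closest_data_to_line, closest_data_to_line2

def test_implementations_agree():
    p1 = np.array([0.0, 0.0])
    p2 = np.array([1.0, 1.0])
    data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
    # One implementation serves as the oracle for another.
    obs = closest_data_to_line2(data, p1, p2)
    exp = closest_data_to_line(data, p1, p2)
    assert_array_almost_equal(obs, exp)
```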