**Presented By Katy Huff**

**Based on materials by Katy Huff, Rachel Slaybaugh, and Anthony

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/test_prod.jpg)

Software testing is a process by which one or more expected behaviors
and results from a piece of software are exercised and confirmed. Well-chosen
tests will confirm expected code behavior for the extreme
boundaries of the input domains, output ranges, parametric combinations,
and other behavioral **edge cases**.

Unless you write flawless, bug-free, perfectly accurate, fully precise,
and predictable code **every time**, you must test your code in order to
trust it enough to answer at least a few of the following questions in
the affirmative:

- Does your code work?
- Does it do what you think it does? ([Patriot Missile Failure](http://www.ima.umn.edu/~arnold/disasters/patriot.html))
- Does it continue to work after changes are made?
- Does it continue to work after system configurations or libraries
  are upgraded?
- Does it respond properly for a full range of input parameters?
- What about **edge or corner cases**?
- What's the limit on that input parameter?
- How will it affect your
  [publications](http://www.nature.com/news/2010/101013/full/467775a.html)?

*Verification* is the process of asking, "Have we built the software
correctly?" That is, is the code bug-free, precise, accurate, and
repeatable?

*Validation* is the process of asking, "Have we built the right
software?" That is, is the code designed in such a way as to produce the
answers we are interested in, the data we want, etc.

## Uncertainty Quantification

*Uncertainty Quantification* is the process of asking, "Given that our
algorithm may not be deterministic, was our execution within acceptable
error bounds?" This is particularly important for anything that uses
random numbers, e.g., Monte Carlo methods.

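As a minimal sketch of uncertainty quantification, the Python 3 code below estimates π by Monte Carlo sampling and accepts a run only if the estimate lies within a few standard errors of the exact value. The function name `estimate_pi`, the fixed seed, and the 4-standard-error tolerance are illustrative assumptions, not part of the lesson.

```python
import math
import random


def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    p = inside / num_samples  # fraction of points inside the quarter circle
    estimate = 4.0 * p
    # Standard error of the estimate, from the binomial variance of p.
    std_err = 4.0 * math.sqrt(p * (1.0 - p) / num_samples)
    return estimate, std_err


def test_pi_within_error_bounds():
    estimate, std_err = estimate_pi(100000, seed=42)
    # Accept the run if it lies within 4 standard errors of the exact value.
    assert abs(estimate - math.pi) < 4 * std_err
```

Fixing the seed makes the test repeatable. With other seeds, a tolerance of several standard errors keeps the chance of a spurious failure small without hiding a genuinely wrong estimator.
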
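Say we have an averaging function, mean(). Tests could be implemented as
runtime **exceptions in the function**:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise  # re-raise so we never reach the return with undefined values
    return total / length
```
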
Sometimes tests are functions that live alongside the function
definitions:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total / length


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

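Sometimes the tests live in a file separate from the code they test.
For example, `mean.py` could contain only the function:

```python
def mean(numlist):
    try:
        total = sum(numlist)
        length = len(numlist)
    except TypeError:
        raise TypeError("The number list was not a list of numbers.")
    except Exception:
        print("There was a problem evaluating the number list.")
        raise
    return total / length
```
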
A test module in a different file, `test_mean.py`, then imports the
function and tests it:

```python
from mean import mean


def test_mean():
    assert mean([0, 0, 0, 0]) == 0
    assert mean([0, 200]) == 100
    assert mean([0, -200]) == -100
    assert mean([0]) == 0


def test_floating_mean():
    assert mean([1, 2]) == 1.5
```

# When should we test?

The three right answers are:

The longer answer is that testing either before or after your software
is written will improve your code, but testing after your program is
used for something important is too late.

If we have a robust set of tests, we can run them before adding
something new and after adding something new. If the tests give the same
results (as appropriate), we can have some assurance that we didn't
break anything. The same idea applies to making changes in your system
configuration, updating supporting code, etc.

Another important feature of testing is that it helps you remember what
all the parts of your code do. If you are working on a large project
over three years and you end up with 200 classes, it may be hard to
remember what the widget class does in detail. If you have a test that
checks all of the widget's functionality, you can look at the test to
remember what it's supposed to do.

In a collaborative coding environment, where many developers contribute
to the same code base, developers should be responsible individually for
testing the functions they create and collectively for testing the code
as a whole.

Professionals often test their code and take pride in test coverage:
the percentage of their functions that they feel confident are
comprehensively tested.

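Coverage can also be measured mechanically. As a rough sketch, the code below computes *function* coverage with `sys.settrace`, reporting the fraction of target functions that run at least once while the tests execute. Real tools such as coverage.py measure finer-grained line and branch coverage. `median()` is a hypothetical untested function added only for this example.

```python
import sys


def function_coverage(test_funcs, target_funcs):
    """Return the fraction of target_funcs called at least once by test_funcs."""
    targets = {f.__code__ for f in target_funcs}
    hit = set()

    def tracer(frame, event, arg):
        # The global trace function sees a "call" event for every new Python frame.
        if event == "call" and frame.f_code in targets:
            hit.add(frame.f_code)
        return None  # no line-by-line tracing inside the frame

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        for test in test_funcs:
            test()
    finally:
        sys.settrace(old)  # restore any tracer that was active before
    return len(hit) / len(targets)


def mean(numlist):
    return sum(numlist) / len(numlist)


def median(numlist):
    ordered = sorted(numlist)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def test_mean():
    assert mean([0, 200]) == 100


print(function_coverage([test_mean], [mean, median]))  # 0.5
```

Only `mean()` runs during `test_mean()`, so half of the target functions are covered. Adding a test that calls `median()` would bring the figure to 1.0.
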
# How are tests written?

The type of tests that are written is determined by the testing
framework you adopt. Don't worry, there are a lot of choices.

**Exceptions:** Exceptions can be thought of as a type of runtime test.
They alert the user to exceptional behavior in the code. Often,
exceptions are related to functions that depend on input that is unknown
at compile time. Checks that occur within the code to handle exceptional
behavior that results from this type of input are called exceptions.

**Unit Tests:** Unit tests are a type of test that tests the fundamental
units of a program's functionality. Often, this is on the class or
function level of detail. However, what defines a *code unit* is not

To test functions and classes, the interfaces (API), rather than the
implementation, should be tested. Treating the implementation as a
black box, we can probe the expected behavior with boundary cases for

**System Tests:** System-level tests are intended to test the code as a
whole. Whereas unit tests probe individual pieces, system tests check the
behavior of the complete program. This sort of testing involves
comparison with other validated codes, analytical solutions, etc.

**Regression Tests:** A regression test ensures that new code does not
change existing behavior. If you change the default answer, for example,
or add a new question, you'll need to make sure that missing entries are still

**Integration Tests:** Integration tests query the ability of the code
to integrate well with the system configuration and third-party
libraries and modules. This type of test is essential for code that
depends on libraries which might be updated independently of your code,
or when your code might be used by a number of users who may have various
versions of libraries.

**Test Suites:** Putting a series of unit tests into a collection of
modules creates a test suite. Typically the suite as a whole is
executed (rather than each test individually) when verifying that the
code base still functions after changes have been made.

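nose assembles a suite automatically by collecting test functions, but the idea can be made explicit with the standard library's `unittest` module. In this sketch, which assumes Python 3 and a simplified copy of `mean()`, two plain test functions are wrapped with `unittest.FunctionTestCase` and run together as one suite.

```python
import unittest


def mean(numlist):
    # Simplified stand-in for the lesson's mean() (no exception handling).
    return sum(numlist) / len(numlist)


def test_mean():
    assert mean([0, 200]) == 100


def test_floating_mean():
    assert mean([1, 2]) == 1.5


def build_suite():
    """Collect plain test functions into a single unittest suite."""
    suite = unittest.TestSuite()
    for func in (test_mean, test_floating_mean):
        suite.addTest(unittest.FunctionTestCase(func))
    return suite


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(build_suite())
```

Running the suite as a unit gives one pass/fail summary for the whole code base, which is what you want to check after every change.
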
**Behavior:** The behavior you want to test. For example, you might want
to test the fun() function.

**Expected Result:** This might be a single number, a range of numbers,
a new fully defined object, a system state, an exception, etc. When we
run the fun() function, we expect to generate some fun. If we don't
generate any fun, the fun() function should fail its test.
Alternatively, if it does create some fun, the fun() function should
pass this test. The expected result should be known *a priori*. For
numerical functions, this result is ideally determined analytically
even if the function being tested isn't.

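To illustrate an expected result known analytically, the sketch below tests a numerical integrator against an exact answer: the integral of sin(x) from 0 to π is exactly 2. The `trapezoid()` helper is hypothetical and written only for this example. The tolerance is deliberately loose relative to the method's O(h²) error.

```python
import math


def trapezoid(f, a, b, n=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_trapezoid_sin():
    # Analytically, the integral of sin(x) from 0 to pi is exactly 2.
    obs = trapezoid(math.sin, 0.0, math.pi)
    exp = 2.0
    # The trapezoid error shrinks like h**2, so 1e-5 is a loose bound for n=1000.
    assert abs(obs - exp) < 1e-5
```

The function under test is numerical, but the expected value comes from calculus, so a failure points at the code rather than at the reference answer.
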
**Assertions:** Require that some conditional be true. If the
conditional is false, the test fails.

**Fixtures:** Sometimes you have to do some legwork to create the
objects that are necessary to run one or many tests. These objects are
called fixtures, as they are not really part of the test themselves but
rather involve getting the computer into the appropriate state.

For example, since fun varies a lot between people, the fun() function
is a method of the Person class. In order to check the fun function,
then, we need to create an appropriate Person object on which to run
fun().

**Setup and teardown:** Creating fixtures is often done in a call to a
setup function. Deleting them and other cleanup is done in a teardown
function.

**The Big Picture:** Putting all this together, the testing algorithm is
to call setup() once, run each of the tests, and then call teardown()
once.

But sometimes your tests change the fixtures. If so, it's better for
the setup() and teardown() functions to occur on either side of each
test. In that case, the testing algorithm is: setup(), test1(),
teardown(), then setup(), test2(), teardown(), and so on for every test.

# Nose: A Python Testing Framework

The testing framework we'll discuss today is called nose. However, there
are several other testing frameworks available in most languages. Most
notably, there is [JUnit](http://www.junit.org/) in Java, which can
arguably be credited with inventing the testing framework.

## Where do nose tests live?

Nose tests are files that begin with `Test-`, `Test_`, `test-`, or
`test_`. Specifically, these satisfy the testMatch regular expression
`[Tt]est[-_]`. (You can also teach nose to find tests by declaring them
in the unittest.TestCase subclasses that you create in your code. You
can also create test functions which are not unittest.TestCase
subclasses if they are named with the configured testMatch regular
expression.)

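The matching rule described above can be sketched with Python's `re` module. This uses the simplified `[Tt]est[-_]` pattern quoted in the text, anchored at the start of the name. nose's own default `testMatch` pattern is more permissive, so treat this as an illustration rather than a reimplementation of nose's discovery.

```python
import re

# Simplified pattern from the text; re.match anchors it at the start of the name.
TEST_MATCH = re.compile(r"[Tt]est[-_]")


def looks_like_test(name):
    """Return True if a file or function name matches the simplified rule."""
    return TEST_MATCH.match(name) is not None


names = ["test_mean.py", "Test-io.py", "mean.py", "testing.py", "my_test.py"]
collected = [n for n in names if looks_like_test(n)]
print(collected)  # ['test_mean.py', 'Test-io.py']
```
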
To write a nose test, we make assertions.

```python
assert should_be_true()
assert not should_not_be_true()
```

Additionally, nose itself defines a number of assert functions which can
be used to test more specific aspects of the code base.

```python
from nose.tools import *

assert_almost_equal(a, b)
assert_raises(exception, func, *args, **kwargs)
assert_is_instance(a, b)
```

Moreover, numpy offers similar testing functions for arrays:

```python
from numpy.testing import *

assert_array_equal(a, b)
assert_array_almost_equal(a, b)
```

## Exercise: Writing tests for mean()

There are a few tests for the mean() function that we listed in this
lesson. What are some tests that should fail? Add at least three test
cases to this set. Edit the `test_mean.py` file, which tests the mean()
function in `mean.py`.

*Hint:* Think about what form your input could take and what you should
do to handle it. Also, think about the type of the elements in the list.
What should be done if you pass a list of integers? What if you pass a

```
nosetests test_mean.py
```

# Test Driven Development

Test-driven development (TDD) is a philosophy whereby the developer
creates code by **writing the tests first**. That is to say, you write the
tests *before* writing the associated code!

This is an iterative process whereby you write a test, then write the
minimum amount of code to make the test pass. If a new feature is needed,
another test is written and the code is expanded to meet this new use
case. This continues until the code does what is needed.

TDD operates on the YAGNI principle (You Ain't Gonna Need It). People
who diligently follow TDD swear by its effectiveness. This development
style was put forth most strongly by [Kent Beck in
2002](http://www.amazon.com/Test-Driven-Development-By-Example/dp/0321146530).

Say you want to write a fib() function which generates values of the
Fibonacci sequence at given indexes. You would, of course, start by
writing the test, possibly testing a single value:

```python
from nose.tools import assert_equal

# fib() must also be imported from whichever module you put it in.


def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)
```

You would *then* go ahead and write the actual function:

```python
def fib(n):
    # you snarky so-and-so
    return 1
```

And that is it, right?! Well, not quite. This implementation fails for
most other values. Adding tests, we see that:

```python
def test_fib1():
    obs = fib(2)
    exp = 1
    assert_equal(obs, exp)


def test_fib2():
    obs = fib(0)
    exp = 0
    assert_equal(obs, exp)

    obs = fib(1)
    exp = 1
    assert_equal(obs, exp)
```

This extra test now requires that we bother to implement at least the
initial values of the sequence.

However, a function that only handles those initial values still falls
over for `2 < n`. Time for more tests!

```python
# test_fib1() and test_fib2() as before


def test_fib3():
    obs = fib(3)
    exp = 2
    assert_equal(obs, exp)

    obs = fib(6)
    exp = 8
    assert_equal(obs, exp)
```

At this point, we had better go ahead and try to do the right thing...

```python
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

Here it becomes very tempting to take an extended coffee break or
possibly a power lunch. But then you remember those pesky negative
numbers and floats. Perhaps the right thing to do here is to just be
undefined for them.

```python
# test_fib1() through test_fib3() as before


def test_fib4():
    obs = fib(-1)
    exp = NotImplemented
    assert_equal(obs, exp)

    obs = fib(6.5)
    exp = NotImplemented
    assert_equal(obs, exp)
```

This means that it is time to add the appropriate case to the function
itself:

```python
def fib(n):
    # sequence and you shall find
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
```

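One more way to gain confidence in fib() is to compare it with an independent implementation. The sketch below repeats the final recursive fib() from this lesson and checks it against a hypothetical iterative helper, `fib_iter()`, for the first 20 indexes. It also checks the cases where fib() should be undefined.

```python
def fib(n):
    # Final recursive version from the lesson.
    if n < 0 or int(n) != n:
        return NotImplemented
    elif n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)


def fib_iter(n):
    """Independent iterative reference implementation (hypothetical helper)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def test_fib_matches_iterative():
    for n in range(20):
        assert fib(n) == fib_iter(n)


def test_fib_rejects_bad_input():
    assert fib(-3) is NotImplemented
    assert fib(2.5) is NotImplemented
```

The two implementations share no code, so a bug in one is unlikely to be masked by the same bug in the other. The recursive version makes an exponential number of calls, which is why the comparison stops at a small index.
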
# Quality Assurance Exercise

Can you think of other tests to make for the Fibonacci function? I promise there

Implement one new test in `test_fib.py`, run nosetests, and if it fails, implement
a more robust function for that case.

And thus, finally, we have a robust function together with working
tests!

**The Problem:** In 2D or 3D, we have two points (p1 and p2) which
define a line segment. Additionally, there exists experimental data which
can be anywhere in the domain. Find the data point which is closest to
the line segment.

In the `close_line.py` file there are four different implementations
which all solve this problem. [You can read more about them
here.](http://inscight.org/2012/03/31/evolution_of_a_solution/) However,
there are no tests! Please write, from scratch, a `test_close_line.py`
file which tests the `closest_data_to_line()` functions.

*Hint:* you can use one implementation function to test another. Below
is some sample data to help you get started.

![image](https://github.com/thehackerwithin/UofCSCBC2012/raw/scopz/5-Testing/evo_sol1.png)

```python
import numpy as np

p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 1.0])
data = np.array([[0.3, 0.6], [0.25, 0.5], [1.0, 0.75]])
```

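Following the hint, here is a minimal plain-Python sketch of using one implementation to test another. The functions in `close_line.py` are not shown here, so the helpers `dist_cross()`, `dist_projection()`, and `closest_index()` are hypothetical. Both distances are measured to the infinite line through p1 and p2. For this sample data, that matches the distance to the segment, because every point projects between p1 and p2.

```python
import math


def dist_cross(pt, p1, p2):
    """Distance from pt to the line through p1 and p2, via the 2D cross product."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    cross = dx * (pt[1] - p1[1]) - dy * (pt[0] - p1[0])
    return abs(cross) / math.hypot(dx, dy)


def dist_projection(pt, p1, p2):
    """Same distance, by projecting pt onto the line and measuring what is left over."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = pt[0] - p1[0], pt[1] - p1[1]
    t = (vx * dx + vy * dy) / (dx * dx + dy * dy)  # position of the projection along p1->p2
    return math.hypot(vx - t * dx, vy - t * dy)


def closest_index(data, p1, p2, dist):
    """Index of the data point with the smallest distance according to dist()."""
    return min(range(len(data)), key=lambda i: dist(data[i], p1, p2))


def test_distance_implementations_agree():
    p1, p2 = (0.0, 0.0), (1.0, 1.0)
    data = [(0.3, 0.6), (0.25, 0.5), (1.0, 0.75)]
    for pt in data:
        assert math.isclose(dist_cross(pt, p1, p2), dist_projection(pt, p1, p2))
    i = closest_index(data, p1, p2, dist_cross)
    j = closest_index(data, p1, p2, dist_projection)
    # Compare the winning distances rather than the indices (see note on ties).
    assert math.isclose(dist_cross(data[i], p1, p2), dist_cross(data[j], p1, p2))
```

The second and third sample points are exactly the same distance (0.25/√2) from the line. A test that checks only the returned index would therefore depend on how ties are broken. Comparing the distances avoids that.
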
# Building a Library of Code you Trust

Suppose we’re going to be dealing a lot with these animal count files,
and doing many different kinds of analysis with them. In the
introduction to Python lesson we wrote a function that reads these files,
but it’s stuck off in an IPython notebook. We could copy and paste it
into a new notebook every time we want to use it, but that gets tedious
and makes it difficult to add features to the function. The ideal
solution would be to keep the function in one spot and use it over and
over again from many different places. Python modules to the rescue!

We’re going to move beyond the IPython notebook. Most Python code is
stored in `.py` files and then used in other `.py` files where it
has been pulled in using an `import` statement. Today we’ll show you

Make a new text file called `animals.py`. Copy the file-reading
function from yesterday’s IPython notebook into the file and modify it
so that it returns the columns of the file as lists (instead of printing
them).

We’re going to make a function to calculate the mean of all the values
in a list, but we’re going to write the tests for it first. Make a new
text file called `test_animals.py`. Make a function called
`test_mean` that runs your theoretical mean function through several

Write the mean function in `animals.py` and verify that it passes your
tests.

Write tests for a function that will take a file name and animal name as
arguments, and return the average number of animals per sighting.

Write a function that takes a file name and animal name and returns the
average number of animals per sighting. Make sure it passes your tests.