From: C. Titus Brown <titus@idyll.org>
Date: Sat, 17 Nov 2012 15:40:20 +0000 (-0800)
Subject: updated notebook links
X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=a0f447586a332d35eb578660d09a495f88f2f149;p=swc-testing-nose.git

updated notebook links

W. Trevor King: I dropped everything from the original 8113d9e except
for the python/testing* modification.

Conflicts:
	day2.rst
	python-packages.rst
---

diff --git a/python/testing-with-nose.ipynb b/python/testing-with-nose.ipynb
index 5dbe383..45ec9ac 100644
--- a/python/testing-with-nose.ipynb
+++ b/python/testing-with-nose.ipynb
@@ -7,17 +7,32 @@
  "worksheets": [
   {
    "cells": [
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## unit tests\n",
+      "\n",
+      "This is an example of unit testing with nose.  We are trying to make sure that the function calc_gc properly calculated the gc fraction of the DNA sequence.\n",
+      "\n",
+      "Problems worked through in class included --\n",
+      "\n",
+      "1. the sequence contained 'N's\n",
+      "2. the sequence contained lowercase char\n",
+      "3. divide by zero for sequences with no A, T, C, G"
+     ]
+    },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "%%file calc_gc.py\n",
       "def calc_gc(sequence):\n",
-      "    sequence = sequence.upper()\n",
-      "    n = sequence.count('T') + sequence.count('A')\n",
-      "    m = sequence.count('G') + sequence.count('C')\n",
+      "    sequence = sequence.upper()                    # make all chars uppercase\n",
+      "    n = sequence.count('T') + sequence.count('A')  # count only A, T,\n",
+      "    m = sequence.count('G') + sequence.count('C')  # C, and G -- nothing else (no Ns, Rs, Ws, etc.)\n",
       "    if n + m == 0:\n",
-      "        return 0.\n",
+      "        return 0.                                  # avoid divide-by-zero\n",
       "    return float(m) / float(n + m)\n",
       "\n",
       "def test_1():\n",
@@ -25,11 +40,11 @@
       "    print 'hello, this is a test; the value of result is', result\n",
       "    assert result == 0.43\n",
       "    \n",
-      "def test_2():\n",
+      "def test_2(): # test handling N\n",
       "    result = round(calc_gc('NATGC'), 2)\n",
       "    assert result == 0.5, result\n",
       "    \n",
-      "def test_3():\n",
+      "def test_3(): # test handling lowercase\n",
       "    result = round(calc_gc('natgc'), 2)\n",
       "    assert result == 0.5, result\n"
      ],
@@ -40,24 +55,56 @@
        "output_type": "stream",
        "stream": "stdout",
        "text": [
-        "Overwriting calc_gc.py"
+        "Overwriting calc_gc.py\n"
        ]
-      },
+      }
+     ],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Running nosetests\n",
+      "\n",
+      "Here, the 'nosetests' command looks through calc_gc.py, finds all functions named test_, and runs them."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "!nosetests calc_gc.py"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
       {
        "output_type": "stream",
        "stream": "stdout",
        "text": [
-        "\n"
+        "...\r\n",
+        "----------------------------------------------------------------------\r\n",
+        "Ran 3 tests in 0.001s\r\n",
+        "\r\n",
+        "OK\r\n"
        ]
       }
      ],
-     "prompt_number": 42
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "You can also run nosetests with a '-v' option:"
+     ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
-      "!nosetests calc_gc.py"
+      "!nosetests -v calc_gc.py"
      ],
      "language": "python",
      "metadata": {},
@@ -66,7 +113,10 @@
        "output_type": "stream",
        "stream": "stdout",
        "text": [
-        "...\r\n",
+        "calc_gc.test_1 ... ok\r\n",
+        "calc_gc.test_2 ... ok\r\n",
+        "calc_gc.test_3 ... ok\r\n",
+        "\r\n",
         "----------------------------------------------------------------------\r\n",
         "Ran 3 tests in 0.001s\r\n",
         "\r\n",
@@ -74,7 +124,18 @@
        ]
       }
      ],
-     "prompt_number": 43
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Regression testing\n",
+      "\n",
+      "Here I'm going to set up some regression tests, where we're simply comparing the output of a previously run script with the output of that script now.  If we're running on the same data, we should get the same answer... right?\n",
+      "\n",
+      "The script just calculates the average of the average GC content of each sequence in 25k.fq.gz."
+     ]
     },
     {
      "cell_type": "code",
@@ -85,7 +146,7 @@
       "import screed\n",
       "import calc_gc\n",
       "\n",
-      "filename = sys.argv[1]\n",
+      "filename = sys.argv[1]    # take the sequence filename in from the command line\n",
       "total_gc = []\n",
       "for record in screed.open(filename):\n",
       "    gc = calc_gc.calc_gc(record.sequence)\n",
@@ -100,7 +161,7 @@
        "output_type": "stream",
        "stream": "stdout",
        "text": [
-        "Writing gc-of-seqs.py"
+        "Overwriting gc-of-seqs.py"
        ]
       },
       {
@@ -111,12 +172,13 @@
        ]
       }
      ],
-     "prompt_number": 44
+     "prompt_number": 4
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
+      "# run the script and look at the output -- then write that output into the following file.\n",
       "!python gc-of-seqs.py 25k.fq.gz"
      ],
      "language": "python",
@@ -130,7 +192,7 @@
        ]
       }
      ],
-     "prompt_number": 47
+     "prompt_number": 5
     },
     {
      "cell_type": "code",
@@ -139,8 +201,11 @@
       "%%file test_gc_script.py\n",
       "import subprocess\n",
       "\n",
-      "correct_output = \"0.607911191366\\n\"\n",
+      "correct_output = \"0.607911191366\\n\"   # this is taken from the previous exec'd cell\n",
       "\n",
+      "# the following function checks to see if running this script at the command line\n",
+      "# returns the right result.  make sure you're running this from *within* the python/ subdirectory\n",
+      "# of the 2012-11-scripps/ repository.\n",
       "def test_run():\n",
       "    p = subprocess.Popen('python gc-of-seqs.py 25k.fq.gz', shell=True, stdout=subprocess.PIPE)\n",
       "    (stdout, stderr) = p.communicate()\n",
@@ -165,7 +230,7 @@
        ]
       }
      ],
-     "prompt_number": 52
+     "prompt_number": 6
     },
     {
      "cell_type": "code",
@@ -182,13 +247,13 @@
        "text": [
         ".\r\n",
         "----------------------------------------------------------------------\r\n",
-        "Ran 1 test in 0.969s\r\n",
+        "Ran 1 test in 0.937s\r\n",
         "\r\n",
         "OK\r\n"
        ]
       }
      ],
-     "prompt_number": 53
+     "prompt_number": 7
     },
     {
      "cell_type": "code",
@@ -196,7 +261,8 @@
      "input": [],
      "language": "python",
      "metadata": {},
-     "outputs": []
+     "outputs": [],
+     "prompt_number": 7
     }
    ],
    "metadata": {}