More database lessons for novices. To do:
authorGreg Wilson <gvwilson@third-bit.com>
Sun, 10 Nov 2013 10:30:52 +0000 (05:30 -0500)
committerW. Trevor King <wking@tremily.us>
Tue, 11 Mar 2014 04:30:41 +0000 (21:30 -0700)
*   `group by` in aggregation
*   data hygiene
*   explain the `%%sqlite` magic

sql/novice/04-calc.ipynb [new file with mode: 0644]
sql/novice/05-null.ipynb [new file with mode: 0644]
sql/novice/06-agg.ipynb [new file with mode: 0644]
sql/novice/07-join.ipynb [new file with mode: 0644]
sql/novice/08-create.ipynb [new file with mode: 0644]
sql/novice/09-hygiene.ipynb [new file with mode: 0644]
sql/novice/10-prog.ipynb [new file with mode: 0644]

diff --git a/sql/novice/04-calc.ipynb b/sql/novice/04-calc.ipynb
new file mode 100644 (file)
index 0000000..2ecf97d
--- /dev/null
@@ -0,0 +1,313 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Calculating New Values"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "After carefully re-reading the expedition logs,\n",
+      "we realize that the radiation measurements they report\n",
+      "may need to be corrected upward by 5%.\n",
+      "Rather than modifying the stored data,\n",
+      "we can do this calculation on the fly\n",
+      "as part of our query:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%load_ext sqlitemagic"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select 1.05 * reading from Survey where quant='rad';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>10.311</td></tr>\n",
+        "<tr><td>8.19</td></tr>\n",
+        "<tr><td>8.8305</td></tr>\n",
+        "<tr><td>7.581</td></tr>\n",
+        "<tr><td>4.5675</td></tr>\n",
+        "<tr><td>2.2995</td></tr>\n",
+        "<tr><td>1.533</td></tr>\n",
+        "<tr><td>11.8125</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4e50>"
+       ]
+      }
+     ],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "When we run the query,\n",
+      "the expression `1.05 * reading` is evaluated for each row.\n",
+      "Expressions can use any of the fields,\n",
+      "all of usual arithmetic operators,\n",
+      "and a variety of common functions.\n",
+      "(Exactly which ones depends on which database manager is being used.)\n",
+      "For example,\n",
+      "we can convert temperature readings from Fahrenheit to Celsius\n",
+      "and round to two decimal places:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select taken, round(5*(reading-32)/9, 2) from Survey where quant='temp';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>734</td><td>-29.72</td></tr>\n",
+        "<tr><td>735</td><td>-32.22</td></tr>\n",
+        "<tr><td>751</td><td>-28.06</td></tr>\n",
+        "<tr><td>752</td><td>-26.67</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023cbd90>"
+       ]
+      }
+     ],
+     "prompt_number": 4
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can also combine values from different fields,\n",
+      "for example by using the string concatenation operator `||`:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select personal || ' ' || family from Person;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>William Dyer</td></tr>\n",
+        "<tr><td>Frank Pabodie</td></tr>\n",
+        "<tr><td>Anderson Lake</td></tr>\n",
+        "<tr><td>Valentina Roerich</td></tr>\n",
+        "<tr><td>Frank Danforth</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c46d0>"
+       ]
+      }
+     ],
+     "prompt_number": 5
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "> It may seem strange to use `personal` and `family` as field names\n",
+      "> instead of `first` and `last`,\n",
+      "> but it's a necessary first step toward handling cultural differences.\n",
+      "> For example,\n",
+      "> consider the following rules:\n",
+      "\n",
+      "<table>\n",
+      "  <tr> <th>Full Name</th> <th>Alphabetized Under</th> <th>Reason</th> </tr>\n",
+      "  <tr> <td>Liu Xiaobo</td> <td>Liu</td> <td>Chinese family names come first</td> </tr>\n",
+      "  <tr> <td> Leonardo da Vinci</td> <td>Leonardo</td> <td>\"da Vinci\" just means \"from Vinci\"</td> </tr>\n",
+      "  <tr> <td> Catherine de Medici</td> <td>Medici</td> <td>family name</td> </tr>\n",
+      "  <tr> <td> Jean de La Fontaine</td> <td>La Fontaine</td> <td>family name is \"La Fontaine\"</td> </tr>\n",
+      "  <tr> <td> Juan Ponce de Leon</td> <td>Ponce de Leon</td> <td>full family name is \"Ponce de Leon\"</td> </tr>\n",
+      "  <tr> <td> Gabriel Garcia Marquez</td> <td>Garcia Marquez</td> <td>double-barrelled Spanish surnames</td> </tr>\n",
+      "  <tr> <td> Wernher von Braun</td> <td>von <em>or</em> Braun</td> <td>depending on whether he was in Germany or the US</td> </tr>\n",
+      "  <tr> <td> Elizabeth Alexandra May Windsor</td> <td>Elizabeth</td> <td>monarchs alphabetize by the name under which they reigned</td> </tr>\n",
+      "  <tr> <td> Thomas a Beckett</td> <td>Thomas</td> <td>and saints according to the names by which they were canonized</td> </tr>\n",
+      "</table>\n",
+      "\n",
+      "> Clearly,\n",
+      "> even a two-part division into \"personal\" and \"family\"\n",
+      "> isn't enough..."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### Challenges\n",
+      "\n",
+      "1.  After further reading,\n",
+      "    we realize that Valentina Roerich\n",
+      "    was reporting salinity as percentages.\n",
+      "    Write a query that returns all of her salinity measurements\n",
+      "    from the `Survey` table\n",
+      "    with the values divided by 100.\n",
+      "\n",
+      "2.  The `union` operator combines the results of two queries:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Person where ident='dyer' union select * from Person where ident='roe';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>dyer</td><td>William</td><td>Dyer</td></tr>\n",
+        "<tr><td>roe</td><td>Valentina</td><td>Roerich</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023cbd50>"
+       ]
+      }
+     ],
+     "prompt_number": 6
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Use `union` to create a consolidated list of salinity measurements\n",
+      "in which Roerich's, and only Roerich's,\n",
+      "have been corrected as described in the previous challenge.\n",
+      "The output should be something like:\n",
+      "\n",
+      "<table>\n",
+      "  <tr> <td>619</td> <td>0.13</td> </tr>\n",
+      "  <tr> <td>622</td> <td>0.09</td> </tr>\n",
+      "  <tr> <td>734</td> <td>0.05</td> </tr>\n",
+      "  <tr> <td>751</td> <td>0.1</td> </tr>\n",
+      "  <tr> <td>752</td> <td>0.09</td> </tr>\n",
+      "  <tr> <td>752</td> <td>0.416</td> </tr>\n",
+      "  <tr> <td>837</td> <td>0.21</td> </tr>\n",
+      "  <tr> <td>837</td> <td>0.225</td> </tr>\n",
+      "</table>\n"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "3.  The site identifiers in the `Visited` table have two parts\n",
+      "    separated by a '-':"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select distinct site from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>DR-1</td></tr>\n",
+        "<tr><td>DR-3</td></tr>\n",
+        "<tr><td>MSK-4</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4e50>"
+       ]
+      }
+     ],
+     "prompt_number": 7
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Some major site identifiers are two letters long and some are three.\n",
+      "The \"in string\" function `instr(X, Y)`\n",
+      "returns the 1-based index of the first occurrence of string Y in string X,\n",
+      "or 0 if Y does not exist in X.\n",
+      "The substring function `substr(X, I)`\n",
+      "returns the substring of X starting at index I.\n",
+      "Use these two functions to produce a list of unique major site identifiers.\n",
+      "(For this data,\n",
+      "the list should contain only \"DR\" and \"MSK\")."
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "Next Steps"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "FIXME"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/05-null.ipynb b/sql/novice/05-null.ipynb
new file mode 100644 (file)
index 0000000..921d10b
--- /dev/null
@@ -0,0 +1,440 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Missing Data"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Real-world data is never complete&mdash;there are always holes.\n",
+      "Databases represent these holes using special value called `null`.\n",
+      "`null` is not zero, `False`, or the empty string;\n",
+      "it is a one-of-a-kind value that means \"nothing here\".\n",
+      "Dealing with `null` requires a few special tricks\n",
+      "and some careful thinking.\n",
+      "\n",
+      "To start,\n",
+      "let's have a look at the `Visited` table.\n",
+      "There are eight records,\n",
+      "but #752 doesn't have a date&mdash;or rather,\n",
+      "its date is null:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%load_ext sqlitemagic"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "<tr><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Null doesn't behave like other values.\n",
+      "If we select the records that come before 1930:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated<'1930-00-00';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c46d0>"
+       ]
+      }
+     ],
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "we get two results,\n",
+      "and if we select the ones that come during or after 1930:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated>='1930-00-00';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 4
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "we get five,\n",
+      "but record #752 isn't in either set of results.\n",
+      "The reason is that\n",
+      "`null<'1930-00-00'`\n",
+      "is neither true nor false:\n",
+      "null means, \"We don't know,\"\n",
+      "and if we don't know the value on the left side of a comparison,\n",
+      "we don't know whether the comparison is true or false.\n",
+      "Since databases represent \"don't know\" as null,\n",
+      "the value of `null<'1930-00-00'`\n",
+      "is actually `null`.\n",
+      "`null>='1930-00-00'` is also null\n",
+      "because we can't answer to that question either.\n",
+      "And since the only records kept by a `where`\n",
+      "are those for which the test is true,\n",
+      "record #752 isn't included in either set of results.\n",
+      "\n",
+      "Comparisons aren't the only operations that behave this way with nulls.\n",
+      "`1+null` is `null`,\n",
+      "`5*null` is `null`,\n",
+      "`log(null)` is `null`,\n",
+      "and so on.\n",
+      "In particular,\n",
+      "comparing things to null with = and != produces null:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated=NULL;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c46d0>"
+       ]
+      }
+     ],
+     "prompt_number": 5
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated!=NULL;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 6
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "To check whether a value is `null` or not,\n",
+      "we must use a special test `is null`:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated is NULL;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c46d0>"
+       ]
+      }
+     ],
+     "prompt_number": 7
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "or its inverse `is not null`:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Visited where dated is not NULL;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 8
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Null values cause headaches wherever they appear.\n",
+      "For example,\n",
+      "suppose we want to find the all of salinity measurements\n",
+      "that weren't taken by Dyer.\n",
+      "It's natural to write the query like this:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Survey where quant='sal' and person!='lake';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>619</td><td>dyer</td><td>sal</td><td>0.13</td></tr>\n",
+        "<tr><td>622</td><td>dyer</td><td>sal</td><td>0.09</td></tr>\n",
+        "<tr><td>752</td><td>roe</td><td>sal</td><td>41.6</td></tr>\n",
+        "<tr><td>837</td><td>roe</td><td>sal</td><td>22.5</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c46d0>"
+       ]
+      }
+     ],
+     "prompt_number": 9
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "but this query filters omits the records\n",
+      "where we don't know who took the measurement.\n",
+      "Once again,\n",
+      "the reason is that when `person` is `null`,\n",
+      "the `!=` comparison produces `null`,\n",
+      "so the record isn't kept in our results.\n",
+      "If we want to keep these records\n",
+      "we need to add an explicit check:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Survey where quant='sal' and (person!='lake' or person is null);"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>619</td><td>dyer</td><td>sal</td><td>0.13</td></tr>\n",
+        "<tr><td>622</td><td>dyer</td><td>sal</td><td>0.09</td></tr>\n",
+        "<tr><td>735</td><td>None</td><td>sal</td><td>0.06</td></tr>\n",
+        "<tr><td>752</td><td>roe</td><td>sal</td><td>41.6</td></tr>\n",
+        "<tr><td>837</td><td>roe</td><td>sal</td><td>22.5</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 10
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We still have to decide whether this is the right thing to do or not.\n",
+      "If we want to be absolutely sure that\n",
+      "we aren't including any measurements by Lake in our results,\n",
+      "we need to exclude all the records for which we don't know who did the work."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "1.  Write a query that sorts the records in `Visited` by date,\n",
+      "    omitting entries for which the date is not known\n",
+      "    (i.e., is null).\n",
+      "\n",
+      "1.  What do you expect the query:\n",
+      "\n",
+      "    ```\n",
+      "    select * from Visited where dated in ('1927-02-08', null);\n",
+      "    ```\n",
+      "\n",
+      "    to produce?\n",
+      "    What does it actually produce?\n",
+      "\n",
+      "1.  Some database designers prefer to use\n",
+      "    a [sentinel value](../gloss.html#sentinel-value)\n",
+      "    to mark missing data rather than `null`.\n",
+      "    For example,\n",
+      "    they will use the date \"0000-00-00\" to mark a missing date,\n",
+      "    or -1.0 to mark a missing salinity or radiation reading\n",
+      "    (since actual readings cannot be negative).\n",
+      "    What does this simplify?\n",
+      "    What burdens or risks does it introduce?"
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "Next Steps"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "FIXME"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/06-agg.ipynb b/sql/novice/06-agg.ipynb
new file mode 100644 (file)
index 0000000..871bbcc
--- /dev/null
@@ -0,0 +1,468 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Aggregation"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We now want to calculate ranges and averages for our data.\n",
+      "We know how to select all of the dates from the `Visited` table:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%load_ext sqlitemagic"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select dated from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1927-02-08</td></tr>\n",
+        "<tr><td>1927-02-10</td></tr>\n",
+        "<tr><td>1939-01-07</td></tr>\n",
+        "<tr><td>1930-01-12</td></tr>\n",
+        "<tr><td>1930-02-26</td></tr>\n",
+        "<tr><td>None</td></tr>\n",
+        "<tr><td>1932-01-14</td></tr>\n",
+        "<tr><td>1932-03-22</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4ed0>"
+       ]
+      }
+     ],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "but to combine them,\n",
+      "wee must use an [aggregation function](../gloss.html#aggregation-function)\n",
+      "such as `min` or `max`.\n",
+      "Each of these functions takes a set of records as input,\n",
+      "and produces a single record as output:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select min(dated) from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1927-02-08</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4ed0>"
+       ]
+      }
+     ],
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select max(dated) from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1939-01-07</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4090>"
+       ]
+      }
+     ],
+     "prompt_number": 4
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "`min` and `max` are just two of\n",
+      "the aggregation functions built into SQL.\n",
+      "Three others are `avg`,\n",
+      "`count`,\n",
+      "and `sum`:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select avg(reading) from Survey where quant='sal';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>7.20333333333</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4090>"
+       ]
+      }
+     ],
+     "prompt_number": 7
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select count(reading) from Survey where quant='sal';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>9</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4fd0>"
+       ]
+      }
+     ],
+     "prompt_number": 8
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select sum(reading) from Survey where quant='sal';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>64.83</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4d90>"
+       ]
+      }
+     ],
+     "prompt_number": 9
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We used `count(reading)` here,\n",
+      "but we could just as easily have counted `quant`\n",
+      "or any other field in the table,\n",
+      "or even used `count(*)`,\n",
+      "since the function doesn't care about the values themselves,\n",
+      "just how many values there are.\n",
+      "\n",
+      "SQL lets us do several aggregations at once.\n",
+      "We can,\n",
+      "for example,\n",
+      "find the range of sensible salinity measurements:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select min(reading), max(reading) from Survey where quant='sal' and reading<=1.0;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>0.05</td><td>0.21</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4fd0>"
+       ]
+      }
+     ],
+     "prompt_number": 10
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can also combine aggregated results with raw results,\n",
+      "although the output might surprise you:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select person, count(*) from Survey where quant='sal' and reading<=1.0;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>lake</td><td>7</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4d90>"
+       ]
+      }
+     ],
+     "prompt_number": 11
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Why does Lake's name appear rather than Roerich's or Dyer's?\n",
+      "The answer is that when it has to aggregate a field,\n",
+      "but isn't told how to,\n",
+      "the database manager chooses an actual value from the input set.\n",
+      "It might use the first one processed,\n",
+      "the last one,\n",
+      "or something else entirely.\n",
+      "\n",
+      "Another important fact is that when there are no values to aggregate,\n",
+      "aggregation's result is \"don't know\"\n",
+      "rather than zero or some other arbitrary value:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select person, max(reading), sum(reading) from Survey where quant='missing';"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>None</td><td>None</td><td>None</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4fd0>"
+       ]
+      }
+     ],
+     "prompt_number": 12
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "One final important feature of aggregation functions is that\n",
+      "they are inconsistent with the rest of SQL in a very useful way.\n",
+      "If we add two values,\n",
+      "and one of them is null,\n",
+      "the result is null.\n",
+      "By extension,\n",
+      "if we use `sum` to add all the values in a set,\n",
+      "and any of those values are null,\n",
+      "the result should also be null.\n",
+      "It's much more useful,\n",
+      "though,\n",
+      "for aggregation functions to ignore null values\n",
+      "and only combine those that are non-null.\n",
+      "This behavior lets us write our queries as:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select min(dated) from Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1927-02-08</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4fd0>"
+       ]
+      }
+     ],
+     "prompt_number": 13
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "instead of always having to filter explicitly:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select min(dated) from Visited where dated is not null;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1927-02-08</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4090>"
+       ]
+      }
+     ],
+     "prompt_number": 14
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### Challenges\n",
+      "\n",
+      "1.  How many temperature readings did Frank Pabodie record,\n",
+      "    and what was their average value?\n",
+      "\n",
+      "2.  The average of a set of values is the sum of the values\n",
+      "    divided by the number of values.\n",
+      "    Does this mean that the `avg` function returns 2.0 or 3.0\n",
+      "    when given the values 1.0, `null`, and 5.0?\n",
+      "\n",
+      "3.  We want to calculate the difference between\n",
+      "    each individual radiation reading\n",
+      "    and the average of all the radiation readings.\n",
+      "    We write the query:\n",
+      "\n",
+      "    ```\n",
+      "    select reading-avg(reading) from Survey where quant='rad';\n",
+      "    ```\n",
+      "\n",
+      "    What does this actually produce, and why?\n",
+      "\n",
+      "4.  The function `group_concat(field, separator)`\n",
+      "    concatenates all the values in a field\n",
+      "    using the specified separator character\n",
+      "    (or ',' if the separator isn't specified).\n",
+      "    Use this to produce a one-line list of scientists' names,\n",
+      "    such as:\n",
+      "\n",
+      "    ```\n",
+      "    William Dyer, Frank Pabodie, Anderson Lake, Valentina Roerich, Frank Danforth\n",
+      "    ```\n",
+      "\n",
+      "    Can you find a way to order the list by surname?"
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "Next Steps"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "FIXME"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/07-join.ipynb b/sql/novice/07-join.ipynb
new file mode 100644 (file)
index 0000000..7510ba3
--- /dev/null
@@ -0,0 +1,386 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Combining Data"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "In order to submit her data to a web site\n",
+      "that aggregates historical meteorological data,\n",
+      "Gina needs to format it as\n",
+      "latitude, longitude, date, quantity, and reading.\n",
+      "However,\n",
+      "her latitudes and longitudes are in the `Site` table,\n",
+      "while the dates of measurements are in the `Visited` table\n",
+      "and the readings themselves are in the `Survey` table.\n",
+      "She needs to combine these tables somehow.\n",
+      "\n",
+      "The SQL command to do this is `join`.\n",
+      "To see how it works,\n",
+      "let's start by joining the `Site` and `Visited` tables:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%load_ext sqlitemagic"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Site join Visited;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "`join` creates\n",
+      "the [cross product](../gloss.html#cross-product)\n",
+      "of two tables,\n",
+      "i.e.,\n",
+      "it joins each record of one with each record of the other\n",
+      "to give all possible combinations.\n",
+      "Since there are three records in `Site`\n",
+      "and eight in `Visited`,\n",
+      "the join's output has 24 records.\n",
+      "And since each table has three fields,\n",
+      "the output has six fields.\n",
+      "  \n",
+      "What the join *hasn't* done is\n",
+      "figure out if the records being joined have anything to do with each other.\n",
+      "It has no way of knowing whether they do or not until we tell it how.\n",
+      "To do that,\n",
+      "we add a clause specifying that\n",
+      "we're only interested in combinations that have the same site name:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select * from Site join Visited on Site.name=Visited.site;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>619</td><td>DR-1</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>622</td><td>DR-1</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>DR-1</td><td>-49.85</td><td>-128.57</td><td>844</td><td>DR-1</td><td>1932-03-22</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>734</td><td>DR-3</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>735</td><td>DR-3</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>751</td><td>DR-3</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>DR-3</td><td>-47.15</td><td>-126.72</td><td>752</td><td>DR-3</td><td>None</td></tr>\n",
+        "<tr><td>MSK-4</td><td>-48.87</td><td>-123.4</td><td>837</td><td>MSK-4</td><td>1932-01-14</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4090>"
+       ]
+      }
+     ],
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "`on` does the same job as `where`:\n",
+      "it only keeps records that pass some test.\n",
+      "(The difference between the two is that `on` filters records\n",
+      "as they're being created,\n",
+      "while `where` waits until the join is done\n",
+      "and then does the filtering.)\n",
+      "Once we add this to our query,\n",
+      "the database manager throws away records\n",
+      "that combined information about two different sites,\n",
+      "leaving us with just the ones we want.\n",
+      "  \n",
+      "Notice that we used `table.field` to specify field names\n",
+      "in the output of the join.\n",
+      "We do this because tables can have fields with the same name,\n",
+      "and we need to be specific which ones we're talking about.\n",
+      "For example,\n",
+      "if we joined the `person` and `visited` tables,\n",
+      "the result would inherit a field called `ident`\n",
+      "from each of the original tables.\n",
+      "\n",
+      "We can now use the same dotted notation\n",
+      "to select the three columns we actually want\n",
+      "out of our join:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select Site.lat, Site.long, Visited.dated\n",
+      "from   Site join Visited\n",
+      "on     Site.name=Visited.site;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-08</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-10</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1932-03-22</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>None</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-01-12</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-02-26</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1939-01-07</td></tr>\n",
+        "<tr><td>-48.87</td><td>-123.4</td><td>1932-01-14</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c3890>"
+       ]
+      }
+     ],
+     "prompt_number": 4
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "If joining two tables is good,\n",
+      "joining many tables must be better.\n",
+      "In fact,\n",
+      "we can join any number of tables\n",
+      "simply by adding more `join` clauses to our query,\n",
+      "and more `on` tests to filter out combinations of records\n",
+      "that don't make sense:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select Site.lat, Site.long, Visited.dated, Survey.quant, Survey.reading\n",
+      "from   Site join Visited join Survey\n",
+      "on     Site.name=Visited.site\n",
+      "and    Visited.ident=Survey.taken\n",
+      "and    Visited.dated is not null;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-08</td><td>rad</td><td>9.82</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-08</td><td>sal</td><td>0.13</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-10</td><td>rad</td><td>7.8</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1927-02-10</td><td>sal</td><td>0.09</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1939-01-07</td><td>rad</td><td>8.41</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1939-01-07</td><td>sal</td><td>0.05</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1939-01-07</td><td>temp</td><td>-21.5</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-01-12</td><td>rad</td><td>7.22</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-01-12</td><td>sal</td><td>0.06</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-01-12</td><td>temp</td><td>-26.0</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-02-26</td><td>rad</td><td>4.35</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-02-26</td><td>sal</td><td>0.1</td></tr>\n",
+        "<tr><td>-47.15</td><td>-126.72</td><td>1930-02-26</td><td>temp</td><td>-18.5</td></tr>\n",
+        "<tr><td>-48.87</td><td>-123.4</td><td>1932-01-14</td><td>rad</td><td>1.46</td></tr>\n",
+        "<tr><td>-48.87</td><td>-123.4</td><td>1932-01-14</td><td>sal</td><td>0.21</td></tr>\n",
+        "<tr><td>-48.87</td><td>-123.4</td><td>1932-01-14</td><td>sal</td><td>22.5</td></tr>\n",
+        "<tr><td>-49.85</td><td>-128.57</td><td>1932-03-22</td><td>rad</td><td>11.25</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 5
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can tell which records from `Site`, `Visited`, and `Survey`\n",
+      "correspond with each other\n",
+      "because those tables contain\n",
+      "[primary keys](../gloss.html#primary-key)\n",
+      "and [foreign keys](../gloss.html#foreign-key).\n",
+      "A primary key is a value,\n",
+      "or combination of values,\n",
+      "that uniquely identifies each record in a table.\n",
+      "A foreign key is a value (or combination of values) from one table\n",
+      "that identifies a unique record in another table.\n",
+      "Another way of saying this is that\n",
+      "a foreign key is the primary key of one table\n",
+      "that appears in some other table.\n",
+      "In our database,\n",
+      "`Person.ident` is the primary key in the `Person` table,\n",
+      "while `Survey.person` is a foreign key\n",
+      "relating the `Survey` table's entries\n",
+      "to entries in `Person`.\n",
+      "\n",
+      "Most database designers believe that\n",
+      "every table should have a well-defined primary key.\n",
+      "They also believe that this key should be separate from the data itself,\n",
+      "so that if we ever need to change the data,\n",
+      "we only need to make one change in one place.\n",
+      "One easy way to do this is\n",
+      "to create an arbitrary, unique ID for each record\n",
+      "as we add it to the database.\n",
+      "This is actually very common:\n",
+      "those IDs have names like \"student numbers\" and \"patient numbers\",\n",
+      "and they almost always turn out to have originally been\n",
+      "a unique record identifier in some database system or other.\n",
+      "As the query below demonstrates,\n",
+      "SQLite automatically numbers records as they're added to tables,\n",
+      "and we can use those record numbers in queries:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "%%sqlite survey.db\n",
+      "select rowid, * from Person;"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<table>\n",
+        "<tr><td>1</td><td>dyer</td><td>William</td><td>Dyer</td></tr>\n",
+        "<tr><td>2</td><td>pb</td><td>Frank</td><td>Pabodie</td></tr>\n",
+        "<tr><td>3</td><td>lake</td><td>Anderson</td><td>Lake</td></tr>\n",
+        "<tr><td>4</td><td>roe</td><td>Valentina</td><td>Roerich</td></tr>\n",
+        "<tr><td>5</td><td>danforth</td><td>Frank</td><td>Danforth</td></tr>\n",
+        "</table>"
+       ],
+       "metadata": {},
+       "output_type": "display_data",
+       "text": [
+        "<IPython.core.display.HTML at 0x1023c4550>"
+       ]
+      }
+     ],
+     "prompt_number": 6
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### Challenges\n",
+      "\n",
+      "1.  Write a query that lists all radiation readings from the DR-1 site.\n",
+      "\n",
+      "2.  Write a query that lists all sites visited by people named \"Frank\".\n",
+      "\n",
+      "3.  Describe in your own words what the following query produces:\n",
+      "\n",
+      "    ```\n",
+      "    select Site.name from Site join Visited\n",
+      "    on Site.lat<-49.0 and Site.name=Visited.site and Visited.dated>='1932-00-00';\n",
+      "    ```\n",
+      "\n",
+      "4.  Why does the `Person` table have an `ident` field?\n",
+      "    Why do we not just use scientists' names in the `Survey` table?\n",
+      "\n",
+      "5.  Why does the table `Site` exist?\n",
+      "    Why don't we just record latitudes and longitudes\n",
+      "    directly in the `Visited` and `Survey` tables?"
+     ]
+    },
+    {
+     "cell_type": "heading",
+     "level": 2,
+     "metadata": {},
+     "source": [
+      "Next Steps"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "FIXME"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/08-create.ipynb b/sql/novice/08-create.ipynb
new file mode 100644 (file)
index 0000000..d463ae2
--- /dev/null
@@ -0,0 +1,188 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Creating and Modifying Data"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "So far we have only looked at how to get information out of a database,\n",
+      "both because that is more frequent than adding information,\n",
+      "and because most other operations only make sense\n",
+      "once queries are understood.\n",
+      "If we want to create and modify data,\n",
+      "we need to know two other pairs of commands.\n",
+      "  \n",
+      "The first pair are `create table` and `drop table`.\n",
+      "While they are written as two words,\n",
+      "they are actually single commands.\n",
+      "The first one creates a new table;\n",
+      "its arguments are the names and types of the table's columns.\n",
+      "For example,\n",
+      "the following statements create the four tables in our survey database:\n",
+      "\n",
+      "    create table Person(ident text, personal text, family text);\n",
+      "    create table Site(name text, lat real, long real);\n",
+      "    create table Visited(ident integer, site text, dated text);\n",
+      "    create table Survey(taken integer, person text, quant real, reading real);\n",
+      "\n",
+      "We can get rid of one of our tables using:\n",
+      "  \n",
+      "    drop table Survey;\n",
+      "\n",
+      "Be very careful when doing this:\n",
+      "most databases have some support for undoing changes,\n",
+      "but it's better not to have to rely on it.\n",
+      "  \n",
+      "Different database systems support different data types for table columns,\n",
+      "but most provide the following:\n",
+      "\n",
+      "<table>\n",
+      "  <tr> <td>integer</td> <td>a signed integer</td> </tr>\n",
+      "  <tr> <td>real</td> <td>a floating point number</td> </tr>\n",
+      "  <tr> <td>text</td> <td>a character string</td> </tr>\n",
+      "  <tr> <td>blob</td> <td>a \"binary large object\", such as an image</td> </tr>\n",
+      "</table>\n",
+      "\n",
+      "Most databases also support Booleans and date/time values;\n",
+      "SQLite uses the integers 0 and 1 for the former,\n",
+      "and represents the latter as discussed [earlier](#a:dates).\n",
+      "An increasing number of databases also support geographic data types,\n",
+      "such as latitude and longitude.\n",
+      "Keeping track of what particular systems do or do not offer,\n",
+      "and what names they give different data types,\n",
+      "is an unending portability headache.\n",
+      "  \n",
+      "When we create a table,\n",
+      "we can specify several kinds of constraints on its columns.\n",
+      "For example,\n",
+      "a better definition for the `Survey` table would be:\n",
+      "  \n",
+      "    create table Survey(\n",
+      "        taken   integer not null, -- where reading taken\n",
+      "        person  text,             -- may not know who took it\n",
+      "        quant   real not null,    -- the quantity measured\n",
+      "        reading real not null,    -- the actual reading\n",
+      "        primary key(taken, quant),\n",
+      "        foreign key(taken) references Visited(ident),\n",
+      "        foreign key(person) references Person(ident)\n",
+      "    );\n",
+      "\n",
+      "Once again,\n",
+      "exactly what constraints are avialable\n",
+      "and what they're called\n",
+      "depends on which database manager we are using.\n",
+      "  \n",
+      "Once tables have been created,\n",
+      "we can add and remove records using our other pair of commands,\n",
+      "`insert` and `delete`.\n",
+      "The simplest form of `insert` statement lists values in order:\n",
+      "\n",
+      "    insert into Site values('DR-1', -49.85, -128.57);\n",
+      "    insert into Site values('DR-3', -47.15, -126.72);\n",
+      "    insert into Site values('MSK-4', -48.87, -123.40);\n",
+      "\n",
+      "We can also insert values into one table directly from another:\n",
+      "  \n",
+      "    create table JustLatLong(lat text, long TEXT);\n",
+      "    insert into JustLatLong select lat, long from site;\n",
+      "\n",
+      "Deleting records can be a bit trickier,\n",
+      "because we have to ensure that the database remains internally consistent.\n",
+      "If all we care about is a single table,\n",
+      "we can use the `delete` command with a `where` clause\n",
+      "that matches the records we want to discard.\n",
+      "For example,\n",
+      "once we realize that Frank Danforth didn't take any measurements,\n",
+      "we can remove him from the `Person` table like this:\n",
+      "  \n",
+      "    delete from Person where ident = \"danforth\";\n",
+      "\n",
+      "But what if we removed Anderson Lake instead?\n",
+      "Our `Survey` table would still contain seven records\n",
+      "of measurements he'd taken,\n",
+      "but that's never supposed to happen:\n",
+      "`Survey.person` is a foreign key into the `Person` table,\n",
+      "and all our queries assume there will be a row in the latter\n",
+      "matching every value in the former.\n",
+      "  \n",
+      "This problem is called [referential integrity](../gloss.html#referential-integrity):\n",
+      "we need to ensure that all references between tables can always be resolved correctly.\n",
+      "One way to do this is to delete all the records\n",
+      "that use `'lake'` as a foreign key\n",
+      "before deleting the record that uses it as a primary key.\n",
+      "If our database manager supports it,\n",
+      "we can automate this\n",
+      "using [cascading delete](../gloss.html#cascading-delete).\n",
+      "However,\n",
+      "this technique is outside the scope of this chapter.\n",
+      "\n",
+      "> Many applications use a hybrid storage model\n",
+      "> instead of putting everything into a database:\n",
+      "> the actual data (such as astronomical images) is stored in files,\n",
+      "> while the database stores the files' names,\n",
+      "> their modification dates,\n",
+      "> the region of the sky they cover,\n",
+      "> their spectral characteristics,\n",
+      "> and so on.\n",
+      "> This is also how most music player software is built:\n",
+      "> the database inside the application keeps track of the MP3 files,\n",
+      "> but the files themselves live on disk."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### Challenges\n",
+      "\n",
+      "1.  Write an SQL statement to replace all uses of `null`\n",
+      "    in `Survey.person`\n",
+      "    with the string `'unknown'`.\n",
+      "\n",
+      "2.  One of our colleagues has sent us a [CSV](../gloss.html#csv) file\n",
+      "    containing temperature readings by Robert Olmstead,\n",
+      "    which is formatted like this:\n",
+      "\n",
+      "    ```\n",
+      "    Taken,Temp\n",
+      "    619,-21.5\n",
+      "    622,-15.5\n",
+      "    ```\n",
+      "\n",
+      "    Write a small Python program that reads this file in\n",
+      "    and prints out the SQL `insert` statements needed\n",
+      "    to add these records to the survey database.\n",
+      "    Note: you will need to add an entry for Olmstead\n",
+      "    to the `Person` table.\n",
+      "    If you are testing your program repeatedly,\n",
+      "    you may want to investigate SQL's `insert or replace` command.\n",
+      "\n",
+      "3.  SQLite has several administrative commands that aren't part of the SQL standard.\n",
+      "    One of them is `.dump`,\n",
+      "    which prints the SQL commands needed to re-create the database.\n",
+      "    Another is `.load`,\n",
+      "    which reads a file created by `.dump` and restores the database.\n",
+      "    A colleague of yours thinks that storing dump files (which are text) in version control\n",
+      "    is a good way to track and manage changes to the database.\n",
+      "    What are the pros and cons of this approach?"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/09-hygiene.ipynb b/sql/novice/09-hygiene.ipynb
new file mode 100644 (file)
index 0000000..9b607e3
--- /dev/null
@@ -0,0 +1,29 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Data Hygiene"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "FIXME"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/sql/novice/10-prog.ipynb b/sql/novice/10-prog.ipynb
new file mode 100644 (file)
index 0000000..9681a14
--- /dev/null
@@ -0,0 +1,262 @@
+{
+ "metadata": {
+  "name": ""
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "heading",
+     "level": 1,
+     "metadata": {},
+     "source": [
+      "Programming with Databases"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "To close,\n",
+      "let's have a look at how to access a database from\n",
+      "a general-purpose programming language like Python.\n",
+      "Other languages use almost exactly the same model:\n",
+      "library and function names may differ,\n",
+      "but the concepts are the same.\n",
+      "\n",
+      "Here's a short Python program that selects latitudes and longitudes\n",
+      "from an SQLite database stored in a file called `survey.db`:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "import sqlite3\n",
+      "connection = sqlite3.connect(\"survey.db\")\n",
+      "cursor = connection.cursor()\n",
+      "cursor.execute(\"select site.lat, site.long from site;\")\n",
+      "results = cursor.fetchall()\n",
+      "for r in results:\n",
+      "    print r\n",
+      "cursor.close()\n",
+      "connection.close()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "output_type": "stream",
+       "stream": "stdout",
+       "text": [
+        "(-49.85, -128.57)\n",
+        "(-47.15, -126.72)\n",
+        "(-48.87, -123.4)\n"
+       ]
+      }
+     ],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "The program starts by importing the `sqlite3` library.\n",
+      "If we were connecting to MySQL, DB2, or some other database,\n",
+      "we would import a different library,\n",
+      "but all of them provide the same functions,\n",
+      "so that the rest of our program does not have to change\n",
+      "(at least, not much)\n",
+      "if we switch from one database to another.\n",
+      "\n",
+      "Line 2 establishes a connection to the database.\n",
+      "Since we're using SQLite,\n",
+      "all we need to specify is the name of the database file.\n",
+      "Other systems may require us to provide a username and password as well.\n",
+      "Line 3 then uses this connection to create\n",
+      "a [cursor](../gloss.html#cursor);\n",
+      "just like the cursor in an editor,\n",
+      "its role is to keep track of where we are in the database.\n",
+      "\n",
+      "On line 4, we use that cursor to ask the database to execute a query for us.\n",
+      "The query is written in SQL,\n",
+      "and passed to `cursor.execute` as a string.\n",
+      "It's our job to make sure that SQL is properly formatted;\n",
+      "if it isn't,\n",
+      "or if something goes wrong when it is being executed,\n",
+      "the database will report an error.\n",
+      "\n",
+      "The database returns the results of the query to us\n",
+      "in response to the `cursor.fetchall` call on line 5.\n",
+      "This result is a list with one entry for each record in the result set;\n",
+      "if we loop over that list (line 6) and print those list entries (line 7),\n",
+      "we can see that each one is a tuple\n",
+      "with one element for each field we asked for.\n",
+      "\n",
+      "Finally, lines 8 and 9 close our cursor and our connection,\n",
+      "since the database can only keep a limited number of these open at one time.\n",
+      "Since establishing a connection takes time,\n",
+      "though,\n",
+      "we shouldn't open a connection,\n",
+      "do one operation,\n",
+      "then close the connection,\n",
+      "only to reopen it a few microseconds later to do another operation.\n",
+      "Instead,\n",
+      "it's normal to create one connection that stays open for the lifetime of the program."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Queries in real applications will often depend on values provided by users.\n",
+      "For example,\n",
+      "this function takes a user's ID as a parameter and returns their name:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def get_name(database_file, person_ident):\n",
+      "    query = \"select personal || ' ' || family from Person where ident='\" + person_ident + \"';\"\n",
+      "\n",
+      "    connection = sqlite3.connect(database_file)\n",
+      "    cursor = connection.cursor()\n",
+      "    cursor.execute(query)\n",
+      "    results = cursor.fetchall()\n",
+      "    cursor.close()\n",
+      "    connection.close()\n",
+      "\n",
+      "    return results[0][0]\n",
+      "\n",
+      "print \"full name for dyer:\", get_name('survey.db', 'dyer')"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "output_type": "stream",
+       "stream": "stdout",
+       "text": [
+        "full name for dyer: William Dyer\n"
+       ]
+      }
+     ],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We use string concatenation on the first line of this function\n",
+      "to construct a query containing the user ID we have been given.\n",
+      "This seems simple enough,\n",
+      "but what happens if someone gives us this string as input?\n",
+      "\n",
+      "    dyer'; drop table Survey; select '\n",
+      "\n",
+      "It looks like there's garbage after the name of the project,\n",
+      "but it is very carefully chosen garbage.\n",
+      "If we insert this string into our query,\n",
+      "the result is:\n",
+      "\n",
+      "    select personal || ' ' || family from Person where ident='dyer'; drop tale Survey; select '';\n",
+      "\n",
+      "If we execute this,\n",
+      "it will erase one of the tables in our database.\n",
+      "  \n",
+      "This technique is called [SQL injection](../gloss.html#sql-injection),\n",
+      "and it has been used to attack thousands of programs over the years.\n",
+      "In particular,\n",
+      "many web sites that take data from users insert values directly into queries\n",
+      "without checking them carefully first.\n",
+      "  \n",
+      "Since a villain might try to smuggle commands into our queries in many different ways,\n",
+      "the safest way to deal with this threat is\n",
+      "to replace characters like quotes with their escaped equivalents,\n",
+      "so that we can safely put whatever the user gives us inside a string.\n",
+      "We can do this by using a [prepared statement](../gloss.html#prepared-statement)\n",
+      "instead of formatting our statements as strings.\n",
+      "Here's what our example program looks like if we do this:"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "def get_name(database_file, person_ident):\n",
+      "    query = \"select personal || ' ' || family from Person where ident=?;\"\n",
+      "\n",
+      "    connection = sqlite3.connect(database_file)\n",
+      "    cursor = connection.cursor()\n",
+      "    cursor.execute(query, [person_ident])\n",
+      "    results = cursor.fetchall()\n",
+      "    cursor.close()\n",
+      "    connection.close()\n",
+      "\n",
+      "    return results[0][0]\n",
+      "\n",
+      "print \"full name for dyer:\", get_name('survey.db', 'dyer')"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "output_type": "stream",
+       "stream": "stdout",
+       "text": [
+        "full name for dyer: William Dyer\n"
+       ]
+      }
+     ],
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "The key changes are in the query string and the `execute` call.\n",
+      "Instead of formatting the query ourselves,\n",
+      "we put question marks in the query template where we want to insert values.\n",
+      "When we call `execute`,\n",
+      "we provide a list\n",
+      "that contains as many values as there are question marks in the query.\n",
+      "The library matches values to question marks in order,\n",
+      "and translates any special characters in the values\n",
+      "into their escaped equivalents\n",
+      "so that they are safe to use."
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### Challenges\n",
+      "\n",
+      "1.  Write a Python program that creates a new database\n",
+      "    in a file called `original.db`\n",
+      "    containing a single table called `Pressure`,\n",
+      "    with a single field called `reading`,\n",
+      "    and inserts 100,000 random numbers between 10.0 and 25.0.\n",
+      "    How long does it take this program to run?\n",
+      "    How long does it take to run a program\n",
+      "    that simply writes those random numbers to a file?\n",
+      "\n",
+      "2.  Write a Python program that creates a new database\n",
+      "    called `backup.db`\n",
+      "    with the same structure as `original.db`\n",
+      "    and copies all the values greater than 20.0\n",
+      "    from `original.db` to `backup.db`.\n",
+      "    Which is faster:\n",
+      "    filtering values in the query,\n",
+      "    or reading everything into memory and filtering in Python?"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file