From: Greg Wilson Date: Fri, 7 Feb 2014 17:05:02 +0000 (-0500) Subject: Preparing material for Version 5 novice release X-Git-Url: http://git.tremily.us/?a=commitdiff_plain;h=050d8917197b3a78b88811de37c7ecac43bba0ba;p=swc-boot-camp.git Preparing material for Version 5 novice release W. Trevor King: I removed everything except the sql/novice changes from the original b66851c [1]. [1]: https://github.com/swcarpentry/bc/commit/b66851c2b765440e5c7ccb23383aff15eec31ff2 --- diff --git a/sql/novice/.gitignore b/sql/novice/.gitignore new file mode 100644 index 0000000..b22cd77 --- /dev/null +++ b/sql/novice/.gitignore @@ -0,0 +1,9 @@ +01-select.md +02-sort-dup.md +03-filter.md +04-calc.md +05-null.md +06-agg.md +07-join.md +08-create.md +09-prog.md diff --git a/sql/novice/01-select.ipynb b/sql/novice/01-select.ipynb index 4455598..85f074a 100644 --- a/sql/novice/01-select.ipynb +++ b/sql/novice/01-select.ipynb @@ -9,7 +9,7 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Selecting Data" @@ -45,9 +45,24 @@ "and these lessons will show how to do that." ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Explain the difference between a table, a record, and a field.\n", + "* Explain the difference between a database and a database manager.\n", + "* Write a query to select all values for specific fields from a single table." + ] + }, { "cell_type": "heading", - "level": 2, + "level": 3, "metadata": {}, "source": [ "A Few Definitions" @@ -127,7 +142,7 @@ " 734 DR-3 1939-01-07 \n", " 735 DR-3 1930-01-12 \n", " 751 DR-3 1930-02-26 \n", - " 752 DR-3 NULL \n", + " 752 DR-3   \n", " 837 MSK-4 1932-01-14 \n", " 844 DR-1 1932-03-22 \n", "\n", @@ -145,8 +160,8 @@ " 734 lake sal 0.05 \n", " 734 pb temp -21.5 \n", " 735 pb rad 7.22 \n", - " 735 NULL sal 0.06 \n", - " 735 NULL temp -26.0 \n", + " 735   sal 0.06 \n", + " 735   temp -26.0 \n", " 751 pb rad 4.35 \n", " 751 pb temp -18.5 \n", " 751 lake sal 0.1 \n", @@ -169,8 +184,9 @@ "metadata": {}, "source": [ "Notice that three entries—one in the `Visited` table,\n", - "and two in the `Survey` table—are shown as `NULL`.\n", - "We'll return to these values [later](#s:null).\n", + "and two in the `Survey` table—are shown in red\n", + "because they don't contain any actual data:\n", + "we'll return to these missing values [later](#s:null).\n", "For now,\n", "let's write an SQL query that displays scientists' names.\n", "We do this using the SQL command `select`,\n", @@ -212,7 +228,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -254,11 +270,11 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], - "prompt_number": 5 + "prompt_number": 3 }, { "cell_type": "markdown", @@ -305,11 +321,11 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], - "prompt_number": 6 + "prompt_number": 4 }, { "cell_type": "markdown", @@ -341,11 +357,11 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], - "prompt_number": 7 + "prompt_number": 5 }, { "cell_type": "markdown", @@ -378,17 +394,17 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], - "prompt_number": 8 + "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. Write a query that selects only site names from the `Site` table.\n", "\n", @@ -406,6 +422,23 @@ "\n", " What style do you find easiest to read, and why?" ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* A relational database stores information in tables,\n", + " each of which has a fixed set of columns and a variable number of records.\n", + "* A database manager is a program that manipulates information stored in a database.\n", + "* We write queries in a specialized language called SQL to extract information from databases.\n", + "* SQL is case-insensitive." + ] } ], "metadata": {} diff --git a/sql/novice/02-sort-dup.ipynb b/sql/novice/02-sort-dup.ipynb index f09cdd0..8ffe0db 100644 --- a/sql/novice/02-sort-dup.ipynb +++ b/sql/novice/02-sort-dup.ipynb @@ -9,12 +9,26 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Sorting and Removing Duplicates" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Write queries that display results in a particular order.\n", + "* Write queries that eliminate duplicate values from data." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -77,7 +91,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -114,7 +128,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -166,7 +180,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -187,7 +201,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. Write a query that selects distinct dates from the `Site` table." ] @@ -228,7 +242,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -268,7 +282,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -327,7 +341,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -371,7 +385,7 @@ "metadata": {}, "output_type": "display_data", "text": [ - "" + "" ] } ], @@ -381,9 +395,29 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", + "\n", + "1. Write a query that returns the distinct dates in the `Visited` table.\n", + "\n", + "2. Write a query that displays the full names of the scientists in the `Person` table, ordered by family name." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", "\n", - "1. Write a query that displays scientists' full names, but orders them by surname." + "* The records in a database table are not intrinsically ordered:\n", + " if we want to display them in some order,\n", + " we must specify that explicitly.\n", + "* The values in a database are not guaranteed to be unique:\n", + " if we want to eliminate duplicates,\n", + " we must specify that explicitly as well." ] } ], diff --git a/sql/novice/03-filter.ipynb b/sql/novice/03-filter.ipynb index c9a3e95..f06d40e 100644 --- a/sql/novice/03-filter.ipynb +++ b/sql/novice/03-filter.ipynb @@ -9,12 +9,26 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Filtering" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Write queries that select records that satisfy user-specified conditions.\n", + "* Explain the order in which the clauses in a query are executed." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -417,13 +431,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", - "1. Suppose we want to select all sites that lie within 30° of the equator.\n", + "1. Suppose we want to select all sites that lie more than 30° from the poles.\n", " Our first query is:\n", "\n", " ~~~\n", - " select * from Site where (lat > -30) or (lat < 30);\n", + " select * from Site where (lat > -60) or (lat < 60);\n", " ~~~\n", "\n", " Explain why this is wrong,\n", @@ -445,7 +459,7 @@ " 'a' like '%a' True \n", " 'b' like '%a' False \n", " 'alpha' like 'a%' True \n", - " 'alpha' like 'a%p%' True \n", + " 'alpha' like 'a%p%' True \n", " \n", " The expression `*column-name* not like *pattern*`\n", " inverts the test.\n", @@ -453,6 +467,21 @@ " write a query that finds all the records in `Visited`\n", " that *aren't* from sites labelled 'DR-something'." ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* Use `where` to filter records according to Boolean conditions.\n", + "* Filtering is done on whole records,\n", + " so conditions can use fields that are not actually displayed." + ] } ], "metadata": {} diff --git a/sql/novice/04-calc.ipynb b/sql/novice/04-calc.ipynb index 79c64cb..ad8cbf4 100644 --- a/sql/novice/04-calc.ipynb +++ b/sql/novice/04-calc.ipynb @@ -9,12 +9,25 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Calculating New Values" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Write queries that calculate new values for each selected record." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -182,7 +195,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. After further reading,\n", " we realize that Valentina Roerich\n", @@ -290,6 +303,19 @@ "(For this data,\n", "the list should contain only \"DR\" and \"MSK\")." ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* SQL can perform calculations using the values in a record as part of a query." + ] } ], "metadata": {} diff --git a/sql/novice/05-null.ipynb b/sql/novice/05-null.ipynb index 5173fb3..0b0f3c3 100644 --- a/sql/novice/05-null.ipynb +++ b/sql/novice/05-null.ipynb @@ -9,12 +9,27 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Missing Data" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Explain how databases represent missing information.\n", + "* Explain the three-valued logic databases use when manipulating missing information.\n", + "* Write queries that handle missing information correctly." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -304,7 +319,7 @@ "source": [ "Null values cause headaches wherever they appear.\n", "For example,\n", - "suppose we want to find the all of salinity measurements\n", + "suppose we want to find all the salinity measurements\n", "that weren't taken by Dyer.\n", "It's natural to write the query like this:" ] @@ -394,6 +409,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "#### Challenges\n", + "\n", "1. Write a query that sorts the records in `Visited` by date,\n", " omitting entries for which the date is not known\n", " (i.e., is null).\n", @@ -417,6 +434,21 @@ " What does this simplify?\n", " What burdens or risks does it introduce?" ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* Databases use `null` to represent missing information.\n", + "* Any arithmetic or Boolean operation involving `null` produces `null` as a result.\n", + "* The only operators that can safely be used with `null` are `is null` and `is not null`." + ] } ], "metadata": {} diff --git a/sql/novice/06-agg.ipynb b/sql/novice/06-agg.ipynb index 4982a97..f54e01f 100644 --- a/sql/novice/06-agg.ipynb +++ b/sql/novice/06-agg.ipynb @@ -9,12 +9,28 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Aggregation" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Define \"aggregation\" and give examples of its use.\n", + "* Write queries that compute aggregated values.\n", + "* Trace the execution of a query that performs aggregation.\n", + "* Explain how missing data is handled during aggregation." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -672,7 +688,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. How many temperature readings did Frank Pabodie record,\n", " and what was their average value?\n", @@ -688,7 +704,7 @@ " We write the query:\n", "\n", " ~~~\n", - " select reading-avg(reading) from Survey where quant='rad';\n", + " select reading - avg(reading) from Survey where quant='rad';\n", " ~~~\n", "\n", " What does this actually produce, and why?\n", @@ -706,6 +722,21 @@ "\n", " Can you find a way to order the list by surname?" ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* An aggregation function combines many values to produce a single new value.\n", + "* Aggregation functions ignore `null` values.\n", + "* Aggregation happens after filtering." + ] } ], "metadata": {} diff --git a/sql/novice/07-join.ipynb b/sql/novice/07-join.ipynb index a5afa8f..93761c5 100644 --- a/sql/novice/07-join.ipynb +++ b/sql/novice/07-join.ipynb @@ -9,12 +9,29 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Combining Data" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Explain the operation of a query that joins two tables.\n", + "* Explain how to restrict the output of a query containing a join to only include meaningful combinations of values.\n", + "* Write queries that join tables on equal keys.\n", + "* Explain what primary and foreign keys are, and why they are useful.\n", + "* Explain what atomic values are, and why database fields should only contain atomic values." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -341,7 +358,7 @@ }, { "cell_type": "heading", - "level": 2, + "level": 3, "metadata": {}, "source": [ "Data Hygiene" @@ -392,21 +409,30 @@ "we have to change multiple records in the database.\n", "What's worse,\n", "we may have to guess which records to change,\n", - "since some other sites may actually have been visited on that date.\n", + "since other sites may also have been visited on that date.\n", "\n", "The fourth rule is that the units for every value should be stored explicitly.\n", "Our database doesn't do this,\n", "and that's a problem:\n", "Roerich's salinity measurements are several orders of magnitude larger than anyone else's,\n", "but we don't know if that means she was using parts per million instead of parts per thousand,\n", - "or whether there actually was a saline anomaly at that site in 1932." + "or whether there actually was a saline anomaly at that site in 1932.\n", + "\n", + "Stepping back,\n", + "data and the tools used to store it have a symbiotic relationship:\n", + "we use tables and joins because it's efficient,\n", + "provided our data is organized a certain way,\n", + "but organize our data that way because we have tools to manipulate it efficiently\n", + "if it's in a certain form.\n", + "As anthropologists say,\n", + "the tool shapes the hand that shapes the tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. Write a query that lists all radiation readings from the DR-1 site.\n", "\n", @@ -419,6 +445,24 @@ " on Site.lat<-49.0 and Site.name=Visited.site and Visited.dated>='1932-00-00';\n", " ~~~" ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* Every fact should be represented in a database exactly once.\n", + "* A join produces all combinations of records from one table with records from another.\n", + "* A primary key is a field (or set of fields) whose values uniquely identify the records in a table.\n", + "* A foreign key is a field (or set of fields) in one table whose values are a primary key in another table.\n", + "* We can eliminate meaningless combinations of records by matching primary keys and foreign keys between tables.\n", + "* Keys should be atomic values to make joins simpler and more efficient." + ] } ], "metadata": {} diff --git a/sql/novice/08-create.ipynb b/sql/novice/08-create.ipynb index c695563..94da1e9 100644 --- a/sql/novice/08-create.ipynb +++ b/sql/novice/08-create.ipynb @@ -9,12 +9,26 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Creating and Modifying Data" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Write queries that creates tables.\n", + "* Write queries to insert, modify, and delete records." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -106,7 +120,7 @@ "We can also insert values into one table directly from another:\n", "\n", "~~~\n", - "create table JustLatLong(lat text, long TEXT);\n", + "create table JustLatLong(lat text, long text);\n", "insert into JustLatLong select lat, long from site;\n", "~~~\n", "\n", @@ -159,7 +173,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. Write an SQL statement to replace all uses of `null`\n", " in `Survey.person`\n", @@ -190,7 +204,23 @@ " which reads a file created by `.dump` and restores the database.\n", " A colleague of yours thinks that storing dump files (which are text) in version control\n", " is a good way to track and manage changes to the database.\n", - " What are the pros and cons of this approach?" + " What are the pros and cons of this approach?\n", + " (Hint: records aren't stored in any particular order.)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* Database tables are created using queries that specify their names and the names and properties of their fields.\n", + "* Records can be inserted, updated, or deleted using queries.\n", + "* It is simpler and safer to modify data when every record has a unique primary key." ] } ], diff --git a/sql/novice/09-prog.ipynb b/sql/novice/09-prog.ipynb index 6105f8c..63461ba 100644 --- a/sql/novice/09-prog.ipynb +++ b/sql/novice/09-prog.ipynb @@ -9,12 +9,27 @@ "cells": [ { "cell_type": "heading", - "level": 1, + "level": 2, "metadata": {}, "source": [ "Programming with Databases" ] }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "objectives" + ] + }, + "source": [ + "#### Objectives\n", + "\n", + "* Write short programs that execute SQL queries.\n", + "* Trace the execution of a program that contains an SQL query.\n", + "* Explain why most database applications are written in a general-purpose language rather than in SQL." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -238,7 +253,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Challenges\n", + "#### Challenges\n", "\n", "1. Write a Python program that creates a new database\n", " in a file called `original.db`\n", @@ -258,6 +273,22 @@ " filtering values in the query,\n", " or reading everything into memory and filtering in Python?" ] + }, + { + "cell_type": "markdown", + "metadata": { + "cell_tags": [ + "keypoints" + ] + }, + "source": [ + "#### Key Points\n", + "\n", + "* We usually write database applications in a general-purpose language, and embed SQL queries in it.\n", + "* To connect to a database, a program must use a library specific to that database manager.\n", + "* A program may open one or more connections to a single database, and have one or more cursors active in each.\n", + "* Programs can read query results in batches or all at once." + ] } ], "metadata": {} diff --git a/sql/novice/README.txt b/sql/novice/README.txt index 1a67353..58dd4fb 100644 --- a/sql/novice/README.txt +++ b/sql/novice/README.txt @@ -1 +1,8 @@ -Note: run 'sqlite3 survey.db < gen-survey-database.sql' to re-create survey database before loading notebooks. +# Resources + +* `gen-survey-database.sql`: re-generate survey database used in examples. +* `sqlitemagic.py`: IPython Notebook plugin to support SQLite interaction. + +# Notes + +* Run 'sqlite3 survey.db < gen-survey-database.sql' to re-create survey database before loading notebooks. diff --git a/sql/novice/guide.md b/sql/novice/guide.md deleted file mode 100644 index 22fb503..0000000 --- a/sql/novice/guide.md +++ /dev/null @@ -1,59 +0,0 @@ ---- -layout: lesson -root: ../.. -title: Instructor's Guide -level: novice ---- -Relational databases are not as widely used in science as in business, -but they are still a common way to store large data sets with complex structure. -Even when the data itself isn't in a database, -the metadata could be: -for example, -meteorological data might be stored in files on disk, -but data about when and where observations were made, -data ranges, -and so on could be in a database -to make it easier for scientists to find what they want to. - -Teaching Notes --------------- - -* The first few sections (up to "Missing Data") usually go very quickly. - The pace usually slows down a bit when null values are discussed - mostly because learners have a lot of details to keep straight by this point. - Things *really* slow down during the discussion of joins, - but this is the key idea in the whole lesson: - important ideas like primary keys and referential integrity - only make sense once learners have seen how they're used in joins. - It's worth going over things a couple of times if necessary (with lots of examples). - -* The sections on creating and modifying data, - and programming with databases, - can be dropped if time is short. - Of the two, - people seem to care most about how to add data (which only takes a few minutes to demonstrate). - - -* Overall, - this material takes three hours to present assuming that a short exercise is done with each topic. - -* Simple calculations are actually easier to do in a spreadsheet; - the advantages of using a database become clear as soon as filtering and joins are needed. - Instructors may therefore want to show a spreadsheet with the information from the four database tables - consolidated into a single sheet, - and demonstrate what's needed in both systems to answer questions like, - "What was the average radiation reading in 1931?" - -* Some learners may have heard that NoSQL databases - (i.e., ones that don't use the relational model) - are the next big thing, - and ask why we're not teaching those. - The answers are: - 1. Relational databases are far more widely used than NoSQL databases. - 2. We have far more experience with relational databases than with any other kind, - so we have a better idea of what to teach and how to teach it. - 3. NoSQL databases are as different from each other as they are from relational databases. - Until a leader emerges, it isn't clear *which* NoSQL database we should teach. - -* Run `sqlite3 survey.db < gen-survey-database.sql` - to re-create survey database before loading notebooks. diff --git a/sql/novice/index.md b/sql/novice/index.md index 1f31676..aded47f 100644 --- a/sql/novice/index.md +++ b/sql/novice/index.md @@ -21,8 +21,8 @@ knowing how they work will help us understand why so many of the systems we use behave the way we do, and why they insist on structuring data in certain ways. -Topics ------- +
+ 1. [Selecting Data](01-select.html) 2. [Sorting and Removing Duplicates](02-sort-dup.html) 3. [Filtering](03-filter.html) @@ -33,12 +33,4 @@ Topics 8. [Creating and Modifying Data](08-create.html) 9. [Programming with Databases](09-prog.html) -See Also --------- -* [Instructor's Guide](guide.html) -* [Reference](reference.html) - -Resources ---------- -* `gen-survey-database.sql`: re-generate survey database used in examples. -* `sqlitemagic.py`: IPython Notebook plugin to support SQLite interaction. +
diff --git a/sql/novice/reference.md b/sql/novice/reference.md deleted file mode 100644 index ea860f3..0000000 --- a/sql/novice/reference.md +++ /dev/null @@ -1,122 +0,0 @@ ---- -layout: lesson -root: ../.. -title: SQL Reference -level: novice ---- -Basic Queries -------------- - -Select one or more columns from a table: - - SELECT column_name_1, column_name_2 FROM table_name; - -Select all columns from a table: - - SELECT * FROM table_name; - -Get only unique results in a query: - - SELECT DISTINCT column_name FROM table_name; - -Perform calculations in a query: - - SELECT column_name_1, ROUND(column_name_2 / 1000.0) FROM table_name; - -Sort results in ascending order: - - SELECT * FROM table_name ORDER BY column_name_1; - -Sort results in ascending and descending order: - - SELECT * FROM table_name ORDER BY column_name_1 ASC, column_name_2 DESC; - -Filtering ---------- - -Select only data meeting a condition: - - SELECT * FROM table_name WHERE column_name > 0; - -Select only data meeting a combination of conditions: - - SELECT * FROM table_name WHERE (column_name_1 >= 1000) AND (column_name_2 = 'A' OR column_name_2 = 'B'); - -Missing Data ------------- - -Use `NULL` to represent missing data. - -`NULL` is not 0, false, or any other specific value. -Operations involving `NULL` produce `NULL`, so `1+NULL`, `2>NULL`, and `3=NULL` are all `NULL`. - -Test whether a value is null: - - SELECT * FROM table_name WHERE column_name IS NULL; - -Test whether a value is not null: - - SELECT * FROM table_name WHERE column_name IS NOT NULL; - -Grouping and Aggregation ------------------------- - -Combine values using aggregation functions: - - SELECT SUM(column_name_1) FROM table_name; - -Combine data into groups and calculate combined values in groups: - - SELECT column_name_1, SUM(column_name_2), COUNT(*) FROM table_name GROUP BY column_name_1; - -Joins ------ - -Join data from two tables: - - SELECT * FROM table_name_1 JOIN table_name_2 ON table_name_1.column_name = table_name_2.column_name; - -Writing Queries ---------------- - -SQL commands must be combined in the following order: -`SELECT`, `FROM`, `JOIN`, `ON`, `WHERE`, `GROUP BY`, `ORDER BY`. - -Creating Tables ---------------- - -Create tables by specifying column names and types. -Include primary and foreign key relationships and other constraints. - - CREATE TABLE survey( - taken INTEGER NOT NULL, - person TEXT, - quant REAL NOT NULL, - PRIMARY KEY(taken, quant), - FOREIGN KEY(person) REFERENCES person(ident) - ); - -Programming ------------ - -Execute queries in a general-purpose programming language by: - -* loading the appropriate library -* creating a connection -* creating a cursor -* repeatedly: - * execute a query - * fetch some or all results -* disposing of the cursor -* closing the connection - -Python example: - - import sqlite3 - connection = sqlite3.connect("database_name") - cursor = connection.cursor() - cursor.execute("...query...") - for r in cursor.fetchall(): - ...process result r... - cursor.close() - connection.close()