novice/python/06-cmdline.ipynb

   1 {
   2  "metadata": {
   3   "name": ""
   4  },
   5  "nbformat": 3,
   6  "nbformat_minor": 0,
   7  "worksheets": [
   8   {
   9    "cells": [
  10     {
  11      "cell_type": "heading",
  12      "level": 2,
  13      "metadata": {
  14       "cell_tags": []
  15      },
  16      "source": [
  17       "Command-Line Programs"
  18      ]
  19     },
  20     {
  21      "cell_type": "markdown",
  22      "metadata": {
  23       "cell_tags": []
  24      },
  25      "source": [
  26       "The IPython Notebook and other interactive tools are great for prototyping code and exploring data,\n",
  27       "but sooner or later we will want to use our program in a pipeline\n",
  28       "or run it in a shell script to process thousands of data files.\n",
  29       "In order to do that,\n",
  30       "we need to make our programs work like other Unix command-line tools.\n",
  31       "For example,\n",
  32       "we may want a program that reads a data set\n",
  33       "and prints the average inflammation per patient:\n",
  34       "\n",
  35       "~~~\n",
  36       "$ python readings.py --mean inflammation-01.csv\n",
  37       "5.45\n",
  38       "5.425\n",
  39       "6.1\n",
  40       "...\n",
  41       "6.4\n",
  42       "7.05\n",
  43       "5.9\n",
  44       "~~~\n",
  45       "\n",
  46       "but we might also want to look at the minimum of the first four lines\n",
  47       "\n",
  48       "~~~\n",
  49       "$ head -4 inflammation-01.csv | python readings.py --min\n",
  50       "~~~\n",
  51       "\n",
  52       "or the maximum inflammations in several files one after another:\n",
  53       "\n",
  54       "~~~\n",
  55       "$ python readings.py --max inflammation-*.csv\n",
  56       "~~~\n",
  57       "\n",
  58       "Our overall requirements are:\n",
  59       "\n",
  60       "1. If no filename is given on the command line, read data from [standard input](../../gloss.html#standard-input).\n",
  61       "2. If one or more filenames are given, read data from them and report statistics for each file separately.\n",
  62       "3. Use the `--min`, `--mean`, or `--max` flag to determine what statistic to print.\n",
  63       "\n",
  64       "To make this work,\n",
  65       "we need to know how to handle command-line arguments in a program,\n",
  66       "and how to get at standard input.\n",
  67       "We'll tackle these questions in turn below."
  68      ]
  69     },
  70     {
  71      "cell_type": "markdown",
  72      "metadata": {
  73       "cell_tags": [
  74        "objectives"
  75       ]
  76      },
  77      "source": [
  78       "#### Objectives\n",
  79       "\n",
  80       "*   Use the values of command-line arguments in a program.\n",
  81       "*   Handle flags and files separately in a command-line program.\n",
  82       "*   Read data from standard input in a program so that it can be used in a pipeline."
  83      ]
  84     },
  85     {
  86      "cell_type": "heading",
  87      "level": 3,
  88      "metadata": {
  89       "cell_tags": []
  90      },
  91      "source": [
  92       "Command-Line Arguments"
  93      ]
  94     },
  95     {
  96      "cell_type": "markdown",
  97      "metadata": {
  98       "cell_tags": []
  99      },
 100      "source": [
 101       "Using the text editor of your choice,\n",
 102       "save the following in a text file:"
 103      ]
 104     },
 105     {
 106      "cell_type": "code",
 107      "collapsed": false,
 108      "input": [
 109       "!cat sys-version.py"
 110      ],
 111      "language": "python",
 112      "metadata": {
 113       "cell_tags": []
 114      },
 115      "outputs": [
 116       {
 117        "output_type": "stream",
 118        "stream": "stdout",
 119        "text": [
 120         "import sys\r\n",
 121         "print 'version is', sys.version\r\n"
 122        ]
 123       }
 124      ],
 125      "prompt_number": 2
 126     },
 127     {
 128      "cell_type": "markdown",
 129      "metadata": {
 130       "cell_tags": []
 131      },
 132      "source": [
 133       "The first line imports a library called `sys`,\n",
 134       "which is short for \"system\".\n",
 135       "It defines values such as `sys.version`,\n",
 136       "which describes which version of Python we are running.\n",
 137       "We can run this script from within the IPython Notebook like this:"
 138      ]
 139     },
 140     {
 141      "cell_type": "code",
 142      "collapsed": false,
 143      "input": [
 144       "%run sys-version.py"
 145      ],
 146      "language": "python",
 147      "metadata": {
 148       "cell_tags": []
 149      },
 150      "outputs": [
 151       {
 152        "output_type": "stream",
 153        "stream": "stdout",
 154        "text": [
 155         "version is 2.7.5 |Anaconda 1.8.0 (x86_64)| (default, Oct 24 2013, 07:02:20) \n",
 156         "[GCC 4.0.1 (Apple Inc. build 5493)]\n"
 157        ]
 158       }
 159      ],
 160      "prompt_number": 3
 161     },
 162     {
 163      "cell_type": "markdown",
 164      "metadata": {
 165       "cell_tags": []
 166      },
 167      "source": [
 168       "or like this:"
 169      ]
 170     },
 171     {
 172      "cell_type": "code",
 173      "collapsed": false,
 174      "input": [
 175       "!ipython sys-version.py"
 176      ],
 177      "language": "python",
 178      "metadata": {
 179       "cell_tags": []
 180      },
 181      "outputs": [
 182       {
 183        "output_type": "stream",
 184        "stream": "stdout",
 185        "text": [
 186         "version is 2.7.5 |Anaconda 1.8.0 (x86_64)| (default, Oct 24 2013, 07:02:20) \r\n",
 187         "[GCC 4.0.1 (Apple Inc. build 5493)]\r\n"
 188        ]
 189       }
 190      ],
 191      "prompt_number": 4
 192     },
 193     {
 194      "cell_type": "markdown",
 195      "metadata": {
 196       "cell_tags": []
 197      },
 198      "source": [
 199       "The first method, `%run`,\n",
 200       "uses a special command in the IPython Notebook to run a program in a `.py` file.\n",
 201       "The second method is more general:\n",
 202       "the exclamation mark `!` tells the Notebook to run a shell command,\n",
 203       "and it just so happens that the command we run is `ipython` with the name of the script."
 204      ]
 205     },
 206     {
 207      "cell_type": "markdown",
 208      "metadata": {
 209       "cell_tags": []
 210      },
 211      "source": [
 212       "Here's another script that does something more interesting:"
 213      ]
 214     },
 215     {
 216      "cell_type": "code",
 217      "collapsed": false,
 218      "input": [
 219       "!cat argv-list.py"
 220      ],
 221      "language": "python",
 222      "metadata": {
 223       "cell_tags": []
 224      },
 225      "outputs": [
 226       {
 227        "output_type": "stream",
 228        "stream": "stdout",
 229        "text": [
 230         "import sys\r\n",
 231         "print 'sys.argv is', sys.argv\r\n"
 232        ]
 233       }
 234      ],
 235      "prompt_number": 5
 236     },
 237     {
 238      "cell_type": "markdown",
 239      "metadata": {
 240       "cell_tags": []
 241      },
 242      "source": [
 243       "The strange name `argv` stands for \"argument values\".\n",
 244       "Whenever Python runs a program,\n",
 245       "it takes all of the values given on the command line\n",
 246       "and puts them in the list `sys.argv`\n",
 247       "so that the program can determine what they were.\n",
 248       "If we run this program with no arguments:"
 249      ]
 250     },
 251     {
 252      "cell_type": "code",
 253      "collapsed": false,
 254      "input": [
 255       "!ipython argv-list.py"
 256      ],
 257      "language": "python",
 258      "metadata": {
 259       "cell_tags": []
 260      },
 261      "outputs": [
 262       {
 263        "output_type": "stream",
 264        "stream": "stdout",
 265        "text": [
 266         "sys.argv is ['/Users/gwilson/s/bc/python/novice/argv-list.py']\r\n"
 267        ]
 268       }
 269      ],
 270      "prompt_number": 6
 271     },
 272     {
 273      "cell_type": "markdown",
 274      "metadata": {
 275       "cell_tags": []
 276      },
 277      "source": [
 278       "the only thing in the list is the full path to our script,\n",
 279       "which is always `sys.argv[0]`.\n",
 280       "If we run it with a few arguments, however:"
 281      ]
 282     },
 283     {
 284      "cell_type": "code",
 285      "collapsed": false,
 286      "input": [
 287       "!ipython argv-list.py first second third"
 288      ],
 289      "language": "python",
 290      "metadata": {
 291       "cell_tags": []
 292      },
 293      "outputs": [
 294       {
 295        "output_type": "stream",
 296        "stream": "stdout",
 297        "text": [
 298         "sys.argv is ['/Users/gwilson/s/bc/python/novice/argv-list.py', 'first', 'second', 'third']\r\n"
 299        ]
 300       }
 301      ],
 302      "prompt_number": 7
 303     },
 304     {
 305      "cell_type": "markdown",
 306      "metadata": {
 307       "cell_tags": []
 308      },
 309      "source": [
 310       "then Python adds each of those arguments to that magic list."
 311      ]
 312     },
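          {
           "cell_type": "markdown",
           "metadata": {
            "cell_tags": []
           },
           "source": [
            "One property of `sys.argv` is worth checking before we rely on it:\n",
            "every entry is a string, even if it looks like a number on the command line.\n",
            "A minimal sketch, assuming we save it under the made-up name `argv-types.py`:\n",
            "\n",
            "~~~\n",
            "import sys\n",
            "\n",
            "# Hypothetical helper script: show each command-line value and its type.\n",
            "for arg in sys.argv:\n",
            "    print(repr(arg) + ' is a ' + type(arg).__name__)\n",
            "~~~\n",
            "\n",
            "Run as `python argv-types.py 1 2.5 three`,\n",
            "this should report `str` for every entry,\n",
            "so values such as `1` must be converted with `int` or `float` before doing arithmetic with them."
           ]
          },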
 313     {
 314      "cell_type": "markdown",
 315      "metadata": {
 316       "cell_tags": []
 317      },
 318      "source": [
 319       "With this in hand,\n",
 320       "let's build a version of `readings.py` that always prints the per-patient mean of a single data file.\n",
  321       "The first step is to write a function that reads the filename from the command line\n",
  322       "and prints the mean for each patient.\n",
 323       "By convention this function is usually called `main`,\n",
 324       "though we can call it whatever we want:"
 325      ]
 326     },
 327     {
 328      "cell_type": "code",
 329      "collapsed": false,
 330      "input": [
 331       "!cat readings-01.py"
 332      ],
 333      "language": "python",
 334      "metadata": {
 335       "cell_tags": []
 336      },
 337      "outputs": [
 338       {
 339        "output_type": "stream",
 340        "stream": "stdout",
 341        "text": [
 342         "import sys\r\n",
 343         "import numpy as np\r\n",
 344         "\r\n",
 345         "def main():\r\n",
 346         "    script = sys.argv[0]\r\n",
 347         "    filename = sys.argv[1]\r\n",
 348         "    data = np.loadtxt(filename, delimiter=',')\r\n",
 349         "    for m in data.mean(axis=1):\r\n",
 350         "        print m\r\n"
 351        ]
 352       }
 353      ],
 354      "prompt_number": 8
 355     },
 356     {
 357      "cell_type": "markdown",
 358      "metadata": {
 359       "cell_tags": []
 360      },
 361      "source": [
 362       "This function gets the name of the script from `sys.argv[0]`,\n",
 363       "because that's where it's always put,\n",
 364       "and the name of the file to process from `sys.argv[1]`.\n",
 365       "Here's a simple test:"
 366      ]
 367     },
 368     {
 369      "cell_type": "code",
 370      "collapsed": false,
 371      "input": [
 372       "%run readings-01.py inflammation-01.csv"
 373      ],
 374      "language": "python",
 375      "metadata": {
 376       "cell_tags": []
 377      },
 378      "outputs": [],
 379      "prompt_number": 9
 380     },
 381     {
 382      "cell_type": "markdown",
 383      "metadata": {
 384       "cell_tags": []
 385      },
 386      "source": [
 387       "There is no output because we have defined a function,\n",
 388       "but haven't actually called it.\n",
 389       "Let's add a call to `main`:"
 390      ]
 391     },
 392     {
 393      "cell_type": "code",
 394      "collapsed": false,
 395      "input": [
 396       "!cat readings-02.py"
 397      ],
 398      "language": "python",
 399      "metadata": {
 400       "cell_tags": []
 401      },
 402      "outputs": [
 403       {
 404        "output_type": "stream",
 405        "stream": "stdout",
 406        "text": [
 407         "import sys\r\n",
 408         "import numpy as np\r\n",
 409         "\r\n",
 410         "def main():\r\n",
 411         "    script = sys.argv[0]\r\n",
 412         "    filename = sys.argv[1]\r\n",
 413         "    data = np.loadtxt(filename, delimiter=',')\r\n",
 414         "    for m in data.mean(axis=1):\r\n",
 415         "        print m\r\n",
 416         "\r\n",
 417         "main()\r\n"
 418        ]
 419       }
 420      ],
 421      "prompt_number": 10
 422     },
 423     {
 424      "cell_type": "markdown",
 425      "metadata": {
 426       "cell_tags": []
 427      },
 428      "source": [
 429       "and run that:"
 430      ]
 431     },
 432     {
 433      "cell_type": "code",
 434      "collapsed": false,
 435      "input": [
 436       "%run readings-02.py inflammation-01.csv"
 437      ],
 438      "language": "python",
 439      "metadata": {
 440       "cell_tags": []
 441      },
 442      "outputs": [
 443       {
 444        "output_type": "stream",
 445        "stream": "stdout",
 446        "text": [
 447         "5.45\n",
 448         "5.425\n",
 449         "6.1\n",
 450         "5.9\n",
 451         "5.55\n",
 452         "6.225\n",
 453         "5.975\n",
 454         "6.65\n",
 455         "6.625\n",
 456         "6.525\n",
 457         "6.775\n",
 458         "5.8\n",
 459         "6.225\n",
 460         "5.75\n",
 461         "5.225\n",
 462         "6.3\n",
 463         "6.55\n",
 464         "5.7\n",
 465         "5.85\n",
 466         "6.55\n",
 467         "5.775\n",
 468         "5.825\n",
 469         "6.175\n",
 470         "6.1\n",
 471         "5.8\n",
 472         "6.425\n",
 473         "6.05\n",
 474         "6.025\n",
 475         "6.175\n",
 476         "6.55\n",
 477         "6.175\n",
 478         "6.35\n",
 479         "6.725\n",
 480         "6.125\n",
 481         "7.075\n",
 482         "5.725\n",
 483         "5.925\n",
 484         "6.15\n",
 485         "6.075\n",
 486         "5.75\n",
 487         "5.975\n",
 488         "5.725\n",
 489         "6.3\n",
 490         "5.9\n",
 491         "6.75\n",
 492         "5.925\n",
 493         "7.225\n",
 494         "6.15\n",
 495         "5.95\n",
 496         "6.275\n",
 497         "5.7\n",
 498         "6.1\n",
 499         "6.825\n",
 500         "5.975\n",
 501         "6.725\n",
 502         "5.7\n",
 503         "6.25\n",
 504         "6.4\n",
 505         "7.05\n",
 506         "5.9\n"
 507        ]
 508       }
 509      ],
 510      "prompt_number": 11
 511     },
 512     {
 513      "cell_type": "markdown",
 514      "metadata": {},
 515      "source": [
 516       "> #### The Right Way to Do It\n",
 517       ">\n",
 518       "> If our programs can take complex parameters or multiple filenames,\n",
 519       "> we shouldn't handle `sys.argv` directly.\n",
 520       "> Instead,\n",
 521       "> we should use Python's `argparse` library,\n",
 522       "> which handles common cases in a systematic way,\n",
 523       "> and also makes it easy for us to provide sensible error messages for our users."
 524      ]
 525     },
 526     {
 527      "cell_type": "markdown",
 528      "metadata": {
 529       "cell_tags": [
 530        "challenges"
 531       ]
 532      },
 533      "source": [
 534       "#### Challenges\n",
 535       "\n",
 536       "1.  Write a command-line program that does addition and subtraction:\n",
 537       "    ~~~\n",
 538       "    python arith.py 1 + 2\n",
 539       "    3\n",
 540       "    python arith.py 3 - 4\n",
 541       "    -1\n",
 542       "    ~~~\n",
 543       "\n",
 544       "    What goes wrong if you try to add multiplication using '*' to the program?\n",
 545       "\n",
  546       "2.  Using the `glob` module introduced [earlier](03-loop.ipynb),\n",
 547       "    write a simple version of `ls` that shows files in the current directory with a particular suffix:\n",
 548       "    ~~~\n",
 549       "    python my_ls.py py\n",
 550       "    left.py\n",
 551       "    right.py\n",
 552       "    zero.py\n",
 553       "    ~~~"
 554      ]
 555     },
 556     {
 557      "cell_type": "heading",
 558      "level": 3,
 559      "metadata": {
 560       "cell_tags": []
 561      },
 562      "source": [
 563       "Handling Multiple Files"
 564      ]
 565     },
 566     {
 567      "cell_type": "markdown",
 568      "metadata": {
 569       "cell_tags": []
 570      },
 571      "source": [
 572       "The next step is to teach our program how to handle multiple files.\n",
 573       "Since 60 lines of output per file is a lot to page through,\n",
 574       "we'll start by creating three smaller files,\n",
 575       "each of which has three days of data for two patients:"
 576      ]
 577     },
 578     {
 579      "cell_type": "code",
 580      "collapsed": false,
 581      "input": [
 582       "!ls small-*.csv"
 583      ],
 584      "language": "python",
 585      "metadata": {
 586       "cell_tags": []
 587      },
 588      "outputs": [
 589       {
 590        "output_type": "stream",
 591        "stream": "stdout",
 592        "text": [
 593         "small-01.csv small-02.csv small-03.csv\r\n"
 594        ]
 595       }
 596      ],
 597      "prompt_number": 12
 598     },
 599     {
 600      "cell_type": "code",
 601      "collapsed": false,
 602      "input": [
 603       "!cat small-01.csv"
 604      ],
 605      "language": "python",
 606      "metadata": {
 607       "cell_tags": []
 608      },
 609      "outputs": [
 610       {
 611        "output_type": "stream",
 612        "stream": "stdout",
 613        "text": [
 614         "0,0,1\r\n",
 615         "0,1,2\r\n"
 616        ]
 617       }
 618      ],
 619      "prompt_number": 13
 620     },
 621     {
 622      "cell_type": "code",
 623      "collapsed": false,
 624      "input": [
 625       "%run readings-02.py small-01.csv"
 626      ],
 627      "language": "python",
 628      "metadata": {
 629       "cell_tags": []
 630      },
 631      "outputs": [
 632       {
 633        "output_type": "stream",
 634        "stream": "stdout",
 635        "text": [
 636         "0.333333333333\n",
 637         "1.0\n"
 638        ]
 639       }
 640      ],
 641      "prompt_number": 14
 642     },
 643     {
 644      "cell_type": "markdown",
 645      "metadata": {
 646       "cell_tags": []
 647      },
 648      "source": [
 649       "Using small data files as input also allows us to check our results more easily:\n",
 650       "here,\n",
 651       "for example,\n",
 652       "we can see that our program is calculating the mean correctly for each line,\n",
 653       "whereas we were really taking it on faith before.\n",
 654       "This is yet another rule of programming:\n",
 655       "\"[test the simple things first](../../rules.html#test-simple-first)\".\n",
 656       "\n",
 657       "We want our program to process each file separately,\n",
  658       "so we need a loop that executes once for each filename.\n",
 659       "If we specify the files on the command line,\n",
 660       "the filenames will be in `sys.argv`,\n",
 661       "but we need to be careful:\n",
 662       "`sys.argv[0]` will always be the name of our script,\n",
 663       "rather than the name of a file.\n",
 664       "We also need to handle an unknown number of filenames,\n",
 665       "since our program could be run for any number of files.\n",
 666       "\n",
 667       "The solution to both problems is to loop over the contents of `sys.argv[1:]`.\n",
 668       "The '1' tells Python to start the slice at location 1,\n",
 669       "so the program's name isn't included;\n",
 670       "since we've left off the upper bound,\n",
 671       "the slice runs to the end of the list,\n",
 672       "and includes all the filenames.\n",
 673       "Here's our changed program:"
 674      ]
 675     },
 676     {
 677      "cell_type": "code",
 678      "collapsed": false,
 679      "input": [
 680       "!cat readings-03.py"
 681      ],
 682      "language": "python",
 683      "metadata": {
 684       "cell_tags": []
 685      },
 686      "outputs": [
 687       {
 688        "output_type": "stream",
 689        "stream": "stdout",
 690        "text": [
 691         "import sys\r\n",
 692         "import numpy as np\r\n",
 693         "\r\n",
 694         "def main():\r\n",
 695         "    script = sys.argv[0]\r\n",
 696         "    for filename in sys.argv[1:]:\r\n",
 697         "        data = np.loadtxt(filename, delimiter=',')\r\n",
 698         "        for m in data.mean(axis=1):\r\n",
 699         "            print m\r\n",
 700         "\r\n",
 701         "main()\r\n"
 702        ]
 703       }
 704      ],
 705      "prompt_number": 15
 706     },
 707     {
 708      "cell_type": "markdown",
 709      "metadata": {
 710       "cell_tags": []
 711      },
 712      "source": [
 713       "and here it is in action:"
 714      ]
 715     },
 716     {
 717      "cell_type": "code",
 718      "collapsed": false,
 719      "input": [
 720       "%run readings-03.py small-01.csv small-02.csv"
 721      ],
 722      "language": "python",
 723      "metadata": {
 724       "cell_tags": []
 725      },
 726      "outputs": [
 727       {
 728        "output_type": "stream",
 729        "stream": "stdout",
 730        "text": [
 731         "0.333333333333\n",
 732         "1.0\n",
 733         "13.6666666667\n",
 734         "11.0\n"
 735        ]
 736       }
 737      ],
 738      "prompt_number": 16
 739     },
 740     {
 741      "cell_type": "markdown",
 742      "metadata": {
 743       "cell_tags": []
 744      },
 745      "source": [
 746       "Note:\n",
 747       "at this point,\n",
 748       "we have created three versions of our script called `readings-01.py`,\n",
 749       "`readings-02.py`, and `readings-03.py`.\n",
 750       "We wouldn't do this in real life:\n",
 751       "instead,\n",
 752       "we would have one file called `readings.py` that we committed to version control\n",
 753       "every time we got an enhancement working.\n",
 754       "For teaching,\n",
 755       "though,\n",
 756       "we need all the successive versions side by side."
 757      ]
 758     },
 759     {
 760      "cell_type": "markdown",
 761      "metadata": {
 762       "cell_tags": [
 763        "challenges"
 764       ]
 765      },
 766      "source": [
 767       "#### Challenges\n",
 768       "\n",
 769       "1.  Write a program called `check.py` that takes the names of one or more inflammation data files as arguments\n",
 770       "    and checks that all the files have the same number of rows and columns.\n",
 771       "    What is the best way to test your program?"
 772      ]
 773     },
 774     {
 775      "cell_type": "heading",
 776      "level": 3,
 777      "metadata": {
 778       "cell_tags": []
 779      },
 780      "source": [
 781       "Handling Command-Line Flags"
 782      ]
 783     },
 784     {
 785      "cell_type": "markdown",
 786      "metadata": {
 787       "cell_tags": []
 788      },
 789      "source": [
 790       "The next step is to teach our program to pay attention to the `--min`, `--mean`, and `--max` flags.\n",
 791       "These always appear before the names of the files,\n",
 792       "so we could just do this:"
 793      ]
 794     },
 795     {
 796      "cell_type": "code",
 797      "collapsed": false,
 798      "input": [
 799       "!cat readings-04.py"
 800      ],
 801      "language": "python",
 802      "metadata": {
 803       "cell_tags": []
 804      },
 805      "outputs": [
 806       {
 807        "output_type": "stream",
 808        "stream": "stdout",
 809        "text": [
 810         "import sys\r\n",
 811         "import numpy as np\r\n",
 812         "\r\n",
 813         "def main():\r\n",
 814         "    script = sys.argv[0]\r\n",
 815         "    action = sys.argv[1]\r\n",
 816         "    filenames = sys.argv[2:]\r\n",
 817         "\r\n",
 818         "    for f in filenames:\r\n",
 819         "        data = np.loadtxt(f, delimiter=',')\r\n",
 820         "\r\n",
 821         "        if action == '--min':\r\n",
 822         "            values = data.min(axis=1)\r\n",
 823         "        elif action == '--mean':\r\n",
 824         "            values = data.mean(axis=1)\r\n",
 825         "        elif action == '--max':\r\n",
 826         "            values = data.max(axis=1)\r\n",
 827         "\r\n",
 828         "        for m in values:\r\n",
 829         "            print m\r\n",
 830         "\r\n",
 831         "main()\r\n"
 832        ]
 833       }
 834      ],
 835      "prompt_number": 17
 836     },
 837     {
 838      "cell_type": "markdown",
 839      "metadata": {
 840       "cell_tags": []
 841      },
 842      "source": [
 843       "This works:"
 844      ]
 845     },
 846     {
 847      "cell_type": "code",
 848      "collapsed": false,
 849      "input": [
 850       "%run readings-04.py --max small-01.csv"
 851      ],
 852      "language": "python",
 853      "metadata": {
 854       "cell_tags": []
 855      },
 856      "outputs": [
 857       {
 858        "output_type": "stream",
 859        "stream": "stdout",
 860        "text": [
 861         "1.0\n",
 862         "2.0\n"
 863        ]
 864       }
 865      ],
 866      "prompt_number": 18
 867     },
 868     {
 869      "cell_type": "markdown",
 870      "metadata": {
 871       "cell_tags": []
 872      },
 873      "source": [
  874       "but there are several things wrong with it:\n",
 875       "\n",
 876       "1.  `main` is too large to read comfortably.\n",
 877       "\n",
  878       "2.  If `action` isn't one of the three recognized flags,\n",
  879       "    none of the branches in the conditional match,\n",
  880       "    so `values` is never assigned and the program only fails after loading the first file,\n",
  881       "    with an error about `values` rather than about the flag.\n",
  882       "    Failures like this, which surface far from their cause, are hard to debug.\n",
 883       "\n",
 884       "This version pulls the processing of each file out of the loop into a function of its own.\n",
 885       "It also checks that `action` is one of the allowed flags\n",
 886       "before doing any processing,\n",
 887       "so that the program fails fast:"
 888      ]
 889     },
 890     {
 891      "cell_type": "code",
 892      "collapsed": false,
 893      "input": [
 894       "!cat readings-05.py"
 895      ],
 896      "language": "python",
 897      "metadata": {
 898       "cell_tags": []
 899      },
 900      "outputs": [
 901       {
 902        "output_type": "stream",
 903        "stream": "stdout",
 904        "text": [
 905         "import sys\r\n",
 906         "import numpy as np\r\n",
 907         "\r\n",
 908         "def main():\r\n",
 909         "    script = sys.argv[0]\r\n",
 910         "    action = sys.argv[1]\r\n",
 911         "    filenames = sys.argv[2:]\r\n",
 912         "    assert action in ['--min', '--mean', '--max'], \\\r\n",
 913         "           'Action is not one of --min, --mean, or --max: ' + action\r\n",
 914         "    for f in filenames:\r\n",
 915         "        process(f, action)\r\n",
 916         "\r\n",
 917         "def process(filename, action):\r\n",
 918         "    data = np.loadtxt(filename, delimiter=',')\r\n",
 919         "\r\n",
 920         "    if action == '--min':\r\n",
 921         "        values = data.min(axis=1)\r\n",
 922         "    elif action == '--mean':\r\n",
 923         "        values = data.mean(axis=1)\r\n",
 924         "    elif action == '--max':\r\n",
 925         "        values = data.max(axis=1)\r\n",
 926         "\r\n",
 927         "    for m in values:\r\n",
 928         "        print m\r\n",
 929         "\r\n",
 930         "main()\r\n"
 931        ]
 932       }
 933      ],
 934      "prompt_number": 19
 935     },
 936     {
 937      "cell_type": "markdown",
 938      "metadata": {
 939       "cell_tags": []
 940      },
 941      "source": [
 942       "This is four lines longer than its predecessor,\n",
 943       "but broken into more digestible chunks of 8 and 12 lines."
 944      ]
 945     },
 946     {
 947      "cell_type": "markdown",
 948      "metadata": {},
 949      "source": [
  950       "Python has a module named [argparse](http://docs.python.org/dev/library/argparse.html)\n",
  951       "that helps handle complex command-line flags. We will not cover this module in this lesson,\n",
  952       "but you can read Tshepang Lekhonkhobe's [Argparse tutorial](http://docs.python.org/dev/howto/argparse.html),\n",
  953       "which is part of Python's official documentation."
 954      ]
 955     },
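          {
           "cell_type": "markdown",
           "metadata": {
            "cell_tags": []
           },
           "source": [
            "As a rough illustration only (this lesson does not otherwise use `argparse`),\n",
            "the flag handling in `readings-05.py` could be sketched as follows.\n",
            "The flag names mirror that script;\n",
            "the function name `parse_args` and the attribute name `action` are our own choices for this sketch:\n",
            "\n",
            "~~~\n",
            "import argparse\n",
            "\n",
            "def parse_args(arguments):\n",
            "    # Hypothetical helper: accept exactly one of --min/--mean/--max, then filenames.\n",
            "    parser = argparse.ArgumentParser(description='Per-patient inflammation statistics')\n",
            "    group = parser.add_mutually_exclusive_group(required=True)\n",
            "    for flag in ['min', 'mean', 'max']:\n",
            "        # store_const records the chosen flag's name in args.action\n",
            "        group.add_argument('--' + flag, dest='action', action='store_const', const=flag)\n",
            "    parser.add_argument('filenames', nargs='*', help='CSV files to process')\n",
            "    return parser.parse_args(arguments)\n",
            "\n",
            "args = parse_args(['--max', 'small-01.csv', 'small-02.csv'])\n",
            "print(args.action)\n",
            "print(args.filenames)\n",
            "~~~\n",
            "\n",
            "Here `args.action` is `'max'` and `args.filenames` holds the two filenames.\n",
            "If the flag is missing or unrecognized,\n",
            "`argparse` prints a usage message and exits,\n",
            "which gives us the fail-fast behavior without a hand-written `assert`.\n",
            "In a real script we would pass `sys.argv[1:]` instead of a hard-coded list."
           ]
          },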
 956     {
 957      "cell_type": "markdown",
 958      "metadata": {
 959       "cell_tags": [
 960        "challenges"
 961       ]
 962      },
 963      "source": [
 964       "#### Challenges\n",
 965       "\n",
 966       "1.  Rewrite this program so that it uses `-n`, `-m`, and `-x` instead of `--min`, `--mean`, and `--max` respectively.\n",
 967       "    Is the code easier to read?\n",
 968       "    Is the program easier to understand?\n",
 969       "\n",
 970       "2.  Separately,\n",
 971       "    modify the program so that if no parameters are given\n",
 972       "    (i.e., no action is specified and no filenames are given),\n",
 973       "    it prints a message explaining how it should be used.\n",
 974       "\n",
 975       "3.  Separately,\n",
 976       "    modify the program so that if no action is given\n",
 977       "    it displays the means of the data."
 978      ]
 979     },
 980     {
 981      "cell_type": "heading",
 982      "level": 3,
 983      "metadata": {
 984       "cell_tags": []
 985      },
 986      "source": [
 987       "Handling Standard Input"
 988      ]
 989     },
 990     {
 991      "cell_type": "markdown",
 992      "metadata": {
 993       "cell_tags": []
 994      },
 995      "source": [
 996       "The next thing our program has to do is read data from standard input if no filenames are given\n",
 997       "so that we can put it in a pipeline,\n",
 998       "redirect input to it,\n",
 999       "and so on.\n",
1000       "Let's experiment in another script:"
1001      ]
1002     },
1003     {
1004      "cell_type": "code",
1005      "collapsed": false,
1006      "input": [
1007       "!cat count-stdin.py"
1008      ],
1009      "language": "python",
1010      "metadata": {
1011       "cell_tags": []
1012      },
1013      "outputs": [
1014       {
1015        "output_type": "stream",
1016        "stream": "stdout",
1017        "text": [
1018         "import sys\r\n",
1019         "\r\n",
1020         "count = 0\r\n",
1021         "for line in sys.stdin:\r\n",
1022         "    count += 1\r\n",
1023         "\r\n",
1024         "print count, 'lines in standard input'\r\n"
1025        ]
1026       }
1027      ],
1028      "prompt_number": 20
1029     },
1030     {
1031      "cell_type": "markdown",
1032      "metadata": {
1033       "cell_tags": []
1034      },
1035      "source": [
1036       "This little program reads lines from a special \"file\" called `sys.stdin`,\n",
1037       "which is automatically connected to the program's standard input.\n",
1038       "We don't have to open it&mdash;Python and the operating system\n",
1039       "take care of that when the program starts up&mdash;\n",
1040       "but we can do almost anything with it that we could do to a regular file.\n",
1041       "Let's try running it as if it were a regular command-line program:"
1042      ]
1043     },
1044     {
1045      "cell_type": "code",
1046      "collapsed": false,
1047      "input": [
1048       "!ipython count-stdin.py < small-01.csv"
1049      ],
1050      "language": "python",
1051      "metadata": {
1052       "cell_tags": []
1053      },
1054      "outputs": [
1055       {
1056        "output_type": "stream",
1057        "stream": "stdout",
1058        "text": [
1059         "2 lines in standard input\r\n"
1060        ]
1061       }
1062      ],
1063      "prompt_number": 21
1064     },
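         {
          "cell_type": "markdown",
          "metadata": {
           "cell_tags": []
          },
          "source": [
           "Because `sys.stdin` behaves like an open file,\n",
           "the same counting loop works on any file-like object.\n",
           "The sketch below is illustrative rather than one of the lesson's scripts:\n",
           "the function name `count_lines` and the data are made up,\n",
           "and `io.StringIO` stands in for standard input so that the example runs without a pipe.\n",
           "\n",
           "~~~\n",
           "import io\n",
           "\n",
           "def count_lines(stream):\n",
           "    # Count the lines in any open, file-like object (hypothetical helper).\n",
           "    count = 0\n",
           "    for line in stream:\n",
           "        count += 1\n",
           "    return count\n",
           "\n",
           "# In-memory stand-in for standard input; the two rows are made-up data.\n",
           "fake_input = io.StringIO(u'0,0,1\\n0,1,2\\n')\n",
           "print(count_lines(fake_input))\n",
           "~~~\n",
           "\n",
           "In a script that has imported `sys`,\n",
           "`count_lines(sys.stdin)` would count whatever is piped or redirected into the program in the same way."
          ]
         },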
1065     {
1066      "cell_type": "markdown",
1067      "metadata": {
1068       "cell_tags": []
1069      },
1070      "source": [
1071       "What if we run it using `%run`?"
1072      ]
1073     },
1074     {
1075      "cell_type": "code",
1076      "collapsed": false,
1077      "input": [
1078       "%run count-stdin.py < fractal_1.txt"
1079      ],
1080      "language": "python",
1081      "metadata": {
1082       "cell_tags": []
1083      },
1084      "outputs": [
1085       {
1086        "output_type": "stream",
1087        "stream": "stdout",
1088        "text": [
1089         "0 lines in standard input\n"
1090        ]
1091       }
1092      ],
1093      "prompt_number": 22
1094     },
1095     {
1096      "cell_type": "markdown",
1097      "metadata": {
1098       "cell_tags": []
1099      },
1100      "source": [
1101       "As you can see,\n",
1102       "`%run` doesn't understand file redirection:\n",
1103       "that's a shell thing.\n",
1104       "\n",
1105       "A common mistake is to try to run something that reads from standard input like this:\n",
1106       "\n",
1107       "~~~\n",
1108       "!ipython count-stdin.py fractal_1.txt\n",
1109       "~~~\n",
1110       "\n",
1111       "i.e., to forget the `<` character that redirects the file to standard input.\n",
1112       "In this case,\n",
1113       "the filename is just a command-line argument that the program ignores and there's nothing in standard input,\n",
1114       "so the program waits at the start of the loop for someone to type something on the keyboard.\n",
1115       "Since there's no way for us to do this,\n",
1116       "our program is stuck,\n",
1117       "and we have to halt it using the `Interrupt` option from the `Kernel` menu in the Notebook.\n",
1118       "\n",
1119       "We now need to rewrite the program so that it loads data from `sys.stdin` if no filenames are provided.\n",
1120       "Luckily,\n",
1121       "`numpy.loadtxt` can handle either a filename or an open file as its first parameter,\n",
1122       "so we don't actually need to change `process`.\n",
1123       "That leaves `main`:"
1124      ]
1125     },
1126     {
1127      "cell_type": "markdown",
1128      "metadata": {
1129       "cell_tags": []
1130      },
1131      "source": [
1132       "~~~\n",
1133       "def main():\n",
1134       "    script = sys.argv[0]\n",
1135       "    action = sys.argv[1]\n",
1136       "    filenames = sys.argv[2:]\n",
1137       "    assert action in ['--min', '--mean', '--max'], \\\n",
1138       "           'Action is not one of --min, --mean, or --max: ' + action\n",
1139       "    if len(filenames) == 0:\n",
1140       "        process(sys.stdin, action)\n",
1141       "    else:\n",
1142       "        for f in filenames:\n",
1143       "            process(f, action)\n",
1144       "~~~"
1145      ]
1146     },
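         {
          "cell_type": "markdown",
          "metadata": {
           "cell_tags": []
          },
          "source": [
           "To see why `process` can be handed either a filename or `sys.stdin`,\n",
           "here is a small sketch of `numpy.loadtxt` reading from an open file-like object.\n",
           "The data are made up,\n",
           "`io.StringIO` stands in for standard input,\n",
           "and `delimiter=','` assumes comma-separated input like the lesson's CSV files.\n",
           "\n",
           "~~~\n",
           "import io\n",
           "import numpy\n",
           "\n",
           "# In-memory stand-in for an open file; the two rows are made-up data.\n",
           "open_file = io.StringIO(u'0,0,1\\n0,1,2\\n')\n",
           "\n",
           "# loadtxt reads from the open object just as it would from a filename.\n",
           "data = numpy.loadtxt(open_file, delimiter=',')\n",
           "print(data.shape)\n",
           "print(data.mean(axis=1))\n",
           "~~~\n",
           "\n",
           "Passing a filename string instead of `open_file` would give the same kind of array,\n",
           "which is why only `main` has to decide where the data come from."
          ]
         },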
1147     {
1148      "cell_type": "markdown",
1149      "metadata": {
1150       "cell_tags": []
1151      },
1152      "source": [
1153       "Let's try it out\n",
1154       "(we'll see in a moment why we send the output through `head`):"
1155      ]
1156     },
1157     {
1158      "cell_type": "code",
1159      "collapsed": false,
1160      "input": [
1161       "!ipython readings-06.py --mean < small-01.csv | head -10"
1162      ],
1163      "language": "python",
1164      "metadata": {
1165       "cell_tags": []
1166      },
1167      "outputs": [
1168       {
1169        "output_type": "stream",
1170        "stream": "stdout",
1171        "text": [
1172         "[TerminalIPythonApp] CRITICAL | Bad config encountered during initialization:\r\n",
1173         "[TerminalIPythonApp] CRITICAL | Unrecognized flag: '--mean'\r\n",
1174         "=========\r\n",
1175         " IPython\r\n",
1176         "=========\r\n",
1177         "\r\n",
1178         "Tools for Interactive Computing in Python\r\n",
1179         "=========================================\r\n",
1180         "\r\n",
1181         "    A Python shell with automatic history (input and output), dynamic object\r\n",
1182         "    introspection, easier configuration, command completion, access to the\r\n",
1183         "    system shell and more.  IPython can also be embedded in running programs.\r\n"
1184        ]
1185       }
1186      ],
1187      "prompt_number": 23
1188     },
1189     {
1190      "cell_type": "markdown",
1191      "metadata": {
1192       "cell_tags": []
1193      },
1194      "source": [
1195       "Whoops:\n",
1196       "why are we getting IPython's help rather than the line-by-line average of our data?\n",
1197       "The answer is that IPython has a hard time telling\n",
1198       "which command-line arguments are meant for it,\n",
1199       "and which are meant for the program it's running.\n",
1200       "To make our meaning clear,\n",
1201       "we have to use `--` (a double dash)\n",
1202       "to separate the two:"
1203      ]
1204     },
1205     {
1206      "cell_type": "code",
1207      "collapsed": false,
1208      "input": [
1209       "!ipython readings-06.py -- --mean < small-01.csv"
1210      ],
1211      "language": "python",
1212      "metadata": {
1213       "cell_tags": []
1214      },
1215      "outputs": [
1216       {
1217        "output_type": "stream",
1218        "stream": "stdout",
1219        "text": [
1220         "0.333333333333\r\n",
1221         "1.0\r\n"
1222        ]
1223       }
1224      ],
1225      "prompt_number": 24
1226     },
1227     {
1228      "cell_type": "markdown",
1229      "metadata": {
1230       "cell_tags": []
1231      },
1232      "source": [
1233       "That's better.\n",
1234       "In fact,\n",
1235       "that's done:\n",
1236       "the program now does everything we set out to do."
1237      ]
1238     },
1239     {
1240      "cell_type": "markdown",
1241      "metadata": {
1242       "cell_tags": [
1243        "challenges"
1244       ]
1245      },
1246      "source": [
1247       "#### Challenges\n",
1248       "\n",
1249       "1.  Write a program called `line-count.py` that works like the Unix `wc` command:\n",
1250       "    *   If no filenames are given, it reports the number of lines in standard input.\n",
1251       "    *   If one or more filenames are given, it reports the number of lines in each, followed by the total number of lines."
1252      ]
1253     },
1254     {
1255      "cell_type": "markdown",
1256      "metadata": {
1257       "cell_tags": [
1258        "keypoints"
1259       ]
1260      },
1261      "source": [
1262       "#### Key Points\n",
1263       "\n",
1264       "*   The `sys` library connects a Python program to the system it is running on.\n",
1265       "*   The list `sys.argv` contains the command-line arguments that a program was run with.\n",
1266       "*   Avoid silent failures.\n",
1267       "*   The \"file\" `sys.stdin` connects to a program's standard input.\n",
1268       "*   The \"file\" `sys.stdout` connects to a program's standard output."
1269      ]
1270     }
1271    ],
1272    "metadata": {}
1273   }
1274  ]
1275 }