[[!meta  title="Batch Queue Job Control"]]

<a name="qsub" />

Submitting jobs
===============

You can submit jobs to the batch queue for later processing with
`qsub`.  Batch queueing can get pretty fancy, so `qsub` comes with
lots of options (see `man qsub`).  For the most part, you can trust
your sysadmin to have set up some good defaults, and not worry about
setting any options explicitly.  As you get used to the batch queue
system, you'll want tighter control of how your jobs execute by
invoking more sophisticated options yourself, but don't let that scare
you off at the beginning.  They are, after all, only *options*.  This
page will give you a good start on the options I find myself using
most often.

Simple submission
-----------------

The simplest example of a job submission is:

    $ echo "sleep 30 && echo 'Running a job...'" | qsub
    2705.n0.physics.drexel.edu

This submits a job that runs `sleep 30 && echo 'Running a job...'`
to the queue.  The job is assigned an ID in the queue, which `qsub`
prints to `stdout`.

You can check the status of your job in the queue with `qstat`.

    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2705.n0           STDIN            sysadmin               0 Q batch

There is more information on `qstat` in the [qstat section][sec.qstat].

If your job is too complicated to fit on a single line, you can save
it in a script:

    #!/bin/bash
    # file: echo_script.sh
    sleep 30
    echo "a really,"
    echo "really,"
    echo "complicated"
    echo "script"

and submit the script:

    $ qsub echo_script.sh
    2706.n0.physics.drexel.edu

All the command-line options discussed in later sections should also
have comment-style analogs (`#PBS <option>` lines near the top of the
script) that you can use with the script-submission approach.

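Here is a sketch of the comment-style form. It assumes Torque-style `#PBS` directives and puts the options into the script above. The job name `echo_job` and the five-minute walltime are made-up values. The heredoc just writes the file from the command line, so you can inspect it before submitting it with `qsub echo_script.sh`.

```shell
# Write a job script whose "#PBS" comment lines carry qsub options.
# Bash ignores them as comments; qsub reads them as options.
# The job name and walltime below are illustrative values only.
cat > echo_script.sh <<'EOF'
#!/bin/bash
#PBS -N echo_job
#PBS -j oe
#PBS -l walltime=5:00
# Start in the submission directory (falls back to "." outside PBS).
cd "${PBS_O_WORKDIR:-.}"
echo "a really,"
echo "really,"
echo "complicated"
echo "script"
EOF
chmod +x echo_script.sh
# On the cluster you would now run:  qsub echo_script.sh
```

Options given on the `qsub` command line should take precedence over the matching `#PBS` lines. This is handy for one-off changes.
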
Note that you *cannot* run executables directly with `qsub`.  For
example

    $ cat script.py
    #!/usr/bin/python
    print("hello world!")
    $ qsub python script.py

will fail because `python` is an executable, not a job script.
Either use

    $ echo python script.py | qsub

wrap your [[Python]] script in a [[Bash]] script

    $ cat wrapper.sh
    #!/bin/bash
    python script.py
    $ qsub wrapper.sh

or run your Python script directly (relying on its shebang line)

    $ qsub script.py

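If you wrap commands often, a small helper can write the wrapper for you. This is only a sketch, and `make_wrapper` is a hypothetical function, not part of PBS. It quotes each argument with Bash's `printf %q`, so arguments containing spaces survive. Before running the command, it also changes to `$PBS_O_WORKDIR`, which is explained in the next section.

```shell
# Hypothetical helper: write a Bash wrapper script that runs an
# arbitrary command, so the wrapper can be handed to qsub.
make_wrapper() {
    local wrapper="$1"
    shift
    {
        echo '#!/bin/bash'
        # Start where qsub was called (falls back to "." outside PBS).
        echo 'cd "${PBS_O_WORKDIR:-.}"'
        # %q quotes each argument so spaces and quotes survive.
        printf '%q ' "$@"
        echo
    } > "$wrapper"
    chmod +x "$wrapper"
}

# Usage on the cluster (not run here):
#   make_wrapper wrapper.sh python script.py && qsub wrapper.sh
```
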
IO: Job names and working directories
-------------------------------------

You will often be interested in the `stdout` and `stderr` output from
your jobs.  The batch queue system saves this information for you (to
the directory from which you called `qsub`) in two files,
`<jobname>.o<jobID-number>` and `<jobname>.e<jobID-number>`.  The
`<jobID-number>` is the numeric part of the job ID that `qsub` prints
(and that appears in the first field of the `qstat` output).  Job IDs
are assigned by the batch queue server, and are unique to each job.
Job names are assigned by the job submitter (that's you) and need not
be unique.  They give you a way to keep track of which job is doing
which task, since you have no control over the job ID.  The combined
`<jobname>.<jobID-number>` pair is both unique (for the server) and
recognizable (for the user), which is why it's used to label the
output data from a given job.  You control the job name by passing the
`-N <jobname>` argument to `qsub`.

    $ echo "sleep 30 && echo 'Running a job...'" | qsub -N myjob
    2707.n0.physics.drexel.edu
    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2707.n0           myjob            sysadmin               0 Q batch

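The output file names follow the `<jobname>.o<jobID-number>` pattern, so you can predict them from the ID that `qsub` prints. Here is a minimal sketch. `pbs_output_files` is a hypothetical helper that assumes job IDs of the `2707.n0.physics.drexel.edu` form shown above. It has not been checked against job arrays.

```shell
# Hypothetical helper: print the default stdout and stderr file names
# for a job, given its name and the ID that qsub printed.
pbs_output_files() {
    local name="$1" id="$2"
    local num="${id%%.*}"   # strip the server part: 2707.n0... -> 2707
    echo "${name}.o${num} ${name}.e${num}"
}

# Usage on the cluster (not run here):
#   JOBID=$(echo "sleep 30" | qsub -N myjob)
#   pbs_output_files myjob "$JOBID"
```
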
Perhaps you are fine with `stdout` and `stderr`, but the default naming
scheme, even with the job name flexibility, is too restrictive.  No
worries, `qsub` lets you specify exactly which files you'd like to use
with the unsurprisingly named `-o` and `-e` options.

    $ echo "echo 'ABC' && echo 'DEF' > /dev/stderr" | qsub -o my_out -e my_err
    2708.n0.physics.drexel.edu
     … time passes …
    $ cat my_out
    ABC
    $ cat my_err
    DEF

A time will come when you are no longer satisfied with `stdin` and
`stdout`, and you want to open your own files or, worse, run a program!
Because no sane person uses absolute paths all the time, we need to
know what directory we're in so we can construct our relative paths.
You might expect that your job will execute from the same directory
that you called `qsub` from, but that is not the case.  I think the
reason is that that directory is not guaranteed to exist on the host
that eventually runs your program.  In any case, your job will begin
executing in your home directory.  Writing relative paths from your
home directory is about as annoying as writing absolute paths, so
`qsub` gives your script a nifty environment variable `PBS_O_WORKDIR`,
which is set to the directory you called `qsub` from.  Since *you*
know that this directory exists on the hosts (the home directories
are NFS mounted on all of our cluster nodes), you can move to that
directory yourself, using something like

    $ echo 'pwd && cd $PBS_O_WORKDIR && pwd' | qsub
    2709.n0.physics.drexel.edu
     … time passes …
    $ cat STDIN.o2709
    /home/sysadmin
    /home/sysadmin/howto/cluster/pbs_queues

Note that if we had enclosed the `echo` argument in double quotes
(`"`), we would have had to escape the `$` symbol (writing
`\$PBS_O_WORKDIR`) so that it survives the shell expansion and makes
it safely into `qsub`'s input.

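You can see the difference the quoting makes without submitting anything, by capturing what `echo` would hand to `qsub`. This small sketch runs in any Bash shell. The variable names are arbitrary.

```shell
# Compare what qsub would receive on stdin under three quoting styles.
# PBS_O_WORKDIR is normally unset in your login shell, so an unescaped
# reference expands to an empty string before qsub ever sees it.
unset PBS_O_WORKDIR
single=$(echo 'cd $PBS_O_WORKDIR && pwd')       # single quotes: literal $
escaped=$(echo "cd \$PBS_O_WORKDIR && pwd")     # double quotes, escaped $
unescaped=$(echo "cd $PBS_O_WORKDIR && pwd")    # double quotes, expanded now
printf '%s\n' "$single" "$escaped" "$unescaped"
```

The third string would make the job run `cd` with no argument. That sends it back to your home directory, which is exactly where it started.
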
Long jobs
---------

If you have jobs that may take longer than the default walltime
(currently 1 hour), you will need to tell the job manager.  Walltimes
may seem annoying, since you don't really know how long a job will run
for, but they protect the cluster from people running broken programs
that waste nodes looping around forever without accomplishing
anything.  Therefore, your walltime doesn't have to be exactly, or even
close to, your actual job execution time.  Before submitting millions
of long jobs, it is a good idea to submit a timing job to see how long
your jobs should run for.  Then set the walltime a factor of 10 or so
higher.  For example

    $ echo "time (sleep 30 && echo 'Running a job...')" | qsub -j oe
    2710.n0.physics.drexel.edu
     … time passes …
    $ cat STDIN.o2710
    Running a job...

    real  0m30.013s
    user  0m0.000s
    sys  0m0.000s
    $ echo "sleep 30 && echo 'Running a job...'" | qsub -l walltime=15:00
    2711.n0.physics.drexel.edu
    $ qstat -f | grep '[.]walltime'

You can set walltimes in `[[H:]M:]S` format, where the numbers of
hours, minutes, and seconds are non-negative integers.  I passed
`-j oe` to join the `stderr` stream into `stdout` (so both end up in
the `.o` file), because `time` prints to `stderr`.  Walltimes are only
accurate on the order of minutes and above, but you probably shouldn't
be batch queueing jobs that take less time anyway.

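The timing-then-padding recipe above is easy to script. Below is a sketch with two hypothetical helpers, neither of which is a PBS command. `real_seconds` reads Bash `time` output, like the `STDIN.o2710` file above, and rounds the `real` time up to whole seconds. `walltime_for` multiplies that by a safety factor and prints an `H:MM:SS` string. The factor defaults to 10, following the rule of thumb above.

```shell
# Hypothetical helper: read Bash `time` output on stdin and print the
# "real" time rounded up to whole seconds (e.g. 0m30.013s -> 31).
real_seconds() {
    awk '$1 == "real" {
        split($2, t, /[ms]/)      # "0m30.013s" -> t[1]="0", t[2]="30.013"
        s = t[1] * 60 + t[2]
        r = (s == int(s)) ? s : int(s) + 1
        print r
    }'
}

# Hypothetical helper: pad a run time (whole seconds) by a safety factor
# and format it as H:MM:SS for qsub's -l walltime option.
walltime_for() {
    local seconds="$1" factor="${2:-10}"
    local total=$(( seconds * factor ))
    printf '%d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# Usage on the cluster (not run here):
#   secs=$(real_seconds < STDIN.o2710)
#   echo "sleep 30 && echo 'Running a job...'" | qsub -l walltime=$(walltime_for "$secs")
```

`walltime_for` uses Bash integer arithmetic, so pass it whole seconds (which is what `real_seconds` produces).
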
Job dependencies
----------------

You will often find yourself in a situation where the execution of one
job depends on the output of another job.  For example, `jobA` and
`jobB` generate some data, and `jobC` performs some analysis on that
data.  It wouldn't do for `jobC` to go firing off as soon as there was
a free node, if there was no data available yet to analyze.  We can
deal with *dependencies* like these by passing a
`-W depend=<dependency-list>` option to `qsub`.  The dependency list
can get pretty fancy (see `man qsub`), but for the case outlined
above, we'll only need `afterany` dependencies (because `jobC` should
execute after jobs `A` and `B` have finished, whether or not they
succeeded).

Looking at the `man` page, the proper format for our dependency list
is `afterany:jobid[:jobid...]`, so we need to catch the job IDs output
by `qsub`.  We'll use [[Bash's|Bash]] command substitution
(`$(command)`) for this.

    $ AID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && echo \"we're in\" > A_out" | qsub)
    $ BID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && pwd > B_out" | qsub)
    $ CID=$(echo "cd \$PBS_O_WORKDIR && cat A_out B_out" | qsub -W depend=afterany:$AID:$BID -o C_out)
    $ echo -e "A: $AID\nB: $BID\nC: $CID"
    A: 2712.n0.physics.drexel.edu
    B: 2713.n0.physics.drexel.edu
    C: 2714.n0.physics.drexel.edu
    $ qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    2712.n0                   STDIN            sysadmin               0 R batch
    2713.n0                   STDIN            sysadmin               0 R batch
    2714.n0                   STDIN            sysadmin               0 H batch
     … time passes …
    $ cat C_out
    we're in
    /home/sysadmin/howto/cluster/pbs_queues

Note that we have to escape the `PBS_O_WORKDIR` expansion so that the
variable substitution occurs when the job runs, and not when the `echo`
command runs.

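If you build dependency lists often, a tiny helper saves some typing. This is a sketch: `afterany` is a hypothetical Bash function. It simply joins its arguments with colons, giving the `afterany:jobid[:jobid...]` format quoted above from the man page.

```shell
# Hypothetical helper: join job IDs into a "depend=afterany:..." string
# suitable for qsub's -W option.
afterany() {
    local IFS=:        # "$*" joins the arguments with the first IFS char
    echo "depend=afterany:$*"
}

# Usage on the cluster (not run here):
#   CID=$(echo "cd \$PBS_O_WORKDIR && cat A_out B_out" | qsub -W "$(afterany "$AID" "$BID")" -o C_out)
```
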
Job arrays
----------

If you have *lots* of jobs you'd like to submit at once, it is tempting to try

    $ for i in $(seq 1 5); do JOBID=`echo "echo 'Running a job...'" | qsub`; done

This does work, but it puts quite a load on the server as the number
of jobs gets large.  To allow the execution of such repeated commands,
the batch server provides *job arrays*.  You simply pass `qsub` the
`-t array_request` option, listing the range or list of IDs for which
you'd like to run your command.  Each member of the array can read its
own ID from the `PBS_ARRAYID` environment variable.

    $ echo "sleep 30 && echo 'Running job \$PBS_ARRAYID...'" | qsub -t 1-5
    2721.n0.physics.drexel.edu
    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2721-1.n0         STDIN-1          sysadmin               0 R batch
    2721-2.n0         STDIN-2          sysadmin               0 R batch
    2721-3.n0         STDIN-3          sysadmin               0 R batch
    2721-4.n0         STDIN-4          sysadmin               0 R batch
    2721-5.n0         STDIN-5          sysadmin               0 R batch

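A common pattern is for each array member to pick its own input using `PBS_ARRAYID`. The sketch below is hypothetical: the parameter values and the `job_body` name are made up. Outside the queue, it fakes `PBS_ARRAYID` with a loop so you can check the mapping before submitting. Inside a real array job you would need only the body, since the batch server sets `PBS_ARRAYID` for you.

```shell
# Hypothetical job body: each array member picks its own parameter
# from PBS_ARRAYID (1-5 with "qsub -t 1-5").
job_body() {
    local params=(0.1 0.2 0.5 1.0 2.0)       # made-up parameter values
    local p="${params[PBS_ARRAYID - 1]}"     # IDs 1-5 -> indices 0-4
    echo "job $PBS_ARRAYID uses parameter $p"
}

# Local dry run: fake the IDs the batch server would assign.
for PBS_ARRAYID in 1 2 3 4 5; do
    job_body
done
```
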
One possibly tricky issue is depending on a job array.  If you have an
analysis job that you need to run to compile the results of your whole
array, try

    $ JOBID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && pwd && echo 1 > val\${PBS_ARRAYID}_out" | qsub -t 1-5)
    $ sleep 2  # give the job a second to load in...
    $ JOBNUM=$(echo $JOBID | cut -d. -f1)
    $ COND="depend=afterany"
    $ for i in $(seq 1 5); do COND="$COND:$JOBNUM-$i"; done
    $ echo "cd \$PBS_O_WORKDIR && awk 'BEGIN{s=0}{s+=\$0}END{print s}' val*_out" | \
          qsub -o sum_out -W $COND
    2723.n0.physics.drexel.edu

    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2722-1.n0         STDIN-1          sysadmin               0 R batch
    2722-2.n0         STDIN-2          sysadmin               0 R batch
    2722-3.n0         STDIN-3          sysadmin               0 R batch
    2722-4.n0         STDIN-4          sysadmin               0 R batch
    2722-5.n0         STDIN-5          sysadmin               0 R batch
    2723.n0           STDIN            sysadmin               0 H batch
    $ cat sum_out
    5

Note that you must create any files needed by the dependent jobs
*during* the early jobs.  The dependent job may start as soon as the
early jobs finish, *before* the `stdout` and `stderr` files for some
early jobs have been written.  Sadly, depending on the array as a
whole (using either the returned job ID or just its numeric portion)
doesn't seem to work, so you have to list every member of the array.

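The dependency has to name every member of the array, so the `COND` loop above can be wrapped in a small function. This is a sketch: `array_afterany` is a hypothetical helper. It assumes member IDs of the `<number>-<index>` form shown in the `qstat` listing above (e.g. `2722-1`).

```shell
# Hypothetical helper: build "depend=afterany:N-first:...:N-last" for a
# job array whose qsub ID is $1 and whose indices run from $2 to $3.
array_afterany() {
    local jobnum="${1%%.*}" first="$2" last="$3"
    local cond="depend=afterany" i
    for (( i = first; i <= last; i++ )); do
        cond="$cond:$jobnum-$i"
    done
    echo "$cond"
}

# Usage on the cluster (not run here):
#   ... | qsub -o sum_out -W "$(array_afterany "$JOBID" 1 5)"
```
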
It is important that the jobs on which you depend are loaded into the
server *before your depending job is submitted*.  To ensure this, you
may need to add a reasonable sleep time between submitting your job
array and submitting your dependency.  However, your depending job
will also hang if some early jobs have *already finished* by the time
you get around to submitting it.  In practice, this is not much of a
problem, because your jobs will likely be running for at least a few
minutes, giving you a large window during which you can submit your
dependent job.

See the examples sections and `man qsub` for more details.

<a name="qstat" />

Querying
========

You can get information about currently running and queued jobs with
`qstat`.  In the examples in the other sections, we've been using bare
`qstat`s to get information about the status of jobs in the queue.
You can get information about a particular job with

    $ JOBID=`echo "sleep 30 && echo 'Running a job...'" | qsub`
    $ sleep 2 && qstat $JOBID
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2724.n0           STDIN            sysadmin               0 R batch

and you can get detailed information on every job (or on a
particular one, see the previous example) with the `-f` (full) option.

    $ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
    $ sleep 2
    $ qstat -f
    Job Id: 2725.n0.physics.drexel.edu
        Job_Name = STDIN
        Job_Owner = sysadmin@n0.physics.drexel.edu
        job_state = R
        queue = batch
        server = n0.physics.drexel.edu
        Checkpoint = u
        ctime = Thu Jun 26 13:58:54 2008
        Error_Path = n0.physics.drexel.edu:/home/sysadmin/STDIN.e2725
        exec_host = n8/0
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
        Mail_Points = a
        mtime = Thu Jun 26 13:58:55 2008
        Output_Path = n0.physics.drexel.edu:/home/sysadmin/STDIN.o2725
        Priority = 0
        qtime = Thu Jun 26 13:58:54 2008
        Rerunable = True
        Resource_List.nodect = 1
        Resource_List.nodes = 1
        Resource_List.walltime = 01:00:00
        session_id = 18020
        Variable_List = PBS_O_HOME=/home/sysadmin,PBS_O_LANG=en_US.UTF-8,
      PBS_O_LOGNAME=sysadmin,
      PBS_O_PATH=/home/sysadmin/bin:/usr/local/bin:/usr/local/sbin:/usr/bin
      :/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/usr/local/maui/bin:/home/sysadmi
      n/script:/home/sysadmin/bin:.,PBS_O_MAIL=/var/mail/sysadmin,
      PBS_O_SHELL=/bin/bash,PBS_SERVER=n0.physics.drexel.edu,
      PBS_O_HOST=n0.physics.drexel.edu,
      PBS_O_WORKDIR=/home/sysadmin/,
      PBS_O_QUEUE=batch
        etime = Thu Jun 26 13:58:54 2008
        start_time = Thu Jun 26 13:58:55 2008
        start_count = 1

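Sometimes you only want one attribute out of that listing, as with the `grep '[.]walltime'` in the long-jobs example. A short `awk` filter can print just its value. This is a sketch: `qstat_field` is a hypothetical helper. For values that `qstat -f` wraps across several lines, such as `Variable_List`, it only sees the first line.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# value of one attribute, e.g. Resource_List.walltime or exec_host.
qstat_field() {
    awk -v key="$1" '$1 == key && $2 == "=" {
        sub(/^[^=]*= /, "")   # drop the leading "    key = "
        print
    }'
}

# Usage on the cluster (not run here):
#   qstat -f "$JOBID" | qstat_field exec_host
```
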
The `qstat` command gives you lots of information about the current
state of a job, but to get a history you should use the `tracejob`
command.

    $ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
    $ sleep 2 && tracejob $JOBID

    Job: 2726.n0.physics.drexel.edu

    06/26/2008 13:58:57  S    enqueuing into batch, state 1 hop 1
    06/26/2008 13:58:57  S    Job Queued at request of sysadmin@n0.physics.drexel.edu, owner = sysadmin@n0.physics.drexel.edu, job name = STDIN, queue = batch
    06/26/2008 13:58:58  S    Job Modified at request of root@n0.physics.drexel.edu
    06/26/2008 13:58:58  S    Job Run at request of root@n0.physics.drexel.edu
    06/26/2008 13:58:58  S    Job Modified at request of root@n0.physics.drexel.edu

You can also get the status of the queue itself by passing the `-q` option to `qstat`

    $ qstat -q

    server: n0

    Queue            Memory CPU Time Walltime Node  Run Que Lm  State
    ---------------- ------ -------- -------- ----  --- --- --  -----
    batch              --      --       --      --    2   0 --   E R
                                                   ----- -----
                                                       2     0

or the status of the server with the `-B` option.

    $ qstat -B
    Server             Max   Tot   Que   Run   Hld   Wat   Trn   Ext Status
    ----------------   ---   ---   ---   ---   ---   ---   ---   --- ----------
    n0.physics.drexe     0     2     0     2     0     0     0     0 Active

You can get information on the status of the various nodes with
`qnodes` (a symlink to `pbsnodes`).  The output of `qnodes` is bulky
and not of public interest, so we will not reproduce it here.  For
more details on flags you can pass to `qnodes`/`pbsnodes`, see
`man pbsnodes`, but I haven't had any need for fanciness yet.

<a name="qalter" />

Altering and deleting jobs
==========================

Minor glitches in submitted jobs can be fixed by altering the job with `qalter`.
For example, incorrect dependencies may be causing a job to hold in the queue forever.
We can remove these invalid holds with

    $ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub -W depend=afterok:3)
    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2725.n0           STDIN            sysadmin               0 R batch
    2726.n0           STDIN            sysadmin               0 R batch
    2727.n0           STDIN            sysadmin               0 H batch
    $ qalter -h n $JOBID
    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2725.n0           STDIN            sysadmin               0 R batch
    2726.n0           STDIN            sysadmin               0 R batch
    2727.n0           STDIN            sysadmin               0 Q batch

`qalter` is a Swiss-army-knife command, since it can change many
aspects of a job.  The specific hold-release case above could also
have been handled with the `qrls` command.  There are a number of
other `q*` commands which provide detailed control over jobs and
queues, but I haven't had to use them yet.

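If several jobs are stuck in the `H` state, you can pick them out of the `qstat` listing instead of copying IDs by hand. This is a sketch. `held_jobs` is a hypothetical helper that assumes the default `qstat` column layout shown above, with the owner in the third column and the state in the second-to-last. It prints bare job numbers, which I believe `qalter` and `qrls` accept. Check the list before releasing anything, since a job may be held for a good reason.

```shell
# Hypothetical helper: read `qstat` output on stdin and print the job
# number of every held (state H) job owned by $1 (default: $USER).
held_jobs() {
    awk -v user="${1:-$USER}" '
        NR > 2 && $3 == user && $(NF - 1) == "H" {
            sub(/\..*/, "", $1)   # 2727.n0 -> 2727
            print $1
        }'
}

# Usage on the cluster (not run here):
#   for id in $(qstat | held_jobs); do qalter -h n "$id"; done
```
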
If you decide a job is beyond repair, you can kill it with `qdel`.
For obvious reasons, you can only kill your own jobs, unless you're an
administrator.

    $ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
    $ qdel $JOBID
    $ echo "deleted $JOBID"
    deleted 2728.n0.physics.drexel.edu
    $ qstat
    Job id            Name             User            Time Use S Queue
    ----------------- ---------------- --------------- -------- - -----
    2725.n0           STDIN            sysadmin               0 R batch
    2726.n0           STDIN            sysadmin               0 R batch
    2727.n0           STDIN            sysadmin               0 R batch

Further reading
===============

I used to have a number of scripts and hacks put together to make it
easy to run my [[sawsim]] Monte Carlo simulations and set up dependent
jobs to process the results.  This system was never particularly
elegant.  Over time, I gained access to a number of SMP machines, as
well as my multi-host cluster.  In order to support more general
parallelization and post-processing, I put together a general manager
for embarrassingly parallel jobs.  There are implementations using a
range of parallelizing tools, from multi-threading through PBS and
MPI.  See the [sawsim source][sawsim-manager] for details.


[sec.qsub]: #qsub
[sec.qstat]: #qstat
[sec.qalter]: #qalter
[sawsim-manager]: http://git.tremily.us/?p=sawsim.git;a=tree;f=pysawsim/manager;hb=HEAD