[[!meta title="Batch Queue Job Control"]]
Submitting jobs
===============
You can submit jobs to the batch queue for later processing with
`qsub`. Batch queueing can get pretty fancy, so `qsub` comes with
lots of options (see `man qsub`). For the most part, you can trust
your sysadmin to have set up some good defaults, and not worry about
setting any options explicitly. As you get used to the batch queue
system, you'll want tighter control of how your jobs execute by
invoking more sophisticated options yourself, but don't let that scare
you off at the beginning. They are, after all, only *options*. This
paper will give you a good start on the options I find myself using
most often.
Simple submission
-----------------
The simplest example of a job submission is:
$ echo "sleep 30 && echo 'Running a job...'" | qsub
2705.n0.physics.drexel.edu
which submits a job executing `sleep 30 && echo 'Running a job...'`
to the queue. The job is assigned an identifying ID in the queue,
which `qsub` prints to `stdout`.
You can check the status of your job in the queue with `qstat`.
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2705.n0 STDIN sysadmin 0 Q batch
There is more information on `qstat` in the [qstat section][sec.qstat].
If your job is too complicated to fit on a single line, you can save
it in a script:
#!/bin/bash
# file: echo_script.sh
sleep 30
echo "a really,"
echo "really,"
echo "complicated"
echo "script"
and submit the script:
$ qsub echo_script.sh
2706.n0.physics.drexel.edu
All the arguments discussed in later sections for the command line
have comment-style analogs (`#PBS` directive lines) that you can place
in your script if you use the script-submission approach with `qsub`.
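For instance, here is a sketch of a job script that carries its
options as `#PBS` directive comments. The option values are
illustrative, and the `cd` fallback is my own addition so that the
script also runs outside the batch system:

```shell
#!/bin/bash
# qsub reads #PBS comment lines as if they were command-line options.
#PBS -N myjob
#PBS -l walltime=0:15:00
#PBS -j oe
# PBS_O_WORKDIR is only set when running under the batch system, so
# fall back to the current directory for standalone runs.
cd "${PBS_O_WORKDIR:-.}"
echo "Running a job in $PWD"
```

Submitting this with a plain `qsub myjob.sh` picks up the directives
without any extra command-line flags.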
Note that you *cannot* run executables directly with `qsub`. For
example
$ cat script.py
#!/usr/bin/python
print("hello world!")
$ qsub python script.py
will fail because `python` is an executable.
Either use
$ echo python script.py | qsub
wrap your [[Python]] script in a [[Bash]] script
$ cat wrapper.sh
#!/bin/bash
python script.py
$ qsub wrapper.sh
or run your Python script directly (relying on the shebang)
$ qsub script.py
IO: Job names and working directories
-------------------------------------
You will often be interested in the `stdout` and `stderr` output from
your jobs. The batch queue system saves this information for you (to
the directory from which you called `qsub`) in two files,
`<jobname>.o<jobID>` and `<jobname>.e<jobID>`. We have seen job IDs
before; they're just the numeric part of the `qsub` output
(or the first field in the `qstat` output). Job IDs are assigned by
the batch queue server, and are unique to each job. Job names are
assigned by the job submitter (that's you) and need not be unique.
They give you a method for keeping track of what job is doing what
task, since you have no control over the job ID. The combined
`<jobname>.<jobID>` pair is both unique (for the server) and
recognizable (for the user), which is why it's used to label the
output data from a given job. You control the job name by passing the
`-N <jobname>` argument to `qsub`.
$ echo "sleep 30 && echo 'Running a job...'" | qsub -N myjob
2707.n0.physics.drexel.edu
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2707.n0 myjob sysadmin 0 Q batch
Perhaps you are fine with `stdout` and `stderr`, but the default
naming scheme, even with the job name flexibility, is too restrictive.
No worries: `qsub` lets you specify exactly which files you'd like to
use with the unsurprisingly named `-o` and `-e` options.
$ echo "echo 'ABC' && echo 'DEF' > /dev/stderr" | qsub -o my_out -e my_err
2708.n0.physics.drexel.edu
… time passes …
$ cat my_out
ABC
$ cat my_err
DEF
A time will come when you are no longer satisfied with `stdout` and
`stderr` and you want to open your own files or, worse, run a program!
Because no sane person uses absolute paths all the time, we need to
know what directory we're in so we can construct our relative paths.
You might expect that your job will execute from the same directory
that you called `qsub` from, but that is not the case. I think the
reason is that that directory is not guaranteed to exist on the host
that eventually runs your program. In any case, your job will begin
executing in your home directory. Writing relative paths from your
home directory is about as annoying as writing absolute paths, so
`qsub` gives your script a nifty environment variable `PBS_O_WORKDIR`,
which is set to the directory you called `qsub` from. Since *you*
know that this directory exists on the hosts (since the home
directories are NFS mounted on all of our cluster nodes), you can move
to that directory yourself, using something like
$ echo 'pwd && cd $PBS_O_WORKDIR && pwd' | qsub
2709.n0.physics.drexel.edu
… time passes …
$ cat STDIN.o2709
/home/sysadmin
/home/sysadmin/howto/cluster/pbs_queues
Note that if we had enclosed the echo argument in double quotes (`"`),
we would have to escape the `$` symbol in our `echo` argument so that
it survives the shell expansion and makes it safely into `qsub`'s
input.
Long jobs
---------
If you have jobs that may take longer than the default wall time
(currently 1 hour), you will need to tell the job manager. Walltimes
may seem annoying, since you don't really know how long a job will run
for, but they protect the cluster from people running broken programs
that waste nodes looping around forever without accomplishing
anything. Therefore, your walltime doesn't have to match your actual
job execution time exactly, or even closely. Before submitting millions
of long jobs, it is a good idea to submit a timing job to see how long
your jobs should run for. Then set the walltime a factor of 10 or so
higher. For example
$ echo "time (sleep 30 && echo 'Running a job...')" | qsub -j oe
2710.n0.physics.drexel.edu
… time passes …
$ cat STDIN.o2710
Running a job...
real 0m30.013s
user 0m0.000s
sys 0m0.000s
$ echo "sleep 30 && echo 'Running a job...'" | qsub -l walltime=15:00
2711.n0.physics.drexel.edu
$ qstat -f | grep '[.]walltime'
You can set walltimes in `[[H:]M:]S` format, where the numbers of
hours, minutes, and seconds are positive integers. I passed the
`-j oe` option to the first command to combine its `stdout` and
`stderr` streams on `stdout`, because `time` prints to `stderr`.
Walltimes are only accurate on the order of minutes and above, but you
probably shouldn't be batch queueing jobs that take less time anyway.
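As a small aid for that factor-of-ten rule, here is a sketch of a
helper for scaling your timing run; `walltime_seconds` is my own
invention, not a batch-system command. It converts a `[[H:]M:]S`
walltime string into seconds:

```shell
# Convert a [[H:]M:]S walltime string to a number of seconds.
walltime_seconds () {
    local IFS=:          # split the argument on colons
    local total=0 field
    for field in $1; do
        # 10# forces base ten, so fields like "08" aren't read as octal
        total=$(( total * 60 + 10#$field ))
    done
    echo "$total"
}

walltime_seconds 15:00     # prints 900
```

Multiply the result by ten, convert back, and you have a comfortable
walltime for the production run.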
Job dependencies
----------------
You will often find yourself in a situation where the execution of one
job depends on the output of another job. For example, `jobA` and
`jobB` generate some data, and `jobC` performs some analysis on that
data. It wouldn't do for `jobC` to go firing off as soon as there was
a free node, if there was no data available yet to analyze. We can
deal with *dependencies* like these by passing a
`-W depend=<dependency_list>` option to `qsub`. The dependency list can
get pretty fancy (see `man qsub`), but for the case outlined above,
we'll only need `afterany` dependencies (because `jobC` should execute
after jobs `A` and `B`).
Looking at the `man` page, the proper format for our dependency list
is `afterany:jobid[:jobid...]`, so we need to catch the job IDs output
by `qsub`. We'll use [[Bash's|Bash]] command substitution
(`$(command)`) for this.
$ AID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && echo \"we're in\" > A_out" | qsub)
$ BID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && pwd > B_out" | qsub)
$ CID=$(echo "cd \$PBS_O_WORKDIR && cat A_out B_out" | qsub -W depend=afterany:$AID:$BID -o C_out)
$ echo -e "A: $AID\nB: $BID\nC: $CID"
A: 2712.n0.physics.drexel.edu
B: 2713.n0.physics.drexel.edu
C: 2714.n0.physics.drexel.edu
$ qstat
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
2712.n0 STDIN sysadmin 0 R batch
2713.n0 STDIN sysadmin 0 R batch
2714.n0 STDIN sysadmin 0 H batch
… time passes …
$ cat C_out
we're in
/home/sysadmin/howto/cluster/pbs_queues
Note that we have to escape the `PBS_O_WORKDIR` expansion so that the
variable substitution occurs when the job runs, and not when the echo
command runs.
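The ID-collecting pattern above is easy to wrap in a helper. Here is a
sketch (the `after_any` name is my own, not part of the queue system)
that builds the `-W depend=...` value from any number of captured job
IDs:

```shell
# Build a -W depend=afterany:... argument from the given job IDs.
after_any () {
    local deps="afterany" id
    for id in "$@"; do
        deps="$deps:$id"
    done
    echo "depend=$deps"
}

after_any 2712.n0 2713.n0    # prints depend=afterany:2712.n0:2713.n0
```

You would then submit the analysis job with something like
`qsub -W "$(after_any $AID $BID)" -o C_out`.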
Job arrays
----------
If you have *lots* of jobs you'd like to submit at once, it is tempting to try
$ for i in $(seq 1 5); do JOBID=`echo "echo 'Running a job...'" | qsub`; done
This does work, but it puts quite a load on the server as the number
of jobs gets large. To allow the execution of such repeated
commands, the batch server provides *job arrays*. You simply pass
`qsub` the `-t array_request` option, listing the range or list of IDs
for which you'd like to run your command.
$ echo "sleep 30 && echo 'Running job \$PBS_ARRAYID...'" | qsub -t 1-5
2721.n0.physics.drexel.edu
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2721-1.n0 STDIN-1 sysadmin 0 R batch
2721-2.n0 STDIN-2 sysadmin 0 R batch
2721-3.n0 STDIN-3 sysadmin 0 R batch
2721-4.n0 STDIN-4 sysadmin 0 R batch
2721-5.n0 STDIN-5 sysadmin 0 R batch
One possibly tricky issue is depending on a job array. If you have an
analysis job that you need to run to compile the results of your whole
array, try
$ JOBID=$(echo "cd \$PBS_O_WORKDIR && sleep 30 && pwd && echo 1 > val\${PBS_ARRAYID}_out" | qsub -t 1-5)
$ sleep 2 # give the job a second to load in...
$ JOBNUM=$(echo $JOBID | cut -d. -f1)
$ COND="depend=afterany"
$ for i in $(seq 1 5); do COND="$COND:$JOBNUM-$i"; done
    $ echo "cd \$PBS_O_WORKDIR && awk 'BEGIN{s=0}{s+=\$0}END{print s}' val*_out" | \
qsub -o sum_out -W $COND
2723.n0.physics.drexel.edu
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2722-1.n0 STDIN-1 sysadmin 0 R batch
2722-2.n0 STDIN-2 sysadmin 0 R batch
2722-3.n0 STDIN-3 sysadmin 0 R batch
2722-4.n0 STDIN-4 sysadmin 0 R batch
2722-5.n0 STDIN-5 sysadmin 0 R batch
2723.n0 STDIN sysadmin 0 H batch
$ cat sum_out
5
Note that you must create any files needed by the dependent jobs
*during* the early jobs. The dependent job may start as soon as the
early jobs finish, *before* the `stdout` and `stderr` files for some
early jobs have been written. Sadly, depending on either the returned
job ID or just its numeric portion doesn't seem to work.
It is important that the jobs on which you depend are loaded into the
server *before your depending job is submitted*. To ensure this, you
may need to add a reasonable sleep time between submitting your job
array and submitting your dependency. However, your depending job
will also hang if some early jobs have *already finished* by the time
you get around to submitting it. In practice, this is not much of a
problem, because your jobs will likely be running for at least a few
minutes, giving you a large window during which you can submit your
dependent job.
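The `$COND`-building loop above can be packaged the same way. A sketch
(the `array_depend` name is hypothetical), assuming the
`<jobnum>-<index>` sub-job naming that appears in the `qstat` output
above:

```shell
# Build an afterany dependency list covering every member of a job
# array, given the full array job ID and the number of sub-jobs.
array_depend () {
    local jobnum=${1%%.*}    # keep only the numeric part of the ID
    local count=$2
    local deps="depend=afterany" i
    for i in $(seq 1 "$count"); do
        deps="$deps:$jobnum-$i"
    done
    echo "$deps"
}

array_depend 2722.n0.physics.drexel.edu 5
# prints depend=afterany:2722-1:2722-2:2722-3:2722-4:2722-5
```

Remember the caveat above about submitting the dependent job while the
array jobs are still loaded and running.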
See the examples sections and `man qsub` for more details.
Querying
========
You can get information about currently running and queued jobs with
`qstat`. In the examples in the other sections, we've been using bare
`qstat`s to get information about the status of jobs in the queue.
You can get information about a particular job with
$ JOBID=`echo "sleep 30 && echo 'Running a job...'" | qsub`
$ sleep 2 && qstat $JOBID
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2724.n0 STDIN sysadmin 0 R batch
and you can get detailed information on every job (or a particular
one, see the previous example) with the `-f` (full) option.
$ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
$ sleep 2
$ qstat -f
Job Id: 2725.n0.physics.drexel.edu
Job_Name = STDIN
Job_Owner = sysadmin@n0.physics.drexel.edu
job_state = R
queue = batch
server = n0.physics.drexel.edu
Checkpoint = u
ctime = Thu Jun 26 13:58:54 2008
Error_Path = n0.physics.drexel.edu:/home/sysadmin/STDIN.e2725
exec_host = n8/0
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Thu Jun 26 13:58:55 2008
Output_Path = n0.physics.drexel.edu:/home/sysadmin/STDIN.o2725
Priority = 0
qtime = Thu Jun 26 13:58:54 2008
Rerunable = True
Resource_List.nodect = 1
Resource_List.nodes = 1
Resource_List.walltime = 01:00:00
session_id = 18020
Variable_List = PBS_O_HOME=/home/sysadmin,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=sysadmin,
PBS_O_PATH=/home/sysadmin/bin:/usr/local/bin:/usr/local/sbin:/usr/bin
:/usr/sbin:/bin:/sbin:/usr/X11R6/bin:/usr/local/maui/bin:/home/sysadmi
n/script:/home/sysadmin/bin:.,PBS_O_MAIL=/var/mail/sysadmin,
PBS_O_SHELL=/bin/bash,PBS_SERVER=n0.physics.drexel.edu,
PBS_O_HOST=n0.physics.drexel.edu,
PBS_O_WORKDIR=/home/sysadmin/,
PBS_O_QUEUE=batch
etime = Thu Jun 26 13:58:54 2008
start_time = Thu Jun 26 13:58:55 2008
start_count = 1
The `qstat` command gives you lots of information about the current
state of a job, but to get a history you should use the `tracejob`
command.
$ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
$ sleep 2 && tracejob $JOBID
Job: 2726.n0.physics.drexel.edu
06/26/2008 13:58:57 S enqueuing into batch, state 1 hop 1
06/26/2008 13:58:57 S Job Queued at request of sysadmin@n0.physics.drexel.edu, owner = sysadmin@n0.physics.drexel.edu, job name = STDIN, queue = batch
06/26/2008 13:58:58 S Job Modified at request of root@n0.physics.drexel.edu
06/26/2008 13:58:58 S Job Run at request of root@n0.physics.drexel.edu
06/26/2008 13:58:58 S Job Modified at request of root@n0.physics.drexel.edu
You can also get the status of the queue itself by passing the `-q` option to `qstat`
$ qstat -q
server: n0
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
batch -- -- -- -- 2 0 -- E R
----- -----
2 0
or the status of the server with the `-B` option.
$ qstat -B
Server Max Tot Que Run Hld Wat Trn Ext Status
---------------- --- --- --- --- --- --- --- --- ----------
n0.physics.drexe 0 2 0 2 0 0 0 0 Active
You can get information on the status of the various nodes with
`qnodes` (a symlink to `pbsnodes`). The output of `qnodes` is bulky
and not of public interest, so we will not reproduce it here. For
more details on the flags you can pass to `qnodes`/`pbsnodes`, see
`man pbsnodes`, but I haven't had any need for fanciness yet.
Altering and deleting jobs
==========================
Minor glitches in submitted jobs can be fixed by altering the job with `qalter`.
For example, incorrect dependencies may be causing a job to hold in the queue forever.
We can remove these invalid holds with
$ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub -W depend=afterok:3)
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2725.n0 STDIN sysadmin 0 R batch
2726.n0 STDIN sysadmin 0 R batch
2727.n0 STDIN sysadmin 0 H batch
$ qalter -h n $JOBID
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2725.n0 STDIN sysadmin 0 R batch
2726.n0 STDIN sysadmin 0 R batch
2727.n0 STDIN sysadmin 0 Q batch
`qalter` is a Swiss-army-knife command, since it can change many
aspects of a job. The specific hold-release case above could also
have been handled with the `qrls` command. There are a number of
other `q*` commands which provide detailed control over jobs and
queues, but I haven't had to use them yet.
If you decide a job is beyond repair, you can kill it with `qdel`.
For obvious reasons, you can only kill your own jobs, unless you're
an administrator.
$ JOBID=$(echo "sleep 30 && echo 'Running a job...'" | qsub)
$ qdel $JOBID
$ echo "deleted $JOBID"
deleted 2728.n0.physics.drexel.edu
$ qstat
Job id Name User Time Use S Queue
----------------- ---------------- --------------- -------- - -----
2725.n0 STDIN sysadmin 0 R batch
2726.n0 STDIN sysadmin 0 R batch
2727.n0 STDIN sysadmin 0 R batch
Further reading
===============
I used to have a number of scripts and hacks put together to make it
easy to run my [[sawsim]] Monte Carlo simulations and set up dependent
jobs to process the results. This system was never particularly
elegant. Over time, I gained access to a number of SMP machines, as
well as my multi-host cluster. In order to support more general
parallelization and post-processing, I put together a general manager
for embarrassingly parallel jobs. There are implementations using a
range of parallelizing tools, from multi-threading through PBS and
MPI. See the [sawsim source][sawsim-manager] for details.
[sec.qsub]: #qsub
[sec.qstat]: #qstat
[sec.qalter]: #qalter
[sawsim-manager]: http://git.tremily.us/?p=sawsim.git;a=tree;f=pysawsim/manager;hb=HEAD