[[!meta title="Notes on using and managing clusters"]]

Using
-----

Our cluster runs the open source [Torque][]/[Maui][] [[portable batch
scheduling system|PBS]] (PBS). A batch scheduler takes user-submitted
jobs and distributes them across the cluster in an intelligent manner,
so users don't need to worry about sharing resources fairly or
[[ssh]]ing into compute nodes to start their jobs.

Users submit jobs to the queue using `qsub`. I've compiled my own
[[brief intro|pbs_queues]] to `qsub`, and there are lots more floating
about the internet. A minimal job script is sketched in the examples
at the end of this page.

While PBS queues are great for distributing embarrassingly parallel
jobs across the cluster, your application may need processes running
on separate compute nodes to share data. A common approach is to use
the [[Message Passing Interface|MPI]] (MPI). Our cluster uses the
[MPICH2][mpich2] implementation. Cluster-aware applications written
with MPI can be started through Torque using an [alternate
mpiexec][mpiexec] from the [Ohio Supercomputer Center][OSC]; an
example job script is also sketched below. There is a nice, brief
[introduction][] by Kristina Wanous at the [University of Northern
Iowa][UNI].

Managing
--------

Our cluster (9 dual-core nodes) runs [Debian][]. The compute nodes all
boot to [[NFS roots|nfs_root]] off the server node. Once that hurdle
was cleared, setting up Torque, Maui, MPICH2, and mpiexec was pretty
simple, mostly the usual:

    wget ...
    tar ...
    configure ...
    make
    make install

with a bit of configuring for our setup (the last two examples below
give the flavor of it). I'll put up some more detailed notes and our
config options when I get the time.

[[!tag tags/linux]]
[[!tag tags/hardware]]

[Torque]: http://www.clusterresources.com/pages/products/torque-resource-manager.php
[Maui]: http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php
[mpich2]: http://www.mcs.anl.gov/research/projects/mpich2/
[mpiexec]: http://www.osc.edu/~djohnson/mpiexec/
[OSC]: http://www.osc.edu/
[introduction]: http://debianclusters.cs.uni.edu/index.php/MPICH_with_Torque_Functionality
[UNI]: http://www.uni.edu/
[Debian]: http://www.debian.org/
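
Examples
--------

The snippets below are illustrative sketches rather than copies of our
actual files; job names, hostnames, addresses, and paths are all
placeholders.

First, a bare-bones `qsub` job script:

    #!/bin/sh
    # hello.sh -- a minimal Torque/PBS job script (illustrative).
    # Name the job "hello", request one node and five minutes of walltime.
    #PBS -N hello
    #PBS -l nodes=1
    #PBS -l walltime=00:05:00
    # Torque starts jobs in $HOME, so change to the directory the job
    # was submitted from.
    cd $PBS_O_WORKDIR
    echo "Running on $(hostname)"

Submit it with `qsub hello.sh`, watch the queue with `qstat`, and look
for the job's stdout in `hello.o<jobid>` once it finishes.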
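
To run an MPI program through Torque with OSC's mpiexec, the job
script barely changes. Here `my_mpi_program` is a placeholder for a
binary built with MPICH2's `mpicc` (e.g. `mpicc -o my_mpi_program
my_mpi_program.c`):

    #!/bin/sh
    # mpi-job.sh -- illustrative MPI job script.
    # Request four of the dual-core nodes, both cores on each.
    #PBS -N mpi-job
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=00:30:00
    cd $PBS_O_WORKDIR
    # OSC's mpiexec gets the node allocation from Torque itself, so no
    # -machinefile or explicit process count is needed.
    mpiexec ./my_mpi_program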
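
On the management side, the NFS-root setup has [[its own
notes|nfs_root]], but the core of it is an export on the server node
plus kernel parameters on the compute nodes. The addresses and paths
here are made up:

    # /etc/exports on the server node
    /srv/nfsroot    192.168.1.0/24(ro,no_root_squash,no_subtree_check)

    # kernel command line handed to the compute nodes at boot
    root=/dev/nfs nfsroot=192.168.1.1:/srv/nfsroot ip=dhcp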
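
Finally, the "bit of configuring" for Torque amounts to telling the
server which nodes it owns and defining a queue. This sketch assumes
the default install paths and made-up hostnames; Maui then makes the
actual scheduling decisions:

    # /var/spool/torque/server_priv/nodes -- one line per compute node;
    # np=2 matches our dual-core machines.
    node01 np=2
    node02 np=2

    # Initialize a fresh server database and define a default queue.
    pbs_server -t create
    qmgr -c "create queue batch queue_type=execution"
    qmgr -c "set queue batch enabled = true"
    qmgr -c "set queue batch started = true"
    qmgr -c "set server default_queue = batch"
    qmgr -c "set server scheduling = true"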