SLURM stands for the "Simple Linux Utility for Resource Management" and is a job scheduler and workload manager for high performance computing clusters.
It's sometimes easiest to get started by trying out some commands.
What jobs are currently running?
squeue
What jobs am I currently running?
squeue -u username
Launch an interactive session on one node with 16 cores:
srun -N 1 -n 16 --pty bash
Launch a batch job on one node with 16 cores:
sbatch -N 1 -n 16 script.sh
Cancel a batch job:
scancel jobID
Cancel all my jobs:
scancel -u username
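As a quick sketch of how these commands fit together (the job ID, username, and script name below are placeholders):

# Submit a batch script; sbatch prints the new job ID
sbatch script.sh
# Show only your own jobs in the queue
squeue -u username
# Inspect one job in detail (state, allocated nodes, reason if pending)
scontrol show job 123456
# Cancel the job if something looks wrong
scancel 123456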
Here are a few more detailed examples.
Example batch file with directives that reserve one node in the default partition, with 16 cores, a one-hour time limit, and exclusive use of the node:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --time=1:00:00
#SBATCH --exclusive
<<shell commands that set up and run the job>>
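As a minimal sketch of what the placeholder section might contain, assuming a site-provided gcc module and a program called my_program (both hypothetical):

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 16
#SBATCH --time=1:00:00
#SBATCH --exclusive

# Hypothetical setup: load whatever toolchain the site provides
module load gcc
# Run the (placeholder) program on the allocated cores
srun ./my_program

Submitting the file with sbatch returns a job ID, and by default SLURM writes the job's stdout and stderr to slurm-<jobid>.out in the directory the job was submitted from.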
The following tools are useful for interacting with or otherwise making use of SLURM.
- JobMaker is a small interface that a center can deploy, customized to its slurm.conf. Since slurm.conf is readable on all nodes, a user can just as well generate the data for the tool themselves.
- JobStats makes it easy for users to see the status of their jobs and which of the requested resources were actually used.
- doppler is a web application that complements JobStats, showing job efficiency and resource wastage at the user and account level.
- smanage is a tool developed at Harvard to help with managing job arrays.