Slurm user cheatsheet

It’s time to compile the knowledge acquired after running our SLURM cluster for several months. I will dump here the information we have so far. I know it’s available everywhere, but on principle, I want to have my own version of it 🙂

| What it does | Example |
| --- | --- |
| List cluster load (jobs running, pending, completing) | `squeue` |
| List all current jobs for a user | `squeue -u username` |
| List all running jobs for a user | `squeue -u username -t RUNNING` |
| List nodes and their state, grouped by partition | `sinfo` |
| List nodes and their state in detail, one line per node | `sinfo -lN` |
| List detailed information for a job (useful for troubleshooting) | `scontrol show jobid -dd JOBID` |
| Log in interactively (minimum resources, no node specified) | `srun --pty bash --login` |
| Log in interactively (no node specified, asking for GPU) | `srun --constraint=gpu --pty bash --login` |
| Log in interactively on NODENAME (minimum resources) | `srun --nodelist=NODENAME --pty bash --login` |
| Log in interactively on NODENAME, asking for 40 tasks | `srun -n 40 --nodelist=NODENAME --pty bash --login` |
| Submit job.sh (default values) | `sbatch job.sh` |
| Submit job.sh requesting 40 tasks | `sbatch --ntasks=40 job.sh` |
| Submit job.sh to any 4 nodes | `sbatch --nodes=4 job.sh` |
| Submit job.sh to node NODENAME | `sbatch --nodelist=NODENAME job.sh` |
| Submit job.sh asking for 30 GB of RAM per NODE | `sbatch --mem=30720 job.sh` |
| Submit job.sh asking for 8 GB of RAM per CPU | `sbatch --mem-per-cpu=8192 job.sh` |
| Submit job.sh asking for 30 GB of RAM per CPU | `sbatch --mem-per-cpu=30720 job.sh` |
| Submit job.sh to a GPU node, requesting 2 Titan X Pascal GPUs | `sbatch --gres=gpu:TXP:2 --constraint=gpu job.sh` |
| Cancel job with JOBID | `scancel JOBID` |
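
The memory values in the list above are plain numbers because `--mem` and `--mem-per-cpu` are read as megabytes when no unit suffix is given. A minimal sketch of the conversion, assuming a helper called `gb_to_mb` (a made-up name, not part of Slurm); it only prints the resulting command, it does not submit anything:

```shell
#!/bin/sh
# gb_to_mb is a hypothetical helper: it converts whole gigabytes into the
# megabyte count that --mem / --mem-per-cpu take when no suffix is used.
gb_to_mb() {
    gb="$1"
    echo $(( gb * 1024 ))
}

# 30 GB per node, as in the list above.
mem_mb="$(gb_to_mb 30)"

# Print the command instead of running it, so this works without a cluster.
echo "sbatch --mem=${mem_mb} job.sh"
```

Slurm also accepts a unit suffix, so `--mem=30G` should be an equivalent way to write the same request.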

The options that allocate resources MUST be given together, in a single command. For example, for submission:

 sbatch --ntasks=9 --nodes=3 \
   --ntasks-per-node=3 --gres=gpu:TXP:2 --constraint=gpu job.sh

and for interactive login:

 srun -n 40 --nodelist=NODENAME \
   --gres=gpu:TXP:2 --constraint=gpu --pty bash --login

Failing to do so will result in overbooked resources, or in a CRASH for lack of them.
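
When several task options are combined, as in the submission example, it helps to check that the numbers agree before submitting: the task count has to fit into nodes times tasks per node. A rough sanity check, assuming a helper called `check_tasks` (a made-up name, not a Slurm command):

```shell
#!/bin/sh
# check_tasks is a hypothetical helper: it succeeds when the requested number
# of tasks fits into (nodes * tasks per node), and fails otherwise.
check_tasks() {
    ntasks="$1"
    nodes="$2"
    per_node="$3"
    capacity=$(( nodes * per_node ))
    if [ "$ntasks" -le "$capacity" ]; then
        echo "ok: ${ntasks} tasks fit in ${nodes} nodes x ${per_node} tasks per node"
    else
        echo "mismatch: ${ntasks} tasks do not fit in ${capacity} slots" >&2
        return 1
    fi
}

# The values from the submission example: 9 tasks, 3 nodes, 3 tasks per node.
check_tasks 9 3 3
```

This only checks the arithmetic. Whether the cluster can actually provide those nodes and GPUs is still up to the scheduler.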

The options can also be included in your script, using the #SBATCH directive. For example:

#SBATCH --mem-per-cpu=2048
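
To show where such lines go, here is a minimal sketch of a script header. The job name and the resource values are placeholders, not a recommendation for any particular cluster. The #SBATCH lines must come before the first executable command, because sbatch stops reading them once the script body starts. The body only prints a summary, so it also runs harmlessly outside Slurm:

```shell
#!/bin/bash
# Hypothetical job name, used only for illustration.
#SBATCH --job-name=example
# 9 tasks spread over 3 nodes, 3 per node.
#SBATCH --ntasks=9
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=3
# 2048 MB of RAM per CPU.
#SBATCH --mem-per-cpu=2048

# Everything below is the job itself. SLURM_JOB_ID and SLURM_NTASKS are set by
# Slurm inside a job; the fallbacks apply when the script is run by hand.
summary="job ${SLURM_JOB_ID:-none} with ${SLURM_NTASKS:-1} task(s)"
echo "$summary"
```

Options given on the sbatch command line take precedence over the ones in the script, so the header works as a set of defaults.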

I don’t think you need a sample submission script at this stage, so I will not post one 🙂 Most of the ones you can find by googling will work, assuming you have all the software installed properly. In the future, I hope to write a post about the error messages you may get and what they mean… but so far, so good!

About bitsanddragons

A traveller, an IT professional and a casual writer
This entry was posted in bits, centos, linux, slurm.