Slurm controller configuration

Slurm seems to be a hot topic, and a lot of people arrive here to check it out, so by popular demand I feel I need to post a little bit more about it. To save beginners some searching, I'll send them to my first post, Slurm on CentOS 7. Now that you know how to install it, and you may even have a basic partition system working, let's discuss some aspects of the slurm.conf file. From now on, please remember to format the file properly, since I'm just showing you my notes, not working solutions.

The core of Slurm is resource management, and our resources are the nodes. Let's say we want to add 3 nodes with different hardware (here, different CPU counts) but consecutive names. The obvious approach is to list them, like this:

# COMPUTE NODES
NodeName=node101 NodeAddr=192.168.1.1 CPUs=20 State=UNKNOWN
NodeName=node102 NodeAddr=192.168.1.2 CPUs=40 State=UNKNOWN
NodeName=node103 NodeAddr=192.168.1.3 CPUs=30 State=UNKNOWN
# PARTITION NAMES
PartitionName=test Nodes=node101,node102,node103 Default=YES MaxTime=INFINITE State=UP

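If the nodes share the same hardware, we can also declare them in a compact way. Note that a single NodeName line cannot give each node a different CPU count, which is why the CPUs= values are missing here.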

# COMPUTE NODES
NodeName=node[101-103] NodeAddr=192.168.1.[1-3] State=UNKNOWN
# PARTITION NAMES
PartitionName=debug Nodes=node1[01-03] MaxTime=INFINITE State=UP
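To see what a compact expression such as node1[01-03] stands for, Slurm itself provides scontrol show hostnames 'node1[01-03]'. The sketch below only illustrates the idea in plain bash: expand_hostlist is a hypothetical helper, not part of Slurm, and it handles a single prefix[lo-hi] range (real hostlists also allow commas and several bracket groups).

```bash
#!/usr/bin/env bash
# Hypothetical, simplified illustration of Slurm hostlist expansion.
# Supports one "prefix[lo-hi]" range only; anything else is echoed back.
expand_hostlist() {
  local spec="$1"
  local re='^(.*)\[([0-9]+)-([0-9]+)\]$'
  if [[ "$spec" =~ $re ]]; then
    local prefix="${BASH_REMATCH[1]}"
    local lo="${BASH_REMATCH[2]}" hi="${BASH_REMATCH[3]}"
    local width=${#lo}                 # keep zero padding: "01" -> width 2
    local i
    # 10# forces base 10, so "08" or "09" is not read as octal
    for (( i = 10#$lo; i <= 10#$hi; i++ )); do
      printf '%s%0*d\n' "$prefix" "$width" "$i"
    done
  else
    echo "$spec"                       # no range: return the name unchanged
  fi
}

expand_hostlist 'node1[01-03]'
expand_hostlist 'node[101-103]'
```

Both calls print node101, node102 and node103, one per line, which is why node1[01-03] and node[101-103] describe the same three machines.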

Both ways leave Slurm to manage the node resources more or less freely. With whole-node allocation (for example SelectType=select/linear), this ends up as a node-locking configuration: if you submit a job, in principle it will go to one node and run there. The node will then be marked as allocated, and no more jobs will be allowed on that machine, even if there are free resources available! So what to do instead? I suppose you have some knowledge of your hardware, so the answer is to declare the resources per node and let Slurm allocate individual cores (SelectType=select/cons_res with SelectTypeParameters=CR_Core) instead of whole nodes. You can still use the compact notation. Let's say we now want to add two more nodes, nodea and nodeb, with the same hardware, which includes two GPUs per server. Our slurm.conf configuration should look very similar to the following:

# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
DefMemPerCPU=8192
GresTypes=gpu

# COMPUTE NODES
NodeName=nodea,nodeb RealMemory=515000 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 Feature=gpu,opteron Weight=10 State=UNKNOWN Gres=gpu:TXP:2

# PARTITION NAMES
PartitionName=ALL Nodes=nodea,nodeb,node1[01-03] Default=YES MaxTime=INFINITE State=UP
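Before restarting anything, it is worth checking that the numbers in the node definition are consistent. The sketch below redoes the arithmetic for nodea and nodeb with the values copied from the example: Sockets x CoresPerSocket x ThreadsPerCore gives the CPUs per node, and CPUs x DefMemPerCPU is the memory a job would get by default if it used a whole node. The helper names are hypothetical; on a real node, slurmd -C prints the detected hardware in slurm.conf format and is a better source for these values.

```bash
#!/usr/bin/env bash
# Sanity-check arithmetic for the example nodea/nodeb definition.
# Values are copied from the example slurm.conf (assumption: your hardware differs).
cpus_per_node() {            # Sockets * CoresPerSocket * ThreadsPerCore
  echo $(( $1 * $2 * $3 ))
}
full_node_default_mem() {    # MB a whole-node job gets at DefMemPerCPU
  echo $(( $1 * $2 ))
}

cpus=$(cpus_per_node 4 8 1)                 # 32 CPUs on each GPU node
total=$(( cpus * 2 ))                       # nodea + nodeb = 64 CPUs
mem=$(full_node_default_mem "$cpus" 8192)   # 32 * 8192 = 262144 MB
real_mem=515000                             # RealMemory from slurm.conf (MB)

echo "CPUs per node: $cpus, on both GPU nodes: $total"
if (( mem <= real_mem )); then
  echo "A full node at DefMemPerCPU needs ${mem} MB: fits in RealMemory=${real_mem}"
else
  echo "A full node at DefMemPerCPU needs ${mem} MB: exceeds RealMemory=${real_mem}"
fi
```

With these values, a job using every core of one node at the default 8 GB per CPU asks for 262144 MB, comfortably below RealMemory.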

What is this gres thing? GRES stands for generic resources, and it is how Slurm declares and shares devices such as GPUs. To use them, we need to add a file to the Slurm configuration directory, /etc/slurm/. This file is gres.conf, and it looks like this:

Name=gpu Type=TXP File=/dev/nvidia0
Name=gpu Type=TXP File=/dev/nvidia1
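On servers with more GPUs, writing gres.conf by hand gets tedious. The following is a hypothetical helper (make_gres_conf is not a Slurm tool) that prints one line per device file in the same format as the two lines above. Given no devices, it assumes the NVIDIA device nodes are named /dev/nvidia0, /dev/nvidia1, and so on.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one gres.conf line per GPU device file.
# Usage: make_gres_conf TYPE [DEVICE...]
# Without devices, it globs /dev/nvidia[0-9]* (this skips /dev/nvidiactl;
# note that glob order is lexical, so nvidia10 sorts before nvidia2).
make_gres_conf() {
  local type="$1"; shift
  local devices=("$@")
  if (( ${#devices[@]} == 0 )); then
    shopt -s nullglob                  # no match -> empty list, not the pattern
    devices=(/dev/nvidia[0-9]*)
    shopt -u nullglob
  fi
  local dev
  for dev in "${devices[@]}"; do
    printf 'Name=gpu Type=%s File=%s\n' "$type" "$dev"
  done
}

# Reproduces the two-GPU example; redirect to /etc/slurm/gres.conf on the node.
make_gres_conf TXP /dev/nvidia0 /dev/nvidia1
```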

As you can see, the type you declare in the slurm.conf node definition (Gres=gpu:TXP:2) is the Type you set in gres.conf. In our example, TXP stands for Titan XP, but you can name it however you like. After making sure the new slurm.conf is on all 5 nodes, and gres.conf is on our two GPU nodes, we restart the slurmd daemons on all the nodes and the slurmctld controller.
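Copying the files and restarting the daemons can be scripted. This is a minimal dry-run sketch under stated assumptions: it runs on the controller host, root can reach every node over SSH without a password, the configuration lives in /etc/slurm/, and the systemd units are called slurmctld and slurmd. The node names come from the examples in this post; deploy_slurm_conf and run are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical deployment sketch: push slurm.conf to every node, gres.conf to
# the GPU nodes, then restart slurmctld (locally) and slurmd (on each node).
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [[ "$DRY_RUN" == 1 ]]; then echo "$*"; else "$@"; fi
}

deploy_slurm_conf() {
  local all_nodes=(node101 node102 node103 nodea nodeb)
  local gpu_nodes=(nodea nodeb)
  local n
  for n in "${all_nodes[@]}"; do
    run scp /etc/slurm/slurm.conf "root@${n}:/etc/slurm/slurm.conf"
  done
  for n in "${gpu_nodes[@]}"; do
    run scp /etc/slurm/gres.conf "root@${n}:/etc/slurm/gres.conf"
  done
  run systemctl restart slurmctld      # assumes this host is the controller
  for n in "${all_nodes[@]}"; do
    run ssh "root@${n}" systemctl restart slurmd
  done
}

deploy_slurm_conf
```

Keeping the dry run as the default lets you review the 13 commands before anything touches the cluster.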

Now resource sharing is explicit, since Slurm doesn't have to figure out which resources to give you. The catch is that when you launch a job, you need to ask specifically for the resources you want, for example like this:

srun -n 40 --gres=gpu:TXP:2 --constraint=gpu --pty bash -l

The above command gives you a bash login shell on nodea and nodeb (the ones with Feature=gpu, which we requested as a constraint) with 40 cores in total. Since each of these nodes has only 4x8=32 cores, the 40 tasks are spread over both nodes, and because --gres is counted per node, you get 2 GPUs on each of them. If another job is launched, it can use the resources that are still free at that moment: the free cores out of the 64 in total (4x8x2=64, so only 24 remain) and any GPUs not yet allocated. Note that, as pointed out in my Slurm user cheatsheet, you can also "cheat" the DefMemPerCPU set in slurm.conf (8192 MB, that is 8 GB of RAM per CPU, in the example above) by requesting memory explicitly, for example with --mem-per-cpu.
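The resource bookkeeping for the srun example can be checked with a bit of arithmetic. This is a rough model, not Slurm's scheduler: it assumes one core per task, 32 cores and 2 GPUs per GPU node, and --gres counted per node.

```bash
#!/usr/bin/env bash
# Rough model (assumption, not Slurm's scheduler) of:
#   srun -n 40 --gres=gpu:TXP:2 --constraint=gpu ...
cores_per_node=32
gpus_per_node=2
gpu_nodes=2

ntasks=40            # -n 40, one core per task
gres_per_node=2      # --gres=gpu:TXP:2 applies to every allocated node

nodes_used=$(( (ntasks + cores_per_node - 1) / cores_per_node ))   # ceil(40/32) = 2
gpus_used=$(( nodes_used * gres_per_node ))                         # 2 per node -> 4
free_cores=$(( cores_per_node * gpu_nodes - ntasks ))               # 64 - 40 = 24
free_gpus=$(( gpus_per_node * gpu_nodes - gpus_used ))              # 4 - 4 = 0

echo "nodes=$nodes_used gpus=$gpus_used free_cores=$free_cores free_gpus=$free_gpus"
```

It prints nodes=2 gpus=4 free_cores=24 free_gpus=0: the job spans both GPU nodes and holds all four GPUs, so under this model a second GPU job would wait, while CPU-only jobs could still use the 24 free cores.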

What else do I need to say about Slurm now? I'll tell you: how about making a safer install, where two servers run the controller? Yes, that will be a good one. And how about the job accounting features? It would be nice to know how to manage those too. So see you soon!


About bitsanddragons

A traveller, an IT professional and a casual writer
This entry was posted in bits, centos, hardware, linux, slurm.
