How To Set Up SLURM on Linux

What is SLURM?

SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager for Linux clusters. It schedules jobs, allocates resources to them, and tracks the state of the nodes.


1. Controller (Master) Node

1.1 Install SLURM

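This assumes that a configured pacman repository provides the slurm-llnl package. On a stock Arch Linux system it is typically distributed through the AUR instead.

sudo pacman -S slurm-llnl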

1.2 Create SLURM User

The slurm user owns the files and directories that SLURM uses. It should exist with the same UID on every node.

sudo useradd -r -s /usr/bin/nologin slurm

1.3 Configure slurm.conf

This file must be identical on all nodes.

ClusterName=cluster
SlurmctldHost=archlinux

ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPort=6817
SlurmdPort=6818
SlurmUser=slurm
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmctld

TaskPlugin=task/affinity,task/cgroup

SchedulerType=sched/backfill
SelectType=select/cons_tres

SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log

NodeName=node1 NodeAddr=192.168.144.132 CPUs=2 RealMemory=3500 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
NodeName=node2 NodeAddr=192.168.144.175 CPUs=2 RealMemory=3500 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN

PartitionName=debug Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP

Save this file as /etc/slurm-llnl/slurm.conf.
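
In the NodeName lines above, CPUs equals Sockets x CoresPerSocket x ThreadsPerCore, and a mismatch is usually a typo. A minimal sketch of a check for that, assuming a POSIX shell with awk and that all four keys are present on each NodeName line. check_node_cpus is a hypothetical helper, not part of SLURM:

```shell
# check_node_cpus FILE
# For every NodeName line in a slurm.conf-style file, compare CPUs with
# Sockets * CoresPerSocket * ThreadsPerCore. Prints one line per node and
# returns non-zero if any node is inconsistent.
# Assumption: each NodeName line sets all four keys explicitly.
check_node_cpus() {
    awk '
        /^NodeName=/ {
            name = ""
            cpus = sockets = cores = threads = 0
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "NodeName")       name = kv[2]
                if (kv[1] == "CPUs")           cpus = kv[2]
                if (kv[1] == "Sockets")        sockets = kv[2]
                if (kv[1] == "CoresPerSocket") cores = kv[2]
                if (kv[1] == "ThreadsPerCore") threads = kv[2]
            }
            expected = sockets * cores * threads
            if (cpus == expected) {
                print name ": ok (" cpus " CPUs)"
            } else {
                print name ": CPUs=" cpus " but topology gives " expected
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: check_node_cpus /etc/slurm-llnl/slurm.conf. To find the real values for a machine, running slurmd -C on a compute node prints the hardware that slurmd detects in NodeName format.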

1.4 Controller Directories

The controller needs two directories:

  • /var/spool/slurmctld --> live controller state, used for recovery after a restart
  • /var/log/slurm --> log files, for history and debugging

sudo mkdir -p /var/log/slurm
sudo mkdir -p /var/spool/slurmctld
sudo touch /var/log/slurm/slurmctld.log
sudo chown -R slurm:slurm /var/spool/slurmctld /var/log/slurm
sudo chmod 750 /var/log/slurm /var/spool/slurmctld
sudo chmod 640 /var/log/slurm/*.log
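
The ownership and modes set above can be confirmed before starting the daemon. A minimal sketch, assuming GNU stat (coreutils). check_path is a hypothetical helper introduced here for illustration:

```shell
# check_path PATH OWNER MODE
# Compare the owner and octal mode of PATH with the expected values.
# Uses GNU stat: %U is the owner name, %a the mode in octal.
check_path() {
    actual=$(stat -c '%U %a' "$1") || return 1
    if [ "$actual" = "$2 $3" ]; then
        echo "ok: $1"
    else
        echo "mismatch: $1 is '$actual', expected '$2 $3'"
        return 1
    fi
}
```

Usage: check_path /var/spool/slurmctld slurm 750. Checking the log file inside /var/log/slurm requires root, because the directory is not searchable by other users.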

1.5 Start slurmctld

sudo systemctl enable slurmctld
sudo systemctl start slurmctld

2. Compute Nodes

2.1 Copy Configuration

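Run this on the controller. The /srv/nfsroot paths in this section assume that the compute nodes mount their root filesystem from that directory on the controller.

sudo cp /etc/slurm-llnl/slurm.conf /srv/nfsroot/etc/slurm-llnl/slurm.conf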
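
Because slurm.conf must be identical on all nodes, it can help to compare the copy with the original after every change. A minimal sketch using cmp. same_config is a hypothetical helper name:

```shell
# same_config A B
# Succeed only if the two files are byte-for-byte identical.
same_config() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "DIFFERENT: $1 $2"
        return 1
    fi
}
```

Usage: same_config /etc/slurm-llnl/slurm.conf /srv/nfsroot/etc/slurm-llnl/slurm.conf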

2.2 Create compute node directories

sudo mkdir -p /srv/nfsroot/var/spool/slurmd
sudo mkdir -p /srv/nfsroot/var/log/slurm
sudo touch /srv/nfsroot/var/log/slurm/slurmd.log
sudo chown -R slurm:slurm /srv/nfsroot/var/spool/slurmd /srv/nfsroot/var/log/slurm
sudo chmod 750 /srv/nfsroot/var/spool/slurmd /srv/nfsroot/var/log/slurm
sudo chmod 640 /srv/nfsroot/var/log/slurm/*.log

2.3 Start slurmd

Run these commands on each compute node.

sudo systemctl enable slurmd
sudo systemctl start slurmd

2.4 Verify Cluster

On the controller, return the nodes to service and then check their state.

sudo scontrol update NodeName=node[1-2] State=RESUME
sinfo
scontrol show nodes
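
The node states reported by sinfo can also be checked from a script. A minimal sketch, assuming a POSIX shell with awk. report_unhealthy is a hypothetical helper, and the set of states treated as healthy is an assumption to adjust for your site:

```shell
# report_unhealthy
# Reads "node state" pairs on stdin, one per line, and prints every node
# whose state is not exactly idle, mixed, allocated, or completing.
# A state with a suffix such as "idle*" (node not responding) is reported.
# Returns non-zero if at least one node was reported.
report_unhealthy() {
    awk '
        $2 !~ /^(idle|mixed|allocated|completing)$/ {
            print $1 " is " $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: sinfo -N -h -o '%N %T' | report_unhealthy. Here -N lists one node per line, -h drops the header, and the %N and %T format fields print the node name and its state.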