The Slurm batch scheduler allows users to attach optional free-text fields to jobs, such as the job name or a comment. These fields are not interpreted by Slurm and have no influence on its behavior. They can hold any job-related information and can be manipulated through standard Slurm commands. For example, they can be useful for differentiating, filtering, classifying or annotating jobs.
In this post, we explore an approach that uses the comment field to attach structured metadata to jobs and enrich accounting reports with custom information.
The job comment can be set at submission time with the --comment argument:
$ sbatch --comment hello --wrap "sleep 60"
Submitted batch job 943
It is then visible with the scontrol show job $ID command:
$ scontrol show job 943
JobId=943 JobName=wrap
…
Comment=hello
…
It is also possible to set the comment after the job has been submitted, with the scontrol update job $ID command.
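For example, reusing the job submitted above, the comment can be replaced with an arbitrary new value:
$ scontrol update job 943 comment=goodbye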
By default, Slurm keeps the comment only as long as the job is in the scheduling queue (pending or running). Once the job is finished, the comment is lost.
However, it is possible to configure Slurm to save the comment in the SlurmDBD accounting database, by enabling this setting in the slurm.conf configuration file:
AccountingStoreFlags=job_comment
Job comments then persist after jobs end and can be retrieved with the standard Slurm accounting command sacct.
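For instance, here is a minimal sketch of how the stored comment could be retrieved for the job submitted earlier, assuming a sacct version that supports the Comment format field:
$ sacct -j 943 --format=JobID,Comment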
The comment field can store any form of text, without constraint on its format. In particular, this text can be a serialized representation of a data structure, such as JSON.
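For example, a small JSON object can be passed directly at submission time, reusing the --comment argument shown earlier (the mesh key and its value are arbitrary illustrations):
$ sbatch --comment '{"mesh": 42}' --wrap "sleep 60"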
Here is an example of a minimalist batch shell script comment.sh that sets, at the end of its execution, a comment containing an associative array with a mesh key and a random integer value:
#!/bin/sh
# compute stuff here
MESH=$(shuf -i 0-1000 -n 1)
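# save the result as JSON in the job's comment field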
scontrol update job $SLURM_JOB_ID comment="{\"mesh\": ${MESH}}"
For example, submit two jobs with this batch script:
$ sbatch comment.sh
Submitted batch job 894
$ sbatch comment.sh
Submitted batch job 895
It is then possible to extract this metadata from the accounting database, by piping the output of the sacct command into the jq utility:
$ sacct --json | jq '.jobs | map({id: .job_id, user: .user, cores: .required.CPUs, meta: .comment.job|fromjson })'
[
{
"id": 894,
"user": "remi",
"cores": 1,
"meta": {
"mesh": 760
}
},
{
"id": 895,
"user": "remi",
"cores": 1,
"meta": {
"mesh": 720
}
}
]
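This also enables simple aggregations. As a sketch, the average mesh value of the two jobs above can be computed directly with jq, restricting the job filter to jobs carrying a JSON comment (fromjson would fail otherwise):
$ sacct -j 894,895 --json | jq '[.jobs[].comment.job | fromjson | .mesh] | add / length'
740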
Here is a more advanced Python example script comment.py that updates the comment at the end of its execution with an associative array of 3 keys holding values of different types:
 1  #!/usr/bin/python3
 2  import signal
 3  import time
 4  import atexit
 5  import sys
 6  import os
 7  import subprocess
 8  import random
 9  import json
10
11  def save_metadata():
12      """Save computation metadata in Slurm job's comment."""
13      job_id = os.getenv('SLURM_JOB_ID')
14      metadata = {
15          'mesh': random.randrange(0, 1000),
16          'complexity': random.random(),
17          'tag': random.choice(['choose', 'among', 'three']),
18      }
19      cmd = ['scontrol', 'update', 'job', job_id, f"comment={json.dumps(metadata)}"]
20      print(f"Saving metadata in Slurm job {job_id} comment field")
21      subprocess.run(cmd)
22
23
24  def handle_timeout(signum, frame):
25      """Signal handler which stops the computation."""
26      signame = signal.Signals(signum).name
27      print(f"Signal {signame} ({signum}) received due to job timeout, saving "
28            "metadata and exiting properly")
29      sys.exit(0)
30
31
32  def main():
33      # Bind SIGUSR1, sent by Slurm to notify of the job's approaching time limit
34      signal.signal(signal.SIGUSR1, handle_timeout)
35      # Register save_metadata() to run just before the program exits
36      atexit.register(save_metadata)
37
38      # Start fake computation for 5 minutes
39      print("Starting computation")
40      time.sleep(300.)  # simulating long interruptible computation
41
42
43  if __name__ == '__main__':
44      main()
The script registers the save_metadata() function (l11) with the atexit module (l36) to properly handle error cases and interruptions by Slurm, typically when the time limit is reached or the job is preempted.
This script has an approximate execution time of 5 minutes. It is submitted to Slurm a first time with a 10-minute limit, giving it enough time to end normally, and a second time with a 3-minute limit and an instruction for Slurm to send the SIGUSR1 signal 60 seconds before the job's termination (the script is launched through srun so that it runs as a job step that receives the signal):
$ sbatch --time 10 --wrap "srun python3 -u comment.py"
Submitted batch job 10773
$ sbatch --time 3 --signal USR1@60 --wrap "srun python3 -u comment.py"
Submitted batch job 10774
Here are the job outputs in both cases:
$ cat slurm-10773.out
Starting computation
Saving metadata in Slurm job 10773 comment field
$ cat slurm-10774.out
Starting computation
Signal SIGUSR1 (10) received due to job timeout, saving metadata and exiting properly
Saving metadata in Slurm job 10774 comment field
In the first case, where the job ended normally, the save_metadata() function was executed at the end of the job. In the second case, where the job was interrupted because of its time limit, the script received the SIGUSR1 signal and executed the handle_timeout() signal handler, which stopped the script and eventually triggered the execution of the save_metadata() function.
The generated metadata can then be extracted from the Slurm accounting database with this command:
$ sacct --json | jq '.jobs | map({id: .job_id, user: .user, cores: .required.CPUs, meta: .comment.job|fromjson })'
[
{
"id": 10773,
"user": "remi",
"cores": 1,
"meta": {
"mesh": 376,
"complexity": 0.2924316126744422,
"tag": "among"
}
},
{
"id": 10774,
"user": "remi",
"cores": 1,
"meta": {
"mesh": 157,
"complexity": 0.7724739043178511,
"tag": "three"
}
}
]
This feature can be useful to associate various metadata with Slurm jobs, typically to generate additional custom metrics in cluster accounting reports.
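As a final sketch of such a custom metric, jq can for instance count jobs per tag value, here restricted to the two jobs above since their comments are valid JSON:
$ sacct -j 10773,10774 --json | jq '[.jobs[].comment.job | fromjson] | group_by(.tag) | map({tag: .[0].tag, jobs: length})'
[
  {
    "tag": "among",
    "jobs": 1
  },
  {
    "tag": "three",
    "jobs": 1
  }
]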