Running Jobs on Plato
All jobs on Plato must be run through the Slurm scheduler so that the system performs properly. No jobs should be run on the "head node", nor should jobs be started directly on the compute nodes.
Batch job scripts are UNIX shell scripts: text files of commands for the UNIX shell to interpret, similar to what you could type directly at a keyboard. They also contain special comment lines that hold Slurm directives.
You should always specify the amount of RAM your job needs per node; otherwise, the default amount is used. At the time of this writing, the default is 1940 MB. The script below has an example memory allocation.
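If you want to confirm the current default yourself, Slurm's configuration can be queried from a login session. This is a minimal sketch, assuming the standard Slurm client tools are on your path; the exact parameter shown (DefMemPerCPU or DefMemPerNode) depends on how the scheduler is configured:
# Show Slurm's configured default memory values (parameter names vary by site)
scontrol show config | grep -i defmem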
Executables built with the OpenMPI library can be launched in batch scripts simply with the srun command.
PBS equivalent or similar | SLURM directive | Purpose | Example
--- | --- | --- | ---
-l walltime=hhh:mm:ss | --time=hhh:mm:ss | Requests the maximum amount of time the job can run. The job is aborted when this time is reached. If no time is specified, a default value is assigned (currently 20 minutes). | #SBATCH --time=15:20:00 requests 15 hours and 20 minutes.
-l nodes=N:ppn=n | --nodes=N --tasks-per-node=n | Requests the number of nodes and the number of tasks/processes per node allocated to the job. By default, SLURM allocates 1 core per task unless --cpus-per-task is specified. | #SBATCH --nodes=10 #SBATCH --tasks-per-node=4 requests 10 nodes with 4 cores each.
N/A | --nodes=N --tasks-per-node=n --cpus-per-task=c | Requests the number of nodes, the number of tasks/processes per node, and the number of cores per task allocated to the job. The job can start n processes on each node, and each process can run on c cores (with c threads, for example). | #SBATCH --nodes=10 #SBATCH --tasks-per-node=4 #SBATCH --cpus-per-task=2 requests 10 nodes with 4 tasks/processes each, and each process can use 2 cores (80 cores allocated in total).
-l nodes=N:ppn=n:gpus=g | --nodes=N --tasks-per-node=n --gres=gpu:g | Requests the number of nodes, the number of cores per node, and the number of GPUs per node allocated to the job. | #SBATCH --nodes=2 #SBATCH --gres=gpu:1 #SBATCH --tasks-per-node=8 requests 2 nodes with 1 GPU and 8 cores each.
-l procs=X | --ntasks=X | Requests the number of tasks/processes allocated to the job on an unspecified number of nodes (SLURM may place the cores on any number of nodes, from 1 to X, so that the job starts as soon as possible). | #SBATCH --ntasks=100 requests 100 cores that will be distributed across the cluster.
-l pmem=m | --mem-per-cpu=m | Requests the maximum amount of memory, in megabytes, assigned to any core in the job. m must be an integer; it can also be given in gigabytes by appending G. The default value is currently 1940 MB. | #SBATCH --mem-per-cpu=4G requests 4 gigabytes per core.
-l mem=M | N/A | There is no direct SLURM equivalent; see --mem below. | 
N/A | --mem=M | Requests the maximum amount of memory, in megabytes, allocated per node. M must be an integer; it can also be given in gigabytes by appending G. | #SBATCH --mem=18G requests 18 gigabytes of memory on each node allocated to the job.
-N <jobname> | --job-name=jobname | Sets the name your job will have in the scheduler. This name is shown, for example, by qstat or squeue. The default job name is the name of the submission script. | #SBATCH --job-name=exp1 sets exp1 as the name of the job.
-o <outputname> | --output=outputname | Specifies the file to which SLURM directs the standard output and standard error of the job. The default output log file name is slurm-<jobID>.out. | #SBATCH --output=results1 sets results1 as the name of the output log file.
-j oe | N/A | In SLURM, standard output and standard error are joined in the same file by default. This can be changed by specifying --error. | 
N/A | --error=errorname | Specifies the file to which SLURM directs the error output of the job. The default name is slurm-<jobID>.out. | #SBATCH --error=errors1 sets errors1 as the name of the error log file.
PBS_O_WORKDIR | SLURM_SUBMIT_DIR | This environment variable contains the directory from which the job was submitted. By default, a job starts running in this directory, so there is no longer any need to cd $SLURM_SUBMIT_DIR. Note that PBS_O_WORKDIR is no longer defined and can cause problems if you still use it. | 
PBS_JOBID | SLURM_JOB_ID | This environment variable contains the ID of the job allocation. | 
qsub <my_script> | sbatch <my_script> | Submits the job described by the batch script my_script. | sbatch experiment1 submits the script experiment1.
mpirun <my_mpi_code> | srun <my_mpi_code> | srun is the native launcher of parallel processes in SLURM. We strongly recommend using srun instead of mpirun or mpiexec. Note that there is no need to specify the number of processes; this information is taken from the job request. | #SBATCH --ntasks=25 #SBATCH --cpus-per-task=3 srun vasp starts 25 vasp processes across the cluster and allows each process to fork into 3 parallel threads.
pbsdsh -v <my_script>, pbsdsh2 -v <my_script> | srun <my_script> | srun can also run multiple copies of a script or binary in parallel. Use the environment variable SLURM_PROCID (the equivalent of PBS_VNODENUM) to determine the process number assigned to each copy; see the sketch after this table. | #SBATCH --nodes=2 #SBATCH --tasks-per-node=8 srun exp1 runs 16 copies of the script exp1 (8 instances on each node). The variable SLURM_PROCID takes the values 0, 1, 2, ..., 15.
qstat | squeue | Shows a list of all jobs in the queues. With squeue the jobs are ordered by priority, not by order of submission as with qstat. | squeue
qstat -f <jobID> | scontrol show job <jobID>, scontrol show job <jobID> -dd | Lists detailed information about the job with identification number jobID. | scontrol show job 321 obtains detailed information about job 321.
qstat -u <nsid> | squeue -u <nsid> | Lists all jobs of user nsid. | squeue -u abc123 lists all jobs of user abc123 in the queues.
showbf, showq | N/A | There is no direct equivalent to these commands in SLURM, but there are different options for obtaining the same information; some of them are described below. | 
 | sprio | Shows the priority of queued jobs and the different factors that compose it. | sprio
 | sshare | Shows the usage of a user and their group. It also shows the fairshare value used to calculate the fairshare component of the priority of jobs submitted by the user. | sshare
 | sinfo --states=idle | Shows a list of all idle nodes per partition (queue). | sinfo --states=idle
 | squeue --start | Shows the list of all queued jobs with their estimated start times. | squeue --start -u abc123 lists all queued jobs of user abc123 with their estimated start times.
 | sbatch --test-only <my_script> | Shows the estimated start time of the job described in my_script without submitting it. | sbatch --test-only experiment1 determines the estimated start time of the job described in experiment1 as if it were submitted now; the job is never submitted.
 | scancel <jobID> | Cancels/deletes the job with identification number jobID. | scancel 321 cancels job 321.
 | scancel -u <nsid> | Cancels/deletes all jobs of user nsid. | scancel -u abc123 cancels all jobs of user abc123.
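As referenced in the srun <my_script> row above, here is a minimal sketch of a per-copy worker script that uses SLURM_PROCID to pick its own piece of work. The script name (task.sh), the program name (my_program), and the input/output file naming are hypothetical placeholders, not Plato conventions:
#!/bin/bash
# task.sh - hypothetical worker script; launch many copies with: srun ./task.sh
# srun sets SLURM_PROCID to 0, 1, 2, ... for each copy it starts.
echo "Copy ${SLURM_PROCID} running on host $(hostname)"
# Each copy works on its own (hypothetical) input file and writes its own log.
./my_program input_${SLURM_PROCID}.dat > output_${SLURM_PROCID}.log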
Here is an example of a job script, diffuse.slurm, to run an OpenMPI program named diffuse.
#!/bin/bash
# Sample Slurm Script for use with OpenMPI on Plato
# Begin Slurm directives with #SBATCH
#SBATCH --nodes=3
#SBATCH --tasks-per-node=8
#SBATCH --mem=2G
#SBATCH --time=8:00:00
#SBATCH --job-name=diffuse_example
echo "Starting run at: `date`"
srun ./diffuse
echo "Program finished with exit code $? at: `date`"
exit 0
The directives
#SBATCH --nodes=3
#SBATCH --tasks-per-node=8
request 8 processors/cores on each of 3 computers/nodes: 24 cores in total. The next directive
#SBATCH --mem=2G
requests 2GB (2 gigabytes) of RAM per node for this job. The maximum time for this job is requested with
#SBATCH --time=8:00:00
The name that the job will have is specified with
#SBATCH --job-name=diffuse_example
srun is used to distribute the OpenMPI application, diffuse, across the allocated resources. To submit the script, use the sbatch command as shown below:
sbatch diffuse.slurm
To check on the status of all the jobs on the system, type:
squeue
To limit the listing to show just the jobs associated with your user name, type:
squeue -u username
To delete a job, use the scancel command with the jobid assigned by Slurm:
scancel jobid
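Putting these commands together, a typical submit-check-cancel sequence might look like the following sketch. It assumes the diffuse.slurm script from the example above; --parsable is a standard sbatch option that prints only the job ID so it can be captured in a shell variable:
jobid=$(sbatch --parsable diffuse.slurm)   # submit and capture the job ID
squeue -j "$jobid"                         # check the status of this job only
scancel "$jobid"                           # cancel the job if it is no longer needed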