Running Jobs on Plato

All jobs on Plato must be run through the SLURM scheduler to ensure that the system performs properly. No jobs should be run on the "head node", nor should jobs be started directly on the working nodes.

Batch job scripts are UNIX shell scripts: text files of commands for the UNIX shell to interpret, similar to what you could execute by typing directly at a keyboard. They contain special comment lines that hold SLURM directives.

You should always specify the amount of RAM your job needs per node; otherwise, the default amount is used. At the time of this writing, the default is 1940 MB. The script below has an example memory allocation.

Executables compiled with OpenMPI can be launched from a batch script simply with the srun command. See the SLURM Quick Reference page for commonly used commands and directives.

Here is an example of a job script, diffuse.slurm, to run an OpenMPI program named diffuse:


#!/bin/bash
# Sample Slurm Script for use with OpenMPI on Plato
# Begin Slurm directives with #SBATCH

#SBATCH --nodes=3
#SBATCH --tasks-per-node=8
#SBATCH --mem=2G
#SBATCH --time=8:00:00
#SBATCH --job-name=diffuse_example

echo "Starting run at: $(date)"
srun ./diffuse
echo "Program finished with exit code $? at: $(date)"
exit 0
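
The sample script prints the exit code of srun but then ends with exit 0, so the batch script itself always exits successfully, even when diffuse fails. If you would rather have the script pass the real exit code back to Slurm, one possible approach is sketched below. The helper name run_and_report is hypothetical (it is not part of Slurm or Plato), and the command `true` stands in for `srun ./diffuse` so that the sketch can be tried anywhere.

```shell
#!/bin/bash
# run_and_report: hypothetical helper, not a Slurm command.
# Runs the given command, prints its exit code, and returns that same code.
run_and_report() {
    "$@"
    rr_status=$?
    echo "Program finished with exit code ${rr_status} at: $(date)"
    return "${rr_status}"
}

# Stand-in demo; in diffuse.slurm this line would be: run_and_report srun ./diffuse
run_and_report true
exit_code=$?

# In a real job script, finish with: exit "${exit_code}"   (instead of exit 0)
echo "Script would exit with code ${exit_code}"
```

With this pattern, a failing program makes the batch script exit with a non-zero code rather than always reporting success.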


 

The directives

#SBATCH --nodes=3
#SBATCH --tasks-per-node=8

request 8 processors/cores on each of 3 computers/nodes: 24 cores in total. The adjacent directive

#SBATCH --mem=2G 

requests 2 GB (2 gigabytes) of RAM on each node for this job. The maximum run time for this job, 8 hours here, is requested with

#SBATCH --time=8:00:00

The job name is specified with

#SBATCH --job-name=diffuse_example
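
To double-check what these directives add up to, the totals can be worked out with a little shell arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes that Slurm interprets 2G as 2048 MB and that memory on a node is shared evenly among the tasks running there.

```shell
#!/bin/bash
# Values copied from the #SBATCH directives in diffuse.slurm
nodes=3
tasks_per_node=8
mem_per_node_mb=$((2 * 1024))   # assumption: --mem=2G means 2048 MB per node

# Derived totals for the whole job
total_tasks=$((nodes * tasks_per_node))
mem_per_task_mb=$((mem_per_node_mb / tasks_per_node))
total_mem_mb=$((nodes * mem_per_node_mb))

echo "total tasks: ${total_tasks}"
echo "memory per task: ${mem_per_task_mb} MB"
echo "total memory: ${total_mem_mb} MB"
```

Under those assumptions each of the 24 tasks has about 256 MB available, which is worth comparing against what your program actually needs before submitting.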

 

srun is used to distribute the OpenMPI application, diffuse, across the allocated resources.

To submit the script, use the sbatch command: sbatch diffuse.slurm

To check on the status of all the jobs on the system, type: squeue

To limit the listing to show just the jobs associated with your user name, type: squeue -u <NSID>

To delete a job, use the scancel command with the jobid assigned by Slurm: scancel <jobid>
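
The submit, check, and delete steps above can be strung together in a small wrapper script. The following is a minimal sketch, not an official Plato tool: the function name job_id_from_sbatch_output is hypothetical, and it assumes sbatch prints its usual single-cluster confirmation message, "Submitted batch job <jobid>". The Slurm commands run only if sbatch and diffuse.slurm are present.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation
# message, assumed to have the form "Submitted batch job <jobid>".
job_id_from_sbatch_output() {
    # Split the message on whitespace; the job ID is the fourth word.
    set -- $1
    echo "$4"
}

# Only talk to the scheduler when sbatch and the job script are available.
if command -v sbatch >/dev/null 2>&1 && [ -f diffuse.slurm ]; then
    message=$(sbatch diffuse.slurm)
    jobid=$(job_id_from_sbatch_output "$message")
    echo "Submitted job ${jobid}"
    squeue -j "$jobid"      # show the status of this job only
    # scancel "$jobid"      # uncomment to delete the job
fi
```

Capturing the job ID at submission time saves looking it up in the squeue listing later when you want to check or cancel that particular job.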

Please see the SLURM documentation at https://slurm.schedmd.com/documentation.html for more commands and their usage.