Zeno, named for the Greek philosopher Zeno of Citium, is a cluster with GPU accelerators and high-speed (InfiniBand) connections between nodes. Like the other HPC training resources, it is intended for educating people on advanced computing. It can be used for limited research computations when no training is ongoing.
There are 8 compute nodes, each with 12 processing cores, 24 GB of RAM, and an NVIDIA Tesla M2075 GPU.
The InfiniBand network and the GPUs provide very fast processing. To take advantage of the processing power of the GPUs, a program must be compiled with the CUDA libraries. The CUDA 4.2 environment, including OpenCL, is currently available on Zeno.
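For example, once the CUDA module is loaded, a CUDA source file (here a hypothetical example.cu) can be compiled with the nvcc compiler:
module load nvidia/cuda
nvcc -o example example.cu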
Zeno is intended to provide training for larger systems like parallel.westgrid.ca, WestGrid's GPU cluster.
Connections are made to Zeno via the ssh protocol.
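For example, from a terminal on your own machine (replace username with your own user name; the hostname shown here is an assumption, so confirm the address with the system administrators):
ssh username@zeno.usask.ca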
The "module" command controls which environments are loaded at each shell invocation for the default installed versions of mpi on the cluster.
The modules available on Zeno include openmpi/1.6 and nvidia/cuda.
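The standard module commands can be used to explore what is installed; module avail lists everything available and module list shows what is loaded in the current shell:
module avail
module list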
Use the "initadd" command to add the appropriate modules. The following is recommended:
module initadd openmpi/1.6 nvidia/cuda
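Note that module initadd only takes effect in new login shells. To load the same modules in the current session, and to confirm that they are loaded, you can run:
module load openmpi/1.6 nvidia/cuda
module list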
Cluster jobs are submitted the same way as on Socrates. One important exception: if your submission requires a GPU, you need to request the whole node in order to reserve the GPU. This is done by asking for exclusive (single-job) access to the node: qsub -W x="NACCESSPOLICY:SINGLEJOB" myjob.pbs
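For example, since each node has 12 cores, a GPU job can be given a whole node by combining the single-job access policy with a request for all 12 processors on one node (myjob.pbs stands in for your own script):
qsub -l nodes=1:ppn=12 -W x="NACCESSPOLICY:SINGLEJOB" myjob.pbs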
Do not request more than 2 nodes without prior approval from the system administrators.
Here is an example job script, diffuse.pbs, for running an OpenMPI program named diffuse. The directive #PBS -l nodes=3:ppn=2 requests 2 processors on each of 3 nodes, 6 cores in total. The directive #PBS -l mem=2GB requests that 2 GB (gigabytes) of RAM be allocated to the job.
#!/bin/sh
#Sample PBS Script for use with OpenMPI on Zeno
#Jason Hlady
# Begin PBS directives for defaults
# All torque (batching scheduler commands) begin with #PBS for historical reasons
# Default is for serial job: one processor on one node
# can override this with qsub at the command line, or alter it in the script
# in the form of nodes=X:ppn=Y
# X = number of computers : Y = number of processors per computer
#PBS -l nodes=3:ppn=2
#PBS -l mem=2GB
# There are other directives to control the maximum time your job will take.
# These are walltime and cput. Both use the format hours:minutes:seconds (hh:mm:ss)
# This would stop your job after 3 days: #PBS -l walltime=72:00:00
# This would stop your job after 200 cpu-hours (total): #PBS -l cput=200:00:00
# There are also maximum values that cannot be overridden.
# Job name which will show up in queue, job output
# Remove the second # from the line below and change the job name
##PBS -N myjobname
#Optional: join error and output into one stream
#PBS -j oe
#------------------------------------------------------
# Debugging section for PBS
echo "Node $PBS_NODEFILE :"
echo "---------------------"
cat $PBS_NODEFILE
echo "---------------------"
echo "Shell is $SHELL"
NUM_PROCS=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $NUM_PROCS processors."
echo "which mpirun = `which mpirun`"
#-------------------------------------------------------
### Run the application
# shows what node the app started on--useful for serial jobs
echo `hostname`
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Starting run at: `date`"
# change the program name "./diffuse" in the line below (between the hash marks)
# to the name of your executable
############
mpirun --hostfile $PBS_NODEFILE ./diffuse
############
echo "Program finished with exit code $? at: `date`"
exit 0
To submit the script diffuse.pbs to the batch job system, use the qsub command as shown below:
qsub diffuse.pbs
To check on the status of all the jobs on the system, type:
qstat
To limit the listing to show just the jobs associated with your user name, type:
qstat -u username
To delete a job, use the qdel command with the job ID assigned by qsub:
qdel jobid
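Putting these commands together, a typical session might look like the following (the job ID 1234 is illustrative; use the ID that qsub prints for your job):
qsub diffuse.pbs
qstat -u username
qdel 1234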