Socrates is a cluster of 37 Sun Microsystems computers comprising 288 compute cores on 36 compute nodes; it was installed in July 2009.
Socrates exists as a High Performance Computing (HPC) platform for teaching, training, and research for University of Saskatchewan faculty, staff, and students.
1 head node: socrates.usask.ca
2 x Quad core Intel Xeon processors
8GB ECC RAM
1 TB of RAIDed storage
TORQUE/Moab scheduling software
8 capability nodes: compute-0-0 to compute-0-7
Sun Fire X4150
2 x Quad core Intel Xeon L5420 at 2.5GHz (8 cores)
32 GB ECC RAM
146 GB 10000 rpm SAS hard drive
1 Gb NIC
28 capacity nodes: compute-0-8 to compute-0-35
Sun Fire X2250
2 x Quad core Intel Xeon L5420 at 2.5GHz (8 cores)
8 GB ECC RAM
250 GB 7200 rpm SATA hard drive
1 Gb NIC
1 Gigabit Ethernet private network (48 port GigE switch)
RHEL 5.3 Linux/OSCAR clustering software
1 TB of RAIDed storage on head node
Local hard drives on nodes for scratch space
Accounts are created for classes and are removed after the classes have finished. Special accounts (not associated with a class) are available when a faculty supervisor sends a request to hpc_consult@usask.ca, subject to approval based on appropriateness of use.
Socrates is on the UofS local network and, for security reasons, cannot be reached directly from off-campus. Users can connect from off-campus by using the UofS VPN. Alternatively, a user can connect from off-campus by first connecting to homepage.usask.ca, then from there to Socrates via ssh.
ssh abc123@socrates.usask.ca
where abc123 is your NSID, and the password used on Socrates is your UofS NSID password.
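The two connection routes described above can be sketched as a small shell helper. The function name socrates_ssh_command and its on/off argument are illustrative assumptions, not part of any UofS tooling. The helper only prints the ssh command to run; the -t option asks ssh to allocate a terminal for the second hop.

```shell
# Hypothetical helper: print the ssh command for reaching Socrates.
# $1 = NSID, $2 = "on" (campus network or VPN) or "off" (via homepage.usask.ca)
socrates_ssh_command() {
    nsid="$1"
    location="${2:-on}"
    if [ "$location" = "off" ]; then
        # Two hops: log in to homepage.usask.ca, then ssh on to Socrates.
        echo "ssh -t ${nsid}@homepage.usask.ca ssh ${nsid}@socrates.usask.ca"
    else
        echo "ssh ${nsid}@socrates.usask.ca"
    fi
}

socrates_ssh_command abc123 on
socrates_ssh_command abc123 off
```

Replace abc123 with your own NSID and run the printed command in your terminal.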
Depending on the class, there may be class specific software, but most software will be located here:
/share/apps
Applications:
MATLAB - /usr/local/bin/matlab
Compilers:
/usr/bin/gcc
/usr/bin/g77
/usr/bin/gfortran
The Intel compiler suite (see below)
To configure your session to use the Intel compilers for "mpif90" and "mpicc", first ensure that no other MPI configuration is loaded:
module unload rocks-openmpi
Then load the correct configuration with the following command:
module load intel/xe12 intel/mpi_xe12
To have the correct libraries available when your programs run in the batch system, also register the modules with the "initadd" command:
module initadd intel/xe12 intel/mpi_xe12
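Taken together, the module commands above might be used as in the following sketch. It assumes a source file named diffuse.c (a hypothetical name used only for illustration) and guards each step, so on a machine without the module command it only prints a notice.

```shell
# Sketch: switch the current session to the Intel MPI toolchain, then compile.
# diffuse.c is a hypothetical source file name used for illustration.
if command -v module >/dev/null 2>&1; then
    module unload rocks-openmpi            # drop any other MPI configuration
    module load intel/xe12 intel/mpi_xe12  # Intel compilers and Intel MPI
    module list                            # confirm what is now loaded
    toolchain_status="loaded"
else
    echo "module command not found; run this on Socrates"
    toolchain_status="unavailable"
fi

# Compile only if the MPI wrapper and the source file are both present.
if command -v mpicc >/dev/null 2>&1 && [ -f diffuse.c ]; then
    mpicc -O2 -o diffuse diffuse.c
fi
```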
Note that the Intel compilers include the optimised Math Kernel Library with BLAS, LAPACK, ScaLAPACK, LINPACK and FFT libraries. This utility can help you determine the library linking procedure.
The "module" command controls which parallel processing environment is loaded at each shell invocation for the default installed versions of MPI on the cluster. If your class uses a different version, another module will apply. Unless you need another environment, the OpenMPI environment is recommended.
module initadd mpi_gnu
All jobs on Socrates must be run through the scheduler to ensure that the system performs properly. No jobs should be run on the head node, nor should jobs be started directly on the compute nodes.
Acknowledgements to WestGrid, on which this TORQUE quickstart guide was based.
Batch job scripts are UNIX shell scripts: text files of commands for the UNIX shell to interpret, similar to what you could execute by typing directly at a keyboard. They contain special comment lines that hold TORQUE directives. TORQUE evolved from software called PBS (Portable Batch System). As a consequence of that history, TORQUE directive lines begin with #PBS, some environment variables contain "PBS" (such as $PBS_O_WORKDIR in the script below), and the script files themselves typically have a .pbs suffix (although that is not required).
You should always specify the amount of RAM your job needs per node; otherwise the default amount is used. At the time of this writing, the default is 950 MB. The script below has an example memory allocation.
Executables compiled with the OpenMPI API can be run in batch files simply with the command:
mpirun --hostfile $PBS_NODEFILE optionalcommands arguments
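TORQUE writes one line to $PBS_NODEFILE for each allocated processor, so a request of nodes=3:ppn=2 yields six lines. A minimal sketch for inspecting that file before calling mpirun follows. The function names are illustrative assumptions, and the sample file only imitates what the scheduler would provide.

```shell
# Count allocated processors: one line per processor in the node file.
count_procs() {
    awk 'END {print NR}' "$1"
}

# Count distinct nodes in the node file.
count_nodes() {
    sort -u "$1" | awk 'END {print NR}'
}

# Stand-in for $PBS_NODEFILE as it might look for nodes=3:ppn=2.
sample_nodefile=$(mktemp)
printf '%s\n' compute-0-8 compute-0-8 compute-0-9 compute-0-9 \
              compute-0-10 compute-0-10 > "$sample_nodefile"

echo "processors: $(count_procs "$sample_nodefile")"
echo "nodes:      $(count_nodes "$sample_nodefile")"
rm -f "$sample_nodefile"
```

Inside a job script, the same count could be passed to mpirun explicitly, for example: mpirun -np "$(count_procs "$PBS_NODEFILE")" --hostfile "$PBS_NODEFILE" ./diffuse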
Here is an example job script, diffuse.pbs, for a job to run an OpenMPI program named diffuse. The directive #PBS -l nodes=3:ppn=2 requests 2 processors on each of 3 computers, 6 cores in total. The adjacent directive #PBS -l mem=2GB requests that 2 GB of RAM be allocated for this job.
#!/bin/sh
#Sample PBS Script for use with OpenMPI on Socrates
#Jason Hlady
# Begin PBS directives for defaults
# All torque (batching scheduler commands) begin with #PBS for historical reasons
# Default is for serial job: one processor on one node
# can override this with qsub at the command line, or alter it in the script
# in the form of nodes=X:ppn=Y
# X = number of computers : Y = number of processors per computer
#PBS -l nodes=3:ppn=2
#PBS -l mem=2GB
# There are other directives to control the maximum time your job will take.
# These are walltime and cput. Both use the format hours:minutes:seconds (hh:mm:ss)
# This would stop your job after 3 days: #PBS -l walltime=72:00:00
# This would stop your job after 200 cpu-hours (total): #PBS -l cput=200:00:00
# There are also maximum values that cannot be overridden.
# Job name which will show up in queue, job output
# Remove the second # from below line and rename job
##PBS -N
#Optional: join error and output into one stream
#PBS -j oe
#------------------------------------------------------
# Debugging section for PBS
echo "Node $PBS_NODEFILE :"
echo "---------------------"
cat $PBS_NODEFILE
echo "---------------------"
echo "Shell is $SHELL"
NUM_PROCS=`/bin/awk 'END {print NR}' $PBS_NODEFILE`
echo "Running on $NUM_PROCS processors."
echo "which mpirun = `which mpirun`"
#-------------------------------------------------------
### Run the application
# shows what node the app started on--useful for serial jobs
echo `hostname`
cd $PBS_O_WORKDIR
echo "Current working directory is `pwd`"
echo "Starting run at: `date`"
# change the program name from "./diffuse" in the below line between the hash marks
# to the name of your executable
############
mpirun --hostfile $PBS_NODEFILE ./diffuse
############
echo "Program finished with exit code $? at: `date`"
exit 0
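The script above always ends with exit 0, so the job script itself reports success even when mpirun fails. One possible variation, offered as a sketch rather than as the original author's method, captures the program's exit status and passes it on. The helper name run_and_report is an assumption.

```shell
# Run a command, report its exit status, and return that status to the caller.
run_and_report() {
    "$@"
    status=$?
    echo "Program finished with exit code $status at: $(date)"
    return "$status"
}

# Demonstration with commands that exist on any UNIX system.
run_and_report true
run_and_report false || echo "the failure was passed on to the caller"
```

In the job script, this would mean replacing the mpirun line with run_and_report mpirun --hostfile $PBS_NODEFILE ./diffuse and the final line with exit $status.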
To submit the script called diffuse.pbs to the batch job handling system, use the qsub command as below:
qsub diffuse.pbs
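qsub prints the identifier of the new job on standard output, so it can be captured for later use with qstat or qdel. The sketch below also shows the command-line override of the resource request that the script's comments mention. The helper name submit_job is an assumption.

```shell
# Submit a job script and print the job identifier returned by qsub.
# Extra arguments (for example: -l nodes=2:ppn=4) are passed to qsub
# ahead of the script name.
submit_job() {
    script="$1"
    shift
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on Socrates" >&2
        return 127
    fi
    qsub "$@" "$script"
}

# Example: override the script's nodes/ppn request at submission time.
if jobid=$(submit_job diffuse.pbs -l nodes=2:ppn=4); then
    echo "Submitted job $jobid"
fi
```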
To check on the status of all the jobs on the system, type:
qstat
To limit the listing to show just the jobs associated with your user name, type:
qstat -u username
To delete a job, use the qdel command with the jobid assigned from qsub:
qdel jobid
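The monitoring and deletion commands above can be wrapped as in the following sketch. The helper names my_jobs and cancel_job are assumptions, and the demonstration at the end only runs where the TORQUE tools are installed.

```shell
# List only the current user's jobs.
my_jobs() {
    qstat -u "${USER:-$(id -un)}"
}

# Delete a job by its identifier, refusing an empty argument.
cancel_job() {
    if [ -z "$1" ]; then
        echo "usage: cancel_job jobid" >&2
        return 2
    fi
    qdel "$1"
}

# Guarded demonstration: only call the TORQUE tools where they exist.
if command -v qstat >/dev/null 2>&1; then
    my_jobs
fi
```

For example, cancel_job followed by the identifier that qsub printed removes that job from the queue.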