Research Cluster (Plato)

Plato is a Linux-based, high-performance computing (HPC) cluster designed to support your research projects. It can also be used for training highly qualified personnel in advanced research computing (ARC) techniques and applications. The Plato cluster has a total of 120 nodes, with an aggregate of 7.4 TB of RAM and 2,000 CPU cores, delivering 64 theoretical TFLOPS. The nodes include 94 general-purpose compute nodes, 2 GPU nodes, 2 large-memory nodes and 22 contributed nodes; the contributed nodes are not available to general users. The cluster uses SLURM for job scheduling and resource management and Bright Computing software for cluster management, and runs Red Hat Enterprise Linux (RHEL) 7.3 as its operating system.

The Plato cluster is provided by University ICT Academic and Research Technologies (ART). Although it is not part of the Compute Canada ARC infrastructure, Plato operates on very similar principles to Compute Canada resources. Using CVMFS, Plato gives the University's researchers access to the Compute Canada scientific software stack.

Access to the Plato cluster is facilitated through University ICT Authentication and Access Management and is based on certain eligibility criteria. Computational resources on Plato, such as CPU cores, RAM, shared storage space and GPUs, are allocated on a fair-share basis according to accounting groups.

Check out the sections below for additional information:

Specifications


  • 2 Dell PowerEdge R430 head nodes:
    • 2 x eight-core Intel Xeon processors
    • 31 GB RAM
    • 10 Gb Ethernet to USASK network
    • 10 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 450 TB DATAStore (/datastore)
    • High availability
    • No user access

  • 2 Dell PowerEdge R430 login/interactive nodes (plato.usask.ca):
    • 2 x eight-core Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
    • 31 GB RAM
    • 10 Gb Ethernet to USASK network
    • 10 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 450 TB DATAStore (/datastore)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca) 
    • High availability
    • Eligible users access

  • 30 HPE ProLiant DL160 G8 compute nodes (plato101-124, plato201-202, plato301-304):
    • 2 x eight-core Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz
    • 31 GB RAM
    • 2 x 1 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 347 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)

  • 64 Dell PowerEdge C6220 II high density compute nodes (plato225-248, plato309-348):
    • 2 x eight-core Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz 
    • 31 GB RAM
    • 2 x 1 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 347 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)

  • 2 Dell PowerEdge C4130 GPU nodes (platogpu103-104):
    • 2 x eight-core Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
    • 31 GB RAM
    • 2 x 1 Gb Ethernet to Cluster network
    • 2 x NVIDIA K40 GPU
    • 18 TB NAS shared storage (/home)
    • 805 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)

  • 1 Dell PowerEdge R920 big memory node (platobmem502):
    • 4 x twelve-core Intel(R) Xeon(R) CPU E7-4850 v2 @ 2.30GHz
    • 2 TB RAM
    • 2 x 10 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 1.5 TB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)

  • 1 Dell PowerEdge R910 big memory node (platobem501):
    • 4 x eight-core Intel(R) Xeon(R) CPU E7-8837 @ 2.67GHz
    • 840 GB RAM
    • 2 x 10 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 163 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)

  • 20 HPE ProLiant SL210t G8 CONTRIBUTED high density compute nodes (plato205-224):
    • 2 x eight-core Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz 
    • 31 GB RAM
    • 2 x 1 Gb Ethernet to Cluster network
    • 18 TB NAS shared storage (/home)
    • 347 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)
    • No public access

  • 2 Dell PowerEdge C4130 CONTRIBUTED GPU nodes (platogpu101-102):
    • 2 x sixteen-core Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
    • 250 GB RAM
    • 2 x 1 Gb Ethernet to Cluster network
    • 2 x 50 Gb InfiniBand to MPI network 
    • 2 x NVIDIA K80 GPU
    • 18 TB NAS shared storage (/home)
    • 768 GB local storage drive (/local)
    • Software stack via CVMFS2 (/cvmfs/soft.computecanada.ca)
    • No public access

  • Total: 120 nodes, 7.4 TB RAM, 2,000 CPU cores, 64 double-precision theoretical TFLOPS
  • 3 Dell N2048 1 Gb Ethernet Compute fabric switches
  • 3 Dell N3048 1 Gb Ethernet Storage fabric switches
  • 2 Dell S4048-ON 10/40 Gb Ethernet Core fabric switches
  • RHEL 7.3 Operating System
  • CVMFS access to the Compute Canada scientific software stack
  • Local custom scientific software stack
  • Bright Computing cluster management software
  • 18 TB high I/O Dell FS8600 NAS shared storage
  • Local hard drives on compute nodes are used for local scratch

Available Software

Using CVMFS, Plato gives the University's researchers access to the Compute Canada scientific software stack. To list all currently available software packages, use the module avail command; it displays formatted information about each software package and the modules required to run it. Another useful command, which lists all software alphabetically, is module spider | grep "^  [a-zA-Z]*:". The output will look like this:

abaqus: abaqus/6.14.1
abinit: abinit/8.2.2
abyss: abyss/1.5.2, abyss/1.9.0
admixture: admixture/1.3.0

...

wps: wps/3.8.0, wps/3.8.1
wrf: wrf/3.8.0, wrf/3.8.1
xcrysden: xcrysden/1.5.60
yaxt: yaxt/0.5.1

The current number of software packages on the Plato cluster is 245.
This is calculated with the command module spider | grep "^  [a-zA-Z]*:" | wc -l.
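
The listing commands above can be combined into a short session like the following (a minimal sketch; run it on a Plato login node):

# list all modules available in the current environment
module avail

# list all software packages alphabetically, one per line
module spider | grep "^  [a-zA-Z]*:"

# count the number of available packages
module spider | grep "^  [a-zA-Z]*:" | wc -l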

If a software package has multiple versions available, for example abyss: abyss/1.5.2, abyss/1.9.0, the environment needs to be set for the specific version you want: use the command module load abyss/1.5.2 to make version 1.5.2 available, for example. If a package has only one version available, such as yaxt: yaxt/0.5.1, it is sufficient to use the name of the software alone, as the only version is the default one: module load yaxt, for example.
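
Switching between the two ABySS versions mentioned above might look like this (a minimal sketch using the versions listed earlier):

module load abyss/1.5.2      # enable ABySS 1.5.2
module list                  # confirm which modules are currently loaded
module unload abyss/1.5.2    # remove the old version before switching
module load abyss/1.9.0      # enable ABySS 1.9.0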

Some software packages require other packages in order to operate. Such dependencies are resolved automatically. However, there is one requirement that must be fulfilled by the user: selecting the compiler to use. The Plato cluster currently provides the GNU CC and Intel CC compiler suites, and one or the other must be loaded in order to run any software. The default compiler on Plato is Intel CC 2016.4, and all users have it loaded automatically at login; see the Compilers section below for the commands used to switch compilers. Because users may run different jobs during a single session, it is more flexible, and generally recommended, to load specific environments, including the compiler selection, within the submit scripts used for job submissions.

Compilers

The default compiler on Plato is Intel CC 2016.4, and all users have it loaded automatically at login. If your software requires GNU CC, change the environment by loading the corresponding module: module load gcc/<version> for a specific version, or module load gcc for the default version (5.4.0). To load the Intel compiler, use module load intel/<version> or module load intel for the default (2016.4). Note that the Intel compilers include the optimized Math Kernel Library (MKL) with BLAS, LAPACK, ScaLAPACK, LINPACK and FFT libraries; Intel's online MKL Link Line Advisor utility can help you determine the library linking procedure.
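
Compiling a simple C program with either suite might look like the following (a minimal sketch; myprog.c is a hypothetical source file, and -mkl is the Intel compiler flag that links MKL):

# Intel compilers (loaded by default at login); -mkl links the Math Kernel Library
module load intel/2016.4
icc -O2 -o myprog myprog.c -mkl

# or switch to the GNU compilers instead
module load gcc/5.4.0
gcc -O2 -o myprog myprog.c -lm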

List of available versions:

  • GNU: gcc/4.8.5, gcc/4.9.4, gcc/5.4.0, gcc/6.4.0, gcc/7.3.0
  • Intel: intel/2014.6, intel/2016.4, intel/2017.1, intel/2017.5, intel/2018.3
  • CUDA: gcccuda
  • MATLAB: mcr/R2013a, mcr/R2013b, mcr/R2014a, mcr/R2014b, mcr/R2015a, mcr/R2015b, mcr/R2016a, mcr/R2016b, mcr/R2017a, mcr/R2017b, mcr/R2018a

Set up Environment

To set up the environment required to run applications from the Plato software stack, use the module command, which controls all necessary operations on environment modules, such as load, unload, search and show information. These and other module commands can be invoked interactively on the command line, added to the user session by placing them in the ~/.bashrc or ~/.bash_profile files, or included in a job submit script before the job itself (see the example after the list below). Below is a brief list of commonly used module commands:

module avail            List all available modules
module list             List all currently loaded modules
module spider <name>    Search for a module matching <name>
module load <name>      Load module <name>
module unload <name>    Unload module <name>
module show <name>      Show the internal commands of module <name>
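
For batch jobs, the module commands typically go directly in the SLURM submit script. Below is a minimal sketch; the job name, resource values, module versions and the abyss-pe command line are illustrative placeholders, not site defaults:

#!/bin/bash
#SBATCH --job-name=abyss-test      # hypothetical job name
#SBATCH --ntasks=1                 # number of tasks (processes)
#SBATCH --cpus-per-task=4          # CPU cores per task
#SBATCH --mem=8G                   # memory for the whole job
#SBATCH --time=02:00:00            # walltime limit (hh:mm:ss)

# set up the environment inside the script, as recommended above
module load gcc/5.4.0
module load abyss/1.9.0

# run the application (input file names are hypothetical)
abyss-pe k=64 name=assembly in='reads1.fq reads2.fq'

Submit the script with sbatch <scriptname> and check its status with squeue -u $USER.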

Access Policies

  1. ICT provides Plato for research use requiring High Performance Computing by University of Saskatchewan research groups working on U of S research projects.
  2. Groups eligible for Plato access are headed by a U of S faculty member, referred to as the Principal Investigator (PI), and include the PI and their associated students, staff and collaborators.
  3. Access to Plato is granted by ICT only on request of the PI, who manages membership of his or her group by requesting access for a list of students, staff and collaborators. The PI is responsible for the actions of the group's members on Plato, ensuring appropriate (e.g. related to U of S research projects and per university policies) work is done on Plato.
  4. There is currently no direct charge for the service to researchers.
  5. Each group is assigned an equal potential share of the system, regardless of number of members/accounts in a group. Scheduling occurs at the group level.
    1. Members of the same group are expected to coordinate their usage with each other. 
    2. Maximising a group’s actual usage in the “equal potential share” of the system requires regular submission of jobs over the long term (months and years). 
    3. Shares of the system are enforced by the scheduling software; users shall use the job batching system unless other arrangements have been made.
  6. Priority access for a faculty group or guaranteed allocations to Plato resources can be requested in service of faculty grant applications (and similar). For each request, ICT will determine the details of the request and make a recommendation to the HPC Advisory Committee.
  7. Non-traditional uses (e.g. custom images, web services for cluster nodes, etc.) are possible and feasibility is determined on a case-by-case basis by ICT.
  8. Limits on job parameters (e.g. maximum walltime or processor equivalents, limits to number of jobs submitted at a time, etc.) may be imposed to maintain the integrity of the system, optimize the use of the hardware and/or improve fairness of scheduling.
  9. Plato cannot by itself provide all needed HPC computational cycles for the entire university. Plato should be considered a stepping stone to shared resource usage (e.g. WestGrid/Compute Canada).
  10. For ICT to grant access to Plato for a PI and/or the members of his or her group, the PI must agree to:
    1. Provide a one paragraph abstract describing the research project:
      1. ICT intends to publish the abstracts on a webpage of ongoing research using Plato. ICT will ask the PI for permission (and approval of publication date, where needed) to publish the abstract.
    2. Report annually on research outcomes achieved using Plato:
      1. Provide (and allow ICT to publish on webpage or similar) a list of papers, theses and conference presentations published using results generated on Plato.
  11. Conditions of use:
    1. Runaway user jobs may be terminated with little or no warning to preserve the integrity of the system.
    2. Plato's uptime and network availability are on a best effort basis. Plato is not on emergency power or UPS.
    3. Users will arrange their own long term data storage. Some disk space is available for intermediate calculations and input data, but data on Plato in user directories is not backed up. ICT provides several data storage options. Users may be asked to remove data from Plato that is not immediately required as inputs for computation. Quotas may be enforced.
    4. There is no guarantee on availability of computational cycles to a PI Group.
    5. Development tools, parallel libraries, math libraries and compilers are provided on Plato; additional compiler/library purchase/installation requests will be handled on a case-by-case basis. ICT cannot commit in advance to the purchase or installation of all scientific libraries or software.
    6. Installation of scientific/research software, customization of Plato nodes and technical support is available on a best-effort basis from ICT.
    7. ICT/Plato administrators may make changes to the priorities of jobs, or request delay/termination of running jobs, to facilitate overall throughput.