Kohinoor 3

Introduction: Kohinoor 3 is the third HPC cluster in the Kohinoor trilogy of clusters at TIFR – TCIS, Hyderabad. It has been installed and operational since September 2016.
This cluster is composed of 69 nodes, of which one is the head node and the rest are execution nodes. It is a heterogeneous cluster comprising 64 CPU-only nodes and 4 GPU nodes, each GPU node carrying 4 Nos of Nvidia Tesla K40 cards. The nodes are connected via InfiniBand HBAs through a completely non-blocking fabric of 6 Nos of 36-port Mellanox FDR InfiniBand (IB) switches. Job scheduling and load balancing are handled by the open-source batch scheduler SLURM. The head node allows user logins for job submission to the cluster. A locally attached 200 TB parallel file system (open-source Lustre), served to the nodes over the IB fabric, is used for computational runs; a separate 200 TB NAS is attached to the head node for archiving and post-processing of data.
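Under SLURM, users log in to the head node and submit work with a batch script. A minimal sketch of such a script is shown below; the job name, resource requests, and executable name are illustrative placeholders, not taken from this document.

```shell
#!/bin/bash
#SBATCH --job-name=test_run          # job name shown in squeue
#SBATCH --nodes=2                    # number of compute nodes requested
#SBATCH --ntasks-per-node=20         # 20 cores per node (2 x 10C Broadwell)
#SBATCH --time=01:00:00              # wall-clock limit
#SBATCH --output=test_run_%j.log     # stdout/stderr file (%j = job ID)

# NOTE: the executable name is a placeholder for the user's program.
srun ./my_mpi_program
```

The script would be submitted with `sbatch job.sh` and monitored with `squeue -u $USER`.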

OEM – M/s. SuperMicro (supplied and installed by vendor M/s. Netweb Technologies, New Delhi)

Kohinoor 3 Overview

  1. Master node
    • 2 × Intel Broadwell 10C E5-2630V4 2.2 GHz 20M 8GT/s
    • 64 GB DDR4 2133 MHz RAM
    • 4 × 600 GB enterprise SAS hard disks @ 10,000 RPM in RAID 10
    • 1 × Mellanox FDR InfiniBand port
  2. Compute nodes (CPU only) [64 Nos.]
    • 2 × Intel Broadwell 10C E5-2630V4 2.2 GHz 20M 8GT/s
    • 64 GB DDR4 2133 MHz RAM
    • 4 × 600 GB enterprise SAS hard disks @ 10,000 RPM in RAID 10
    • 1 × Mellanox FDR InfiniBand port
  3. Compute nodes (CPU with 4 Nos of Nvidia Tesla K40 GPUs per node) [4 Nos.]
    • 2 × Intel Broadwell 10C E5-2630V4 2.2 GHz 20M 8GT/s
    • 64 GB DDR4 2133 MHz RAM
    • 4 × 600 GB enterprise SAS hard disks @ 10,000 RPM in RAID 10
    • 1 × Mellanox FDR InfiniBand port
  4. MDS and OSS Storage Nodes
    • 1 × Intel Haswell 6C E5-1650V3 3.5 GHz 20M 8GT/s
    • 128 GB DDR4 2133 MHz RAM
    • 2 × 80 GB enterprise SATA SSDs in RAID 1
    • Redundant power supply
    • 1 × Mellanox FDR InfiniBand port
  5. Compute Storage
    • 90-bay 4U JBOD
    • 60 × 4 TB NL-SAS 7,200 RPM drives configured in RAID 6, giving 200 TB usable space
    • Filesystem – Intel Open Lustre parallel file system
    • 1 × Mellanox FDR InfiniBand interconnect
  6. Archival Storage
    • 36-bay 4U storage server
    • 2 × Intel E5-2620V4 8C 2.1 GHz Broadwell processors
    • 64 GB DDR4 2133 MHz RAM
    • 2 × 480 GB enterprise SATA SSDs in RAID 1
    • 36 × 6 TB enterprise SATA 7,200 RPM drives in RAID-Z1
    • Filesystem – FreeNAS 9 with ZFS
    • 1 × Mellanox FDR InfiniBand interconnect, connected only to the head node
  7. Networking & Interconnect
    • Primary communication network between compute nodes is a completely non-blocking interconnect of 6 Nos of 36-port Mellanox FDR IB switches
    • Secondary communication network for cluster management is a 48-port Gigabit Ethernet switch
  8. System Software
    • Operating System – CentOS 7.2
    • Clustering tool – xCAT
    • Job Scheduler – SLURM
  9. Libraries
    • GNU compiler collection
    • CUDA 8.0
    • MVAPICH 2.0
    • OpenMPI 2.0
  10. Application software/Libraries
    • LAMMPS, GROMACS, cuFFT, FFTW, MPI, GERRIS, Quantum ESPRESSO, etc.
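As a usage sketch, a job targeting the GPU nodes with the installed CUDA 8.0 and MPI stacks (MVAPICH2 or OpenMPI) might look like the following. The GRES specification and all file/program names are assumptions for illustration, not taken from this document; actual partition and resource names should be confirmed with the cluster administrators.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --nodes=1
#SBATCH --ntasks=4                  # one MPI rank per GPU
#SBATCH --gres=gpu:4                # all 4 Tesla K40s on a GPU node (GRES name assumed)
#SBATCH --time=02:00:00

# Compile against CUDA 8.0; source file name is a placeholder
nvcc -O2 -o my_gpu_prog my_gpu_prog.cu

# Launch the ranks under SLURM
srun ./my_gpu_prog
```

CPU-only MPI codes are compiled analogously with the wrapper compilers (`mpicc`, `mpif90`) from the chosen MPI stack and launched with `srun`.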

TCIS-Kohinoor 3 Cluster Document