CLUSTER UPGRADE to SLURM and Rocky 9.2

We are pleased to announce that an upgrade to the cluster is underway.  We have upgraded two compute nodes and invite you to try them out: compute-094 and compute-111. You can ssh to them directly from the current login nodes until we announce the availability of a new login node, which will be temporarily named login31 (and later renamed to jhpce01).

The most significant change will be the switch in schedulers from SGE (Sun Grid Engine) to SLURM (Simple Linux Utility for Resource Management). The SGE codebase is not actively maintained, and the newest version is about 10 years old at this point. SLURM, on the other hand, is more widely used and receives regular patches and updates.

SLURM and SGE are conceptually similar: both have the notion of “jobs”, “nodes”, “partitions” (known as “queues” in SGE), and resource allocation for RAM and cores. However, the commands and options differ between the two schedulers. An orientation to using SLURM on the JHPCE cluster is available, and we will be providing training sessions for end users as we get closer to the cutover date. There are also documents and example code files in /jhpce/shared/jhpce/slurm on the test nodes.
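As a rough orientation to how the commands differ, the sketch below maps a few common SGE commands to their usual SLURM counterparts. It covers only the standard command names from both schedulers; the helper name `slurm_equivalent` is hypothetical, and options specific to the JHPCE cluster are not included.

```python
# Common SGE commands and their usual SLURM counterparts.
# Only standard commands are listed; site-specific wrappers and
# options on the JHPCE cluster are NOT covered here.
SGE_TO_SLURM = {
    "qsub": "sbatch",           # submit a batch job script
    "qstat": "squeue",          # list queued and running jobs
    "qdel": "scancel",          # cancel a job
    "qrsh": "srun --pty bash",  # start an interactive shell on a node
    "qhost": "sinfo",           # show node and partition status
}


def slurm_equivalent(sge_command: str) -> str:
    """Return the usual SLURM counterpart of an SGE command (hypothetical helper)."""
    try:
        return SGE_TO_SLURM[sge_command]
    except KeyError:
        raise ValueError(f"no known SLURM equivalent for {sge_command!r}") from None


if __name__ == "__main__":
    for sge, slurm in SGE_TO_SLURM.items():
        print(f"{sge:6s} -> {slurm}")
```

The mapping is approximate: the options accepted by each pair of commands differ, so a job script usually needs its resource requests rewritten as well as its submit command changed.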

We will also be upgrading the operating system from CentOS Linux 7.9 to Rocky Linux 9.2. Both CentOS and Rocky are built from the same Red Hat source code, and are binary compatible with Red Hat Enterprise Linux.
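While old and new nodes coexist, a script may need to tell which operating system it is running on. The sketch below reads the standard /etc/os-release file and checks for Rocky Linux 9; the function names `parse_os_release` and `is_upgraded_node` are hypothetical, and it assumes the usual `ID` and `VERSION_ID` fields are present.

```python
def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_upgraded_node(text: str) -> bool:
    """Return True if the os-release text describes Rocky Linux 9.x."""
    info = parse_os_release(text)
    return info.get("ID") == "rocky" and info.get("VERSION_ID", "").startswith("9")


if __name__ == "__main__":
    # Assumes /etc/os-release exists, as it does on CentOS 7 and Rocky 9.
    with open("/etc/os-release") as fh:
        print("upgraded node" if is_upgraded_node(fh.read()) else "not yet upgraded")
```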

We are standing up parts of the new cluster alongside the old, with the intention of moving more compute nodes over as we flesh out the new cluster’s capabilities.

We hope to finish in time for the resumption of school, but will press on if that deadline passes.

We will continue to use modules to manage the user environment for different packages. Because of the OS upgrade, current modules should be recompiled. If you have helped build modules in the past, we would greatly appreciate your help doing so again. These are the new module directories and whose modules each one holds:

  /jhpce/shared/jhpce – the systems admin staff

  /jhpce/shared/community – you good folks

  /jhpce/shared/libd – Lieber Institute

Please use the bithelp mailing list for discussions about the new cluster – problems, solutions, requests.

Thank you for your interest and participation!

Jeffrey
