Disk storage space on the JHPCE cluster

There are several types of storage on the JHPCE cluster. Some are meant for long-term storage of files, while others are intended for short-term storage of data.

For long-term storage of files, most users make use of the 100GB of space in their home directory. All users have a unique home directory, /users/USERNAME, which, by default, is visible only to them. There are ways to share data in your home directory with others, using Unix groups or Access Control Lists (more info at https://jhpce.jhu.edu/knowledge-base/granting-permissions-using-acls), but by default, only the owner of a home directory can use that space. Home directories do get backed up, whereas other storage spaces on the cluster may not.
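
As a minimal sketch of the ACL approach: assuming you want to give another user read access to a directory under your home (the username collaborator1 and the directory name are hypothetical), you could run something like the following. See the ACL article linked above for the full details.

    # Grant the hypothetical user "collaborator1" read/traverse access to
    # a directory under your home (the directory name is illustrative).
    setfacl -m u:collaborator1:rx /users/$USER/shared_data
    # The collaborator may also need traverse (x) access to your home
    # directory itself in order to reach the shared directory.
    setfacl -m u:collaborator1:x /users/$USER
    # Verify what was set
    getfacl /users/$USER/shared_data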

For those groups needing more storage space than their 100GB home directory, we have large storage arrays (over 10,000 TB of space in total), and we sell allocations on these arrays. We build a new storage array about every 18 months, so if you are interested in purchasing an allocation on our next storage build, please email us at bitsupport@lists.jhu.edu. We have additional information, including current charges for storage, at https://jhpce.jhu.edu/policies/current-storage-offerings/

In addition to these long-term storage offerings, all users have access to 1TB of scratch space. Scratch space tends to be faster than the long-term project space mentioned above, so using it may reduce the run time of your programs. Using scratch space also avoids taking up precious space in your home directory or project storage. Some common use cases for scratch space are:

  • Temporary or intermediary files. Programs often generate intermediary files that are used only while the program is running and are not needed after it completes.
  • Data downloaded from an external source. If you download data from another institution or from a website and do not need to keep it, download it to scratch space to avoid taking up space elsewhere.
  • Files that are read multiple times. If your program reads the same data file multiple times, you may see a speedup by first copying that file to scratch space and having your program read the copy there (see the sketch after this list).
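
For example, here is a minimal sketch of the copy-then-read pattern. The input path, program name, and scratch path are all illustrative; substitute your actual fastscratch directory (see the fastscratch article linked below for the exact location).

    # Copy a frequently-read input file to scratch, then point the
    # program at the copy (paths and program name are hypothetical).
    SCRATCH=/fastscratch/myscratch/$USER
    mkdir -p "$SCRATCH"
    cp /users/$USER/data/genotypes.csv "$SCRATCH/"
    my_analysis --input "$SCRATCH/genotypes.csv" --output results.out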

We call our scratch space on JHPCE “fastscratch”. The fastscratch array provides 22TB of space built on faster SSD drives, as opposed to the traditional hard drives used for project space and home directories. All users have a 1TB quota for scratch space, and data older than 30 days is purged from it. More details on using fastscratch can be found at https://jhpce.jhu.edu/knowledge-base/fastscratch-space-on-jhpce/

Traditionally in Unix/Linux, the /tmp (or /var/tmp) directories are used for storing temporary files. On the JHPCE cluster, the use of /tmp is strongly discouraged. The /tmp directory on the compute nodes is much smaller than the scratch space and can easily fill up. If your application writes temporary files to /tmp, please use whatever option your application provides to direct them to fastscratch instead (see the example below).
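
Many Unix tools honor the TMPDIR environment variable, or take a flag naming the temporary directory; for instance, GNU sort accepts -T. A sketch, assuming your fastscratch directory is /fastscratch/myscratch/$USER (adjust to your actual path):

    # Point temporary files at fastscratch rather than /tmp.
    export TMPDIR=/fastscratch/myscratch/$USER
    mkdir -p "$TMPDIR"
    # GNU sort writes its temporary spill files to the directory given
    # with -T (it also honors TMPDIR when -T is not given).
    sort -T "$TMPDIR" hugefile.txt > sorted.txt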

When running R, the directory reported by tempdir() dictates where temporary files are stored; R sets this once at startup, based on the TMPDIR environment variable. When you use sbatch to submit a job, tempdir() points to a directory under /tmp. This should be fine for most cases, but if you might be generating tens of GB of temporary files, you may want to use fastscratch. Likewise, when you srun into a compute node and then run R, tempdir() is set under /tmp, so you will likely want to change the R temp directory to use fastscratch, as shown below.
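
One way to do this is to set TMPDIR in the shell before starting R, since R picks up its session temporary directory from that variable at startup. The fastscratch path below is illustrative; use your actual scratch directory.

    # Make R's tempdir() point at fastscratch for this session.
    export TMPDIR=/fastscratch/myscratch/$USER
    mkdir -p "$TMPDIR"
    R --no-save
    # Inside R, tempdir() now returns a per-session directory under
    # /fastscratch/myscratch/$USER rather than under /tmp.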

In SAS, the default WORK directory will be located under your “fastscratch” directory.
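
If you ever need to direct WORK somewhere else, SAS on Unix accepts a -work option at startup. This is a sketch under that assumption, not a JHPCE-specific recommendation, and the path is illustrative:

    # Start SAS with the WORK library in an alternate directory.
    mkdir -p /fastscratch/myscratch/$USER/saswork
    sas -work /fastscratch/myscratch/$USER/saswork myprogram.sas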

In Stata, the default “tempfile” location is under /tmp. This can be changed by setting the STATATMP environment variable.
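
A minimal sketch, assuming the same illustrative fastscratch path as above:

    # Point Stata's temporary files at fastscratch before launching it.
    export STATATMP=/fastscratch/myscratch/$USER
    mkdir -p "$STATATMP"
    stata-mp    # or stata / stata-se, depending on your Stata flavor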