The Storage for Science (S4S) initiative is the result of the IT effort to provide UMassMed with a flexible, scalable, and affordable option to meet the constantly growing need for storage in our research community.
The S4S architecture stores data across multiple physical locations, allowing it to be written and used simultaneously at the Worcester Campus and at the GHPCC data center in Holyoke. This eliminates the need for users to perform lengthy manual file transfers between these locations.
PIs and Labs can purchase S4S space for 15 cents per GB per year.
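As a rough budgeting aid, the quoted rate can be turned into a small calculation. This is only a sketch: the function name `annual_cost` is hypothetical, and the example assumes 1 TB is counted as 1,000 GB, which the text does not specify. Actual billing details (minimum allocations, rounding, unit definitions) may differ.

```python
from decimal import Decimal

# Rate quoted in the text: 15 cents per GB per year.
RATE_PER_GB_YEAR = Decimal("0.15")


def annual_cost(gigabytes, years=1):
    """Estimate the S4S cost in dollars for a capacity (GB) held for some years."""
    return RATE_PER_GB_YEAR * Decimal(gigabytes) * Decimal(years)


# Example: 10 TB, taken here as 10,000 GB (an assumption about units).
print(annual_cost(10_000))      # 1500.00 dollars for one year
print(annual_cost(500, years=3))  # 500 GB kept for three years
```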
Storage for Science Best Practices
Because Storage for Science is a shared service, improper or suboptimal use can affect other users. If one or more computers use S4S in a way that impacts its availability, access from those systems may be halted while the issue is diagnosed and resolved.
To reduce the chances of performance issues with Storage for Science, please follow these best practices when using S4S storage:
- Storage for Science is intended to be a deep repository for data and is not designed as high-performance space for running computations. If you intend to make numerous changes to a file, please consider copying it to other storage (such as local disk on the machine making the changes) and then copying the final version back to S4S for retention.
- Writing to S4S puts substantially more load on the system than reading from it; as such, please run only a single copy or move session at a time. Multiple processes or machines writing to the same directory can cause dramatic reductions in performance.
- Never write to the same file on S4S from multiple machines at the same time.
- Reading from S4S from multiple processes or machines simultaneously is an expected and perfectly acceptable use.
- Storage for Science is designed to allow access to data from multiple locations (such as Holyoke and Worcester), and synchronizing data across that distance takes time. Newly written files and changes may take a few minutes to become visible at remote sites.
- Whenever possible, combine small files into a single larger archive file before writing to Storage for Science. Use of tools such as zip and tar is highly encouraged, especially for files smaller than one megabyte, which S4S does not store as efficiently.
- Aim to have fewer than 10,000 subdirectories in a single directory.
- Keep fewer than 100,000 files in a single directory.
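Several of the practices above can be combined into one workflow: build the archive on local disk, copy it to S4S in a single write, and check directory sizes against the suggested limits. The following is a minimal sketch using only the Python standard library. The function names and the default archive name `bundle.tar.gz` are hypothetical, "one megabyte" is assumed to mean 1,000,000 bytes, and the S4S directory is simply whatever path your system mounts it at. Files of one megabyte or larger are deliberately left out of the archive so they can be copied on their own.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path

SMALL_FILE_BYTES = 1_000_000  # assumed meaning of "one megabyte" in the guidance
MAX_FILES = 100_000           # suggested per-directory file limit
MAX_SUBDIRS = 10_000          # suggested per-directory subdirectory limit


def bundle_small_files(source_dir, archive_path):
    """Pack every file under source_dir smaller than 1 MB into one tar.gz.

    archive_path should live outside source_dir (for example on local scratch).
    Returns the list of files that were packed.
    """
    source_dir = Path(source_dir)
    small = [p for p in sorted(source_dir.rglob("*"))
             if p.is_file() and p.stat().st_size < SMALL_FILE_BYTES]
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in small:
            # Store paths relative to source_dir so the archive is relocatable.
            tar.add(path, arcname=path.relative_to(source_dir).as_posix())
    return small


def directory_counts(directory):
    """Return (files, subdirectories) found directly inside directory."""
    files = subdirs = 0
    for entry in Path(directory).iterdir():
        if entry.is_dir():
            subdirs += 1
        elif entry.is_file():
            files += 1
    return files, subdirs


def within_limits(directory):
    """True if the directory stays under both suggested limits."""
    files, subdirs = directory_counts(directory)
    return files < MAX_FILES and subdirs < MAX_SUBDIRS


def archive_to_s4s(local_dir, s4s_dir, name="bundle.tar.gz"):
    """Build the archive on local scratch, then copy it to S4S in one write."""
    with tempfile.TemporaryDirectory() as scratch:
        local_archive = Path(scratch) / name
        bundle_small_files(local_dir, local_archive)
        return Path(shutil.copy2(local_archive, Path(s4s_dir) / name))
```

Calling `archive_to_s4s(results_dir, s4s_project_dir)` from a single machine, one session at a time, keeps the write pattern in line with the guidance above; `within_limits(s4s_project_dir)` can be run beforehand as a quick sanity check.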
If you intend to make Storage for Science part of your research pipeline, please contact Research Computing (firstname.lastname@example.org) so we can work with you to ensure the best results.
Purchasing Storage for Science