Data Storage Solutions
The Hierarchical Storage Management (HSM) System
Introduction
Storing large amounts of data in the Home Directory Service may cause a filesystem full condition, or the quota for the home directory to be exceeded. The traditional solutions are to buy more disk space, or to store the data on external media and retrieve it as needed. UVa provides the Hierarchical Storage Management (HSM) System as an alternative to these options.
UVa's Hierarchical Storage Management (HSM) system provides access to permanent storage for large amounts of data. The HSM is available to UVa faculty and staff. The HSM uses Quantum (formerly ADIC) StorNext filesystem software, with a large disk filesystem allocated on ITS's storage network, and Quantum (ADIC) i2000 automated tape libraries, containing LTO tape drives. The system uses robotics for automatic retrieval and storage of tapes. The HSM is administered by the ITS UNIX Systems group.
Files are copied to tape in a migration process which is invisible to the user. When an attempt is made to access a migrated file, it is automatically retrieved (recalled). The recall may take several minutes while a tape is mounted by the robot, the tape is positioned, and the file is copied to disk.
Please observe the following guidelines when using the Hierarchical Storage Manager:
- The HSM should not be used to store large numbers of files with the expectation that they can be retrieved quickly. Recalling a single file can take several minutes, and recalling many files takes much longer; for example, recalling 1000 small files would take over a day. Rather than storing a large number of files, HSM users should combine them into a few archive files using a utility such as tar or cpio, and store the archive files instead. When it is time to recall a collection of files, the archive files can be recalled relatively quickly and the desired files extracted.
- The HSM should not be used to store files that are used regularly. Such files may be migrated to tape between uses, so a recall operation, with its delay, would often be necessary.
- Do not expect to be able to retrieve arbitrarily large amounts of data at one time. The shared HSM filesystem can handle hundreds of gigabytes of user data. If you need to manipulate an extremely large dataset, however, or if other users are using a significant portion of the filesystem space at a given moment, you may need to devise a way to access your data in smaller portions.
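One way to follow the first and third guidelines together is to write one archive per top-level subdirectory, so that each piece can later be recalled on its own. The following is a minimal sketch, not an ITS-provided procedure. The directory names (dataset, run1, run2) are hypothetical, and temporary stand-in directories are used so the commands can be tried anywhere; on a real system HSM would be your own /net/hsm directory.

```shell
#!/bin/sh
set -eu

# Stand-in directories (assumptions for illustration only).
WORK=$(mktemp -d)
SRC="$WORK/dataset"      # hypothetical data directory
HSM="$WORK/hsm"          # stand-in for /net/hsm/<your_id>
mkdir -p "$SRC/run1" "$SRC/run2" "$HSM"
echo "a" > "$SRC/run1/a.dat"
echo "b" > "$SRC/run2/b.dat"

# Write one archive per top-level subdirectory of the dataset.
for dir in "$SRC"/*/; do
    name=$(basename "$dir")
    # -C makes tar store paths relative to the dataset directory.
    tar cf "$HSM/$name.tar" -C "$SRC" "$name"
done

ls "$HSM"
```

Each archive can then be recalled and unpacked separately, which keeps the amount of disk space needed at any one time small.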
Backup of HSM Data
In order to preserve user data in case of hardware failure or accidental removal, files written into an HSM-managed filesystem are also written onto magnetic tape in two different tape libraries in two different buildings on Grounds. This operation occurs within a few minutes of the file system update.
The following is a typical scenario: From a remote computer, file F is created or modified within an HSM-managed filesystem. Within minutes, F is backed up offsite in two locations. Subsequently, if filesystem space is needed, F can be deleted by the HSM software from the disk location, leaving both copies intact in the tape libraries. This will only occur if the file F has not been modified in at least 6 days.
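To get a rough idea of which of your files are old enough to be removed from disk, you can list the files that have not been modified in at least 6 days. This is only a sketch: it shows candidates, and the HSM software alone decides whether and when a file actually leaves the disk. A temporary stand-in directory is used here, and the touch -d option that creates the sample old file is specific to GNU touch.

```shell
#!/bin/sh
set -eu

# Stand-in for /net/hsm/<your_id> (assumption for illustration only).
HSM_DIR=$(mktemp -d)

# Create one fresh file and one backdated file (GNU touch syntax).
touch "$HSM_DIR/fresh.dat"
touch -d '10 days ago' "$HSM_DIR/old.dat"

# find ignores fractional days, so "-mtime +5" matches files
# last modified at least 6 full days ago.
CANDIDATES=$(find "$HSM_DIR" -type f -mtime +5)
echo "$CANDIDATES"
```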
Requesting Space on the HSM
To get space on the HSM, email ITS UNIX Systems with the following information:
- Your name and UVa Computing ID (e.g., mst3k)
- Your email address
- Your local phone number
- Your UVa affiliation (faculty, staff, or graduate student)
- The name of the system from which you wish to use the HSM directory (e.g., blue.unix or a departmental UNIX workstation)
Graduate Students and Groups
If you are a graduate student or part of a research group, it may be best to have one HSM directory, belonging to the project leader or supervisor, with each person being part of a UNIX group (MyGroups) that has read and write permissions to that directory. The supervisor can create a subdirectory for each group member and use the UNIX chmod command to give that member read and write permissions to it. Each group member can also use chmod to change the permissions on their own files. For example, the following command takes away read permission from the group and from others:
chmod go-r filename
With this arrangement, if someone leaves the group, the project leader still has access to that person's data.
To have an HSM directory configured this way, email ITS UNIX Systems with the following details:
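The permission changes described above can be tried out safely in a scratch directory. The following is a sketch under stated assumptions: the names member1 and notes.txt are hypothetical, a temporary directory stands in for the shared HSM directory, and the starting modes (770 and 640) are chosen only to make the result predictable. Group ownership itself is managed through MyGroups and is not shown.

```shell
#!/bin/sh
set -eu

# Stand-in for the supervisor's HSM directory (assumption).
PROJECT=$(mktemp -d)

# Supervisor: create a subdirectory for one group member and give
# the owner and the group read, write, and search permission.
mkdir "$PROJECT/member1"
chmod 770 "$PROJECT/member1"

# Member: create a file readable by the group, then take away read
# permission from group and other.
touch "$PROJECT/member1/notes.txt"
chmod 640 "$PROJECT/member1/notes.txt"
chmod go-r "$PROJECT/member1/notes.txt"

# Show the resulting permissions.
ls -ld "$PROJECT/member1" "$PROJECT/member1/notes.txt"
```

Note that go-r removes only read permission; any write or execute permission the group or others already had is left unchanged.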
- the name and UVa Computing ID (e.g., mst3k) of the faculty or staff member who will "own" the directory
- the list of names and UVa Computing IDs of those who are to have access to the directory
- the name of the system(s) from which each of the group's members wishes to use the HSM directory (e.g., blue.unix or a departmental UNIX workstation)
People can be added to and removed from the UNIX group using MyGroups.
Using the HSM
If you have been granted space on the HSM, you should have a directory named /net/hsm/user_id, where user_id is your own user ID, on an ITS-managed UNIX computer system. To change to this directory, type:
cd /net/hsm/user_id
For example, if your user ID is mst3k, use the command:
cd /net/hsm/mst3k
In order to use the HSM from a non-ITS UNIX computer, you or your system administrator must first make a local directory and mount the HSM filesystem to that directory. After receiving confirmation that your computer has been added, become root, create the directory /net/hsm/mst3k, then mount the HSM to this directory with the command
mount -t nfs storage-fs2.itc.virginia.edu:/stornext/snfs3/hsm/mst3k /net/hsm/mst3k
If you wish to do this every time the computer is booted, add the appropriate line to your /etc/fstab or equivalent file.
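As an illustration of what such a line might look like, the sketch below builds a six-field /etc/fstab entry from the server path and local directory used above and prints it for review. The mount options field ("defaults") and the two trailing zeros are assumptions, not ITS recommendations; check with your system administrator before adding anything to /etc/fstab. The sketch deliberately does not modify any system file.

```shell
#!/bin/sh
set -eu

# Values taken from the mount example above.
HSM_EXPORT="storage-fs2.itc.virginia.edu:/stornext/snfs3/hsm/mst3k"
MOUNT_POINT="/net/hsm/mst3k"

# fstab fields: device, mount point, type, options, dump, fsck pass.
# "defaults 0 0" is an assumed, generic choice.
FSTAB_LINE=$(printf '%s\t%s\tnfs\tdefaults\t0\t0' "$HSM_EXPORT" "$MOUNT_POINT")

# Print the line for review; appending it to /etc/fstab is left as a
# manual step to be done as root.
echo "$FSTAB_LINE"
```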
Getting and Using Files
To place a file into the HSM, simply move or copy it into your HSM directory. For example, to copy the file foo from /home/mst3k to the HSM directory:
cp /home/mst3k/foo /net/hsm/mst3k/foo
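If you intend to delete the original after copying, it can be worth confirming that the copy is identical first. The following is a minimal sketch of that idea, using temporary stand-in directories in place of /home/mst3k and /net/hsm/mst3k so that it can be run anywhere; the sample file contents are made up.

```shell
#!/bin/sh
set -eu

# Stand-in directories (assumptions for illustration only).
WORK=$(mktemp -d)
HOME_DIR="$WORK/home"    # stands in for /home/mst3k
HSM_DIR="$WORK/hsm"      # stands in for /net/hsm/mst3k
mkdir -p "$HOME_DIR" "$HSM_DIR"
echo "sample data" > "$HOME_DIR/foo"

# Copy, preserving modification time and permissions.
cp -p "$HOME_DIR/foo" "$HSM_DIR/foo"

# Remove the original only if the copy is byte-for-byte identical.
if cmp -s "$HOME_DIR/foo" "$HSM_DIR/foo"; then
    rm "$HOME_DIR/foo"
    STATUS="verified"
else
    STATUS="mismatch"
fi
echo "$STATUS"
```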
Suppose the directory /home/mst3k/smallfiles contains various small files. The commands
cd /home/mst3k
tar cf /net/hsm/mst3k/tar.smallfiles smallfiles
combine the contents of the directory smallfiles, including any subdirectories, into the single file tar.smallfiles, which is stored in the HSM. The entire directory contents may later be retrieved with a single tape operation by accessing tar.smallfiles using commands such as
cd /home/mst3k/tmp
tar xvpf /net/hsm/mst3k/tar.smallfiles
When accessing a file that has been placed into the HSM, expect an initial delay of as much as a few minutes if the file has been migrated to tape. Often, however, the delay is only a few seconds long.
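Because even listing an archive requires reading it, and may therefore trigger a recall, it can help to save a listing of each archive on ordinary disk when the archive is created. The sketch below shows that idea and then extracts a single file from the archive instead of the whole directory. The index file name and the sample file names are hypothetical, and temporary stand-in directories replace /home/mst3k and /net/hsm/mst3k.

```shell
#!/bin/sh
set -eu

# Stand-in directories (assumptions for illustration only).
WORK=$(mktemp -d)
HOME_DIR="$WORK/home"    # stands in for /home/mst3k
HSM_DIR="$WORK/hsm"      # stands in for /net/hsm/mst3k
mkdir -p "$HOME_DIR/smallfiles" "$HOME_DIR/tmp" "$HSM_DIR"
echo one > "$HOME_DIR/smallfiles/one.txt"
echo two > "$HOME_DIR/smallfiles/two.txt"

# Create the archive in the HSM directory.
cd "$HOME_DIR"
tar cf "$HSM_DIR/tar.smallfiles" smallfiles

# Save a listing outside the HSM right away, while the archive is
# still on disk, so it can be consulted later without a recall.
tar tf "$HSM_DIR/tar.smallfiles" > "$HOME_DIR/tar.smallfiles.index"

# Later: extract just one member, named as it appears in the index.
cd "$HOME_DIR/tmp"
tar xpf "$HSM_DIR/tar.smallfiles" smallfiles/two.txt
```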
The HSM should not be used to store large temporary files, such as input files for programs; use /tmp, /bigtmp, or /longtmp for those instead. Using the HSM for temporary files can lead to substantial performance problems.