Data Storage Solutions

The Hierarchical Storage Management (HSM) System

Introduction

Storing large amounts of data in the Home Directory Service may fill the filesystem or cause the quota for the home directory to be exceeded. The traditional solutions are to buy more disk space, or to store the data on external media and retrieve it as needed. UVa provides the Hierarchical Storage Management (HSM) System as an alternative to these options.

UVa's Hierarchical Storage Management (HSM) system provides access to permanent storage for large amounts of data. The HSM is available to UVa faculty and staff. The HSM uses Quantum (formerly ADIC) StorNext filesystem software, with a large disk filesystem allocated on ITS's storage network, and Quantum (ADIC) i2000 automated tape libraries, containing LTO tape drives. The system uses robotics for automatic retrieval and storage of tapes. The HSM is administered by the ITS Systems & Storage group.

Files are copied to tape in a migration process which is invisible to the user. When an attempt is made to access a migrated file, it is automatically retrieved (recalled). The recall may take several minutes while a tape is mounted by the robot, the tape is positioned, and the file is copied to disk.

Please observe the following guidelines when using the Hierarchical Storage Manager:

  • The HSM should not be used to store large numbers of files with the expectation that they can be retrieved quickly. Recalling a single file can take a significant amount of time, and recalling many files takes far longer; for example, recalling 1000 small files would take over a day. Rather than storing a large number of individual files, HSM users should combine them into a few archive files using a utility such as tar or cpio, and store the archive files instead. When it is time to recall a collection of files, the archive files can be recalled relatively quickly and the desired files extracted.
  • The HSM should not be used to store files that are used regularly, since accessing them may often require a recall operation.
  • Do not expect to be able to retrieve arbitrarily large amounts of data at one time. The shared HSM filesystem can handle hundreds of gigabytes of user data. If you need to manipulate an extremely large dataset, however, or if other users are utilizing a significant portion of the filesystem space at a given moment, you may need to devise a way to access your data in smaller pieces.
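
One way to keep data retrievable in smaller pieces is to write one archive per subdirectory rather than a single archive for a whole dataset. The following is a minimal sketch under that assumption, not an ITS-provided tool; the function name archive_by_subdir and the example paths are hypothetical.

```shell
# Archive each immediate subdirectory of a dataset into its own tar file,
# so that the pieces can later be recalled from tape one at a time.
# Usage: archive_by_subdir DATASET_DIR DEST_DIR   (hypothetical helper)
archive_by_subdir() {
    src=$1
    dest=$2
    for sub in "$src"/*/; do
        [ -d "$sub" ] || continue            # nothing to do if no subdirectories
        name=$(basename "$sub")
        # -C makes the member names relative to the dataset directory
        tar cf "$dest/$name.tar" -C "$src" "$name" || return 1
    done
}

# Example (hypothetical paths):
# archive_by_subdir /home/mst3k/dataset /net/hsm/mst3k
```

Each resulting archive can then be recalled and unpacked on its own, which limits how much disk space a single retrieval needs.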

Backup of HSM Data

In order to preserve user data in case of hardware failure or accidental removal, files written into an HSM-managed filesystem are also written onto magnetic tape in two different tape libraries in two different buildings on Grounds. This operation occurs within a few minutes of the file system update.

The following is a typical scenario: From a remote computer, file F is created or modified within an HSM-managed filesystem. Within minutes, F is backed up offsite in two locations. Subsequently, if filesystem space is needed, the HSM software can delete F from the disk location, leaving both copies intact in the tape libraries. The disk copy is removed only if F has not been modified in at least 6 days.
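
As a rough illustration of the 6-day rule, the sketch below lists files whose contents have not changed for at least 6 days. It only reports file age; it cannot tell whether the HSM has actually removed a file's disk copy, and the function name is hypothetical.

```shell
# List regular files under a directory that were last modified at least
# 6 full days ago. With find, "-mtime +5" matches files whose age in whole
# days is greater than 5. This reports age only, not actual migration state.
# Usage: list_unmodified_6_days DIR   (hypothetical helper)
list_unmodified_6_days() {
    find "$1" -type f -mtime +5 -print
}

# Example (hypothetical path):
# list_unmodified_6_days /net/hsm/mst3k
```

Files on this list are candidates for a recall delay when next accessed.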

Requesting Space on the HSM

To get space on the HSM, email ITS Systems & Storage with the following information:

  • Your name and UVa Computing ID (e.g., mst3k)
  • Your email address
  • Your local phone number
  • Your UVa affiliation (faculty, staff, or graduate student)
  • The name of the system from which you wish to use the HSM directory

Graduate Students and Groups

If you are a graduate student or part of a research group, it may be best to have one HSM directory (the project leader's or supervisor's), with each person belonging to a UNIX group (managed through MyGroups) that has read and write permissions on that directory. The supervisor can create a sub-directory for each group member and use the UNIX chmod command to give the member read and write permissions on it. Each group member can also use chmod to change the permissions on their own files. For example, the following takes away read permission from the group and from others:

chmod go-r filename

If someone leaves the group the project leader can still have access to their data.
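
The per-member sub-directory setup described above might look like the following sketch. It assumes the parent directory already belongs to the research group's UNIX group; the function name make_member_dir and the ID abc1d are hypothetical.

```shell
# Create a sub-directory for one group member inside the shared HSM
# directory and give the owner and the group full access, others none.
# Assumes the directory's group is already the research group's UNIX group.
# Usage: make_member_dir SHARED_DIR MEMBER_ID   (hypothetical helper)
make_member_dir() {
    dir=$1/$2
    mkdir -p "$dir" && chmod u=rwx,g=rwx,o= "$dir"
}

# Example (hypothetical member ID):
# make_member_dir /net/hsm/mst3k abc1d
```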

To have an HSM directory configured this way, email ITS Systems & Storage with the following details:

  • the name and UVa Computing ID (e.g., mst3k) of the faculty or staff member who will "own" the directory
  • the list of names and UVa Computing IDs of those who are to have access to the directory
  • the name of the system(s) from which each of the group's members wishes to use the HSM directory

People can be added to and removed from the UNIX group using MyGroups.

Using the HSM

If you have been granted space on the HSM, you should have a directory named /net/hsm/user_id on an ITS-managed UNIX computer system. To change to this directory, type:

cd /net/hsm/user_id

For example, if your user ID is mst3k, use the command

cd /net/hsm/mst3k

In order to use the HSM from a non-ITS UNIX computer, you or your system administrator must first make a local directory and mount the HSM filesystem to that directory. After receiving confirmation that your computer has been added, become root, create the directory /net/hsm/mst3k, then mount the HSM to this directory with the command

mount -t nfs storage-fs2.itc.virginia.edu:/stornext/snfs3/hsm/mst3k /net/hsm/mst3k

If you wish to do this every time the computer is booted, add the appropriate line to your /etc/fstab or equivalent file.
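
On a Linux system, an /etc/fstab entry for this mount could be composed as in the sketch below. The server path and mount point come from the example above; the mount options (defaults) and the trailing "0 0" fields are assumptions, so check them with your system administrator before use.

```shell
# Compose an /etc/fstab entry for the HSM mount (Linux fstab format:
# device, mount point, filesystem type, options, dump, fsck pass).
HSM_EXPORT="storage-fs2.itc.virginia.edu:/stornext/snfs3/hsm/mst3k"
HSM_MOUNT="/net/hsm/mst3k"
# "defaults 0 0" is an assumption, not an ITS recommendation.
FSTAB_LINE="$HSM_EXPORT $HSM_MOUNT nfs defaults 0 0"

# Print the line for review; as root, append it to /etc/fstab once verified.
printf '%s\n' "$FSTAB_LINE"
```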

Getting and Using Files

To place a file into the HSM, simply move or copy it into your HSM directory. For example, to copy the file foo from /home/mst3k to the HSM directory, use

cp /home/mst3k/foo /net/hsm/mst3k/foo

Suppose the directory /home/mst3k/smallfiles contains various small files. The commands

cd /home/mst3k
tar cf /net/hsm/mst3k/tar.smallfiles smallfiles

combine the contents of the directory smallfiles, including any subdirectories, into the single file tar.smallfiles, which is stored in the HSM. The entire directory contents may later be retrieved with a single tape operation by accessing tar.smallfiles using commands such as

cd /home/mst3k/tmp
tar xvpf /net/hsm/mst3k/tar.smallfiles
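
When only some of the archived files are needed, tar can extract named members instead of the whole archive. The sketch below wraps the archive and extract steps in two small functions; the function names and the member path in the example are hypothetical.

```shell
# Create an archive of one directory, with member names relative to its parent.
# Usage: archive_dir PARENT_DIR DIR_NAME ARCHIVE_FILE   (hypothetical helper)
archive_dir() {
    tar cf "$3" -C "$1" "$2"
}

# Extract a single member from an archive into a destination directory,
# preserving permissions (p). The whole archive is still read from the HSM.
# Usage: extract_member ARCHIVE_FILE MEMBER_PATH DEST_DIR   (hypothetical helper)
extract_member() {
    tar xpf "$1" -C "$3" "$2"
}

# Example (hypothetical member name):
# archive_dir /home/mst3k smallfiles /net/hsm/mst3k/tar.smallfiles
# extract_member /net/hsm/mst3k/tar.smallfiles smallfiles/notes.txt /home/mst3k/tmp
```

Running tar tf on the archive lists the member names that can be passed to the extract step.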

When accessing a file that has been placed into the HSM, expect an initial delay of as much as a few minutes if the file has been migrated to tape. Often, however, the delay is only a few seconds long.
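
Because any access to a migrated file triggers its recall, one way to absorb the delay ahead of time is to read the needed files once before starting a long job. This is a minimal sketch under that assumption; the function name prestage and the example file name are hypothetical.

```shell
# Read each named file once and discard the data, so that any file that
# has been migrated to tape is recalled to disk before it is really needed.
# Returns non-zero if any file could not be read.
# Usage: prestage FILE...   (hypothetical helper)
prestage() {
    status=0
    for f in "$@"; do
        cat "$f" > /dev/null || status=1
    done
    return $status
}

# Example (hypothetical file name):
# prestage /net/hsm/mst3k/tar.smallfiles
```

Prestage only a few archive files at a time, in keeping with the guidelines above on recalling large amounts of data.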

The HSM should not be used to store large temporary files, e.g. input files for programs; that is what /tmp, /bigtmp, and /longtmp are to be used for. Use of the HSM for temporary files can lead to substantial performance problems.

  Page Updated: Monday 2017-11-13 11:16:11 EST