The atomic unit of data that is managed by the JSOC storage system is called a storage unit. The JSOC storage system is therefore denoted Storage Unit Management System (SUMS). Each storage unit contains the data segment part of datarecords from a single dataseries, and corresponds to the contents of a single directory, [possibly with subdirectories for each datarecord]. A storage unit index (denoted DSIndex for historical reasons) is stored with each data record and identifies the storage unit holding the data segments for the record.
A storage unit may be stored online on magnetic disk, offline e.g. on a magnetic tape in a cabinet, or nearline on a tape in a robotic tape library. (The particular storage media is not important to the concept). In response to a user's request to access a particular datarecord the JSOC catalog will identify the storage unit containing that datarecord by looking up its DSIndex. The DSIndex is an index into the SUMS internal catalog which tracks the location of each storage unit. If the requested storage unit is not online the SUMS will allocate storage space, name a directory, and copy the storage unit into that directory. The SUMS will report the working directory pathname to the JSOC catalog where it is accessible to the user. All storage units are owned and managed by the SUMS.
Storage unit are ``write-once'' objects and clients of SUMS can only perform two operations on them: 1) open existing unit as read-only, 2) create new unit. Deletion or modification of storage units will be restricted to SUMS administrative programs and will require special privileges.
The datarecords from a particular dataseries will in general be stored
in many storage units. The default size of a unit is specified when a
series is created. It should be chosen based on knowledge of the size
of the data records and how they are likely to be computed, such that
a storage unit corresponds to the output of a ``natural'' processing
batch and/or is a convenient size to handle for data export and
efficient transfer to tape (the tape archival service can probably
further bundle multiple units, if available, together in a single
tar
command to gain further efficiency).
One of the goals of the JSOC data model is allowing user to make a new version of a data record when the value of one or more keywords change without having to copy the large files making up the data segments. This is accomplished by allowing multiple data record to point to the same storage unit. A small example will illustrate this: Consider a simple data series where records have two keywords ``seriesnum'' (which is the primary index) and ``x'' and a single data segment ``data''. Assume that each storage unit contains a single data record. Let us consider the following sequence of events:
[Question from Jesper Schou: Is the example above the only case where records are allowed to point to the same storage units? Should it be allowed that records with different primary index or even records from different series point to the same storage unit? It sort of wreaks the whole data concept. Are there examples where it would be really useful in a way that cannot be accomplished with links?]