Overview of JSOC Data Series, DRMS, and SUMS

In the following, we describe the logical organization of the HMI/AIA JSOC Data Record Management System (DRMS), also referred to below as the JSOC catalog, and define a number of terms used to describe data in the JSOC at various levels of abstraction. This section is based on the DRMS description by Rasmus Larsen linked at the bottom of the page.

The JSOC data series

Data is stored in the JSOC in "Data Series". A Data Series (or dataseries) is a sequence of similar data objects, typically "images" or other binary data along with associated metadata. A dataseries consists of a sequence of Data Records, and usually each datarecord holds the data for one step in "time". Most, but certainly not all, dataseries are sequences in time; in principle they can be any list of data objects. A good way to think about a dataseries is as a table of rows and columns in which each row is a record and the columns contain metadata and descriptors for accessing the binary data objects.

A datarecord is the basic "atomic unit" of a dataseries or, more precisely, the smallest unit that is individually registered and available for export from a data series in the JSOC catalog. Most (if not all) access to the JSOC archive, by both pipeline processing modules and external data export services, is in terms of data records. In other words, what we informally call the "JSOC catalog" is first and foremost a data record catalog.

A datarecord consists of keyword-tagged metadata describing the record and zero or more named Datasegments, usually containing binary arrays of data values. All datarecords in a given dataseries have the same set of keyword and datasegment names, each with record-specific values. The dataseries description and the datarecords are maintained in a relational database called DRMS (Data Record Management System). DRMS is implemented as a set of [http://www.postgresql.org/ PostgreSQL] tables. There is one database table for each series containing the values of keywords, segment metadata, and links for all data records in the series. The values for a single data record are contained in a single row in that table.
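The "one table per series, one row per record" layout can be illustrated with a small sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the series name, keyword columns and time strings are hypothetical; the actual DRMS table layout is not reproduced here.

```python
import sqlite3

# Hypothetical series "demo.lev1": prime key t_rec, two ordinary keywords,
# and one column recording which storage unit holds the "image" segment.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE demo_lev1 (
           recnum   INTEGER PRIMARY KEY AUTOINCREMENT, -- assigned in creation order
           t_rec    TEXT NOT NULL,                     -- prime key (observation time)
           exptime  REAL,                              -- ordinary keyword
           quality  INTEGER,                           -- ordinary keyword
           image_su INTEGER                            -- storage unit of the image segment
       )"""
)
rows = [
    ("2010.05.01_00:00:00_TAI", 0.15, 0, 101),
    ("2010.05.01_00:00:45_TAI", 0.15, 0, 101),
    ("2010.05.01_00:01:30_TAI", 0.15, 4, 102),
]
conn.executemany(
    "INSERT INTO demo_lev1 (t_rec, exptime, quality, image_su) VALUES (?, ?, ?, ?)",
    rows,
)

# Each data record is exactly one row of the series table.
for recnum, t_rec, quality in conn.execute(
    "SELECT recnum, t_rec, quality FROM demo_lev1 ORDER BY recnum"
):
    print(recnum, t_rec, quality)
```

Note that the binary segment data itself is not in the table; only a reference to where it is stored.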

In summary

DataRecords

Datarecords contain several types of metadata, including keyword values, segment descriptors, record links, and some processing information. Each record in a particular series is given a record number (called recnum), which serves as its ultimate identification in the database. Usually one or more keywords are designated prime keys, which are the primary way records are identified for the user. Together, the prime keys uniquely identify a (logical) dataseries record and define the main index for the series. Any records with the same set of prime key values are treated as different versions of the same record. Thus the most recent instance of any record in a given series may be found by specifying the values of the prime keys for that series. The pre-defined keyword "recnum" is used for the main index if no prime keys are defined. If a record in a series with prime keys is modified, a new version is created; older versions of the record remain in the table but have smaller values of recnum.
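The rule "the most recent version is the one with the largest recnum" can be expressed as a single query. This is a sketch using sqlite3 as a stand-in for PostgreSQL, with a hypothetical series that has one prime key (t_rec); it is not the query DRMS actually issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_series (recnum INTEGER PRIMARY KEY, t_rec TEXT, calver INTEGER)"
)
# The record for 00:00:45 was reprocessed: recnum 3 is a newer version of recnum 2.
conn.executemany(
    "INSERT INTO demo_series VALUES (?, ?, ?)",
    [(1, "00:00:00", 1), (2, "00:00:45", 1), (3, "00:00:45", 2), (4, "00:01:30", 1)],
)

# For each prime-key value, keep only the row with the largest recnum.
LATEST = """
    SELECT recnum, t_rec, calver FROM demo_series
    WHERE recnum IN (SELECT MAX(recnum) FROM demo_series GROUP BY t_rec)
    ORDER BY t_rec
"""
latest = conn.execute(LATEST).fetchall()
print(latest)  # [(1, '00:00:00', 1), (3, '00:00:45', 2), (4, '00:01:30', 1)]
```

The superseded row (recnum 2) is still in the table; it is simply not selected. With several prime keys, the GROUP BY clause would list all of them.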

In order to access a set of records from a series, a description must be provided to select the desired records. We call that description a "Dataset Name". Thus, in JSOC/DRMS a dataset name is actually a database query. The DRMS dataset name rules have been designed to provide user-friendly names (at least, that is the goal) that are easy to remember and use.
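To make the idea "a dataset name is a query" concrete, here is a toy translator. The syntax `series[v1][v2]`, the prime-key registry, and the series-to-table mapping are all hypothetical simplifications, not the real DRMS dataset-name grammar; an empty bracket leaves that prime key unconstrained, and version selection is omitted.

```python
import re

# Hypothetical registry: series name -> ordered list of prime keys.
PRIME_KEYS = {"demo.lev1": ["t_rec", "camera"]}

# Series name followed by zero or more bracketed values.
NAME_RE = re.compile(r"^([A-Za-z_][\w.]*)((?:\[[^\]]*\])*)$")


def dataset_to_query(name):
    """Translate a toy dataset name 'series[v1][v2]...' into (sql, params)."""
    m = NAME_RE.match(name)
    if not m:
        raise ValueError(f"malformed dataset name: {name!r}")
    series, brackets = m.groups()
    values = re.findall(r"\[([^\]]*)\]", brackets)
    keys = PRIME_KEYS[series]  # KeyError for an unknown series
    if len(values) > len(keys):
        raise ValueError("more bracketed values than prime keys")
    table = series.replace(".", "_")  # hypothetical series-to-table mapping
    # Bracketed values bind to prime keys in order; empty brackets are skipped.
    where = [f"{k} = ?" for k, v in zip(keys, values) if v != ""]
    params = [v for v in values if v != ""]
    sql = f"SELECT * FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


print(dataset_to_query("demo.lev1[2010.05.01_00:00:00_TAI][2]"))
print(dataset_to_query("demo.lev1[][2]"))
```

Values are passed as query parameters rather than spliced into the SQL string, so a dataset name cannot inject arbitrary SQL.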

Keywords

A data record contains zero or more (typically many) named keywords, each mapping to a value of a simple type such as integer, float, string, or time associated with the record. Keywords are often used to store metadata describing the properties, history and/or context of the main image/observable data stored in the record's data segments. This concept is familiar from standard file-based data formats such as FITS, where the FITS header keywords would correspond to the JSOC keywords and the primary binary arrays or tables would correspond to the JSOC data segments.
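The keyword/header correspondence can be seen by rendering a record's keywords as FITS header cards. This is a simplified sketch of the fixed-format card layout (keyword in columns 1-8, "= " in columns 9-10, numbers right-justified to column 30, strings quoted); it ignores comments, long strings, and other details of the FITS standard, and the keyword values are hypothetical.

```python
def fits_card(key, value):
    """Format one keyword as a simplified fixed-format 80-character FITS card."""
    if len(key) > 8:
        raise ValueError(f"{key!r} is longer than the 8 characters standard FITS allows")
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        text = ("T" if value else "F").rjust(20)  # logical, ends in column 30
    elif isinstance(value, (int, float)):
        text = repr(value).upper().rjust(20)      # number, ends in column 30
    elif isinstance(value, str):
        # Quoted string; embedded quotes doubled; at least 8 characters inside.
        text = "'" + value.replace("'", "''").ljust(8) + "'"
    else:
        raise TypeError(f"unsupported keyword type: {type(value).__name__}")
    return f"{key.upper():<8}= {text}".ljust(80)


keywords = {"t_rec": "2010.05.01_00:00:00_TAI", "exptime": 0.15, "quality": 0}
for k, v in keywords.items():
    print(fits_card(k, v).rstrip())
```

The 8-character limit is one reason JSOC keyword names do not always map one-to-one onto standard FITS keywords.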

In the JSOC catalog, keyword values are stored in database tables separate from the files holding the data segments. This makes it possible to:

There is one database table for each series containing the values of keywords and links for all data records in the series. The values for a single data record are contained in a single row in that table.

Prime Keywords

For many series, it is desirable to have a primary index associated with the principal axis of each datarecord (e.g. time, or (latitude, longitude)). The intention is that the primary index maps to a unique value or slot on the principal axis. There might exist multiple versions of the "same observation" (e.g. newer versions could be created to include previously missing data or to fix a bad calibration). Since there might be multiple versions of the "same" record, the primary index does not uniquely identify a data record.

The primary index consists of one or more keyword values that are logically concatenated to form the full index. If two records have keyword values that differ on any of the keywords comprising the primary index, they are considered different data records (w.r.t. the primary index); otherwise they are considered different versions of the same data record (w.r.t. the primary index). The default behavior of the JSOC is to return the most recent version of a datarecord for a given primary index. Since record numbers (recnums) are assigned in order of creation, the most recent version is the record with the highest recnum. The primary index has two crucial uses in the JSOC:
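The comparison rule for the primary index can be written out directly. In this sketch, records are plain dictionaries and the prime-key names (`t_rec`, `camera`) are hypothetical; the "logical concatenation" is modeled as an ordered tuple of prime-key values.

```python
from typing import Mapping, Sequence


def compare_records(a: Mapping, b: Mapping, prime_keys: Sequence[str]) -> str:
    """Classify two records with respect to the primary index formed by prime_keys."""
    # The primary index is the ordered tuple of the records' prime-key values.
    index_a = tuple(a[k] for k in prime_keys)
    index_b = tuple(b[k] for k in prime_keys)
    if index_a != index_b:
        return "different records"
    # Same primary index: these are versions; the larger recnum was created later.
    newer = a if a["recnum"] > b["recnum"] else b
    return f"versions of one record; recnum {newer['recnum']} is current"


keys = ["t_rec", "camera"]
old = {"recnum": 7, "t_rec": "2010.05.01_00:00:00_TAI", "camera": 1}
new = {"recnum": 12, "t_rec": "2010.05.01_00:00:00_TAI", "camera": 1}
other = {"recnum": 13, "t_rec": "2010.05.01_00:00:00_TAI", "camera": 2}
print(compare_records(old, new, keys))
print(compare_records(old, other, keys))
```

A tuple is used instead of literally joining the values into one string because string concatenation can be ambiguous (for example, "ab" + "c" and "a" + "bc" give the same result).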

Segments

While the DRMS record contains the description of each datasegment, the information contained in a datasegment is not stored in the database but in Storage Units "owned" by SUMS (Storage Unit Management System). Storage units are simply directories containing files. SUMS itself maintains tables in PostgreSQL to track storage unit locations on disk and/or tape. A storage unit may contain data for one or more datasegments of one or more datarecords.
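The indirection from a record's segment to a file inside a SUMS-managed directory can be sketched as follows. The class names, the storage-unit numbers, the directory and file names, and the "tape only" flag are hypothetical illustrations; they are not the actual SUMS schema or DRMS API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional


@dataclass(frozen=True)
class StorageUnit:
    sunum: int                 # hypothetical storage-unit number assigned by SUMS
    disk_path: Optional[str]   # directory on disk, or None if the unit is only on tape


@dataclass(frozen=True)
class SegmentRef:
    recnum: int    # record the segment belongs to
    segment: str   # segment name, e.g. "image"
    sunum: int     # storage unit that holds the file
    filename: str  # file name inside the storage-unit directory


def segment_path(ref: SegmentRef, units: Dict[int, StorageUnit]) -> PurePosixPath:
    """Resolve a segment reference to a file path inside its storage unit."""
    su = units[ref.sunum]
    if su.disk_path is None:
        raise LookupError(f"storage unit {su.sunum} is not on disk (tape only)")
    return PurePosixPath(su.disk_path) / ref.filename


# Storage unit 101 holds the "image" segment of two different records.
units = {101: StorageUnit(101, "/SUM1/D101"), 102: StorageUnit(102, None)}
refs = [
    SegmentRef(1, "image", 101, "rec1.image.fits"),
    SegmentRef(2, "image", 101, "rec2.image.fits"),
    SegmentRef(3, "image", 102, "rec3.image.fits"),
]
print(segment_path(refs[1], units))  # /SUM1/D101/rec2.image.fits
```

Keeping only the storage-unit number in the record lets SUMS move a unit between disk and tape without any change to the DRMS tables.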

Links

A data record contains zero or more named links. Links are pointers between data records that make it possible for data records to inherit keyword values from each other and to capture other dependencies between them, such as processing history. For example, a data record can contain links to the data records that were used in creating it, such as a dopplergram data record pointing to the filtergrams from which it was created. Links come in two varieties, static and dynamic:
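Assuming the usual DRMS meaning of the two varieties, a static link pins one specific record version (by recnum), while a dynamic link stores prime-key values and resolves to the most recent matching record at lookup time. The sketch below illustrates that difference; the class names, the `t_obs` prime key and the sample filtergram records are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical target series; recnum 25 is a reprocessed version of recnum 10.
FILTERGRAMS = [
    {"recnum": 10, "t_obs": "00:00:00", "calver": 1},
    {"recnum": 11, "t_obs": "00:00:03", "calver": 1},
    {"recnum": 25, "t_obs": "00:00:00", "calver": 2},
]


@dataclass(frozen=True)
class StaticLink:
    recnum: int  # pinned to one specific record version


@dataclass(frozen=True)
class DynamicLink:
    t_obs: str  # prime-key value; resolved when the link is followed


def resolve(link, records):
    """Follow a link to the target record it currently designates."""
    if isinstance(link, StaticLink):
        for r in records:
            if r["recnum"] == link.recnum:
                return r
        raise LookupError(f"no record with recnum {link.recnum}")
    if isinstance(link, DynamicLink):
        matches = [r for r in records if r["t_obs"] == link.t_obs]
        if not matches:
            raise LookupError(f"no record with t_obs {link.t_obs}")
        return max(matches, key=lambda r: r["recnum"])  # newest version wins
    raise TypeError(f"unknown link type: {type(link).__name__}")


print(resolve(StaticLink(10), FILTERGRAMS)["calver"])
print(resolve(DynamicLink("00:00:00"), FILTERGRAMS)["calver"])
```

A static link preserves exactly which inputs produced a record, which suits processing history; a dynamic link automatically follows reprocessed versions.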

DRMS

SUMS

[wiki:SumsDataModel SUMS - the Storage Unit Management System]

Implementation

The JSOC Application Programming Interface (API) provides a set of functions, with bindings to host languages including C and Fortran (and possibly, in the future, IDL and MATLAB), that allow programs to connect to the JSOC environment and to retrieve and manipulate data records. The API contains groups of functions that

condition,

The API is described in the man pages and elsewhere in this wiki.

Older Documents

There are several older documents that, while not an accurate description of the JSOC system as it is now implemented, do contain useful information about its design, intent, and intended usage. These are: