Differences between revisions 5 and 6
Revision 5 as of 2013-02-26 04:05:54
Size: 8801
Editor: DNab4211fe
Comment:
Revision 6 as of 2013-02-26 04:55:34
Size: 17705
Editor: DNab4211fe
Comment:
Deletions are marked like this. Additions are marked like this.
Line 11: Line 11:

== Installing NetDRMS ==

=== Installing NetDRMS for the First Time ===

The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order:

1. Set up the environment (to be done by a superuser)
a. Create a ''production'' linux user (named production by default). If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:

{{{<production user> <host>=NOPASSWD:<path to sum_chmown>}}}

This will allow sum_chmown to be run without a password prompt being presented.
b. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
c. Make sure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file and put it in the production user's home directory. Please see XXX for information on how to do this.




The configuration and compilation of NetDRMS described here can proceed largely independently of the site and/or user setup, which only needs to be done once. It is recommended that the site setup be done first, as the NetDRMS build requires the definition of certain site-dependent names, such as those of the database and server; however, if these names are already known, the libraries can be built without the database and SUMS storage in place. Any code that requires access to the database will not of course function until the DRMS and SUMS services have been set up.

These instructions assume that there is already a NetDRMS database server and associated SUMS server that you can connect to. If that is not the case, then you or someone else at your site will first have to do a Site Installation. You must also have the PostgreSQL Core installed at least as a client library on any machine on which you intend to build the package. You should have psql in your path.

Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms.
Most Recent Version (7.0)
Current and Earlier Versions
The size of the source distribution is currently (V 7.0) about 10 MB. A built system (including SUMS) is typically about 300 MB.
In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators:

If you had not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying one from the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.

However, you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case.
In the target root directory $DRMS, run
  ./configure
This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.)
The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default):
linux_ia32 (`uname -s` = Linux, `uname -m` = ia32 | i686 | i386)
linux_x86_64 (`uname -s` = Linux, `uname -m` = x86_64)
The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise 5, has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client.) See platform notes for instructions on how to fix this bug.

If you are making on any other architecture, the target name will be custom. Binaries and libraries will be placed in appropriate subdirectories based on these names. If you will be making on multiple architectures, or if you wish to change the target architecture name, you should either add the following line near the beginning of the file $DRMS/make_basic.mk
  JSOC_MACHINE = name
or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts.
If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:
  COMPILER = icc
  FCOMPILER= ifort
Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the Postgres SQL Client Applications libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure sockets toolkit; You should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default.
N.B. If you are using the icc compiler, it is recommended to use version 11 . There are some very nasty bugs using version 10.*.
In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps.
These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it.
At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.



There are two parts to setting up NetDRMS. First, the necessary services must be set up at the institution or group that will be hosting the NetDRMS service. The basic preparation and installation only needs to be done once, although the actual software distribution may be updated from time to time without affecting the setup. Second, individual users may wish to set up the NetDRMS software distribution for use or development in their own environment. Again, there are a few administrative tasks that need to be performed once when a user is registered, but the software may be updated or rebuilt at any time. Once the site preparation and setup is complete, user setup is a simple task, so there are two sets of instructions. Most users only need to concern themselves with the second, Installing / Upgrading NetDRMS.


old stuff below
Line 17: Line 81:
* Reserved disk space to serve as the SUMS disk cache.
* A database server running Postgres version 8.4.
* A "current" copy of the JSOC software tree, available from Stanford.
 * Reserved disk space to serve as the SUMS disk cache.
 * A database server running Postgres version 8.4.
 * A "current" copy of the JSOC software tree, available from Stanford.

NetDRMS - a shared data management system

Introduction

In order to process, archive, and distribute the substantial quantity of data flowing from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data management system. This system, the Data Record Management System (DRMS), consists of data series, each of which is a collection of related data. For example, there exists a data series named hmi.M_45s, which contains the HMI 45-second cadence magnetograms. Each data series consists of several DRMS objects: records, keywords, segments, and links. A DRMS record is the smallest unit of data-series data. Typically, it represents data for a single observation in time (hence the term series in data series), but there is no restriction on how a user organizes their data. A data series may contain one or more DRMS keywords, each of which represents a named bit of metadata. For example, many data series contain a DRMS keyword named CRPIX1. A DRMS segment is a collection of data that contains storage/retrieval information needed by DRMS to locate auxiliary data files. These data files contain large sets of data like image arrays. Generally, they are image files, but what they contain is arbitrary and user-defined. A data series optionally contains one or more DRMS links, each of which is a collection of data that links the data series to other DRMS data series. Each DRMS record contains record-specific values for the DRMS keywords, segments, and links. In this way, one record may have one set of keyword, segment, and link values, and another record may have a different set of these values.

The Storage Unit Management System (SUMS) is the file-management system that contains the data files that DRMS records refer to. Each DRMS segment value is used by DRMS code to derive the SUMS file-system path to a single data file. Because each DRMS series may contain multiple DRMS segments, each DRMS record may point to more than one data file.

To manage all these data, DRMS comprises several components, one of which is a database instance in a relational-database management system (PostgreSQL). The DRMS Library code uses a database instance and several tables to implement the DRMS objects. For each data-series record, there exists a database table that contains one row per each DRMS record. The columns of each of these records contain the DRMS keyword, segment, and link values - bits of data that are all small enough to efficiently fit in a database record. The data-file data are too large to fit into a database record, so those data reside in data files in SUMS. The DRMS-segment values point to the data files, using a unique identifier called a SUNUM. SUMS itself comprises several components, one of which is another database instance that contains several database tables. When DRMS needs a data file, it requests the file from SUMS by providing SUMS with a SUNUM, and then SUMS consults its database tables to derive the path to the data file. SUMS shuttles files between hard disk (aka the disk cache) and tape, so data files have no permanent file path. Therefore, when DRMS requests the path to a file, SUMS must obtain the current path by consulting a database table.

Installing NetDRMS

Installing NetDRMS for the First Time

The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order:

1. Set up the environment (to be done by a superuser) a. Create a production linux user (named production by default). If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:

<production user> <host>=NOPASSWD:<path to sum_chmown>

This will allow sum_chmown to be run without a password prompt being presented. b. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group. c. Make sure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file and put it in the production user's home directory. Please see XXX for information on how to do this.

The configuration and compilation of NetDRMS described here can proceed largely independently of the site and/or user setup, which only needs to be done once. It is recommended that the site setup be done first, as the NetDRMS build requires the definition of certain site-dependent names, such as those of the database and server; however, if these names are already known, the libraries can be built without the database and SUMS storage in place. Any code that requires access to the database will not of course function until the DRMS and SUMS services have been set up.

These instructions assume that there is already a NetDRMS database server and associated SUMS server that you can connect to. If that is not the case, then you or someone else at your site will first have to do a Site Installation. You must also have the PostgreSQL Core installed at least as a client library on any machine on which you intend to build the package. You should have psql in your path.

Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms. Most Recent Version (7.0) Current and Earlier Versions The size of the source distribution is currently (V 7.0) about 10 MB. A built system (including SUMS) is typically about 300 MB. In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators:

If you had not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying one from the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.

However, you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case. In the target root directory $DRMS, run

  • /configure

This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.) The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default): linux_ia32 (uname -s = Linux, uname -m = ia32 | i686 | i386) linux_x86_64 (uname -s = Linux, uname -m = x86_64) The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise 5, has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client.) See platform notes for instructions on how to fix this bug.

If you are making on any other architecture, the target name will be custom. Binaries and libraries will be placed in appropriate subdirectories based on these names. If you will be making on multiple architectures, or if you wish to change the target architecture name, you should either add the following line near the beginning of the file $DRMS/make_basic.mk

  • JSOC_MACHINE = name

or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts. If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:

  • COMPILER = icc FCOMPILER= ifort

Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the Postgres SQL Client Applications libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure sockets toolkit; You should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default. N.B. If you are using the icc compiler, it is recommended to use version 11 . There are some very nasty bugs using version 10.*. In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps. These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it. At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.

There are two parts to setting up NetDRMS. First, the necessary services must be set up at the institution or group that will be hosting the NetDRMS service. The basic preparation and installation only needs to be done once, although the actual software distribution may be updated from time to time without affecting the setup. Second, individual users may wish to set up the NetDRMS software distribution for use or development in their own environment. Again, there are a few administrative tasks that need to be performed once when a user is registered, but the software may be updated or rebuilt at any time. Once the site preparation and setup is complete, user setup is a simple task, so there are two sets of instructions. Most users only need to concern themselves with the second, Installing / Upgrading NetDRMS.

old stuff below

Building Your Own DRMS and SUMS

Sites other than the JSOC can DRMS data series. They can maintain local copies of the DRMS and SUMS data created at the JSOC. And they can create their own DRMS data, of which other sites can maintain local copies. To participate in this network of sites sharing data, a site (aka a node) must install a DRMS/SUMS system to become a NetDRMS site. Once a member of a this network, a NetDRMS site can selectively share specific data series - it is not necessary to share all series.

There are three fundamental requiremants for setting up and operating a DRMS system:

  • Reserved disk space to serve as the SUMS disk cache.
  • A database server running Postgres version 8.4.
  • A "current" copy of the JSOC software tree, available from Stanford.

Setting up a SUMS

The SUMS disk area can be as simple as a directory, but it is probably better to assign at least one disk partition to the SUMS cache. Unless a tape library also exists, the SUMS partition(s) must be large enough to store all the data segments in the DRMS that are to be archived locally. For datasets for which other DRMS servers provide the permanent archive, the local SUMS will serve only as a local cache, so size is dictated by expected usage.

The directory or directories to be used for SUMS must be owned by a user named production (can be any uid) and belong to a group named SOI (can be any gid), and have a permissions mask of 8354 (drwxrwsr-x). The group SOI should include as members any users who will be writing data into the DRMS by running modules or otherwise.

Setting up the Postgres Database server

You should have Postgres Version 8.1 or higher installed; JSOC database servers are currently (Oct 2006) running on the following systems:

  • a 64-bit dual-core xeon running Red Hat Enterprise Linux 4 with Postgres v. 8.1.2
  • a 32-bit dual-core pentium 4 running Scientific Linux (?; equinox) with Postgres v. 8.1.4

Populating the Database

First, you must create the database tables required for SUMS. You can do so by running the following psql commands:

create table SUM_MAIN (
 ONLINE_LOC             VARCHAR(80) NOT NULL,
 ONLINE_STATUS          VARCHAR(5),
 ARCHIVE_STATUS         VARCHAR(5),
 OFFSITE_ACK            VARCHAR(5),
 HISTORY_COMMENT        VARCHAR(80),
 OWNING_SERIES          VARCHAR(80),
 STORAGE_GROUP          integer,
 STORAGE_SET            integer,
 BYTES                  bigint,
 DS_INDEX               bigint,
 CREATE_SUMID           bigint NOT NULL,
 CREAT_DATE             timestamp(0),
 ACCESS_DATE            timestamp(0),
 USERNAME               VARCHAR(10),
 ARCH_TAPE              VARCHAR(20),
 ARCH_TAPE_POS          VARCHAR(15),
 ARCH_TAPE_FN           integer,
 ARCH_TAPE_DATE         timestamp(0),
 WARNINGS               VARCHAR(260),
 STATUS                 integer,
 SAFE_TAPE              VARCHAR(20),
 SAFE_TAPE_POS          VARCHAR(15),
 SAFE_TAPE_FN           integer,
 SAFE_TAPE_DATE         timestamp(0),
 constraint pk_summain primary key (DS_INDEX)
);

create table SUM_OPEN (
    SUMID      bigint not null,
    OPEN_DATE  timestamp(0),
    constraint pk_sumopen primary key (SUMID)
);

create table SUM_PARTN_ALLOC (
    wd                 VARCHAR(80) not null,
    sumid              bigint not null,
    status             integer not null,
    bytes              bigint,
    effective_date     VARCHAR(20),
    archive_substatus  integer,
    group_id           integer,
    ds_index           bigint not null,
    safe_id            integer
);

create table SUM_PARTN_AVAIL (
       partn_name    VARCHAR(80) not null,
       total_bytes   bigint not null,
       avail_bytes   bigint not null,
       pds_set_num   integer not null,
       constraint pk_sumpartnavail primary key (partn_name)
);

create table SUM_TAPE (
        tapeid          varchar(20) not null,
        nxtwrtfn        integer not null,
        spare           integer not null,
        group_id        integer not null,
        avail_blocks    bigint not null,
        closed          integer not null,
        last_write      timestamp(0),
        constraint pk_tape primary key (tapeid)
);

create sequence SUM_SEQ
  increment 1
  start 2
  no maxvalue
  no cycle
  cache 50;

create sequence SUM_DS_INDEX_SEQ
  increment 1
  start 1
  no maxvalue
  no cycle
  cache 10;

create table SUM_FILE (
        tapeid          varchar(20) not null,
        filenum         integer not null,
        gtarblock       integer,
        md5cksum        varchar(36) not null,
        constraint pk_file primary key (tapeid, filenum)
       );

create table SUM_GROUP (
        group_id        integer not null,
        retain_days     integer not null,
        effective_date  VARCHAR(20),
        constraint pk_group primary key (group_id)
       );

(These are contained in the scripts create_tables.sql, sum_file.sql, and sum_group.sql in the JSOC software library base/sums/scripts/postgres.) For example, if you have created a database named mydb on a server named myserver (and had one of those scripts in your wd), you could enter the command

  psql -h myserver mydb -f create_tables.sql

Or you could simply enter the commands by hand. (You should be the database administrator when you create these tables.)

JsocWiki: DRMSSetup (last edited 2024-01-19 09:08:03 by ArtAmezcua)