Differences between revisions 8 and 67 (spanning 59 versions)
Revision 8 as of 2013-02-26 05:03:42
Size: 17716
Editor: DNab4211ad
Comment:
Revision 67 as of 2014-11-13 04:49:23
Size: 34231
Editor: ArtAmezcua
Comment:
Line 2: Line 2:
Line 4: Line 3:
Line 7: Line 5:
The Storage Unit Management System (SUMS) is the file-management system that contains the data files that DRMS records refer to. Each DRMS segment value is used by DRMS code to derive the SUMS file-system path to a single data file. Because each DRMS series may contain multiple DRMS segments, each DRMS record may ''point'' to more than one data file.
Line 11: Line 9:
Line 13: Line 10:
Line 15: Line 11:

The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order:

 1. Set up the environment (to be done by a superuser)
  a. Create a ''production'' linux user (named production by default). If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:[[BR]]{{{<production user> <host>=NOPASSWD:<path to sum_chmown>}}}[[BR]]This will allow sum_chmown to be run without a password prompt being presented.
  b. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
  c. Make sure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file and put it in the production user's home directory. Please see XXX for information on how to do this.
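A .pgpass file holds one entry per line in the form hostname:port:database:username:password. A colon or backslash inside a field must be escaped with a backslash, and libpq ignores the file unless its permissions deny all access to group and others (chmod 0600). As an illustration only, the following Python sketch builds and writes such entries; the helper names are hypothetical and are not part of NetDRMS.

```python
import os
import stat
from typing import Iterable


def pgpass_escape(field: str) -> str:
    """Backslash-escape the two characters that are special in .pgpass fields."""
    return field.replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host: str, port: int, database: str, user: str, password: str) -> str:
    """Return one hostname:port:database:username:password entry."""
    fields = (host, str(port), database, user, password)
    return ":".join(pgpass_escape(f) for f in fields)


def write_pgpass(path: str, lines: Iterable[str]) -> None:
    """Write the entries to path (overwriting it) with owner-only permissions."""
    # Create the file as 0600 from the start so the password is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
    # Also tighten a pre-existing file, whose old mode os.open() does not change.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

For instance, write_pgpass(os.path.expanduser("~/.pgpass"), [pgpass_line("dbserver.example.org", 5432, "data", "production", "secret")]) would create a one-entry file for the DRMS database. The host name and password are placeholders, and the call replaces any existing file.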


The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System (PG) and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order. All accounts/paths/ports/etc. referenced can be modified, but we recommend not doing this. Debugging issues from Stanford becomes difficult if every site does things differently. The accounts/paths/ports/etc. listed below are the ones used on Stanford's test NetDRMS (on the machine shoom):

 1. Set up the linux environment (to be done by a superuser)
  a. Create a ''production'' linux user (named production by default). The name of this user is the value of the SUMS_MANAGER parameter in the config.local file. If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:<<BR>><<BR>>{{{<production user> <host>=NOPASSWD:<path to sum_chmown>}}}<<BR>><<BR>>This will allow sum_chmown to be run without a password prompt being presented.
  a. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
  a. Ensure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file in the production user's home directory. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file.
  a. Create a linux user named "postgres". This is the user that will own all of the PG data files. It is also the user that will run the server daemon process (postgres).
  a. Each user of DRMS, including the production user, must set two environment variables:<<BR>><<BR>>{{{setenv JSOCROOT <DRMS source tree root>}}}<<BR>>{{{setenv JSOC_MACHINE <OS and CPU>}}}<<BR>><<BR>>where <DRMS source tree root> is the root of the DRMS source tree installed by the production linux user, and <OS and CPU> is "linux_x86_64" if DRMS was installed on a Linux machine with a 64-bit processor, or "linux_avx" if DRMS was installed on a Linux machine with a 64-bit processor that supports Advanced Vector Extensions (AVX, an extended instruction set).
  a. Create the SUMS log directory on the SUMS server machine, if it does not already exist. The name/path for this directory is defined in config.local in the SUMS_LOG_BASEDIR parameter. The actual directory must match the value of this parameter, which defaults to /usr/local/logs/SUM. You are free to change this path in SUMS_LOG_BASEDIR. This directory must be writable by the linux production user.
 1. Set up the PG database.
  a. Install server version 8.4 (this is the only version supported by Stanford) on a dedicated machine. Obtain the latest 8.4 rpm binaries from ftp://ftp.postgresql.org/pub/binary/.
  a. Install the client software, version 8.4, on all machines that will be used to either access the database server or build DRMS software. All DRMS software must connect to the DRMS and SUMS databases. To do so, it must be linked against static and/or dynamic libraries that allow database access. These libraries are a component of the PG client software, so it must be installed on machines used to build DRMS software. Because some of these libraries are dynamic, every host that runs DRMS software must also have the PG client software installed.
  a. Create a database cluster for the DRMS data. A database cluster is a storage area on disk that contains the data for one or more databases. The storage area is implemented as a directory (the ''data directory'') and it is managed by a single instance of a PG server process. To create this cluster (data directory), first log in as the linux user postgres, and then run the initdb command:<<BR>><<BR>>{{{initdb --locale=C -D /var/lib/pgsql/data}}}<<BR>><<BR>>This will create the data directory /var/lib/pgsql/data on the database server host. The "--locale" argument sets the cluster locale to "C", the POSIX locale, which compares and sorts strings by byte value and does not depend on the host's language settings.
  a. Create a database cluster for the SUMS data. This cluster is distinct from the cluster for the DRMS data, and it is maintained by a separate server instance:<<BR>><<BR>>{{{initdb --locale=C -D /var/lib/pgsql/data_sums}}}<<BR>><<BR>>This will create the data directory /var/lib/pgsql/data_sums on the database server host.
  a. Edit the PG configuration files. The configuration files are cluster-specific, and they reside in the data directory created by the initdb command. A complete list of all modifiable parameters can be found in the PG online documentation, but a couple are worth mentioning now.
   i. listen_addresses (in postgresql.conf) is the list of local IP addresses (network interfaces) on which the server listens for TCP/IP connections. By default the value of the parameter is "localhost", which accepts TCP/IP connections only through the loopback interface, so clients on other machines cannot connect. This is not what you want. The single-quoted string '*' makes the server listen on all available interfaces. If you want to be more restrictive, you can provide a comma-separated list of hostnames or IP addresses belonging to this host's interfaces. Which client hosts are actually allowed to connect is controlled by the pg_hba.conf file (see below).
   i. port (in postgresql.conf) is the port on which the server listens for connections. If you create more than one cluster on the host server machine (e.g., if you create both the DRMS and SUMS clusters on a single host), then you'll need to change the port number for at least one cluster (you cannot have two server processes listening for connections on the same port). We suggest using port 5432 for the DRMS cluster (port = 5432 - no quotes), and port 5434 for the SUMS cluster.
   i. logging_collector (in postgresql.conf). Set this to 'on' so that the output of the postgres server process will be captured into log files and rotated once per day.
   i. log_rotation_size (in postgresql.conf). Set this to 0. This will cause PG to emit one log every day (as opposed to starting a new log after the previous log is a certain size).
   i. log_min_duration_statement (in postgresql.conf). Set this to 1000 so that only queries that run for 1000 ms or longer will be logged. Otherwise, the log files will quickly get out of hand.
   i. The pg_hba.conf file. This file contains lines of the form<<BR>><<BR>>{{{<connection type> <databases> <user> <IP address> <IP mask> <authentication method>}}}<<BR>><<BR>>if you wish to use an IP-address mask to specify a range of IP addresses, or<<BR>><<BR>>{{{<connection type> <databases> <user> <CIDR-address> <authentication method>}}}<<BR>><<BR>>if you wish to use a CIDR-address to specify the range. To get yourself up and running, you'll need to add a line or two to this file. To allow access by one host, we suggest<<BR>><<BR>>{{{host all all XXX.XXX.XXX.XXX 255.255.255.255 md5}}}<<BR>><<BR>>or<<BR>><<BR>>{{{host all all XXX.XXX.XXX.XXX/32 md5}}}<<BR>><<BR>>For multiple-host access, we suggest<<BR>><<BR>>{{{host all all XXX.XXX.XXX.0 255.255.255.0 md5}}}<<BR>><<BR>>or<<BR>><<BR>>{{{host all all XXX.XXX.XXX.0/24 md5}}}
 1. The remainder of the instructions require that the PG servers (there is one for the DRMS cluster, and one for the SUMS cluster) be running. To start the server instances, run:<<BR>><<BR>>{{{su postgres}}}<<BR>>{{{pg_ctl start -D /var/lib/pgsql/data # start the DRMS-database cluster server}}}<<BR>>{{{pg_ctl start -D /var/lib/pgsql/data_sums -o "-p 5434" # start the SUMS-database cluster server}}}<<BR>><<BR>>The server logs will be placed in the data directory.
 1. Create the DRMS database in the DRMS cluster, and create the SUMS database in the SUMS cluster:<<BR>><<BR>>{{{su postgres}}}<<BR>>{{{createdb --locale C -E LATIN1 -T template0 data # create the DRMS database in the DRMS-database cluster}}}<<BR>>{{{createdb --locale C -E LATIN1 -T template0 -p 5434 data_sums # create the SUMS database in the SUMS-database cluster}}}. NOTE: The -E flag sets the character encoding of the characters stored in the database. LATIN1 is not a great choice (it would have been better to have used SQL_ASCII or UTF8), but that is what was chosen at Stanford so we're stuck with it, which means remote sites that have become series subscribers are stuck with it too.
 1. Install the required DB-server languages:<<BR>><<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plpgsql data # Add the plpgsql language to the DRMS database}}}<<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plperl data # Add the plperl language to the DRMS database}}}<<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plperlu data # Add the plperlu 'untrusted' language to the DRMS database}}}<<BR>><<BR>>At this time, there are no auxiliary languages needed for the SUMS database.
 1. Create various tables and DRMS database functions needed by the DRMS library:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 -U postgres data -f $JSOCROOT/base/drms/scripts/NetDRMS.sql # Create the 'admin' schema and tables within this schema; create the 'drms' schema}}}<<BR>>{{{# Create the SUMSADMIN database user}}}<<BR>>{{{su postgres}}}<<BR>>{{{cd $JSOCROOT/base/drms/scripts}}}<<BR>>{{{./createpgfuncs.pl data # Create functions in the DRMS database}}}
 1. Create database accounts for DRMS users. To use DRMS software/modules, a user of this software must have an account on the DRMS database (a DRMS series is implemented as several database objects). The software, when run, will log into a user account on the DRMS database - by default, the name of the user account is the name of the linux user account that the DRMS software runs under.
   a. Run the newdrmsuser.pl script - you will be prompted for the postgres dbuser password:<<BR>><<BR>>{{{$JSOCROOT/base/drms/scripts/newdrmsuser.pl data <db server host> 5432 <db user> <initial password> <db user namespace> user 1}}}<<BR>><<BR>>where <db user> is the name of the user whose account is to be created, <initial password> is the initial password for this account, and <db user namespace> is the namespace DRMS should use when running as the db user and reading or writing database tables. The namespace is a logical container of database objects, like database tables, sequences, functions, etc. The names of all objects are qualified by the namespace. To refer unambiguously to the table "mytable", you prefix its name with the namespace: if this table is in the su_production namespace (container), then you refer to the table as "su_production.mytable". In this way, there can be other tables with the same name that reside in a different namespace (e.g., su_arta.mytable is a different table that just happens to have the same name). Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this page]] for assistance with choosing a namespace.
   a. Have the user that owns the account change the password:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 data}}}<<BR>>{{{data=> ALTER USER <db user> WITH PASSWORD '<new password>';}}}<<BR>><<BR>>where <new password> is the replacement for the original password. It must be enclosed in single quotes.
   a. Have the user put their password in their .pgpass file. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file. This file allows the user to login to their database account without having to provide a password at a prompt.
   a. Create a db account for the linux production user (the name is the value of the SUMS_MANAGER parameter in config.local). The name of the database user for this linux user is the same as the name of the linux user (typically 'production'). Follow the previous steps to create this database account.
   a. Create a password for the sumsadmin DRMS database user, following the "ALTER USER" directions above. The user was created by the newdrmsuser.pl script above.
   a. OPTIONALLY, create a table to be used for DRMS version control:<<BR>>{{{psql -h <db server host> -p 5432 -U <postgres administrator> data}}}<<BR>>{{{CREATE TABLE drms.minvers(minversion text default '1.0' not null);}}}<<BR>>{{{GRANT SELECT ON drms.minvers TO public;}}}<<BR>>{{{INSERT INTO drms.minvers(minversion) VALUES('<version>');}}}<<BR>>where <version> is the minimum DRMS version that a DRMS module must have before it can connect to the DRMS database. Because minversion is a text column, the version must be enclosed in single quotes.
 1. Set up the SUMS database. Although the SUMS data cluster and SUMS database have already been created, you must create certain tables and users in this newly created database.
   a. Create the production user in the SUMS database:<<BR>><<BR>>{{{$JSOCROOT/base/drms/scripts/newdrmsuser.pl data_sums <db server host> 5434 <db production user> <password> <db production user namespace> sys 1}}}<<BR>><<BR>>where <db production user namespace> is the namespace. Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this link]] for assistance with choosing a namespace for the production user.
   a. Put the production db user into the sumsadmin group:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 data -U postgres}}}<<BR>>{{{data=> GRANT sumsadmin TO <db production user>;}}}<<BR>><<BR>>
   a. Put the production user's password into the .pgpass file. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file.
   a. Create the SUMS database tables:<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U production -f scripts/create_sums_tables.sql data_sums}}}<<BR>>{{{ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;}}}<<BR>><<BR>>where <min val> is {{{<drms site code> << 48}}}, <max val> is <min val> + 281474976710655 (that is, <min val> + 2^48 - 1), and <drms site code> is the value of the DRMS_SITE_CODE parameter in config.local.
   a. Grant elevated privileges to these tables to the db production user (the scripts should be modified to do this):<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U postgres data_sums}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_tape TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_ds_index_seq,sum_seq TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_file,sum_group,sum_main,sum_open TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_partn_alloc,sum_partn_avail TO production;}}}<<BR>><<BR>>
   a. SUMS data files are organized into "partitions", which are implemented as directories. Each partition must be named /SUM[0-9]* (e.g., /SUM, /SUM0, /SUM101). Each directory must be owned by the production linux user (e.g., "production"). The directories must belong to the SUMS user group (e.g., SOI), and that group must contain all DRMS users. So, if linux user art will be using DRMS and running DRMS modules, then art must be a member of the SUMS user group. You are free to create as few or as many of these partitions as you desire. Create these directories now.<<BR>><<BR>>NOTE: Please avoid using file systems that limit the number of directories and/or files. For example, the EXT3 file system limits the number of directories to 64K. That number is far too small for SUMS usage.
   a. Initialize the sum_partn_avail table with the names of these partitions. For each SUMS partition run the following:<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U postgres data_sums}}}<<BR>>{{{data_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);}}}<<BR>><<BR>>where <SUMS partition path> is the full path of the partition (the path must be enclosed in single quotes) and <avail bytes> is some number less than the number of bytes in the directory (multiply the number of blocks in the directory by the number of bytes per block). The number does not matter, as long as it is not bigger than the total number of bytes available. SUMS will adjust this number as needed.
 1. Build the SUMS binaries:<<BR>><<BR>>{{{su - <production user>; cd $JSOCROOT; ./configure; make sums}}}<<BR>><<BR>>
 1. Copy the sum_chmown program to <path to sum_chmown> (chosen in step 1a. above), make root its owner, and give it setuid privileges:<<BR>><<BR>>{{{su - root}}}<<BR>>{{{cp $JSOCROOT/drms/_linux_x86_64/base/sums/apps/sum_chmown <path to sum_chmown>}}}<<BR>>{{{chown root:root <path to sum_chmown>}}}<<BR>>{{{chmod u+s <path to sum_chmown>}}}<<BR>><<BR>>
 1. Start SUMS: <<BR>><<BR>>{{{$JSOCROOT/base/sums/scripts/sum_start.NetDRMS}}}<<BR>><<BR>>The script does not return a prompt after echoing "sum_svc now available". Just hit RETURN.
 1. To stop SUMS for any reason, run this script:<<BR>><<BR>>{{{$JSOCROOT/base/sums/scripts/sum_stop.NetDRMS}}}<<BR>><<BR>>
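The arithmetic for <min val> and <max val> in the sum_ds_index_seq step can be checked with a short Python sketch; the helper names are hypothetical, and only the formula comes from the instructions above. One consequence worth noting: PostgreSQL sequences are 64-bit signed values, so with a 48-bit shift the DRMS_SITE_CODE cannot exceed 32767 (2^15 - 1).

```python
from typing import Tuple

SITE_SHIFT = 48              # each site owns a block of 2**48 sequence values
BIGINT_MAX = 2**63 - 1       # largest value a PostgreSQL sequence can reach


def sunum_range(site_code: int) -> Tuple[int, int]:
    """Return (<min val>, <max val>) for the given DRMS_SITE_CODE."""
    if site_code < 0:
        raise ValueError("DRMS_SITE_CODE must be non-negative")
    min_val = site_code << SITE_SHIFT
    max_val = min_val + (1 << SITE_SHIFT) - 1   # min_val + 281474976710655
    if max_val > BIGINT_MAX:
        raise ValueError("DRMS_SITE_CODE too large for a 64-bit sequence")
    return min_val, max_val


def alter_sequence_sql(site_code: int) -> str:
    """Build the ALTER SEQUENCE statement for sum_ds_index_seq."""
    lo, hi = sunum_range(site_code)
    return (f"ALTER SEQUENCE sum_ds_index_seq START {lo} RESTART {lo} "
            f"MINVALUE {lo} MAXVALUE {hi};")
```

For example, a site whose DRMS_SITE_CODE is 2 gets the range 562949953421312 through 844424930131967, so the blocks of different sites never overlap.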
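The two pg_hba.conf forms suggested in the configuration step (address plus mask, or CIDR) describe the same range of client hosts. A small Python sketch using the standard ipaddress module can produce both from one network; the function name is hypothetical, and 192.0.2.0/24 is a documentation-only address range standing in for your own.

```python
import ipaddress
from typing import Tuple


def pg_hba_host_lines(network: str, method: str = "md5") -> Tuple[str, str]:
    """Return (CIDR form, address/mask form) of a 'host all all' pg_hba.conf entry.

    strict=True rejects networks with host bits set (e.g. 192.0.2.5/24),
    which catches a common typo in the address column.
    """
    net = ipaddress.IPv4Network(network, strict=True)
    cidr_form = f"host all all {net.with_prefixlen} {method}"
    mask_form = f"host all all {net.network_address} {net.netmask} {method}"
    return cidr_form, mask_form
```

Append whichever form you prefer to pg_hba.conf in each data directory (the DRMS and SUMS clusters each have their own copy), then have the server re-read it, e.g. with pg_ctl reload -D <data directory>.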
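For the sum_partn_avail step, a Python sketch can derive a conservative <avail bytes> from the free space of the file system that holds the partition, and emit the INSERT statement given above. The 90% safety margin and the helper names are assumptions; the instructions only require that the number not exceed the space actually available.

```python
import shutil


def sql_quote(text: str) -> str:
    """Quote a string literal for SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def avail_bytes_for(partition: str, safety_fraction: float = 0.9) -> int:
    """Free bytes on the partition's file system, scaled down by an assumed margin."""
    return int(shutil.disk_usage(partition).free * safety_fraction)


def partn_avail_insert(partition: str, avail_bytes: int) -> str:
    """INSERT statement for one SUMS partition; total and available start out equal."""
    return ("INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, "
            "pds_set_num, pds_set_prime) "
            f"VALUES ({sql_quote(partition)}, {avail_bytes}, {avail_bytes}, 0, 0);")
```

For example, partn_avail_insert("/SUM0", avail_bytes_for("/SUM0")) yields a statement to paste into psql while connected to data_sums. The same quote-doubling rule applies to the password in the ALTER USER ... WITH PASSWORD statement.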
Line 32: Line 62:
Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms.
Most Recent Version (7.0)
Current and Earlier Versions
The size of the source distribution is currently (V 7.0) about 10 MB. A built system (including SUMS) is typically about 300 MB.
In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators: 

If you have not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl, which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular, it tries to execute some additional scripts at the end that are not yet in the release.) 

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes. 

However you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case.
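A quick sanity check of config.local can be scripted. The Python sketch below is hypothetical and assumes the file consists of whitespace-separated KEY VALUE lines with # comments; adjust the parser if your copy of config.local.template differs. It flags a LOCAL_CONFIG_SET that is still NO, and any of the site-specific variables named above that are missing or empty.

```python
from typing import Dict, List

# Variables the text says almost always need site-specific values.
SITE_KEYS = ("DBSERVER_HOST", "DRMS_DATABASE", "SUMS_SERVER_HOST", "DRMS_SITE_CODE")


def parse_config_local(text: str) -> Dict[str, str]:
    """Parse KEY VALUE lines (assumed format); blank lines and '#' comments are skipped."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # note: also truncates values containing '#'
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def config_problems(params: Dict[str, str]) -> List[str]:
    """List problems; a commented-out (absent) LOCAL_CONFIG_SET is acceptable."""
    problems = []
    if params.get("LOCAL_CONFIG_SET", "").upper() == "NO":
        problems.append("LOCAL_CONFIG_SET is still NO")
    for key in SITE_KEYS:
        if not params.get(key):
            problems.append(f"{key} is missing or empty")
    return problems
```

An empty list only means these particular checks passed; it does not validate the values themselves.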
In the target root directory $DRMS, run
  ./configure
This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.)
The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default):
linux_ia32 (`uname -s` = Linux, `uname -m` = ia32 | i686 | i386)
linux_x86_64 (`uname -s` = Linux, `uname -m` = x86_64)
The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise 5 has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client). See platform notes for instructions on how to fix this bug. 
Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms.
Most Recent Version (7.0)
Current and Earlier Versions
The size of the source distribution is currently (V 7.0) about 10 MB. A built system (including SUMS) is typically about 300 MB.
In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators:

If you have not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl, which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular, it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.

However you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case.
In the target root directory $DRMS, run
  ./configure

This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.)
The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default):
linux_ia32 (`uname -s` = Linux, `uname -m` = ia32 | i686 | i386)
linux_x86_64 (`uname -s` = Linux, `uname -m` = x86_64)
The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise 5 has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client). See platform notes for instructions on how to fix this bug.
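The mapping from `uname` output to the default architecture names listed above can be expressed as a small Python sketch, for example to choose a JSOC_MACHINE value in a login script. The function names are hypothetical, and only the two Linux names from this list are handled; the linux_avx name used elsewhere on this page is not covered.

```python
import platform


def jsoc_machine(system: str, machine: str) -> str:
    """Map `uname -s` / `uname -m` values to a default NetDRMS architecture name."""
    if system != "Linux":
        raise ValueError(f"unsupported OS: {system}")
    if machine in ("ia32", "i686", "i386"):
        return "linux_ia32"
    if machine == "x86_64":
        return "linux_x86_64"
    raise ValueError(f"unsupported machine type: {machine}")


def current_jsoc_machine() -> str:
    """Apply the mapping to the host this script runs on."""
    return jsoc_machine(platform.system(), platform.machine())
```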
Line 52: Line 75:
  JSOC_MACHINE = name
or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts.
If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:
  COMPILER = icc
  FCOMPILER = ifort
Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the PostgreSQL client-application libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure sockets toolkit; you should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default.
N.B. If you are using the icc compiler, version 11 is recommended. There are some very nasty bugs in version 10.*.
In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps.
These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it.
At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.


  JSOC_MACHINE = name

or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts. If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:

  COMPILER = icc
  FCOMPILER = ifort

Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the PostgreSQL client-application libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure sockets toolkit; you should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default.

N.B. If you are using the icc compiler, version 11 is recommended. There are some very nasty bugs in version 10.*.

In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps. These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it. At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.
Line 69: Line 88:
Line 71: Line 89:
Line 72: Line 91:

Sites other than the JSOC can DRMS data series. They can maintain local copies of the DRMS and SUMS data created at the JSOC. And they can create their own DRMS data, of which other sites can maintain local copies. To participate in this network of sites sharing data, a site (aka a node) must install a DRMS/SUMS system to become a NetDRMS site. Once a member of this network, a NetDRMS site can selectively share specific data series - it is not necessary to share all series. 
Line 77: Line 95:
 * Reserved disk space to serve as the SUMS disk cache.
Line 82: Line 100:

The SUMS disk area can be as simple as a directory, but it is probably better to assign at
least one disk partition to the SUMS cache. Unless a tape library also exists, the SUMS
partition(s) must be large enough to store all the data segments in the DRMS that are to be
archived locally. For datasets for which other DRMS servers provide the permanent archive,
the local SUMS will serve only as a local cache, so size is dictated by expected usage.

The directory or directories to be used for SUMS must be owned by a user named '''production'''
(can be any uid) and belong to a group named '''SOI''' (can be any gid), and have a permissions
mode of 2775 (octal; ''drwxrwsr-x''). The group '''SOI''' should include as members any users who
will be writing data into the DRMS by running modules or otherwise.
The SUMS disk area can be as simple as a directory, but it is probably better to assign at least one disk partition to the SUMS cache. Unless a tape library also exists, the SUMS partition(s) must be large enough to store all the data segments in the DRMS that are to be archived locally. For datasets for which other DRMS servers provide the permanent archive, the local SUMS will serve only as a local cache, so size is dictated by expected usage.

The directory or directories to be used for SUMS must be owned by a user named '''production''' (can be any uid), belong to a group named '''SOI''' (can be any gid), and have a permissions mode of 2775 (''drwxrwsr-x''). The group '''SOI''' should include as members any users who will be writing data into the DRMS by running modules or otherwise.

You should have Postgres Version 8.1 or higher installed; JSOC database servers are currently (Oct 2006) running on the following systems:

 * a 64-bit dual-core xeon running Red Hat Enterprise Linux 4 with Postgres v. 8.1.2
 * a 32-bit dual-core pentium 4 running Scientific Linux (?; equinox) with Postgres v. 8.1.4

First, you must create the database tables required for SUMS. You can do so by running the following psql commands:
        tapeid          varchar(20) not null,
        filenum         integer not null,
        gtarblock       integer,
        md5cksum        varchar(36) not null,
        constraint pk_file primary key (tapeid, filenum)
        group_id        integer not null,
        retain_days     integer not null,
        effective_date  VARCHAR(20),
        constraint pk_group primary key (group_id)

(These are contained in the scripts '''create_tables.sql''', '''sum_file.sql''', and '''sum_group.sql''' in the JSOC software library '''base/sums/scripts/postgres'''.) For example, if you have created a database named ''mydb'' on a server named ''myserver'' (and had one of those scripts in your ''wd''), you could enter the command

Or you could simply enter the commands by hand. (You should be the database administrator when you create these tables.)

== Remote SUMS ==
A local NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units associated with these data series. There are currently two methods to facilitate these downloads. The Java Mirroring Daemon (JMD) is a tool that can be installed and configured to download SUs automatically as the series data records are ingested into the local NetDRMS. It can obtain the SUs from any other NetDRMS that has the SUs, not just the NetDRMS that originally produced them.

NetDRMS - a shared data management system

Introduction

In order to process, archive, and distribute the substantial quantity of data flowing from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data management system. This system, the Data Record Management System (DRMS), consists of data series, each of which is a collection of related data. For example, there exists a data series named hmi.M_45s, which contains the HMI 45-second cadence magnetograms. Each data series consists of several DRMS objects: records, keywords, segments, and links. A DRMS record is the smallest unit of data-series data. Typically, it represents data for a single observation in time (hence the term series in data series), but there is no restriction on how a user organizes their data. A data series may contain one or more DRMS keywords, each of which represents a named bit of metadata. For example, many data series contain a DRMS keyword named CRPIX1. A DRMS segment is a collection of data that contains storage/retrieval information needed by DRMS to locate auxiliary data files. These data files contain large sets of data like image arrays. Generally, they are image files, but what they contain is arbitrary and user-defined. A data series optionally contains one or more DRMS links, each of which is a collection of data that links the data series to other DRMS data series. Each DRMS record contains record-specific values for the DRMS keywords, segments, and links. In this way, one record may have one set of keyword, segment, and link values, and another record may have a different set of these values.
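The relationship between series, records, keywords, segments, and links described above can be sketched as a small Python data structure. This is an illustrative sketch only, not DRMS code: the class names, the segment name "magnetogram", the keyword values, and the assumption that a segment value holds a SUNUM and a link value names another record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """A data series: the names of its keywords, segments, and links."""
    name: str
    keywords: list[str]
    segments: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)


@dataclass
class Record:
    """One record: record-specific values for its series' keywords, segments, and links."""
    series: Series
    keyword_values: dict[str, object]
    # Hypothetical: each segment value is the SUNUM of the storage unit holding its file.
    segment_values: dict[str, int] = field(default_factory=dict)
    # Hypothetical: each link value names a record in another series.
    link_values: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values for keywords, segments, or links the series does not declare.
        checks = (
            ("keyword", self.keyword_values, self.series.keywords),
            ("segment", self.segment_values, self.series.segments),
            ("link", self.link_values, self.series.links),
        )
        for kind, values, declared in checks:
            unknown = sorted(set(values) - set(declared))
            if unknown:
                raise ValueError(f"{self.series.name} has no {kind}(s) {unknown}")


# Two records of the same series carry different values for the same keywords.
m45 = Series("hmi.M_45s", keywords=["T_REC", "CRPIX1"], segments=["magnetogram"])
r1 = Record(m45, {"T_REC": "2014.11.13_00:00:00_TAI", "CRPIX1": 2048.5}, {"magnetogram": 1001})
r2 = Record(m45, {"T_REC": "2014.11.13_00:00:45_TAI", "CRPIX1": 2048.4}, {"magnetogram": 1002})
```

In DRMS itself the series definition and the per-record values live in database tables rather than in objects like these; the sketch only mirrors the shape of the data.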

The Storage Unit Management System (SUMS) is the file-management system that contains the data files that DRMS records refer to. Each DRMS segment value is used by DRMS code to derive the SUMS file-system path to a single data file. Because each DRMS series may contain multiple DRMS segments, each DRMS record may point to more than one data file.

To manage all these data, DRMS comprises several components, one of which is a database instance in a relational-database management system (PostgreSQL). The DRMS Library code uses a database instance and several tables to implement the DRMS objects. For each data series, there exists a database table that contains one row per DRMS record. The columns of each of these rows contain the DRMS keyword, segment, and link values - bits of data that are all small enough to fit efficiently in a database record. The data-file data are too large to fit into a database record, so those data reside in data files in SUMS. The DRMS-segment values point to the data files, using a unique identifier called a SUNUM. SUMS itself comprises several components, one of which is another database instance that contains several database tables. When DRMS needs a data file, it requests the file from SUMS by providing SUMS with a SUNUM, and then SUMS consults its database tables to derive the path to the data file. SUMS shuttles files between hard disk (aka the disk cache) and tape, so data files have no permanent file path. Therefore, when DRMS requests the path to a file, SUMS must obtain the current path by consulting a database table.
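The SUNUM-to-path lookup described above might look roughly like the following. The sketch uses an in-memory SQLite table in place of the SUMS PostgreSQL database and assumes, for illustration only, that the SUNUM is the DS_INDEX key of the SUM_MAIN table shown later on this page, that ONLINE_LOC holds the unit's current directory, and that ONLINE_STATUS is 'Y' when the unit is on disk. The helper name su_path and the directory names are hypothetical.

```python
import sqlite3


def su_path(conn: sqlite3.Connection, sunum: int) -> str:
    """Return the current directory of storage unit `sunum` (hypothetical helper)."""
    row = conn.execute(
        "SELECT online_loc, online_status FROM sum_main WHERE ds_index = ?",
        (sunum,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no storage unit with SUNUM {sunum}")
    online_loc, online_status = row
    if online_status != "Y":
        # A real SUMS would retrieve the unit from tape into the disk cache here.
        raise LookupError(f"SUNUM {sunum} is not in the disk cache")
    return online_loc


# Stand-in for the SUMS database: one unit on disk, one (assumed) only on tape.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sum_main (ds_index INTEGER PRIMARY KEY, "
    "online_loc TEXT NOT NULL, online_status TEXT)"
)
conn.executemany(
    "INSERT INTO sum_main VALUES (?, ?, ?)",
    [(1001, "/SUM0/D1001", "Y"), (1002, "/SUM1/D1002", "N")],
)
print(su_path(conn, 1001))  # /SUM0/D1001
```

Because the path is looked up at request time rather than stored in the DRMS record, SUMS can move a unit between disk and tape without any DRMS record having to change.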

Installing NetDRMS

Installing NetDRMS for the First Time

The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System (PG) and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order. All accounts, paths, ports, etc. referenced below can be modified, but we recommend against doing so: debugging problems from Stanford becomes difficult if every site does things differently. The accounts, paths, and ports listed below are the ones used on Stanford's test NetDRMS (on the machine shoom):

  1. Set up the linux environment (to be done by a superuser)
    1. Create a production linux user (named production by default). The name of this user is the value of the SUMS_MANAGER parameter in the config.local file. If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:

      <production user> <host>=NOPASSWD:<path to sum_chmown>

      This will allow sum_chmown to be run without a password prompt being presented.

    2. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
    3. Ensure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file in the production user's home directory (see the PostgreSQL documentation on the password file, .pgpass, for its format).

    4. Create a linux user named "postgres". This is the user that will own all of the PG data files. It is also the user that will run the server daemon process (postgres).
    5. Each user of DRMS, including the production user, must set two environment variables in their environment:

      setenv JSOCROOT <DRMS source tree root>
      setenv JSOC_MACHINE <OS and CPU>

      where <DRMS source tree root> is the root of the DRMS source tree installed by the production linux user, and <OS and CPU> is "linux_x86_64", if DRMS was installed on a machine with a Linux OS and a 64-bit processor, or "linux_avx", if DRMS was installed on a machine with a Linux OS and a 64-bit processor that supports Advanced Vector Extensions (which supports an extended instruction set).

    6. Create the SUMS log directory on the SUMS server machine, if it does not already exist. The name/path for this directory is defined in config.local in the SUMS_LOG_BASEDIR parameter. The actual directory must match the value of this parameter, which defaults to /usr/local/logs/SUM. You are free to change this path in SUMS_LOG_BASEDIR. This directory must be writeable by the linux production user.
  2. Set up the PG database.
    1. Install server version 8.4 (this is the only version supported by Stanford) on a dedicated machine. Obtain the latest 8.4 rpm binaries from ftp://ftp.postgresql.org/pub/binary/.

    2. Install the client software, version 8.4, on all machines that will be used to either access the database server or build DRMS software. All DRMS software must connect to the DRMS and SUMS databases. To do so, it must be linked against static and/or dynamic libraries that allow database access. These libraries are a component of the PG client software, so it must be installed on machines used to build DRMS software. Some dynamic libraries are involved, so the host on which this software is run must also have the PG client software installed.
    3. Create a database cluster for the DRMS data. A database cluster is a storage area on disk that contains the data for one or more databases. The storage area is implemented as a directory (the data directory) and it is managed by a single instance of a PG server process. To create this cluster (data directory), first log-in as the linux user postgres, and then run the initdb command:

      initdb --locale=C -D /var/lib/pgsql/data

      This will create the data directory /var/lib/pgsql/data on the database server host. The "--locale" argument sets the cluster locale to "C", which sorts text in plain byte order and applies no language-specific rules; use it.

    4. Create a database cluster for the SUMS data. This cluster is distinct from the cluster for the DRMS data, and it is maintained by a separate server instance:

      initdb --locale=C -D /var/lib/pgsql/data_sums

      This will create the data directory /var/lib/pgsql/data_sums on the database server host.

    5. Edit the PG configuration files. The configuration files are cluster-specific, and they reside in the data directory created by the initdb command. A complete list of all modifiable parameters can be found in the PG online documentation, but a couple are worth mentioning now.
      1. listen_addresses (in postgresql.conf) is the list of local IP addresses (network interfaces) on which the server listens for TCP/IP connections. By default the value of the parameter is "localhost", which accepts connections only from the machine hosting the database server process. This is not what you want. The single-quoted string '*' makes the server listen on all of the host's interfaces; if you want to be more restrictive, you can provide a comma-separated list of this host's hostnames or IP addresses. Which remote hosts may actually connect is controlled by the pg_hba.conf file (see below).
      2. port (in postgresql.conf) is the port on which the server listens for connections. If you create more than one cluster on the host server machine (e.g., if you create both the DRMS and SUMS clusters on a single host), then you'll need to change the port number for at least one cluster (you cannot have two server processes listening for connections on the same port). We suggest using port 5432 for the DRMS cluster (port = 5432 - no quotes), and port 5434 for the SUMS cluster.
      3. logging_collector (in postgresql.conf). Set this to 'on' so that the output of the postgres server process will be captured into log files and rotated once per day.
      4. log_rotation_size (in postgresql.conf). Set this to 0. This will cause PG to emit one log every day (as opposed to starting a new log after the previous log is a certain size).
      5. log_min_duration_statement (in postgresql.conf). Set this to 1000 so that only queries that are greater than 1000 ms in run time will be logged. Otherwise, the log files will quickly get out of hand.
      6. The pg_hba.conf file. This file contains lines of the form

        <connection type>  <databases>  <user>  <IP address>  <IP mask>  <authentication method>

        if you wish to use an IP-address mask to specify a range of IP addresses, or

        <connection type>  <databases>  <user>  <CIDR-address>  <authentication method>

        if you wish to use a CIDR-address to specify the range. To get yourself up and running, you'll need to add a line or two to this file. To allow access by one host, we suggest

        host  all  all XXX.XXX.XXX.XXX  255.255.255.255  md5

        or

        host  all  all XXX.XXX.XXX.XXX/32  md5

        For multiple-host access, we suggest

        host  all  all XXX.XXX.XXX.0  255.255.255.0  md5

        or

        host  all  all  XXX.XXX.XXX.0/24  md5

  3. The remainder of the instructions requires that the PG servers (one for the DRMS cluster and one for the SUMS cluster) be running. To start the server instances, run:

    su postgres
    pg_ctl start -D /var/lib/pgsql/data # start the DRMS-database cluster server
    pg_ctl start -D /var/lib/pgsql/data_sums -o "-p 5434" # start the SUMS-database cluster server.

    The server logs will be placed in the data directory.

  4. Create the DRMS database in the DRMS cluster, and create the SUMS database in the SUMS cluster:

    su postgres
    createdb --locale C -E LATIN1 -T template0 data # create the DRMS database in the DRMS-database cluster
    createdb --locale C -E LATIN1 -T template0 -p 5434 data_sums # create the SUMS database in the SUMS-database cluster

    NOTE: The -E flag sets the character encoding used for text stored in the database. LATIN1 is not a great choice (it would have been better to have used SQL_ASCII or UTF8), but that is what was chosen at Stanford, so we're stuck with it, which means remote sites that have become series subscribers are stuck with it too.

  5. Install the required DB-server languages:

    createlang -h <db server host> -p 5432 -U postgres plpgsql data # Add the plpgsql language to the DRMS database
    createlang -h <db server host> -p 5432 -U postgres plperl data # Add the plperl language to the DRMS database
    createlang -h <db server host> -p 5432 -U postgres plperlu data # Add the plperlu 'untrusted' language to the DRMS database

    At this time, there are no auxiliary languages needed for the SUMS database.

  6. Create various tables and DRMS database functions needed by the DRMS library:

    psql -h <db server host> -p 5432 -U postgres data -f $JSOCROOT/base/drms/scripts/NetDRMS.sql # Create the 'admin' schema and tables within this schema; create the 'drms' schema
    # Create the SUMSADMIN database user
    su postgres
    cd $JSOCROOT/base/drms/scripts
    ./createpgfuncs.pl data # Create functions in the DRMS database

  7. Create database accounts for DRMS users. To use DRMS software/modules, a user of this software must have an account on the DRMS database (a DRMS series is implemented as several database objects). The software, when run, will log into a user account on the DRMS database - by default, the name of the user account is the name of the linux user account that the DRMS software runs under.
    1. Run the newdrmsuser.pl script - you will be prompted for the postgres dbuser password:

      $JSOCROOT/base/drms/scripts/newdrmsuser.pl data <db server host> 5432 <db user> <initial password> <db user namespace> user 1

      where <db user> is the name of the user whose account is to be created, <initial password> is the initial password for this account, and <db user namespace> is the namespace DRMS should use when running as the db user and reading or writing database tables. A namespace is a logical container of database objects, like tables, sequences, and functions, and the names of all objects are qualified by their namespace. For example, if the table "mytable" is in the su_production namespace, you refer to it unambiguously as "su_production.mytable". In this way, tables with the same name can exist in different namespaces (e.g., su_arta.mytable is a different table that just happens to have the same name). Please see the NOTE in this page for assistance with choosing a namespace.

    2. Have the user that owns the account change the password:

      psql -h <db server host> -p 5432 data
      data=> ALTER USER <db user> WITH PASSWORD '<new password>';

      where <new password> is the replacement for the original password. It must be enclosed in single quotes.

    3. Have the user put their password in their .pgpass file (see the PostgreSQL documentation on the password file, .pgpass, for its format). This file allows the user to log in to their database account without having to provide a password at a prompt.

    4. Create a db account for the linux production user (the name is the value of the SUMS_MANAGER parameter in config.local). The name of the database user for this linux user is the same as the name of the linux user (typically 'production'). Follow the previous steps to create this database account.
    5. Create a password for the sumsadmin DRMS database user, following the "ALTER USER" directions above. The user was created by the newdrmsuser.pl script above.
    6. OPTIONALLY, create a table to be used for DRMS version control:
      psql -h <db server host> -p 5432 -U <postgres administrator> data
      CREATE TABLE drms.minvers(minversion text default '1.0' not null);
      GRANT SELECT ON drms.minvers TO public;
      INSERT INTO drms.minvers(minversion) VALUES('<version>');
      where <version> is the minimum DRMS version that a DRMS module must have before it can connect to the DRMS database. Because minversion is a text column, the value must be enclosed in single quotes.

  8. Set up the SUMS database. Although the SUMS data cluster and SUMS database have already been created, you must create certain tables and users in this newly created database.
    1. Create the production user in the SUMS database:

      $JSOCROOT/base/drms/scripts/newdrmsuser.pl data_sums <db server host> 5434 <db production user> <password> <db production user namespace> sys 1

      where <db production user namespace> is the namespace. Please see the NOTE in this link for assistance with choosing a namespace for the production user.

    2. Put the production db user into the sumsadmin group:

      psql -h <db server host> -p 5432 data -U postgres
      data=> GRANT sumsadmin TO <db production user>;

    3. Put the production user's password into the .pgpass file (see the PostgreSQL documentation on the password file, .pgpass).

    4. Create the SUMS database tables:

      psql -h <db server host> -p 5434 -U production -f scripts/create_sums_tables.sql data_sums
      ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;

      where <drms site code> is the value of the DRMS_SITE_CODE parameter in config.local, <min val> is <drms site code> << 48 (the site code shifted left by 48 bits, i.e., multiplied by 2^48), and <max val> is <min val> + 281474976710655 (that is, <min val> + 2^48 - 1).

    5. Grant elevated privileges to these tables to the db production user (the scripts should be modified to do this):

      psql -h <db server host> -p 5434 -U postgres data_sums
      data_sums=> GRANT ALL ON sum_tape TO production;
      data_sums=> GRANT ALL ON sum_ds_index_seq,sum_seq TO production;
      data_sums=> GRANT ALL ON sum_file,sum_group,sum_main,sum_open TO production;
      data_sums=> GRANT ALL ON sum_partn_alloc,sum_partn_avail TO production;

    6. SUMS data files are organized into "partitions", which are implemented as directories. Each partition must be named /SUM[0-9]* (e.g., /SUM, /SUM0, /SUM101). Each directory must be owned by the production linux user (e.g., "production") and belong to the SUMS user group (e.g., SOI), which must in turn contain all DRMS users. So, if linux user art will be using DRMS and running DRMS modules, then art must be a member of the SUMS user group. You are free to create as few or as many of these partitions as you desire. Create these directories now.

      NOTE: Please avoid using file systems that limit the number of directories and/or files. For example, the EXT3 file system limits a single directory to about 32,000 subdirectories. That number is far too small for SUMS usage.

    7. Initialize the sum_partn_avail table with the names of these partitions. For each SUMS partition run the following:

      psql -h <db server host> -p 5434 -U postgres data_sums
      data_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);

      where <SUMS partition path> is the full path of the partition (the path must be enclosed in single quotes) and <avail bytes> is some number no greater than the number of bytes available on the partition (the number of blocks multiplied by the number of bytes per block). The exact number does not matter, as long as it is not bigger than the total number of bytes available; SUMS will adjust this number as needed.

  9. Build the SUMS binaries:

    su - <production user>; cd $JSOCROOT; ./configure; make sums

  10. Copy the sum_chmown program to <path to sum_chmown> (chosen in step 1.1 above), make root the owner, and set the setuid bit so that the program runs with root privileges:

    su - root
    cp $JSOCROOT/drms/_linux_x86_64/base/sums/apps/sum_chmown <path to sum_chmown>
    chown root:root <path to sum_chmown>
    chmod u+s <path to sum_chmown>

  11. Start SUMS:

    $JSOCROOT/base/sums/scripts/sum_start.NetDRMS

    The script does not return a prompt after echoing "sum_svc now available". Just hit RETURN.

  12. To stop SUMS for any reason, run this script:

    $JSOCROOT/base/sums/scripts/sum_stop.NetDRMS
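The two values that steps 8.4 and 8.7 ask you to work out by hand can be checked with a short script. This is a hypothetical helper, not part of the NetDRMS distribution: it assumes DRMS_SITE_CODE has already been converted to an integer, uses Python's shutil.disk_usage to obtain the free bytes on a partition, and only prints SQL for you to review and paste into psql.

```python
import shutil

SU_INDEX_BITS = 48  # per step 8.4: <min val> = <drms site code> << 48


def sunum_range(site_code: int) -> tuple[int, int]:
    """Return (<min val>, <max val>) for sum_ds_index_seq."""
    # PostgreSQL sequence values are signed 64-bit integers, so the site code
    # must fit in the 15 bits above the low 48.
    if not 0 <= site_code < 2 ** 15:
        raise ValueError(f"site code {site_code} out of range 0..32767")
    min_val = site_code << SU_INDEX_BITS
    max_val = min_val + (1 << SU_INDEX_BITS) - 1  # + 281474976710655
    return min_val, max_val


def alter_sequence_sql(site_code: int) -> str:
    """The ALTER SEQUENCE statement from step 8.4, with the bounds filled in."""
    lo, hi = sunum_range(site_code)
    return (
        f"ALTER SEQUENCE sum_ds_index_seq START {lo} RESTART {lo} "
        f"MINVALUE {lo} MAXVALUE {hi};"
    )


def partition_insert_sql(partition: str) -> str:
    """The sum_partn_avail INSERT from step 8.7 for one SUMS partition."""
    if "'" in partition:
        raise ValueError("partition path must not contain a single quote")
    # Free bytes on the file system holding the partition; never exceeds its size.
    avail = shutil.disk_usage(partition).free
    return (
        "INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, "
        f"pds_set_num, pds_set_prime) VALUES ('{partition}', {avail}, {avail}, 0, 0);"
    )


# Site code 2 gives START/MINVALUE 562949953421312 and MAXVALUE 844424930131967.
print(alter_sequence_sql(2))
```

Generating the statements avoids transcription errors in the 15- to 19-digit bounds; run partition_insert_sql once per /SUM directory on the SUMS host.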

The configuration and compilation of NetDRMS described here can proceed largely independently of the site and/or user setup, which only needs to be done once. It is recommended that the site setup be done first, as the NetDRMS build requires the definition of certain site-dependent names, such as those of the database and server; however, if these names are already known, the libraries can be built without the database and SUMS storage in place. Of course, any code that requires access to the database will not function until the DRMS and SUMS services have been set up.

These instructions assume that there is already a NetDRMS database server and associated SUMS server that you can connect to. If that is not the case, then you or someone else at your site will first have to do a Site Installation. You must also have the PostgreSQL Core installed at least as a client library on any machine on which you intend to build the package. You should have psql in your path.

Download the NetDRMS distribution, a gzipped tarfile, and unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms. The size of the source distribution is currently (V 7.0) about 10 MB; a built system (including SUMS) is typically about 300 MB. In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, simply copy or link to their version of the file. For site administrators:

If you had not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl, which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular, it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET, be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.
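Before building, you could sanity-check config.local with a few lines of Python. This sketch assumes that each setting is a line consisting of NAME, whitespace, and a value, and that # starts a comment; verify that against your copy of config.local.template. The function names are hypothetical, and the check covers only the variables named in the paragraph above.

```python
REQUIRED = ("DBSERVER_HOST", "DRMS_DATABASE", "SUMS_SERVER_HOST", "DRMS_SITE_CODE")


def parse_config_local(text: str) -> dict[str, str]:
    """Parse 'NAME  value' lines into a dict (assumed file format)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def check_config_local(params: dict[str, str]) -> list[str]:
    """List the problems the text above warns about; empty means none found."""
    problems = []
    # A commented-out LOCAL_CONFIG_SET is simply absent, which is acceptable.
    if params.get("LOCAL_CONFIG_SET", "").upper() == "NO":
        problems.append("LOCAL_CONFIG_SET is still NO")
    for name in REQUIRED:
        if not params.get(name):
            problems.append(f"{name} is missing or empty")
    return problems


sample = """
# LOCAL_CONFIG_SET NO
DBSERVER_HOST     dbhost.example.edu
DRMS_DATABASE     data
SUMS_SERVER_HOST  sumshost.example.edu
"""
print(check_config_local(parse_config_local(sample)))
```

For the sample above, the only problem reported is the missing DRMS_SITE_CODE.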

However you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case. In the target root directory $DRMS, run

  ./configure

This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.) The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default):

  • linux_ia32 (uname -s = Linux, uname -m = ia32 | i686 | i386)
  • linux_x86_64 (uname -s = Linux, uname -m = x86_64)

The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise Linux 5 has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client). See platform notes for instructions on how to fix this bug.
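The default target names above, together with the JSOC_MACHINE override described below, can be expressed as a small lookup. This is a sketch of the naming rule as stated on this page, not the logic of the actual build files; it treats "custom" as a literal name and covers only the two Linux targets listed.

```python
import os
import platform


def default_jsoc_machine(system: str, machine: str) -> str:
    """Map `uname -s` / `uname -m` values to the default target names above."""
    if system == "Linux":
        if machine in ("ia32", "i686", "i386"):
            return "linux_ia32"
        if machine == "x86_64":
            return "linux_x86_64"
    return "custom"  # any other architecture


def jsoc_machine() -> str:
    """An explicit JSOC_MACHINE in the environment takes precedence over the default."""
    return os.environ.get("JSOC_MACHINE") or default_jsoc_machine(
        platform.system(), platform.machine()
    )


print(jsoc_machine())
```

Binaries would then land in $DRMS/bin/<name> and libraries in $DRMS/lib/<name>, so setting JSOC_MACHINE in your login scripts keeps each architecture's build output separate.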

If you are making on any other architecture, the target name will be custom. Binaries and libraries will be placed in appropriate subdirectories based on these names. If you will be making on multiple architectures, or if you wish to change the target architecture name, you should either add the following line near the beginning of the file $DRMS/make_basic.mk

  JSOC_MACHINE = name

or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts. If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:

  COMPILER = icc
  FCOMPILER = ifort

Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the PostgreSQL Client Applications libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure sockets toolkit; you should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default. N.B. If you are using the icc compiler, version 11 is recommended; there are some very nasty bugs in version 10.*.

In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, as will the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps. These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it. At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.

There are two parts to setting up NetDRMS. First, the necessary services must be set up at the institution or group that will be hosting the NetDRMS service. The basic preparation and installation only needs to be done once, although the actual software distribution may be updated from time to time without affecting the setup. Second, individual users may wish to set up the NetDRMS software distribution for use or development in their own environment. Again, there are a few administrative tasks that need to be performed once when a user is registered, but the software may be updated or rebuilt at any time. Once the site preparation and setup is complete, user setup is a simple task, so there are two sets of instructions. Most users only need to concern themselves with the second, Installing / Upgrading NetDRMS.

old stuff below

Building Your Own DRMS and SUMS

Sites other than the JSOC can host DRMS data series. They can maintain local copies of the DRMS and SUMS data created at the JSOC, and they can create their own DRMS data, of which other sites can maintain local copies. To participate in this network of sites sharing data, a site (aka a node) must install a DRMS/SUMS system to become a NetDRMS site. Once a member of this network, a NetDRMS site can selectively share specific data series - it is not necessary to share all series.

There are three fundamental requirements for setting up and operating a DRMS system:

  • Reserved disk space to serve as the SUMS disk cache.
  • A database server running Postgres version 8.4.
  • A "current" copy of the JSOC software tree, available from Stanford.

Setting up a SUMS

The SUMS disk area can be as simple as a directory, but it is probably better to assign at least one disk partition to the SUMS cache. Unless a tape library also exists, the SUMS partition(s) must be large enough to store all the data segments in the DRMS that are to be archived locally. For datasets for which other DRMS servers provide the permanent archive, the local SUMS will serve only as a local cache, so size is dictated by expected usage.

The directory or directories to be used for SUMS must be owned by a user named production (the UID can be anything), must belong to a group named SOI (the GID can be anything), and must have permissions mode 2775 (drwxrwsr-x). The group SOI should include as members all users who will write data into the DRMS, whether by running modules or otherwise.
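The drwxrwsr-x mode is octal 2775: the setgid bit, plus read/write/execute for the owner and group and read/execute for others. A minimal sketch for applying it, assuming GNU coreutils (for stat -c) and root privileges for chown; the helper name is hypothetical and not part of the JSOC tree:

```shell
#!/usr/bin/env bash
# Sketch: create a SUMS directory owned by production:SOI with mode 2775.
# Usage: prepare_sums_dir DIR [OWNER [GROUP]]
# Pass an empty OWNER ("") to skip chown, e.g. when not running as root.
prepare_sums_dir() {
    local dir="$1" owner="${2-production}" group="${3-SOI}"
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner:$group" "$dir" || return 1
    fi
    # 2775 = setgid (2) + rwx for owner and group (77), r-x for others (5).
    # The setgid bit makes new files and subdirectories inherit the
    # directory's group (SOI) rather than the creator's primary group.
    chmod 2775 "$dir" || return 1
    # Print the resulting mode and ownership, e.g. "drwxrwsr-x production SOI".
    stat -c '%A %U %G' "$dir"
}
```

For example, run prepare_sums_dir /SUM1 as root, where /SUM1 is a hypothetical mount point. Users who will write data can then be added to SOI with usermod -a -G SOI username, where username is a placeholder.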

Setting up the Postgres Database server

You should have Postgres version 8.1 or higher installed. As of October 2006, JSOC database servers were running on the following systems:

  • a 64-bit dual-core Xeon running Red Hat Enterprise Linux 4 with Postgres v. 8.1.2
  • a 32-bit dual-core Pentium 4 running Scientific Linux (?; equinox) with Postgres v. 8.1.4

Populating the Database

First, you must create the database tables required for SUMS. You can do so by running the following SQL statements in psql:

create table SUM_MAIN (
 ONLINE_LOC             VARCHAR(80) NOT NULL,
 ONLINE_STATUS          VARCHAR(5),
 ARCHIVE_STATUS         VARCHAR(5),
 OFFSITE_ACK            VARCHAR(5),
 HISTORY_COMMENT        VARCHAR(80),
 OWNING_SERIES          VARCHAR(80),
 STORAGE_GROUP          integer,
 STORAGE_SET            integer,
 BYTES                  bigint,
 DS_INDEX               bigint,
 CREATE_SUMID           bigint NOT NULL,
 CREAT_DATE             timestamp(0),
 ACCESS_DATE            timestamp(0),
 USERNAME               VARCHAR(10),
 ARCH_TAPE              VARCHAR(20),
 ARCH_TAPE_POS          VARCHAR(15),
 ARCH_TAPE_FN           integer,
 ARCH_TAPE_DATE         timestamp(0),
 WARNINGS               VARCHAR(260),
 STATUS                 integer,
 SAFE_TAPE              VARCHAR(20),
 SAFE_TAPE_POS          VARCHAR(15),
 SAFE_TAPE_FN           integer,
 SAFE_TAPE_DATE         timestamp(0),
 constraint pk_summain primary key (DS_INDEX)
);

create table SUM_OPEN (
    SUMID      bigint not null,
    OPEN_DATE  timestamp(0),
    constraint pk_sumopen primary key (SUMID)
);

create table SUM_PARTN_ALLOC (
    wd                 VARCHAR(80) not null,
    sumid              bigint not null,
    status             integer not null,
    bytes              bigint,
    effective_date     VARCHAR(20),
    archive_substatus  integer,
    group_id           integer,
    ds_index           bigint not null,
    safe_id            integer
);

create table SUM_PARTN_AVAIL (
       partn_name    VARCHAR(80) not null,
       total_bytes   bigint not null,
       avail_bytes   bigint not null,
       pds_set_num   integer not null,
       constraint pk_sumpartnavail primary key (partn_name)
);

create table SUM_TAPE (
        tapeid          varchar(20) not null,
        nxtwrtfn        integer not null,
        spare           integer not null,
        group_id        integer not null,
        avail_blocks    bigint not null,
        closed          integer not null,
        last_write      timestamp(0),
        constraint pk_tape primary key (tapeid)
);

create sequence SUM_SEQ
  increment 1
  start 2
  no maxvalue
  no cycle
  cache 50;

create sequence SUM_DS_INDEX_SEQ
  increment 1
  start 1
  no maxvalue
  no cycle
  cache 10;

create table SUM_FILE (
        tapeid          varchar(20) not null,
        filenum         integer not null,
        gtarblock       integer,
        md5cksum        varchar(36) not null,
        constraint pk_file primary key (tapeid, filenum)
       );

create table SUM_GROUP (
        group_id        integer not null,
        retain_days     integer not null,
        effective_date  VARCHAR(20),
        constraint pk_group primary key (group_id)
       );

These statements are contained in the scripts create_tables.sql, sum_file.sql, and sum_group.sql in the directory base/sums/scripts/postgres of the JSOC software tree. For example, if you have created a database named mydb on a server named myserver, and one of those scripts is in your working directory, you could enter the command

  psql -h myserver mydb -f create_tables.sql

Alternatively, you can enter the statements by hand in an interactive psql session. Either way, connect as the database administrator when you create these tables.
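To load all three scripts in one step, a loop like the one below would work. This is a sketch, not a JSOC-supplied tool. It assumes the three files sit in one directory, uses the psql variable ON_ERROR_STOP so that a failed statement stops the load, and lets you set PSQL=echo for a dry run. Because Postgres folds unquoted identifiers to lower case, SUM_MAIN and the other tables are stored as sum_main and so on, which the table count relies on.

```shell
#!/usr/bin/env bash
# Sketch: load the SUMS table definitions into database DB on server HOST.
# Usage: load_sums_tables HOST DB SCRIPT_DIR
# Set PSQL=echo to print the commands instead of running them.
load_sums_tables() {
    local host="$1" db="$2" dir="$3" f
    for f in create_tables.sql sum_file.sql sum_group.sql; do
        # ON_ERROR_STOP=1 makes psql exit with a non-zero status at the
        # first failing statement instead of continuing through the file.
        "${PSQL:-psql}" -h "$host" -d "$db" -v ON_ERROR_STOP=1 -f "$dir/$f" || return 1
    done
}

# Count how many of the seven SUMS tables exist. Unquoted names such as
# SUM_MAIN are stored in lower case, so compare against lower-case names.
# Usage: count_sums_tables HOST DB
count_sums_tables() {
    local host="$1" db="$2"
    "${PSQL:-psql}" -h "$host" -d "$db" -Atc \
        "SELECT count(*) FROM information_schema.tables
         WHERE table_name IN ('sum_main', 'sum_open', 'sum_partn_alloc',
                              'sum_partn_avail', 'sum_tape', 'sum_file',
                              'sum_group')"
}
```

For example, load_sums_tables myserver mydb base/sums/scripts/postgres (run from the top of the JSOC tree); if the scripts create the seven tables listed above, count_sums_tables myserver mydb should then print 7.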

Remote SUMS

A local NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units associated with these data series. There are currently two methods to facilitate these downloads. The Java Mirroring Daemon (JMD) is a tool that can be installed and configured to download SUs automatically as the series data records are ingested into the local NetDRMS. It can obtain the SUs from any other NetDRMS that has the SUs, not just the NetDRMS that originally produced them.

JsocWiki: DRMSSetup (last edited 2024-01-19 09:08:03 by ArtAmezcua)