
In order to process, archive, and distribute the substantial quantity of data flowing from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data management system. This system, the Data Record Management System (DRMS), consists of ''data series'', each of which is a collection of related data. For example, there exists a data series named hmi.M_45s, which contains the HMI 45-second cadence magnetograms. Each data series consists of several DRMS objects: records, keywords, segments, and links. A DRMS record is the smallest unit of data-series data. Typically, it represents data for a single observation in time (hence the term ''series'' in data series), but there is no restriction on how a user organizes their data. A data series may contain one or more DRMS keywords, each of which represents a named bit of metadata. For example, many data series contain a DRMS keyword named CRPIX1. A DRMS segment is a collection of data that contains storage/retrieval information needed by DRMS to locate auxiliary data files. These data files contain large sets of data like image arrays. Generally, they are image files, but what they contain is arbitrary and user-defined. A data series optionally contains one or more DRMS links, each of which is a collection of data that ''links'' the data series to other DRMS data series. Each DRMS record contains record-specific values for the DRMS keywords, segments, and links. In this way, one record may have one set of keyword, segment, and link values, and another record may have a different set of these values.

The Storage Unit Management System (SUMS) is the file-management system that contains the data files that DRMS records refer to. Each DRMS segment value is used by DRMS code to derive the SUMS file-system path to a single data file. Because each DRMS series may contain multiple DRMS segments, each DRMS record may ''point'' to more than one data file.

To manage all these data, DRMS comprises several components, one of which is a database instance in a relational-database management system (PostgreSQL). The DRMS Library code uses a database instance and several tables to implement the DRMS objects. For each data series, there exists a database table that contains one row per DRMS record. The columns of each row contain the DRMS keyword, segment, and link values, bits of data that are all small enough to fit efficiently in a database record. The data-file data are too large to fit into a database record, so those data reside in data files in SUMS. The DRMS-segment values ''point'' to the data files, using a unique identifier called a SUNUM. SUMS itself comprises several components, one of which is another database instance that contains several database tables. When DRMS needs a data file, it ''requests'' the file from SUMS by providing SUMS with a SUNUM, and then SUMS consults its database tables to derive the path to the data file. SUMS shuttles files between hard disk (aka the disk cache) and tape, so data files have no permanent file path. Therefore, when DRMS requests the path to a file, SUMS must obtain the current path by consulting a database table.
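To make the SUNUM lookup concrete, here is a minimal bash sketch of the kind of query SUMS runs when DRMS hands it a SUNUM. It rests on assumptions not stated above: that the SUNUM is stored in the DS_INDEX column of the SUM_MAIN table (whose definition appears under "Populating the Database"), that ONLINE_LOC holds the unit's current disk-cache directory, and that ONLINE_STATUS is 'Y' while the unit is on disk. The function name sunum_to_path and the PSQL override are hypothetical; DRMS modules do not issue this query themselves.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a SUNUM -> disk-path lookup in the SUMS database.
# ASSUMPTIONS: SUM_MAIN.DS_INDEX holds the SUNUM, ONLINE_LOC holds the unit's
# current directory, and ONLINE_STATUS = 'Y' means the unit is in the disk cache.
# Set PSQL to another command (e.g., echo) for a dry run.
# Usage: sunum_to_path SUNUM HOST DATABASE
sunum_to_path() {
    local sunum=$1 host=$2 db=$3
    # A SUNUM is an integer; reject anything else before building SQL from it.
    if ! [[ $sunum =~ ^[0-9]+$ ]]; then
        echo "invalid SUNUM: $sunum" >&2
        return 1
    fi
    # -A (unaligned) and -t (tuples only) make psql print just the path.
    "${PSQL:-psql}" -h "$host" -d "$db" -At -c \
        "SELECT online_loc FROM sum_main WHERE ds_index = $sunum AND online_status = 'Y'"
}
```

For example, {{{sunum_to_path 123456789 <PostgreSQL host> netdrms_sums}}} would print a directory, or nothing if the unit is not currently in the disk cache (for example, because it resides only on tape). Because the path can change whenever SUMS moves a unit, it should be looked up each time rather than stored.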
In order to process, archive, and distribute the substantial quantity of solar data captured by the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data-management system, NetDRMS. This system comprises two PostgreSQL databases, multiple file systems, a tape back-up system, and software to manage these components. Related sets of data are grouped into data series, each of which is, conceptually, a table of data in which each row is typically associated with an observation time or a Carrington rotation. For example, the data series hmi.M_45s contains the HMI 45-second cadence magnetograms, both observation metadata and image FITS files. The columns contain metadata, such as the observation time, the ID of the camera used to acquire the data, the image rotation, etc. One column in this table contains an ID that refers to a set of data files, typically a set of FITS files that contain images.

The Data Record Management System (DRMS) is the subsystem that contains and manages the "DRMS" database of metadata and data-file-locator information. One component is a software library, written in C, that provides client programs, also known as "DRMS modules", with an Application Programming Interface (API) that allows the users to access these data. The Storage Unit Management System (SUMS) is the subsystem that contains and manages the "SUMS" database and associated storage hardware. The database contains information needed to locate data files that reside on hardware. The entire system as a whole is typically referred to as DRMS. The user interfaces with the DRMS subsystem only, and the DRMS subsystem interfaces with SUMS - the user does not interact with SUMS directly. The JSOC provides NetDRMS to non-JSOC institutions so that those sites can take advantage of the JSOC-developed software to manage large amounts of solar data.

A NetDRMS site is an institution with a local NetDRMS installation. It does not generate the JSOC-owned production data series (e.g., hmi.M_720s, aia.lev1) that Stanford generates for scientific use. A NetDRMS site can generate its own data, production or otherwise: it can create software that uses NetDRMS to generate its own data series. It can also act as a "mirror" for individual data series. When acting as a mirror for a Stanford data series, the site downloads DRMS database information from Stanford and stores it in its own NetDRMS database, and it downloads SUMS files and stores them in its own SUMS subsystem. As the data files are downloaded to the local SUMS, the SUMS database is updated with the information needed to manage the data files. It is possible for a NetDRMS site to mirror the DRMS data of any other NetDRMS site, but currently the only site whose data are mirrored is the Stanford JSOC.

=== Installing NetDRMS for the First Time ===

The initial installation of NetDRMS requires X. First, you will need to create a few linux users and groups, giving them the needed permissions (see X below). Second, you will need to install the PostgreSQL Relational Database Management System and create two databases (see X below). Third, you will need to establish disk storage for SUMS (see X below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order:

1. Set up the environment (to be done by a superuser)
a. Create a ''production'' linux user (named production by default). If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:

{{{<production user> <host>=NOPASSWD:<path to sum_chmown>}}}

This will allow sum_chmown to be run without a password prompt being presented.
b. Create a linux group to which the production user belongs. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
c. Make sure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file and put it in the production user's home directory. Please see XXX for information on how to do this.
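Step 1c can be sketched in bash as follows; write_pgpass is a hypothetical helper, not part of NetDRMS. The entry format (hostname:port:database:username:password, with ':' and '\' escaped by a backslash) and the permission rule come from the PostgreSQL libpq documentation: libpq ignores a password file that group or others can read, so the file must be mode 0600 or stricter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to a PostgreSQL password file.
# Entry format: hostname:port:database:username:password
# Usage: write_pgpass HOST PORT DATABASE USER PASSWORD [FILE]
write_pgpass() {
    local file=${6:-$HOME/.pgpass}
    local fields=() f
    for f in "$1" "$2" "$3" "$4" "$5"; do
        # Escape backslashes and colons, which are special in .pgpass.
        fields+=("$(printf '%s' "$f" | sed 's/[\\:]/\\&/g')")
    done
    # Create/append with owner-only permissions, joining the fields with ':'.
    ( umask 077 && IFS=: && printf '%s\n' "${fields[*]}" >> "$file" )
    # Tighten an existing file too; libpq ignores it if it is group/world readable.
    chmod 600 "$file"
}
```

Run as the production user, {{{write_pgpass <PostgreSQL host> 5432 '*' production '<password>'}}} would add one entry; the {{{*}}} in the database field is a .pgpass wildcard, so a single line covers both the DRMS and the SUMS database.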




The configuration and compilation of NetDRMS described here can proceed largely independently of the site and/or user setup, which only needs to be done once. It is recommended that the site setup be done first, as the NetDRMS build requires the definition of certain site-dependent names, such as those of the database and server; however, if these names are already known, the libraries can be built without the database and SUMS storage in place. Any code that requires access to the database will not of course function until the DRMS and SUMS services have been set up.

These instructions assume that there is already a NetDRMS database server and associated SUMS server that you can connect to. If that is not the case, then you or someone else at your site will first have to do a Site Installation. You must also have the PostgreSQL Core installed at least as a client library on any machine on which you intend to build the package. You should have psql in your path.

Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms.
Most Recent Version (7.0)
Current and Earlier Versions
The size of the source distribution is currently (V 7.0) about 10 MB. A built system (including SUMS) is typically about 300 MB.
In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators:

If you had not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying the file config.local.template to config.local and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl, which will walk you through the fields. (That script has not been widely tested and might require some tweaking. In particular, it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET, be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.

However you create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case.
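Before building, it can help to check that config.local has actually been edited. The following bash function is a minimal sketch that checks the variables named above. It assumes, without confirmation from this page, that each setting is a whitespace-separated "NAME value" line; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: config.local settings are "NAME value" lines separated by
# whitespace. Commented-out lines ("#NAME ...") never match a bare NAME.

# Print the value of setting NAME in FILE (last occurrence wins).
_cfg_get() {
    awk -v n="$2" '$1 == n { v = $2 } END { print v }' "$1"
}

# Return non-zero (and explain on stderr) if config.local looks unedited.
# Usage: check_config_local [FILE]
check_config_local() {
    local file=${1:-config.local} status=0 name
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    if [ "$(_cfg_get "$file" LOCAL_CONFIG_SET)" = NO ]; then
        echo "LOCAL_CONFIG_SET is still NO" >&2
        status=1
    fi
    for name in DBSERVER_HOST DRMS_DATABASE SUMS_SERVER_HOST DRMS_SITE_CODE; do
        if [ -z "$(_cfg_get "$file" "$name")" ]; then
            echo "$name is missing or empty" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running {{{check_config_local $DRMS/config.local}}} before {{{make}}} catches the most common omission, leaving LOCAL_CONFIG_SET at NO.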
In the target root directory $DRMS, run
  ./configure
This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.)
The NetDRMS distribution is currently supported for two target architectures under Linux, named (by default):
linux_ia32 (`uname -s` = Linux, `uname -m` = ia32 | i686 | i386)
linux_x86_64 (`uname -s` = Linux, `uname -m` = x86_64)
The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise Linux 5 has a system bug that must be fixed in order to build the SUMS server (it does not affect the DRMS client). See platform notes for instructions on how to fix this bug.

If you are building on any other architecture, the target name will be custom. Binaries and libraries will be placed in appropriate subdirectories based on these names. If you will be building on multiple architectures, or if you wish to change the target architecture name, you should either add the following line near the beginning of the file $DRMS/make_basic.mk
  JSOC_MACHINE = name
or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts.
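The default naming rules above can be written as a small bash function. This is a sketch that mirrors the text, not the logic inside the NetDRMS make files, and jsoc_machine is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the default target-name rules described above.
# Usage: jsoc_machine [OS] [MACHINE]   (defaults: uname -s, uname -m)
jsoc_machine() {
    local os=${1:-$(uname -s)} mach=${2:-$(uname -m)}
    if [ "$os" = Linux ]; then
        case $mach in
            ia32|i686|i386) echo linux_ia32; return ;;
            x86_64)         echo linux_x86_64; return ;;
        esac
    fi
    # Any other OS/architecture combination.
    echo custom
}
```

To follow the recommended environment-variable approach, a login script could run {{{export JSOC_MACHINE=$(jsoc_machine)}}} and then {{{export PATH=$DRMS/bin/$JSOC_MACHINE:$PATH}}}, so that the binaries for the current machine are found after the build.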
If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:
  COMPILER = icc
  FCOMPILER= ifort
Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. For Macs, the default compiler is gcc. Note that you can only build on a system on which the PostgreSQL client libraries exist (e.g., libecpg.a). You will also require the OpenSSL secure sockets toolkit; you should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default.
N.B. If you are using the icc compiler, version 11 is recommended; version 10.* has some very nasty bugs.
In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps.
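Before running {{{make}}}, a quick preflight check can save a failed build. The bash sketch below mirrors the compiler defaults and prerequisites described above; the function names are hypothetical, and checking icc and ifort independently is an assumption about how the defaults are chosen.

```shell
#!/usr/bin/env bash
# Sketch of the default compiler choice described above: Intel compilers if
# available, otherwise GNU. Each compiler is checked independently (assumption).
# Prints lines in the style of make_basic.mk.
default_compilers() {
    if command -v icc >/dev/null 2>&1; then
        echo "COMPILER = icc"
    else
        echo "COMPILER = gcc"
    fi
    if command -v ifort >/dev/null 2>&1; then
        echo "FCOMPILER = ifort"
    else
        echo "FCOMPILER = gfortran"
    fi
}

# Check the build prerequisites named above; returns non-zero if any is missing.
check_build_prereqs() {
    local ok=0
    [ -d /usr/include/openssl ] || { echo "OpenSSL headers not found" >&2; ok=1; }
    command -v psql >/dev/null 2>&1 || { echo "psql not in PATH" >&2; ok=1; }
    return "$ok"
}
```

A typical sequence would be {{{default_compilers; check_build_prereqs && (cd "$DRMS" && make)}}}.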
These instructions should suffice for all users except the manager who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.

To make the SUMS server available, the SUMS manager (only) needs to run make sums in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it.
At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build and start of SUMS services. Proceed to the SUMS setup instructions. Otherwise you are ready to go.



There are two parts to setting up NetDRMS. First, the necessary services must be set up at the institution or group that will be hosting the NetDRMS service. The basic preparation and installation only needs to be done once, although the actual software distribution may be updated from time to time without affecting the setup. Second, individual users may wish to set up the NetDRMS software distribution for use or development in their own environment. Again, there are a few administrative tasks that need to be performed once when a user is registered, but the software may be updated or rebuilt at any time. Once the site preparation and setup is complete, user setup is a simple task, so there are two sets of instructions. Most users only need to concern themselves with the second, Installing / Upgrading NetDRMS.


old stuff below
== Building Your Own DRMS and SUMS ==

Sites other than the JSOC can use DRMS data series. They can maintain local copies of the DRMS and SUMS data created at the JSOC, and they can create their own DRMS data, of which other sites can maintain local copies. To participate in this network of data-sharing sites, a site (aka a node) must install a DRMS/SUMS system to become a NetDRMS site. Once a member of this network, a NetDRMS site can selectively share specific data series; it is not necessary to share all series.

There are three fundamental requirements for setting up and operating a DRMS system:

 * Reserved disk space to serve as the SUMS disk cache.
 * A database server running Postgres version 8.4.
 * A "current" copy of the JSOC software tree, available from Stanford.

== Setting up a SUMS ==

The SUMS disk area can be as simple as a directory, but it is probably better to assign at
least one disk partition to the SUMS cache. Unless a tape library also exists, the SUMS
partition(s) must be large enough to store all the data segments in the DRMS that are to be
archived locally. For datasets for which other DRMS servers provide the permanent archive,
the local SUMS will serve only as a local cache, so size is dictated by expected usage.

The directory or directories to be used for SUMS must be owned by a user named '''production'''
(can be any uid) and belong to a group named '''SOI''' (can be any gid), and have a permissions
mode of 2775 (''drwxrwsr-x''). The group '''SOI''' should include as members any users who
will be writing data into the DRMS by running modules or otherwise.
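The ''drwxrwsr-x'' permissions correspond to octal mode 2775: read/write/execute for owner and group, read/execute for others, plus the setgid bit, which makes files and subdirectories created inside inherit the directory's group. A minimal bash sketch of preparing such a directory follows; init_sums_dir and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a SUMS storage directory with mode 2775
# (drwxrwsr-x). chown is done first so the setgid bit is set last.
# Usage: init_sums_dir DIR [OWNER GROUP]   (chown normally requires root)
init_sums_dir() {
    local dir=$1 owner=$2 group=$3
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ] && [ -n "$group" ]; then
        chown "$owner:$group" "$dir" || return 1
    fi
    chmod 2775 "$dir"
}
```

Run as root, {{{init_sums_dir /SUM01 production SOI}}} would prepare a hypothetical partition mount point /SUM01.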

== Setting up the Postgres Database server ==

You should have Postgres Version 8.1 or higher installed; JSOC database servers are
currently (Oct 2006) running on the following systems:
  * a 64-bit dual-core xeon running Red Hat Enterprise Linux 4 with Postgres v. 8.1.2
  * a 32-bit dual-core pentium 4 running Scientific Linux (?; equinox) with Postgres v. 8.1.4

== Populating the Database ==

First, you must create the database tables required for SUMS. You can do so by running the
following SQL commands in psql:

{{{
create table SUM_MAIN (
 ONLINE_LOC VARCHAR(80) NOT NULL,
 ONLINE_STATUS VARCHAR(5),
 ARCHIVE_STATUS VARCHAR(5),
 OFFSITE_ACK VARCHAR(5),
 HISTORY_COMMENT VARCHAR(80),
 OWNING_SERIES VARCHAR(80),
 STORAGE_GROUP integer,
 STORAGE_SET integer,
 BYTES bigint,
 DS_INDEX bigint,
 CREATE_SUMID bigint NOT NULL,
 CREAT_DATE timestamp(0),
 ACCESS_DATE timestamp(0),
 USERNAME VARCHAR(10),
 ARCH_TAPE VARCHAR(20),
 ARCH_TAPE_POS VARCHAR(15),
 ARCH_TAPE_FN integer,
 ARCH_TAPE_DATE timestamp(0),
 WARNINGS VARCHAR(260),
 STATUS integer,
 SAFE_TAPE VARCHAR(20),
 SAFE_TAPE_POS VARCHAR(15),
 SAFE_TAPE_FN integer,
 SAFE_TAPE_DATE timestamp(0),
 constraint pk_summain primary key (DS_INDEX)
);

create table SUM_OPEN (
    SUMID bigint not null,
    OPEN_DATE timestamp(0),
    constraint pk_sumopen primary key (SUMID)
);

create table SUM_PARTN_ALLOC (
    wd VARCHAR(80) not null,
    sumid bigint not null,
    status integer not null,
    bytes bigint,
    effective_date VARCHAR(20),
    archive_substatus integer,
    group_id integer,
    ds_index bigint not null,
    safe_id integer
);

create table SUM_PARTN_AVAIL (
       partn_name VARCHAR(80) not null,
       total_bytes bigint not null,
       avail_bytes bigint not null,
       pds_set_num integer not null,
       constraint pk_sumpartnavail primary key (partn_name)
);

create table SUM_TAPE (
        tapeid varchar(20) not null,
        nxtwrtfn integer not null,
        spare integer not null,
        group_id integer not null,
        avail_blocks bigint not null,
        closed integer not null,
        last_write timestamp(0),
        constraint pk_tape primary key (tapeid)
);

create sequence SUM_SEQ
  increment 1
  start 2
  no maxvalue
  no cycle
  cache 50;

create sequence SUM_DS_INDEX_SEQ
  increment 1
  start 1
  no maxvalue
  no cycle
  cache 10;

create table SUM_FILE (
 tapeid varchar(20) not null,
 filenum integer not null,
 gtarblock integer,
 md5cksum varchar(36) not null,
 constraint pk_file primary key (tapeid, filenum)
       );

create table SUM_GROUP (
 group_id integer not null,
 retain_days integer not null,
 effective_date VARCHAR(20),
 constraint pk_group primary key (group_id)
       );
}}}

(These are contained in the scripts '''create_tables.sql''', '''sum_file.sql''', and
'''sum_group.sql''' in the JSOC software library '''base/sums/scripts/postgres'''.) For example,
if you have created a database named ''mydb'' on a server named ''myserver'' (and have
one of those scripts in your current working directory), you could enter the command

{{{
  psql -h myserver mydb -f create_tables.sql
}}}

Or you could simply enter the commands by hand. (You should be the database administrator
when you create these tables.)
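To load all three scripts in one pass and stop at the first SQL error, a small wrapper around psql can be used. This is a sketch: load_sums_tables and the PSQL override are hypothetical, while {{{-v ON_ERROR_STOP=1}}}, {{{-h}}}, {{{-d}}}, and {{{-f}}} are standard psql options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load the SUMS table definitions with psql, stopping at
# the first SQL error. Run as the database administrator.
# Set PSQL to another command (e.g., echo) for a dry run.
# Usage: load_sums_tables HOST DATABASE [SCRIPT_DIR]
load_sums_tables() {
    local host=$1 db=$2 dir=${3:-.} script
    for script in create_tables.sql sum_file.sql sum_group.sql; do
        if [ ! -r "$dir/$script" ]; then
            echo "missing $dir/$script" >&2
            return 1
        fi
        "${PSQL:-psql}" -v ON_ERROR_STOP=1 -h "$host" -d "$db" -f "$dir/$script" || return 1
    done
}
```

Assuming the JSOC tree is unpacked under $DRMS, {{{load_sums_tables myserver mydb $DRMS/base/sums/scripts/postgres}}} would create all the tables listed above.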
Installing the NetDRMS system requires:
 * installing PostgreSQL [ [[#install-pg|Installing PostgreSQL]] ]
 * instantiating a PostgreSQL cluster for two databases (one for DRMS and one for SUMS) [ [[#initialize-pg|Initializing PostgreSQL]] ]
 * installing CFITSIO [ [[#install-cfitsio|Installing CFITSIO]] ]
 * installing the DBD::Pg Perl package [ [[#install-perl-dbdpg|Installing DBD::Pg]] ]
 * installing packages to the system Python 3, or installing a new distribution, like Anaconda [ [[#install-python3|Installing Python3]] ]
 * installing {{{openssl}}} development packages [ [[#install-openssldev|Installing OpenSSL Development Packages]] ]
 * installing the NetDRMS software code tree, which includes code to create DRMS libraries and modules and SUMS libraries [ [[#install-netdrms|Installing NetDRMS]] ]
 * initializing SUMS storage such as hard drives or SSD drives [ [[#initialize-sums-disk|Initializing SUMS Storage]] ]
 * running the SUMS daemon (which accepts and processes SUMS requests from DRMS clients) [ [[#run-sums|Running SUMS]] ]
 * creating DRMS user accounts [ [[#create-users|Creating DRMS User Accounts]] ]

Optional steps include:
 * registering for JSOC-data-series subscriptions and running NetDRMS software to receive, in real time, data updates [ [[#register-subscriptions|Registering for Subscriptions]] ]
 * installing JSOC-specific project code that is not part of the base NetDRMS installation; the JSOC maintains code to generate JSOC-owned data that is not generally of interest to NetDRMS sites, but sites are welcome to obtain downloads of that code. Doing so involves additional configuration to the base NetDRMS system.
 * installing Slony PostgreSQL data-replication software to become a provider of your site's data
 * installing a webserver that hosts several NetDRMS CGIs to allow web access to your data
 * installing the Virtual Solar Observatory (VSO) software to become a VSO provider of data

For best results, and to facilitate debugging issues, please follow these steps in order.

<<Anchor(install-pg)>>
=== Installing PostgreSQL ===
PostgreSQL is a relational database management system. Data are stored primarily in relations (tables) of records that can be ''mapped'' to each other - given one or more records, you can query the database to find other records. These relations are organized on disk in a hierarchical fashion. At the top level are one or more database ''clusters''. A cluster is simply a storage location on disk (i.e., directory). PostgreSQL manages the cluster's data files with a single process, or PostgreSQL ''instance''. Various operations on the cluster will result in PostgreSQL forking new ephemeral child processes, but ultimately there is only one master/parent process per cluster.

Each cluster contains the data for one or more databases. Each cluster requires a fair amount of system memory, so it makes sense to install a single cluster on a single host. It does ''not'' make sense to make separate clusters, each holding one database; each cluster can efficiently support many databases, which are then fairly independent of each other. ''In terms of querying'' the databases are completely independent (i.e., a query on one database cannot involve relations in different databases). However, two databases in a single cluster ''do'' share the same disk directory, so there is not the same degree of independence at the OS/filesystem level. This may only matter if an administrator is operating directly on the files (performing backups, replication, creating standby systems, etc.).

To install PostgreSQL, select a host machine, {{{<PostgreSQL host>}}}, to act as the PostgreSQL database server. We recommend installing ''only'' PostgreSQL on this machine, given the large amount of memory and resources required for optimal PostgreSQL operation. We find a Red Hat-based system, such as CentOS, to be a good choice, but please visit [[https://www.postgresql.org/docs]] for system requirements and other information germane to installation. The following instructions assume a Red Hat-based Linux system such as CentOS (documentation for other distributions, such as Debian and openSUSE, can be found online) and a bash shell.<<BR>><<BR>>
Install the needed PostgreSQL server packages on {{{<PostgreSQL host>}}} by first visiting [[https://yum.postgresql.org/repopackages.php]] to locate and download the PostgreSQL "repo" rpm file appropriate for your OS and architecture. Each repo rpm contains a {{{yum}}} configuration file that can be used to install all supported PostgreSQL releases. You should install the latest version if possible (version 12, as of the time of this writing). Although you can use your browser to download the file, it might be easier to use Linux command-line tools:
{{{
$ curl -OL https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
}}}
Install the yum repo configuration file ({{{pgdg-redhat-all.repo}}}) from the downloaded repo rpm file:
{{{
$ sudo rpm -i pgdg-redhat-repo-latest.noarch.rpm
}}}
This installs the repo configuration file to {{{/etc/yum.repos.d/}}}. Find the names of the PostgreSQL packages needed from the repository; the following assumes PostgreSQL 12, but should you want to install an older version, replace "12" with one of 94, 95, 96, 10, or 11:
{{{
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql[0-9]*\.' | cut -d '.' -f 1
postgresql12
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*devel\.' | cut -d '.' -f 1
postgresql12-devel
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*contrib\.' | cut -d '.' -f 1
postgresql12-contrib
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*libs\.' | cut -d '.' -f 1
postgresql12-libs
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*plperl\.' | cut -d '.' -f 1
postgresql12-plperl
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*server\.' | cut -d '.' -f 1
postgresql12-server
}}}
Use {{{yum}}} to install all six packages:
{{{
$ sudo yum install <packages>
}}}
where {{{<packages>}}} are the package names determined in the previous step ({{{postgresql12 postgresql12-contrib postgresql12-devel postgresql12-libs postgresql12-plperl postgresql12-server}}}).
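Because the package names follow a fixed pattern, the list can be built rather than typed. pg_packages is a hypothetical helper, and the pattern is inferred from the names shown above, so compare its output with the {{{yum list}}} results before relying on it for another version.

```shell
#!/usr/bin/env bash
# Print the PostgreSQL package names listed above for a major version
# (12, 11, 10, 96, 95, or 94), following the naming pattern shown above.
# Usage: pg_packages [VERSION]
pg_packages() {
    local v=${1:-12} suffix
    local pkgs=("postgresql$v")
    for suffix in contrib devel libs plperl server; do
        pkgs+=("postgresql$v-$suffix")
    done
    echo "${pkgs[*]}"
}
```

The installation step then becomes {{{sudo yum install $(pg_packages 12)}}}; the command substitution is deliberately left unquoted so that each name becomes a separate argument.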
The rpm package installation will have created the PostgreSQL superuser Linux account {{{<PostgreSQL superuser>}}} (i.e., {{{postgres}}}); {{{<PostgreSQL superuser>}}} will own the PostgreSQL database clusters and server processes that will be created in the following steps. To perform the next steps, you will need to become user {{{<PostgreSQL superuser>}}}:
{{{
$ sudo su - <PostgreSQL superuser>
}}}
Depending on where the package files are installed, you might need to add the directory containing the PostgreSQL commands to your {{{PATH}}} environment variable. To test this, run:
{{{
$ which initdb
}}}
If the {{{initdb}}} command cannot be found, then add the PostgreSQL binaries path to {{{PATH}}}. Find the path to the PostgreSQL installation:
{{{
$ rpm -ql postgresql12
<PostgreSQL install dir>/bin/clusterdb
...
}}}
In this example, {{{<PostgreSQL install dir>}}} is {{{/usr/pgsql-12}}}. Then add the binary path to {{{PATH}}}:
{{{
$ export PATH=/usr/pgsql-12/bin:$PATH
}}}
{{{<PostgreSQL superuser>}}} will be using the binaries in this directory, so it is a good idea to add the {{{export}}} command to {{{~/.bashrc}}}.
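A line added to {{{~/.bashrc}}} runs in every new shell, so it is worth making it idempotent. The bash function below is a sketch (ensure_pg_path is a hypothetical name) that adds the directory only when it is not already listed and {{{initdb}}} is not already found, matching the test described above.

```shell
#!/usr/bin/env bash
# Prepend a PostgreSQL bin directory to PATH unless it is already listed or
# initdb is already found. Safe to call repeatedly, e.g., from ~/.bashrc.
# Usage: ensure_pg_path [BINDIR]   (default /usr/pgsql-12/bin)
ensure_pg_path() {
    local bindir=${1:-/usr/pgsql-12/bin}
    case ":$PATH:" in
        *":$bindir:"*) return 0 ;;   # already present
    esac
    command -v initdb >/dev/null 2>&1 && return 0
    export PATH="$bindir:$PATH"
}
```

For another PostgreSQL version, pass its directory explicitly, e.g., {{{ensure_pg_path /usr/pgsql-11/bin}}}.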
As described above, create one database cluster for the two databases (one for DRMS data, and one for SUMS data):
{{{
$ whoami
postgres
$ initdb --locale=C -D <PostgreSQL cluster>
}}}
where {{{<PostgreSQL cluster>}}} should be {{{/var/lib/pgsql/netdrms}}}. Use this path, unless there is some good reason you cannot. {{{initdb}}} will initialize the cluster data directory (identified by the {{{-D}}} argument). This will result in the creation of template databases, configuration files, and other items.<<BR>><<BR>>
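If the initialization step might be rerun (for example, from a setup script), a guard avoids confusing errors on an existing cluster. This bash sketch is hypothetical (init_cluster and the INITDB override are not part of NetDRMS); it relies on the fact that an initialized data directory contains a {{{PG_VERSION}}} file.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the initdb step above. It leaves alone a
# directory that already holds a cluster (detected by its PG_VERSION file).
# Run as the PostgreSQL superuser. Set INITDB=echo for a dry run.
# Usage: init_cluster [DATADIR]   (default /var/lib/pgsql/netdrms)
init_cluster() {
    local datadir=${1:-/var/lib/pgsql/netdrms}
    if [ -f "$datadir/PG_VERSION" ]; then
        echo "cluster already initialized in $datadir"
        return 0
    fi
    "${INITDB:-initdb}" --locale=C -D "$datadir"
}
```

With no argument, the function uses the recommended {{{/var/lib/pgsql/netdrms}}} location.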
The database cluster will contain two configuration files you need to edit: {{{postgresql.conf}}} and {{{pg_hba.conf}}}. Please refer to the PostgreSQL documentation to properly edit these files. Here are some brief suggestions:
 * {{{postgresql.conf}}} - changes to parameters marked with a ^*^ take effect only after a restart of the running server instance ({{{pg_ctl restart}}}); changes to other parameters require only a reload ({{{pg_ctl reload}}})
  * {{{listen_addresses}}}^*^ specifies the interface on which the {{{postgres}}} server processes will listen for incoming connections. You will need to ensure that connections can be made from all machines that will run DRMS modules (the modules connect to both the DRMS and SUMS databases), so change the default {{{'localhost'}}} to {{{'*'}}}, which causes the servers to listen on all interfaces:
  {{{
listen_addresses = '*'
  }}}
  * {{{port}}}^*^ is the server port on which the server listens for connections.
  {{{
port = <PostgreSQL port>
  }}}
  The default port is {{{5432}}}; unless there is a good reason not to, use {{{5432}}} for {{{<PostgreSQL port>}}}.
  * {{{logging_collector}}}^*^ controls whether or not stdout and stderr are logged to a file in the database cluster (in the {{{log}}} or {{{pg_log}}} directory, depending on release). By default it is off; set it to {{{on}}} in each cluster.
  {{{
logging_collector = on
  }}}
  * {{{log_rotation_size}}} sets the maximum size, in kilobytes, of a log file. Set this to {{{0}}} to disable rotation, otherwise a new log will be created after the current one grows to some size.
  * {{{log_rotation_age}}} sets the maximum age, in minutes, of a log file. Set this to {{{1d}}} (1 day) so that a new log file is created each day.
  {{{
log_rotation_age = 1d
  }}}
  * {{{log_min_duration_statement}}} is the amount of time, in milliseconds, a query must run before triggering a log entry. Set this to {{{1000}}} so that only long-running queries, those taking at least one second, are logged.
  * {{{shared_buffers}}}^*^ is the size of shared-memory buffers. For a server dedicated to a single database cluster, this should be about 25% of the total memory.
  {{{
shared_buffers = 32GB
  }}}
 * {{{pg_hba.conf}}} controls the methods by which client authentication is achieved (HBA stands for host-based authentication). It will likely take a little time to understand and properly edit this configuration file. If you are not familiar with networking concepts (such as subnets, name resolution, reverse name resolution, CIDR notation, IPv4 versus IPv6, network interfaces, etc.), now is the time to become familiar with them.<<BR>><<BR>>This configuration file contains a set of rows whose columns identify which user can access which database from which machines. Each row also defines the method by which authentication occurs. When a user attempts to connect to a database, the server traverses this list looking for the ''first'' row that matches. Once this row is identified, the user must authenticate; if authentication fails, the connection is rejected. The server does '''''not''''' attempt additional rows.
 Changes to this file take effect after a {{{reload}}} of the server instance; a {{{restart}}} is not required.
Here are the recommended entries:
 {{{
# local superuser connections
# TYPE DATABASE USER AUTH-METHOD
  local all all trust # this applies ONLY if the user is logged into the PG server AND they do not use the -h argument to psql
  host all all 127.0.0.1/8 trust # for -h localhost, if localhost resolves to an IPv4 address; also for -h 127.0.0.1
  host all all ::1/128 trust # for -h localhost, if localhost resolves to an IPv6 address; also for -h ::1

# non-local superuser connections
# TYPE DATABASE USER ADDRESS AUTH-METHOD
  host all postgres XXX.XXX.XXX.XXX/YY trust

# non-superuser connections (from client machines on the specified subnet; connections made on the server itself match the rows above)
# TYPE DATABASE USER ADDRESS AUTH-METHOD
  host netdrms all XXX.XXX.XXX.XXX/YY md5
  host netdrms_sums all XXX.XXX.XXX.XXX/YY md5
 }}}
 where the columns are defined as follows:
 * {{{TYPE}}} - this column defines the type of socket connection made (Unix-domain, TCP/IP, the encryption used, etc.). {{{local}}} is ''only'' relevant to Unix-domain local connections from the database server host {{{<PostgreSQL host>}}} itself. Since only {{{postgres}}} will log into the database server, the first row above applies to the administrator only. {{{host}}} is ''only'' relevant to TCP/IP connections, regardless of the encryption status of the connection.
 * {{{DATABASE}}} - this column identifies the database to which the user has access. Whenever a user attempts to connect to the database server, they specify a database to access, and that database must match the DATABASE column (the keyword {{{all}}} matches every database). For non-superusers, we recommend listing only {{{netdrms}}} and {{{netdrms_sums}}}, which blocks such users from accessing any other database. Conversely, we recommend using {{{all}}} for the superuser so they can access both the DRMS and SUMS databases (and any others that might exist).<<BR>><<BR>>
 NOTE: You are using the database names {{{netdrms}}} and {{{netdrms_sums}}} in {{{pg_hba.conf}}}, even though you have not actually created those databases yet. This is OK; you will do so once you start the PostgreSQL cluster instance.
 * {{{USER}}} - this column identifies which users can access the specified databases.
 * {{{ADDRESS}}} - this column identifies the host IP addresses (or host names, but do not use those) from which a connection is allowed. To specify a range of IP addresses, such as those on a subnet, use a CIDR address. This column should be empty for {{{local}}} connections.
 * {{{AUTH-METHOD}}} - this column identifies the type of authentication to use. We recommend using either {{{trust}}} or {{{md5}}}. When {{{trust}}} is specified, PostgreSQL unconditionally accepts the connection to the database specified in the row. When {{{md5}}} is specified, the user must provide a password. If you follow the recommendations above, then for the {{{local}}} row, any user who can log into the database server can access any database in the cluster without any further authentication. Generally only a superuser will be able to log into the database server, so this choice makes sense. For non-{{{local}}} connections, the Linux PostgreSQL superuser {{{postgres}}} can access any database on the server without further authentication. For the remaining non-{{{local}}}, non-{{{postgres}}} connections, users will need to provide a password.
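The first-match rule is easy to misjudge in a long {{{pg_hba.conf}}}. The following POSIX-shell sketch models it for IPv4 {{{host}}} rows only, so you can check which row would govern a given connection. The function names are hypothetical, and this is not PostgreSQL's matcher: IPv6 rows, host names, {{{samehost}}}/{{{samenet}}}, group and file references, and per-row options are not modeled.

```shell
# Sketch only: models pg_hba.conf "first matching row wins" for IPv4 host rows.
# local rows, comment lines, and IPv6 rows are skipped.

# ip_to_int ADDR: convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr ADDR CIDR: succeed if ADDR lies inside CIDR (e.g. 192.168.1.0/24)
in_cidr() {
    cidr_bits=${2#*/}
    cidr_mask=$(( (0xFFFFFFFF << (32 - cidr_bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & cidr_mask )) -eq $(( $(ip_to_int "${2%/*}") & cidr_mask )) ]
}

# first_match DATABASE USER ADDR: read "TYPE DATABASE USER ADDRESS METHOD" rows on
# stdin; print the METHOD of the first matching row, or "reject" if none matches.
# r_rest absorbs any trailing comment on a row.
first_match() {
    while read -r r_type r_db r_user r_addr r_method r_rest; do
        [ "$r_type" = host ] || continue              # local rows, comments, blanks
        case $r_addr in *:*) continue ;; esac         # IPv6 rows are not modeled
        [ "$r_db" = all ] || [ "$r_db" = "$1" ] || continue
        [ "$r_user" = all ] || [ "$r_user" = "$2" ] || continue
        in_cidr "$3" "$r_addr" || continue
        echo "$r_method"                              # later rows are never consulted
        return 0
    done
    echo reject
    return 1
}
```

For example, with {{{PGDATA}}} set to {{{<PostgreSQL cluster>}}}, running {{{first_match netdrms alice 192.168.1.10 < "$PGDATA/pg_hba.conf"}}} (where {{{alice}}} is a hypothetical user) prints the method of the row that would decide that connection.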
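Should you need to edit either of these configuration files AFTER you have started the database instance (by running {{{pg_ctl start}}}, as described in the next section), you will need to either {{{reload}}} or {{{restart}}} the instance. A {{{reload}}} suffices for {{{pg_hba.conf}}} and for most {{{postgresql.conf}}} parameters, but some parameters, such as {{{shared_buffers}}}, take effect only after a {{{restart}}}: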
{{{
$ whoami
postgres
# reload
$ pg_ctl reload -D <PostgreSQL cluster>
# restart
$ pg_ctl restart -D <PostgreSQL cluster>
}}}
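The {{{shared_buffers}}} guideline above (about 25% of total memory on a dedicated server) is simple arithmetic. A minimal sketch, assuming a Linux host where {{{/proc/meminfo}}} is available; the helper name is hypothetical, and its result is only a starting value that you still write into {{{postgresql.conf}}} yourself (followed by a {{{restart}}}).

```shell
# shared_buffers_for [MEMTOTAL_KB]: suggest a shared_buffers value of about 25% of
# total RAM, rounded down; defaults to MemTotal from /proc/meminfo (Linux only)
shared_buffers_for() {
    mem_kb=${1:-$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)}
    sb_mb=$(( mem_kb / 4 / 1024 ))
    if [ "$sb_mb" -ge 1024 ]; then
        echo "$(( sb_mb / 1024 ))GB"        # whole gigabytes, rounded down
    else
        echo "${sb_mb}MB"                   # small hosts: report megabytes
    fi
}
```

For a host with 128 GiB of RAM ({{{MemTotal}}} of 134217728 kB), {{{shared_buffers_for 134217728}}} prints {{{32GB}}}, matching the example value shown earlier.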

<<Anchor(initialize-pg)>>
=== Initializing PostgreSQL ===
Next, initialize your PostgreSQL instance by creating the DRMS and SUMS databases, installing database-server languages, creating a schema, and creating a relation. To do so, become {{{<PostgreSQL superuser>}}}; all steps in this section must be performed by the superuser:
{{{
$ sudo su - postgres
}}}
Start the database instance for the cluster you created:
{{{
$ whoami
postgres
$ pg_ctl start -D <PostgreSQL cluster>
}}}
You previously created {{{<PostgreSQL cluster>}}}, which will most likely be {{{/var/lib/pgsql/netdrms}}}. Ensure that the configuration files you created work by connecting to the database server as {{{<PostgreSQL superuser>}}} with {{{psql}}} from {{{<PostgreSQL host>}}}:
{{{
$ whoami
<PostgreSQL superuser>
$ hostname
<PostgreSQL host>
$ psql -p <PostgreSQL port>
psql (12.1)
Type "help" for help.

postgres=#
}}}
You should not see any errors, and you should see the {{{postgres=#}}} superuser {{{psql}}} prompt. If no errors appear, exit {{{psql}}} ({{{\q}}}) and create the two databases:
{{{
$ whoami
postgres
# create the DRMS database
$ createdb -p <PostgreSQL port> --locale C -E UTF8 -T template0 netdrms
# create the SUMS database
$ createdb -p <PostgreSQL port> --locale C -E UTF8 -T template0 netdrms_sums
}}}
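To confirm that both databases exist with the intended encoding and locale, you can check the database listing that {{{psql -lqt}}} prints. A minimal sketch; the helper name is hypothetical, and the parser assumes the column order printed by {{{psql}}} 12 ({{{Name | Owner | Encoding | Collate | Ctype | Access privileges}}}). Later {{{psql}}} versions add or reorder locale columns, so adjust the field numbers if you use one.

```shell
# check_db NAME: read `psql -lqt` output on stdin and succeed if database NAME
# exists with encoding UTF8 and collation C (psql 12 column order assumed)
check_db() {
    awk -F'|' -v db="$1" '
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }   # trim fields
        $1 == db && $3 == "UTF8" && $4 == "C" { found = 1 }
        END { exit !found }'
}
```

Usage: {{{psql -p <PostgreSQL port> -lqt | check_db netdrms && echo ok}}}, then repeat for {{{netdrms_sums}}}.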
Install the required database-server languages:
{{{
$ whoami
postgres
# PostgreSQL <= 9.6: create the PL/pgSQL scripting language, if it is not already
# present (PostgreSQL 9.0 and later install it by default)
# PostgreSQL > 9.6: no need to create the PL/pgSQL scripting language
$ createlang -h <PostgreSQL host> -p <PostgreSQL port> plpgsql netdrms
# PostgreSQL <= 9.6: create the "trusted" and "untrusted" perl languages
$ createlang -h <PostgreSQL host> -p <PostgreSQL port> plperl netdrms
$ createlang -h <PostgreSQL host> -p <PostgreSQL port> plperlu netdrms
# PostgreSQL > 9.6 (createlang no longer exists): create the "trusted" and
# "untrusted" perl languages as extensions
$ psql -p <PostgreSQL port> netdrms
netdrms=# CREATE EXTENSION IF NOT EXISTS plperl;
netdrms=# CREATE EXTENSION IF NOT EXISTS plperlu;
netdrms=# \q
}}}
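Which of these commands applies depends on the server version. The sketch below prints (rather than runs) the language-setup commands for a given {{{server_version_num}}}, the integer PostgreSQL reports for {{{SHOW server_version_num}}} (for example {{{120001}}} for 12.1). The function name is hypothetical. It relies on two facts about PostgreSQL: PL/pgSQL has been installed by default since 9.0, and {{{createlang}}} was removed in PostgreSQL 10.

```shell
# lang_setup_cmds VERSION_NUM PORT: print (do not run) the commands that install
# the server languages NetDRMS needs into the netdrms database
lang_setup_cmds() {
    if [ "$1" -lt 90000 ]; then
        # PL/pgSQL is installed by default from PostgreSQL 9.0 onward
        echo "createlang -p $2 plpgsql netdrms"
    fi
    if [ "$1" -lt 100000 ]; then
        # createlang exists only up to PostgreSQL 9.6
        echo "createlang -p $2 plperl netdrms"
        echo "createlang -p $2 plperlu netdrms"
    else
        echo "psql -p $2 -d netdrms -c 'CREATE EXTENSION IF NOT EXISTS plperl;'"
        echo "psql -p $2 -d netdrms -c 'CREATE EXTENSION IF NOT EXISTS plperlu;'"
    fi
}
```

Usage: {{{lang_setup_cmds "$(psql -p <PostgreSQL port> -Atc 'SHOW server_version_num')" <PostgreSQL port>}}}; review the output before running it.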
The SUMS database does not use any language extensions so there is no need to create any for the SUMS database.

<<Anchor(install-cfitsio)>>
=== Installing CFITSIO ===
The base NetDRMS release requires CFITSIO, a {{{C}}} library used by NetDRMS to read and write FITS files. Visit [[https://heasarc.gsfc.nasa.gov/fitsio/]] to obtain the link to the CFITSIO source-code tarball. Create an installation directory {{{<CFITSIO install dir>}}} (such as {{{/opt/cfitsio-X.XX}}}, with a link from {{{/opt/cfitsio}}} to {{{<CFITSIO install dir>}}}), download the tarball, and extract the tarball into {{{<CFITSIO install dir>}}}:
{{{
$ sudo mkdir -p <CFITSIO install dir>
$ ctrl+d
}}}
{{{
$ curl -OL 'http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio-X.XX.tar.gz'
$ tar xvzf cfitsio-X.XX.tar.gz
$ cd cfitsio-X.XX
}}}
Please read the {{{README}}} file for complete installation instructions. As a quick start, run:
{{{
$ ./configure --prefix=<CFITSIO install dir>
# build the CFITSIO library
$ make
# install the libraries and binaries to <CFITSIO install dir>
$ sudo make install
# create the link from cfitsio to <CFITSIO install dir>
$ sudo su -
$ cd <CFITSIO install dir>/..
$ ln -s <CFITSIO install dir> cfitsio
}}}
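After {{{make install}}}, you can confirm that {{{<CFITSIO install dir>}}} contains what the {{{CFITSIO_INCS}}} ({{{<CFITSIO install dir>/include}}}) and {{{CFITSIO_LIBS}}} ({{{<CFITSIO install dir>/lib}}}) configuration parameters will later point to. A minimal sketch with a hypothetical helper name; it only checks for the {{{fitsio.h}}} header and at least one {{{libcfitsio}}} library file.

```shell
# check_cfitsio DIR: succeed if DIR looks like a completed CFITSIO installation
check_cfitsio() {
    if [ ! -f "$1/include/fitsio.h" ]; then
        echo "check_cfitsio: $1/include/fitsio.h not found" >&2
        return 1
    fi
    for lib in "$1"/lib/libcfitsio.*; do
        [ -e "$lib" ] && return 0   # a static or shared library is present
    done
    echo "check_cfitsio: no libcfitsio library under $1/lib" >&2
    return 1
}
```

Usage: {{{check_cfitsio <CFITSIO install dir> && echo ok}}}.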

CFITSIO depends on {{{libcurl}}}: building CFITSIO, and building any program that links to {{{cfitsio}}}, requires the {{{libcurl}}} development package ({{{libcurl-devel}}}) because {{{cfitsio}}} uses the {{{libcurl}}} API. We recommend using {{{yum}}} to install it if it is not already installed ({{{yum}}} also installs {{{libcurl}}} itself as a dependency, although {{{libcurl}}} is very likely already present):
{{{
$ sudo yum install libcurl-devel
}}}
<<Anchor(install-openssldev)>>
=== Installing OpenSSL Development Packages ===
NetDRMS requires the OpenSSL development API (headers and libraries). If it has not already been installed, do so now:
{{{
$ sudo yum install openssl-devel
}}}

<<Anchor(install-perl-dbdpg)>>
=== Installing DBD::Pg ===
One step in the installation process will require running a {{{perl}}} script that accesses the PostgreSQL database. In order for this to work, you will need to ensure the {{{DBD::Pg}}} module has been installed. To check for installation, run:
{{{
$ perl -MDBD::Pg -e 1
}}}
If the command exits silently, without an error about not being able to locate the module, then you are all set. If the module is not installed, and you are running the {{{perl}}} installed with the system, then run {{{yum}}} to identify the package:
{{{
$ yum list | grep -i 'dbd-pg'
...
perl-DBD-Pg.x86_64 2.19.3-4.el7 base
...
}}}

then bringing to bear all your powers of divination, choose the correct package and install it:
{{{
$ sudo yum install 'perl-DBD-Pg'
}}}

If you are using a non-system {{{perl}}}, use that distribution's installation method. If the distribution does not have that module, or its installer does not work, then as a final act of desperation use CPAN:
{{{
$ sudo perl -MCPAN -e 'install DBD::Pg'
}}}

<<Anchor(install-python3)>>
=== Installing Python3 ===
NetDRMS requires a number of {{{python}}} packages and modules that are not generally part of a system installation. In addition, many scripts require {{{python3}}}, not {{{python2}}}. The easiest way to satisfy these needs is to install a ''data-science''-oriented {{{python3}}} distribution, such as {{{Anaconda}}}. In that vein, install {{{Anaconda}}} into an appropriate installation directory such as {{{/opt/anaconda3}}}. To locate the {{{Linux}}} installer, visit [[https://docs.anaconda.com/anaconda/install/linux/]]:
{{{
$ curl -OL 'https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh'
$ sha256sum Anaconda3-2019.10-Linux-x86_64.sh
46d762284d252e51cd58a8ca6c8adc9da2eadc82c342927b2f66ed011d1d8b53 Anaconda3-2019.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2019.10-Linux-x86_64.sh
}}}
After some initial prompts, the installer will display
{{{
PREFIX=/home/<user>/anaconda3
}}}
This path is the default installation directory ({{{<user>}}} is the user running the installer). When the installer prompts for the installation location, replace this {{{PREFIX}}} path with {{{<Anaconda3 install dir>}}}.

<<Anchor(install-netdrms)>>
=== Installing NetDRMS ===
To install NetDRMS, you will need to: select an appropriate machine on which to install NetDRMS and an appropriate machine/hardware on which to host the SUMS service; create Linux users and groups; download the NetDRMS release tarball and extract the release source; initialize the Linux environment; create log directories; create the configuration file and run the configuration script; compile and install the executables; create the DRMS- and SUMS-database users/relations/functions/objects; initialize the SUMS storage hardware; and install the SUMS and Remote SUMS daemons.

The optimal hardware configuration will likely depend on your needs, but the following recommendations should suffice for most sites. DRMS and SUMS can share a single host machine. The most widely used and tested Linux distributions are Red Hat-family (Fedora-based) distributions, and at the time of this writing, CentOS is the most popular. Sites have also used openSUSE successfully, but if possible, we recommend CentOS.

SUMS requires a large amount of storage to hold the DRMS data-series data/image files. The amount needed can vary widely, and depends directly on the amount of data you wish to keep online at any given time. Most NetDRMS sites mirror some (but not all) JSOC SDO data; the more data mirrored, the more storage needed. To complicate matters, a site can also mirror only a subset of each data series' data: one site may wish to retain only the current month's data for many data series, while another may wish to retain all data for one or two series. To decide on the amount of storage needed, ask the JSOC how much data each series comprises and decide how much of that data you want to keep online. Data that goes offline can always be retrieved automatically from the JSOC again. Because new data arrive each day, also request from the JSOC an estimate of each series' rate of growth. We recommend doing a rough calculation based upon these considerations, then doubling the result and installing that amount of storage.
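The rough sizing rule above (estimate, then double) can be scripted. A minimal sketch, assuming you have collected from the JSOC, for each series, the amount of existing data you intend to keep online and its daily growth, both in GB; the function name and input format are hypothetical, and the planning horizon in days is your own choice.

```shell
# estimate_storage_gb DAYS: read lines "SERIES INITIAL_GB DAILY_GB" on stdin
# (lines starting with # are skipped) and print 2 x sum(INITIAL + DAILY x DAYS)
estimate_storage_gb() {
    awk -v days="$1" '
        NF >= 3 && $1 !~ /^#/ { total += $2 + $3 * days }
        END { printf "%d\n", 2 * total }'
}
```

For example, {{{printf 'series_a 1000 50\nseries_b 500 0\n' | estimate_storage_gb 365}}} prints {{{39500}}}; the series names and sizes are placeholders, not JSOC figures.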

Next, create a production Linux user {{{<NetDRMS production user>}}} (named {{{netdrms_production}}} by default):
{{{
$ sudo useradd <NetDRMS production user>
$ sudo passwd <NetDRMS production user>
Changing password for user <NetDRMS production user>.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
$
}}}
NOTE: ensure that {{{<NetDRMS production user>}}} is a valid PostgreSQL ''name'' because NetDRMS makes use of the PostgreSQL feature whereby attempts to connect to a database are made as the database user whose name matches the name of the Linux user connecting to the database. Please see [[https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]] for a description of valid PostgreSQL names.
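Because PostgreSQL folds unquoted identifiers to lower case, a Linux user name such as {{{NetDRMS}}} would not match a role created with an unquoted {{{CREATE ROLE NetDRMS}}}. A conservative check, sketched below with a hypothetical helper name, accepts only names that work unchanged as unquoted role names: 1 to 63 characters (PostgreSQL's default identifier limit), lowercase ASCII letters, digits, and underscores, not starting with a digit. PostgreSQL itself accepts a somewhat wider set of names.

```shell
# is_safe_pg_name NAME: conservative check that a Linux user name can be used,
# unchanged, as an unquoted PostgreSQL role name
is_safe_pg_name() {
    case $1 in
        ''|[0123456789]*|*[!abcdefghijklmnopqrstuvwxyz0123456789_]*) return 1 ;;
    esac
    [ ${#1} -le 63 ]
}
```

Usage: {{{is_safe_pg_name netdrms_production && echo ok}}}.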

As {{{<NetDRMS production user>}}}, you will be running various {{{python}}} components. As such, you should make sure that the {{{PYTHONPATH}}} environment variable is not already set, otherwise it might interfere with the running of {{{Anaconda python}}}. You might also need to modify your {{{PATH}}} environment variable to point to {{{Anaconda}}} and {{{PostgreSQL}}} executables. Modify your {{{.bashrc}}} to do so:
{{{
# add to <NetDRMS production user>'s .bashrc

# to ensure that <NetDRMS production user> uses Anaconda
unset PYTHONPATH

# python executables
export PATH=<Anaconda3 install dir>/bin:$PATH

# PostgreSQL executables
export PATH=<PostgreSQL install dir>/bin:$PATH
}}}

For the changes to {{{.bashrc}}} to take effect, either logout/login or source {{{.bashrc}}}.

NetDRMS requires additional {{{python}}} packages not included in the {{{Anaconda}}} distribution, but if you install {{{Anaconda}}}, then the number of additional packages you need to install is minimal. If you have a different {{{python}}} distribution, then you may need to install additional packages. To install new {{{Anaconda}}} packages, as {{{<NetDRMS production user>}}} first create a virtual environment for NetDRMS (named {{{netdrms}}}):
{{{
$ whoami
<NetDRMS production user>
$ conda create --name netdrms
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/netdrms_production/.conda/envs/netdrms

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate netdrms
#
# To deactivate an active environment, use
#
# $ conda deactivate
}}}

Then install the new {{{Anaconda}}} packages using {{{conda}}}:
{{{
$ whoami
<NetDRMS production user>
$ conda install -n netdrms psycopg2 psutil python-dateutil
Collecting package metadata (current_repodata.json): done
...
}}}

In order for {{{conda}}} to succeed when installing {{{psycopg2}}}, {{{PostgreSQL}}} must already be installed, and the {{{PostgreSQL}}} executables must be in the {{{PATH}}} environment variable (one step in the installation process runs {{{pg_config}}}). The changes to {{{.bashrc}}} described above should take care of setting the path; make sure you have either logged in again or sourced {{{.bashrc}}}.

Create the Linux group {{{<SUMS users>}}}, e.g., {{{sums_users}}}, to which all SUMS users belong, including {{{<NetDRMS production user>}}}. This group ensures that all SUMS users can create new data files in SUMS:
{{{
$ sudo groupadd <SUMS users>
}}}
Add {{{<NetDRMS production user>}}} to this group (later you will add each SUMS user - users who will read/write SUMS data files - to this group as well):
{{{
$ sudo usermod -a -G <SUMS users> <NetDRMS production user>
$ id <NetDRMS production user>
uid=1001(netdrms_production) gid=1001(netdrms_production) groups=1001(netdrms_production),1002(sums_users)
}}}

On {{{<NetDRMS host>}}}, select a path {{{<NetDRMS root>}}} for the root directory of the NetDRMS source tree installed by {{{<NetDRMS production user>}}}. A typical choice for {{{<NetDRMS root>}}} is {{{/opt/netdrms}}} or {{{/usr/local/netdrms}}}, and a typical strategy is to install the source tree into a directory that contains the release version in its name ({{{<NetDRMS install dir>}}}), e.g., {{{/opt/netdrms-9.3}}}. Then {{{<NetDRMS production user>}}} makes a link from {{{<NetDRMS root>}}} to {{{<NetDRMS install dir>}}}. This facilitates the maintenance of multiple releases: to switch between releases, {{{<NetDRMS production user>}}} simply updates the link to point to the desired release directory. Create {{{<NetDRMS install dir>}}} and make {{{<NetDRMS production user>}}} the owner:
{{{
$ sudo mkdir -p <NetDRMS install dir>
$ sudo chown <NetDRMS production user>:<NetDRMS production user> <NetDRMS install dir>
}}}

As {{{<NetDRMS production user>}}}, obtain a NetDRMS tarball from [[http://jsoc.stanford.edu/netdrms/dist/]] and extract it into a release-specific directory:
{{{
$ cd <NetDRMS install dir>
$ curl -OL 'http://jsoc.stanford.edu/netdrms/dist/netdrms_X.X.tar.gz'
$ tar xvzf netdrms_X.X.tar.gz
$ <ctrl-d>
}}}

Create the link from {{{<NetDRMS root>}}} to {{{<NetDRMS install dir>}}}:
{{{
$ cd <NetDRMS install dir>/..
$ sudo ln -s <NetDRMS install dir> netdrms
$
}}}
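Switching releases later is a matter of repointing that link. A minimal sketch with a hypothetical helper name: it uses {{{ln -sfn}}}, where {{{-n}}} (supported by GNU and BSD {{{ln}}}, though not required by POSIX) replaces an existing link to a directory instead of creating a new link ''inside'' the directory it points to. Run it as whichever user owns the parent directory of {{{<NetDRMS root>}}}.

```shell
# switch_release TARGET LINK: point LINK (e.g. <NetDRMS root>) at the release
# directory TARGET (e.g. <NetDRMS install dir>), replacing any existing link
switch_release() {
    if [ ! -d "$1" ]; then
        echo "switch_release: $1 is not a directory" >&2
        return 1
    fi
    ln -sfn "$1" "$2"
}
```

Usage: {{{switch_release <NetDRMS install dir> <NetDRMS root>}}}.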

As {{{<NetDRMS production user>}}}, set the two environment variables that are needed for proper NetDRMS operation. To do so, you'll need first to determine the appropriate {{{<architecture>}}} string for one of these variables:
{{{
$ whoami
<NetDRMS production user>
$ cd <NetDRMS root>
$ build/jsoc_machine.csh
<architecture>
}}}
It is best to set the following two environment variables in {{{<NetDRMS production user>}}}'s {{{.bashrc}}} file since they must always be set whenever any NetDRMS code is run:
{{{
# .bashrc
export JSOCROOT=<NetDRMS root>
export JSOC_MACHINE=<architecture>
}}}
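Because every NetDRMS program depends on these two variables, a quick sanity check after editing {{{.bashrc}}} can save debugging time later. A minimal sketch with a hypothetical helper name; it checks only that {{{JSOCROOT}}} names an existing directory and that {{{JSOC_MACHINE}}} is non-empty, not that the architecture string is correct.

```shell
# check_netdrms_env: succeed if JSOCROOT names an existing directory and
# JSOC_MACHINE is non-empty; report what is wrong otherwise
check_netdrms_env() {
    if [ -z "${JSOCROOT:-}" ] || [ ! -d "$JSOCROOT" ]; then
        echo "JSOCROOT is unset or not a directory: '${JSOCROOT:-}'" >&2
        return 1
    fi
    if [ -z "${JSOC_MACHINE:-}" ]; then
        echo "JSOC_MACHINE is unset; run \$JSOCROOT/build/jsoc_machine.csh" >&2
        return 1
    fi
}
```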

Make the SUMS log directory {{{<SUMS logs>}}} on the SUMS server machine. Various SUMS log files will be written to this directory. A suitable directory would reside in {{{<NetDRMS production user>}}}'s home directory, e.g., {{{$HOME/log/SUMS}}}:
{{{
$ whoami
<NetDRMS production user>
$ mkdir -p <SUMS logs>
}}}
Select appropriate {{{C}}} and {{{Fortran}}} compilers. The DRMS part of NetDRMS must be compiled with a {{{C}}} compiler. NetDRMS supports both the GNU {{{C}}} compiler ({{{gcc}}}) and the Intel {{{C/C++}}} compiler ({{{icc}}}). Certain JSOC-specific code requires {{{Fortran}}} compilation; for those projects, NetDRMS supports the GNU {{{Fortran}}} compiler ({{{gfortran}}}) and the Intel {{{Fortran}}} compiler ({{{ifort}}}). SUMS is implemented as a Python daemon, so it needs no compilation step. Both GNU and Intel compilers are widely used, so feel free to use either; by default, the Intel compilers are used. There are two methods for changing the compilers:
 * as the {{{<NetDRMS production user>}}}, you can set the following environment variables (we recommend doing so in {{{.bashrc}}}):
 {{{
# .bashrc

# set COMPILER to icc for the Intel C/C++ compiler, or to gcc for the GNU C compiler
export COMPILER=icc

# set FCOMPILER to ifort for the Intel Fortran compiler, or to gfortran for the GNU Fortran compiler
export FCOMPILER=ifort
 }}}
 * you can edit the {{{make_basic.mk}}} file in {{{<NetDRMS root>}}}; to select the compilers, edit the {{{COMPILER}}} and {{{FCOMPILER}}} make variables declared near the top of the file:
 {{{
# use Intel compilers
COMPILER = icc
FCOMPILER = ifort
# or, to use GNU compilers, replace the two lines above with:
# COMPILER = gcc
# FCOMPILER = gfortran
 }}}
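A sketch of the first method that picks the compiler family automatically, assuming you want the Intel compilers whenever both {{{icc}}} and {{{ifort}}} are on {{{PATH}}} and the GNU compilers otherwise. The function name is hypothetical, and it does not verify that {{{gcc}}} and {{{gfortran}}} are actually installed.

```shell
# choose_compilers: print .bashrc export lines, preferring the Intel compilers
# (the NetDRMS default) when both icc and ifort are on PATH, else the GNU ones
choose_compilers() {
    if command -v icc >/dev/null 2>&1 && command -v ifort >/dev/null 2>&1; then
        printf 'export COMPILER=icc\nexport FCOMPILER=ifort\n'
    else
        printf 'export COMPILER=gcc\nexport FCOMPILER=gfortran\n'
    fi
}
```

Usage, as {{{<NetDRMS production user>}}}: {{{choose_compilers >> ~/.bashrc}}}, then re-source {{{.bashrc}}}.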
Create the {{{<NetDRMS root>/config.local}}} configuration file, using {{{<NetDRMS root>/config.local.newstyle.template}}} as a template. This file contains a number of configuration parameters, along with detailed descriptions of what they control and suggested values for those parameters. The configuration script, {{{configure}}}, reads this file, and then creates one output file, {{{drmsparams.*}}}, in {{{<NetDRMS root>/localization}}} for each of several programming languages/tools ({{{C}}}, {{{GNU make}}}, {{{perl}}}, {{{python}}}, {{{bash}}}). In this manner, the parameters are directly readable by several languages/tools used by NetDRMS. Lines that start with whitespace or the hash symbol ({{{#}}}) are ignored.

Several sections compose {{{config.local}}}:
{{{
__STYLE__
new

__DEFS__
# these are NetDRMS-wide parameter values; the format is <quote code>:<parameter name><whitespace>+<parameter value>;
# the configuration script uses <quote code> to assist in creating language-specific parameters; <quote code> is one of:
# 'q' (enclose the parameter value in double quotes)
# 'p' (enclose the parameter value in parentheses)
# 'a' (do not modify the parameter value).

__MAKE__
# these are make variables used by the make system during compilation - they generally contain paths to third-party code
}}}
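To make the three quote codes concrete, the following sketch renders a single {{{__DEFS__}}} line as a {{{C}}} preprocessor definition. It is a hypothetical illustration of the quote codes only: the function name is invented, and it does not describe the files that {{{configure}}} actually generates.

```shell
# def_to_c LINE: render one __DEFS__ line "<quote code>:<name><whitespace><value>"
# as a C #define, applying the quote code (illustration only)
def_to_c() {
    case $1 in
        ''|[[:space:]]*|'#'*) return 0 ;;                  # ignored lines
        [qpa]:*) ;;
        *) echo "def_to_c: not a definition: $1" >&2; return 1 ;;
    esac
    code=${1%%:*}
    rest=${1#?:}
    name=${rest%%[[:space:]]*}
    value=${rest#"$name"}
    value=${value#"${value%%[![:space:]]*}"}               # trim leading whitespace
    case $code in
        q) printf '#define %s "%s"\n' "$name" "$value" ;;  # enclose in double quotes
        p) printf '#define %s (%s)\n' "$name" "$value" ;;  # enclose in parentheses
        a) printf '#define %s %s\n' "$name" "$value" ;;    # use the value as-is
    esac
}
```

For example, {{{def_to_c 'q:DBNAME netdrms'}}} prints {{{#define DBNAME "netdrms"}}}.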

Before creating {{{config.local}}}, please request from the JSOC a value for {{{DRMS_LOCAL_SITE_CODE}}}. This code uniquely identifies each NetDRMS installation. Each site requires one ID for each of its NetDRMS installations.
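Once you have a code, a quick check against the rules given for {{{DRMS_LOCAL_SITE_CODE}}} in the parameter list below (the value fits in 15 bits; values of {{{0x4000}}} or more denote development installations and need not be unique) can catch typos. A minimal sketch with a hypothetical helper name, which additionally assumes the code is written with a {{{0x}}} prefix.

```shell
# classify_site_code CODE: classify a 0x-prefixed hexadecimal DRMS_LOCAL_SITE_CODE
# as "unique" (< 0x4000, must be obtained from the JSOC), "development"
# (0x4000..0x7FFF), or "invalid" (malformed or wider than 15 bits)
classify_site_code() {
    case $1 in
        0[xX]*) hex=${1#0[xX]} ;;
        *) echo invalid; return 1 ;;
    esac
    case $hex in
        ''|*[!0123456789abcdefABCDEF]*) echo invalid; return 1 ;;
    esac
    if [ ${#hex} -gt 4 ] || [ $(( 0x$hex )) -gt 32767 ]; then   # 32767 = 0x7FFF
        echo invalid; return 1
    fi
    if [ $(( 0x$hex )) -ge 16384 ]; then                        # 16384 = 0x4000
        echo development
    else
        echo unique
    fi
}
```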

The {{{__DEFS__}}} section:
 * {{{BIN_PY3}}} - the path to the Python 3 {{{python}}} executable.
 * {{{DBNAME}}} - the name of the DRMS database: this is {{{netdrms}}}; this parameter exists in case you want to select a different name, but we don't recommend changing it.
 * {{{DRMSPGPORT}}} - the port that the DRMS database cluster instance is listening on: this is {{{<PostgreSQL port>}}}.
 * {{{DRMS_LOCAL_SITE_CODE}}} - a 15-bit hexadecimal string that globally and uniquely identifies the NetDRMS. Each NetDRMS requires a unique code for each installation. Values greater than or equal to {{{0x4000}}} denote a development installation and need not be unique. If you plan on generating data that will be distributed outside of your site, please obtain a unique value from the JSOC.
 * {{{DRMS_LOCK_DIR}}} - the directory to which the DRMS library writes various lock files.
 * {{{DRMS_LOG_DIR}}} - the directory to which the DRMS library writes various log files.
 * {{{EXPORT_LOG_DIR}}} - the directory to which export programs write logs.
 * {{{EXPORT_LOCK_DIR}}} - the directory to which export programs write lock files.
 * {{{EXPORT_HANDLE_DIR}}} - the directory to which export programs save handles.
 * {{{POSTGRES_ADMIN}}} - the Linux user that owns the PostgreSQL installation and processes: this is {{{<PostgreSQL superuser>}}}.
 * {{{JMD_IS_INSTALLED}}} - if set to 1, then the Java Mirroring Daemon alternative to Remote SUMS is used: this should be {{{0}}}.
 * {{{RS_BINPATH}}} - the NetDRMS binary path that contains the external programs needed by the Remote SUMS (e.g., {{{jsoc_fetch}}}, {{{vso_sum_alloc}}}, {{{vso_sum_put}}}).
 * {{{RS_DBHOST}}} - the name of the Remote SUMS database cluster host; this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed.
 * {{{RS_DBNAME}}} - the Remote SUMS database - this is {{{netdrms_remotesums}}}.
 * {{{RS_DBPORT}}} - the port that the Remote SUMS database cluster instance is listening on: this is {{{<PostgreSQL port>}}}
 * {{{RS_DBUSER}}} - the Linux user that runs Remote SUMS; this is also the database user who owns the Remote SUMS database objects: this is {{{<NetDRMS production user>}}}.
 * {{{RS_DLTIMEOUT}}} - the timeout, in seconds, for an SU to download. If the download time exceeds this value, then all requests waiting for the SU to download will fail.
 * {{{RS_LOCKFILE}}} - the (advisory) lockfile used by Remote SUMS to prevent multiple instances from running.
 * {{{RS_LOGDIR}}} - the directory in which remote-sums log files are written.
 * {{{RS_MAXTHREADS}}} - the maximum number of SUs that Remote SUMS can process simultaneously.
 * {{{RS_N_WORKERS}}} - the number of {{{scp}}} worker threads; at most, this many {{{scp}}} processes will run simultaneously.
 * {{{RS_REQTIMEOUT}}} - the timeout, in seconds, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.
 * {{{RS_REQUEST_TABLE}}} - the Remote SUMS database relation that contains Remote SUMS requests; this is {{{<Remote SUMS requests>}}}.
 * {{{RS_SCP_MAXPAYLOAD}}} - the maximum total payload, in MB, per download. As soon as the combined payload of SUs ready for download exceeds this value, the SUs are downloaded with a single {{{scp}}} process.
 * {{{RS_SCP_MAXSUS}}} - the maximum size of the SU download queue. As soon as this many SUs are ready for download, they are downloaded with a single {{{scp}}} process.
 * {{{RS_SCP_TIMEOUT}}} - if there are SUs ready for download, and no {{{scp}}} has fired off within this many seconds, then the SUs that are ready to download are downloaded with a single {{{scp}}} process.
 * {{{RS_SITE_INFO_URL}}} - the service at JSOC that is used by Remote SUMS to locate the NetDRMS site that owns SUMS storage units; this is {{{<Remote SUMS site URL>}}}.
 * {{{RS_TMPDIR}}} - the temporary directory into which SUs are downloaded. This should be on the same file system on which the SUMS partitions reside.
 * {{{SCRIPTS_EXPORT}}} - the path to the directory in the NetDRMS installation that contains the export scripts.
 * {{{SERVER}}} - the name of the DRMS database cluster host: this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed.
 * {{{SUMLOG_BASEDIR}}} - the path to the directory that contains various SUMS log files; this is {{{<SUMS logs>}}}.
 * {{{SUMPGPORT}}} - the port that the SUMS database cluster instance is listening on: this is {{{<PostgreSQL port>}}}, unless DRMS and SUMS reside in different clusters on the same host (something that is not recommended since a single PostgreSQL cluster requires a substantial amount of system resources).
 * {{{SUMS_DB_HOST}}} - the name of the SUMS database cluster host: this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed; NetDRMS allows for creating a second cluster for SUMS, but in general this will not be necessary unless extremely heavy usage requires separating the two clusters.
 * {{{SUMS_GROUP}}} - the name of the Linux group to which all SUMS Linux users belong: this is {{{<SUMS users>}}}.
 * {{{SUMS_MANAGER}}} - this is the SUMS database user who owns the SUMS database objects which are manipulated by Remote SUMS and SUMS itself; it should be the Linux user that runs SUMS and owns the SUMS storage directories; this is {{{<NetDRMS production user>}}}.
 * {{{SUMS_READONLY_DB_USER}}} - this is the SUMS database user who has read-only access to the SUMS database objects; it is used by the Remote SUMS client ({{{rsums-clientd.py}}}) to check for the presence of SUs before requesting they be downloaded.
 * {{{SUMS_TAPE_AVAILABLE}}} - if set to 1, then SUMS has a tape-archive system.
 * {{{SUMS_USEMTSUMS}}} - if set to 1, use the multi-threaded Python SUMS: this is {{{1}}}.
 * {{{SUMS_USEMTSUMS_ALL}}} - if set to 1, use the multi-threaded Python SUMS for all SUMS API methods: this is {{{1}}}.
 * {{{SUMSD_LISTENPORT}}} - the port that SUMS listens to for incoming requests.
 * {{{SUMSD_MAX_THREADS}}} - the maximum number of SUs that SUMS can process simultaneously.
 * {{{SUMSERVER}}} - the SUMS host machine; this is {{{<SUMS host>}}}.
 * {{{WEB_DBUSER}}} - the DRMS database user account that {{{cgi}}} programs access when they need to read from or write to database relations.
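The three {{{RS_SCP_*}}} parameters combine into one batching decision. The sketch below restates their documented triggers as a shell predicate; it is a hypothetical illustration of the rules described in the list, not code taken from Remote SUMS, and the counters passed to it are placeholders.

```shell
# should_fire_scp QUEUED PAYLOAD_MB IDLE_S MAX_SUS MAX_PAYLOAD_MB TIMEOUT_S
# Succeed if the SUs that are ready should now be downloaded with one scp process.
should_fire_scp() {
    [ "$1" -gt 0 ] || return 1          # nothing is ready to download
    [ "$1" -ge "$4" ] && return 0       # queue reached RS_SCP_MAXSUS
    [ "$2" -gt "$5" ] && return 0       # payload exceeded RS_SCP_MAXPAYLOAD
    [ "$3" -ge "$6" ] && return 0       # no scp within RS_SCP_TIMEOUT seconds
    return 1
}
```

For example, {{{should_fire_scp 3 2000 1 48 1024 60}}} succeeds because a 2000 MB payload exceeds a 1024 MB limit; none of these numbers are recommended settings.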

The {{{__MAKE__}}} section:
 * {{{CFITSIO_INCS}}} - the path to the installed CFITSIO header files: this is {{{<CFITSIO install dir>/include}}}.
 * {{{CFITSIO_LIB}}} - the name of the CFITSIO library (AKA {{{libcfitsio}}}): this is normally {{{cfitsio}}}.
 * {{{CFITSIO_LIBS}}} - the path to the installed CFITSIO library files: this is {{{<CFITSIO install dir>/lib}}}.
 * {{{POSTGRES_INCS}}} - the path to the installed PostgreSQL header files: this is {{{<PostgreSQL install dir>/include}}}.
 * {{{POSTGRES_LIB}}} - the name of the PostgreSQL C API library (AKA {{{libpq}}}): this is always {{{pq}}}.
 * {{{POSTGRES_LIBS}}} - the path to the installed PostgreSQL library files: this is {{{<PostgreSQL install dir>/lib}}}.

When installing NetDRMS updates, copy the existing {{{config.local}}} to the new {{{<NetDRMS install dir>}}} and edit the copy as needed, using the new {{{config.local.newstyle.template}}} to obtain information about parameters new to the newer release. Many of the parameter values have been determined during the previous steps of the installation process.
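To spot parameters that a newer template introduces, you can compare parameter names between your existing {{{config.local}}} and the new template. A minimal sketch with hypothetical helper names; it assumes the template lists active definitions in the same {{{<quote code>:<parameter name> <parameter value>}}} form used in {{{__DEFS__}}}, so commented-out entries and {{{__MAKE__}}} entries are not detected.

```shell
# param_names FILE: list parameter names from "<quote code>:<name> <value>" lines
param_names() {
    LC_ALL=C sed -n 's/^[qpa]:\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' "$1" | LC_ALL=C sort -u
}

# new_params OLD_CONFIG NEW_TEMPLATE: print names defined in the new template
# but missing from the existing config.local
new_params() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    param_names "$1" > "$old_list"
    param_names "$2" > "$new_list"
    LC_ALL=C comm -13 "$old_list" "$new_list"   # lines only in the new template
    rm -f "$old_list" "$new_list"
}
```

Usage: {{{new_params <old release dir>/config.local <NetDRMS install dir>/config.local.newstyle.template}}}, where {{{<old release dir>}}} is the previous release's install directory.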

Run the configuration script, {{{configure}}}, a csh shell script which is included in {{{<NetDRMS install dir>}}}.
 {{{
$ whoami
<NetDRMS production user>
$ cd <NetDRMS install dir>
$ ./configure
 }}}
{{{configure}}} reads {{{config.local}}} and uses the contained parameters to configure many of the NetDRMS features. It creates several directories in {{{<NetDRMS install dir>}}}:
 * {{{bin}}} - a directory that contains links to all executables in the DRMS code tree
 * {{{include}}} - a directory that contains links to all the header files in the DRMS code tree
 * {{{jsds}}} - a directory that contains links to all JSOC Series Definition (JSD) files
 * {{{lib}}} - a directory that contains links to all libraries in the DRMS code tree
 * {{{localization}}} - this directory contains project-specific make files and various files that contain the processed parameter information from config.local
 * {{{scripts}}} - a directory that contains links to all script files in the DRMS code tree
 * {{{_<architecture>}}} - a directory that contains all compiled files (object files, symbol files, binaries) in subdirectories that mirror the paths where the corresponding source code resides. For example, the source code for the {{{show_info}}} program resides in {{{<NetDRMS install dir>/base/util/apps/show_info.c}}}, and the {{{show_info}}} binary is placed in {{{<NetDRMS install dir>/_<architecture>/base/util/apps/show_info}}}. {{{<architecture>}}} was determined in a previous installation step. There are two possible values for {{{<architecture>}}}, {{{linux_x86_64}}} and {{{linux_avx}}} (for hosts that support Advanced Vector Extensions), so this directory is named either {{{_linux_x86_64}}} or {{{_linux_avx}}}.

Compile DRMS. To make the DRMS part of NetDRMS, run:
 {{{
$ whoami
<NetDRMS production user>
$ cd <NetDRMS install dir>
$ make
 }}}
As {{{<PostgreSQL superuser>}}}, create the DRMS database production user {{{<DRMS DB production user>}}}. Since PostgreSQL automatically attempts to use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user {{{<NetDRMS production user>}}} for database user {{{<DRMS DB production user>}}}:
{{{
$ whoami
<PostgreSQL superuser>
$ psql -p <PostgreSQL port> netdrms
netdrms=# CREATE ROLE <DRMS DB production user> WITH LOGIN;
netdrms=# \q
$
}}}
As {{{<PostgreSQL superuser>}}}, run {{{psql}}} to add a password for this new database user:
{{{
$ whoami
<PostgreSQL superuser>
$ psql -p <PostgreSQL port> netdrms
netdrms=# ALTER ROLE <DRMS DB production user> WITH PASSWORD '<new password>';
netdrms=# \q
$
}}}
As {{{<NetDRMS production user>}}}, create a {{{.pgpass}}} file. This file contains the PostgreSQL user account password, obviating the need to manually enter the database password each time a database connection attempt is made:
{{{
$ whoami
<NetDRMS production user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<DRMS DB production user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass
}}}
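The same entry can be written non-interactively. Each {{{.pgpass}}} line has the form {{{hostname:port:database:username:password}}}; any {{{:}}} or {{{\}}} inside a field must be escaped with a backslash, and libpq ignores the file if it is readable by group or others. A minimal sketch with hypothetical helper names; only the password is escaped here, and the other fields are assumed to contain neither character.

```shell
# pgpass_escape TEXT: backslash-escape ':' and '\' as the .pgpass format requires
pgpass_escape() {
    printf '%s' "$1" | sed 's/[\\:]/\\&/g'
}

# add_pgpass_entry FILE HOST PORT DATABASE USER PASSWORD: append the entry unless
# it is already present, keeping FILE at mode 0600
add_pgpass_entry() {
    entry="$2:$3:$4:$5:$(pgpass_escape "$6")"
    [ -e "$1" ] || (umask 077 && : > "$1")     # create without a readable window
    chmod 0600 "$1" || return 1
    grep -qxF -- "$entry" "$1" || printf '%s\n' "$entry" >> "$1"
}
```

Usage: {{{add_pgpass_entry "$HOME/.pgpass" <PostgreSQL host> '*' '*' <DRMS DB production user> '<new password>'}}}; quote the {{{*}}} wildcards so the shell does not expand them.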

As {{{<PostgreSQL superuser>}}}, run an {{{SQL}}} script, and a {{{perl}}} script (which executes several {{{SQL}}} scripts), both included in the NetDRMS installation, to create the {{{admin}}} and {{{drms}}} schemas and their relations, the {{{jsoc}}} and {{{sumsadmin}}} database users, data types, and functions:
{{{
$ whoami
<PostgreSQL superuser>
# use psql to execute SQL script
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U postgres -f <NetDRMS install dir>/base/drms/scripts/NetDRMS.sql netdrms
CREATE SCHEMA
GRANT
CREATE TABLE
CREATE TABLE
GRANT
GRANT
CREATE SCHEMA
GRANT
CREATE ROLE
CREATE ROLE
$ perl <NetDRMS install dir>/base/drms/scripts/createpgfuncs.pl netdrms
}}}
For more information about the purpose of these objects, read the comments in the {{{NetDRMS.sql}}} and {{{createpgfuncs.pl}}} files.

We recommend using the NetDRMS database user {{{<DRMS DB production user>}}} as the SUMS database user {{{<SUMS DB production user>}}}. However, feel free to create a new user if necessary. If the DRMS and SUMS databases reside in different clusters, then you ''will'' need to create the {{{<SUMS DB production user>}}}. Again, since PostgreSQL automatically attempts to use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user {{{<NetDRMS production user>}}} for database user {{{<SUMS DB production user>}}}. If you choose a {{{<SUMS DB production user>}}} that is not {{{<NetDRMS production user>}}}, then you will need to pass {{{<SUMS DB production user>}}} to both Remote SUMS and SUMS when starting them.
{{{
$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# CREATE ROLE <SUMS DB production user> WITH LOGIN;
netdrms_sums=# \q
$
}}}

In addition, you will need to create a SUMS database user that has read-only access to the SUMS database objects:
{{{
$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# CREATE ROLE <SUMS DB readonly user> WITH LOGIN;
netdrms_sums=# \q
}}}
where {{{<SUMS DB readonly user>}}} is the {{{config.local}}} parameter {{{SUMS_READONLY_DB_USER}}}. This database account is used by the Remote SUMS Client, a daemon used to manage the auto-download of SUs for subscriptions.

If you created a new SUMS DB production user and the SUMS and DRMS databases reside on the same database cluster, add a password for this user. ''Ensure'' that you use the same password that you used for {{{<DRMS DB production user>}}} - you will use the same Linux user when connecting to either database, so the same {{{.pgpass}}} file will be used for authentication. As {{{<PostgreSQL superuser>}}}, run {{{psql}}} to add a password for this new database user:
{{{
$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# ALTER ROLE <SUMS DB production user> WITH PASSWORD '<DRMS DB production user password>';
netdrms_sums=# \q
$
}}}

SUMS stores directory and file information in relations in the SUMS database. To create those relations and initialize tables, as {{{<NetDRMS production user>}}} run:
{{{
$ whoami
<NetDRMS production user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> -f <NetDRMS root>/base/sums/scripts/postgres/create_sums_tables.sql netdrms_sums
CREATE TABLE
CREATE INDEX
CREATE INDEX
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
GRANT
CREATE TABLE
CREATE INDEX
GRANT
CREATE SEQUENCE
GRANT
CREATE SEQUENCE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> netdrms_sums
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;
ALTER SEQUENCE
netdrms_sums=> \q
$
}}}
where {{{<min val>}}} is {{{<drms site code> << 48}}} and {{{<max val>}}} is {{{<min val> + 2^48 - 1}}}. {{{<drms site code>}}} is the value of the {{{DRMS_LOCAL_SITE_CODE}}} config.local parameter, and 2^48^ ({{{281474976710656}}}) is the number of distinct unsigned 48-bit values, so each site receives its own block of 2^48^ sequence values. For the JSOC (site code 0x0000), this {{{ALTER SEQUENCE}}} command looks like:
{{{
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 0 RESTART 0 MINVALUE 0 MAXVALUE 281474976710655;
}}}
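A short Python sketch can check the arithmetic for any site code. The 15-bit upper limit on the site code is an assumption. It comes only from PostgreSQL sequences being 64-bit signed integers, so {{{<max val>}}} must not exceed 2^63^-1. It is not a documented DRMS rule. The function names are hypothetical.

```python
from typing import Tuple

SUNUM_BLOCK = 1 << 48          # 2**48 = 281474976710656 values per site
MAX_SITE_CODE = (1 << 15) - 1  # keeps (site << 48) + 2**48 - 1 within a signed 64-bit bigint


def sunum_sequence_bounds(site_code: int) -> Tuple[int, int]:
    """Return (<min val>, <max val>) for sum_ds_index_seq."""
    if not 0 <= site_code <= MAX_SITE_CODE:
        raise ValueError(f"site code {site_code} out of range 0..{MAX_SITE_CODE}")
    min_val = site_code << 48
    max_val = min_val + SUNUM_BLOCK - 1
    return min_val, max_val


def alter_sequence_sql(site_code: int) -> str:
    """Build the ALTER SEQUENCE statement shown above for a given site code."""
    lo, hi = sunum_sequence_bounds(site_code)
    return (f"ALTER SEQUENCE sum_ds_index_seq START {lo} RESTART {lo} "
            f"MINVALUE {lo} MAXVALUE {hi};")


if __name__ == "__main__":
    # When the site code is written in hex, as in the JSOC example, convert it first.
    print(alter_sequence_sql(int("0x0000", 16)))
```

For site code 0, this prints the same statement as the JSOC example above.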

<<Anchor(initialize-sums-disk)>>
=== Initializing SUMS Storage ===
In addition to SUMS database relations, SUMS requires a file system on which SUMS maintains storage areas called ''SUMS partitions''. A SUMS partition is simply a directory that contains SUMS Storage Units (each of which is implemented as a subdirectory inside the SUMS partition). As {{{<NetDRMS production user>}}}, create one or more partitions now. Although we have had success making partitions as large as 60 TB, we recommend 40 TB partitions: if you plan on setting aside X TB of SUMS storage, then make approximately {{{N = X / 40}}} partitions of 40 TB each. The partitions can reside on a file server and be mounted onto all machines that will use NetDRMS, but the following example simply creates directories on a single file system on {{{<SUMS partition host>}}}:
{{{
$ whoami
<NetDRMS production user>
$ hostname
<SUMS partition host>
$ mkdir -p <SUMS root>/partition01
$ mkdir -p <SUMS root>/partition02
...
$ mkdir -p <SUMS root>/partitionN
$ chgrp -R <SUMS users> <SUMS root>
$ chmod -R g+w <SUMS root>
}}}

where {{{<SUMS root>}}} is something like {{{/opt/sums}}}. {{{<SUMS users>}}} is the Linux group that is allowed to write to SUMS. You created it in a previous step.
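This Python sketch computes N and the partition directory names for a given storage budget. It rounds N up, so the last partition may be smaller than 40 TB. The rounding and the zero-padded names are assumptions that mirror the example above, not NetDRMS requirements. The function name is hypothetical.

```python
import math
from pathlib import PurePosixPath
from typing import List

PARTITION_TB = 40  # recommended partition size from the text above


def partition_paths(sums_root: str, total_tb: float,
                    partition_tb: float = PARTITION_TB) -> List[str]:
    """Return <SUMS root>/partition01 ... partitionNN with N = ceil(total / size)."""
    if total_tb <= 0 or partition_tb <= 0:
        raise ValueError("sizes must be positive")
    count = math.ceil(total_tb / partition_tb)
    width = max(2, len(str(count)))  # partition01, ..., or partition001 once N >= 100
    root = PurePosixPath(sums_root)
    return [str(root / f"partition{i:0{width}d}") for i in range(1, count + 1)]


if __name__ == "__main__":
    # 100 TB of storage with 40 TB partitions -> 3 partitions.
    for path in partition_paths("/opt/sums", 100):
        print(f"mkdir -p {path}")
```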

Initialize the SUMS DB {{{sum_partn_avail}}} table with the names of these partitions. For each SUMS partition run the following:
{{{
$ whoami
<NetDRMS production user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> netdrms_sums
netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);
}}}
where {{{<SUMS partition path>}}} is the full path of the partition as seen from {{{<SUMS host>}}} (the host on which the SUMS daemon will run), and {{{<avail bytes>}}} is some number no larger than the number of bytes available on the partition (multiply the number of available blocks by the number of bytes per block). The exact number does not matter, as long as it is not bigger than the total number of bytes available; SUMS will adjust this number as needed.
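The blocks-times-block-size calculation can be scripted with Python's {{{os.statvfs}}}. The sketch below also builds the {{{INSERT}}} statement. Two points to note:
 * {{{os.statvfs}}} reports the whole file system containing the path. If several partitions share one file system, as in the single-file-system example above, divide the figure among them.
 * The 0.9 safety factor is an arbitrary assumption, used only to keep the value below the true free space.
The function names are hypothetical.

```python
import os


def available_bytes(partition_path: str, fraction: float = 0.9) -> int:
    """Bytes available to unprivileged users on the file system holding the path,
    scaled down by `fraction` so the value stays below the true free space."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    st = os.statvfs(partition_path)
    # f_bavail = free blocks available to non-root users; f_frsize = block size in bytes.
    return int(st.f_bavail * st.f_frsize * fraction)


def partn_insert_sql(partition_path: str, avail: int) -> str:
    """Build the sum_partn_avail INSERT shown above for one partition."""
    quoted = partition_path.replace("'", "''")  # SQL string-literal escaping
    return ("INSERT INTO sum_partn_avail "
            "(partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) "
            f"VALUES ('{quoted}', {avail}, {avail}, 0, 0);")


if __name__ == "__main__":
    path = os.getcwd()  # substitute a real <SUMS partition path>
    print(partn_insert_sql(path, available_bytes(path)))
```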

<<Anchor(create-users)>>
=== Creating DRMS User Accounts ===
For each Linux user, {{{<new DRMS user>}}} who will run installed DRMS modules, or who will write new DRMS modules, you will need to set up their environment, create their ''DRMS account'', and create their {{{.pgpass}}} file so they can run DRMS modules without having to manually authenticate to the database. Just like you did for the {{{<NetDRMS production user>}}}, you will need to set the {{{JSOCROOT}}} and {{{JSOC_MACHINE}}} environment variables by editing their {{{.bashrc}}} files:
{{{
# .bashrc
export JSOCROOT=<NetDRMS root>
export JSOC_MACHINE=<architecture>
}}}

You will also need to add them to the {{{<SUMS users>}}} group:
{{{
$ sudo usermod -a -G <SUMS users> <new DRMS user>
}}}

To create a DRMS account, you create a database account for the user, plus you add user-specific rows to various DRMS database tables. The script {{{newdrmsuser.pl}}} exists to facilitate these tasks:
{{{
$ perl <NetDRMS root>/base/drms/scripts/newdrmsuser.pl netdrms <PostgreSQL host> <PostgreSQL port> <new DRMS user> <initial password> <new DB user namespace> user 1
}}}
where {{{<new DB user namespace>}}} is the PostgreSQL namespace dedicated to the new user. A namespace is a logical container that allows a database user to own database objects, like relations, that have the same names as objects owned by other users; items in a namespace need only be uniquely named within the namespace, not across namespaces. For example, the relation {{{drms_series}}} in the namespace {{{su_arta}}} is not the same relation as the {{{drms_series}}} relation in the {{{su_phil}}} namespace; the relations have the same name, but they are different relations. In virtually all PostgreSQL operations, a user can prefix the name of a relation with the namespace: {{{su_arta.drms_series}}} refers to the first relation, and {{{su_phil.drms_series}}} refers to the second.

The purpose of {{{<new DB user namespace>}}} is to hold non-production, private data series; it is a private user space for developing new DRMS modules that create data. If those data should become production-level products, then the data and the code that generates them need to be moved to a production namespace. At the JSOC, we have several such production namespaces (e.g., {{{aia}}}, {{{hmi}}}, {{{mdi}}}). A site creates production namespaces with a different module ({{{masterlists}}}); {{{newdrmsuser.pl}}} is only for creating non-production namespaces.

Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this page]] for assistance with choosing {{{<new DB user namespace>}}}. The general naming convention is to prefix the namespace with an abbreviation that identifies the site that owns the data in the namespace. For example, all private data created at Stanford reside in data series whose namespaces start with {{{su_}}} (Stanford University), regardless of the affiliation of the user who creates data in the namespace. Namespaces for data created at NASA Ames start with {{{nas_}}} (NASA Supercomputing Division). Following the underscore is a string that identifies a particular user: {{{su_arta}}} for Art, and {{{su_phil}}} for Phil. You can also use the suffix to identify a group (e.g., {{{su_uscsolar}}} for a solar group at the University of Southern California that creates data at Stanford). {{{<initial password>}}} is the initial password for this account; it does not matter much, since you are going to have the user change it next.
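A small Python helper can compose a private namespace name from a site prefix and a user or group string. The pattern it enforces is deliberately stricter than PostgreSQL requires: lowercase, letters, digits, and underscores only. This pattern is an assumption based on the examples above, not a rule taken from {{{newdrmsuser.pl}}}. The 63-byte limit is PostgreSQL's default identifier length. The function name is hypothetical.

```python
import re

# Stricter than PostgreSQL requires: lowercase letters, digits, and underscores,
# starting with a letter, with an '_' right after the site prefix.
_NAMESPACE_RE = re.compile(r"^[a-z][a-z0-9]*_[a-z0-9_]+$")
MAX_IDENTIFIER_BYTES = 63  # PostgreSQL's default identifier limit (NAMEDATALEN - 1)


def private_namespace(site_prefix: str, owner: str) -> str:
    """Compose a private namespace such as su_arta from ('su', 'arta')."""
    ns = f"{site_prefix}_{owner}".lower()
    if len(ns.encode("utf-8")) > MAX_IDENTIFIER_BYTES:
        raise ValueError(f"{ns!r} is longer than {MAX_IDENTIFIER_BYTES} bytes")
    if not _NAMESPACE_RE.match(ns):
        raise ValueError(f"{ns!r} does not look like <site>_<user or group>")
    return ns


if __name__ == "__main__":
    print(private_namespace("su", "uscsolar"))
```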

Running {{{newdrmsuser.pl}}} will create a new DRMS database user that has the same name as the user's Linux account name.

Have the user change their password:
{{{
$ whoami
<new DRMS user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms
netdrms=> ALTER USER <new DRMS user> WITH PASSWORD '<new password>';
netdrms=> \q
$
}}}

And then have the user create their {{{.pgpass}}} file (to allow auto-login to their database account) and set permissions to {{{0600}}}:
{{{
$ whoami
<new DRMS user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<new DRMS user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass
}}}

Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for additional information on the {{{.pgpass}}} file.
 
If you plan on creating data that will be publicly distributed, you should also create one or more data-production users. For example, if you plan on making a new public HMI data series, you could create a user named {{{hmi_production}}}. Although you could follow the previous steps to create a new Linux account for this database user, you do not need to. Instead, you can use the existing {{{<NetDRMS production user>}}} and have it connect as the {{{hmi_production}}} user. To do that, first create the new {{{hmi_production}}} database user by running {{{newdrmsuser.pl}}} as just described. Choose a descriptive namespace that follows the naming guidelines described above, like {{{hmi}}}. Because a {{{.pgpass}}} file already exists for {{{<NetDRMS production user>}}}, you want to '''''ADD''''' a new line to {{{.pgpass}}} for this user. Continuing with the {{{hmi_production}}} example, add a password line for {{{hmi_production}}}:
{{{
# .pgpass
<PostgreSQL host>:*:*:hmi_production:<hmi_production password>
}}}
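Because {{{.pgpass}}} already contains other entries, a script that adds the {{{hmi_production}}} line should append to the file rather than overwrite it. This Python sketch appends an entry only if no line already exists for the same host, port, database, and user. It also keeps the file at mode 0600. libpq uses the first matching line, so duplicating an entry would have no effect. The function name is hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path


def _escape(field: str) -> str:
    # ':' and '\' inside a .pgpass field must be backslash-escaped.
    return field.replace("\\", "\\\\").replace(":", "\\:")


def add_pgpass_entry(path: Path, host: str, user: str, password: str,
                     port: str = "*", database: str = "*") -> bool:
    """Append a line for `user` unless one for the same host/port/database/user exists.

    Returns True if a line was added, False if an entry was already present.
    """
    prefix = ":".join([_escape(host), port, database, _escape(user)]) + ":"
    text = path.read_text() if path.exists() else ""
    if any(line.startswith(prefix) for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + prefix + _escape(password) + "\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # libpq ignores group/world-accessible files
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pgpass = Path(tmp) / ".pgpass"
        add_pgpass_entry(pgpass, "db.example.org", "production", "s3cret")
        add_pgpass_entry(pgpass, "db.example.org", "hmi_production", "other")
        print(pgpass.read_text(), end="")
```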


=== Set Up the SUMS database ===
 1. Although the SUMS data cluster and SUMS database have been already created, you must create certain tables and users in this newly created database.
   a. Create the production user in the SUMS database:<<BR>><<BR>>{{{% psql -h <db server host> -p 5434 data_sums -U postgres}}}<<BR>>{{{data_sums=# CREATE USER <db production user> PASSWORD '<password>';}}}
   a. Create a read-only user in the SUMS database (so users can read the SUMS DB tables):<<BR>><<BR>>{{{% psql -h <db server host> -p 5434 data_sums -U postgres}}}<<BR>>{{{data_sums=# CREATE USER readonlyuser PASSWORD '<password>';}}}<<BR>>{{{data_sums=# GRANT CONNECT ON DATABASE data_sums TO readonlyuser;}}}
   a. Put the DRMS production db user into the sumsadmin group:<<BR>><<BR>>{{{% psql -h <db server host> -p 5432 data -U postgres}}}<<BR>>{{{data=# GRANT sumsadmin TO <db production user>;}}}<<BR>><<BR>>sum_rm, when run properly by the linux production user, will attempt to connect to the DRMS database as <db production user>. By putting it into the sumsadmin DB user group, we are giving sum_rm the ability to delete any record in any DRMS data-series record table. This permission is required for the archive == -1 implementation; this is the feature that causes SUMS to delete DRMS records from series whose archive flag is -1 when the DRMS records' SUs are deleted.
   a. Put the production user's password into the .pgpass file. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file.
   
  




    



<<Anchor(run-sums)>>
=== Running SUMS Services ===
Before you can use NetDRMS, you will need to start SUMS as the <NetDRMS production user>. To launch the SUMS daemon, sumsd.py, use the {{{start-mt-sums.py}}} script:
{{{
$ ssh <NetDRMS production user>@<SUMS host>
$ sudo python3 start-mt-sums.py daemon=<path>/sumsd.py ports=6102 --instancesfile=sumsd-instances.txt --logfile=sumsd-6102-20190627.txt --loglevel=info
}}}

The complete usage is:
{{{
usage: start-mt-sums.py daemon=<path to daemon> ports=<listening ports> [ --instancesfile=<instances file path> ] [ --loglevel=<critical, error, warning, info, or debug>] [ --logfile=<file name> ] [ --quiet ]

optional arguments:
  -h, --help show this help message and exit
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -l LOGLEVEL, --loglevel LOGLEVEL
                        specifies the amount of logging to perform; in order
                        of increasing verbosity: critical, error, warning,
                        info, debug
  -L <file name>, --logfile <file name>
                        the file to which sumsd logging is written
  -q, --quiet do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to launch
  p <listening ports>, ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be spawned
}}}

{{{start-mt-sums.py}}} will fork one or more {{{sumsd.py}}} daemon processes. The {{{ports}}} argument identifies the SUMS host ports on which sumsd.py will listen for client (DRMS module) requests. One sumsd.py process will be invoked per port specified. The instances file and log file reside in the path identified by the SUMLOG_BASEDIR config.local parameter. The instances file is used to track the running instances of {{{sumsd.py}}} and is used by {{{stop-mt-sums.py}}} to identify running daemons.
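Because one {{{sumsd.py}}} process is started per listed port, a wrapper script might validate the {{{ports}}} value before calling {{{start-mt-sums.py}}}. This Python sketch is not taken from {{{start-mt-sums.py}}}. It is a hypothetical illustration, and rejecting duplicate ports is an assumption about what a careful wrapper would do.

```python
from typing import List


def parse_ports(spec: str) -> List[int]:
    """Parse a ports=... value such as '6102,6103' into a list of TCP ports.

    One sumsd.py instance is started per port, so a duplicate port is
    treated as an error rather than a request for two listeners.
    """
    ports: List[int] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        port = int(token)  # raises ValueError for non-numeric tokens
        if not 1 <= port <= 65535:
            raise ValueError(f"port {port} out of range")
        if port in ports:
            raise ValueError(f"port {port} listed twice")
        ports.append(port)
    if not ports:
        raise ValueError("no ports given")
    return ports


if __name__ == "__main__":
    for port in parse_ports("6102,6103"):
        print(f"one sumsd.py instance would listen on port {port}")
```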

To stop one or more SUMS services, use the {{{stop-mt-sums.py}}} script:
{{{
$ ssh <production user>@<SUMS host>
$ sudo python3 stop-mt-sums.py daemon=<path>/sumsd.py --ports=6102 --instancesfile=sumsd-instances.txt
}}}

The complete usage is:
{{{
usage: stop-mt-sums.py [ -h ] daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --quiet ]

optional arguments:
  -h, --help show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be stopped
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -q, --quiet do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to halt
}}}

<<Anchor(register-subscriptions)>>
=== Registering for Subscriptions ===
Remote SUMS connects to the SUMS database as the database user named by the {{{config.local}}} parameter {{{SUM_MANAGER}}}.

A NetDRMS site can optionally register for a data-series subscription to any NetDRMS site that offers such a service. The JSOC NetDRMS offers subscriptions, but at the time of this writing, no other site does. Once a site registers for a data series subscription, the site will become a mirror for that data series. The subscription process ensures that the mirroring site will receive regular updates made to the data series by the serving site. The subscribing site can configure the interval between updates such that the mirror can synchronize with the server and receive updates within a couple of minutes, keeping the mirror up-to-date in (almost) real time.

To register for a subscription, run the {{{subscribe.py}}} script (included in the base NetDRMS installation). This script makes subscription requests to the serving site's subscription manager. The process entails the creation of a snapshot of the data series at the serving site. Those data are downloaded, via HTTP, to the subscribing site, where {{{subscribe.py}}} ingests them. Ingestion creates the DRMS database objects that maintain and store the data series. At this time, no SUMS data files are downloaded. Instead, and optionally, the IDs of the series' SUMS Storage Units (SUs) are saved in a database relation. Other NetDRMS daemons can use this relation to download and ingest the SUs into the subscriber's SUMS automatically. The Remote SUMS Client, {{{rsums-clientd.py}}}, manages this list of SUs and makes SU-download requests to another client-side daemon, Remote SUMS ({{{rsumsd.py}}}). {{{rsumsd.py}}} accepts SU requests from {{{rsums-clientd.py}}} and downloads the SUs via {{{scp}}}; each {{{scp}}} instance downloads multiple SUs.
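The idea of downloading several SUs per {{{scp}}} call amounts to grouping a request list into batches. This document does not describe how {{{rsumsd.py}}} actually groups SUs. The following Python sketch only illustrates the idea with a hypothetical {{{batch_sunums}}} function. The batch size is an arbitrary parameter.

```python
from typing import Iterable, Iterator, List


def batch_sunums(sunums: Iterable[int], per_scp: int) -> Iterator[List[int]]:
    """Yield de-duplicated SUNUMs in groups of at most `per_scp`, one group per scp call."""
    if per_scp < 1:
        raise ValueError("per_scp must be at least 1")
    seen = set()
    batch: List[int] = []
    for sunum in sunums:
        if sunum in seen:  # skip SUNUMs that are already queued
            continue
        seen.add(sunum)
        batch.append(sunum)
        if len(batch) == per_scp:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    for group in batch_sunums([101, 102, 102, 103, 104, 105], per_scp=2):
        print("one scp call for SUNUMs", group)
```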

The automatic download of data-series SUs is optional; SUs can also be downloaded on demand. In fact, if the subscribing NetDRMS site were to download an SU automatically and then delete it (there is a method to do this, described later), an on-demand download would be the only way to re-fetch the deleted SU. On-demand downloads happen automatically: any DRMS module that attempts to access an SU that is not present for any reason (for example, with a {{{show_info -P}}} command) will trigger an {{{rsumsd.py}}} request. The module pauses until the SU has been downloaded, then automatically resumes its operation on the previously missing SU.

As {{{rsumsd.py}}} uses {{{scp}}} to download SUs automatically, SSH public-private keys must be created at the subscribing site, and the public key must be provided to the serving site. Setting this up requires coordinated work at both the subscribing and serving sites:
 1. On the subscribing site, run
{{{
$ sudo su - <production user>
$ ssh-keygen -t rsa
}}}
{{{ssh-keygen}}} prompts for an optional passphrase for the key. If you choose one, save it for later steps. By default, {{{ssh-keygen}}} writes the private key to {{{.ssh/id_rsa}}} and the public key to {{{.ssh/id_rsa.pub}}} in the home directory of <production user>.
 1. Provide {{{id_rsa.pub}}} to the serving site.
 1. The serving site must then add the public key to its list of authorized keys. If the {{{.ssh}}} directory does not exist, then the serving site must first create this directory and give it {{{0700}}} permissions. If the {{{authorized_keys}}} file in {{{.ssh}}} does not exist, then it must first be created and given {{{0644}}} permissions:
{{{
$ sudo su - <subscription production user>
$ mkdir .ssh
$ chmod 0700 .ssh
$ cd .ssh
$ touch authorized_keys
$ chmod 0644 authorized_keys
}}}
Once the {{{.ssh}}} and {{{authorized_keys}}} files exist and have the proper permissions, the serving site administrator can then add the client site's public key to its list of authorized keys:
{{{
$ sudo su - <subscription production user>
$ cd <subscription production user home directory>/.ssh
$ cat id_rsa.pub >> authorized_keys
}}}
 1. If an SSH passphrase was chosen in step 1, then back at the client site, <production user> must start an {{{ssh-agent}}} instance to automate the passphrase authentication. If no passphrase was provided in step 1, this step can be skipped. Otherwise, run (assuming bash syntax - read the man page for csh syntax):
{{{
$ sudo su - <production user>
$ ssh-agent > ~/.ssh-agent
$ source ~/.ssh-agent # needed for ssh-add, and also for rsumsd.py and get_slony_logs.pl
$ ssh-add ~/.ssh/id_rsa
}}}
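Most failed {{{scp}}} logins in this setup come from wrong permissions on the serving account's {{{.ssh}}} directory or {{{authorized_keys}}} file. This Python sketch checks for the modes described in the steps above: 0700 for {{{.ssh}}}, and no group or other write access on {{{authorized_keys}}}. It relies on sshd's {{{StrictModes}}} behavior, which is the OpenSSH default. The function name is hypothetical.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import List


def ssh_permission_problems(home: Path) -> List[str]:
    """List departures from the .ssh (0700) and authorized_keys (0644) setup above."""
    ssh_dir = home / ".ssh"
    keys = ssh_dir / "authorized_keys"
    if not ssh_dir.is_dir():
        return [f"{ssh_dir} does not exist"]
    problems: List[str] = []
    dir_mode = stat.S_IMODE(ssh_dir.stat().st_mode)
    if dir_mode != 0o700:
        problems.append(f"{ssh_dir} has mode {dir_mode:04o}, expected 0700")
    if not keys.is_file():
        problems.append(f"{keys} does not exist")
    else:
        key_mode = stat.S_IMODE(keys.stat().st_mode)
        # sshd's StrictModes check rejects key files writable by group or others.
        if key_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{keys} has mode {key_mode:04o}; remove group/other write access")
    return problems


if __name__ == "__main__":
    # Scratch home laid out as in the steps above; pass Path.home() for a real check.
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        (home / ".ssh").mkdir()
        os.chmod(home / ".ssh", 0o700)
        (home / ".ssh" / "authorized_keys").touch()
        os.chmod(home / ".ssh" / "authorized_keys", 0o644)
        print(ssh_permission_problems(home) or "no problems found")
```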

To keep ingested data series synchronized with changes made to them at the serving site, a client-side cron job runs periodically. It runs {{{get_slony_logs.pl}}}, a perl script that uses {{{scp}}} to download "slony log files" - SQL files that insert, delete, or update database relation rows. {{{get_slony_logs.pl}}} communicates with the Slony-I replication software running at the serving site; Slony-I generates these log (SQL) files at the server, and the client then downloads them.

To register for a subscription to a new series, run:
{{{

}}}

You may find that a subscription has gotten out of sync, for various reasons, with the serving site's data series (accidental deletion of database rows, for example). {{{subscribe.py}}} can be used to alleviate this problem. Run the following to re-do the subscription registration:
{{{

}}}

Finally, there might come a time when you no longer wish to keep a subscription. To remove the subscription from your set of registered data series, run:
{{{

}}}


For example, the JSOC maintains time-distance analysis code that is part of the JSOC DRMS code tree but not part of the base NetDRMS package provided to remote sites. A NetDRMS site can install such project code by modifying a configuration file ({{{config.local}}}); this may require the installation of third-party software, such as math libraries and MPI.

=== Performing a Test Run ===
At this point, it is a good idea to test your installation. Although you have no DRMS/SUMS data yet, running {{{show_series}}} is a good way to test various components, like authentication and the database connection. To test SUMS, however, you will need at least one DRMS data series that has SUMS data. You can obtain such a data series by using the subscription system.

Test DRMS by running the show_series command:
{{{
$ show_series
}}}

If you see no errors, then life is good.

After you have at least one data series, you can do more thorough testing. For example, you can run:

{{{
$ show_info -j <DRMS data series>
}}}

To test SUMS (once you have some data files in your NetDRMS), you can run:

{{{
$ show_info -P <DRMS record-set specification>
}}}
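To repeat these checks after each upgrade, you could collect them in a small Python script. The {{{run_checks}}} helper below is hypothetical. It only reports missing programs and non-zero exit statuses. It does not verify that the output is correct.

```python
import shutil
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_checks(commands: Sequence[Sequence[str]],
               timeout: float = 300.0) -> List[Tuple[str, int]]:
    """Run each command; return (command, exit status) for every failure.

    An exit status of -1 means the program was not found on PATH.
    """
    failures: List[Tuple[str, int]] = []
    for cmd in commands:
        label = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            failures.append((label, -1))
            continue
        result = subprocess.run(list(cmd), capture_output=True, text=True,
                                timeout=timeout)
        if result.returncode != 0:
            failures.append((label, result.returncode))
    return failures


if __name__ == "__main__":
    checks = [["show_series"]]
    # Once a series exists, add e.g. ["show_info", "-j", "<DRMS data series>"].
    for label, status in run_checks(checks):
        print(f"FAILED ({status}): {label}", file=sys.stderr)
```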

To update to a newer NetDRMS release, simply create a new directory to contain the build, copy the previous config.local into the new <JSOC root> and edit it if new parameters have been added to config.local, and follow the directions for compiling DRMS. Any previous-release daemons that were running will need to be shut down, and the daemons in the newer release started.


=== Deciding what's next ===
You may wish to run a JMD or use Remote SUMS. The decision should be discussed with JSOC personnel. Once you've made this decision and installed the appropriate software (see below for Remote SUMS), you'll need to populate your DRMS database with data. For this, you'll need to be a recipient of Slony subscription data. We recommend contacting the JSOC directly to become a subscriber.

== Remote SUMS ==
A local NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units (SUs) associated with these data series too. There are currently two methods to facilitate these SU downloads. The Java Mirroring Daemon (JMD) is a tool that can be installed and configured to download SUs automatically as the series data records are ingested into the local NetDRMS. It fetches these SUs before they are actually used. It can obtain the SUs from any other NetDRMS that has the SUs, not just the NetDRMS that originally produced them. Remote SUMS is a built-in tool that comes with NetDRMS. It downloads SUs as needed - i.e., if a module or program requests the path to the SU or attempts to read it, and it is not present in the local SUMS yet, Remote SUMS will download the SUs. While the SUs are being downloaded, the initiating module or program will poll waiting for the download to complete.

Several components compose Remote SUMS. On the client side (the local NetDRMS), a daemon ({{{rsumsd.py}}}) must be running, and some database tables, as well as some binaries used by the daemon, must exist. On the server side (every NetDRMS site that wishes to act as a source of SUs for the client) is a CGI ({{{rs.sh}}}). In response to a request containing a list of SUNUMs, this CGI returns file-server information (hostname, port, user, SU paths, etc.) for the SUs that the server has available. When a client module encounters requests for remote SUs that are not contained in the local SUMS, it asks the daemon to download those SUs, and then polls, waiting for the request to be serviced. The daemon in turn sends requests to the {{{rs.sh}}} CGIs at all the relevant providing sites. The owning sites return the file-server information to the daemon; the daemon then downloads the requested SUs via {{{scp}}} and notifies the client module once the SUs are available for use. The client module then exits from its polling code and continues, using the freshly downloaded SUs.
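The client-side polling described above can be sketched in Python. In NetDRMS the client checks its request record in the database. Here a hypothetical {{{check_status}}} callable stands in for that query. The status strings {{{"complete"}}} and {{{"error"}}} are placeholders, not the values DRMS actually stores. The timeout is expressed in minutes, like the {{{RS_DLTIMEOUT}}} config.local parameter.

```python
import time
from typing import Callable

DONE_STATES = ("complete", "error")  # hypothetical status values


def wait_for_download(check_status: Callable[[], str], timeout_minutes: float,
                      poll_seconds: float = 1.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `check_status` until it reports a final state or the timeout expires.

    Returns the final status, or "timeout" once `timeout_minutes` has elapsed.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_minutes * 60.0
    while True:
        status = check_status()
        if status in DONE_STATES:
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(poll_seconds)


if __name__ == "__main__":
    replies = iter(["pending", "pending", "complete"])
    print(wait_for_download(lambda: next(replies), timeout_minutes=1, poll_seconds=0.01))
```

Using a monotonic clock keeps the timeout correct even if the system time is adjusted while a module is waiting.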

To use Remote SUMS, the config.local configuration file must first be configured properly, and NetDRMS must be re-built. Here are the relevant config.local parameters:
 * JMD_IS_INSTALLED - This must be set to 0 for Remote SUMS use. Currently, either the JMD or the Remote SUMS features can be used, but not both at the same time.
 * RS_REQUEST_TABLE - This is the database table used by the local module and the rsumsd.py daemon running at the local site for communicating SU-download requests. Upon encountering a non-native SUNUM, DRMS will insert a new record into this table to initiate a request for the SUNUM from the owning NetDRMS. The Remote SUMS daemon will service the request and update this record with results.
 * RS_SU_TABLE - This is the database table used by the Remote SUMS daemon to track SUs downloaded from the providing sites.
 * RS_DBHOST - This is the local database-server host that contains the database that contains the requests and SU tables.
 * RS_DBNAME - This is the database on the host that contains the requests and SU tables.
 * RS_DBPORT - This is the port on which the local database-server host accepts connections.
 * RS_DBUSER - This is the database user account that the Remote SUMS daemon uses to manage the Remote SUMS requests.
 * RS_LOCKFILE - This is the path to a file that ensures that only one Remote SUMS daemon instance runs.
 * RS_LOGDIR - This is the directory into which the Remote SUMS daemon logs are written.
 * RS_REQTIMEOUT - This is the timeout, in minutes, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.
 * RS_DLTIMEOUT - This is the timeout, in minutes, for an SU to download. If the time the download takes exceeds this value, then all requests waiting for the SU to download will fail.
 * RS_MAXTHREADS - The maximum number of download threads that the Remote SUMS daemon is permitted to run simultaneously. One thread is one scp call.
 * RS_BINPATH - The NetDRMS-binary-path that contains the external programs needed by the Remote SUMS daemon (jsoc_fetch, vso_sum_alloc, vso_sum_put).
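Before rebuilding, you might sanity-check these settings. This Python sketch rests on an assumption about {{{config.local}}} syntax: it expects a parameter name, whitespace, and the value on one line, and it ignores lines starting with {{{#}}} or {{{__}}}. Adjust the parser if your {{{config.local}}} differs. The sketch checks three things:
 * every parameter listed above is set;
 * {{{JMD_IS_INSTALLED}}} is 0;
 * the port, timeout, and thread-count parameters are positive integers.
The function names are hypothetical.

```python
from typing import Dict, List

REQUIRED_RS_KEYS = [
    "RS_REQUEST_TABLE", "RS_SU_TABLE", "RS_DBHOST", "RS_DBNAME", "RS_DBPORT",
    "RS_DBUSER", "RS_LOCKFILE", "RS_LOGDIR", "RS_REQTIMEOUT", "RS_DLTIMEOUT",
    "RS_MAXTHREADS", "RS_BINPATH",
]
INTEGER_KEYS = ["RS_DBPORT", "RS_REQTIMEOUT", "RS_DLTIMEOUT", "RS_MAXTHREADS"]


def parse_config_local(text: str) -> Dict[str, str]:
    """Parse 'NAME value' lines, skipping blanks, '#' comments, and '__SECTION__' markers.

    ASSUMPTION: a simplified view of config.local's layout.
    """
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("__"):
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def remote_sums_problems(params: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the Remote SUMS settings (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED_RS_KEYS if not params.get(key)]
    if params.get("JMD_IS_INSTALLED") != "0":
        problems.append("JMD_IS_INSTALLED must be 0 for Remote SUMS")
    for key in INTEGER_KEYS:
        value = params.get(key, "")
        if value and not (value.isdigit() and int(value) > 0):
            problems.append(f"{key} must be a positive integer")
    return problems


if __name__ == "__main__":
    sample = "JMD_IS_INSTALLED 1\nRS_DBPORT 5432\n"
    for problem in remote_sums_problems(parse_config_local(sample)):
        print(problem)
```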

After setting-up config.local, you must build or re-build NetDRMS:
{{{
% cd $JSOCROOT
% ./configure
% make
}}}
It is important to ensure that three binaries needed by the Remote SUMS daemon have been built: jsoc_fetch, vso_sum_alloc, vso_sum_put.

Ensure that Python >= 2.7 is installed. You will need to install some packages if they are not already installed: psycopg2, ...

An output log named rslog_YYYYMMDD.txt will be written to the directory identified by the RS_LOGDIR config.local parameter, so make sure that directory exists.

Provide all providing NetDRMS sites your public SSH key. They will need to put that key in their authorized_keys file.

Create the client-side Remote SUMS database tables. Run:
{{{
% $JSOCROOT/base/drms/scripts/rscreatetabs.py op=create tabs=req,su
}}}

Start the rsumsd.py daemon as the user specified by the RS_DBUSER config.local parameter. As this user, first start an ssh-agent process and add the private key to it:
{{{
% ssh-agent -c > $HOME/.ssh-agent_rs
% source $HOME/.ssh-agent_rs
% ssh-add $HOME/.ssh/id_rsa
}}}
This allows you to protect the private key with a passphrase without having to enter that passphrase manually each time the Remote SUMS daemon runs scp.

Start SUMS:
{{{
% $JSOCROOT/base/sums/scripts/sum_start.NetDRMS >& <log dir>/sumsStart.log
}}}
Substitute your favorite log directory for <log dir>. There is another daemon, sums_procck.py, that keeps SUMS up and running once it is started. Redirecting to a log will preserve important information that this daemon prints. To stop SUMS, use $JSOCROOT/base/sums/scripts/sum_stop.NetDRMS.

Start the Remote SUMS daemon:
{{{
% $JSOCROOT/base/drms/scripts/rsumsd.py
}}}

== Subscribing to Series ==
 To learn about how your institution, using its NetDRMS installation, can maintain a mirror of DRMS data that receives real-time updates, click [[SeriesSubscription|here]].

NetDRMS - a shared data management system

Introduction

In order to process, archive, and distribute the substantial quantity of solar data captured by the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data-management system, NetDRMS. This system comprises two PostgreSQL databases, multiple file systems, a tape back-up system, and software to manage these components. Related sets of data are grouped into data series, each of which is, conceptually, a table of data in which each row is typically associated with an observation time or a Carrington rotation. As an example, the data series hmi.M_45s contains the HMI 45-second cadence magnetograms, both observation metadata and image FITS files. The columns contain metadata, such as the observation time, the ID of the camera used to acquire the data, the image rotation, etc. One column in this table contains an ID that refers to a set of data files, typically a set of FITS files that contain images.

The Data Record Management System (DRMS) is the subsystem that contains and manages the "DRMS" database of metadata and data-file-locator information. One component is a software library, written in C, that provides client programs, also known as "DRMS modules", with an Application Programming Interface (API) that allows the users to access these data. The Storage Unit Management System (SUMS) is the subsystem that contains and manages the "SUMS" database and associated storage hardware. The database contains information needed to locate data files that reside on hardware. The entire system as a whole is typically referred to as DRMS. The user interfaces with the DRMS subsystem only, and the DRMS subsystem interfaces with SUMS - the user does not interact with SUMS directly. The JSOC provides NetDRMS to non-JSOC institutions so that those sites can take advantage of the JSOC-developed software to manage large amounts of solar data.

A NetDRMS site is an institution with a local NetDRMS installation. It does not generate the JSOC-owned production data series (e.g., hmi.M_720s, aia.lev1) that Stanford generates for scientific use. A NetDRMS site can generate its own data, production or otherwise: the site can create software that uses NetDRMS to generate its own data series. But it can also act as a "mirror" for individual data series. When acting as a mirror for a Stanford data series, the site downloads DRMS database information from Stanford and stores it in its own NetDRMS database, and it downloads SUMS files and stores them in its own SUMS subsystem. As the data files are downloaded to the local SUMS, the SUMS database is updated with the information needed to manage the data files. It is possible for a NetDRMS site to mirror the DRMS data of any other NetDRMS site, but at this point, the only site whose data are mirrored is the Stanford JSOC.

Installing NetDRMS

Installing the NetDRMS system requires:

Optional steps include:

  • registering for JSOC-data-series subscriptions and running NetDRMS software to receive, in real time, data updates [ Registering for Subscriptions ]

  • installing JSOC-specific project code that is not part of the base NetDRMS installation; the JSOC maintains code to generate JSOC-owned data that is not generally of interest to NetDRMS sites, but sites are welcome to obtain downloads of that code. Doing so involves additional configuration to the base NetDRMS system.
  • installing Slony PostgreSQL data-replication software to become a provider of your site's data
  • installing a webserver that hosts several NetDRMS CGIs to allow web access to your data
  • installing the Virtual Solar Observatory (VSO) software to become a VSO provider of data

For best results, and to facilitate debugging issues, please follow these steps in order.

Installing PostgreSQL

PostgreSQL is a relational database management system. Data are stored primarily in relations (tables) of records that can be mapped to each other - given one or more records, you can query the database to find other records. These relations are organized on disk in a hierarchical fashion. At the top level are one or more database clusters. A cluster is simply a storage location on disk (i.e., directory). PostgreSQL manages the cluster's data files with a single process, or PostgreSQL instance. Various operations on the cluster will result in PostgreSQL forking new ephemeral child processes, but ultimately there is only one master/parent process per cluster.

Each cluster contains the data for one or more databases. Each cluster requires a fair amount of system memory, so it makes sense to install a single cluster on a single host. It does not make sense to create separate clusters, each holding one database; a single cluster can efficiently support many databases, which are fairly independent of each other. In terms of querying, the databases are completely independent (i.e., a query on one database cannot involve relations in a different database). However, databases in a single cluster share the same disk directory, so they are not as independent at the OS/filesystem level. This matters mainly if an administrator operates directly on the files (performing backups, replication, creating standby systems, etc.).

To install PostgreSQL, select a host machine, <PostgreSQL host>, to act as the PostgreSQL database server. We recommend installing only PostgreSQL on this machine, given the large amount of memory and resources required for optimal PostgreSQL operation. We find a Fedora-based system, such as CentOS, to be a good choice, but please visit https://www.postgresql.org/docs for system requirements and other information germane to installation. The following instructions assume a Fedora-based Linux system such as CentOS (documentation for other distributions, such as Debian and openSUSE can be found online) and a bash shell.

Install the needed PostgreSQL server packages on <PostgreSQL host> by first visiting https://yum.postgresql.org/repopackages.php to locate and download the PostgreSQL "repo" rpm file appropriate for your OS and architecture. Each repo rpm contains a yum configuration file that can be used to install all supported PostgreSQL releases. You should install the latest version if possible (version 12, as of the time of this writing). Although you can use your browser to download the file, it might be easier to use Linux command-line tools:

$ curl -OL https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Install the yum repo configuration file (pgdg-redhat-all.repo) from the downloaded repo rpm file:

$ sudo rpm -i pgdg-redhat-repo-latest.noarch.rpm

This installs the repo configuration file to /etc/yum.repos.d/. Find the names of the PostgreSQL packages needed from the repository; the following assumes PostgreSQL 12, but should you want to install an older version, replace "12" with one of 94, 95, 96, 10, or 11:

$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql[0-9]*\.' | cut -d '.' -f 1
postgresql12
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*devel\.' | cut -d '.' -f 1 
postgresql12-devel
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*contrib\.' | cut -d '.' -f 1
postgresql12-contrib
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*libs\.' | cut -d '.' -f 1 
postgresql12-libs
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*plperl\.' | cut -d '.' -f 1 
postgresql12-plperl
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*server\.' | cut -d '.' -f 1 
postgresql12-server
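The six package names above follow a fixed pattern (postgresql<version> plus five suffixes), so a small bash helper can generate the install list instead of copying names by hand. This is a convenience sketch, not part of NetDRMS: the function name pg_packages is hypothetical, and it assumes the PGDG naming shown by the yum list commands (for 9.x releases the version is written without the dot, e.g., 96).

```shell
# Hypothetical helper: build the PGDG package list for one PostgreSQL major
# version, e.g. 12 -> postgresql12 postgresql12-contrib ... postgresql12-server.
pg_packages() {
    local ver=${1:?usage: pg_packages VERSION}
    local pkgs="postgresql${ver}"
    local suffix
    for suffix in contrib devel libs plperl server; do
        pkgs+=" postgresql${ver}-${suffix}"
    done
    echo "$pkgs"
}

# usage (on <PostgreSQL host>):
#   sudo yum install $(pg_packages 12)
```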

Use yum to install all six packages:

$ sudo yum install <packages>

where <packages> are the package names determined in the previous step (postgresql12 postgresql12-contrib postgresql12-devel postgresql12-libs postgresql12-plperl postgresql12-server). The rpm package installation will have created the PostgreSQL superuser Linux account <PostgreSQL superuser> (i.e., postgres); <PostgreSQL superuser> will own the PostgreSQL database clusters and server processes that will be created in the following steps. To perform the next steps, you will need to become user <PostgreSQL superuser>:

$ sudo su - <PostgreSQL superuser>

Depending on where the package files are installed, you might need to add the directory containing the PostgreSQL binaries to your PATH environment variable. To test this, run:

$ which initdb

If the initdb command cannot be found, then add the PostgreSQL binaries path to PATH. Find the path to the PostgreSQL installation:

$ rpm -ql postgresql12
<PostgreSQL install dir>/bin/clusterdb
...

In this example, <PostgreSQL install dir> is /usr/pgsql-12. Then add the binary path to PATH:

$ export PATH=/usr/pgsql-12/bin:$PATH
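If the export command is placed in .bashrc, every re-sourcing of .bashrc prepends the same directory again. A bash sketch of a guarded version follows; prepend_path is a hypothetical helper name, and /usr/pgsql-12/bin assumes the PostgreSQL 12 PGDG layout found with rpm -ql above.

```shell
# Prepend a directory to PATH only if it is not already present, so that
# sourcing .bashrc repeatedly does not keep growing PATH.
prepend_path() {
    local dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                      # already on PATH: nothing to do
        *) PATH="$dir${PATH:+:$PATH}" ;;    # otherwise put it first
    esac
    export PATH
}

# assumption: PostgreSQL 12 from the PGDG repository
prepend_path /usr/pgsql-12/bin
```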

<PostgreSQL superuser> will be using the binaries in this directory, so it is a good idea to add the export command to its .bashrc. As described above, create a single database cluster to hold the two databases (one for DRMS data, and one for SUMS data):

$ whoami
postgres
$ initdb --locale=C -D <PostgreSQL cluster>

where <PostgreSQL cluster> should be /var/lib/pgsql/netdrms. Use this path, unless there is some good reason you cannot. initdb will initialize the cluster data directory (identified by the -D argument). This will result in the creation of template databases, configuration files, and other items.

The database cluster contains two configuration files you need to edit: postgresql.conf and pg_hba.conf. Please refer to the PostgreSQL documentation to properly edit these files. Here are some brief suggestions:

  • postgresql.conf - changes to parameters marked with a * take effect only after the running server instance is restarted (pg_ctl restart); changes to other parameters take effect after a reload (pg_ctl reload)

    • listen_addresses* specifies the interface on which the postgres server processes will listen for incoming connections. You will need to ensure that connections can be made from all machines that will run DRMS modules (the modules connect to both the DRMS and SUMS databases), so change the default 'localhost' to '*', which causes the servers to listen on all interfaces:

      listen_addresses = '*'
    • port* is the server port on which the server listens for connections.

      port = <PostgreSQL port>

      The default port is 5432; unless there is a good reason not to, use 5432 for <PostgreSQL port>.

    • logging_collector* controls whether or not stdout and stderr are logged to a file in the database cluster (in the log or pg_log directory, depending on the release). It is off by default; set it to on:

      logging_collector = on
    • log_rotation_size sets the maximum size, in kilobytes, of a log file. Set this to 0 to disable rotation, otherwise a new log will be created after the current one grows to some size.

    • log_rotation_age sets the maximum age of a log file (a plain number is interpreted as minutes). Set this to 1d (1 day) so that a new log file is created each day.

      log_rotation_age = 1d
    • log_min_duration_statement is the amount of time, in milliseconds, a query must run before triggering a log entry. Set this to 1000 so that only long-running queries, over a second, will be logged.

      log_min_duration_statement = 1000

    • shared_buffers* is the size of shared-memory buffers. For a server dedicated to a single database cluster, this should be about 25% of the total memory.

      shared_buffers = 32GB
  • pg_hba.conf controls the methods by which client authentication is achieved (HBA stands for host-based authentication). It will likely take a little time to understand and properly edit this configuration file. If you are not familiar with networking concepts (such as subnets, name resolution, reverse name resolution, CIDR notation, IPv4 versus IPv6, network interfaces, etc.) then now is the time to become familiar.

    This configuration file contains a set of columns that identify which user can access which database from which machines. It also defines the method by which authentication occurs. When a user attempts to connect to a database, the server traverses this list looking for the first row that matches. Once this row is identified, the user must authenticate; if authentication fails, the connection is rejected. The server does not try any later rows. Changes to this file take effect after a reload of the server instance (a restart is not required).

Here are the recommended entries:

    # local superuser connections
    # TYPE    DATABASE  USER      ADDRESS              AUTH-METHOD
      local   all       all                            trust           # this applies ONLY if the user is logged into the PG server AND they do not use the -h argument to psql
      host    all       all       127.0.0.1/8          trust           # for -h localhost, if localhost resolves to an IPv4 address; also for -h 127.0.0.1
      host    all       all       ::1/128              trust           # for -h localhost, if localhost resolves to an IPv6 address; also for -h ::1

    # non-local superuser connections
    # TYPE    DATABASE  USER      ADDRESS              AUTH-METHOD
      host    all       postgres  XXX.XXX.XXX.XXX/YY   trust

    # non-superuser connections (made only from machines other than the database server)
    # TYPE    DATABASE       USER      ADDRESS              AUTH-METHOD
      host    netdrms        all       XXX.XXX.XXX.XXX/YY   md5
      host    netdrms_sums   all       XXX.XXX.XXX.XXX/YY   md5

where the columns are defined as follows:
  • TYPE - this column defines the type of socket connection made (Unix-domain, TCP/IP, the encryption used, etc.). local is only relevant to Unix-domain local connections from the database server host <PostgreSQL host> itself. Since only postgres will log into the database server, the first row above applies to the administrator only. host is only relevant to TCP/IP connections, regardless of the encryption status of the connection.

  • DATABASE - this column identifies the database to which the user has access. Whenever a user attempts to connect to the database server, they specify a database to access. That database must be in the DATABASE column. For non-superusers, we recommend listing only netdrms and netdrms_sums (DRMS modules connect to both), which blocks such users from accessing any other database. Conversely, we recommend using all for the superuser so they can access both the DRMS and SUMS databases (and any other that might exist).

    NOTE: You are using the database names netdrms and netdrms_sums in pg_hba.conf, even though you have not actually created those databases yet. This is OK; you will do so once you start the PostgreSQL cluster instance.

  • USER - this column identifies which users can access the specified databases.

  • ADDRESS - this column identifies the host IP addresses (or host names, but do not use those) from which a connection is allowed. To specify a range of IP addresses, such as those on a subnet, use a CIDR address. This column should be empty for local connections.

  • AUTH-METHOD - this column identifies the type of authentication to use. We recommend using either trust or md5. When trust is specified, PostgreSQL will unconditionally accept the connection to the database specified in the row. If md5 is specified, then the user will be required to provide a password. If you follow the recommendations above, then for the local row, any user who can log into the database server can access any database in the cluster without any further authentication. Generally only a superuser will be able to log into the database server, so this choice makes sense. For non-local connections by postgres, the Linux PostgreSQL superuser postgres can access any database on the server without further authentication. For the remaining non-local non-postgres connections, users will need to provide a password.
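The recommended entries can also be generated for your own client subnet. The bash sketch below is a convenience, not part of NetDRMS: netdrms_hba is a hypothetical name, and it assumes the database names netdrms and netdrms_sums, the superuser postgres, and a single IPv4 client subnet in CIDR notation (the XXX.XXX.XXX.XXX/YY placeholder above). Its CIDR check is deliberately loose.

```shell
# Print the recommended pg_hba.conf rows for one IPv4 client subnet.
# Columns: TYPE DATABASE USER ADDRESS AUTH-METHOD (ADDRESS empty for local).
netdrms_hba() {
    local subnet=$1
    # loose sanity check on the a.b.c.d/n form; does not validate ranges
    if ! [[ $subnet =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
        echo "netdrms_hba: '$subnet' is not an IPv4 CIDR address" >&2
        return 1
    fi
    # printf reuses the format for every group of five arguments
    printf '%-8s%-15s%-10s%-21s%s\n' \
        local all          all      ''          trust \
        host  all          all      127.0.0.1/8 trust \
        host  all          all      ::1/128     trust \
        host  all          postgres "$subnet"   trust \
        host  netdrms      all      "$subnet"   md5 \
        host  netdrms_sums all      "$subnet"   md5
}
```

Because the server uses the first matching row, review the default rows that initdb placed in pg_hba.conf before appending these (for example, as <PostgreSQL superuser>: netdrms_hba 192.168.1.0/24 >> <PostgreSQL cluster>/pg_hba.conf), then run pg_ctl reload.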

Should you need to edit either of these configuration files AFTER you have started the database instance (by running pg_ctl start, as described in the next section), you will need to either reload or restart the instance:

$ whoami
postgres
# reload
$ pg_ctl reload -D <PostgreSQL cluster>
# restart
$ pg_ctl restart -D <PostgreSQL cluster>
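A postgresql.conf fragment that follows the suggestions above can be produced with the bash sketch below. It assumes a host dedicated to this one cluster, so shared_buffers is set to about 25% of physical memory (rounded down to whole gigabytes, with a 1GB floor), and it reads total memory from the Linux /proc/meminfo file unless a size in kB is passed. netdrms_pg_conf is a hypothetical name, and port 5432 assumes the default <PostgreSQL port>.

```shell
# Print a postgresql.conf fragment following the recommendations above.
# $1 (optional): total memory in kB; default is MemTotal from /proc/meminfo.
netdrms_pg_conf() {
    local mem_kb=${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}
    local buf_gb=$(( mem_kb / 1024 / 1024 / 4 ))   # ~25% of RAM, in whole GB
    if (( buf_gb < 1 )); then
        buf_gb=1
    fi
    cat <<EOF
listen_addresses = '*'              # restart required
port = 5432                         # restart required
logging_collector = on              # restart required
log_rotation_age = 1d
log_min_duration_statement = 1000   # log queries running longer than 1 s
shared_buffers = ${buf_gb}GB        # restart required
EOF
}
```

If the output is appended to <PostgreSQL cluster>/postgresql.conf, the appended values take precedence, because PostgreSQL uses the last setting of a parameter in the file; since several of these parameters are marked *, restart the instance afterwards.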

Initializing PostgreSQL

You now need to initialize your PostgreSQL instance by creating the DRMS and SUMS databases, installing database-server languages, creating a schema, and creating a relation. To accomplish this, become <PostgreSQL superuser>; all steps in this section must be performed by the superuser:

$ sudo su - postgres

Start the database instance for the cluster you created:

$ whoami
postgres
$ pg_ctl start -D <PostgreSQL cluster>

You previously created <PostgreSQL cluster>, which will most likely be /var/lib/pgsql/netdrms. Ensure that the configuration files you edited work by connecting to the database server as <PostgreSQL superuser> with psql from <PostgreSQL host>:

$ whoami
<PostgreSQL superuser>
$ hostname
<PostgreSQL host>
$ psql -p <PostgreSQL port>
psql (12.1)
Type "help" for help.

postgres=# 

You should not see any errors, and you should see the postgres=# superuser psql prompt. If there are no errors, exit psql (\q) and create the two databases:

$ whoami
postgres
# create the DRMS database
$ createdb --locale C -E UTF8 -T template0 netdrms
# create the SUMS database
$ createdb --locale C -E UTF8 -T template0 netdrms_sums

Install the required database-server languages:

$ whoami
postgres
# PL/pgSQL is installed by default in every database (PostgreSQL 9.0 and later), so there is no need to create it
# create the "trusted" and "untrusted" perl languages (versions <= 9.6)
$ createlang -p <PostgreSQL port> plperl netdrms
$ createlang -p <PostgreSQL port> plperlu netdrms
# create the "trusted" and "untrusted" perl languages (versions > 9.6; createlang was removed in PostgreSQL 10)
$ psql -p <PostgreSQL port> netdrms
netdrms=# CREATE EXTENSION IF NOT EXISTS plperl;
netdrms=# CREATE EXTENSION IF NOT EXISTS plperlu;
netdrms=# \q

The SUMS database does not use any language extensions so there is no need to create any for the SUMS database.
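To confirm that the languages exist, list the languages in the DRMS database and compare them with the three that NetDRMS needs (plpgsql, plperl, plperlu). The bash sketch below reads the list on standard input; the function name is hypothetical, and the psql query in the usage comment is one way to produce the list as <PostgreSQL superuser>.

```shell
# Report which required procedural languages are missing. Reads language
# names, one per line, on stdin.
check_netdrms_langs() {
    local installed lang
    local -a missing=()
    installed=$(cat)
    for lang in plpgsql plperl plperlu; do
        if ! grep -qx "$lang" <<<"$installed"; then
            missing+=("$lang")
        fi
    done
    if (( ${#missing[@]} )); then
        echo "missing: ${missing[*]}"
        return 1
    fi
    echo "all required languages present"
}

# usage:
#   psql -p <PostgreSQL port> -d netdrms -Atc 'SELECT lanname FROM pg_language' | check_netdrms_langs
```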

Installing CFITSIO

The base NetDRMS release requires CFITSIO, a C library used by NetDRMS to read and write FITS files. Visit https://heasarc.gsfc.nasa.gov/fitsio/ to obtain the link to the CFITSIO source-code tarball. Create an installation directory <CFITSIO install dir> (such as /opt/cfitsio-X.XX, with a link from /opt/cfitsio to <CFITSIO install dir>), then download and extract the tarball in a build directory:

$ sudo mkdir -p <CFITSIO install dir>
$ curl -OL 'http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio-X.XX.tar.gz'
$ tar xvzf cfitsio-X.XX.tar.gz
$ cd cfitsio-X.XX

Please read the README file for complete installation instructions. As a quick start, run:

$ ./configure --prefix=<CFITSIO install dir>
# build the CFITSIO library
$ make
# install the libraries and binaries to <CFITSIO install dir>
$ sudo make install
# create the link from cfitsio to <CFITSIO install dir>
$ cd <CFITSIO install dir>/..
$ sudo ln -s <CFITSIO install dir> cfitsio

CFITSIO depends on libcurl; in fact, building any program that links to cfitsio also requires libcurl-devel, since cfitsio uses the libcurl API. We recommend using yum to install the two packages (if they are not already installed; it is quite likely that libcurl already is):

$ sudo yum install libcurl libcurl-devel
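After make install, a quick bash check that the install prefix holds the CFITSIO header and library can catch a wrong --prefix early. It assumes the default configure layout (<prefix>/include/fitsio.h and <prefix>/lib/libcfitsio.*) and the /opt/cfitsio link suggested above; check_cfitsio is a hypothetical helper.

```shell
# Verify that a CFITSIO prefix contains the header and library NetDRMS needs.
check_cfitsio() {
    local prefix=${1:-/opt/cfitsio}
    local status=0
    if [ ! -f "$prefix/include/fitsio.h" ]; then
        echo "missing $prefix/include/fitsio.h" >&2
        status=1
    fi
    # compgen -G succeeds only if the glob matches at least one file
    if ! compgen -G "$prefix/lib/libcfitsio.*" >/dev/null; then
        echo "missing $prefix/lib/libcfitsio.*" >&2
        status=1
    fi
    return "$status"
}
```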

Installing OpenSSL Development Packages

NetDRMS requires the OpenSSL Developer's API. If this API has not already been installed, do so now:

$ sudo yum install openssl-devel

Installing DBD::Pg

One step in the installation process will require running a perl script that accesses the PostgreSQL database. In order for this to work, you will need to ensure the DBD::Pg module has been installed. To check for installation, run:

$ perl -M'DBD::Pg'

If there is no error about being unable to locate the module, and the command simply waits (perl is reading a script from standard input), then you are all set (enter ctrl-C to exit). If the module is not installed, and you are running the perl installed with the system, then run yum to identify the package:

$ yum list | grep -i 'dbd-pg'
...
perl-DBD-Pg.x86_64                         2.19.3-4.el7           base
...

Then, bringing to bear all your powers of divination, choose the correct package and install it:

$ sudo yum install 'perl-DBD-Pg'

If using a non-system perl, use the distro's installation method. If the distro does not have that module, or the distro installer does not work, then as a final act of desperation use CPAN:

$ sudo perl -MCPAN -e 'install DBD::Pg'

Installing Python3

NetDRMS requires a number of python packages and modules that are not generally part of a system installation. In addition, many scripts require python3 rather than python2. The easiest way to satisfy these needs is to install a data-science-oriented python3 distribution, such as Anaconda. In that vein, install Anaconda into an appropriate installation directory, <Anaconda3 install dir>, such as /opt/anaconda3. To locate the Linux installer, visit https://docs.anaconda.com/anaconda/install/linux/:

$ curl -OL 'https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh'
$ sha256sum Anaconda3-2019.10-Linux-x86_64.sh
46d762284d252e51cd58a8ca6c8adc9da2eadc82c342927b2f66ed011d1d8b53  Anaconda3-2019.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2019.10-Linux-x86_64.sh

After some initial prompts, the installer will display

PREFIX=/home/<user>/anaconda3

This path is the default installation directory (<user> is the user running bash). When the installer prompts for the installation location, enter <Anaconda3 install dir> instead of accepting this default.
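Rather than comparing the sha256sum output with the published hash by eye, the comparison can be scripted before running the installer. A bash sketch (verify_sha256 is a hypothetical name; the hash in the usage comment is the one shown above for Anaconda3-2019.10-Linux-x86_64.sh, and should always be taken from the Anaconda download page):

```shell
# Succeed only if FILE's SHA-256 digest equals EXPECTED (GNU coreutils).
verify_sha256() {
    local file=$1 expected=$2
    # sha256sum --check expects "<hash><two spaces><file name>" lines
    echo "$expected  $file" | sha256sum --check --status -
}

# usage:
#   verify_sha256 Anaconda3-2019.10-Linux-x86_64.sh \
#       46d762284d252e51cd58a8ca6c8adc9da2eadc82c342927b2f66ed011d1d8b53 \
#     && sudo bash Anaconda3-2019.10-Linux-x86_64.sh
```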

Installing NetDRMS

To install NetDRMS, you will need to select an appropriate machine on which to install NetDRMS, select an appropriate machine/hardware on which to host the SUMS service, create Linux users and groups, download the NetDRMS release tarball and extract the release source, initialize the Linux environment, create log directories, create the configuration file and run the configuration script, compile and install the executables, create the DRMS- and SUMS-database users/relations/functions/objects, initialize the SUMS storage hardware, and install the SUMS and Remote SUMS daemons.

The optimal hardware configuration will likely depend on your needs, but the following recommendations should suffice for most sites. DRMS and SUMS can share a single host machine. The most widely used and tested Linux distributions are Fedora-based, and at the time of this writing, CentOS is the most popular. Sites have successfully used openSUSE too, but if possible, we would recommend using CentOS. SUMS requires a large amount of storage to hold the DRMS data-series data/image files. The amount needed can vary widely, and depends directly on the amount of data you wish to keep online at any given time. Most NetDRMS sites mirror some amount of (but not all) JSOC SDO data - the more data mirrored, the larger the amount of storage needed. To complicate matters, a site can also mirror only a subset of each data series' data; perhaps one site wishes to retain only the current month's data of many data series, but another wishes to retain all data for one or two series. To decide on the amount of storage needed, you will have to ask the JSOC how much data each series comprises and decide how much of that data you want to keep online. Data that goes offline can always be retrieved automatically from the JSOC again. Data will arrive each day, so request from the JSOC an estimate of the rate of data growth. We recommend doing a rough calculation based upon these considerations, and then doubling the resulting number and installing that amount of storage.
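The rule of thumb above (estimate the data you want online, then double it) can be written out as a small calculation. In this bash sketch, sums_storage_tb is a hypothetical name, and the inputs are site-specific guesses in TB that you would base on the per-series sizes and growth rates obtained from the JSOC; the numbers in the usage comment are made up for illustration.

```shell
# (initial data + daily growth * days kept online) * 2, in TB.
sums_storage_tb() {
    local initial_tb=$1 growth_tb_per_day=$2 days=$3
    awk -v i="$initial_tb" -v g="$growth_tb_per_day" -v d="$days" \
        'BEGIN { printf "%.1f\n", (i + g * d) * 2 }'
}

# usage (hypothetical numbers): 100 TB mirrored now, 0.5 TB/day arriving,
# one year of new data kept online
#   sums_storage_tb 100 0.5 365    # prints 565.0
```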

Next, create a production Linux user <NetDRMS production user> (named netdrms_production by default):

$ sudo useradd <NetDRMS production user>
$ sudo passwd <NetDRMS production user>
Changing password for user <NetDRMS production user>.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
$

NOTE: ensure that <NetDRMS production user> is a valid PostgreSQL name because NetDRMS makes use of the PostgreSQL feature whereby attempts to connect to a database are made as the database user whose name matches the name of the Linux user connecting to the database. Please see https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS for a description of valid PostgreSQL names.
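A conservative bash check that a Linux user name can be used unchanged as a PostgreSQL role name is sketched below. It is stricter than PostgreSQL's own identifier rules: it accepts only a lowercase ASCII letter or underscore followed by lowercase letters, digits, underscores, or $, at most 63 characters (the default identifier length limit). Uppercase is rejected because unquoted identifiers are folded to lowercase, while the client connects with the Linux name exactly as written. The function name is hypothetical.

```shell
# Succeed if NAME is a plain, lowercase PostgreSQL identifier (<= 63 chars).
is_valid_pg_name() {
    [ "${#1}" -le 63 ] &&
        LC_ALL=C grep -Eqx '[a-z_][a-z0-9_$]*' <<<"$1"
}

# usage:
#   is_valid_pg_name netdrms_production && echo "ok as a PostgreSQL role name"
```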

As <NetDRMS production user>, you will be running various python components. As such, you should make sure that the PYTHONPATH environment variable is not already set, otherwise it might interfere with the running of Anaconda python. You might also need to modify your PATH environment variable to point to Anaconda and PostgreSQL executables. Modify your .bashrc to do so:

# add to <NetDRMS production user>'s .bashrc 

# to ensure that <NetDRMS production user> uses Anaconda
unset PYTHONPATH

# python executables
export PATH=<Anaconda3 install dir>/bin:$PATH

# PostgreSQL executables
export PATH=<PostgreSQL install dir>/bin:$PATH

For the changes to .bashrc to take effect, either logout/login or source .bashrc.

NetDRMS requires additional python packages not included in the Anaconda distribution, but if you install Anaconda, then the number of additional packages you need to install is minimal. If you have a different python distribution, then you may need to install additional packages. To install new Anaconda packages, as <NetDRMS production user> first create a virtual environment for NetDRMS (named netdrms):

$ whoami
<NetDRMS production user>
$ conda create --name netdrms
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/netdrms_production/.conda/envs/netdrms

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate netdrms
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Then install the new Anaconda packages using conda:

$ whoami
<NetDRMS production user>
$ conda install -n netdrms psycopg2 psutil python-dateutil
Collecting package metadata (current_repodata.json): done
...

In order for conda to succeed when installing psycopg2, PostgreSQL must have already been installed, and the PostgreSQL executables must be in the PATH environment variable (one step in the installation process is running pg_config). The changes to .bashrc described above should take care of setting the path; make sure you have either logged in again or sourced .bashrc.
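Because the psycopg2 installation runs pg_config, a quick bash check that pg_config is on PATH before running conda install can save a failed install. check_pg_config is a hypothetical helper, and its message assumes the .bashrc setup described above.

```shell
# Print the path of pg_config, or fail with a hint if it is not on PATH.
check_pg_config() {
    local pgc
    if ! pgc=$(command -v pg_config); then
        echo "pg_config not found; add <PostgreSQL install dir>/bin to PATH" >&2
        return 1
    fi
    echo "$pgc"
}

# usage:
#   check_pg_config && conda install -n netdrms psycopg2 psutil python-dateutil
```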

Create the Linux group <SUMS users>, e.g. sums_users, to which all SUMS users belong, including <NetDRMS production user>. This group will be used to ensure that all SUMS users can create new data files in SUMS:

$ sudo groupadd sums_users

Add <NetDRMS production user> to this group (later you will add each SUMS user - users who will read/write SUMS data files - to this group as well):

$ sudo usermod -a -G <SUMS users> <NetDRMS production user>
$ id <NetDRMS production user>
uid=1001(netdrms_production) gid=1001(netdrms_production) groups=1001(netdrms_production),1002(sums_users)
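To verify group membership in a script rather than reading the id output by eye, here is a bash sketch (user_in_group is a hypothetical name). Note that a user's existing login sessions pick up a newly added supplementary group only after logging out and back in.

```shell
# Succeed if USER is a member of GROUP (primary or supplementary).
user_in_group() {
    local user=$1 group=$2 groups
    groups=" $(id -nG "$user" 2>/dev/null) "   # space-padded list of names
    [[ $groups == *" $group "* ]]
}

# usage:
#   user_in_group <NetDRMS production user> <SUMS users> && echo "member"
```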

On the NetDRMS host, select a path <NetDRMS root> for the root directory of the NetDRMS source tree installed by <NetDRMS production user>. A typical choice for <NetDRMS root> is /opt/netdrms or /usr/local/netdrms, and a typical strategy is to install the source tree into a directory that contains the release version in its name (<NetDRMS install dir>), e.g., /opt/netdrms-9.3. Then <NetDRMS production user> makes a link from <NetDRMS root> to <NetDRMS install dir>. This facilitates the maintenance of multiple releases: to switch between releases, <NetDRMS production user> simply updates the link to point to the desired release directory. Create <NetDRMS install dir> and make <NetDRMS production user> the owner:

$ sudo mkdir -p <NetDRMS install dir>
$ sudo chown <NetDRMS production user>:<NetDRMS production user> <NetDRMS install dir>

As <NetDRMS production user>, obtain a NetDRMS tarball from http://jsoc.stanford.edu/netdrms/dist/ and extract it into a release-specific directory:

$ cd <NetDRMS install dir>
$ curl -OL 'http://jsoc.stanford.edu/netdrms/dist/netdrms_X.X.tar.gz'
$ tar xvzf netdrms_X.X.tar.gz
$ <ctrl-d>

Create the link from <NetDRMS root> to <NetDRMS install dir>:

$ cd <NetDRMS install dir>/..
$ sudo ln -s <NetDRMS install dir> netdrms
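Switching releases later means repointing this link. With GNU ln, -n matters: without it, ln -sf would follow an existing link to a directory and create the new link inside the old release directory. A bash sketch (switch_netdrms_release is a hypothetical name; run it as a user allowed to write in the link's parent directory):

```shell
# Point the <NetDRMS root> link at another release directory.
switch_netdrms_release() {
    local release_dir=$1 root_link=$2
    if [ ! -d "$release_dir" ]; then
        echo "no such release directory: $release_dir" >&2
        return 1
    fi
    # -f replaces an existing link; -n treats it as a file, not a directory
    ln -sfn "$release_dir" "$root_link"
}

# usage (example paths):
#   switch_netdrms_release /opt/netdrms-9.3 /opt/netdrms
```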

As <NetDRMS production user>, set the two environment variables that are needed for proper NetDRMS operation. To do so, you'll need first to determine the appropriate <architecture> string for one of these variables:

$ whoami
<NetDRMS production user>
$ cd <NetDRMS root>
$ build/jsoc_machine.csh
<architecture>

It is best to set the following two environment variables in <NetDRMS production user>'s .bashrc file since they must always be set whenever any NetDRMS code is run:

# .bashrc
export JSOCROOT=<NetDRMS root>
export JSOC_MACHINE=<architecture>
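Scripts that run NetDRMS code can fail early when this environment is incomplete. A bash sketch of such a guard (check_netdrms_env is a hypothetical name) checks that both variables are set and that JSOCROOT is a directory:

```shell
# Fail with a message if JSOCROOT or JSOC_MACHINE is unusable.
check_netdrms_env() {
    local var status=0
    for var in JSOCROOT JSOC_MACHINE; do
        if [ -z "${!var:-}" ]; then            # indirect expansion
            echo "$var is not set" >&2
            status=1
        fi
    done
    if [ -n "${JSOCROOT:-}" ] && [ ! -d "$JSOCROOT" ]; then
        echo "JSOCROOT ($JSOCROOT) is not a directory" >&2
        status=1
    fi
    return "$status"
}
```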

Make the SUMS log directory on the SUMS server machine. Various SUMS log files will be written to this directory. A suitable directory would reside in the <NetDRMS production user> user's home directory, e.g., $HOME/log/SUMS

$ whoami
<NetDRMS production user>
$ mkdir -p <SUMS logs>

Select appropriate C and Fortran compilers. The DRMS part of NetDRMS must be compiled with a C compiler. NetDRMS supports both the GNU C compiler (gcc), and the Intel C++ compiler (icc). Certain JSOC-specific code requires Fortran compilation. For those projects, NetDRMS supports the GNU Fortran compiler (gfortran), and the Intel Fortran compiler (ifort). SUMS is implemented as a Python daemon, so no compilation step is needed. Both GNU and Intel are widely used, so feel free to use either. By default, Intel compilers are used. There are two methods for changing the compilers:

  • as the <NetDRMS production user>, you can set the following environment variables (we recommend doing so in .bashrc):

    # .bashrc
    
    # set COMPILER to icc for the Intel C++ compiler, and to gcc for the GNU C compiler
    export COMPILER=icc
    
    # set to ifort for the Intel Fortran compiler, and to gfortran for the GNU Fortran compiler
    export FCOMPILER=ifort
  • you can edit the make_basic.mk file in <NetDRMS root>; to select the compilers, edit the COMPILER and FCOMPILER make variables declared near the top of the file:

    # use Intel compilers
    COMPILER = icc
    FCOMPILER = ifort
    # or, to use GNU compilers, replace the two lines above with:
    # COMPILER = gcc
    # FCOMPILER = gfortran
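Instead of hard-coding one compiler family in .bashrc, a fragment can pick the Intel compilers when icc is on PATH and fall back to the GNU compilers otherwise. This is a sketch that assumes whichever family it selects is fully installed (it checks only for icc); the Fortran compiler matters only for JSOC-specific code.

```shell
# .bashrc fragment: prefer Intel compilers if available, else GNU.
if command -v icc >/dev/null 2>&1; then
    export COMPILER=icc FCOMPILER=ifort
else
    export COMPILER=gcc FCOMPILER=gfortran
fi
```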

Create the <NetDRMS root>/config.local configuration file, using <NetDRMS root>/config.local.newstyle.template as a template. This file contains a number of configuration parameters, along with detailed descriptions of what they control and suggested values for those parameters. The configuration script, configure, reads this file and then creates one output file, drmsparams.*, in <NetDRMS root>/localization for each of several programming languages/tools (C, GNU make, perl, python, bash). In this manner, the parameters are directly readable by several languages/tools used by NetDRMS. Lines that start with whitespace or the hash symbol (#) are ignored.

Several sections compose config.local:

__STYLE__
new

__DEFS__
# these are NetDRMS-wide parameter values; the format is <quote code>:<parameter name><whitespace>+<parameter value>;
# the configuration script uses <quote code> to assist in creating language-specific parameters; <quote code> is one of:
#   'q' (enclose the parameter value in double quotes)
#   'p' (enclose the parameter value in parentheses)
#   'a' (do not modify the parameter value). 

__MAKE__
# these are make variables used by the make system during compilation - they generally contain paths to third-party code

Before creating config.local, please request from the JSOC a value for DRMS_LOCAL_SITE_CODE. This code uniquely identifies each NetDRMS installation. Each site requires one ID for each of its NetDRMS installations.
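The DRMS_LOCAL_SITE_CODE rules given in the parameter list (a 15-bit hexadecimal value, with values of 0x4000 and above denoting a development installation) can be checked with a small bash sketch. The function name and its output labels are hypothetical; "production" here simply means a code below 0x4000, which must be unique and should be obtained from the JSOC.

```shell
# Print production, development, or invalid for a site code like 0x0002.
classify_site_code() {
    local code=$1 value
    if ! [[ $code =~ ^0[xX][0-9a-fA-F]{1,4}$ ]]; then
        echo invalid
        return 1
    fi
    value=$(( code ))                 # bash arithmetic understands 0x prefixes
    if (( value > 0x7FFF )); then     # does not fit in 15 bits
        echo invalid
        return 1
    elif (( value >= 0x4000 )); then
        echo development
    else
        echo production
    fi
}
```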

The __DEFS__ section:

  • BIN_PY3 - the path to the Python 3 python executable.

  • DBNAME - the name of the DRMS database: this is netdrms; this parameter exists in case you want to select a different name, but we don't recommend changing it.

  • DRMSPGPORT - the port that the DRMS database cluster instance is listening on: this is <PostgreSQL port>.

  • DRMS_LOCAL_SITE_CODE - a 15-bit hexadecimal string that globally and uniquely identifies the NetDRMS. Each NetDRMS requires a unique code for each installation. Values greater than or equal to 0x4000 denote a development installation and need not be unique. If you plan on generating data that will be distributed outside of your site, please obtain a unique value from the JSOC.

  • DRMS_LOCK_DIR - the directory to which the DRMS library writes various lock files.

  • DRMS_LOG_DIR - the directory to which the DRMS library writes various log files.

  • EXPORT_LOG_DIR - the directory to which export programs write logs.

  • EXPORT_LOCK_DIR - the directory to which export programs write lock files.

  • EXPORT_HANDLE_DIR - the directory to which export programs save handles.

  • POSTGRES_ADMIN - the Linux user that owns the PostgreSQL installation and processes: this is <PostgreSQL superuser>.

  • JMD_IS_INSTALLED - if set to 1, then the Java Mirroring Daemon alternative to Remote SUMS is used: this should be 0.

  • RS_BINPATH - the NetDRMS binary path that contains the external programs needed by the Remote SUMS (e.g., jsoc_fetch, vso_sum_alloc, vso_sum_put).

  • RS_DBHOST - the name of the Remote SUMS database cluster host; this is <PostgreSQL host>, the machine on which PostgreSQL was installed.

  • RS_DBNAME - the Remote SUMS database - this is netdrms_remotesums.

  • RS_DBPORT - the port that the Remote SUMS database cluster instance is listening on: this is <PostgreSQL port>

  • RS_DBUSER - the Linux user that runs Remote SUMS; this is also the database user who owns the Remote SUMS database objects: this is <NetDRMS production user>.

  • RS_DLTIMEOUT - the timeout, in seconds, for a storage unit (SU) to download. If the download time exceeds this value, then all requests waiting for the SU to download will fail.

  • RS_LOCKFILE - the (advisory) lockfile used by Remote SUMS to prevent multiple instances from running.

  • RS_LOGDIR - the directory in which remote-sums log files are written.

  • RS_MAXTHREADS - the maximum number of SUs that Remote SUMS can process simultaneously.

  • RS_N_WORKERS - the number of scp worker threads; at most, this many scp processes will run simultaneously.

  • RS_REQTIMEOUT - the timeout, in seconds, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.

  • RS_REQUEST_TABLE - the Remote SUMS database relation that contains Remote SUMS requests; this is <Remote SUMS requests>.

  • RS_SCP_MAXPAYLOAD - the maximum total payload, in MB, per download. As soon as the combined payload of SUs ready for download exceeds this value, then the SUs are downloaded with a single scp process.

  • RS_SCP_MAXSUS - the maximum size of the SU download queue. As soon as this many SUs are ready for download, they are downloaded with a single scp process.

  • RS_SITE_INFO_URL - the service at JSOC that is used by Remote SUMS to locate the NetDRMS site that owns SUMS storage units; this is <Remote SUMS site URL>.

  • RS_SCP_TIMEOUT - if there are SUs ready for download, and no scp has fired off within this many seconds, then the SUs that are ready to download are downloaded with a single scp process.

  • RS_TMPDIR - the temporary directory into which SUs are downloaded. This should be on the same file system on which the SUMS partitions reside.

  • SCRIPTS_EXPORT - the path to the directory in the NetDRMS installation that contains the export scripts.

  • SERVER - the name of the DRMS database cluster host: this is <PostgreSQL host>, the machine on which PostgreSQL was installed.

  • SUMLOG_BASEDIR - the path to the directory that contains various SUMS log files; this is <SUMS logs>.

  • SUMPGPORT - the port that the SUMS database cluster host is listening on: this is <PostgreSQL port>, unless DRMS and SUMS reside in different clusters on the same host (something that is not recommended since a single PostgreSQL cluster requires a substantial amount of system resources).

  • SUMS_DB_HOST - the name of the SUMS database cluster host: this is <PostgreSQL host>, the machine on which PostgreSQL was installed; NetDRMS allows for creating a second cluster for SUMS, but in general this will not be necessary unless extremely heavy usage requires separating the two clusters.

  • SUMS_GROUP - the name of the Linux group to which all SUMS Linux users belong: this is <SUMS users>.

  • SUMS_MANAGER - the SUMS database user who owns the SUMS database objects which are manipulated by Remote SUMS and SUMS itself; it should be the Linux user that runs SUMS and owns the SUMS storage directories: this is <NetDRMS production user>.

  • SUMS_READONLY_DB_USER - this is the SUMS database user who has read-only access to the SUMS database objects; it is used by the Remote SUMS client (rsums-clientd.py) to check for the presence of SUs before requesting they be downloaded.

  • SUMS_TAPE_AVAILABLE - if set to 1, then SUMS has a tape-archive system.

  • SUMS_USEMTSUMS - if set to 1, use the multi-threaded Python SUMS: this is 1.

  • SUMS_USEMTSUMS_ALL - if set to 1, use the multi-threaded Python SUMS for all SUMS API methods: this is 1.

  • SUMSD_LISTENPORT - the port that SUMS listens to for incoming requests.

  • SUMSD_MAX_THREADS - the maximum number of SUs that SUMS can process simultaneously.

  • SUMSERVER - the SUMS host machine; this is <SUMS host>.

  • WEB_DBUSER - the DRMS database user account that cgi programs access when they need to read from or write to database relations.
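The three scp triggers described by RS_SCP_MAXSUS, RS_SCP_MAXPAYLOAD, and RS_SCP_TIMEOUT together decide when queued SUs are handed to a single scp process. The following is a minimal Python sketch of that logic, assuming each trigger is evaluated exactly as described above; the ScpBatcher class and its methods are hypothetical and are not taken from the Remote SUMS source.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ScpBatcher:
    """Hypothetical model of when queued SUs are handed to one scp process."""
    max_sus: int            # RS_SCP_MAXSUS: fire once this many SUs are queued
    max_payload_mb: float   # RS_SCP_MAXPAYLOAD: fire once the queued payload exceeds this
    timeout_s: float        # RS_SCP_TIMEOUT: fire if nothing has fired for this long
    pending: List[Tuple[int, float]] = field(default_factory=list)  # (SUNUM, size in MB)
    last_fire: float = field(default_factory=time.monotonic)

    def add(self, sunum: int, size_mb: float, now: Optional[float] = None) -> Optional[List[int]]:
        """Queue an SU; return a batch of SUNUMs if the count or payload trigger fires."""
        self.pending.append((sunum, size_mb))
        payload = sum(size for _, size in self.pending)
        if len(self.pending) >= self.max_sus or payload > self.max_payload_mb:
            return self._fire(now)
        return None

    def poll(self, now: Optional[float] = None) -> Optional[List[int]]:
        """Return a batch if SUs are waiting and no scp has fired within timeout_s."""
        now = time.monotonic() if now is None else now
        if self.pending and now - self.last_fire >= self.timeout_s:
            return self._fire(now)
        return None

    def _fire(self, now: Optional[float]) -> List[int]:
        """Empty the queue, record the firing time, and return the SUNUMs to copy."""
        batch = [sunum for sunum, _ in self.pending]
        self.pending = []
        self.last_fire = time.monotonic() if now is None else now
        return batch
```

A daemon loop would call add() as each SU becomes ready and poll() periodically, passing each returned batch to one of the RS_N_WORKERS scp workers. The actual daemon may order these checks differently.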

The __MAKE__ section:

  • CFITSIO_INCS - the path to the installed CFITSIO header files: this is <CFITSIO install dir>/include.

  • CFITSIO_LIB - the name of the CFITSIO library.

  • CFITSIO_LIBS - the path to the installed CFITSIO library files: this is <CFITSIO install dir>/lib.

  • POSTGRES_INCS - the path to the installed PostgreSQL header files: this is <PostgreSQL install dir>/include.

  • POSTGRES_LIB - the name of the PostgreSQL C API library (AKA libpq): this is always pq.

  • POSTGRES_LIBS - the path to the installed PostgreSQL library files: this is <PostgreSQL install dir>/lib.

When installing NetDRMS updates, copy the existing config.local to the new <NetDRMS install dir> and edit the copy as needed, using the new config.local.newstyle.template to obtain information about parameters new to the newer release. Many of the parameter values have been determined during the previous steps of the installation process.

Run the configuration script, configure, a csh shell script which is included in <NetDRMS install dir>.

  • $ whoami
    <NetDRMS production user>
    $ cd <NetDRMS install dir>
    $ ./configure

configure reads config.local and uses the contained parameters to configure many of the NetDRMS features. It creates several directories in <NetDRMS install dir>:

  • bin - a directory that contains links to all executables in the DRMS code tree

  • include - a directory that contains links to all the header files in the DRMS code tree

  • jsds - a directory that contains links to all JSOC Series Definition (JSD) files

  • lib - a directory that contains links to all libraries in the DRMS code tree

  • localization - this directory contains project-specific make files and various files that contain the processed parameter information from config.local

  • scripts - a directory that contains links to all script files in the DRMS code tree

  • _<architecture> - a directory that contains all compiled files (object files, symbol files, binaries) in subdirectories that mirror the paths where the corresponding source code resides. For example, the source code for the show_info program resides in <NetDRMS install dir>/base/util/apps/show_info.c, and the show_info binary is placed in <NetDRMS install dir>/_<architecture>/base/util/apps/show_info. <architecture> was determined in a previous installation step. There are two possible values for <architecture>, linux_x86_64 and linux_avx (for hosts that support Advanced Vector Extensions), so this directory is named either _linux_x86_64 or _linux_avx.
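Because the _<architecture> tree mirrors the source tree, you can predict where make will place a given binary. The small Python helper below is illustrative only; it assumes <architecture> is either linux_x86_64 or linux_avx (so the directory is _linux_x86_64 or _linux_avx) and that a program's binary is named after its .c file, as with show_info. The function name binary_path is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed complete set of <architecture> values; the build directory is "_" + value.
ARCHITECTURES = {"linux_x86_64", "linux_avx"}


def binary_path(install_dir: str, architecture: str, source_file: str) -> PurePosixPath:
    """Map a C source file in the DRMS code tree to its compiled binary under _<architecture>."""
    if architecture not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {architecture}")
    root = PurePosixPath(install_dir)
    rel = PurePosixPath(source_file).relative_to(root)     # e.g. base/util/apps/show_info.c
    return root / f"_{architecture}" / rel.with_suffix("")  # drop ".c" to get the binary name
```

For example, binary_path("/opt/netdrms", "linux_avx", "/opt/netdrms/base/util/apps/show_info.c") gives /opt/netdrms/_linux_avx/base/util/apps/show_info.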

Compile DRMS. To make the DRMS part of NetDRMS, run:

  • $ whoami
    <NetDRMS production user>
    $ cd <NetDRMS install dir>
    $ make

As <PostgreSQL superuser>, create the DRMS database production user <DRMS DB production user>. Since PostgreSQL automatically attempts to use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user <NetDRMS production user> for database user <DRMS DB production user>:

$ whoami
<PostgreSQL superuser>
$ psql -p <PostgreSQL port> netdrms
netdrms=# CREATE ROLE <DRMS DB production user> WITH LOGIN;
netdrms=# \q
$ 

As <PostgreSQL superuser>, run psql to add a password for this new database user:

$ whoami
<PostgreSQL superuser>
$ psql -p <PostgreSQL port> netdrms
netdrms=> ALTER ROLE <DRMS DB production user> WITH PASSWORD '<new password>';
netdrms=> \q
$

As <NetDRMS production user>, create a .pgpass file. This file contains the PostgreSQL user account password, obviating the need to manually enter the database password each time a database connection attempt is made:

$ whoami
<NetDRMS production user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<DRMS DB production user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass
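Each .pgpass line has the form hostname:port:database:username:password, where * matches any value, and libpq ignores the file unless it denies all group and world access (mode 0600 or stricter). If you prefer to script this step instead of editing the file by hand, the following Python sketch appends an entry and sets the permissions. The function name add_pgpass_entry is hypothetical; the escaping of : and \ follows the PostgreSQL password-file rules.

```python
import os
import stat
from pathlib import Path


def _escape(field: str) -> str:
    """Escape the characters that are special in .pgpass fields (backslash first)."""
    return field.replace("\\", "\\\\").replace(":", "\\:")


def add_pgpass_entry(home: str, host: str, user: str, password: str,
                     port: str = "*", database: str = "*") -> Path:
    """Append host:port:database:user:password to <home>/.pgpass (once) and set mode 0600."""
    line = ":".join(_escape(f) for f in (host, port, database, user, password))
    path = Path(home) / ".pgpass"
    path.touch(mode=0o600, exist_ok=True)          # create without group/world access
    text = path.read_text()
    if line not in text.splitlines():              # skip duplicates when re-run
        if text and not text.endswith("\n"):
            text += "\n"                           # keep the new entry on its own line
        path.write_text(text + line + "\n")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)    # 0600, required by libpq
    return path
```

Running add_pgpass_entry(os.path.expanduser("~"), "<PostgreSQL host>", "<DRMS DB production user>", "<new password>") as <NetDRMS production user> produces the same line as the manual edit above.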

As <PostgreSQL superuser>, run an SQL script, and a perl script (which executes several SQL scripts), both included in the NetDRMS installation, to create the admin and drms schemas and their relations, the jsoc and sumsadmin database users, data types, and functions:

$ whoami 
<PostgreSQL superuser>
# use psql to execute SQL script
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U postgres -f <NetDRMS install dir>/base/drms/scripts/NetDRMS.sql
CREATE SCHEMA
GRANT
CREATE TABLE
CREATE TABLE
GRANT
GRANT
CREATE SCHEMA
GRANT
CREATE ROLE
CREATE ROLE
$ perl <NetDRMS install dir>/base/drms/scripts/createpgfuncs.pl netdrms

For more information about the purpose of these objects, read the comments in NetDRMS.sql and createpgfuncs.pl.

We recommend using the NetDRMS database user <DRMS DB production user> as the SUMS database user <SUMS DB production user>. However, feel free to create a new user if necessary. If the DRMS and SUMS databases reside in different clusters, then you will need to create the <SUMS DB production user>. Again, since PostgreSQL automatically attempts to use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user <NetDRMS production user> for database user <SUMS DB production user>. If you choose a <SUMS DB production user> that is not <NetDRMS production user>, then you will need to pass <SUMS DB production user> to both Remote SUMS and SUMS when starting them.

$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# CREATE ROLE <SUMS DB production user> WITH LOGIN;
netdrms_sums=# \q
$ 

In addition, you will need to create a SUMS database user that has read-only access to the SUMS database objects:

$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# CREATE ROLE <SUMS DB readonly user> WITH LOGIN;
netdrms_sums=# \q

where <SUMS DB readonly user> is the value of the config.local parameter SUMS_READONLY_DB_USER. This database account is used by the Remote SUMS Client, a daemon used to manage the auto-download of SUs for subscriptions.

If you created a new SUMS DB production user and the SUMS and DRMS databases reside on the same database cluster, add a password for this user. Ensure that you use the same password that you used for <DRMS DB production user> - you will use the same Linux user when connecting to either database, so the same .pgpass file will be used for authentication. As <PostgreSQL superuser>, run psql to add a password for this new database user:

$ whoami
<PostgreSQL superuser>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms_sums
netdrms_sums=# ALTER ROLE <SUMS DB production user> WITH PASSWORD '<DRMS DB production user password>';
netdrms_sums=# \q
$

SUMS stores directory and file information in relations in the SUMS database. To create those relations and initialize tables, as <NetDRMS production user> run:

$ whoami
<NetDRMS production user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> -f <NetDRMS root>/base/sums/scripts/postgres/create_sums_tables.sql netdrms_sums
CREATE TABLE
CREATE INDEX
CREATE INDEX
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
GRANT
CREATE TABLE
CREATE INDEX
GRANT
CREATE SEQUENCE
GRANT
CREATE SEQUENCE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> netdrms_sums
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;
ALTER SEQUENCE
netdrms_sums=> \q
$ 

where <min val> is <drms site code> << 48 (the site code shifted left by 48 bits), <max val> is <min val> + 2^48 - 1, <drms site code> is the value of the DRMS_LOCAL_SITE_CODE config.local parameter, and 2^48 is 281474976710656. In other words, each site's SU indices occupy a block of 2^48 consecutive values starting at <min val>. For the JSOC (site code 0x0000), this ALTER SEQUENCE command looks like:

netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 0 RESTART 0 MINVALUE 0 MAXVALUE 281474976710655;
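The sequence bounds are simple bit arithmetic, so they can be computed rather than worked out by hand. The following Python sketch assumes the site code is supplied as an integer (for a hexadecimal DRMS_LOCAL_SITE_CODE such as 0x0002, int("0x0002", 16) yields 2); the helper names are hypothetical. Because every SU index a site allocates lies in its own block of 2^48 values, shifting an index right by 48 bits recovers the site code.

```python
from typing import Tuple

SUNUM_BITS = 48  # each site's SU indices occupy 2**48 consecutive values


def sequence_bounds(site_code: int) -> Tuple[int, int]:
    """Return (<min val>, <max val>) for sum_ds_index_seq."""
    min_val = site_code << SUNUM_BITS
    max_val = min_val + (1 << SUNUM_BITS) - 1
    return min_val, max_val


def alter_sequence_sql(site_code: int) -> str:
    """Build the ALTER SEQUENCE statement shown above for a given site code."""
    lo, hi = sequence_bounds(site_code)
    return (f"ALTER SEQUENCE sum_ds_index_seq START {lo} RESTART {lo} "
            f"MINVALUE {lo} MAXVALUE {hi};")


def site_code_of(sunum: int) -> int:
    """Recover the site code from an SU index allocated from this sequence."""
    return sunum >> SUNUM_BITS
```

For site code 0, alter_sequence_sql(0) reproduces the JSOC command above exactly.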

Initializing SUMS Storage

In addition to SUMS database relations, SUMS requires a file system on which SUMS maintains storage areas called SUMS partitions. A SUMS partition is really just a directory that contains SUMS Storage Units (each of which is implemented as a subdirectory inside the SUMS partition). As <NetDRMS production user>, create one or more partitions now. Although we have had success making partitions as large as 60 TB, we recommend 40 TB partitions: if you plan on setting aside X TB of SUMS storage, then make approximately N = X / 40 partitions of 40 TB each. The partitions can reside on a file server and be mounted onto all machines that will use NetDRMS, but the following example simply creates directories on a single file system on <SUMS partition host>:

$ whoami
<NetDRMS production user>
$ hostname
<SUMS partition host>
$ mkdir -p <SUMS root>/partition01
$ mkdir -p <SUMS root>/partition02
...
$ mkdir -p <SUMS root>/partitionN
$ chgrp -R <SUMS users> <SUMS root>
$ chmod -R g+w <SUMS root>

where <SUMS root> is something like /opt/sums. <SUMS users> is the Linux group that is allowed to write to SUMS. You created it in a previous step.
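To plan the partition layout for a given amount of storage, the following Python sketch computes N and the partition paths. It rounds N up so that the partitions cover all the storage you set aside; rounding up is an assumption, since the text above only asks for approximately N partitions. The function partition_paths is hypothetical, and you would still create each directory with mkdir -p as shown above.

```python
import math
from typing import List


def partition_paths(sums_root: str, total_tb: float, partition_tb: float = 40.0) -> List[str]:
    """Return partition directory paths for total_tb of SUMS storage in partition_tb chunks."""
    n = math.ceil(total_tb / partition_tb)        # round up so all storage is covered
    width = max(2, len(str(n)))                   # partition01 ... as in the example above
    root = sums_root.rstrip("/")
    return [f"{root}/partition{i:0{width}d}" for i in range(1, n + 1)]
```

For example, 100 TB of storage yields three partitions, /opt/sums/partition01 through /opt/sums/partition03.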

Initialize the SUMS DB sum_partn_avail table with the names of these partitions. For each SUMS partition run the following:

$ whoami
<NetDRMS production user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> -U <SUMS DB production user> netdrms_sums
netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);

where <SUMS partition path> is the full path of the partition as seen from <SUMS host> (which is where the SUMS daemon will run) and <avail bytes> is some number less than the number of bytes available to the partition (multiply the number of available blocks on the partition's file system by the number of bytes per block). The exact number does not matter, as long as it is not bigger than the total number of bytes available; SUMS will adjust this number as needed.
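On Linux, the space available on a partition's file system can be read with statvfs rather than by multiplying block counts by hand. The Python sketch below does this and builds the INSERT statement; the 0.95 safety margin and the helper names are assumptions chosen for illustration. Note that statvfs reports on the whole file system, so if several partitions share one file system, each reports the same free space and you should divide it among them.

```python
import os


def avail_bytes(partition_path: str) -> int:
    """Bytes available to non-root users on the file system holding the partition."""
    st = os.statvfs(partition_path)
    return st.f_bavail * st.f_frsize   # available blocks x fragment (block) size


def partn_avail_insert(partition_path: str, margin: float = 0.95) -> str:
    """Build the sum_partn_avail INSERT using a value safely below the free space."""
    nbytes = int(avail_bytes(partition_path) * margin)   # margin is an arbitrary assumption
    literal = partition_path.replace("'", "''")          # escape quotes in the SQL literal
    return ("INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, "
            "pds_set_num, pds_set_prime) "
            f"VALUES ('{literal}', {nbytes}, {nbytes}, 0, 0);")
```

Run it on <SUMS host> for each partition path and paste the resulting statements into the psql session shown above.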

Creating DRMS User Accounts

For each Linux user, <new DRMS user> who will run installed DRMS modules, or who will write new DRMS modules, you will need to set up their environment, create their DRMS account, and create their .pgpass file so they can run DRMS modules without having to manually authenticate to the database. Just like you did for the <NetDRMS production user>, you will need to set the JSOCROOT and JSOC_MACHINE environment variables by editing their .bashrc files:

# .bashrc
export JSOCROOT=<NetDRMS root>
export JSOC_MACHINE=<architecture>

You will also need to add them to the <SUMS users> group:

$ sudo usermod -a -G <SUMS users> <new DRMS user>

To create a DRMS account, you create a database account for the user, plus you add user-specific rows to various DRMS database tables. The script newdrmsuser.pl exists to facilitate these tasks:

$ perl <NetDRMS root>/base/drms/scripts/newdrmsuser.pl netdrms <PostgreSQL host> <PostgreSQL port> <new DRMS user> <initial password> <new DB user namespace> user 1

where <new DB user namespace> is the PostgreSQL namespace dedicated to the new user. A namespace is a logical container that allows a database user to own database objects, like relations, that have the same name as objects owned by other users - items in a namespace need only be uniquely named within the namespace, not between namespaces. For example, the relation drms_series in the namespace su_arta is not the same relation as the drms_series relation in the su_phil namespace - the relations have the same name, but they are different relations. In virtually all PostgreSQL operations, a user can prefix the name of a relation with the namespace: su_arta.drms_series refers to the first relation, and su_phil.drms_series refers to the second relation.

The purpose of <new DB user namespace> is to hold non-production, private data series - sort of a private user space to develop new DRMS modules to create data. If those data should become production-level products, then the data and the code that generates the data need to be moved to a production namespace. At the JSOC, we have several such production namespaces (e.g., aia, hmi, mdi). A site creates production namespaces with a different module (masterlists); newdrmsuser.pl is only for creating non-production namespaces.

Please see the NOTE in this page for assistance with choosing <new DB user namespace>. The general naming convention is to prepend the namespace with an abbreviation to identify the site that owns the data in the namespace. For example, all private data created at Stanford reside in data series whose namespaces start with su_ (Stanford University), regardless of the affiliation of the user who creates data in this namespace. Data created at NASA Ames start with nas_ (NASA Supercomputing Division). Following the underscore is a string to identify a particular user - su_arta for Art, and su_phil for Phil. You can also specify a group with the suffix (e.g., su_uscsolar for a solar group at the University of Southern California that creates data at Stanford). <initial password> is the initial password for this account - the initial password does not matter much since you are going to have the user change it next.

Running newdrmsuser.pl will create a new DRMS database user that has the same name as the user's Linux account name.

Have the user change their password:

$ whoami
<new DRMS user>
$ psql -h <PostgreSQL host> -p <PostgreSQL port> netdrms
netdrms=> ALTER USER <new DRMS user> WITH PASSWORD '<new password>';
netdrms=> \q
$ 

And then have the user create their .pgpass file (to allow auto-login to their database account) and set permissions to 0600:

$ whoami
<new DRMS user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<new DRMS user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass

See the PostgreSQL documentation on the password file (.pgpass) for additional information.

If you plan on creating data that will be publicly distributed, you should also create one or more data-production users. For example, if you plan on making a new public HMI data series, you could create a user named hmi_production. Although you could follow previous steps to create a new Linux account for this database user, you do not necessarily need to. Instead you can use the existing <NetDRMS production user> and have it connect as the hmi_production user. To do that, first create the new hmi_production database user by running newdrmsuser.pl as just described. Choose a descriptive namespace that follows the naming guidelines described above, like hmi. Because a .pgpass already exists for <NetDRMS production user>, you want to ADD a new line to .pgpass for this user. Continuing with the hmi_production user example, add a password line for hmi_production:

# .pgpass
<PostgreSQL host>:*:*:hmi_production:<hmi_production password>
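libpq uses the password from the first .pgpass line whose host, port, database, and user fields all match the connection (a * field matches anything), and the user name is one of those fields. That is why each database user, such as hmi_production, needs its own line even when one Linux account holds them all. The Python sketch below models that lookup. It is a simplification (it ignores, for example, libpq's special treatment of localhost for Unix-domain socket connections), and the function names are hypothetical.

```python
from typing import List, Optional


def split_pgpass_line(line: str) -> List[str]:
    """Split on unescaped ':' into at most five fields, undoing backslash escapes."""
    fields: List[str] = []
    cur: List[str] = []
    escaped = False
    for ch in line:
        if escaped:
            cur.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":" and len(fields) < 4:   # colons after the user field stay in the password
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    fields.append("".join(cur))
    return fields


def lookup_pgpass(text: str, host: str, port: str, database: str, user: str) -> Optional[str]:
    """Return the password from the first matching line, or None if no line matches."""
    wanted = (host, port, database, user)
    for raw in text.splitlines():
        fields = split_pgpass_line(raw)
        if len(fields) != 5:                  # skip blank, comment, or malformed lines
            continue
        if all(f == "*" or f == w for f, w in zip(fields[:4], wanted)):
            return fields[4]
    return None
```

With the two lines above in one file, connecting as hmi_production picks up the hmi_production password, while connecting as <DRMS DB production user> still uses the original line.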

Set Up the SUMS database

  1. Although the SUMS data cluster and SUMS database have already been created, you must create certain tables and users in this newly created database.
    1. Create the production user in the SUMS database:

      % psql -h <db server host> -p 5434 data_sums -U postgres
      data_sums=# CREATE USER <db production user> PASSWORD '<password>';

    2. Create a read-only user in the SUMS database (so users can read the SUMS DB tables):

      % psql -h <db server host> -p 5434 data_sums -U postgres
      data_sums=# CREATE USER readonlyuser PASSWORD '<password>';
      data_sums=# GRANT CONNECT ON DATABASE data_sums TO readonlyuser;

    3. Put the DRMS production db user into the sumsadmin group:

      % psql -h <db server host> -p 5432 data -U postgres
      data=# GRANT sumsadmin TO <db production user>;

      sum_rm, when run properly by the Linux production user, will attempt to connect to the DRMS database as <db production user>. By putting it into the sumsadmin DB user group, we are giving sum_rm the ability to delete any record in any DRMS data-series record table. This permission is required for the archive == -1 implementation; this is the feature that causes SUMS to delete DRMS records from series whose archive flag is -1 when the DRMS records' SUs are deleted.

    4. Put the production user's password into the .pgpass file. See the PostgreSQL documentation on the password file (.pgpass) for information on its format.

Running SUMS Services

Before you can use NetDRMS, you will need to start SUMS as the <NetDRMS production user>. To launch the SUMS daemon, sumsd.py, use the start-mt-sums.py script:

$ ssh <NetDRMS production user>@<SUMS host>
$ sudo python3 start-mt-sums.py daemon=<path>/sumsd.py ports=6102 --instancesfile=sumsd-instances.txt --logfile=sumsd-6102-20190627.txt --loglevel=info

The complete usage is:

usage: start-mt-sums.py daemon=<path to daemon> ports=<listening ports> [ --instancesfile=<instances file path> ] [ --loglevel=<critical, error, warning, info, or debug>] [ --logfile=<file name> ] [ --quiet ]

optional arguments:
  -h, --help            show this help message and exit
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -l LOGLEVEL, --loglevel LOGLEVEL
                        specifies the amount of logging to perform; in order
                        of increasing verbosity: critical, error, warning,
                        info, debug
  -L <file name>, --logfile <file name>
                        the file to which sumsd logging is written
  -q, --quiet           do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to launch
  p <listening ports>, ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be spawned

start-mt-sums.py will fork one or more sumsd.py daemon processes. The ports argument identifies the SUMS host ports on which sumsd.py will listen for client (DRMS module) requests. One sumsd.py process will be invoked per port specified. The instances file and log file reside in the path identified by the SUMLOG_BASEDIR config.local parameter. The instances file is used to track the running instances of sumsd.py and is used by stop-mt-sums.py to identify running daemons.

To stop one or more SUMS services, use the stop-mt-sums.py script:

$ ssh <NetDRMS production user>@<SUMS host>
$ sudo python3 stop-mt-sums.py daemon=<path>/sumsd.py --ports=6102 --instancesfile=sumsd-instances.txt

The complete usage is:

usage: stop-mt-sums.py [ -h ] daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --quiet ]

optional arguments:
  -h, --help            show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be stopped
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -q, --quiet           do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to halt

Registering for Subscriptions

Remote SUMS will connect to the SUMS DB as the user named by the config.local parameter SUMS_MANAGER.

A NetDRMS site can optionally register for a data-series subscription to any NetDRMS site that offers such a service. The JSOC NetDRMS offers subscriptions, but at the time of this writing, no other site does. Once a site registers for a data series subscription, the site will become a mirror for that data series. The subscription process ensures that the mirroring site will receive regular updates made to the data series by the serving site. The subscribing site can configure the interval between updates such that the mirror can synchronize with the server and receive updates within a couple of minutes, keeping the mirror up-to-date in (almost) real time.

To register for a subscription, run the subscribe.py script (included in the base NetDRMS installation). This script makes subscription requests to the serving site's subscription-manager. The process entails the creation of a snapshot of the data-series at the serving site. Those data are downloaded, via HTTP, to the subscribing site, where they are ingested by subscribe.py. Ingestion results in the creation of the DRMS database objects that maintain and store the data series. At this time, no SUMS data files are downloaded. Instead, and optionally, the IDs for the series' SUMS Storage Units (SU) are saved in a database relation. Other NetDRMS daemons can make use of this relation to automatically download and ingest the SUs into the subscriber's SUMS. The Remote SUMS Client, rsums-clientd.py, manages this list of SUs, making SU-download requests to another client-side daemon, Remote SUMS, rsumsd.py. rsumsd.py accepts SU requests from rsums-clientd.py, downloading SUs via scp - each scp instance downloads multiple SUs.

The automatic download of data-series SUs is optional. They can be downloaded on-demand as well. In fact, if the subscribing NetDRMS site were to automatically download an SU, then delete the SU (there is a method to do this, described later), then an on-demand download is the only way to re-fetch the deleted SU. On-demand downloads happen automatically; any DRMS module that attempts to access an SU (like with a show_info -P command) that is not present for any reason will trigger an rsumsd.py request. The module will pause until the SU has been downloaded, then automatically resume its operation on the previously missing SU.

As rsumsd.py uses scp to automatically download SUs, SSH public-private keys must be created at the subscribing site, and the public key must be provided to the serving site. Setting this up requires coordinated work at both the subscribing and serving sites:

  1. On the subscribing site, run

$ sudo su - <production user>
$ ssh-keygen -t rsa

This will allow you to create a passphrase for the key. If you choose to do this, then save this phrase for later steps. ssh-keygen will create a public key named id_rsa.pub in the .ssh directory in the home directory of <production user>.

  2. Provide id_rsa.pub to the serving site.

  3. The serving site must then add the public key to its list of authorized keys. If the .ssh directory does not exist, then the serving site must first create this directory and give it 0700 permissions. If the authorized_keys file in .ssh does not exist, then it must first be created and given 0644 permissions:

$ sudo su - <subscription production user>
$ mkdir .ssh
$ chmod 0700 .ssh
$ cd .ssh
$ touch authorized_keys
$ chmod 0644 authorized_keys

Once the .ssh and authorized_keys files exist and have the proper permissions, the serving site administrator can then add the client site's public key to its list of authorized keys:

$ sudo su - <subscription production user>
$ cd <subscription production user home directory>/.ssh
$ cat id_rsa.pub >> authorized_keys

  4. If an SSH passphrase was chosen in step 1, then back at the client site, <production user> must start an ssh-agent instance to automate the passphrase authentication. If no passphrase was provided in step 1, this step can be skipped. Otherwise, run (assuming bash syntax - read the man page for csh syntax):

$ sudo su - <production user>
$ ssh-agent > ~/.ssh-agent
$ source ~/.ssh-agent # needed for ssh-add, and also for rsumsd.py and get_slony_logs.pl
$ ssh-add ~/.ssh/id_rsa

To keep ingested data series synchronized with changes made to them at the serving site, a client-side cron job runs periodically. It runs get_slony_logs.pl, a Perl script that uses scp to download "slony log files" - SQL files that insert, delete, or update database relation rows. get_slony_logs.pl communicates with the Slony-I replication software running at the serving site. Slony-I generates these log (SQL) files at the server, and the client then downloads them.

To register for a subscription to a new series, run:

You may find that a subscription has gotten out of sync, for various reasons, with the serving site's data series (accidental deletion of database rows, for example). subscribe.py can be used to alleviate this problem. Run the following to re-do the subscription registration:

Finally, there might come a time when you no longer wish to hold on to a registration. To remove the subscription from your set of registered data series run:

For example, the JSOC maintains time-distance analysis code that is part of the JSOC DRMS code tree but not part of the base NetDRMS package provided to remote sites. A NetDRMS site can install such project code by modifying a configuration file (config.local); this may require the installation of third-party software, such as math libraries and MPI.

Performing a Test Run

At this point, it is a good idea to test your installation. Although you have no DRMS/SUMS data at this point, running show_series is a good way to test various components, like authentication, database connection, etc. To test SUMS, however, you will need to have at least one DRMS data series that has SUMS data. You can obtain such a data series by using the subscription system.

Test DRMS by running the show_series command:

$ show_series

If you see no errors, then life is good.

After you have at least one data series, you can do more thorough testing. For example, you can run:

$ show_info -j <DRMS data series>

To test SUMS (once you have some data files in your NetDRMS), you can run:

$ show_info -P <DRMS record-set specification>

To update to a newer NetDRMS release, simply create a new directory to contain the build, copy the previous config.local into the new <JSOC root> and edit it if new parameters have been added to config.local, and follow the directions for compiling DRMS. Any previous-release daemons that were running will need to be shut down, and the daemons in the newer release started.

Deciding what's next

You may wish to run a JMD or use Remote SUMS. The decision should be discussed with JSOC personnel. Once you've made this decision and installed the appropriate software (see below for Remote SUMS), you'll need to populate your DRMS database with data. For this, you'll need to be a recipient of Slony subscription data. We recommend contacting the JSOC directly to become a subscriber.

Remote SUMS

A local NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units (SUs) associated with these data series too. There are currently two methods to facilitate these SU downloads. The Java Mirroring Daemon (JMD) is a tool that can be installed and configured to download SUs automatically as the series data records are ingested into the local NetDRMS. It fetches these SUs before they are actually used. It can obtain the SUs from any other NetDRMS that has the SUs, not just the NetDRMS that originally produced them. Remote SUMS is a built-in tool that comes with NetDRMS. It downloads SUs as needed - i.e., if a module or program requests the path to the SU or attempts to read it, and it is not present in the local SUMS yet, Remote SUMS will download the SUs. While the SUs are being downloaded, the initiating module or program will poll waiting for the download to complete.

Several components compose Remote SUMS. On the client side (the local NetDRMS), a daemon (rsumsd.py) must be running; there must also exist some database tables, as well as some binaries used by the daemon. On the server side (every NetDRMS site that wishes to act as a source of SUs for the client), there is a CGI program (rs.sh). This CGI returns file-server information (hostname, port, user, SU paths, etc.) for the SUs the server has available in response to requests that contain a list of SUNUMs. When the client encounters requests for remote SUs that are not contained in the local SUMS, it requests the daemon to download those SUs. The client code then polls, waiting for the request to be serviced. The daemon in turn sends requests to the rs.sh CGIs at all the relevant providing sites. The owning sites return the file-server information to the daemon, and then the daemon downloads the SUs the client has requested, via scp, and notifies the client module once the SUs are available for use. The client module then exits from its polling code and continues to use the freshly downloaded SUs.
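The client side of this exchange is essentially a poll loop with a deadline. The following Python sketch is a hypothetical model, not the DRMS implementation: check_status stands in for reading the request's row in the RS_REQUEST_TABLE table, the status strings "complete" and "error" are invented for illustration, and timeout_s plays the role of the download timeout (RS_DLTIMEOUT).

```python
import time
from typing import Callable


def wait_for_su(check_status: Callable[[], str], timeout_s: float,
                interval_s: float = 1.0,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll a download request until it completes; return False on error or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = check_status()        # hypothetical: read the request row's status
        if status == "complete":
            return True                # SU is now in the local SUMS; resume processing
        if status == "error":
            return False
        if clock() >= deadline:
            return False               # give up, as a timed-out download would
        sleep(interval_s)
```

Injecting sleep and clock as parameters lets the loop be exercised without waiting in real time.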

To use Remote SUMS, the config.local configuration file must first be configured properly, and NetDRMS must be re-built. Here are the relevant config.local parameters:

  • JMD_IS_INSTALLED - This must be set to 0 for Remote SUMS use. Currently, either the JMD or the Remote SUMS features can be used, but not both at the same time.
  • RS_REQUEST_TABLE - This is the database table used by the local module and the rsumsd.py daemon running at the local site for communicating SU-download requests. Upon encountering a non-native SUNUM, DRMS will insert a new record into this table to intiate a request for the SUNUM from the owning NetDRMS. The Remote SUMS daemon will service the request and update this record with results.
  • RS_SU_TABLE - This is the database table used by the Remote SUMS daemon to track SUs downloaded from the providing sites.
  • RS_DBHOST - This is the local database-server host that contains the database holding the request and SU tables.
  • RS_DBNAME - This is the database on RS_DBHOST that contains the request and SU tables.
  • RS_DBPORT - This is the port on which the local database-server host accepts connections.
  • RS_DBUSER - This is the database user account that the Remote SUMS daemon uses to manage the Remote SUMS requests.
  • RS_LOCKFILE - This is the path to a file that ensures that only one Remote SUMS daemon instance runs.
  • RS_LOGDIR - This is the directory into which the Remote SUMS daemon logs are written.
  • RS_REQTIMEOUT - This is the timeout, in minutes, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it rejects the request.
  • RS_DLTIMEOUT - This is the timeout, in minutes, for an SU to download. If the time the download takes exceeds this value, then all requests waiting for the SU to download will fail.
  • RS_MAXTHREADS - The maximum number of download threads that the Remote SUMS daemon is permitted to run simultaneously. One thread is one scp call.
  • RS_BINPATH - The path to the NetDRMS binary directory that contains the external programs needed by the Remote SUMS daemon (jsoc_fetch, vso_sum_alloc, vso_sum_put).
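Before re-building, it can help to sanity-check these settings. The following Python 3 sketch assumes that config.local consists of whitespace-separated `NAME value` lines, with blank lines and `#` comments ignored; that format, and the helper names `parse_config` and `check_remote_sums_config`, are assumptions for illustration rather than part of NetDRMS. It reports any missing Remote SUMS parameter, a JMD_IS_INSTALLED value other than 0, and non-positive-integer values for the numeric parameters.

```python
from typing import Dict, List

# Remote SUMS parameters listed in the config.local description.
RS_PARAMS = [
    "RS_REQUEST_TABLE", "RS_SU_TABLE", "RS_DBHOST", "RS_DBNAME", "RS_DBPORT",
    "RS_DBUSER", "RS_LOCKFILE", "RS_LOGDIR", "RS_REQTIMEOUT", "RS_DLTIMEOUT",
    "RS_MAXTHREADS", "RS_BINPATH",
]
# Parameters expected to hold positive integers (port, minutes, thread count).
INT_PARAMS = ["RS_DBPORT", "RS_REQTIMEOUT", "RS_DLTIMEOUT", "RS_MAXTHREADS"]


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'NAME value' lines (ASSUMED format); skip blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


def check_remote_sums_config(params: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if params.get("JMD_IS_INSTALLED") != "0":
        problems.append("JMD_IS_INSTALLED must be 0 to use Remote SUMS")
    for name in RS_PARAMS:
        if not params.get(name):
            problems.append("%s is not set" % name)
    for name in INT_PARAMS:
        value = params.get(name)
        if value and not (value.isdigit() and int(value) > 0):
            problems.append("%s must be a positive integer, got %r" % (name, value))
    return problems
```

You could run `check_remote_sums_config(parse_config(text))` on the contents of your config.local and fix anything it reports before running configure and make.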

After setting up config.local, you must build or re-build NetDRMS:

% cd $JSOCROOT
% ./configure
% make

It is important to ensure that three binaries needed by the Remote SUMS daemon have been built: jsoc_fetch, vso_sum_alloc, vso_sum_put.

Ensure that Python 2.7 or later is installed. You will also need to install the following Python packages if they are not already installed: psycopg2, ...
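A quick way to check these prerequisites is a small Python script run with the interpreter you intend to use for rsumsd.py. This is a sketch that assumes that interpreter is Python 3.4 or later (it relies on `importlib.util.find_spec`). It checks only psycopg2 by default because the rest of the package list is not given here; `missing_prerequisites` is a hypothetical helper, and you would extend its `modules` argument yourself.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple


def missing_prerequisites(modules: Iterable[str] = ("psycopg2",),
                          min_version: Tuple[int, int] = (2, 7)) -> List[str]:
    """Return a list of unmet prerequisites; an empty list means all were found."""
    problems = []
    if sys.version_info[:2] < tuple(min_version):
        problems.append("Python %d.%d or later is required" % tuple(min_version))
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("Python package %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```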

An output log named rslog_YYYYMMDD.txt will be written to the directory identified by the RS_LOGDIR config.local parameter, so make sure that directory exists.
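The log-file naming rule can be expressed as a small Python helper, for example in a monitoring script that needs to locate a given day's log. This is a sketch: `remote_sums_log_path` and `ensure_log_dir` are hypothetical names, and which date the daemon uses in the file name (local time or UTC) is not stated here, so the caller passes the date explicitly.

```python
import datetime
import os


def remote_sums_log_path(log_dir: str, day: datetime.date) -> str:
    """Return <log_dir>/rslog_YYYYMMDD.txt for the given day."""
    return os.path.join(log_dir, "rslog_%s.txt" % day.strftime("%Y%m%d"))


def ensure_log_dir(log_dir: str) -> None:
    """Create log_dir, including missing parents, if it does not already exist."""
    os.makedirs(log_dir, exist_ok=True)
```

Call `ensure_log_dir` with your RS_LOGDIR value once, before the daemon first starts.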

Provide your public SSH key to all providing NetDRMS sites. Each site will need to add that key to its authorized_keys file.

Create the client-side Remote SUMS database tables. Run:

% $JSOCROOT/base/drms/scripts/rscreatetabs.py op=create tabs=req,su

The rsumsd.py daemon must run as the user specified by the RS_DBUSER config.local parameter. As this user, start an ssh-agent process and add your private key to it:

% ssh-agent -c > $HOME/.ssh-agent_rs
% source $HOME/.ssh-agent_rs
% ssh-add $HOME/.ssh/id_rsa

This allows you to protect your key with a passphrase without having to enter that passphrase manually each time the Remote SUMS daemon runs scp.
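scp reaches the agent only through the SSH_AUTH_SOCK environment variable, so the daemon has to be started from an environment in which $HOME/.ssh-agent_rs has been sourced. If you launch rsumsd.py from a wrapper script or cron job instead of from that shell, a Python sketch like the following can rebuild the environment from the csh-style `setenv` lines that `ssh-agent -c` writes. The helper names are hypothetical, and nothing in NetDRMS is known to use them.

```python
import os
import re
from typing import Dict

# Matches csh-style lines written by `ssh-agent -c`, e.g.
#   setenv SSH_AUTH_SOCK /tmp/ssh-XXXX/agent.1234;
SETENV_RE = re.compile(r"^\s*setenv\s+(\w+)\s+([^;\s]+);?")


def parse_agent_env(text: str) -> Dict[str, str]:
    """Extract NAME -> value pairs from csh `setenv` lines; ignore other lines."""
    env = {}
    for line in text.splitlines():
        match = SETENV_RE.match(line)
        if match:
            env[match.group(1)] = match.group(2)
    return env


def daemon_environment(agent_file: str) -> Dict[str, str]:
    """Return a copy of os.environ extended with the agent variables in agent_file."""
    with open(agent_file) as fh:
        agent_vars = parse_agent_env(fh.read())
    if "SSH_AUTH_SOCK" not in agent_vars:
        raise ValueError("no SSH_AUTH_SOCK found in %s" % agent_file)
    env = dict(os.environ)
    env.update(agent_vars)
    return env
```

The resulting dictionary could be passed as `env=` to `subprocess.Popen` when starting the daemon.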

Start SUMS:

% $JSOCROOT/base/sums/scripts/sum_start.NetDRMS >& <log dir>/sumsStart.log

Replace <log dir> with the directory in which you want the log written. Another daemon, sums_procck.py, keeps SUMS running once it has been started; redirecting the output to a log preserves important information that this daemon prints. To stop SUMS, run $JSOCROOT/base/sums/scripts/sum_stop.NetDRMS.

Start the Remote SUMS daemon:

% $JSOCROOT/base/drms/scripts/rsumsd.py
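The RS_LOCKFILE parameter exists so that only one Remote SUMS daemon instance runs. How rsumsd.py actually uses that file is not shown here. As an illustration of the general single-instance pattern only, the following Unix-only Python sketch takes an exclusive, non-blocking `fcntl.flock` on a lock file and returns None if another open file already holds the lock. It would interoperate with rsumsd.py only if the daemon also used flock on the same file, which is an assumption. `try_single_instance_lock` is a hypothetical helper.

```python
import fcntl
import os
from typing import Optional, TextIO


def try_single_instance_lock(path: str) -> Optional[TextIO]:
    """Try to take an exclusive lock on path without blocking.

    Returns the open file (keep it open to hold the lock; closing it releases
    the lock), or None if another open file description already holds it.
    """
    fh = open(path, "a+")
    try:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # Record our PID so a human inspecting the lock file can see the holder.
    fh.seek(0)
    fh.truncate()
    fh.write("%d\n" % os.getpid())
    fh.flush()
    return fh
```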

Subscribing to Series

  • To learn how your institution can use its NetDRMS installation to maintain a mirror of DRMS data that receives real-time updates, see the series-subscription documentation on this wiki.

JsocWiki: DRMSSetup (last edited 2024-01-19 09:08:03 by ArtAmezcua)