In order to process, archive, and distribute the substantial quantity of solar data captured by the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data-management system, NetDRMS. This system comprises two PostgreSQL databases, multiple file systems, a tape backup system, and software to manage these components. Related sets of data are grouped into data series, each of which is, conceptually, a table of data in which each row is typically associated with an observation time or a Carrington rotation. As an example, the data series hmi.M_45s contains the HMI 45-second cadence magnetograms, both the observation metadata and the image FITS files. The columns contain metadata, such as the observation time, the ID of the camera used to acquire the data, the image rotation, etc. One column in this table contains an ID that refers to a set of data files, typically a set of FITS files that contain images.

The Data Record Management System (DRMS) is the subsystem that contains and manages the "DRMS" database of metadata and data-file-locator information. One component is a software library, written in C, that provides client programs, also known as "DRMS modules", with an Application Programming Interface (API) that allows users to access these data. The Storage Unit Management System (SUMS) is the subsystem that contains and manages the "SUMS" database and associated storage hardware. The database contains the information needed to locate data files that reside on that hardware. The system as a whole is typically referred to as DRMS. The user interfaces with the DRMS subsystem only, and the DRMS subsystem interfaces with SUMS - the user does not interact with SUMS directly. The JSOC provides NetDRMS to non-JSOC institutions so that those sites can take advantage of the JSOC-developed software to manage large amounts of solar data.

A NetDRMS site is an institution with a local NetDRMS installation. It does not generate the JSOC-owned production data series (e.g., hmi.M_720s, aia.lev1) that Stanford generates for scientific use. A NetDRMS site can generate its own data, production or otherwise, and can create software that uses NetDRMS to generate its own data series. But it can also act as a "mirror" for individual data series. When acting as a mirror for a Stanford data series, the site downloads DRMS database information from Stanford and stores it in its own NetDRMS database, and it downloads SUMS files and stores them in its own SUMS subsystem. As the data files are downloaded to the local SUMS, the SUMS database is updated with the information needed to manage them. It is possible for a NetDRMS site to mirror the DRMS data of any other NetDRMS site, but at present the only site whose data are mirrored is the Stanford JSOC.
=== Installing NetDRMS for the First Time ===
The initial installation of NetDRMS requires installing database software, adding one or more new users, allocating a fair bit of additional disk space for file storage, and installing, configuring and compiling the custom NetDRMS code.

The entire NetDRMS system involves, from base to top:
 a. Two instances of the PostgreSQL database server, plus the users, procedures, and data tables within those databases
 a. NetDRMS software written mainly in C, with some embedded Postgres calls and some Python v2.7 or higher. There are two pieces to this software: DRMS and SUMS. Each is compiled/made separately. It also requires several third-party libraries, such as cfitsio, math libraries, and MPI.
 a. If you want to receive replicated data from the JSOC, you'll need to install some scripts and third-party libraries for tar and curl, and work with your ssh keys and software called hpn-ssh.
 a. If you want to be a distributor of data, you'll need to install a 'JMD' java/derby database system and possibly Slony replication software.
 a. If you are a VSO installation, you'll need to run a web server and install further Perl code.

When installing NetDRMS, it is best to install the components in the nested order listed above, testing each phase for success as you go. Don't move on to the next piece of the installation until you are reasonably assured that the software installed in the prior step works as planned.

First, you will need to create a few linux users and groups, giving them the needed permissions (see step 1 below). Second, you will need to install the PostgreSQL Relational Database Management System (PG) and create two databases (see step 2 below). Third, you will need to establish disk storage for SUMS (see "Setting up a SUMS" below). Fourth, you will need to install third-party libraries needed by DRMS and SUMS (see X below). Fifth, you will need to build and install SUMS (see X below).

To install NetDRMS and SUMS, please follow these directions in order. All accounts/paths/ports/etc. referenced can be modified, but we recommend not doing this unless you are certain they must be different. Debugging issues from Stanford becomes difficult if every site does things differently. The accounts/paths/ports/etc. listed below are the ones used on Stanford's test NetDRMS (on the machine "shoom").

 0. Download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms or $HOME/drms or /opt/netdrms. The size of the source distribution is currently about 10 MB. A built system (including SUMS) is typically about 300 MB. In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration.

You may wish to create a symbolic link (symlink) to the NetDRMS directory. E.g., your code is really in /opt/netdrms87/, but you have a link /opt/netdrms/ that points to whatever your most current NetDRMS code directory is. This facilitates updates without having to edit environment variables. Once you've decided where to put the code, untar it and have a look at it. In particular, read the config.local.template file. You will need to copy, rename, and then adjust this file (as config.local) for your own site, but do that later. For now, read config.local.template, since these installation instructions reference its variables often.
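For example, a minimal sketch of this arrangement, assuming the hypothetical directories /opt/netdrms87 and /opt/netdrms92 hold successive releases:
{{{
# hypothetical paths - adjust for your site
ln -s /opt/netdrms87 /opt/netdrms
# after installing a newer release, just repoint the link:
ln -sfn /opt/netdrms92 /opt/netdrms
}}}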

When you do create your config.local file, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER.

Bear in mind that you may have to change the ownership and permissions on the $DRMS directory as you go through the install process and determine the user that will run the code.

 1. Set up your existing linux environment to accept NetDRMS (to be done by a superuser or someone with sudo privileges)
  a. Create a ''production'' linux user (named production by default). The name of this user is the value of the SUMS_MANAGER parameter in the config.local file. If necessary, modify the sudoers file to include the name of the production user so that this user has the privileges necessary to run a setuid program, sum_chmown, that is part of the SUMS-installation package:<<BR>><<BR>>{{{<production user> <host>=NOPASSWD:<path to sum_chmown>}}}<<BR>><<BR>>This will allow sum_chmown to be run without a password prompt being presented. Other sites have configured their ''production'' user to have highly specific ownership permissions as an alternative to giving the user sudo privileges, and nullified the sum_chmown script since all their data is written only by one user.
  a. Create a linux group to which the production user belongs, e.g. ''sumsadmin''. All users who will be using the NetDRMS system to access or create SUMS data files must also belong to this group.
  a. Ensure that the production user can connect to the database without being prompted for a password. To do this, create a .pgpass file in the production user's home directory. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file. It is important that the permissions for the .pgpass file are set to 600, readable only to the individual user. You may wish to wait on this step until you install Postgres - you will need to adjust your pg_hba.conf settings in Postgres in order for the .pgpass file to correctly work.
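   A minimal sketch of a .pgpass entry (the host, database, user, and password shown are hypothetical placeholders; the format is host:port:database:user:password):
   {{{
echo 'dbserver.example.edu:5432:data:production:secret' >> $HOME/.pgpass
chmod 600 $HOME/.pgpass   # .pgpass is ignored unless readable only by its owner
   }}}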
  a. Create a linux user named "postgres". This is the user that will own all of the Postgres data files. It is also the user that will run the server daemon process (postgres).
  a. Each user of DRMS, including the production user, must set two environment variables in their environment:<<BR>><<BR>>{{{setenv JSOCROOT <DRMS source tree root>}}}<<BR>>{{{setenv JSOC_MACHINE <OS and CPU>}}}<<BR>><<BR>>where <DRMS source tree root> is the root of the DRMS source tree installed by the production linux user, and <OS and CPU> is "linux_x86_64", if DRMS was installed on a machine with a Linux OS and a 64-bit processor, or "linux_avx", if DRMS was installed on a machine with a Linux OS and a 64-bit processor that supports Advanced Vector Extensions (which supports an extended instruction set). Again, you may wish to have the NetDRMS software installed and compiled before you put the $JSOC_MACHINE variable into play.
  a. Create the SUMS log directory on the SUMS server machine, if it does not already exist. The name/path for this directory is defined in config.local in the SUMS_LOG_BASEDIR parameter. The actual directory must match the value of this parameter, which defaults to /usr/local/logs/SUM. You are free to change this path in SUMS_LOG_BASEDIR. This directory must be writeable by the linux ''production'' user.
 1. Set up the Postgres database.
  a. Install server version 8.4 (this is the only version supported by Stanford) on a dedicated machine. Obtain the latest 8.4 rpm binaries from ftp://ftp.postgresql.org/pub/binary/. If you are not going to become a provider or generator of Slony data, you can install later versions of Postgres; versions up to v9.3 have been proven at other data sites. Slony is the database-replication software that the JSOC at Stanford uses to distribute records, and its version is presently tied to Postgres 8.4.x.
  a. Install the client software, version 8.4 or your chosen server version, on all machines that will be used to either access the database server or build DRMS software. All DRMS software must connect to the DRMS and SUMS databases. To do so, it must be linked against static and/or dynamic libraries that allow database access. These libraries are a component of the Postgres client software, so it must be installed on machines used to build DRMS software. Some dynamic libraries are involved, so the host on which this software is run must also have the Postgres client software installed.
  a. Create a database cluster for the DRMS data. A database cluster is a storage area on disk that contains the data for one or more databases. The storage area is implemented as a directory (the ''data directory'') and it is managed by a single instance of a Postgres server process. To create this cluster (data directory), first log-in as the linux user postgres, and then run the initdb command:<<BR>><<BR>>{{{initdb --locale=C -D /var/lib/pgsql/data}}}<<BR>><<BR>>This will create the data directory /var/lib/pgsql/data on the database server host. If you want to place the data in a different directory, go right ahead and change the -D parameter value. The "--locale" argument will set cluster locale to "C". Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc. PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system. We recommend "C" and make no guarantees what will happen to your formatting if you deviate.
  a. Create a database cluster for the SUMS data. This cluster is distinct from the cluster for the DRMS data, and it is maintained by a separate server instance:<<BR>><<BR>>{{{initdb --locale=C -D /var/lib/pgsql/data_sums}}}<<BR>><<BR>>This will create the data directory /var/lib/pgsql/data_sums on the database server host (or wherever you've decided to put the cluster with the -D parameter).
  a. Edit the Postgres configuration files - you will have these in two different places, one for each initdb you created. The configuration files are cluster-specific, and they reside in the data directory created by the initdb command. These are the key parameters which will determine your database efficiency and security. A complete list of all modifiable parameters can be found in the Postgres online documentation, but a couple are worth mentioning now.
   i. listen_addresses (in postgresql.conf) is a list of IP addresses from which connections can be made. By default the value of the parameter is "localhost", which disallows IP connections from all machines, except the machine hosting the database server process. This is not what you want. The single-quoted string '*' will allow connections from all machines. If you want to be more restrictive, you can simply provide a comma-separated list of hostnames or IP addresses.
   i. port (in postgresql.conf) is the port on which the server listens for connections. If you create more than one cluster on the host server machine (e.g., if you create both the DRMS and SUMS clusters on a single host), then you'll need to change the port number for at least one cluster (you cannot have two server processes listening for connections on the same port). We suggest using port 5432 for the DRMS cluster (port = 5432 - no quotes), and port 5434 for the SUMS cluster. Note that port 5432 is the default port for Postgres.
   i. logging_collector (in postgresql.conf). Set this to 'on' so that the output of the Postgres server process will be captured into log files and rotated once per day.
   i. log_rotation_size (in postgresql.conf). Set this to 0. This will cause PG to emit one log every day (as opposed to starting a new log after the previous log is a certain size).
   i. log_min_duration_statement (in postgresql.conf). Set this to 1000 so that only queries that are greater than 1000 ms in run time will be logged. Otherwise, the log files will quickly get out of hand.
   i. shared_buffers. Set this to more than the default of 128 MB. This determines how much memory the database makes available to your processes, and most machines have more memory to devote than the default assumes. You may also wish to adjust the values for work_mem, maintenance_work_mem, and max_stack_depth, but consult the Postgres manual for a better understanding.
   i. The pg_hba.conf file. This file contains lines of the form<<BR>><<BR>>{{{<connection type> <databases> <user> <IP address> <IP mask> <authentication method>}}}<<BR>><<BR>>if you wish to use an IP-address mask to specify a range of IP addresses, or<<BR>><<BR>>{{{<connection type> <databases> <user> <CIDR-address> <authentication method>}}}<<BR>><<BR>>if you wish to use a CIDR address to specify the range. To get yourself up and running, you'll need to add a line or two to this file. To allow access by one host, we suggest<<BR>><<BR>>{{{host all all XXX.XXX.XXX.XXX 255.255.255.255 md5}}}<<BR>><<BR>>or<<BR>><<BR>>{{{host all all XXX.XXX.XXX.XXX/32 md5}}}<<BR>><<BR>>For multiple-host access, we suggest<<BR>><<BR>>{{{host all all XXX.XXX.XXX.0 255.255.255.0 md5}}}<<BR>><<BR>>or<<BR>><<BR>>{{{host all all XXX.XXX.XXX.0/24 md5}}} The md5 authentication method is what triggers the use of users' .pgpass files. You may also wish to comment out the default {{{local all all trust}}} line - it allows anyone on the local machine to log in with no password, and is not secure. Once you've commented out that line, you will no longer be able to log in without a correctly made .pgpass file. Please note that whenever you make changes to pg_hba.conf, you will need to restart the database server for the changes to take effect.
 1. The remainder of the instructions requires that the Postgres servers (there is one for the DRMS cluster, and one for the SUMS cluster) be running. To start up the server instances, run:<<BR>><<BR>>{{{su postgres}}}<<BR>>{{{pg_ctl start -D /var/lib/pgsql/data # start the DRMS-database cluster server}}}<<BR>>{{{pg_ctl start -D /var/lib/pgsql/data_sums -o "-p 5434" # start the SUMS-database cluster server}}}<<BR>><<BR>> The server logs will be placed in the pg_log subdirectory for each cluster.
 1. Create the DRMS database in the DRMS cluster, and create the SUMS database in the SUMS cluster:<<BR>><<BR>>{{{su postgres}}}<<BR>>{{{createdb --locale C -E LATIN1 -T template0 data # create the DRMS database in the DRMS-database cluster}}}<<BR>>{{{createdb --locale C -E LATIN1 -T template0 -p 5434 data_sums # create the SUMS database in the SUMS-database cluster}}}. NOTE: The -E flag sets the character encoding of the characters stored in the database. LATIN1 is not a great choice (it would have been better to have used SQL_ASCII or UTF8), but that is what was chosen at Stanford so we're stuck with it, which means remote sites that have become series subscribers are stuck with it too.
 1. Install the required DB-server languages:<<BR>><<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plpgsql data # Add the plpgsql language to the DRMS database}}}<<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plperl data # Add the plperl language to the DRMS database}}}<<BR>>{{{createlang -h <db server host> -p 5432 -U postgres plperlu data # Add the plperlu 'untrusted' language to the DRMS database}}}<<BR>><<BR>>At this time, there are no auxiliary languages needed for the SUMS database.
 1. Create various tables and DRMS database functions needed by the DRMS library. You will need the NetDRMS source code for this:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 -U postgres data -f $JSOCROOT/base/drms/scripts/NetDRMS.sql # Create the 'admin' schema and tables within this schema; create the 'drms' schema}}}<<BR>>{{{# Create the SUMSADMIN database user}}}<<BR>>{{{su postgres}}}<<BR>>{{{cd $JSOCROOT/base/drms/scripts}}}<<BR>>{{{./createpgfuncs.pl data # Create functions in the DRMS database}}}
 1. Create database accounts for DRMS users. To use DRMS software/modules, a user of this software must have an account on the DRMS database (a DRMS series is implemented as several database objects). The software, when run, will log into a user account on the DRMS database - by default, the name of the user account is the name of the linux user account that the DRMS software runs under.
   a. Run the newdrmsuser.pl script - you will be prompted for the postgres dbuser password:<<BR>><<BR>>{{{$JSOCROOT/base/drms/scripts/newdrmsuser.pl data <db server host> 5432 <db user> <initial password> <db user namespace> user 1}}}<<BR>><<BR>>where <db user> is the name of the user whose account is to be created and <db user namespace> is the namespace DRMS should use when running as the db user and reading or writing database tables. The namespace is a logical container of database objects, like database tables, sequences, functions, etc. The names of all objects are qualified by the namespace. For example, to unambiguously refer to the table "mytable", you prepend the name with the namespace. So, for example, if this table is in the su_production namespace (container), then you refer to the table as "su_production.mytable". In this way, there can be other tables with the same name, but that reside in a different namespace (e.g., su_arta.mytable is a different table that just happens to have the same name). Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this page]] for assistance with choosing a namespace. <initial password> is the initial password for this account.
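    For instance, a hypothetical invocation that creates an account for the linux user art, with initial password changeme and namespace su_art:
    {{{
# all values here are hypothetical placeholders
$JSOCROOT/base/drms/scripts/newdrmsuser.pl data dbserver.example.edu 5432 art changeme su_art user 1
    }}}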
   a. Have the user that owns the account change the password:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 data}}}<<BR>>{{{data=> ALTER USER <db user> WITH PASSWORD '<new password>';}}}<<BR>><<BR>>where <new password> is the replacement for the original password. It must be enclosed in single quotes.
   a. Have the user put their password in their .pgpass file. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file. This file allows the user to login to their database account without having to provide a password at a prompt. As you come to this point, it would be wise to test that your own logins work with your .pgpass file. You may have a mis-configuration in your pg_hba.conf file that would make it appear that .pgpass was not working.
   a. Create a db account for the linux production user (the name is the value of the SUMS_MANAGER parameter in config.local). The name of the database user for this linux user is the same as the name of the linux user (typically 'production'). Follow the previous steps to create this database account.
   a. Create a password for the sumsadmin DRMS database user, following the "ALTER USER" directions above. The user was created by the newdrmsuser.pl script above.
   a. OPTIONALLY, create a table to be used for DRMS version control:<<BR>>{{{psql -h <db server host> -p 5432 -U <postgres administrator> data}}}<<BR>>{{{CREATE TABLE drms.minvers(minversion text default '1.0' not null);}}}<<BR>>{{{GRANT SELECT ON drms.minvers TO public;}}}<<BR>>{{{INSERT INTO drms.minvers(minversion) VALUES(<version>);}}}<<BR>>where <version> is the minimum DRMS version that a DRMS module must have before it can connect to the DRMS database.
 1. Set up the SUMS database. Although the SUMS data cluster and SUMS database have already been created, you must create certain tables and users in this newly created database.
   a. Create the production user in the SUMS database:<<BR>><<BR>>{{{$JSOCROOT/base/drms/scripts/newdrmsuser.pl data_sums <db server host> 5434 <db production user> <password> <db production user namespace> sys 1}}}<<BR>><<BR>>where <db production user namespace> is the namespace. Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this link]] for assistance with choosing a namespace for the production user.
   a. Put the production db user into the sumsadmin group:<<BR>><<BR>>{{{psql -h <db server host> -p 5432 data -U postgres}}}<<BR>>{{{postgres=> GRANT sumsadmin TO <db production user>;}}}<<BR>><<BR>>
   a. Put the production user's password into the .pgpass file. Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for information on the .pgpass file.
    a. Create the SUMS database tables:<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U production -f scripts/create_sums_tables.sql data_sums}}}<<BR>>{{{ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>}}}<<BR>><<BR>>where <min val> is <drms site code> << 48, <max val> is <min val> + 281474976710655 (2^48 - 1), and <drms site code> is the value of the DRMS_SITE_CODE parameter in config.local.
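    As a worked example, assuming a hypothetical DRMS_SITE_CODE of 100: <min val> = 100 << 48 = 28147497671065600, and <max val> = 28147497671065600 + 281474976710655 = 28428972647776255, so the sequence would be altered as follows:
    {{{
# hypothetical values computed from a site code of 100
psql -h <db server host> -p 5434 -U production data_sums -c "ALTER SEQUENCE sum_ds_index_seq START 28147497671065600 RESTART 28147497671065600 MINVALUE 28147497671065600 MAXVALUE 28428972647776255"
    }}}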
   a. Grant elevated privileges to these tables to the db production user (the scripts should be modified to do this):<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U postgres data_sums}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_tape TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_ds_index_seq,sum_seq TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_file,sum_group,sum_main,sum_open TO production;}}}<<BR>>{{{data_sums=> GRANT ALL ON sum_partn_alloc,sum_partn_avail TO production;}}}<<BR>><<BR>>
    a. SUMS data files are organized into "partitions" which are implemented as directories. Each partition must be named /SUM[0-9]* (e.g., /SUM, /SUM0, /SUM101). Each directory must be owned by the production linux user (e.g., "production"). The file-system group to which the directories belong, the SUMS user group (e.g., SOI), must also contain all DRMS users. So, if linux user art will be using DRMS and running DRMS modules, then art must be a member of the SUMS user group. You are free to create as few or as many of these partitions as you desire. Create these directories now.<<BR>><<BR>>NOTE: Please avoid using file systems that limit the number of directories and/or files. For example, the EXT3 file system limits the number of directories to 64K. That number is far too small for SUMS usage.
   a. Initialize the sum_partn_avail table with the names of these partitions. For each SUMS partition run the following:<<BR>><<BR>>{{{psql -h <db server host> -p 5434 -U postgres data_sums}}}<<BR>>{{{data_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);}}}<<BR>><<BR>>where <SUMS partition path> is the full path of the partition (the path must be enclosed in single quotes) and <avail bytes> is some number less than the number of bytes in the directory (multiply the number of blocks in the directory by the number of bytes per block). The number does not matter, as long as it is not bigger than the total number of bytes available. SUMS will adjust this number as needed.
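    One way to derive a value for <avail bytes> is shown in this sketch, which assumes GNU df and a hypothetical partition /SUM0:
    {{{
# bytes currently available on the partition's file system;
# any value no larger than this will do
df -B1 --output=avail /SUM0
    }}}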

=== Compiling NetDRMS ===
The configuration and compilation of NetDRMS described here can proceed largely independently of the site and/or user setup, which only needs to be done once. It is recommended that the site setup be done first, as the NetDRMS build requires the definition of certain site-dependent names, such as those of the database and server; however, if these names are already known, the libraries can be built without the database and SUMS storage in place. Of course, any code that requires access to the database will not function until the DRMS and SUMS services have been set up.

These instructions assume that there is already a NetDRMS database server and associated SUMS server that you can connect to. If that is not the case, then you or someone else at your site will first have to do a Site Installation (above). You must also have the PostgreSQL Core installed at least as a client library on any machine on which you intend to build the package. You should have psql in your path.

If you have not already done so, download the NetDRMS Distribution. This is a gzipped tarfile. Unpack it into a target root directory of your choice, e.g. /usr/local/drms, /opt/netdrms/ or $HOME/drms. In the target root directory (hereinafter referred to as $DRMS), you must supply a config.local file describing your site configuration. If V 2.7 or higher has been installed by your site administrator, you should simply copy or link to their version of the file. For site administrators:

If you had not previously installed a V 2.7 release or higher, you should create the config.local file fresh. You can do so either by copying one from the file config.local.template and editing it to supply the appropriate values, or by running the perl script netdrms_setup.pl which will walk you through the fields. (That script has not been widely tested, and might require some tweaking. In particular it tries to execute some additional scripts at the end that are not yet in the release.)

Most of the entries in the file should be self-explanatory. It is essential that the first variable, LOCAL_CONFIG_SET, be changed from NO or commented out. Other variables that are almost certain to require changes are DBSERVER_HOST, DRMS_DATABASE, SUMS_SERVER_HOST, and DRMS_SITE_CODE. If you intend to export as well as import data, your DRMS_SITE_CODE must be registered. See the site code page for a list of currently assigned codes.

However you create your config.local file, as previously stated, it is a good idea to save a copy in a directory outside your $DRMS directory; the SUMS_LOG_BASEDIR would be a good place to keep it if you are the SUMS_MANAGER. Other users' config.local files should match that of the SUMS_MANAGER in any case. In the target root directory $DRMS, run

{{{
./configure
}}}

This simply builds a set of links for include files, man pages, scripts, and jsd (JSOC Series Descriptor) files in common subdirectories below the root. Note that it is a csh script. If you do not have csh or tcsh installed on your system, you will have to make those links yourself. (Chances are that you will have to perform the whole site configuration by hand.) The NetDRMS distribution is currently supported for three target architectures under Linux, named (by default): linux_ia32 (`uname -s` = Linux, `uname -m` = ia32 | i686 | i386), linux_x86_64 (`uname -s` = Linux, `uname -m` = x86_64), and linux_avx. The distribution has been built on both Enterprise Linux versions 4 and 5. Enterprise 5 has a system bug that needs to be fixed in order to build the SUMS server (it does not affect the DRMS client). See the platform notes for instructions on how to fix this bug.

If you are making on any other architecture, the target name will be custom. Binaries and libraries will be placed in appropriate subdirectories based on these names. If you will be making on multiple architectures, or if you wish to change the target architecture name, you should either add the following line near the beginning of the file $DRMS/make_basic.mk

{{{
JSOC_MACHINE = name
}}}

or set your environment variable JSOC_MACHINE to name before running the make. The latter is recommended for future use, so that you can set appropriate paths in your login or shell initialization scripts. If necessary, edit the file $DRMS/make_basic.mk to set your compiler options. The default compilers for Linux are the Intel compiler icc and ifort if available; otherwise gcc and gfortran. If you prefer to use different compilers, change the following two lines in the file accordingly:

{{{
COMPILER = icc
FCOMPILER = ifort
}}}

Note that the DRMS Fortran API requires a Fortran 90 compiler. The Fortran compiler is only required if you wish to build Fortran modules that will link against the DRMS library; nothing in the DRMS and SUMS internals and applications uses Fortran. Besides ifort, the gfortran43 compiler should work; there may be a problem with f95. Note that you can only build on a system on which the Postgres SQL Client Applications libraries exist (e.g. libecpg.a). You will also require the OpenSSL secure-sockets toolkit; you should have a /usr/include/openssl directory or equivalent on your system where the compiler can locate it by default. N.B. If you are using the icc compiler, it is recommended to use version 11; there are some very nasty bugs in version 10.*.

In the root directory $DRMS, type make. If all goes well, the directory $DRMS/bin/arch_name will be created and filled, likewise the library directory $DRMS/lib/arch_name. If you are building on multiple architectures, repeat this step on each one, being careful to observe the rules in the previous three steps. These instructions should suffice for all users except the manager, who needs to initialize the database and/or start the SUMS server. If you do not need to start a SUMS server, you are done. The SUMS manager (production user) should continue with the next step.
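A minimal sketch of the resulting build flow, assuming a 64-bit Linux host where JSOC_MACHINE resolves to linux_x86_64:
{{{
cd $DRMS
./configure
make
# binaries and libraries land in architecture-specific subdirectories
ls bin/linux_x86_64 lib/linux_x86_64
}}}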

There are two parts to setting up NetDRMS. First, the necessary services must be set up at the institution or group that will be hosting the NetDRMS service. The basic preparation and installation only needs to be done once, although the actual software distribution may be updated from time to time without affecting the setup. Second, individual users may wish to set up the NetDRMS software distribution for use or development in their own environment. Again, there are a few administrative tasks that need to be performed once when a user is registered, but the software may be updated or rebuilt at any time. Once the site preparation and setup is complete, user setup is a simple task, so there are two sets of instructions. Most users only need to concern themselves with the second, Installing / Upgrading NetDRMS.

=== Building and installing SUMS ===

To make the SUMS server available, follow the steps below; only the SUMS manager needs to run ''make sums'' in the DRMS root directory. This only needs to be done once for the system; individual users do not need to do it. At this point, if you are the SUMS manager, you are ready to proceed with the configuration, build, and start of the SUMS services; proceed to the SUMS setup instructions. Otherwise you are ready to go. Please note that you will see many, many warning messages as NetDRMS and SUMS compile - pages and pages of warnings will likely appear. Unless you see an actual error, you should be okay to proceed.

 1. Build the SUMS binaries:<<BR>><<BR>>{{{su - <production user>; cd $JSOCROOT; ./configure; make sums}}}<<BR>><<BR>>
 1. Copy the sum_chmown program to <path to sum_chmown> (chosen in step 1a. above), make the production user the owner, and give it setuid privileges:<<BR>><<BR>>{{{su - root}}}<<BR>>{{{cp $JSOCROOT/drms/_linux_x86_64/base/sums/apps/sum_chmown <path to sum_chmown>}}}<<BR>>{{{chown root:root <path to sum_chmown>}}}<<BR>>{{{chmod u+s <path to sum_chmown>}}}<<BR>><<BR>> Note: some sites have made this program into a program that does nothing when called. These sites have only one user that writes files to sums, however, and need not be concerned about different users with different permissions writing files to sums.
 1. Start SUMS: <<BR>><<BR>>{{{$JSOCROOT/base/sums/scripts/sum_start.NetDRMS}}}<<BR>><<BR>>The script does not return a prompt after echoing "sum_svc now available". Just hit RETURN.
 1. To stop SUMS for any reason, run this script:<<BR>><<BR>>{{{$JSOCROOT/base/sums/scripts/sum_stop.NetDRMS}}}<<BR>><<BR>>


== Remote SUMS ==
A local NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units (SUs) associated with these data series too. There are currently two methods to facilitate these SU downloads. The Java Mirroring Daemon (JMD) is a tool that can be installed and configured to download SUs automatically as the series data records are ingested into the local NetDRMS. It fetches these SUs before they are actually used. It can obtain the SUs from any other NetDRMS that has the SUs, not just the NetDRMS that originally produced them. Remote SUMS is a built-in tool that comes with NetDRMS. It downloads SUs as needed - i.e., if a module or program requests the path to the SU or attempts to read it, and it is not present in the local SUMS yet, Remote SUMS will download the SUs. While the SUs are being downloaded, the initiating module or program will poll waiting for the download to complete.

Several components compose Remote SUMS. On the client side (the local NetDRMS) is a daemon that must be running (rsumsd.py). There must also exist some database tables, as well as some binaries used by the daemon. On the server side (any NetDRMS site that wishes to act as a source of SUs for the client) is a CGI (rs.sh). This CGI returns file-server information (hostname, port, user, SU paths, etc.) for the SUs the server has available, in response to requests that contain a list of SUNUMs. When the client encounters requests for remote SUs that are not contained in the local SUMS, it asks the daemon to download those SUs. The client code then polls, waiting for the request to be serviced. The daemon in turn sends requests to the rs.sh CGIs at all the relevant providing sites. The owning sites return the file-server information to the daemon, and then the daemon downloads the SUs the client has requested, via scp, and notifies the client module once the SUs are available for use. The client module will then exit from its polling code and continue to use the freshly downloaded SUs.

To use Remote SUMS, the config.local configuration file must first be configured properly, and NetDRMS must be re-built. Here are the relevant config.local parameters:
 * JMD_IS_INSTALLED - This must be set to 0 for Remote SUMS use. Currently, either the JMD or the Remote SUMS features can be used, but not both at the same time.
 * RS_REQUEST_TABLE - This is the database table used by the local module and the rsumsd.py daemon running at the local site for communicating SU-download requests. Upon encountering a non-native SUNUM, DRMS will insert a new record into this table to initiate a request for the SUNUM from the owning NetDRMS. The Remote SUMS daemon will service the request and update this record with results.
 * RS_SU_TABLE - This is the database table used by the Remote SUMS daemon to track SUs downloaded from the providing sites.
 * RS_DBHOST - This is the local database-server host that contains the database holding the request and SU tables.
 * RS_DBNAME - This is the database on that host that contains the request and SU tables.
 * RS_DBPORT - This is the port on which the local database server accepts connections.
 * RS_DBUSER - This is the database user account that the Remote SUMS daemon uses to manage the Remote SUMS requests.
 * RS_LOCKFILE - This is the path to a file that ensures that only one Remote SUMS daemon instance runs.
 * RS_LOGDIR - This is the directory into which the Remote SUMS daemon logs are written.
 * RS_REQTIMEOUT - This is the timeout, in minutes, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.
 * RS_DLTIMEOUT - This is the timeout, in minutes, for an SU to download. If the time the download takes exceeds this value, then all requests waiting for the SU to download will fail.
 * RS_MAXTHREADS - The maximum number of download threads that the Remote SUMS daemon is permitted to run simultaneously. One thread is one scp call.
 * RS_BINPATH - The NetDRMS-binary-path that contains the external programs needed by the Remote SUMS daemon (jsoc_fetch, vso_sum_alloc, vso_sum_put).

After setting up config.local, you must build or re-build NetDRMS:
{{{
> cd $JSOCROOT
> ./configure
> make
}}}
It is important to ensure that three binaries needed by the Remote SUMS daemon have been built: jsoc_fetch, vso_sum_alloc, vso_sum_put.
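One way to check for them, a sketch that assumes JSOC_MACHINE resolves to linux_x86_64:
{{{
# all three paths should exist after the build
ls $JSOCROOT/bin/linux_x86_64/jsoc_fetch \
   $JSOCROOT/bin/linux_x86_64/vso_sum_alloc \
   $JSOCROOT/bin/linux_x86_64/vso_sum_put
}}}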

Ensure that Python >= 2.7 is installed. You will need to install some packages if they are not already installed: psycopg2, ...
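For example, psycopg2 (the PostgreSQL adapter for Python) can typically be installed with pip, though your package manager may differ:
{{{
pip install psycopg2
}}}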

An output log named rslog_YYYYMMDD.txt will be written to the directory identified by the RS_LOGDIR config.local parameter, so make sure that directory exists.

Provide your public SSH key to all providing NetDRMS sites. They will need to put that key in their authorized_keys file.
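A sketch, assuming an RSA key pair at the default path:
{{{
# generate a key pair if one does not already exist
ssh-keygen -t rsa -f $HOME/.ssh/id_rsa
# send the providers the PUBLIC half; it belongs in their ~/.ssh/authorized_keys
cat $HOME/.ssh/id_rsa.pub
}}}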

Create the client-side Remote SUMS database tables. Run:
{{{
> $JSOCROOT/base/drms/scripts/rscreatetabs.py op=create tabs=req,su
}}}

Start the rsumsd.py daemon as the user specified by the RS_DBUSER config.local parameter. As this user, start an ssh-agent process and add the public key to it:
{{{
> ssh-agent -c > $HOME/.ssh-agent_rs
> source $HOME/.ssh-agent_rs
> ssh-add $HOME/.ssh/id_rsa
}}}
This allows you to use a public-private key pair that has a passphrase while obviating the need to manually enter that passphrase each time the Remote SUMS daemon runs scp.

Start SUMS:
{{{
> $JSOCROOT/base/sums/scripts/sum_start.NetDRMS >& <log dir>/sumsStart.log
}}}
Substitute your favorite log directory for <log dir>. There is another daemon, sums_procck.py, that keeps SUMS up and running once it is started. Redirecting to a log will preserve important information that this daemon prints. To stop SUMS, use $JSOCROOT/base/sums/scripts/sum_stop.NetDRMS.

Start the Remote SUMS daemon:
{{{
> $JSOCROOT/base/drms/scripts/rsumsd.py
}}}

Installing the NetDRMS system requires:
 * installing PostgreSQL [ [[#install-pg|Installing PostgreSQL]] ]
 * instantiating a PostgreSQL cluster for two databases (one for DRMS and one for SUMS) [ [[#initialize-pg|Initializing PostgreSQL]] ]
 * installing CFITSIO [ [[#install-cfitsio|Installing CFITSIO]] ]
 * installing the DBD::Pg Perl package [ [[#install-perl-dbdpg|Installing DBD::Pg]] ]
 * installing packages to the system Python 3, or installing a new distribution, like Anaconda [ [[#install-python3|Installing Python3]] ]
 * installing {{{openssl}}} development packages [ [[#install-openssldev|Installing OpenSSL Development Packages]] ]
 * installing the NetDRMS software code tree, which includes code to create DRMS libraries and modules and SUMS libraries [ [[#install-netdrms|Installing NetDRMS]] ]
 * initializing SUMS storage such as hard drives or SSD drives [ [[#initialize-sums-disk|Initializing SUMS Storage]] ]
 * creating DRMS user accounts [ [[#create-users|Creating DRMS User Accounts]] ]
 * running the SUMS daemon (which accepts and processes SUMS requests from DRMS clients) [ [[#run-sums|Running SUMS]] ]

Optional steps include:
 * registering for JSOC-data-series subscriptions and running NetDRMS software to receive, in real time, data updates [ [[#register-subscriptions|Registering for Subscriptions]] ]
 * running the Remote SUMS daemon (which accepts and processes requests for SUs that reside at other NetDRMSs) [ [[#run-remote-sums|Running Remote SUMS]] ]
 * installing the SunPy/drms python package (a Python interface to DRMS) [ [[#install-drms-package|Installing SunPy/drms]] ]
 * installing JSOC-specific project code that is not part of the base NetDRMS installation; the JSOC maintains code to generate JSOC-owned data that is not generally of interest to NetDRMS sites, but sites are welcome to obtain downloads of that code. Doing so involves additional configuration to the base NetDRMS system.
 * installing Slony PostgreSQL data-replication software to become a provider of your site's data
 * installing a webserver that hosts several NetDRMS CGIs to allow web access to your data
 * installing the Virtual Solar Observatory (VSO) software to become a VSO provider of data
 * installing the DRMS Export System [ [[#install-drms-export|Installing DRMS Export]] ]

For best results, and to facilitate debugging issues, please follow these steps in order.

== Conventions ==
In this document, parameters to be determined by you, the NetDRMS administrator, are denoted with angle brackets, {{{<a parameter>}}}. For example, you will need to select a machine to host the PostgreSQL database system, and the name of that host is represented by {{{<PostgreSQL host>}}}. If you choose a host named {{{netdrms_db}}}, let's say, then functionally, you can substitute {{{netdrms_db}}} for {{{<PostgreSQL host>}}} throughout this document.

Part of the process of installing NetDRMS is creating a configuration file named {{{config.local}}}. That file contains numerous parameters, such as {{{SUMS_USEMTSUMS}}}. Throughout this document, those parameters are denoted with square brackets. As such, this document refers to the parameter {{{SUMS_USEMTSUMS}}} as {{{[SUMS_USEMTSUMS]}}}.

<<Anchor(install-pg)>>
== Installing PostgreSQL ==
PostgreSQL is a relational database management system. Data are stored primarily in relations (tables) of records that can be ''mapped'' to each other - given one or more records, you can query the database to find other records. These relations are organized on disk in a hierarchical fashion. At the top level are one or more database ''clusters''. A cluster is simply a storage location on disk (i.e., directory). PostgreSQL manages the cluster's data files with a single process, or PostgreSQL ''instance''. Various operations on the cluster will result in PostgreSQL forking new ephemeral child processes, but ultimately there is only one master/parent process per cluster.

Each cluster contains the data for one or more databases. Each cluster requires a fair amount of system memory, so it makes sense to install a single cluster on a single host. It does ''not'' make sense to make separate clusters, each holding one database; each cluster can efficiently support many databases, which are then fairly independent of each other. ''In terms of querying'' the databases are completely independent (i.e., a query on one database cannot involve relations in different databases). However, two databases in a single cluster ''do'' share the same disk directory, so there is not the same degree of independence at the OS/filesystem level. This may only matter if an administrator is operating directly on the files (performing backups, replication, creating standby systems, etc.).

To install PostgreSQL, select a host machine, {{{<PostgreSQL host>}}}, to act as the PostgreSQL database server. We recommend installing ''only'' PostgreSQL on this machine, given the large amount of memory and resources required for optimal PostgreSQL operation. We find a Fedora-based system, such as CentOS, to be a good choice, but please visit [[https://www.postgresql.org/docs]] for system requirements and other information germane to installation. The following instructions assume a Fedora-based Linux system such as CentOS (documentation for other distributions, such as Debian and openSUSE can be found online) and a bash shell.<<BR>><<BR>>
Install the needed PostgreSQL server packages on {{{<PostgreSQL host>}}} by first visiting [[https://yum.postgresql.org/repopackages.php]] to locate and download the PostgreSQL "repo" rpm file appropriate for your OS and architecture. Each repo rpm contains a {{{yum}}} configuration file that can be used to install all supported PostgreSQL releases. You should install the latest version if possible (version 12, as of the time of this writing). Although you can use your browser to download the file, it might be easier to use Linux command-line tools:
{{{
$ curl -OL https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
}}}
Install the yum repo configuration file ({{{pgdg-redhat-all.repo}}}) from the downloaded repo rpm file:
{{{
$ sudo rpm -i pgdg-redhat-repo-latest.noarch.rpm
}}}
This installs the repo configuration file to {{{/etc/yum.repos.d/}}}. Find the names of the PostgreSQL packages needed from the repository; the following assumes PostgreSQL 12, but should you want to install an older version, replace "12" with one of 94, 95, 96, 10, or 11:
{{{
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql[0-9]*\.' | cut -d '.' -f 1
postgresql12
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*devel\.' | cut -d '.' -f 1
postgresql12-devel
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*contrib\.' | cut -d '.' -f 1
postgresql12-contrib
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*libs\.' | cut -d '.' -f 1
postgresql12-libs
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*plperl\.' | cut -d '.' -f 1
postgresql12-plperl
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*server\.' | cut -d '.' -f 1
postgresql12-server
}}}

Use {{{yum}}} to install all six packages:
{{{
$ sudo yum install <packages>
}}}
where {{{<packages>}}} are the package names determined in the previous step ({{{postgresql12 postgresql12-contrib postgresql12-devel postgresql12-libs postgresql12-plperl postgresql12-server}}}).

The rpm package installation will have created the PostgreSQL superuser Linux account {{{postgres}}}; {{{postgres}}} will own the PostgreSQL database clusters and server processes that will be created in the following steps. To perform the next steps, you will need to become user {{{postgres}}}:
{{{
$ sudo su - postgres
}}}

Locate the PostgreSQL executables path:
{{{
$ rpm -ql postgresql12 | grep psql$
/usr/pgsql-12/bin/psql
}}}

and add it to {{{postgres}}}'s {{{PATH}}} environment variable:
{{{
$ export PATH=/usr/pgsql-12/bin:$PATH
}}}

{{{postgres}}} will be running various PostgreSQL programs over time as part of administration and maintenance, so add this {{{export}}} command to {{{/home/postgres/.bashrc}}}.
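For example:
{{{
# make the PostgreSQL bin directory available in future postgres sessions
echo 'export PATH=/usr/pgsql-12/bin:$PATH' >> /home/postgres/.bashrc
}}}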

As described above, create one database cluster for the two databases (one for DRMS data, and one for SUMS data):
{{{
$ whoami
postgres
$ initdb --locale=C -D /var/lib/pgsql/netdrms
}}}

{{{initdb}}} will initialize the cluster data directory, {{{/var/lib/pgsql/netdrms}}} (identified by the {{{-D}}} argument). This will result in the creation of template databases, configuration files, and other items.

<<BR>><<BR>>
IMPORTANT: Make sure that the disk space on {{{/var/lib/pgsql/netdrms}}} is sufficient to hold the DRMS database information for all the desired DRMS series. Please ask the source of these data (e.g., the JSOC if the DRMS data series originate at the JSOC) for an estimate of disk-space usage. If the set of DRMS data series is not known at installation time, then overestimate - obtain a disk-space-usage estimate for the largest series, then multiply that by 10. As a very rough estimate, be prepared to provide at least several terabytes.

<<BR>><<BR>>
The database cluster will contain two configuration files you need to edit: {{{postgresql.conf}}} and {{{pg_hba.conf}}}. Please refer to the PostgreSQL documentation to properly edit these files. Here are some brief suggestions:
 * {{{postgresql.conf}}} - for changes to parameters marked with a ^*^ to take effect, a restart of a running server instance is required ({{{pg_ctl restart}}}); changes to other parameters only require a reload ({{{pg_ctl reload}}})
  * {{{listen_addresses}}}^*^ specifies the interface on which the {{{postgres}}} server processes will listen for incoming connections. You will need to ensure that connections can be made from all machines that will run DRMS modules (the modules connect to both the DRMS and SUMS databases), so change the default {{{'localhost'}}} to {{{'*'}}}, which causes the servers to listen on all interfaces:
  {{{
listen_addresses = '*'
  }}}
IMPORTANT!! {{{listen_addresses}}} must be set correctly, along with the entries in the {{{pg_hba.conf}}} file. You cannot use a local connection because DRMS modules do not operate this way. They always make non-local connections to the DB, even if they are running on the DB host machine.
  * {{{port}}}^*^ is the server port on which the server listens for connections.
  {{{
port = 5432
  }}}
  The default port is {{{5432}}}, and unless there is a good reason to do otherwise, use {{{5432}}}. This value '''must''' match the value of the {{{RS_DBPORT}}} config.local parameter.
  * {{{logging_collector}}}^*^ controls whether or not stdout and stderr are logged to a file in the database cluster (in the {{{log}}} or {{{pg_log}}} directory, depending on release). By default it is off - set it to {{{on}}} in each cluster.
  {{{
logging_collector = on
  }}}
  * {{{log_rotation_size}}} sets the maximum size, in kilobytes, of a log file. Set this to {{{0}}} to disable size-based rotation; otherwise, a new log will be created once the current one grows past the given size.
  {{{
log_rotation_size = 0
  }}}
  * {{{log_rotation_age}}} sets the maximum age, in minutes, of a log file. Set this to {{{1d}}} (1 day) so that each day a new log file is created.
  {{{
log_rotation_age = 1d
  }}}
  * {{{log_min_duration_statement}}} is the amount of time, in milliseconds, a query must run before triggering a log entry. Set this to 1000 so that only long-running queries, over a second, will be logged.
  {{{
log_min_duration_statement = 1000
  }}}
  * {{{shared_buffers}}}^*^ is the size of shared-memory buffers. For a server dedicated to a single database cluster, this should be about 25% of the total memory.
  {{{
shared_buffers = 32GB
  }}}
 * {{{pg_hba.conf}}} controls the methods by which client authentication is achieved (HBA stands for host-based authentication). It will likely take a little time to understand and properly edit this configuration file. If you are not familiar with networking concepts (such as subnets, name resolution, reverse name resolution, CIDR notation, IPv4 versus IPv6, network interfaces, etc.) then now is the time to become familiar.<<BR>><<BR>>This configuration file contains a set of columns that identify which user can access which database from which machines. It also defines the method by which authentication occurs. When a user attempts to connect to a database, the server traverses this list looking for the ''first'' row that matches. Once this row is identified, the user must authenticate - if authentication fails, the connection is rejected. The server does '''''not''''' attempt additional rows.
 For changes to any of the parameters in this file to take effect, a {{{reload}}} of the server instance is required (a full {{{restart}}} is not necessary).
Here are the recommended entries:
 {{{
# local superuser connections
# TYPE DATABASE USER AUTH-METHOD
  local all all trust # this applies ONLY if the user is logged into the PG server AND they do not use the -h argument to psql
  host all all 127.0.0.1/8 trust # for -h localhost, if localhost resolves to an IPv4 address; also for -h 127.0.0.1
  host all all ::1/128 trust # for -h localhost, if localhost resolves to an IPv6 address; also for -h ::1

# non-local superuser connections
# TYPE DATABASE USER ADDRESS AUTH-METHOD
  host all postgres XXX.XXX.XXX.XXX/YY trust

# non-superuser connections (which can be made from any non-server machines only)
# TYPE DATABASE USER ADDRESS AUTH-METHOD
  host netdrms all XXX.XXX.XXX.XXX/YY md5
  host netdrms_sums all XXX.XXX.XXX.XXX/YY md5
 }}}
 where the columns are defined as follows:
 * {{{TYPE}}} - this column defines the type of socket connection made (Unix-domain, TCP/IP, the encryption used, etc.). {{{local}}} is ''only'' relevant to Unix-domain local connections from the database server host {{{<PostgreSQL host>}}} itself. Since only {{{postgres}}} will log into the database server, the first row above applies to the administrator only. {{{host}}} is ''only'' relevant to TCP/IP connections, regardless of the encryption status of the connection.
 * {{{DATABASE}}} - this column identifies the database to which the user has access. Whenever a user attempts to connect to the database server, they specify a database to access. That database must be in the DATABASE column. We recommend using {{{netdrms}}} for non-superusers, blocking such users from accessing all databases except the DRMS one. Conversely, we recommend using {{{all}}} for the superuser so they can access both the DRMS and SUMS databases (and any others that might exist).<<BR>><<BR>>
 NOTE: You are using the database name {{{netdrms}}} in {{{pg_hba.conf}}}, even though you have not actually created that database yet. This is OK; you will do so once you start the PostgreSQL cluster instance.
 * {{{USER}}} - this column identifies which users can access the specified databases.
 * {{{ADDRESS}}} - this column identifies the host IP addresses (or host names, but do not use those) from which a connection is allowed. To specify a range of IP addresses, such as those on a subnet, use a CIDR address. This column should be empty for {{{local}}} connections.
 * {{{AUTH-METHOD}}} - this column identifies the type of authentication to use. We recommend using either {{{trust}}} or {{{md5}}}. When {{{trust}}} is specified, PostgreSQL will unconditionally accept the connection to the database specified in the row. If {{{md5}}} is specified, then the user will be required to provide a password. If you follow the recommendations above, then for the {{{local}}} row, any user who can log into the database server can access any database in the cluster without any further authentication. Generally only a superuser will be able to log into the database server, so this choice makes sense. For non-{{{local}}} connections by {{{postgres}}}, the PostgreSQL superuser {{{postgres}}} can access any database on the server without further authentication. For the remaining non-{{{local}}}, non-{{{postgres}}} connections, users will need to provide a password.
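Once the {{{netdrms}}} database exists (you will create it in the next section), you can sanity-check the {{{md5}}} rows by attempting a connection from a client machine covered by the {{{ADDRESS}}} column; a password prompt (rather than an immediate rejection) indicates that the intended row matched. The host and user names below are placeholders:
{{{
$ hostname
<client host>
$ psql -h <PostgreSQL host> -p 5432 -U <some DRMS user> netdrms
Password for user <some DRMS user>:
}}}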
Should you need to edit either of these configuration files AFTER you have started the database instance (by running {{{pg_ctl start}}}, as described in the next section), you will need to either {{{reload}}} or {{{restart}}} the instance:
{{{
$ whoami
postgres
# reload
$ pg_ctl reload -D /var/lib/pgsql/netdrms
# restart
$ pg_ctl restart -D /var/lib/pgsql/netdrms
}}}
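After a {{{reload}}} or {{{restart}}}, you can confirm that the server picked up a changed parameter by querying the live setting with {{{SHOW}}} (illustrated here for {{{listen_addresses}}}; output abbreviated):
{{{
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 -c 'SHOW listen_addresses;'
 listen_addresses
------------------
 *
(1 row)
}}}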

IMPORTANT!!! You MUST have host/md5 type non-local connections because DRMS modules always make non-local connections to the DB, even when running on the DB host machine.

<<Anchor(initialize-pg)>>
== Initializing PostgreSQL ==
You now need to initialize your PostgreSQL instance by creating the DRMS and SUMS databases, installing database-server languages, creating a schema, and creating a relation. To accomplish this, become {{{postgres}}}; all steps in this section must be performed by the superuser:
{{{
$ sudo su - postgres
}}}
Start the database instance for the cluster you created:
{{{
$ whoami
postgres
$ pg_ctl start -D /var/lib/pgsql/netdrms
}}}
You previously created this cluster data directory, which will most likely be {{{/var/lib/pgsql/netdrms}}}. Ensure that the configuration files you edited work. This can be done by attempting to connect to the database server as {{{postgres}}} with {{{psql}}} from {{{<PostgreSQL host>}}}:
{{{
$ whoami
postgres
$ hostname
<PostgreSQL host>
$ psql -h <PostgreSQL host> -p 5432
psql (12.1)
Type "help" for help.

postgres=# \q
$
}}}
The PostgreSQL installation resulted in the creation of the {{{postgres}}} database superuser, and since {{{psql}}} connects to the database as the database user with the same name as the Linux user running {{{psql}}}, you will be logged in as database user {{{postgres}}}. This is indicated by the {{{postgres=#}}} prompt (the hash refers to a superuser).

After you successfully see the superuser prompt, create the two databases:
{{{
$ whoami
postgres
# create the DRMS database
$ createdb --locale C -E UTF8 -T template0 netdrms
# create the SUMS database
$ createdb --locale C -E UTF8 -T template0 netdrms_sums
}}}
Install the required database-server languages:
{{{
$ whoami
postgres
# create the PostgreSQL scripting language (versions <= 9.6)
# no need to create the PostgreSQL scripting language (versions > 9.6)
$ createlang -h <PostgreSQL host> plpgsql netdrms
# create the "trusted" perl language (versions <= 9.6)
$ createlang -h <PostgreSQL host> plperl netdrms
# create the "trusted" perl language (versions > 9.6)
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# CREATE EXTENSION IF NOT EXISTS plperl;
netdrms=# \q
# create the "untrusted" perl language (versions <= 9.6)
$ createlang -h <PostgreSQL host> plperlu netdrms
# create the "untrusted" perl language (versions > 9.6)
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# CREATE EXTENSION IF NOT EXISTS plperlu;
netdrms=# \q
}}}
The SUMS database does not use any language extensions, so there is no need to create any for it.
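To confirm that the languages were created, you can list them from the {{{pg_language}}} catalog; the exact set will vary with your PostgreSQL version, but {{{plpgsql}}}, {{{plperl}}}, and {{{plperlu}}} should all appear:
{{{
$ psql -h <PostgreSQL host> -p 5432 netdrms -c 'SELECT lanname FROM pg_language;'
 lanname
----------
 internal
 c
 sql
 plpgsql
 plperl
 plperlu
(6 rows)
}}}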

At this point, it is a good idea to create a password for the {{{postgres}}} database superuser:
{{{
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# ALTER ROLE postgres WITH PASSWORD '<new password>';
ALTER ROLE
netdrms=# \q
$
}}}
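If your {{{pg_hba.conf}}} uses {{{md5}}} (rather than the recommended {{{trust}}}) for {{{postgres}}} connections, you can avoid typing this password repeatedly by giving {{{postgres}}} a {{{.pgpass}}} file as well; this is the same standard PostgreSQL format used for {{{netdrms_production}}} later in these instructions:
{{{
$ whoami
postgres
$ echo '<PostgreSQL host>:*:*:postgres:<new password>' >> ~/.pgpass
$ chmod 0600 ~/.pgpass
}}}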

<<Anchor(install-cfitsio)>>
== Installing CFITSIO ==
The base NetDRMS release requires CFITSIO, a {{{C}}} library used by NetDRMS to read and write FITS files. Visit [[https://heasarc.gsfc.nasa.gov/fitsio/]] to obtain the link to the CFITSIO source-code tarball. The tarball has a root directory named {{{cfitsio-X.Y.Z}}}, so download the tarball and extract it into {{{/opt}}} or some other suitable installation directory. Then make a link named {{{cfitsio}}} to the extracted {{{cfitsio-X.Y.Z}}} directory:
{{{
$ cd
$ curl -OL 'http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio-X.Y.Z.tar.gz'
$ cd /opt
$ sudo tar xvzf ~/cfitsio-X.Y.Z.tar.gz
}}}
Please read the {{{README}}} file for complete installation instructions. As a quick start, run:
{{{
$ cd /opt/cfitsio-X.Y.Z
$ ./configure --prefix=/opt/cfitsio-X.Y.Z
# build the CFITSIO library
$ make
# install the libraries and binaries to /opt/cfitsio-X.Y.Z
$ sudo make install
# create the link from cfitsio to /opt/cfitsio-X.Y.Z
$ sudo su -
$ cd /opt
$ ln -s /opt/cfitsio-X.Y.Z cfitsio
}}}

CFITSIO has a dependency on {{{libcurl}}} - in fact, any program made by linking to {{{cfitsio}}} will also require {{{libcurl-devel}}} since {{{cfitsio}}} uses the {{{libcurl}}} API. We recommend using {{{yum}}} to install {{{libcurl-devel}}} if it is not already installed ({{{libcurl}}} itself will quite likely already be present, and will be pulled in otherwise):
{{{
$ sudo yum install libcurl-devel
}}}
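If you want to verify that CFITSIO and its {{{libcurl}}} dependency link cleanly, a minimal test program like the following can help. This is a sketch: the paths assume the {{{/opt/cfitsio}}} link created above, the version printed will match whatever you installed, and depending on how CFITSIO was configured you may need additional libraries (such as {{{-lz}}}):
{{{
$ cat > /tmp/testfits.c <<'EOF'
/* minimal CFITSIO link test: print the version of the linked library */
#include <stdio.h>
#include "fitsio.h"

int main(void)
{
    float version;

    /* fits_get_version() fills in the version of the linked CFITSIO */
    fits_get_version(&version);
    printf("CFITSIO %.2f\n", version);
    return 0;
}
EOF
$ gcc -I/opt/cfitsio/include /tmp/testfits.c -L/opt/cfitsio/lib -lcfitsio -lcurl -lm -o /tmp/testfits
$ /tmp/testfits
CFITSIO 3.47
}}}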
<<Anchor(install-openssldev)>>
== Installing OpenSSL Development Packages ==
NetDRMS requires the OpenSSL Developer's API. If this API has not already been installed, do so now:
{{{
$ sudo yum install openssl-devel
}}}

<<Anchor(install-perl-dbdpg)>>
== Installing DBD::Pg ==
One step in the installation process will require running a {{{perl}}} script that accesses the PostgreSQL database. In order for this to work, you will need to ensure the {{{DBD::Pg}}} module has been installed. To check for installation, run:
{{{
$ perl -M'DBD::Pg'
}}}
If there is no error about not being able to locate the module, and the command simply hangs (it is waiting for a program on stdin), then you are all set (enter {{{ctrl-C}}} to exit). If the module is not installed, and you are running the {{{perl}}} installed with the system, then run {{{yum}}} to identify the package:
{{{
$ yum list | grep -i 'dbd-pg'
...
perl-DBD-Pg.x86_64 2.19.3-4.el7 base
...
}}}

Then, bringing to bear all your powers of divination, choose the correct package and install it:
{{{
$ sudo yum install 'perl-DBD-Pg'
}}}

If you are using a non-system {{{perl}}}, use the distro's installation method. If the distro does not have that module, or the distro installer does not work, then as a final act of desperation use CPAN:
{{{
$ sudo perl -MCPAN -e 'install DBD::Pg'
}}}

<<Anchor(install-python3)>>
== Installing Python3 ==
NetDRMS requires that a number of {{{python}}} packages and modules be present that are not generally part of a system installation. In addition, many scripts require {{{python3}}}, not {{{python2}}}. The easiest way to satisfy these needs is to install a ''data-science''-oriented {{{python3}}} distribution, such as {{{Anaconda}}}. In that vein, install {{{Anaconda}}} into an appropriate installation directory such as {{{/opt/anaconda3}}}. To locate the {{{Linux}}} installer, visit [[https://docs.anaconda.com/anaconda/install/linux/]]:
{{{
$ curl -OL 'https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh'
$ sha256sum Anaconda3-2019.10-Linux-x86_64.sh
46d762284d252e51cd58a8ca6c8adc9da2eadc82c342927b2f66ed011d1d8b53 Anaconda3-2019.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2019.10-Linux-x86_64.sh
}}}
After some initial prompts, the installer will display
{{{
PREFIX=/home/<user>/anaconda3
}}}
This path is the default installation directory ({{{<user>}}} is the user running {{{bash}}}). Replace the {{{PREFIX}}} path with {{{<Anaconda3 install dir>}}}.

<<Anchor(install-netdrms)>>
== Installing NetDRMS ==
To install NetDRMS, you will need to: select an appropriate machine on which to install NetDRMS and appropriate machine/hardware on which to host the SUMS service; create Linux users and groups; download the NetDRMS release tarball and extract the release source; initialize the Linux environment; create log directories; create the configuration file and run the configuration script; compile and install the executables; create the DRMS- and SUMS-database users/relations/functions/objects; initialize the SUMS storage hardware; and install the SUMS and Remote SUMS daemons.

The optimal hardware configuration will likely depend on your needs, but the following recommendations should suffice for most sites. DRMS and SUMS can share a single host machine. The most widely used and tested Linux distributions are Fedora-based, and at the time of this writing, CentOS is the most popular. Sites have successfully used openSUSE too, but if possible, we would recommend using CentOS. SUMS requires a large amount of storage to hold the DRMS data-series data/image files. The amount needed can vary widely, and depends directly on the amount of data you wish to keep online at any given time. Most NetDRMS sites mirror some amount of (but not all) JSOC SDO data - the more data mirrored, the larger the amount of storage needed. To complicate matters, a site can also mirror only a subset of each data series' data; perhaps one site wishes to retain only the current month's data of many data series, but another wishes to retain all data for one or two series. To decide on the amount of storage needed, you will have to ask the JSOC how much data each series comprises and decide how much of that data you want to keep online. Data that goes offline can always be retrieved automatically from the JSOC again. Data will arrive each day, so request from the JSOC an estimate of the rate of data growth. We recommend doing a rough calculation based upon these considerations, and then doubling the resulting number and installing that amount of storage.

Next, create NetDRMS production Linux user {{{netdrms_production}}}:
{{{
$ sudo useradd netdrms_production
$ sudo passwd netdrms_production
Changing password for user netdrms_production.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
$
}}}
NOTE: ensure that {{{netdrms_production}}} is a valid PostgreSQL ''name'' because NetDRMS makes use of the PostgreSQL feature whereby attempts to connect to a database are made as the database user whose name matches the name of the Linux user connecting to the database. Please see [[https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]] for a description of valid PostgreSQL names.

{{{netdrms_production}}} will access the PostgreSQL databases previously installed by running PostgreSQL executables. Modify {{{netdrms_production}}}'s {{{PATH}}} environment variable so that these executables can be located by the shell. Put the following into {{{/home/netdrms_production/.bashrc}}}:

{{{
# PostgreSQL executables
export PATH=/usr/pgsql-12/bin:$PATH
}}}

Make sure you have either re-logged-in or sourced {{{/home/netdrms_production/.bashrc}}}.

NetDRMS requires additional Python packages not included in the Anaconda distribution, but if you install Anaconda, then the number of additional packages you need to install is minimal. If you have a different {{{python}}} distribution, then you may need to install more. To install new {{{Anaconda}}} packages, as {{{netdrms_production}}} first create a virtual environment for NetDRMS (named {{{netdrms}}}):
{{{
$ whoami
netdrms_production
# sets the path to the Anaconda3 binaries so that the next conda command will succeed (this will edit .bashrc)
$ <Anaconda3 install dir>/bin/conda init bash
$ source ~/.bashrc
$ conda create --name netdrms
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/netdrms_production/.conda/envs/netdrms

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate netdrms
#
# To deactivate an active environment, use
#
# $ conda deactivate
}}}

VERY IMPORTANT!! As described above in the {{{conda create}}} output, run:
{{{
$ conda activate netdrms
}}}

If you do not do this, then conda might invoke the wrong python later in this tutorial.

== VERY, VERY IMPORTANT!! ==

{{{conda init bash}}} modifies the user environment to manage which python executables are invoked when {{{python}}}, et al., are run from the shell. It does so by modifying the {{{/home/netdrms_production/.bashrc}}} file, adding {{{PATH}}}-setting code that is conditional on environment variables that are in turn modified when you run {{{conda activate}}} and {{{conda deactivate}}}. In the ''deactivated'' state, the path to the base conda system is included in {{{PATH}}}. In this tutorial, this would be {{{<Anaconda3 install dir>/bin}}}, the installation directory you chose when you installed {{{Anaconda}}}. When you activate the {{{netdrms}}} environment, the conditional code in {{{/home/netdrms_production/.bashrc}}} '''replaces''' {{{<Anaconda3 install dir>/bin}}} with the path to the executables in the {{{netdrms}}} virtual environment, {{{/home/netdrms_production/.conda/envs/netdrms/bin}}}.

If you were to run {{{conda activate netdrms}}} right now, before you had installed any packages into the virtual environment, the {{{PATH}}} environment variable would include the path to the virtual-environment executables, {{{/home/netdrms_production/.conda/envs/netdrms/bin}}}. But since you have not yet installed any packages into the virtual environment, {{{/home/netdrms_production/.conda/envs/netdrms/bin}}} would be empty. And due to the way the {{{PATH}}}-setting code removes the path to {{{<Anaconda3 install dir>/bin}}} when the virtual environment is activated, {{{PATH}}} would not include any path to the {{{Anaconda}}} instance you installed. If you were to run {{{python}}} from the command line, the system python would likely launch, which is definitely not what you want to happen.

I consider this to be a conda bug - I do not expect activation of a virtual environment to result in {{{PATH}}} pointing to python executables outside of the installation. And this is true not only for {{{python}}}, but also for other python executables, like {{{pip}}} and {{{activate}}}. If you install any package into this environment, then it seems that conda will install '''ALL''' executables (newer versions when available) into the environment, and from then on your {{{PATH}}} will be set correctly - always pointing to python executables inside the {{{Anaconda}}} installation, regardless of activation status. But if you determine that you do not need to install any additional python packages (unlikely), then this situation is a bit of a land mine. To avoid this headache, '''ensure that you always install some package into your virtual environment'''.

Then install the new conda packages using {{{conda}}}:
{{{
$ whoami
netdrms_production
$ conda install -n netdrms psycopg2 psutil
Collecting package metadata (current_repodata.json): done
...
# pySmartDL is in the conda-forge channel
$ conda install -n netdrms -c conda-forge pySmartDL python-dateutil
}}}

NOTE: By installing these packages in the {{{netdrms}}} virtual environment, you will also install the {{{python}}} package in the environment since the explicitly listed packages have a dependency on {{{python}}}. This is important, because the next step requires that {{{pip}}}, part of {{{python}}}, be present in the environment.
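With at least one package installed, you can verify the {{{PATH}}} behavior described in the section above; {{{which python}}} should report the virtual-environment executable while {{{netdrms}}} is active, and the base installation's executable otherwise:
{{{
$ conda activate netdrms
$ which python
/home/netdrms_production/.conda/envs/netdrms/bin/python
$ conda deactivate
$ which python
<Anaconda3 install dir>/bin/python
$ conda activate netdrms
}}}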

From now on, the {{{netdrms_production}}} user should use the virtual environment. Make sure that {{{$PYTHONPATH}}} is not set; otherwise, it might interfere with the virtual environment's {{{python}}}. Modify {{{/home/netdrms_production/.bashrc}}} to do so:
{{{
# use Python virtual environment by default
unset PYTHONPATH
}}}

For the changes to {{{/home/netdrms_production/.bashrc}}} to take effect, either logout/login or source {{{/home/netdrms_production/.bashrc}}}.

All non-production users will use the {{{netdrms}}} virtual environment. Make sure that users can access the environment:
{{{
$ whoami
netdrms_production
$ chmod o+x /home/netdrms_production
}}}

Create the Linux group {{{<SUMS users>}}}, e.g. {{{sums_users}}}, to which all SUMS users belong, including {{{netdrms_production}}}. This group will be used to ensure that all SUMS users can create new data files in SUMS:
{{{
$ sudo groupadd <SUMS users>
}}}
Add {{{netdrms_production}}} to this group (later you will add each SUMS user - users who will read/write SUMS data files - to this group as well):
{{{
$ sudo usermod -a -G <SUMS users> netdrms_production
$ id netdrms_production
uid=1001(netdrms_production) gid=1001(netdrms_production) groups=1001(netdrms_production),1002(sums_users)
}}}

On {{{<NetDRMS host>}}}, clone the JSOC git repo into {{{/opt/jsoc-git}}}. From your local copy of this repo, you will install NetDRMS executables and libraries into {{{/opt/netdrms-vXX.X}}} and then make a link from {{{/opt/netdrms}}} to {{{/opt/netdrms-vXX.X}}}. In this way, you can upgrade your version of NetDRMS by installing a new version into a new {{{/opt/netdrms-vXX.X}}} directory and then updating the {{{/opt/netdrms}}} link.

Create {{{/opt/jsoc-git}}} and make {{{netdrms_production}}} the owner:
{{{
$ sudo mkdir -p /opt/jsoc-git
$ sudo chown netdrms_production:netdrms_production /opt/jsoc-git
}}}

As {{{netdrms_production}}}, clone the git repository that contains the desired NetDRMS release. Currently, this repository is private, which means you will need to create a GitHub account, set up ssh keys, and be given permission to access this repository. To set up the ssh keys, create a modern ssh key, like an {{{ed25519}}} key:

{{{
$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/netdrms_production/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/netdrms_production/.ssh/id_ed25519.
Your public key has been saved in /home/netdrms_production/.ssh/id_ed25519.pub.
The key fingerprint is:
...
}}}

Once you have your private-public key pair, you will need to upload it to your GitHub account. Click on your profile picture in GitHub, then in the left pane, click on SSH and GPG keys, click on the blue button labeled "New SSH key", provide a helpful title, then paste the contents of your public key into the text-edit box and, finally, click on the blue button to finalize the key.

After you have successfully uploaded the public part of your ssh key, locate the release's tag at {{{https://github.com/JSOC-SDP/JSOC-orig/tags}}}, and then clone the repository at this tag:
{{{
$ whoami
netdrms_production
$ cd /opt/jsoc-git/..
$ git clone --branch <release tag> git@github.com:JSOC-SDP/JSOC-orig.git /opt/jsoc-git
}}}

This will put your repo into a "detached HEAD" state, which essentially means that you can no longer make changes to the repo's main branch. If you would like to make release hotfixes, you will first need to create a hotfix branch off of the main branch, switching to it:
{{{
$ git switch -c <new hotfix branch>
}}}

If you want to work on a new feature, first switch to the {{{develop}}} branch:
{{{
$ git switch develop
}}}

and then make a feature branch off the {{{develop}}} branch, switching to it:
{{{
$ git switch -c <new feature branch>
}}}

To propose that the JSOC incorporate these changes into the official GitHub repo, push your hotfix or feature branches back to the GitHub repo, as shown below. If these changes look good to us, we will merge them into our {{{main}}} and/or {{{develop}}} branches so that they are available to all in future releases.
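For example, to push a hotfix branch to GitHub for review (assuming you have been granted push access to the repository):
{{{
$ whoami
netdrms_production
$ cd /opt/jsoc-git
$ git push -u origin <new hotfix branch>
}}}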

Make the SUMS log directory {{{<SUMS logs>}}} on the SUMS server machine. Various SUMS log files will be written to this directory. A suitable directory would reside in the {{{netdrms_production}}} user's home directory, e.g., {{{$HOME/log/SUMS}}}:
{{{
$ whoami
netdrms_production
$ cd ~
$ mkdir -p <SUMS logs>
}}}

Select appropriate {{{C}}} and {{{Fortran}}} compilers. The DRMS part of NetDRMS must be compiled with a {{{C}}} compiler. NetDRMS supports both the GNU {{{C}}} compiler ({{{gcc}}}) and the Intel {{{C++}}} compiler ({{{icc}}}). Certain JSOC-specific code requires {{{Fortran}}} compilation; for those projects, NetDRMS supports the GNU {{{Fortran}}} compiler ({{{gfortran}}}) and the Intel {{{Fortran}}} compiler ({{{ifort}}}). SUMS is implemented as a Python daemon, so no compilation step is needed. Both GNU and Intel are widely used, so feel free to use either; by default, the Intel compilers are used. To change the compilers:
 * as {{{netdrms_production}}}, set the {{{JSOC_COMPILER}}} and {{{JSOC_FCOMPILER}}} environment variables (we recommend doing so in {{{.bashrc}}}):
 {{{
$ whoami
netdrms_production
$ vi ~/.bashrc
i
# .bashrc

# set COMPILER to icc for the Intel C++ compiler, and to gcc for the GNU C++ compiler
export JSOC_COMPILER=<C compiler>

# set to ifort for the Intel Fortran compiler, and to gfortran for the GNU Fortran compiler
export JSOC_FCOMPILER=<Fortran compiler>
ESC
:wq
$
 }}}

If you have chosen an Intel compiler, please keep in mind that it might be necessary to source an environment file for proper linking to occur. If so, do so in your {{{~/.bashrc}}}, {{{~/.bash_profile}}}, or other appropriate resource file:

{{{
[ -f /opt/intel/oneapi/setvars.sh ] && source /opt/intel/oneapi/setvars.sh
}}}

Create {{{/opt/jsoc-git/config.local}}}, the master configuration file for both DRMS and SUMS, using {{{/opt/jsoc-git/config.local.newstyle.template}}} as a template. This file contains a number of configuration parameters, along with detailed descriptions of what they control and suggested values for those parameters. The configuration script, {{{configure}}}, reads this file and then creates one output file, {{{drmsparams.*}}}, in {{{/opt/jsoc-git/<architecture dir>/base/localization}}} for each of several programming and scripting languages ({{{C}}}, {{{GNU make}}}, {{{perl}}}, {{{python}}}, {{{bash}}}). These files are directly readable by the several languages used by NetDRMS. Lines that start with whitespace or the hash symbol, {{{#}}}, are ignored.

NOTE: if you have an older NetDRMS and you are going to use the {{{config.local}}} from that NetDRMS installation for a new NetDRMS installation, then you might have an ''old-style'' {{{config.local}}}. You know you have an old-style configuration file if the {{{__STYLE__}}} section does not exist in the file, or if it does exist, the value in that section is {{{old}}}. If that is the case, then you will need to compare {{{/opt/jsoc-git/config.local.template}}} in the old installation to the analogous file in the new installation to determine the set of parameters that have changed between releases.

Several sections compose {{{config.local}}}:
{{{
__STYLE__
new

__DEFS__
# these are NetDRMS-wide parameter values; the format is <quote code>:<parameter name><whitespace>+<parameter value>;
# the configuration script uses <quote code> to assist in creating language-specific parameters; <quote code> is one of:
# 'q' (enclose the parameter value in double quotes)
# 'p' (enclose the parameter value in parentheses)
# 'a' (do not modify the parameter value).

__MAKE__
# these are make variables used by the make system during compilation - they generally contain paths to third-party code
}}}
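For instance, a few hypothetical {{{__DEFS__}}} entries following the format above might look like the lines below; these are illustrative only - consult the template for the exact parameters and the quote code each one expects:
{{{
__DEFS__
q:DRMS_LOG_DIR          /home/netdrms_production/log/DRMS
a:DRMSPGPORT            5432
a:SUMS_USEMTSUMS        1
}}}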

Before creating {{{config.local}}}, please request from the JSOC a value for {{{DRMS_LOCAL_SITE_CODE}}}. This code uniquely identifies each NetDRMS installation. Each site requires one ID for each of its NetDRMS installations.

The {{{__DEFS__}}} section:
 * {{{BIN_PY3}}} - the path to the Python 3 {{{python}}} executable.
 * {{{DBNAME}}} - the name of the DRMS database: this is {{{netdrms}}}; this parameter exists in case you want to select a different name, but we don't recommend changing it.
 * {{{DRMS_LOCAL_SITE_CODE}}} - a 15-bit hexadecimal string that globally and uniquely identifies the NetDRMS. Each NetDRMS requires a unique code for each installation. Values greater than or equal to {{{0x4000}}} denote a development installation and need not be unique. If you plan on generating data that will be distributed outside of your site, please obtain a unique value from the JSOC.
 * {{{DRMS_LOCK_DIR}}} - the directory to which the DRMS library writes various lock files.
 * {{{DRMS_LOG_DIR}}} - the directory to which the DRMS library writes various log files.
 * {{{DRMSPGPORT}}} - the port on the database host on which the database server is listening (5432 by default)
 * {{{DX_LISTEN_PORT}}} - the port (on the host machine that accepts data-transfer client connections) to which the data-transfer server listens
 * {{{DX_PACKAGE_HOST}}} - the host machine that accepts data-transfer client connections
 * {{{DX_PACKAGE_ROOT}}} - the directory to which the data-transfer server writes tar files
 * {{{EXPORT_HANDLE_DIR}}} - the directory to which export programs save handles; create this directory if it does not exist and make sure it is globally writable.
 * {{{EXPORT_LOCK_DIR}}} - the directory to which export programs write lock files; create this directory if it does not exist and make sure it is globally writable.
 * {{{EXPORT_LOG_DIR}}} - the directory to which export programs write logs; create this directory if it does not exist.
 * {{{EXPORT_PENDING_REQUESTS_MAX_TABLE}}} - an optional database table that contains the maximum number of export requests an export user can make simultaneously
 * {{{EXPORT_PENDING_REQUESTS_TABLE}}} - a database table that contains a record of each pending export request
 * {{{EXPORT_PENDING_REQUESTS_TIME_OUT}}} - after this number of minutes a request is considered to have timed out
 * {{{EXPORT_USER_INFO_FN}}} - an optional database function that returns user information associated with an export email address
 * {{{JMD_IS_INSTALLED}}} - if set to 1, then the Java Mirroring Daemon alternative to Remote SUMS is used: this should be {{{0}}}.
 * {{{POSTGRES_ADMIN}}} - the Linux user that owns the PostgreSQL installation and processes: this is {{{postgres}}}.
 * {{{RS_BINPATH}}} - the NetDRMS binary path that contains the external programs needed by the Remote SUMS (e.g., {{{jsoc_fetch}}}, {{{vso_sum_alloc}}}, {{{vso_sum_put}}}).
 * {{{RSCLIENT_TIMEOUT}}} - the time interval, in minutes, after which the remote-sums client ({{{rsums-clientd.py}}}) will error-out a request if, during that interval, at least one SU could not be downloaded.
 * {{{RS_DBHOST}}} - the name of the Remote SUMS database cluster host; this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed.
 * {{{RS_DBNAME}}} - the Remote SUMS database - this is {{{netdrms}}}.
 * {{{RS_DBPORT}}} - the port that the Remote SUMS database cluster instance is listening on: this is {{{5432}}}.
 * {{{RS_DBUSER}}} - the Linux user that runs Remote SUMS; this is also the database user who owns the Remote SUMS database objects: this is {{{netdrms_production}}}.
 * {{{RS_DLTIMEOUT}}} - the timeout, in seconds, for an SU to download. If the download time exceeds this value, then all requests waiting for the SU to download will fail.
 * {{{RS_LOCKFILE}}} - the (advisory) lockfile used by Remote SUMS to prevent multiple instances from running.
 * {{{RS_LOGDIR}}} - the directory in which remote-sums log files are written.
 * {{{RS_MAXTHREADS}}} - the maximum number of SUs that Remote SUMS can process simultaneously.
 * {{{RS_N_WORKERS}}} - the number of {{{scp}}} worker threads - at most, this many {{{scp}}} processes will run simultaneously
 * {{{RS_REQTIMEOUT}}} - the timeout, in seconds, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.
 * {{{RS_REQUEST_TABLE}}} - the Remote SUMS database relation that contains Remote SUMS requests; this is {{{<Remote SUMS requests>}}}, which should be {{{drms.rs_requests}}}; DRMS modules insert request rows in this table, and Remote SUMS locates the requests and manages rows in this table.
 * {{{RS_SCP_MAXPAYLOAD}}} - the maximum total payload, in MB, per download. As soon as the combined payload of SUs ready for download exceeds this value, then the SUs are downloaded with a single {{{scp}}} process.
 * {{{RS_SCP_MAXSUS}}} - the maximum size of the SU download queue. As soon as this many SUs are ready for download, they are downloaded with a single {{{scp}}} process.
 * {{{RS_SCP_TIMEOUT}}} - if there are SUs ready for download, and no {{{scp}}} has fired off within this many seconds, then the SUs that are ready to download are downloaded with a single {{{scp}}} process.
 * {{{RS_SITE_INFO_URL}}} - the service at JSOC that is used by Remote SUMS to locate the NetDRMS site that owns SUMS storage units; this is {{{<Remote SUMS site URL>}}}.
 * {{{RS_SU_EXPIRATION}}} - the default expiration date for all SUs ingested by Remote SUMS; if the SU being ingested is part of a data series, then Remote SUMS obtains the expiration for the SU from the data series' definition instead; as an alternative to {{{RS_SU_EXPIRATION}}}, {{{RS_SU_LIFESPAN}}} can be used to specify the expiration date of newly ingested SUs; {{{RS_SU_EXPIRATION}}} takes precedence over {{{RS_SU_LIFESPAN}}}. NOTE: you will need to define at least one of {{{RS_SU_EXPIRATION}}} or {{{RS_SU_LIFESPAN}}} for {{{rsumsd.py}}} to work properly.
 * {{{RS_SU_ARCHIVE}}} - the default value of the archive flag for newly ingested SUs; if the SU being ingested is part of a data series, then Remote SUMS obtains the archive flag from the data series' definition instead; the truth value can be one of several character strings that implies TRUE or FALSE.
 * {{{RS_SU_LIFESPAN}}} - the default lifespan ("retention time"), in days, of a newly ingested SU; if the SU being ingested is part of a data series, then Remote SUMS obtains the lifespan for the SU from the data series' definition instead; as an alternative to {{{RS_SU_LIFESPAN}}}, {{{RS_SU_EXPIRATION}}} can be used to specify the lifespan of newly ingested SUs; {{{RS_SU_EXPIRATION}}} takes precedence over {{{RS_SU_LIFESPAN}}}.
 * {{{RS_SU_TAPEGROUP}}} - the default value of the tapegroup for newly ingested SUs; if the SU being ingested is part of a data series, then Remote SUMS obtains the tapegroup from the data series' definition instead.
 * {{{RS_TMPDIR}}} - the temporary directory into which SUs are downloaded. This should be on the same file system on which the SUMS partitions reside.
 * {{{SCRIPTS_EXPORT}}} - the path to the directory in the NetDRMS installation that contains the export scripts.
 * {{{SERVER}}} - the name of the DRMS database cluster host: this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed.
 * {{{SS_HIGH_WATER}}} - partition scrubbing is initiated only after partition percent usage rises above the high-water mark.
 * {{{SS_LOCKFILE}}} - the (advisory) lockfile used by the SU steward to prevent multiple instances of the steward from running.
 * {{{SS_LOW_WATER}}} - each SUMS partition is scrubbed until its percent usage falls below the low-water mark.
 * {{{SS_REHYDRATE_INTERVAL}}} - the time interval, in seconds, between updates to the per-partition cache of expired SUs; this value applies to all partitions that are scrubbed; for each partition, a steward thread queries its cache to select the next SUs to delete (which are sorted by increasing expiration date).
 * {{{SS_SLEEP_INTERVAL}}} - the interval, in seconds, between flushing/caching expired SU lists (use a smaller number if the system experiences a high rate of SU expiration).
 * {{{SS_SU_CHUNK}}} - the number of SUs in a partition that are deleted at one time; SUs are deleted one chunk at a time until the partition usage falls below the low-water mark.
 * {{{SUMBIN_BASEDIR}}} - the directory in which {{{sum_chmown}}}, a root setuid program, is installed; it must be mounted locally on the machine on which the SUMS partitions are mounted: this is {{{<NetDRMS root>}}}.
 * {{{SUMLOG_BASEDIR}}} - the path to the directory that contains various SUMS log files; this is {{{<SUMS logs>}}}.
 * {{{SUMPGPORT}}} - the port that the SUMS database cluster host is listening on: this is {{{5432}}}, unless DRMS and SUMS reside in different clusters on the same host (something that is not recommended since a single PostgreSQL cluster requires a substantial amount of system resources).
 * {{{SUMS_DB_HOST}}} - the name of the SUMS database cluster host: this is {{{<PostgreSQL host>}}}, the machine on which PostgreSQL was installed; NetDRMS allows for creating a second cluster for SUMS, but in general this will not be necessary unless extremely heavy usage requires separating the two clusters.
 * {{{SUMS_GROUP}}} - the name of the Linux group to which all SUMS Linux users belong: this is {{{<SUMS users>}}}.
 * {{{SUMS_MANAGER}}} - the SUMS database user who owns the SUMS database objects which are manipulated by Remote SUMS and SUMS itself; it should be the Linux user that runs SUMS and owns the SUMS storage directories - this is {{{netdrms_production}}}
 * {{{SUMS_MULTIPLE_PARTNSETS}}} - SUMS has more than one partition set: more than likely, this is {{{0}}}.
 * {{{SUMS_MT_CLIENT_RESP_TIMEOUT}}} - the interval of time, in minutes, that {{{sumsd.py}}} will wait for a client response; after this interval elapses without a client response, {{{sumsd.py}}} will destroy the client connection.
 * {{{SUMS_READONLY_DB_USER}}} - the SUMS database user who has read-only access to the SUMS database objects; it is used by the Remote SUMS client ({{{rsums-clientd.py}}}) to check for the presence of SUs before requesting they be downloaded.
 * {{{SUMS_TAPE_AVAILABLE}}} - whether SUMS has a tape-archive system.
 * {{{SUMS_USEMTSUMS}}} - use the multi-threaded Python SUMS: this is {{{1}}}.
 * {{{SUMS_USEMTSUMS_ALL}}} - use the multi-threaded Python SUMS for all SUMS API methods; {{{SUMS_USEMTSUMS_ALLOC}}}, {{{SUMS_USEMTSUMS_CONNECTION}}}, {{{SUMS_USEMTSUMS_DELETESUS}}}, {{{SUMS_USEMTSUMS_GET}}}, {{{SUMS_USEMTSUMS_INFO}}}, and {{{SUMS_USEMTSUMS_PUT}}} are ignored: this is {{{1}}}.
 * {{{SUMS_USEMTSUMS_ALLOC}}} - use the MT SUMS daemon for the SUM_alloc() and SUM_alloc2() API function.
 * {{{SUMS_USEMTSUMS_CONNECTION}}} - use the MT SUMS daemon for the SUM_open() and SUM_close() API functions.
 * {{{SUMS_USEMTSUMS_DELETESUS}}} - use the MT SUMS daemon for the SUM_delete_series() API function.
 * {{{SUMS_USEMTSUMS_GET}}} - use the MT SUMS daemon for the SUM_get() API function.
 * {{{SUMS_USEMTSUMS_INFO}}} - use the MT SUMS daemon for the SUM_infoArray() API function.
 * {{{SUMS_USEMTSUMS_PUT}}} - use the MT SUMS daemon for the SUM_put() API function.
 * {{{SUMSD_LISTENPORT}}} - the port that SUMS listens to for incoming requests.
 * {{{SUMSD_MAX_THREADS}}} - the maximum number of SUs that SUMS can process simultaneously.
 * {{{SUMSERVER}}} - the SUMS host machine; this is {{{<SUMS host>}}}.
 * {{{WEB_DBUSER}}} - the DRMS database user account that {{{cgi}}} programs access when they need to read from or write to database relations.
 * {{{WL_HASWL}}} - if 1, then this DRMS has a whitelist of private series that are accessible on a public web site.

The {{{__MAKE__}}} section:
 * {{{INCS_INSTALL_DIR_cfitsio}}} - the path to the installed CFITSIO header files: this is {{{/opt/cfitsio-X.Y.Z/include}}}
 * {{{LIBS_INSTALL_DIR_cfitsio}}} - the path to the installed CFITSIO library files: this is {{{/opt/cfitsio-X.Y.Z/lib}}}
 * {{{CFITSIO_LIB}}} - the name of the CFITSIO library (cfitsio)
 * {{{INCS_INSTALL_DIR_pq}}} - the path to the installed PostgreSQL header files: this is {{{/usr/pgsql-12/include}}}
 * {{{LIBS_INSTALL_DIR_pq}}} - the path to the installed PostgreSQL library files: this is {{{/usr/pgsql-12/lib}}}
 * {{{POSTGRES_LIB}}} - the name of the PostgreSQL C API library (AKA {{{libpq}}}): this is always {{{pq}}}
 * {{{INCS_INSTALL_DIR_ecpg}}} - the path to the installed PostgreSQL ecpg header files: this is {{{/usr/pgsql-12/include}}}
 * {{{LIBS_INSTALL_DIR_ecpg}}} - the path to the installed PostgreSQL ecpg library files: this is {{{/usr/pgsql-12/lib}}}
 * {{{INCS_INSTALL_DIR_crypto}}} - the system path to the crypto-library header file
 * {{{LIBS_INSTALL_DIR_crypto}}} - the system path to the crypto library
 * {{{INCS_INSTALL_DIR_png}}} - the system path to the png-library header file
 * {{{LIBS_INSTALL_DIR_png}}} - the system path to the png library

When installing NetDRMS updates, you might need to update your {{{/opt/jsoc-git/config.local}}}. Use the new {{{config.local.newstyle.template}}} to obtain information about parameters new to the newer release. Many of the parameter values have been determined during the previous steps of the installation process.

Run the configuration {{{csh}}} script, {{{configure}}}, which is included in {{{/opt/jsoc-git}}}.
{{{
$ whoami
netdrms_production
$ cd /opt/jsoc-git
$ ./configure
}}}

Compile and install NetDRMS:
{{{
$ whoami
netdrms_production
$ cd /opt/jsoc-git
$ make base_all
$ make install prefix=<NetDRMS installation dir>
}}}

As {{{make install prefix=<NetDRMS installation dir>}}} completes, it prints the following just before exiting:
{{{
Make sure you source one of the generated env files:
For bash: <NetDRMS installation dir>/drms-env-linux_avx2.bash
For csh: <NetDRMS installation dir>/drms-env-linux_avx2.csh
}}}

You will need to source the appropriate environment file before you continue, as many of the commands require that installation-directory environment variables are set properly. Also, add the installation paths to the {{{PATH}}} environment variable. It is probably best to add the following to {{{~/.bashrc}}}, {{{~/.bash_profile}}}, or a similar resource file for each user who will run NetDRMS programs. In this instruction manual, this applies to every NetDRMS user, to {{{postgres}}}, and to {{{netdrms_production}}}:

{{{
[ -f /opt/netdrms/drms-env-linux_avx2.bash ] && source /opt/netdrms/drms-env-linux_avx2.bash
PATH=$DRMS_BINS_INSTALL_DIR:$DRMS_SCRS_INSTALL_DIR:$PATH
}}}

As {{{postgres}}}, run two {{{SQL}}} scripts included in the NetDRMS installation, to create the {{{admin}}} and {{{drms}}} schemas and their relations, and the {{{jsoc}}} and {{{sumsadmin}}} database users, data types, and functions:
{{{
$ whoami
postgres
# use psql to execute SQL script
# creates DRMS database tables
$ psql -h <PostgreSQL host> -p 5432 -U postgres -f $DRMS_SCRS_INSTALL_DIR/NetDRMS.sql netdrms
CREATE SCHEMA
GRANT
CREATE TABLE
CREATE TABLE
GRANT
GRANT
CREATE SCHEMA
GRANT
CREATE ROLE
CREATE ROLE
# creates DRMS database functions
$ psql -h <PostgreSQL host> -p 5432 -U postgres -f $DRMS_SCRS_INSTALL_DIR/create_database_functions.sql netdrms
}}}

For more information about the purpose of these objects, read the comments in {{{NetDRMS.sql}}} and {{{createpgfuncs.pl}}}.

Make {{{<DRMS DB production user>}}} a ''DRMS user''. This not only makes a database account for {{{<DRMS DB production user>}}} (whose name is {{{<DRMS DB production user>}}}), but it also enters information into the database that allows {{{<DRMS DB production user>}}} to run DRMS modules, such as {{{show_info}}}. In order to do this, you will need to connect to the {{{netdrms}}} database, using the only account that currently exists: {{{<PostgreSQL DB superuser>}}}. Make sure that you've modified {{{postgres}}}'s environment as described above to source the DRMS environment file, to set the {{{PATH}}} environment variable, and to also source the Intel environment file (if needed).

DRMS modules connect to the database using the Linux user name as the database user name because, by default, PostgreSQL clients use the operating-system user name for the database account. For example, if Linux user {{{netdrms_production}}} runs {{{show_info}}}, then {{{show_info}}} will connect to the DRMS database as ''database user'' {{{netdrms_production}}}. Thus {{{netdrms_production == <DRMS DB production user>}}}. To make {{{netdrms_production}}} a DRMS user, provide {{{netdrms_production}}} for the {{{<DRMS DB production user>}}} argument:

{{{
$ whoami
postgres
$ perl $DRMS_SCRS_INSTALL_DIR/newdrmsuser.pl netdrms <PostgreSQL host> 5432 <DRMS DB production user> <initial password> <DRMS DB production user namespace> user 1
Connection to database with 'dbi:Pg:dbname=netdrms;host=drms;port=5432' as user '<DRMS DB production user>' ... success!
executing db statment ==> CREATE USER <DRMS DB production user>
executing db statment ==> ALTER USER <DRMS DB production user> WITH password '<initial password>'
executing db statment ==> GRANT jsoc to <DRMS DB production user>
running cmd-line ==> masterlists dbuser=<DRMS DB production user> namespace=<DRMS DB production user namespace> nsgrp=user
Please type the password for database user "postgres":
Connected to database 'netdrms' on host '<PostgreSQL host>' and port '5432' as user 'postgres'.
Created new drms_series...
Created new 'drms_keyword'...
Created new 'drms_link'...
Created new 'drms_segment'...
Created new 'drms_session'...
Created new drms_sessionid_seq sequence...
Commiting...
Done.
executing db statment ==> INSERT INTO admin.sessionns VALUES ('<DRMS DB production user>', '<DRMS DB production user namespace>')
}}}

where {{{<DRMS DB production user namespace>}}} is the PostgreSQL namespace dedicated to {{{<DRMS DB production user>}}}. Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this page]] for assistance with choosing {{{<DRMS DB production user namespace>}}}. The general naming convention is to prepend the database user name with an abbreviation that identifies the site that owns the data in the namespace, like {{{<site id>_<DRMS DB production user>}}}. The {{{<site id>}}} used here should be used for all NetDRMS users created later in these instructions.

Add database permissions to the {{{<DRMS DB production user>}}}. This will allow {{{<DRMS DB production user>}}} to create schemas in the DRMS and SUMS databases:
{{{
$ whoami
postgres
# for the DRMS database
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# GRANT ALL ON DATABASE netdrms TO <DRMS DB production user>;
GRANT
netdrms=# \q
# for the SUMS database
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms=# GRANT ALL ON DATABASE netdrms_sums TO <DRMS DB production user>;
GRANT
netdrms=# \q
}}}

As {{{netdrms_production}}}, create a {{{.pgpass}}} file. This file contains the PostgreSQL user account password, obviating the need to manually enter the database password each time a database connection attempt is made:
{{{
$ whoami
netdrms_production
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<DRMS DB production user>:<initial password>
ESC
:wq
$ chmod 0600 .pgpass
}}}
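To confirm that the {{{.pgpass}}} entry works, make one connection; if everything is set up correctly, {{{psql}}} will not prompt for a password:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 netdrms -c 'SELECT current_user;'
    current_user
--------------------
 netdrms_production
(1 row)
}}}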

Now that we have a production user, {{{<DRMS DB production user>}}}, in the database, we'd like for it to own all the database objects that were created by the {{{NetDRMS.sql}}} and {{{createpgfuncs.pl}}} scripts (these objects are all specific to NetDRMS). These two scripts were run by {{{<PostgreSQL DB superuser>}}} because this super user had the elevated privileges needed to create these objects - plus {{{<PostgreSQL DB superuser>}}} was the only database user in existence at that point. Run the following, as {{{postgres}}}, to alter ownerships:
{{{
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms << EOF
ALTER SCHEMA admin OWNER TO <DRMS DB production user>;
ALTER TABLE admin.ns OWNER TO <DRMS DB production user>;
ALTER TABLE admin.sessionns OWNER TO <DRMS DB production user>;
ALTER SCHEMA drms OWNER TO <DRMS DB production user>;
ALTER TYPE drmskw OWNER TO <DRMS DB production user>;
ALTER TYPE drmsseries OWNER TO <DRMS DB production user>;
ALTER TYPE drmssession OWNER TO <DRMS DB production user>;
ALTER TYPE drmssg OWNER TO <DRMS DB production user>;
ALTER TYPE rep_item OWNER TO <DRMS DB production user>;
EOF
ALTER SCHEMA
ALTER TABLE
ALTER TABLE
ALTER SCHEMA
ALTER TYPE
ALTER TYPE
ALTER TYPE
ALTER TYPE
ALTER TYPE
$
}}}

Some features of NetDRMS require the installation of python packages included in the NetDRMS distribution: {{{drms_parameters}}}, {{{drms_utils}}}, {{{drms_export}}}, and {{{sums_client}}}. {{{drms_export}}} has a dependency on {{{SunPy/drms}}}, so make sure to install it first (see [[#install-drms-package|Installing SunPy/drms]]). Each package is contained within a {{{lib/py}}} directory. To install these packages, make sure you are running as {{{netdrms_production}}} (since this user created the {{{netdrms}}} virtual environment) and that you have already activated the {{{netdrms}}} virtual environment. As {{{netdrms_production}}}, {{{cd}}} to each of the {{{lib/py}}} directories; each one contains a {{{setup.py}}} file that {{{pip}}} will use to install the packages residing there:
{{{
$ cd $DRMS_SRC_INSTALL_DIR/base/libs/py
$ pip install .
$ cd $DRMS_SRC_INSTALL_DIR/base/export/libs/py
$ pip install .
$ cd $DRMS_SRC_INSTALL_DIR/base/sums/libs/py
$ pip install .
}}}
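Assuming the import names match the package names listed above, a quick way to confirm that the packages landed in the {{{netdrms}}} virtual environment is:
{{{
$ whoami
netdrms_production
$ python -c 'import drms_parameters, drms_utils, drms_export, sums_client; print("ok")'
ok
}}}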

We recommend using {{{<DRMS DB production user>}}} as the SUMS database production user {{{<SUMS DB production user>}}}. However, feel free to create a new user if necessary. If the DRMS and SUMS databases reside in different clusters, then you ''will'' need to create the {{{<SUMS DB production user>}}}. Again, since PostgreSQL clients automatically use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user {{{netdrms_production}}} for database user {{{<SUMS DB production user>}}}. If you choose a {{{<SUMS DB production user>}}} that is not {{{netdrms_production}}}, then you will need to pass {{{<SUMS DB production user>}}} to both Remote SUMS and SUMS when starting them.
{{{
$ # DO THIS ONLY IF <SUMS DB production user> != <DRMS DB production user> OR IF <DRMS database cluster> != <SUMS database cluster>
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# CREATE ROLE <SUMS DB production user>;
postgres=# \q
$
}}}

In addition, you will need to create a SUMS database user that has read-only access to the SUMS database objects:
{{{
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# CREATE ROLE <SUMS DB readonly user> WITH LOGIN;
postgres=# ALTER ROLE <SUMS DB readonly user> WITH PASSWORD 'readonlyuser';
postgres=# \q
}}}
where {{{<SUMS DB readonly user>}}} is {{{[SUMS_READONLY_DB_USER]}}}. This database account is used by the Remote SUMS Client, a daemon used to manage the auto-download of SUs for subscriptions. Remote SUMS Client will be run by {{{netdrms_production}}}, so add the password to {{{netdrms_production}}}'s {{{.pgpass}}} file:
{{{
$ whoami
netdrms_production
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:[SUMS_READONLY_DB_USER]:readonlyuser
ESC
:wq
$ chmod 0600 .pgpass
}}}

If you created a new SUMS DB production user, add a password for this user. ''Ensure'' that you use the same password that you used for {{{<DRMS DB production user>}}} - you will use the same Linux user when connecting to either database, so the same {{{.pgpass}}} file will be used for authentication. As {{{postgres}}}, run {{{psql}}} to add a password for this new database user:
{{{
$ # DO THIS ONLY IF <SUMS DB production user> != <DRMS DB production user>
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# ALTER ROLE <SUMS DB production user> WITH PASSWORD '<DRMS DB production user password>';
postgres=# \q
$
}}}

SUMS stores directory and file information in relations in the SUMS database. To create those relations and initialize tables, as {{{netdrms_production}}} run:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> -f $DRMS_SCRS_INSTALL_DIR/postgres/create_sums_tables.sql netdrms_sums
CREATE TABLE
CREATE INDEX
CREATE INDEX
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
GRANT
CREATE TABLE
CREATE INDEX
GRANT
CREATE SEQUENCE
GRANT
CREATE SEQUENCE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> netdrms_sums
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;
ALTER SEQUENCE
netdrms_sums=> \q
$
}}}
where {{{<min val>}}} is {{{<drms site code> << 48}}} and {{{<max val>}}} is {{{<min val> + <maximum unsigned 48-bit integer> - 1}}}; here {{{<drms site code>}}} is the value of {{{[DRMS_LOCAL_SITE_CODE]}}}, and {{{<maximum unsigned 48-bit integer>}}} is 2^48^ (which is {{{281474976710656}}}). For the JSOC (site code 0x0000), this {{{ALTER SEQUENCE}}} command looks like:
{{{
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 0 RESTART 0 MINVALUE 0 MAXVALUE 281474976710655;
}}}
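For a hypothetical site code of {{{0x0123}}} (decimal 291), the same arithmetic gives {{{<min val>}}} = 291 * 2^48^ = {{{81909218222800896}}} and {{{<max val>}}} = {{{81909218222800896}}} + 2^48^ - 1 = {{{82190693199511551}}}:
{{{
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 81909218222800896 RESTART 81909218222800896 MINVALUE 81909218222800896 MAXVALUE 82190693199511551;
}}}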

Lastly, as {{{postgres}}} you will need to ensure that {{{[SUMS_READONLY_DB_USER]}}} can read from the {{{sum_partn_alloc}}} table:
{{{
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=# GRANT SELECT ON sum_partn_alloc TO [SUMS_READONLY_DB_USER];
GRANT
netdrms_sums=# \q
$
}}}

<<Anchor(initialize-sums-disk)>>
== Initializing SUMS Storage ==
In addition to SUMS database relations, SUMS requires a file system on which SUMS maintains storage areas called ''SUMS partitions''. A SUMS partition is really just a directory that contains SUMS Storage Units (each of which is implemented as a subdirectory inside the SUMS partition).

=== New Storage ===
As {{{netdrms_production}}}, create one or more partitions now. Although we have had success making them as large as 60 TB, we recommend 40 TB partitions. For example, if you plan on setting aside {{{<total storage TB>}}} TB of SUMS storage, then make approximately {{{N = <total storage TB> / 40}}} partitions of 40 TB each. The partitions can reside on a file server and be mounted onto all machines that will use NetDRMS, but the following example simply creates directories on a single file system on {{{<SUMS partition host>}}}. First, make the root directory that contains the SUMS partitions (something like {{{/opt/sums}}}):
{{{
$ sudo mkdir <SUMS root>
# allow SUMS users to write into SUMS
$ sudo chown netdrms_production:<SUMS users> <SUMS root>
# when a user writes a file into SUMS, make sure that the file's group owner is <SUMS users>
$ sudo chmod g+s <SUMS root>
}}}

{{{<SUMS users>}}} is the Linux group that is allowed to write to SUMS. You created it in a previous step. Second, make the SUMS partitions:
{{{
$ whoami
netdrms_production
$ hostname
<SUMS partition host>
$ mkdir -p <SUMS root>/partition01
$ mkdir -p <SUMS root>/partition02
...
$ mkdir -p <SUMS root>/partitionN
}}}

Initialize the SUMS DB {{{sum_partn_avail}}} table with the names of these partitions. For each SUMS partition run the following:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> netdrms_sums
netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);
}}}
where {{{<SUMS partition path>}}} is the full path of the partition as seen from {{{<SUMS host>}}} (which is where the SUMS daemon will run) and {{{<avail bytes>}}} is some number no greater than the number of bytes available in the partition's file system (multiply the number of blocks by the number of bytes per block). The exact number does not matter, as long as it is not bigger than the total number of bytes available; SUMS will adjust this number as needed.
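For example, for a hypothetical 40 TB partition at {{{/opt/sums/partition01}}} (40 TB expressed in bytes):
{{{
netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('/opt/sums/partition01', 40000000000000, 40000000000000, 0, 0);
INSERT 0 1
}}}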

=== Existing Storage ===
It might be the case that you would like to have a new NetDRMS installation use existing SUMS partitions. To do this, you would enter the paths to the existing partitions into the {{{sum_partn_avail}}} table. At this point, the SUMS database tables that contain the ''pointers'' to SUs ({{{sum_main}}} and {{{sum_partn_alloc}}}) do not contain any references to the existing SUs. Most likely the existing SUMS partitions are part of an existing, old NetDRMS, in which case these references are in the old NetDRMS SUMS database. If so, you can manually ''dump'' four database tables/sequences from the old NetDRMS to files that can be ingested into the new NetDRMS installation. To do that, as {{{<OLD NetDRMS production user>}}} run the following:

{{{
$ whoami
<OLD NetDRMS production user>
$ psql -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> netdrms_sums
netdrms_sums=> COPY public.sum_main TO '/tmp/sum_main_dump.txt' WITH ENCODING 'UTF8';
netdrms_sums=> COPY public.sum_partn_alloc TO '/tmp/sum_partn_alloc_dump.txt' WITH ENCODING 'UTF8';
}}}

You will also need to copy two database sequences from your existing, old SUMS database. The {{{COPY}}} command does not work for sequences; instead, use the {{{pg_dump}}} command provided by the PostgreSQL installation:
{{{
$ whoami
<OLD NetDRMS production user>
$ pg_dump -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> -t public.sum_seq netdrms_sums > /tmp/sum_seq_dump.txt
$ pg_dump -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> -t public.sum_ds_index_seq netdrms_sums > /tmp/sum_ds_index_seq_dump.txt
}}}

The {{{COPY}}} command saves the table data to files on the old NetDRMS PostgreSQL server. Copy those files to {{{<PostgreSQL host>}}} and then ingest them:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=> COPY public.sum_main FROM '/tmp/sum_main_dump.txt' WITH ENCODING 'UTF8';
netdrms_sums=> COPY public.sum_partn_alloc FROM '/tmp/sum_partn_alloc_dump.txt' WITH ENCODING 'UTF8';
}}}

To ingest the sequence data, you will first need to edit {{{/tmp/sum_ds_index_seq_dump.txt}}} and {{{/tmp/sum_seq_dump.txt}}}. Those files will attempt to create sequences that already exist in your new installation, so you will need to delete the existing sequences first. Before the first {{{CREATE SEQUENCE}}} statement in each file, add a {{{DROP SEQUENCE}}} command:
{{{
-- /tmp/sum_ds_index_seq_dump.txt
...
-- ADD THIS DROP STATEMENT BEFORE THE CREATE STATEMENT
DROP SEQUENCE public.sum_ds_index_seq;
--
CREATE SEQUENCE public.sum_ds_index_seq
...
;
...

}}}

and

{{{
-- /tmp/sum_seq_dump.txt
...
-- ADD THIS DROP STATEMENT BEFORE THE CREATE STATEMENT
DROP SEQUENCE public.sum_seq;
--
CREATE SEQUENCE public.sum_seq
...
;
...

}}}

After you have edited these two files, ingest them with {{{psql -f}}}:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -f /tmp/sum_ds_index_seq_dump.txt netdrms_sums
$ psql -h <PostgreSQL host> -p 5432 -f /tmp/sum_seq_dump.txt netdrms_sums
}}}
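You can sanity-check the restored sequences by inspecting their current values, which should reflect the state of the old installation:
{{{
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=> SELECT last_value FROM public.sum_ds_index_seq;
netdrms_sums=> SELECT last_value FROM public.sum_seq;
netdrms_sums=> \q
}}}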

<<Anchor(create-users)>>
== Creating DRMS User Accounts ==
For each Linux user {{{<new DRMS user>}}} who will run installed DRMS modules, or who will write new DRMS modules, you will need to set up their environment, create their ''DRMS account'', and create their {{{.pgpass}}} file so they can run DRMS modules without having to manually authenticate to the database. Just like you did for {{{netdrms_production}}}, you will need to unset the {{{PYTHONPATH}}} environment variable and put the PostgreSQL executables into the user's {{{$PATH}}}. You will also need to put {{{netdrms_production}}}'s {{{netdrms}}} Python virtual environment into the user's {{{$PATH}}} - each user will use this Python environment, not the base installation. Also set the {{{JSOCROOT}}}, {{{JSOC_MACHINE}}}, {{{JSOC_COMPILER}}}, and {{{JSOC_FCOMPILER}}} environment variables:
{{{
# .bashrc

# PostgreSQL executables
export PATH=/usr/pgsql-12/bin:$PATH

# python production virtual environment
unset PYTHONPATH
export PATH=<NetDRMS production home>/.conda/envs/netdrms/bin:$PATH

# NetDRMS binary paths
[ -f /opt/netdrms/drms-env-linux_avx2.bash ] && source /opt/netdrms/drms-env-linux_avx2.bash
PATH=$DRMS_BINS_INSTALL_DIR:$DRMS_SCRS_INSTALL_DIR:$PATH

# set JSOC_COMPILER to icc for the Intel C compiler, and to gcc for the GNU C compiler
export JSOC_COMPILER=<C compiler>

# set JSOC_FCOMPILER to ifort for the Intel Fortran compiler, and to gfortran for the GNU Fortran compiler
export JSOC_FCOMPILER=<Fortran compiler>
}}}

You will also need to add them to the {{{<SUMS users>}}} group:
{{{
$ sudo usermod -a -G <SUMS users> <new DRMS user>
}}}

To create a DRMS account, you create a database account for the user, plus you add user-specific rows to various DRMS database tables. The script {{{newdrmsuser.pl}}} exists to facilitate these tasks:
{{{
$ whoami
netdrms_production
$ perl $DRMS_SCRS_INSTALL_DIR/newdrmsuser.pl netdrms <PostgreSQL host> 5432 <new DRMS user> <initial password> <new DB user namespace> user 1
Connection to database with 'dbi:Pg:dbname=netdrms;host=drms;port=5432' as user '<new DRMS user>' ... success!
executing db statment ==> CREATE USER <new DRMS user>
executing db statment ==> ALTER USER <new DRMS user> WITH password '<initial password>'
executing db statment ==> GRANT jsoc to <new DRMS user>
running cmd-line ==> masterlists dbuser=<new DRMS user> namespace=<new DB user namespace> nsgrp=user
Please type the password for database user "postgres":
Connected to database 'netdrms' on host '<PostgreSQL host>' and port '5432' as user 'postgres'.
Created new drms_series...
Created new 'drms_keyword'...
Created new 'drms_link'...
Created new 'drms_segment'...
Created new 'drms_session'...
Created new drms_sessionid_seq sequence...
Commiting...
Done.
executing db statment ==> INSERT INTO admin.sessionns VALUES ('<new DRMS user>', '<new DB user namespace>')
}}}
where {{{<new DB user namespace>}}} is the PostgreSQL namespace dedicated to the new user. A namespace is a logical container that allows a database user to own database objects, like relations, that have the same name as objects owned by other users - items in a namespace need only be uniquely named within the namespace, not between namespaces. For example, the relation {{{drms_series}}} in the namespace {{{su_arta}}} is not the same relation as the {{{drms_series}}} relation in the {{{su_phil}}} namespace - the relations have the same name, but they are different relations. In virtually all PostgreSQL operations, a user can prefix the name of a relation with the namespace: {{{su_arta.drms_series}}} refers to the first relation, and {{{su_phil.drms_series}}} refers to the second relation.
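
For example, a quick {{{psql}}} session demonstrates that identically named relations in two namespaces are distinct (the {{{su_arta}}} and {{{su_phil}}} namespaces here are just the illustrative examples from above):
{{{
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=> -- two different relations, despite the identical relation name
netdrms=> SELECT count(*) FROM su_arta.drms_series;
netdrms=> SELECT count(*) FROM su_phil.drms_series;
netdrms=> \q
}}}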

The purpose of {{{<new DB user namespace>}}} is to hold non-production, private data series - sort of a private user space in which to develop new DRMS modules that create data. If those data should become production-level products, then the data and the code that generates the data need to be moved to a production namespace. At the JSOC, we have several such production namespaces (e.g., {{{aia}}}, {{{hmi}}}, {{{mdi}}}). A site creates production namespaces with a different module ({{{masterlists}}}); {{{newdrmsuser.pl}}} is only for creating non-production namespaces.

Please see the NOTE in [[http://jsoc.stanford.edu/jsocwiki/NewDrmsUser|this page]] for assistance with choosing {{{<new DB user namespace>}}}. The general naming convention is to prefix the namespace with an abbreviation that identifies the site that owns the data in the namespace. For example, all private data created at Stanford reside in data series whose namespaces start with {{{su_}}} (Stanford University), regardless of the affiliation of the user who creates data in this namespace. Data created at NASA Ames start with {{{nas_}}} (NASA Supercomputing Division). Following the underscore is a string that identifies a particular user - {{{su_arta}}} for Art, and {{{su_phil}}} for Phil. You can also identify a group with the suffix (e.g., {{{su_uscsolar}}} for a solar group at the University of Southern California that creates data at Stanford). {{{<initial password>}}} is the initial password for this account - the initial password does not matter much since you are going to have the user change it next.

Running {{{newdrmsuser.pl}}} will create a new DRMS database user that has the same name as the user's Linux account name.

Have the user change their password:
{{{
$ whoami
<new DRMS user>
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=> ALTER USER <new DRMS user> WITH PASSWORD '<new password>';
netdrms=> \q
$
}}}

And then have the user create their {{{.pgpass}}} file (to allow auto-login to their database account) and set permissions to {{{0600}}}:
{{{
$ whoami
<new DRMS user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<new DRMS user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass
}}}

Please click [[http://jsoc.stanford.edu/jsocwiki/DrmsPassword|here]] for additional information on the {{{.pgpass}}} file.
 
If you plan on creating data that will be publicly distributed, you should also create one or more data-production users. For example, if you plan on making a new public HMI data series, you could create a user named {{{hmi_production}}}. Although you could follow the previous steps to create a new Linux account for this database user, you do not necessarily need to. Instead, you can use the existing {{{netdrms_production}}} Linux account and have it connect as the {{{hmi_production}}} database user. To do that, first create the new {{{hmi_production}}} database user by running {{{newdrmsuser.pl}}} as just described. Choose a descriptive namespace that follows the naming guidelines described above, like {{{hmi}}}. Because a {{{.pgpass}}} already exists for {{{netdrms_production}}}, you want to '''''ADD''''' a new line to {{{.pgpass}}} for this user. Continuing with the {{{hmi_production}}} user example, add a password line for {{{hmi_production}}}:
{{{
# .pgpass
<PostgreSQL host>:*:*:hmi_production:<hmi_production password>
}}}
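
To verify that the new {{{.pgpass}}} line works, you can test that {{{psql}}} connects as {{{hmi_production}}} without prompting for a password:
{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U hmi_production netdrms
netdrms=> \q
}}}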

At this point, it is a good idea to test that the new DRMS user can use a basic NetDRMS program. Although your NetDRMS has no DRMS data series yet, running {{{show_series}}} is a good way to test various components, like authentication, the database connection, etc. Test DRMS by running the {{{show_series}}} command:
{{{
$ whoami
<new DRMS user>
$ show_series
$
}}}

Nothing will be printed when this command is run, since your NetDRMS is devoid of data at the moment, but if you see no errors, then life is good. If not, then contact the JSOC for help troubleshooting.

To perform a more thorough test involving SUMS, you will need to have at least one DRMS data series that has SUMS data. You can obtain such a data series by [[#register-subscriptions|registering for subscriptions]].

<<Anchor(run-sums)>>
== Running SUMS Services ==
Before you can use NetDRMS, you, as {{{netdrms_production}}} on {{{<SUMS host>}}}, will need to start SUMS. To launch one or more SUMS daemons, {{{sumsd.py}}}, use the {{{start-mt-sums.py}}} script:
{{{
$ whoami
netdrms_production
$ hostname
<SUMS host>
$ python3 $DRMS_SCRS_INSTALL_DIR/start-mt-sums.py daemon=$DRMS_SCRS_INSTALL_DIR/sumsd.py --ports=[SUMSD_LISTENPORT] --logging-level=debug
running /home/netdrms_production/.conda/envs/netdrms/bin/python3 /opt/netdrms-v10.0-rc2/scripts/sumsd.py --listen-port=6100 --logging-level=debug
started instance /opt/netdrms-v10.0-rc2/scripts/sumsd.py:6100 (pid 821705)
{"started": [821705]}
}}}

NOTE: as of this writing, {{{ports}}} '''''MUST''''' be set to the value of {{{[SUMSD_LISTENPORT]}}}. In future releases, this parameter will be made optional, in which case the value will be obtained from {{{[SUMSD_LISTENPORT]}}} in {{{config.local}}}.

This command starts {{{sumsd.py}}}, which then listens for connections from DRMS modules, such as {{{show_info}}}, on port {{{[SUMSD_LISTENPORT]}}}. In the example above, {{{[SUMSD_LISTENPORT]}}} is {{{6100}}}, which is displayed in the output. {{{sumsd.py}}} creates an ''instances'' file and a log file in {{{[SUMLOG_BASEDIR]}}} by default. The instances file is a state file used by {{{start-mt-sums.py}}} and {{{stop-mt-sums.py}}} to manage the running {{{sumsd.py}}} instances. By default, the log file is named {{{sumsd-<PPPPP>-<YYYYMMDD.HHMMSS>.txt}}}, where {{{<PPPPP>}}} is the PID of the {{{sumsd.py}}} process, and {{{<YYYYMMDD.HHMMSS>}}} is a time string representing the time the instance was launched.
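
As a quick sanity check after launching, you can verify that the daemon process is alive and watch its log (the PID and time string below are illustrative - substitute the values printed when you launched the daemon):
{{{
$ ps -p 821705 -o pid,cmd
$ tail -f [SUMLOG_BASEDIR]/sumsd-821705-20201117.093041.txt
}}}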

The complete usage is:
{{{
usage: start-mt-sums.py daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --logging-level=<critical, error, warning, info, or debug>] [ --log-file=<filename> ] [ --quiet ]

optional arguments:
  -h, --help show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be spawned
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -l LOGLEVEL, --logging-level LOGLEVEL
                        specifies the amount of logging to perform; in order
                        of increasing verbosity: critical, error, warning,
                        info, debug
  -L <file name>, --log-file <file name>
                        the file to which sumsd logging is written
  -q, --quiet do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to launch
}}}

{{{start-mt-sums.py}}} will fork one or more {{{sumsd.py}}} daemon processes. The {{{ports}}} argument identifies the SUMS host ports on which {{{sumsd.py}}} will listen for client (DRMS module) requests. One {{{sumsd.py}}} process will be invoked per port specified. Each process creates a log file in {{{[SUMLOG_BASEDIR]}}}, named as described above. The {{{-L/--log-file}}} argument allows you to override the path and name of this log file.

To stop one or more SUMS services, use the {{{stop-mt-sums.py}}} script:
{{{
$ whoami
netdrms_production
$ hostname
<SUMS host>
$ python3 $DRMS_SCRS_INSTALL_DIR/stop-mt-sums.py daemon=$DRMS_SCRS_INSTALL_DIR/sumsd.py
}}}

This will stop all running {{{sumsd.py}}} daemons.

The complete usage is:
{{{
usage: stop-mt-sums.py [ -h ] daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --quiet ]

optional arguments:
  -h, --help show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be stopped
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -q, --quiet do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to halt
}}}

<<Anchor(register-subscriptions)>>
== Registering for Subscriptions ==
A NetDRMS site can optionally register for a data-series subscription to any NetDRMS site that offers subscription service. The JSOC NetDRMS offers subscriptions, but at the time of this writing, no other site does. Once a site registers for a data series subscription, the site will become a mirror for that data series. The subscription process ensures that the mirroring site will receive regular and timely updates made to the data series by the serving site. The subscribing site can configure the interval between updates such that the mirror can synchronize with the server and receive updates within a couple of minutes, keeping the mirror up-to-date in (almost) real time.

To register for a subscription, {{{<Subscription production user>}}} will set up {{{ssh}}} keys (for SU transfer), start daemons, and run the subscription script, {{{subscribe.py}}}. The assumption is that {{{<Subscription production user>}}} is {{{netdrms_production}}}, but you are free to choose a different user. {{{subscribe.py}}} makes subscription requests to the serving site's subscription manager. The process entails the creation of a snapshot of the data-series DRMS database information at the serving site. Those data are downloaded, via HTTP, to the subscribing site, where they are ingested by {{{subscribe.py}}}. {{{get_slony_logs.pl}}}, a client-side cron task, updates the data-series snapshot with any server-side changes that have been made since the snapshot was created. Ingestion of the snapshot results in the creation of the DRMS database objects that maintain and store the data series, and {{{get_slony_logs.pl}}} updates those objects when the server makes changes. At this time, no SUMS data files are downloaded. Instead, and optionally, the IDs for the series' SUMS Storage Units (SU) are saved in a database relation. It is the function of Remote SUMS ({{{rsums.py}}}), another NetDRMS daemon, to download the SUs and ingest them into your SUMS.

Remote SUMS accepts ''requests'' from DRMS for SUs. It then communicates with the serving NetDRMS and manages the {{{scp}}} download and ingestion of those SUs. Once Remote SUMS is running, should any DRMS code/module request an SU that is not present in the local NetDRMS, DRMS will send a download request to Remote SUMS. The Remote SUMS Client ({{{rsums-clientd.py}}}), an optional NetDRMS daemon, can automate this process so that when new subscription data are ingested into the DRMS database, it submits requests for the associated SUs to Remote SUMS. In this way, it is possible to ''pre-fetch'' the SU files before any user requests them. But pre-fetching is optional. The SUs will be downloaded on-demand as described above. In fact, if the subscribing NetDRMS site were to automatically download an SU, then delete the SU (there is a method to do this, described later), then an on-demand download is the only way to re-fetch the deleted SU. On-demand downloads happen automatically; any DRMS module that attempts to access an SU (like with a {{{show_info -p}}} command) that is not present for any reason will trigger an {{{rsumsd.py}}} request. The module will pause until the SU has been downloaded, then automatically resume its operation on the previously missing SU.
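
For example, simply asking for the path to an SU that is not yet in the local SUMS triggers an on-demand download (the series and record-set specification here are illustrative - use a series to which you are subscribed):
{{{
$ show_info -p 'hmi.M_45s[2016.01.01_00:00:00_TAI]'
}}}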

As {{{get_slony_logs.pl}}} uses {{{scp}}} to download update files (''Slony logs'') and {{{rsumsd.py}}} uses {{{scp}}} to automatically download SUs, SSH public-private keys must be created at the subscribing site, and the public key must be provided to the serving site. Setting this up requires coordinated work at both the subscribing and serving sites. As {{{<Subscription production user>}}} on the subscribing site, run
{{{
$ whoami
<Subscription production user>
$ ssh-keygen -t rsa
}}}
This will allow you to create a passphrase for the key. If you choose to do this, then save this phrase for later steps. In the home directory of {{{<Subscription production user>}}}, {{{ssh-keygen}}} will create a public key named {{{id_rsa.pub}}}. Provide this public key to the serving site.

The serving site must then add the public key to its list of authorized keys. If the {{{.ssh}}} directory does not exist, then the serving site must first create this directory and give it {{{0700}}} permissions. If the {{{authorized_keys}}} file in {{{.ssh}}} does not exist, then it must first be created and given {{{0644}}} permissions:
{{{
$ whoami
<Subscription manager user>
$ mkdir .ssh
$ chmod 0700 .ssh
$ cd .ssh
$ touch authorized_keys
$ chmod 0644 authorized_keys
}}}
Once the {{{.ssh}}} and {{{authorized_keys}}} files exist and have the proper permissions, the serving site administrator can then add the client site's public key to its list of authorized keys:
{{{
$ whoami
<Subscription manager user>
$ cd $HOME/.ssh
$ cat <remote-site public key file> >> authorized_keys
}}}

Back at the client NetDRMS site, if an {{{ssh}}} passphrase was chosen, then as {{{<Subscription production user>}}} start an {{{ssh-agent}}} instance to automate the passphrase authentication. If no passphrase was provided when {{{ssh-keygen}}} was run, this step can be skipped. Otherwise, run:
{{{
$ whoami
<Subscription production user>
$ ssh-agent > $HOME/.ssh-agent
$ source $HOME/.ssh-agent # needed for ssh-add, and also for rsumsd.py and get_slony_logs.pl
$ ssh-add $HOME/.ssh/id_rsa
}}}
and provide the passphrase. To keep an ingested data series synchronized with changes made to it at the serving site, a client-side crontab runs {{{get_slony_logs.pl}}} periodically. This perl script uses {{{scp}}} to download ''slony log files'' - SQL files that insert, delete, or update database relation rows. {{{get_slony_logs.pl}}} communicates with the Slony-I replication software running at the serving site: Slony-I generates these log (SQL) files at the server, and the client then downloads them.

To register for a subscription to a new series, run:
{{{
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=subscribe series=<published DRMS data series> --loglevel=debug
}}}

The complete usage is:
{{{
usage: subscribe.py [ -hjpl ] cfg=<client configuration file> reqtype=<subscribe, resubscribe, unsubscribe> series=<comma-separated list of series> [ --archive=<0, 1>] [ --retention=<number of days>] [ --tapegroup=<tape-group number> ] [ --pg_user=<subscripton client DB user> ] [ --logfile=<log-file name> ]

optional arguments:
  -h, --help show this help message and exit
  archive <series archive flag>, --archive <series archive flag>
                        The tape archive flag for the series - either 0 (do not archive) or 1 (archive).
  retention <series SU disk retention>, --retention <series SU disk retention>
                        The number of days the series SUs remain on disk before becoming subject to deletion.
  tapegroup <series SU tape group>, --tapegroup <series SU tape group>
                        If the archive flag is 1, the number identifying the group of series that share tape files.
  pg_user <series SU tape group>, --pg_user <series SU tape group>
                        The DB account the subscription client uses.
  -p, --pause Pause and ask for user confirmation before applying the downloaded SQL dump file.
  --loglevel LOGLEVEL Specifies the amount of logging to perform. In increasing order: critical, error, warning, info, debug
  --logfile <file name>
                        The file to which logging is written.
  --filtersus <SU filter>
                        Specifies a series keyword K and a number of days D, comma separated; a remote-SUMS request for an SU will occur only if the keyword K of the record
                        containing the SU has a value that lies within the time interval determined by the days D.

required arguments:
  cfg <client configuration file>, --config <client configuration file>
                        The client-side configuration file used by the subscription service.
  reqtype <request type>, --reqtype <request type>
                        The type of request (subscribe, resubscribe, or unsubscribe).
  series SERIES, --series SERIES
                        A comma-separated list of DRMS series to subscribe/resubscribe to, or to unsubscribe from.
}}}
Debug-level logging is not necessary, but if this is your first subscription, we recommend running with lots of debug statements - if there are issues, you can send this output to the JSOC for help.

{{{<subscription config file>}}} contains parameters used by both the subscription client {{{subscribe.py}}} and the program that downloads Slony log files, {{{get_slony_logs.pl}}}. We recommend making a directory {{{subscription}}} in the home directory of {{{netdrms_production}}} and saving the configuration file in that directory. The parameters to be set in {{{<subscription config file>}}} are as follows (a sketch of a complete file appears after the list):
 * {{{node}}} - the name of the subscription client (e.g., {{{jsoc}}}, {{{nsocu}}}, {{{sdac}}}); must be globally unique across all NetDRMS sites; this string will be used in various state files and in file/directory names; to obtain this name, ask the JSOC for the sitename assigned to your site during the NetDRMS installation process.
 * {{{kRSServer}}} - the full domain name of the subscription log server (e.g., {{{jsocport.stanford.edu}}} for a client subscribing to data series published by the JSOC).
 * {{{kRSUser}}} - the account on {{{kRSServer}}} that will be used for data transfer (e.g., {{{jsocexp}}} for a client subscribing to data series published by the JSOC).
 * {{{kRSPort}}} - the port on {{{kRSServer}}} that will be used for data transfer (e.g., {{{22}}} for {{{scp}}}); if the JSOC is the serving site, then the port must be {{{55000}}}.
 * {{{kRSBaseURL}}} - the base URL for all subscription services provided by subscription server for the Slony cluster (identified by {{{slony_cluster}}}); ask the DRMS site serving the subscriptions for this value - when subscribing to series at the JSOC, use "http://jsoc.stanford.edu/cgi-bin/ajax"
 * {{{pg_host}}} - the client machine that hosts the client PostgreSQL database that will contain the replicated data series - this is {{{<PostgreSQL host>}}}.
 * {{{pg_port}}} - the port on the {{{pg_host}}} machine that will be used for communication with the data-series database - this is {{{5432}}}.
 * {{{pg_user}}} - the PostgreSQL user that will own the replicated series - this is {{{netdrms_production}}}.
 * {{{pg_dbname}}} - the name of the PostgreSQL database that resides on {{{pg_host}}} - this is {{{netdrms}}}.
 * {{{slony_cluster}}} - the name of the Slony cluster to which this node belongs (e.g., {{{jsoc}}} for a client subscribing to data series published by the JSOC).
 * {{{kLocalLogDir}}} - the client directory that will contain the subscription-process logs; we recommend {{{<NetDRMS production user home>/subscription/log}}}; make sure this path exists.
 * {{{kLocalWorkingDir}}} - the path to a directory for temporary working subscription files; we recommend {{{<NetDRMS production user home>/subscription}}}; make sure this path exists.
 * {{{kSQLIngestionProgram}}} - the path to the script/program that will ingest the site-specific slony logs — usually the path to {{{get_slony_logs.pl}}} ({{{<NetDRMS root>/base/drms/replication/get_slony_logs.pl}}}).
 * {{{kSubService}}} - the URL of the application at the subscription-serving site that accepts new subscription requests (for the JSOC subscription server this is {{{${kRSBaseURL}/request-subs.py}}}).
 * {{{kPubListService}}} - the URL of the application at the subscription-serving site that lists published data series (for the JSOC subscription server this is {{{${kRSBaseURL}/publist.py}}}).
 * {{{kSubXfer}}} - the URL of the application at the subscription-serving site where subscription dump files are located (for the JSOC subscription server this is {{{http://jsoc.stanford.edu/subscription}}}).
 * {{{kDeleteSeriesProgram}}} - the path to the program {{{delete_series}}}, which is used to delete DRMS data series on the client when requested ({{{<NetDRMS root>/bin/<architecture>/delete_series}}}).
 * {{{archive}}} - for new subscriptions, the default data series archive action; set to 1 if the NetDRMS site has a tape archive system AND the default is to archive all series obtained by subscription.
 * {{{retention}}} - for new subscriptions, the default number of days to retain SUs (after this many days SUs are marked for deletion and subjected to garbage collection as needed).
 * {{{tapegroup}}} - for new subscriptions, the default archive tape group for the series' SUs (ignored if archive == 0); unless you have a tape backup system, use "0"
 * {{{ingestion_path}}} - the local directory that will contain the ingestion "die" file — used by {{{get_slony_logs.pl}}}; we recommend {{{<NetDRMS production user home>/subscription}}}; make sure this path exists.
 * {{{scp_cmd}}} - the absolute path to the client's scp program.
 * {{{ssh_cmd}}} - the absolute path to the client's ssh program.
 * {{{rmt_slony_dir}}} - the absolute path, accessible from the {{{kRSUser}}} account, on the server to the directory that contains the site-specific slony logs (for the JSOC subscription server, use "/data/pgsql/slon_logs/live/site_logs").
 * {{{slony_logs}}} - the client directory that contains the downloaded site-specific slony logs; we recommend {{{<NetDRMS production user home>/subscription/slon_logs}}}; make sure this path exists.
 * {{{PSQL}}} - the path to the client's {{{psql}}} program, and any flags needed to run psql as the pg_user user, like -h {{{pg_host}}}.
 * {{{email_list}}} - the email account to which error messages will be sent.
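
The following is a minimal sketch of a {{{<subscription config file>}}} for a client subscribing to JSOC series. The {{{name=value}}} format is assumed from the {{{${kRSBaseURL}}}} expansions shown above, and the {{{node}}} name, {{{retention}}} value, and {{{email_list}}} address are placeholders to be replaced with your own site's values:
{{{
# <subscription config file> - illustrative values only
node=mysite
kRSServer=jsocport.stanford.edu
kRSUser=jsocexp
kRSPort=55000
kRSBaseURL=http://jsoc.stanford.edu/cgi-bin/ajax
pg_host=<PostgreSQL host>
pg_port=5432
pg_user=netdrms_production
pg_dbname=netdrms
slony_cluster=jsoc
kLocalLogDir=<NetDRMS production user home>/subscription/log
kLocalWorkingDir=<NetDRMS production user home>/subscription
kSQLIngestionProgram=<NetDRMS root>/base/drms/replication/get_slony_logs.pl
kSubService=${kRSBaseURL}/request-subs.py
kPubListService=${kRSBaseURL}/publist.py
kSubXfer=http://jsoc.stanford.edu/subscription
kDeleteSeriesProgram=<NetDRMS root>/bin/<architecture>/delete_series
archive=0
retention=60
tapegroup=0
ingestion_path=<NetDRMS production user home>/subscription
scp_cmd=/usr/bin/scp
ssh_cmd=/usr/bin/ssh
rmt_slony_dir=/data/pgsql/slon_logs/live/site_logs
slony_logs=<NetDRMS production user home>/subscription/slon_logs
PSQL=psql -h <PostgreSQL host>
email_list=<your email address>
}}}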

You may find that a subscription has gotten out of sync with the serving site's data series for various reasons (accidental deletion of database rows, for example). You can re-register for the subscription to true it up. The existing DRMS data will be deleted and replaced with a fresh snapshot. You will be asked whether you would like to delete the series' SUMS data (SUs, FITS files, etc.). If you are sure you no longer need them, go ahead and say ''yes''.
{{{
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=resubscribe series=<subscription series> --loglevel=debug
}}}

Finally, there might come a time when you no longer wish to hold on to a registration. To remove the subscription from your set of registered data series, run:
{{{
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=unsubscribe series=<subscription series> --loglevel=debug
}}}

NOTE: In this case, the {{{subscribe.py}}} will prompt you to ask if you would like to delete the existing DRMS data series. The answer is usually ''yes''. However, if you need to keep the existing data series snapshot for some reason (e.g., you want to work with the existing data, but simply do not want to ingest new data), then respond ''no''. Keep in mind that you will not be able to register for a subscription to the same series if that series already exists - you will need to run {{{delete_series}}}.

In all the above commands, the logging level is set to {{{debug}}}. It is a good idea to enable verbose logging like this the first time you run one of these commands, just in case an issue occurs (usually due to some configuration problem). Providing the debug log to the JSOC when troubleshooting will be invaluable.

Once you have successfully registered for subscription to at least one series, you will need to install a crontab to run {{{get_slony_logs.pl}}}, which updates the subscription series, on a regular basis. {{{get_slony_logs.pl}}} has a dependency on the {{{Net::SSH}}} perl package. If you are running CentOS, then there is an {{{rpm}}} package available from the {{{epel/x86_64}}} yum repository that contains this perl package:
{{{
$ yum list | grep perl-Net-SSH
perl-Net-SSH.noarch 0.09-26.el7 epel
$ sudo yum install perl-Net-SSH
...
Installed:
  perl-Net-SSH.noarch 0:0.09-26.el7

Complete!
$
}}}

You will also need {{{String::ShellQuote}}}:
{{{
$ yum list | grep ShellQuote
perl-String-ShellQuote.noarch 1.04-10.el7 base
$ sudo yum install perl-String-ShellQuote
...
Installed:
  perl-String-ShellQuote.noarch 0:1.04-10.el7

Complete!
}}}


If you cannot find a package, then you can use [[ https://metacpan.org/pod/Net::SSH | CPAN ]].

Once you have the {{{Net::SSH}}} and {{{String::ShellQuote}}} Perl packages installed, run {{{get_slony_logs.pl}}} as {{{<Subscription production user>}}} from a crontab:
{{{
$ whoami
<Subscription production user>
$ crontab -e
*/5 * * * * (. ~/.ssh-agent; $DRMS_SRC_INSTALL_DIR/base/drms/replication/get_slony_logs.pl <subscription config file> >> ~/<path>/get_slony_logs.log 2>&1 )
}}}

NOTE: you should manually run {{{get_slony_logs.pl}}} the first time, since an interactive prompt will be displayed the first time an SSH connection is made to {{{[kRSServer]}}}.

At the JSOC, Slony log files are created every minute. Your series' subscriptions will not necessarily receive updates every minute (that depends on the cadence of the series to which you subscribe), but checking frequently minimizes the lag between the state of your series and the state of the same series at the serving site.

<<Anchor(run-remote-sums)>>
== Running Remote SUMS ==
Your NetDRMS may contain data produced by other, non-local NetDRMS installations. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must also download the storage units (SUs) associated with these data series. Remote SUMS, a tool that comes with NetDRMS, downloads SUs as needed - i.e., if a DRMS module or program requests the path to an SU, or attempts to read it, and the SU is not yet present in the local SUMS, Remote SUMS will download it. While an SU is being downloaded, the initiating module or program polls, waiting for the download to complete.

Several components compose Remote SUMS. On the client side (the local NetDRMS) is a daemon, {{{rsumsd.py}}}, that must be running. There must also exist some database tables, as well as some binaries used by the daemon. On the server side - at all NetDRMS sites that wish to act as a source of SUs for the client - is a CGI ({{{rs.sh}}}). In response to requests that contain a list of SUNUMs, this CGI returns file-server information (hostname, port, user, SU paths, etc.) for the SUs the server has available. When the client encounters requests for remote SUs that are not contained in the local SUMS, it sends a request to the Remote SUMS daemon to download those SUs. It does so by inserting a row into the {{{<Remote SUMS requests>}}} database table in the {{{[RS_DBNAME]}}} database. The client code then polls, waiting for the request to be serviced. The daemon in turn sends requests to the {{{rs.sh}}} CGIs at all the relevant providing sites. The owning sites return the file-server information to the daemon, the daemon downloads the requested SUs via {{{scp}}}, and it notifies the client module once the SUs are available for use. The client module then exits from its polling code and continues, using the freshly downloaded SUs.
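
While troubleshooting, it can be useful to watch the requests table directly. Its column layout is internal to Remote SUMS, so the query below is kept deliberately generic ({{{[RS_REQUEST_TABLE]}}} and {{{[RS_DBNAME]}}} come from your {{{config.local}}}):
{{{
$ psql -h <PostgreSQL host> -p 5432 [RS_DBNAME]
[RS_DBNAME]=> -- count the requests currently queued for the daemon
[RS_DBNAME]=> SELECT count(*) FROM [RS_REQUEST_TABLE];
}}}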

To use Remote SUMS, several {{{config.local}}} parameters must be present. If you followed the steps in this document, then you have already set those parameters. Please see [[#install-netdrms | Installing NetDRMS]] for a description of each one. Each SU that is downloaded has an associated expiration date, a flag indicating whether or not the SU is archived, and, if the SU is archived, the tape group to which the SU belongs. The manner in which the values for these parameters are determined is a bit complicated. When you register for a series subscription, the series is created at your site, and at that time the values for these parameters are determined; the series is then initialized with these values. There are default values for these parameters that are overridden by optional parameters supplied during the call to {{{subscribe.py}}}. In the following list, the lower-numbered items, if present, override the higher-numbered items, in this order:

 1. the {{{--archive, --retention, --tapegroup}}} command-line arguments to {{{subscribe.py}}}
 1. the {{{archive, retention, tapegroup}}} parameters in {{{subscription config file}}}

Now, when Remote SUMS runs, the values of these parameters for downloaded and ingested SUs are determined in a similar hierarchical fashion, with higher-numbered items overriding lower-numbered items (see the example after this list):

 1. the parameters associated with the series, as determined above
 1. the {{{--archive, --expiration, --lifespan, --tapegroup}}} command-line arguments to {{{rsumsd.py}}}
 1. {{{[RS_SU_ARCHIVE], [RS_SU_EXPIRATION], [RS_SU_LIFESPAN], [RS_SU_TAPEGROUP]}}}
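
For example, to supply SU-disposition values on the command line when you launch the daemon as described below (the flag names come from the list above; the exact value syntax is an assumption - check {{{rsumsd.py --help}}}):
{{{
$ whoami
netdrms_production
$ python3 $DRMS_SCRS_INSTALL_DIR/rsumsd.py --archive=0 --lifespan=30 --tapegroup=0 --logging-level=debug &
}}}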

To run Remote SUMS, as {{{netdrms_production}}} run the following to create the requests database table:
{{{
$ whoami
netdrms_production
$ python3 $DRMS_SCRS_INSTALL_DIR/rscreatetabs.py op=create tabs=req
}}}

Remote SUMS downloads SUs via {{{scp}}}. As such, you will need to create SSH keys, distribute the public key to the site serving the SUs, and start up an {{{ssh-agent}}}, if you have not already done so - you should already have done so, though, since all of this is needed by {{{get_slony_logs.pl}}}, a component of the subscription system (please see [[#register-subscriptions|Registering for Subscriptions]]).

To launch Remote SUMS, as {{{netdrms_production}}} create {{{[RS_LOGDIR]}}} and the directory that will contain the Remote SUMS lock file {{{[RS_LOCKFILE]}}}, {{{source}}} the {{{ssh-agent}}} environment file, and then run {{{rsumsd.py}}} in the background:
{{{
$ whoami
netdrms_production
$ mkdir -p [RS_LOGDIR]
$ source $HOME/.ssh-agent
$ python $DRMS_SCRS_INSTALL_DIR/rsumsd.py --logging-level=debug &
}}}

For now, we recommend setting the log level to debug; when things appear to be running smoothly, then you can restart with the default level (info). The output log named {{{<rslog_YYYYMMDD.txt>}}} will be written to the directory identified by {{{[RS_LOGDIR]}}}, so make sure that directory exists before running {{{rsumsd.py}}}.

To stop {{{rsumsd.py}}}, send a {{{SIGINT}}} signal ({{{kill -2}}}) to the process. Remote SUMS will intercept that signal and shut down cleanly. If you need to shut it down with a {{{SIGKILL}}} signal for any reason, then you will need to clean up manually. To do that, delete the lock file ({{{[RS_LOCKFILE]}}}) and delete all requests in the requests database table (i.e., delete all rows in {{{[RS_REQUEST_TABLE]}}}).
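
A sketch of that manual cleanup after a {{{SIGKILL}}} (substituting the values from your {{{config.local}}} for the bracketed parameters):
{{{
$ whoami
netdrms_production
$ rm [RS_LOCKFILE]
$ psql -h <PostgreSQL host> -p 5432 [RS_DBNAME] -c 'DELETE FROM [RS_REQUEST_TABLE];'
}}}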

<<Anchor(run-remote-sums-client)>>
== Running Remote SUMS Client ==
If you have at least one subscription, and you have Remote SUMS running (''not'' the JMD), then you can automate the download of SUs for the subscriptions. This is optional, however. If you skip this step, then when a NetDRMS user attempts to use one or more SUs that are part of a subscription (and hence not at your NetDRMS site initially), the SUs will be downloaded in an on-demand fashion. However, it may be desirable to pre-fetch the SUs if certain usage patterns hold. If one or more users are going to use a block of SUs for a two-week period, for example, then downloading hundreds or thousands of SUs one at a time would be very inefficient. Also, it is very common for NetDRMS users to use newly produced SUs soon after they are created at the subscription server. In this case, a good strategy might be to automatically download the latest SUs for all subscriptions, knowing that lots of very recent SUs will be popular.

The Remote SUMS Client, {{{rsums-clientd.py}}}, monitors SU references in incoming Slony logs. It groups SUs into batches, and submits requests containing these batches to Remote SUMS, {{{rsumsd.py}}}. {{{rsums-clientd.py}}} then monitors the progress of these requests, logging the results.

In addition to a running {{{rsumsd.py}}}, {{{rsums-clientd.py}}} requires the existence of three components. The ''capture table'' is a database table that contains a list of all SUNUMs that are to be downloaded. It is populated automatically as Slony logs are ingested. The ''capture function'' is a database function that performs the actual work of inserting rows into the capture table. It is called every time a Slony log inserts a row into a subscribed series' series table. You will need to create a ''capture trigger'' database trigger for each series table for which you want to automate SU downloads. These components work together as follows (a quick way to verify the trigger installation is sketched after this list):
 * each capture trigger watches a series table, and when a row is inserted (due to {{{get_slony_logs.pl}}} ingesting new Slony logs), the trigger runs the capture function
 * this function then inserts a row into the capture table
 * {{{rsums-clientd.py}}} then ''sees'' the newly inserted capture-table row and extracts the SUNUM
 * {{{rsums-clientd.py}}} batches several of these SUNUMs, and then makes a {{{rsumsd.py}}} request out of them
 * {{{rsumsd.py}}} processes the requests (multiple in parallel), downloading one or more SUs and ingesting them into the local SUMS
 * {{{rsumsd.py}}} updates a status column in the capture table
 * {{{rsums-clientd.py}}} reads the status, and upon success, it removes the rows that contained the SUs that were successfully ingested; upon failure, {{{rsums-clientd.py}}} will log an error, and can optionally re-try one or more times

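To verify that a capture trigger is in place on a subscribed series, you can list the series table's triggers with {{{psql}}}'s {{{\d}}} command (hmi.M_45s is illustrative; note that PostgreSQL folds the relation name to lower case):
{{{
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=> \d hmi.m_45s
...
Triggers:
    <capture trigger> ...
netdrms=> \q
}}}
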
'''''IF YOU REGISTERED FOR YOUR FIRST SUBSCRIPTION FROM A >= 9.0 NetDRMS''''' then the capture table and capture function already exist. In addition, a capture trigger has been installed on each of your subscribed series. As {{{netdrms_production}}}, ensure that {{{rsumsd.py}}} is running, and then you can start {{{rsums-clientd.py}}}:
{{{
$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py --loglevel=debug &
}}}

If you do not have a >= 9.0 NetDRMS, OR you were already a subscriber before upgrading to a >= 9.0 NetDRMS, then to use {{{rsums-clientd.py}}}, {{{netdrms_production}}} must first create the capture table and the capture function. You must also add the capture trigger to the series table of each DRMS data series for which you want to enable automatic SU downloads. {{{rsums-clientd.py}}} can be run with arguments so that it will print SQL that will create the capture table and function, and the capture triggers on the existing DRMS data series:
{{{
$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py [ --setup [ --capturesetup ] [ --seriessetup=<series> ]... ]
}}}

{{{--capturesetup}}} creates the capture table and capture function. {{{--seriessetup}}} creates the capture trigger for a single series.

For example, to set up automatic SU downloads for hmi.M_45s and hmi.V_720s at a site that lacks the capture table and capture function, {{{netdrms_production}}} can run:
{{{
$ whoami
netdrms_production
$ python3 <NetDRMS root>/base/drms/replication/subscribe_series/rsums-clientd.py --setup --capturesetup --seriessetup=hmi.M_45s --seriessetup=hmi.V_720s
}}}
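
The printed SQL is not applied automatically; assuming it is written to stdout as plain SQL, one way to apply it is to pipe it into {{{psql}}}:
{{{
$ python3 <NetDRMS root>/base/drms/replication/subscribe_series/rsums-clientd.py --setup --capturesetup --seriessetup=hmi.M_45s --seriessetup=hmi.V_720s | psql -h <PostgreSQL host> -p 5432 netdrms
}}}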

Once these capture components are in place, you can start {{{rsumsd.py}}} if it is not already running, then start {{{rsums-clientd.py}}}:
{{{
$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py --loglevel=debug &
}}}

To stop {{{rsums-clientd.py}}}, send a {{{SIGINT}}} signal ({{{kill -2}}}) to the process. Remote SUMS Client will intercept that signal and shut down cleanly. If you need to shut it down with a {{{SIGKILL}}} signal for any reason, then you will need to clean up manually. To do that, delete the lock file ({{{[DRMS_LOCK_DIR]/rsums-client.lck}}}).

<<Anchor(install-drms-package)>>
== Installing the DRMS Python Package ==
The DRMS package is a python client interface to DRMS. By default, it obtains DRMS data-series information and SUMS storage units / FITS files from the JSOC DRMS via a CGI API. As such, it can be installed and used independently of your NetDRMS. However, it can also be configured to use your NetDRMS site directly instead of the JSOC.

Two interfaces to your NetDRMS are available: a web-based one, and an ssh-based one. The web-based interface requires setting up a web server and several CGI scripts. The ssh-based one requires a python module that is not currently available online. Instead, the module, {{{securedrms}}}, is included as part of the NetDRMS installation. With either interface, a user can run the python package on a machine that is not the NetDRMS host. Since the ssh interface is much simpler to set up, not requiring a web context, the following describes how to set up the ssh interface.

To install the {{{drms}}} python package, first obtain it from [[ https://github.com/sunpy/drms | GitHub]]. If {{{git}}} is not yet installed on your system, use yum to install it:
{{{
$ sudo yum install git-all
...
Installed:
  git-all.noarch 0:1.8.3.1-21.el7_7
...
Complete!
$
}}}
As {{{netdrms_production}}}, use {{{git}}} to clone the Sunpy/drms python package and then install the code into the {{{netdrms}}} virtual environment:
{{{
$ whoami
netdrms_production
$ git clone https://github.com/sunpy/drms
Cloning into 'drms'...
...
$ cp <NetDRMS root>/base/libs/py/securedrms.py drms/drms
$ which pip
~/.conda/envs/netdrms/bin/pip
$ pip install ./drms
Processing ./drms
...
Successfully installed drms-0.5.7+1.ga9de281
}}}

{{{securedrms.py}}} has python dependencies on packages/modules that may not yet be installed. Use {{{conda}}} to install them:
{{{
$ whoami
netdrms_production
$ conda install -n netdrms pexpect
Collecting package metadata (current_repodata.json): done
...
}}}

This module has a dependency on system {{{ed}}}, so make sure this editor is installed on your system:
{{{
$ sudo yum install ed
...
Installed:
  ed.x86_64 0:1.9-4.el7

Complete!
}}}
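
As a quick smoke test of the installed package, you can query series names through the package's default JSOC web interface (the regular expression below is illustrative):
{{{
$ python3
>>> import drms
>>> client = drms.Client()
>>> client.series(r'hmi\.M_45s')
['hmi.M_45s']
}}}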

<<Anchor(install-drms-export)>>
== Installing the DRMS Export Web Application ==
The DRMS Export Web Application is a web interface to the export system. It comprises five components:
 * a set of HTML web pages that contain forms to collect information from the export-system user; the web pages use JavaScript, HTML elements, and AJAX tools to create HTTP AJAX requests
 * browsers/network tools that send the form data contained within the AJAX requests to an HTTP server; the browsers/network tools receive HTTP responses, updating the web pages displayed
 * an HTTP server that acts as a reverse proxy: it receives HTTP requests from browsers/network tools on the internet, forwards the contained data as uwsgi requests to an upstream WSGI server, receives uwsgi responses from the WSGI server, and finally sends HTTP responses back to the originating browsers/tools
 * a WSGI server that receives uwsgi requests from the reverse-proxy server, sends corresponding WSGI requests to the Flask web-application entry point, and receives WSGI responses from the Flask web-application entry point, and sends uwsgi responses back to the originating reverse-proxy server
 * a Flask app that services WSGI requests it receives from the WSGI server, and sends WSGI responses back to the WSGI server

=== HTML Web Pages ===
The drmsexport web application HTML and JavaScript files are in {{{proj/export/webapps}}}:
 1. the static web pages are exportdata.html and export_request_form.html; they contain in-line JavaScript, as well as references to JavaScript contained in separate files
   a. {{{exportdata.html}}} this file contains JavaScript only; it includes a JavaScript script that contains a single string variable that contains a text representation of the export_request_form.html
   a. {{{export_request_form.html}}} this file contains the definitions of the HTML elements that compose the export web page
 2. the export JavaScript files are:
   a. {{{export_request_form.htmlesc}}} this is a version of export_request_form.html that has been converted into a single JavaScript string variable, where whitespace has been removed and characters have been percent-escaped if necessary; exportdata.html 'includes' this file as a JavaScript script
   a. {{{export_email_verify.js}}} this file contains code that makes HTTP requests that access the email-registration system
   a. {{{no_processing.js}}} this file contains a single array JavaScript variable that lists all DRMS data series for which export-processing is prohibited
   a. {{{processing.js}}} this file contains code that makes HTTP requests that cause export processing to occur during the export process
   a. {{{protocols.js}}} this file contains code that makes HTTP requests that gets image-protocol (export file type) parameters

=== Browser/Network Tool ===
The export-system root URL is {{{http://solarweb2.stanford.edu:8080/export}}}. Several endpoints are provided to support various export-system requests. The endpoints, which expect arguments to be provided in a single JSON-string object, are listed below (square brackets denote optional arguments; JSON data types are in parentheses); an example request follows the list:
 1. {{{http://solarweb2.stanford.edu:8080/export/address-registration}}} this endpoint provides access to services that check the registration status of an email address, and register a new email address; arguments:
   a. {{{address (str)}}} the email address to check on/register
   a. {{{[ db-host (str) ]}}} the INTERNAL/PRIVATE database server that hosts the registered export-system user address information
   a. {{{[ db-name (str) ]}}} the name of the database that contains email address and user information
   a. {{{[ db-port (number) ]}}} the port on the database host machine accepting connections
   a. {{{[ db-user (str) ]}}} the name of the database user account to use
   a. {{{[ user-name (str) ]}}} the full name of the export-system user
   a. {{{[ user-snail (str) ]}}} the physical address of the export-system user
 1. {{{http://solarweb2.stanford.edu:8080/export/series-server}}} this endpoint provides access to services that provide information about DRMS data series; arguments:
   a. {{{public-db-host (str)}}} the EXTERNAL/PUBLIC database server that hosts the DRMS data-series data
   a. {{{series (array)}}} the set of DRMS data series for which information is to be obtained
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS data-series information
   a. {{{[ db-port (number) ]}}} the port on the database host machine accepting connections
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/record-set}}} this endpoint provides access to services that provide keyword, segment, and link information about DRMS record sets; arguments:
   a. {{{specification (str)}}} the DRMS record-set specification identifying the records for which information is to be obtained
   a. {{{db-host (str)}}} the database server that hosts the DRMS record-set data
   a. {{{[ parse-only (bool)]}}} if {{{true}}}, then parse record-set string only
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ keywords (array)]}}} the list of keywords for which information is to be obtained
   a. {{{[ segments (array)]}}} the list of segments for which information is to be obtained
   a. {{{[ links (array)]}}} the list of links for which information is to be obtained
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS record-set information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/series}}} this endpoint provides access to services that provide information about DRMS data series; arguments:
   a. {{{series (str)}}} the DRMS series for which information is to be obtained
   a. {{{db-host (str)}}} the database server that hosts the DRMS data-series information
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS data-series information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/new-premium-request}}} this endpoint provides access to services that export DRMS data-series data; the full suite of export options is available; arguments:
   a. {{{address (str)}}} the email address registered for export
   a. {{{db-host (str)}}} the database server that hosts the DRMS data series
   a. {{{export-arguments (json str)}}} the export-request arguments
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS data-series information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ requestor (str)]}}} the full name of the export-system user
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/new-mini-request}}} this endpoint provides access to services that export DRMS data-series data; a reduced suite of export options is available to allow for quicker payload delivery; arguments:
   a. {{{address (str)}}} the email address registered for export
   a. {{{db-host (str)}}} the database server that hosts the DRMS data series
   a. {{{export-arguments (json str)}}} the export-request arguments
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS data-series information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ requestor (str)]}}} the full name of the export-system user
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/new-streamed-request}}} this endpoint provides access to services that stream export DRMS data-series data; a reduced suite of export options is available to allow for quicker payload delivery; arguments:
   a. {{{address (str)}}} the email address registered for export
   a. {{{db-host (str)}}} the database server that hosts the DRMS data series
   a. {{{export-arguments (str)}}} the export-request arguments
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains DRMS data-series information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ requestor (str)]}}} the full name of the export-system user
   a. {{{[ db-user (str)]}}} the name of the database user account to use
 1. {{{http://solarweb2.stanford.edu:8080/export/pending-request}}} this endpoint provides access to services that check for the presence of pending requests; arguments:
   a. {{{address (str)}}} the email address registered for export
   a. {{{db-host (str)}}} the database server that hosts the DRMS data series
   a. {{{[ db-name (str)]}}} the name of the database that contains pending-request information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ requestor (str)]}}} the full name of the export-system user
   a. {{{[ db-user (str)]}}} the name of the database user account to use
   a. {{{[ pending_requests_table (str)]}}} the database table of pending requests
   a. {{{[ timeout (number)]}}} after this number of minutes have elapsed, requests are no longer considered pending
 1. {{{http://solarweb2.stanford.edu:8080/export/pending-request-status}}} this endpoint provides access to services that return the export status of a pending request; arguments:
   a. {{{address (str)}}} the email address registered for export
   a. {{{db-host (str)}}} the database server that hosts export-request information
   a. {{{request-id (str)}}} the export system request ID
   a. {{{[ client-type (str)]}}} the securedrms client type (ssh, http)
   a. {{{[ db-name (str)]}}} the name of the database that contains export-request information
   a. {{{[ db-port (number)]}}} the port on the database host machine accepting connections
   a. {{{[ db-user (str)]}}} the name of the database user account to use
   a. {{{[ pending_requests_table (str)]}}} the database table of pending requests
   a. {{{[ timeout (number)]}}} after this number of minutes have elapsed, requests are no longer considered pending
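
For example, a registration-status check could be issued from the command line with {{{curl}}}; the JSON request body shown is an assumption based on the argument lists above - adjust the encoding to match your deployment:
{{{
$ curl -s -H 'Content-Type: application/json' -d '{"address": "user@example.com"}' http://solarweb2.stanford.edu:8080/export/address-registration
}}}
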
=== Python Environment ===
DRMS Export uses the {{{securedrms}}} python package, which is included in {{{<NetDRMS root>/base/export/libs/py}}}. If you have not already installed it, do so now. As {{{netdrms_production}}}, {{{cd}}} to {{{<NetDRMS root>/base/export/libs/py}}} and edit {{{securedrms.py}}} to configure the package. Please consult the documentation in the {{{class SecureServerConfig}}} definition in the python module, and then edit the arguments to the {{{SecureServerConfig}}} constructor. In particular, you will need to set {{{ssh_remote_user}}} and {{{ssh_remote_host}}} to a Linux account and machine that will allow remote access to the NetDRMS binaries (such as {{{show_info}}}, {{{jsoc_info}}}, etc.). You might need to run {{{ssh-keygen}}} on the webserver host and put the public key in the {{{ssh_remote_user@ssh_remote_host:$HOME/.ssh/authorized_keys}}} file so that {{{ssh}}} access will not prompt for a password.


Once you have configured {{{securedrms.py}}}, install it:

{{{
$ whoami
netdrms_production
$ conda activate netdrms
$ which pip
<NetDRMS production user home dir>/.conda/envs/netdrms/bin/pip
$ cd <NetDRMS root>/base/export/libs/py
$ pip install -e .
}}}

The {{{-e}}} flag allows edits to {{{securedrms.py}}} to take effect without having to re-install it.


You will also need to install {{{flask}}}, {{{flask_restful}}}, {{{webargs}}}, and {{{uwsgi}}}:

{{{
$ whoami
netdrms_production
$ conda activate netdrms
$ which pip
<NetDRMS production user home dir>/.conda/envs/netdrms/bin/pip
$ conda install -c conda-forge -n netdrms flask==2.0
$ conda install -c conda-forge -n netdrms flask-restful
$ conda install -c conda-forge -n netdrms webargs
$ conda install -c conda-forge -n netdrms uwsgi
}}}
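
With these packages installed, you can start the WSGI server. A minimal sketch, assuming a hypothetical Flask entry-point module {{{export_wsgi.py}}} that exposes the application object {{{app}}} (the actual module name and socket address depend on your drms-export installation and reverse-proxy configuration):
{{{
$ whoami
netdrms_production
$ conda activate netdrms
$ uwsgi --socket 127.0.0.1:3031 --module export_wsgi:app --processes 4
}}}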

The {{{async}}} dependency might be needed, depending on future drms-export features.

NetDRMS - a shared data management system

Introduction

In order to process, archive, and distribute the substantial quantity of solar data captured by the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI) instruments on the Solar Dynamics Observatory (SDO), the Joint Science Operations Center (JSOC) has developed its own data-management system, NetDRMS. This system comprises two PostgreSQL databases, multiple file systems, a tape back-up system, and software to manage these components. Related sets of data are grouped into data series, each, conceptually, a table of data where each row of data typically associated with an observation time, or a Carrington rotation. As an example, the data series hmi.M_45s contains the HMI 45-second cadence magnetograms, both observation metadata and image FITS files. The columns contain metadata, such as the observation time, the ID of the camera used to acquire the data, the image rotation, etc. One column in this table contains an ID that refers to a set of data files, typically a set of FITS files that contain images.

The Data Record Management System (DRMS) is the subsystem that contains and manages the "DRMS" database of metadata and data-file-locator information. One component is a software library, written in C, that provides client programs, also known as "DRMS modules", with an Application Programming Interface (API) that allows the users to access these data. The Storage Unit Management System (SUMS) is the subsystem that contains and manages the "SUMS" database and associated storage hardware. The database contains information needed to locate data files that reside on hardware. The entire system as a whole is typically referred to as DRMS. The user interfaces with the DRMS subsystem only, and the DRMS subsystem interfaces with SUMS - the user does not interact with SUMS directly. The JSOC provides NetDRMS to non-JSOC institutions so that those sites can take advantage of the JSOC-developed software to manage large amounts of solar data.

A NetDRMS site is an institution with a local NetDRMS installation. It does not generate the JSOC-owned production data series (e.g., hmi.M_720s, aia.lev1) that Stanford generates for scientific use. A NetDRMS site can generate its own data, production or otherwise. That site can create software that uses NetDRMS to generate its own data series. But it can also act as a "mirror" for individual data series. When acting as a mirror for a Stanford data series, the site downloads from Stanford DRMS database information and stores it in its own NetDRMS database, and it downloads SUMS files, and stores them in its own SUMS subsystem. As the data files are downloaded to the local SUMS, the SUMS database is updated with the information needed to manage the data files. It is possible for a NetDRMS site to mirror the DRMS data of any other NetDRMS site, but at this point, the only site whose data are currently mirrored is the Stanford JSOC.

Installing NetDRMS

Installing the NetDRMS system requires the steps described in the sections that follow, beginning with Installing PostgreSQL.

Optional steps include:

  • registering for JSOC-data-series subscriptions and running NetDRMS software to receive, in real time, data updates [ Registering for Subscriptions ]

  • running the Remote SUMS daemon (which accepts and processes requests for SUs that reside at other NetDRMSs) [ Running Remote SUMS ]

  • installing the SunPy/drms python package (a Python interface to DRMS) [ Installing SunPy/drms ]

  • installing JSOC-specific project code that is not part of the base NetDRMS installation; the JSOC maintains code to generate JSOC-owned data that is not generally of interest to NetDRMS sites, but sites are welcome to obtain downloads of that code. Doing so involves additional configuration to the base NetDRMS system.
  • installing Slony PostgreSQL data-replication software to become a provider of your site's data
  • installing a webserver that hosts several NetDRMS CGIs to allow web access to your data
  • installing the Virtual Solar Observatory (VSO) software to become a VSO provider of data
  • installing the DRMS Export System [ Installing DRMS Export]

For best results, and to facilitate debugging issues, please follow these steps in order.

Conventions

In this document, parameters to be determined by you, the NetDRMS administrator, are denoted with angle brackets, <a parameter>. For example, you will need to select a machine to host the PostgreSQL database system, and the name of that host is represented by <PostgreSQL host>. If, say, you choose a host named netdrms_db, then you can substitute netdrms_db for <PostgreSQL host> throughout this document.

Part of the process of installing NetDRMS is creating a configuration file named config.local. That file contains numerous parameters, such as SUMS_USEMTSUMS. Throughout this document, those parameters are denoted with square brackets. As such, this document refers to the parameter SUMS_USEMTSUMS as [SUMS_USEMTSUMS].

Installing PostgreSQL

PostgreSQL is a relational database management system. Data are stored primarily in relations (tables) of records that can be mapped to each other - given one or more records, you can query the database to find other records. These relations are organized on disk in a hierarchical fashion. At the top level are one or more database clusters. A cluster is simply a storage location on disk (i.e., a directory). PostgreSQL manages the cluster's data files with a single process, or PostgreSQL instance. Various operations on the cluster will result in PostgreSQL forking new ephemeral child processes, but ultimately there is only one master/parent process per cluster.

Each cluster contains the data for one or more databases. Each cluster requires a fair amount of system memory, so it makes sense to install a single cluster on a single host. It does not make sense to make separate clusters, each holding one database; each cluster can efficiently support many databases, which are then fairly independent of each other. In terms of querying, the databases are completely independent (i.e., a query on one database cannot involve relations in a different database). However, two databases in a single cluster do share the same disk directory, so there is not the same degree of independence at the OS/filesystem level. This may only matter if an administrator is operating directly on the files (performing backups, replication, creating standby systems, etc.).
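For example, a session connected to one database cannot reference relations in another database in the same cluster. The following illustrative session demonstrates this (sum_partn_alloc is a SUMS table created later in these instructions):

{{{
netdrms=# SELECT * FROM netdrms_sums.public.sum_partn_alloc;
ERROR:  cross-database references are not implemented: "netdrms_sums.public.sum_partn_alloc"
}}}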

To install PostgreSQL, select a host machine, <PostgreSQL host>, to act as the PostgreSQL database server. We recommend installing only PostgreSQL on this machine, given the large amount of memory and resources required for optimal PostgreSQL operation. We find a Fedora-based system, such as CentOS, to be a good choice, but please visit https://www.postgresql.org/docs for system requirements and other information germane to installation. The following instructions assume a Fedora-based Linux system such as CentOS (documentation for other distributions, such as Debian and openSUSE, can be found online) and a bash shell.

Install the needed PostgreSQL server packages on <PostgreSQL host> by first visiting https://yum.postgresql.org/repopackages.php to locate and download the PostgreSQL "repo" rpm file appropriate for your OS and architecture. Each repo rpm contains a yum configuration file that can be used to install all supported PostgreSQL releases. You should install the latest version if possible (version 12, as of the time of this writing). Although you can use your browser to download the file, it might be easier to use Linux command-line tools:

$ curl -OL https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Install the yum repo configuration file (pgdg-redhat-all.repo) from the downloaded repo rpm file:

$ sudo rpm -i pgdg-redhat-repo-latest.noarch.rpm

This installs the repo configuration file to /etc/yum.repos.d/. Find the names of the PostgreSQL packages needed from the repository; the following assumes PostgreSQL 12, but should you want to install an older version, replace "12" with one of 94, 95, 96, 10, or 11:

$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql[0-9]*\.' | cut -d '.' -f 1
postgresql12
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*devel\.' | cut -d '.' -f 1 
postgresql12-devel
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*contrib\.' | cut -d '.' -f 1
postgresql12-contrib
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*libs\.' | cut -d '.' -f 1 
postgresql12-libs
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*plperl\.' | cut -d '.' -f 1 
postgresql12-plperl
$ yum list --disablerepo='*' --enablerepo=pgdg12 2>/dev/null | grep -Eo '^.*postgresql.*server\.' | cut -d '.' -f 1 
postgresql12-server

Use yum to install all six packages:

$ sudo yum install <packages>

where <packages> are the package names determined in the previous step (postgresql12 postgresql12-contrib postgresql12-devel postgresql12-libs postgresql12-plperl postgresql12-server).

The rpm package installation will have created the PostgreSQL superuser Linux account postgres; postgres will own the PostgreSQL database clusters and server processes that will be created in the following steps. To perform the next steps, you will need to become user postgres:

$ sudo su - postgres

Locate the PostgreSQL-executables path:

$ rpm -ql postgresql12 | grep psql$
/usr/pgsql-12/bin/psql

and add it to postgres's PATH environment variable:

$ export PATH=/usr/pgsql-12/bin:$PATH

postgres will be running various PostgreSQL programs over time as part of administration and maintenance, so add this export command to /home/postgres/.bashrc.

As described above, create one database cluster for the two databases (one for DRMS data, and one for SUMS data):

$ whoami
postgres
$ initdb --locale=C -D /var/lib/pgsql/netdrms

initdb will initialize the cluster data directory, /var/lib/pgsql/netdrms (identified by the -D argument). This will result in the creation of template databases, configuration files, and other items.



IMPORTANT: Make sure that the disk space on /var/lib/pgsql/netdrms is sufficient to hold the DRMS database information for all the desired DRMS series. Please ask the source of these data (e.g., the JSOC if the DRMS data series originate at the JSOC) for an estimate of disk-space usage. If the set of DRMS data series is not known at installation time, then overestimate - obtain a disk-space-usage estimate for the largest series, then multiply that by 10. As a very rough estimate, be prepared to provide at least several terabytes.



The database cluster will contain two configuration files you need to edit: postgresql.conf and pg_hba.conf. Please refer to the PostgreSQL documentation to properly edit these files. Here are some brief suggestions:

  • postgresql.conf - for changes to parameters marked with a * to take effect, a restart of a running server instance is required (pg_ctl restart); otherwise, changes require only a reload (pg_ctl reload)

    • listen_addresses* specifies the interface on which the postgres server processes will listen for incoming connections. You will need to ensure that connections can be made from all machines that will run DRMS modules (the modules connect to both the DRMS and SUMS databases), so change the default 'localhost' to '*', which causes the servers to listen on all interfaces:

      listen_addresses = '*'

IMPORTANT!! listen_addresses must be set correctly, along with the entries in the pg_hba.conf file. You cannot use a local connection because DRMS modules do not operate this way. They always make non-local connections to the DB, even if they are running on the DB host machine.

  • port* is the server port on which the server listens for connections.

    port = 5432

    The default port is 5432, and unless there is a good reason not to, use 5432. This value must match the value of the RS_DBPORT config.local parameter.

  • logging_collector* controls whether or not stdout and stderr are logged to a file in the database cluster (in the log or pg_log directory, depending on release). By default it is off - set it to on in each cluster.

    logging_collector = on
  • log_rotation_size sets the maximum size, in kilobytes, of a log file. Set this to 0 to disable rotation; otherwise a new log will be created once the current one grows to the specified size.

    log_rotation_size = 0
  • log_rotation_age sets the maximum age, in minutes, of a log file. Set this to 1d (1 day) so that each day a new log file is created.

    log_rotation_age = 1d
  • log_min_duration_statement is the amount of time, in milliseconds, a query must run before triggering a log entry. Set this to 1000 so that only queries running longer than one second are logged.

    log_min_duration_statement = 1000
  • shared_buffers* is the size of shared-memory buffers. For a server dedicated to a single database cluster, this should be about 25% of the total memory.

    shared_buffers = 32GB
  • pg_hba.conf controls the methods by which client authentication is achieved (HBA stands for host-based authentication). It will likely take a little time to understand and properly edit this configuration file. If you are not familiar with networking concepts (such as subnets, name resolution, reverse name resolution, CIDR notation, IPv4 versus IPv6, network interfaces, etc.) then now is the time to become familiar.

    This configuration file contains a set of columns that identify which user can access which database from which machines. It also defines the method by which authentication occurs. When a user attempts to connect to a database, the server traverses this list looking for the first row that matches. Once this row is identified, the user must authenticate - if authentication fails, the connection is rejected. The server does not attempt additional rows. For changes to any of the parameters in this file to take effect, a reload of the server instance is required (not a restart).

Here are the recommended entries:

  • # local superuser connections
    # TYPE    DATABASE  USER                          AUTH-METHOD
      local   all       all                           trust           # this applies ONLY if the user is logged into the PG server AND they do not use the -h argument to psql
      host    all       all       127.0.0.1/8         trust           # for -h localhost, if localhost resolves to an IPv4 address; also for -h 127.0.0.1
      host    all       all       ::1/128             trust           # for -h localhost, if localhost resolves to an IPv6 address; also for -h ::1
    
    # non-local superuser connections
    # TYPE    DATABASE  USER      ADDRESS              AUTH-METHOD
      host    all       postgres  XXX.XXX.XXX.XXX/YY   trust
    
    # non-superuser connections (which can be made from any non-server machines only)
    # TYPE    DATABASE       USER      ADDRESS              AUTH-METHOD
      host    netdrms        all       XXX.XXX.XXX.XXX/YY   md5
      host    netdrms_sums   all       XXX.XXX.XXX.XXX/YY   md5
    where the columns are defined as follows:
  • TYPE - this column defines the type of socket connection made (Unix-domain, TCP/IP, the encryption used, etc.). local is only relevant to Unix-domain local connections from the database server host <PostgreSQL host> itself. Since only postgres will log into the database server, the first row above applies to the administrator only. host is only relevant to TCP/IP connections, regardless of the encryption status of the connection.

  • DATABASE - this column identifies the database to which the user has access. Whenever a user attempts to connect to the database server, they specify a database to access. That database must be in the DATABASE column. We recommend using netdrms for non-superusers, blocking such users from accessing all databases except the DRMS one. Conversely, we recommend using all for the superuser so they can access both the DRMS and SUMS databases (and any others that might exist).

    NOTE: You are using the database name netdrms in pg_hba.conf, even though you have not actually created that database yet. This is OK; you will do so once you start the PostgreSQL cluster instance.

  • USER - this column identifies which users can access the specified databases.

  • ADDRESS - this column identifies the host IP addresses (or host names, but do not use those) from which a connection is allowed. To specify a range of IP addresses, such as those on a subnet, use a CIDR address. This column should be empty for local connections.

  • AUTH-METHOD - this column identifies the type of authentication to use. We recommend using either trust or md5. When trust is specified, PostgreSQL will unconditionally accept the connection to the database specified in the row. If md5 is specified, then the user will be required to provide a password. If you follow the recommendations above, then for the local row, any user who can log into the database server can access any database in the cluster without any further authentication. Generally only a superuser will be able to log into the database server, so this choice makes sense. For non-local connections by postgres, the Linux PostgreSQL superuser postgres can access any database on the server without further authentication. For the remaining non-local non-postgres connections, users will need to provide a password.
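With the recommended entries in place, a simple way to spot-check the configuration is to connect from a machine on the allowed subnet; a connection matching an md5 row should prompt for a password, while a connection matching a trust row should not. An illustrative session (the user name here is hypothetical):

{{{
$ psql -h <PostgreSQL host> -p 5432 -U some_user netdrms
Password for user some_user:
}}}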

Should you need to edit either of these configuration files AFTER you have started the database instance (by running pg_ctl start, as described in the next section), you will need to either reload or restart the instance:

$ whoami
postgres
# reload
$ pg_ctl reload -D /var/lib/pgsql/netdrms
# restart
$ pg_ctl restart -D /var/lib/pgsql/netdrms

IMPORTANT!!! You MUST have host/md5 type non-local connections because DRMS modules always make non-local connections to the DB, even when running on the DB host machine.

Initializing PostgreSQL

You now need to initialize your PostgreSQL instance by creating the DRMS and SUMS databases, installing database-server languages, and creating schemas and relations. To accomplish this, become postgres; all steps in this section must be performed by the superuser:

$ sudo su - postgres

Start the database instance for the cluster you created:

$ whoami
postgres
$ pg_ctl start -D /var/lib/pgsql/netdrms

You previously created the database cluster directory, most likely /var/lib/pgsql/netdrms. Ensure that the configuration files you edited work. This can be done by attempting to connect to the database server as postgres with psql from <PostgreSQL host>:

$ whoami
postgres
$ hostname
<PostgreSQL host>
$ psql -h <PostgreSQL host> -p 5432
psql (12.1)
Type "help" for help.

postgres=# \q
$ 

The PostgreSQL installation resulted in the creation of the postgres database superuser, and since psql connects to the database as the database user with the same name as the Linux user running psql, you will be logged in as database user postgres. This is indicated by the postgres=# prompt (the hash refers to a superuser).
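If you want to confirm which database account a session is using, the standard current_user query works from any psql prompt:

{{{
$ psql -h <PostgreSQL host> -p 5432
postgres=# SELECT current_user;
 current_user
--------------
 postgres
(1 row)
postgres=# \q
}}}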

After you successfully see the superuser prompt, create the two databases:

$ whoami
postgres
# create the DRMS database
$ createdb --locale C -E UTF8 -T template0 netdrms
# create the SUMS database
$ createdb --locale C -E UTF8 -T template0 netdrms_sums

Install the required database-server languages:

$ whoami
postgres
# create the PostgreSQL scripting language plpgsql (versions <= 9.6 only; later versions install it by default)
$ createlang plpgsql netdrms
# create the "trusted" perl language (versions <= 9.6)
$ createlang -h <PostgreSQL host> plperl netdrms
# create the "untrusted" perl language (versions <= 9.6)
$ createlang -h <PostgreSQL host> plperlu netdrms
# create the "trusted" and "untrusted" perl languages (versions > 9.6)
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# CREATE EXTENSION IF NOT EXISTS plperl;
netdrms=# CREATE EXTENSION IF NOT EXISTS plperlu;
netdrms=# \q

The SUMS database does not use any language extensions so there is no need to create any for the SUMS database.
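To verify that the languages were installed, list the installed extensions (or, on releases <= 9.6, query the pg_language catalog):

{{{
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# \dx
netdrms=# SELECT lanname FROM pg_language;
netdrms=# \q
}}}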

At this point, it is a good idea to create a password for the postgres database superuser:

$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# ALTER ROLE postgres WITH PASSWORD '<new password>';
ALTER ROLE
netdrms=# \q
$ 

Installing CFITSIO

The base NetDRMS release requires CFITSIO, a C library used by NetDRMS to read and write FITS files. Visit https://heasarc.gsfc.nasa.gov/fitsio/ to obtain the link to the CFITSIO source-code tarball. The CFITSIO tarball has a root directory named cfitsio-X.Y.Z, so download the tarball, then extract it into /opt or some other suitable installation directory. Then make a symbolic link named cfitsio that points to the extracted cfitsio-X.Y.Z directory:

$ cd
$ curl -OL 'http://heasarc.gsfc.nasa.gov/FTP/software/fitsio/c/cfitsio-X.Y.Z.tar.gz'
$ cd /opt
$ sudo tar xvzf ~/cfitsio-X.Y.Z.tar.gz

Please read the README file for complete installation instructions. As a quick start, run:

$ cd /opt/cfitsio-X.Y.Z
$ ./configure --prefix=/opt/cfitsio-X.Y.Z
# build the CFITSIO library
$ make
# install the libraries and binaries to /opt/cfitsio-X.Y.Z
$ sudo make install
# create the link from cfitsio to /opt/cfitsio-X.Y.Z
$ sudo su -
$ cd /opt/cfitsio-X.Y.Z/..
$ ln -s /opt/cfitsio-X.Y.Z cfitsio

CFITSIO has a dependency on libcurl - in fact, any program made by linking to cfitsio will also require libcurl-devel since cfitsio uses the libcurl API. We recommend using yum to install the development package (the libcurl runtime library is quite likely already installed as a dependency):

$ sudo yum install libcurl-devel
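A minimal sketch to verify that the CFITSIO installation links correctly, assuming the /opt/cfitsio link created above and gcc (fits_get_version is part of the standard CFITSIO API):

{{{
$ cat > /tmp/cfitsio_check.c << 'EOF'
/* print the CFITSIO library version to confirm that headers and linking work */
#include <stdio.h>
#include <fitsio.h>

int main(void)
{
    float version;
    fits_get_version(&version);
    printf("CFITSIO version %.2f\n", version);
    return 0;
}
EOF
$ gcc /tmp/cfitsio_check.c -I/opt/cfitsio/include -L/opt/cfitsio/lib -lcfitsio -lcurl -lm -o /tmp/cfitsio_check
$ /tmp/cfitsio_check
}}}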

Installing OpenSSL Development Packages

NetDRMS requires the OpenSSL Developer's API. If this API has not already been installed, do so now:

$ sudo yum install openssl-devel

Installing DBD::Pg

One step in the installation process will require running a perl script that accesses the PostgreSQL database. In order for this to work, you will need to ensure the DBD::Pg module has been installed. To check for installation, run:

$ perl -M'DBD::Pg'

If there is no error about not being able to locate the module, and the command simply hangs, then you are all set (enter ctrl-C to exit). If the module is not installed, and you are running the perl installed with the system, then run yum to identify the package:

$ yum list | grep -i 'dbd-pg'
...
perl-DBD-Pg.x86_64                         2.19.3-4.el7           base
...

then bringing to bear all your powers of divination, choose the correct package and install it:

$ sudo yum install 'perl-DBD-Pg'

If using a non-system perl, use the distro's installation method. If the distro does not have that module, or the distro installer does not work, then, as a final act of desperation, use CPAN:

$ sudo perl -MCPAN -e 'install DBD::Pg'
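As a convenience, here is a variant of the check that exits immediately instead of hanging (it assumes DBD::Pg sets $VERSION, which released versions do; the version printed will of course depend on your installation):

{{{
$ perl -MDBD::Pg -e 'print "DBD::Pg ", $DBD::Pg::VERSION, "\n"'
DBD::Pg 2.19.3
}}}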

Installing Python3

NetDRMS requires that a number of python packages and modules be present that are not generally part of a system installation. In addition, many scripts require python3 and not python2. The easiest way to satisfy these needs is to install a data-science-oriented python3 distribution, such as Anaconda. In that vein, install Anaconda into an appropriate installation directory such as /opt/anaconda3. To locate the Linux installer, visit https://docs.anaconda.com/anaconda/install/linux/:

$ curl -OL 'https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh'
$ sha256sum Anaconda3-2019.10-Linux-x86_64.sh
46d762284d252e51cd58a8ca6c8adc9da2eadc82c342927b2f66ed011d1d8b53  Anaconda3-2019.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2019.10-Linux-x86_64.sh

After some initial prompts, the installer will display

PREFIX=/home/<user>/anaconda3

This path is the default installation directory (<user> is the user running bash). Replace the PREFIX path with <Anaconda3 install dir>.

Installing NetDRMS

To install NetDRMS, you will need to select an appropriate machine on which to install NetDRMS and an appropriate machine/hardware on which to host the SUMS service, create Linux users and groups, download the NetDRMS release tarball and extract the release source, initialize the Linux environment, create log directories, create the configuration file and run the configuration script, compile and install the executables, create the DRMS- and SUMS-database users/relations/functions/objects, initialize the SUMS storage hardware, and install the SUMS and Remote SUMS daemons.

The optimal hardware configuration will likely depend on your needs, but the following recommendations should suffice for most sites. DRMS and SUMS can share a single host machine. The most widely used and tested Linux distributions are Fedora-based, and at the time of this writing, CentOS is the most popular. Sites have successfully used openSUSE too, but if possible, we would recommend using CentOS. SUMS requires a large amount of storage to hold the DRMS data-series data/image files. The amount needed can vary widely, and depends directly on the amount of data you wish to keep online at any given time. Most NetDRMS sites mirror some amount of (but not all) JSOC SDO data - the more data mirrored, the larger the amount of storage needed. To complicate matters, a site can also mirror only a subset of each data series' data; perhaps one site wishes to retain only the current month's data of many data series, but another wishes to retain all data for one or two series. To decide on the amount of storage needed, you will have to ask the JSOC how much data each series comprises and decide how much of that data you want to keep online. Data that goes offline can always be retrieved automatically from the JSOC again. Data will arrive each day, so request from the JSOC an estimate of the rate of data growth. We recommend doing a rough calculation based upon these considerations, and then doubling the resulting number and installing that amount of storage.
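As a hypothetical worked example of that calculation: if the series you plan to mirror currently total 30 TB, and the JSOC estimates growth of 1 TB per month, then planning for three years gives 30 + 36 = 66 TB; doubling that suggests installing roughly 130 TB of SUMS storage.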

Next, create NetDRMS production Linux user netdrms_production:

$ sudo useradd netdrms_production
$ sudo passwd netdrms_production
Changing password for user netdrms_production.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
$

NOTE: ensure that netdrms_production is a valid PostgreSQL name because NetDRMS makes use of the PostgreSQL feature whereby attempts to connect to a database are made as the database user whose name matches the name of the Linux user connecting to the database. Please see https://www.postgresql.org/docs/12/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS for a description of valid PostgreSQL names.

netdrms_production will access the PostgreSQL databases previously installed by running PostgreSQL executables. Modify netdrms_production's PATH environment variable so that these executables can be located by the shell. Put the following into /home/netdrms_production/.bashrc:

# PostgreSQL executables
export PATH=/usr/pgsql-12/bin:$PATH

Make sure you have either re-logged-in or sourced /home/netdrms_production/.bashrc.

NetDRMS requires additional Python packages not included in the Anaconda distribution, but if you install Anaconda, then the number of additional packages you need to install is minimal. If you have a different python distribution, then you may need to install additional packages. To install new Anaconda packages, as netdrms_production first create a virtual environment for NetDRMS (named netdrms):

$ whoami
netdrms_production
# sets PATH to the conda base-installation binaries so that the next conda command succeeds (this will edit .bashrc)
$ /opt/miniconda3/bin/conda init bash
$ source ~/.bashrc
$ conda create --name netdrms
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/netdrms_production/.conda/envs/netdrms

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate netdrms
#
# To deactivate an active environment, use
#
#     $ conda deactivate

VERY IMPORTANT!! As described above in the conda create output, run:

$ conda activate netdrms

If you do not do this, then conda might invoke the wrong python later in this tutorial.

VERY, VERY IMPORTANT!!

conda init bash modifies the user environment to manage which python executables are invoked when python, et al., are run from the shell. It does so by modifying the /home/netdrms_production/.bashrc file, adding PATH-setting code that is conditional on environment variables that are in turn modified when you run conda activate and conda deactivate. In the deactivated state, the path to the base conda system is included in PATH. In this tutorial, this would be /opt/miniconda3/bin, the installation directory you chose when you installed miniconda. When you activate the netdrms environment, the conditional code in /home/netdrms_production/.bashrc replaces /opt/miniconda3/bin with the path to the executables in the netdrms virtual environment, /home/netdrms_production/.conda/envs/netdrms/bin.

If you were to run conda activate netdrms right now, before you had installed any packages into the virtual environment, the PATH environment variable would include the path to the virtual-environment executables, /home/netdrms_production/.conda/envs/netdrms/bin. But since you have not yet installed any packages into the virtual environment, /home/netdrms_production/.conda/envs/netdrms/bin would be empty. And due to the way the PATH-setting code removes the path to /opt/miniconda3/bin when the virtual environment is activated, PATH would not include any path to the miniconda instance you installed. If you were to run python from the command line, the system python would likely launch, which is definitely not what you want to happen.

I consider this to be a conda bug - I do not expect activation of a virtual environment to result in PATH pointing to python executables outside of the installation. And this is true for not only python, but also for other python executables, like pip and activate. If you install any package into this environment, then it seems that conda will install ALL executables (newer version when available) into the environment, and from then on your PATH will be set correctly - always pointing to python executables inside the miniconda installation, regardless of activation status. But if you determine that you do not need to install any additional python packages (unlikely), then this situation is a bit of a land mine. To avoid this headache, ensure that you always install some package into your virtual environment.

Then install the new conda packages using conda:

$ whoami
netdrms_production
$ conda install -n netdrms psycopg2 psutil
Collecting package metadata (current_repodata.json): done
...
# pySmartDL is in the conda-forge channel
$ conda install -n netdrms -c conda-forge pySmartDL python-dateutil

NOTE: By installing these packages in the netdrms virtual environment, you will also install the python package in the environment since the explicitly listed packages have a dependency on python. This is important, because the next step requires that pip, part of python, be present in the environment.
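After installing these packages, a quick sanity check confirms that the activated environment resolves python and pip from the virtual environment (paths assume the environment location shown in the conda output above):

{{{
$ conda activate netdrms
$ which python
/home/netdrms_production/.conda/envs/netdrms/bin/python
$ which pip
/home/netdrms_production/.conda/envs/netdrms/bin/pip
}}}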

From now on, netdrms_production should use the virtual environment. Make sure that $PYTHONPATH is not set; otherwise, it might interfere with the virtual environment's python. Modify /home/netdrms_production/.bashrc to do so:

# use Python virtual environment by default
unset PYTHONPATH

For the changes to /home/netdrms_production/.bashrc to take effect, either logout/login or source /home/netdrms_production/.bashrc.

All non-production users will use the netdrms virtual environment. Make sure that users can access the environment:

$ whoami
netdrms_production
$ chmod o+x /home/netdrms_production

Create the Linux group <SUMS users>, e.g. sums_users, to which all SUMS users belong, including netdrms_production. This group will be used to ensure that all SUMS users can create new data files in SUMS:

$ sudo groupadd <SUMS users>

Add netdrms_production to this group (later you will add each SUMS user - users who will read/write SUMS data files - to this group as well):

$ sudo usermod -a -G <SUMS users> netdrms_production
$ id netdrms_production
uid=1001(netdrms_production) gid=1001(netdrms_production) groups=1001(netdrms_production),1002(sums_users)

On the NetDRMS host, clone the JSOC git repo into /opt/jsoc-git. From your local copy of this repo, you will install NetDRMS executables and libraries into /opt/netdrms-vXX.X and then make a link from /opt/netdrms to /opt/netdrms-vXX.X. In this way, you can upgrade your version of NetDRMS by installing a new version into a new /opt/netdrms-vXX.X directory and then updating the /opt/netdrms link.

Create /opt/jsoc-git and make netdrms_production the owner:

$ sudo mkdir -p /opt/jsoc-git
$ sudo chown netdrms_production:netdrms_production /opt/jsoc-git

As netdrms_production, clone the git repository that contains the desired NetDRMS release. Currently, this repository is private, which means you will need to create a GitHub account, set up ssh keys, and be given permission to access this repository. To set up the ssh keys, create a modern ssh key, like an ed25519 key:

$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/netdrms_production/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/netdrms_production/.ssh/id_ed25519.
Your public key has been saved in /home/netdrms_production/.ssh/id_ed25519.pub.
The key fingerprint is:
...

Once you have your private-public key pair, you will need to upload it to your GitHub account. Click on your profile picture in GitHub, then in the left pane, click on SSH and GPG keys, click on the blue button labeled "New SSH key", provide a helpful title, then paste the contents of your public key into the text-edit box and, finally, click on the blue button to finalize the key.
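Before cloning, you can verify that GitHub accepts the key; GitHub replies with a greeting and then closes the connection:

{{{
$ ssh -T git@github.com
Hi <GitHub username>! You've successfully authenticated, but GitHub does not provide shell access.
}}}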

After you have successfully uploaded the public part of your ssh key, locate the release's tag at https://github.com/JSOC-SDP/JSOC-orig/tags, and then clone the repository at this tag:

$ whoami
netdrms_production
$ cd /opt/jsoc-git/..
$ git clone --branch <release tag> git@github.com:JSOC-SDP/JSOC-orig.git /opt/jsoc-git

This will put your repo into a "detached HEAD" state, which essentially means that you can no longer make changes to the repo's main branch. If you would like to make release hotfixes, you will first need to create a hotfix branch off of the main branch, switching to it:

$ git switch -c <new hotfix branch>

If you want to work on a new feature, first switch to the develop branch:

$ git switch develop

and then make a feature branch off the develop branch, switching to it:

$ git switch -c <new feature branch>

To propose that the JSOC incorporates these changes into the official GitHub repo, push hotfix or feature branches back to the GitHub repo. If these changes look good to us, we will merge them into our main and/or develop branches to be available to all in future releases.
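For example, to publish a hotfix branch back to the GitHub repo (assuming you have been granted push permission on the repository):

{{{
$ git push -u origin <new hotfix branch>
}}}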

Make the SUMS log directory <SUMS logs> on the SUMS server machine. Various SUMS log files will be written to this directory. A suitable directory would reside in the netdrms_production user's home directory, e.g., $HOME/log/SUMS:

$ whoami
netdrms_production
$ cd ~
$ mkdir -p <SUMS logs>

Select appropriate C and Fortran compilers. The DRMS part of NetDRMS must be compiled with a C compiler. NetDRMS supports both the GNU C compiler (gcc), and the Intel C++ compiler (icc). Certain JSOC-specific code requires Fortran compilation. For those projects, NetDRMS supports the GNU Fortran compiler (gfortran), and the Intel Fortran compiler (ifort). SUMS is implemented as a Python daemon, so no compilation step is needed. Both GNU and Intel are widely used, so feel free to use either. By default, Intel compilers are used. There are two methods for changing the compilers:

  • as netdrms_production, you can set the JSOC_COMPILER and JSOC_FCOMPILER environment variables (we recommend doing so in .bashrc):

    $ whoami
    netdrms_production
    $ vi
    i
    # .bashrc
    
    # set COMPILER to icc for the Intel C++ compiler, and to gcc for the GNU C++ compiler
    export JSOC_COMPILER=<C compiler>
    
    # set to ifort for the Intel Fortran compiler, and to gfortran for the GNU Fortran compiler
    export JSOC_FCOMPILER=<Fortran compiler>
    ESC
    :wq
    $ 

If you have chosen an Intel compiler, please keep in mind that it might be necessary to source an environment file for proper linking to occur. If so, do so in your ~/.bashrc, ~/.bash_profile, or other appropriate resource file:

[ -f /opt/intel/oneapi/setvars.sh ] && source /opt/intel/oneapi/setvars.sh

Create /opt/jsoc-git/config.local, the master configuration file for both DRMS and SUMS, using /opt/jsoc-git/config.local.newstyle.template as a template. This file contains a number of configuration parameters, along with detailed descriptions of what they control and suggested values for those parameters. The configuration script, configure, reads this file and then creates one output file, drmsparams.*, in /opt/jsoc-git/<architecture dir>/base/localization for each of several programming and scripting languages (C, GNU make, perl, python, bash). These files are directly readable by the several languages used by NetDRMS. Lines that start with whitespace or the hash symbol, #, are ignored.

NOTE: if you have an older NetDRMS and you are going to use the config.local from that NetDRMS installation for a new NetDRMS installation, then you might have an old-style config.local. You know you have an old-style configuration file if the __STYLE__ section does not exist in the file, or if it exists and its value is old. If that is the case, then you will need to compare /opt/jsoc-git/config.local.template in the old installation to the analogous file in the new installation to determine the set of parameters that have changed between releases.

Several sections compose config.local:

__STYLE__
new

__DEFS__
# these are NetDRMS-wide parameter values; the format is <quote code>:<parameter name><whitespace>+<parameter value>;
# the configuration script uses <quote code> to assist in creating language-specific parameters; <quote code> is one of:
#   'q' (enclose the parameter value in double quotes)
#   'p' (enclose the parameter value in parentheses)
#   'a' (do not modify the parameter value). 

__MAKE__
# these are make variables used by the make system during compilation - they generally contain paths to third-party code
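As a purely illustrative example of the __DEFS__ format described above (the parameter values and quote codes here are placeholders, not recommendations), a double-quoted string parameter and an unmodified numeric parameter might look like this:

{{{
__DEFS__
q:SUMLOG_BASEDIR     /home/netdrms_production/log/SUMS
a:DRMSPGPORT         5432
}}}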

Before creating config.local, please request from the JSOC a value for DRMS_LOCAL_SITE_CODE. This code uniquely identifies each NetDRMS installation. Each site requires one ID for each of its NetDRMS installations.

The __DEFS__ section:

  • BIN_PY3 - the path to the Python 3 python executable.

  • DBNAME - the name of the DRMS database: this is netdrms; this parameter exists in case you want to select a different name, but we don't recommend changing it.

  • DRMS_LOCAL_SITE_CODE - a 15-bit hexadecimal string that globally and uniquely identifies the NetDRMS. Each NetDRMS requires a unique code for each installation. Values greater than or equal to 0x4000 denote a development installation and need not be unique. If you plan on generating data that will be distributed outside of your site, please obtain a unique value from the JSOC.

  • DRMS_LOCK_DIR - the directory to which the DRMS library writes various lock files.

  • DRMS_LOG_DIR - the directory to which the DRMS library writes various log files.

  • DRMSPGPORT - the port on the database host on which the database server is listening (5432 by default)

  • DX_LISTEN_PORT - the port (on the host machine that accepts data-transfer client connections) to which the data-transfer server listens

  • DX_PACKAGE_HOST - the host machine that accepts data-transfer client connections

  • DX_PACKAGE_ROOT - the directory to which the data-transfer server writes tar files

  • EXPORT_HANDLE_DIR - the directory to which export programs save handles; create this directory if it does not exist and make sure it is globally writable.

  • EXPORT_LOCK_DIR - the directory to which export programs write lock files; create this directory if it does not exist and make sure it is globally writable.

  • EXPORT_LOG_DIR - the directory to which export programs write logs; create this directory if it does not exist.

  • EXPORT_PENDING_REQUESTS_MAX_TABLE - an optional database table that contains the maximum number of export requests an export user can make simultaneously

  • EXPORT_PENDING_REQUESTS_TABLE - a database table that contains a record of each pending export request

  • EXPORT_PENDING_REQUESTS_TIME_OUT - after this number of minutes a request is considered to have timed out

  • EXPORT_USER_INFO_FN - an optional database function that returns user information associated with an export email address

  • JMD_IS_INSTALLED - if set to 1, then the Java Mirroring Daemon alternative to Remote SUMS is used: this should be 0.

  • POSTGRES_ADMIN - the Linux user that owns the PostgreSQL installation and processes: this is postgres.

  • RS_BINPATH - the NetDRMS binary path that contains the external programs needed by the Remote SUMS (e.g., jsoc_fetch, vso_sum_alloc, vso_sum_put).

  • RS_DBHOST - the name of the Remote SUMS database cluster host; this is <PostgreSQL host>, the machine on which PostgreSQL was installed.

  • RS_DBNAME - the Remote SUMS database - this is netdrms.

  • RS_DBPORT - the port that the Remote SUMS database cluster instance is listening on: this is 5432.

  • RS_DBUSER - the Linux user that runs Remote SUMS; this is also the database user who owns the Remote SUMS database objects: this is netdrms_production.

  • RS_DLTIMEOUT - the timeout, in seconds, for an SU to download. If the download time exceeds this value, then all requests waiting for the SU to download will fail.

  • RS_LOCKFILE - the (advisory) lockfile used by Remote SUMS to prevent multiple instances from running.

  • RS_LOGDIR - the directory in which remote-sums log files are written.

  • RS_MAXTHREADS - the maximum number of SUs that Remote SUMS can process simultaneously.

  • RS_N_WORKERS - the number of scp worker threads - at most, this many scp processes will run simultaneously

  • RS_REQTIMEOUT - the timeout, in seconds, for a new SU request to be accepted for processing by the daemon. If the daemon encounters a request older than this value, it will reject the new request.

  • RS_REQUEST_TABLE - the Remote SUMS database relation that contains Remote SUMS requests; this is <Remote SUMS requests>, which should be drms.rs_requests; DRMS modules insert request rows in this table, and Remote SUMS locates the requests and manages rows in this table.

  • RS_SCP_MAXPAYLOAD - the maximum total payload, in MB, per download. As soon as the combined payload of SUs ready for download exceeds this value, then the SUs are downloaded with a single scp process.

  • RS_SCP_MAXSUS - the maximum size of the SU download queue. As soon as this many SUs are ready for download, they are downloaded with a single scp process.

  • RS_SCP_TIMEOUT - if there are SUs ready for download, and no scp has fired off within this many seconds, then the SUs that are ready to download are downloaded with a single scp process.

  • RS_SITE_INFO_URL - the service at JSOC that is used by Remote SUMS to locate the NetDRMS site that owns SUMS storage units; this is Remote SUMS site URL.

  • RS_SU_EXPIRATION - the default expiration date for all SUs ingested by Remote SUMS; if the SU being ingested is part of a data series, then Remote SUMS obtains the expiration for the SU from the data series' definition instead; as an alternative to RS_SU_EXPIRATION, RS_SU_LIFESPAN can be used to specify the expiration date of newly ingested SUs; RS_SU_EXPIRATION takes precedence over RS_SU_LIFESPAN. NOTE: you will need to define at least one of RS_SU_EXPIRATION or RS_SU_LIFESPAN for rsumsd.py to work properly.

  • RS_SU_ARCHIVE - the default value of the archive flag for newly ingested SUs; if the SU being ingested is part of a data series, then Remote SUMS obtains the archive flag from the data series' definition instead; the truth value can be one of several character strings that implies TRUE or FALSE.

  • RS_SU_LIFESPAN - the default lifespan ("retention time"), in days, of a newly ingested SU; if the SU being ingested is part of a data series, then Remote SUMS obtains the lifespan for the SU from the data series' definition instead; as an alternative to RS_SU_LIFESPAN, RS_SU_EXPIRATION can be used to specify the lifespan of newly ingested SUs; RS_SU_EXPIRATION takes precedence over RS_SU_LIFESPAN.

  • RS_SU_TAPEGROUP - the default value of the tapegroup for newly ingested SUs; if the SU being ingested is part of a data series, then Remote SUMS obtains the tapegroup from the data series' definition instead.

  • RS_TMPDIR - the temporary directory into which SUs are downloaded. This should be on the same file system on which the SUMS partitions reside.

  • RSCLIENT_TIMEOUT - the time interval, in minutes, after which rsums-clientd.py will error-out a request IF during that time interval at least one SU could not be downloaded.

  • SCRIPTS_EXPORT - the path to the directory in the NetDRMS installation that contains the export scripts.

  • SERVER - the name of the DRMS database cluster host: this is <PostgreSQL host>, the machine on which PostgreSQL was installed.

  • SS_HIGH_WATER - partition scrubbing is initiated only after partition percent usage rises above the high-water mark.

  • SS_LOCKFILE - the (advisory) lockfile used by the SU steward to prevent multiple instances of the steward from running.

  • SS_LOW_WATER - each SUMS partition is scrubbed until its percent usage falls below the low-water mark.

  • SS_REHYDRATE_INTERVAL - the time interval, in seconds, between updates to the per-partition cache of expired SUs; this value applies to all partitions that are scrubbed; for each partition, a steward thread queries its cache to select the next SUs to delete (which are sorted by increasing expiration date).

  • SS_SLEEP_INTERVAL - the interval, in seconds, between flushing/caching expired SU lists (use a smaller number if the system experiences a high rate of SU expiration).

  • SS_SU_CHUNK - the number of SUs in a partition that are deleted at one time; SUs are deleted one chunk at a time until the partition usage falls below the low-water mark.

  • SUMBIN_BASEDIR - the directory in which sum_chmown, a root setuid program, is installed; must be mounted locally on the machine on which the SUMS partitions are mounted: this is <NetDRMS root>.

  • SUMLOG_BASEDIR - the path to the directory that contains various SUMS log files; this is <SUMS logs>.

  • SUMPGPORT - the port that the SUMS database cluster host is listening on: this is 5432, unless DRMS and SUMS reside in different clusters on the same host (something that is not recommended since a single PostgreSQL cluster requires a substantial amount of system resources).

  • SUMS_DB_HOST - the name of the SUMS database cluster host: this is <PostgreSQL host>, the machine on which PostgreSQL was installed; NetDRMS allows for creating a second cluster for SUMS, but in general this will not be necessary unless extremely heavy usage requires separating the two clusters.

  • SUMS_GROUP - the name of the Linux group to which all SUMS Linux users belong: this is <SUMS users>.

  • SUMS_MANAGER - the SUMS database user who owns the SUMS database objects which are manipulated by Remote SUMS and SUMS itself; it should be the Linux user that runs SUMS and owns the SUMS storage directories - this is netdrms_production

  • SUMS_MULTIPLE_PARTNSETS - SUMS has more than one partition set: more than likely, this is 0.

  • SUMS_MT_CLIENT_RESP_TIMEOUT - the interval of time, in minutes, sumsd.py will wait for a client response; after this interval elapses without a client response, sumsd.py will destroy the client connection.

  • SUMS_READONLY_DB_USER - the SUMS database user who has read-only access to the SUMS database objects; it is used by the Remote SUMS client (rsums-clientd.py) to check for the presence of SUs before requesting they be downloaded.

  • SUMS_TAPE_AVAILABLE - SUMS has a tape-archive system.

  • SUMS_USEMTSUMS - use the multi-threaded Python SUMS: this is 1.

  • SUMS_USEMTSUMS_ALL - use the multi-threaded Python SUMS for all SUMS API methods; SUMS_USEMTSUMS_ALLOC, SUMS_USEMTSUMS_CONNECTION, SUMS_USEMTSUMS_DELETESUS, SUMS_USEMTSUMS_GET, SUMS_USEMTSUMS_INFO, and SUMS_USEMTSUMS_PUT are ignored: this is 1.

  • SUMS_USEMTSUMS_ALLOC - use the MT SUMS daemon for the SUM_alloc() and SUM_alloc2() API function.

  • SUMS_USEMTSUMS_CONNECTION - use the MT SUMS daemon for the SUM_open() and SUM_close() API functions.

  • SUMS_USEMTSUMS_DELETESUS - use the MT SUMS daemon for the SUM_delete_series() API function.

  • SUMS_USEMTSUMS_GET - use the MT SUMS daemon for the SUM_get() API function.

  • SUMS_USEMTSUMS_INFO - use the MT SUMS daemon for the SUM_infoArray() API function.

  • SUMS_USEMTSUMS_PUT - use the MT SUMS daemon for the SUM_put() API function.

  • SUMSD_LISTENPORT - the port that SUMS listens to for incoming requests.

  • SUMSD_MAX_THREADS - the maximum number of SUs that SUMS can process simultaneously.

  • SUMSERVER - the SUMS host machine; this is <SUMS host>.

  • WEB_DBUSER - the DRMS database user account that cgi programs access when they need to read from or write to database relations.

  • WL_HASWL - if 1, then this DRMS has a whitelist of private series that are accessible on a public web site.

The __MAKE__ section:

  • INCS_INSTALL_DIR_cfitsio - the path to the installed CFITSIO header files: this is /opt/cfitsio-X.Y.Z/include

  • LIBS_INSTALL_DIR_cfitsio - the path to the installed CFITSIO library files: this is /opt/cfitsio-X.Y.Z/lib

  • CFITSIO_LIB - the name of the CFITSIO library (cfitsio)

  • INCS_INSTALL_DIR_pq - the path to the installed PostgreSQL header files: this is /usr/pgsql-12/include

  • LIBS_INSTALL_DIR_pq - the path to the installed PostgreSQL library files: this is /usr/pgsql-12/lib

  • POSTGRES_LIB - the name of the PostgreSQL C API library (AKA libpq): this is always pq

  • INCS_INSTALL_DIR_ecpg - the path to the installed PostgreSQL ecpg header files: this is /usr/pgsql-12/include

  • LIBS_INSTALL_DIR_ecpg - the path to the installed PostgreSQL ecpg library files: this is /usr/pgsql-12/lib

  • INCS_INSTALL_DIR_crypto - the system path to the crypto-library header file

  • LIBS_INSTALL_DIR_crypto - the system path to the crypto library

  • INCS_INSTALL_DIR_png - the system path to the png-library header file

  • LIBS_INSTALL_DIR_png - the system path to the png library

When installing NetDRMS updates, you might need to update your /opt/jsoc-git/config.local. Use the new config.local.newstyle.template to obtain information about parameters new to the newer release. Many of the parameter values have been determined during the previous steps of the installation process.

Run the configuration csh script, configure, which is included in /opt/jsoc-git.

$ whoami
netdrms_production
$ cd /opt/jsoc-git
$ ./configure

Compile and install NetDRMS:

$ whoami
netdrms_production
$ cd /opt/jsoc-git
$ make base_all
$ make install prefix=<NetDRMS installation dir>

As make install prefix=<NetDRMS installation dir> completes, it prints the following just before exiting:

Make sure you source one of the generated env files:
For bash: <NetDRMS installation dir>/drms-env-linux_avx2.bash
For csh: <NetDRMS installation dir>/drms-env-linux_avx2.csh

You will need to source the appropriate environment file before you continue, as many of the commands require that installation-directory environment variables be set properly. Also, add the installation paths to the PATH environment variable. It is probably best to add the following to ~/.bashrc, ~/.bash_profile, or a similar resource file for each user who will run NetDRMS programs. In this instruction manual, this will be true of every NetDRMS user, of postgres, and of netdrms_production:

[ -f /opt/netdrms/drms-env-linux_avx2.bash ] && source /opt/netdrms/drms-env-linux_avx2.bash
PATH=$DRMS_BINS_INSTALL_DIR:$DRMS_SCRS_INSTALL_DIR:$PATH
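A quick check that the environment file took effect (show_info is one of the DRMS modules installed in the previous step; the exact paths printed will depend on your installation directory):

{{{
$ source /opt/netdrms/drms-env-linux_avx2.bash
$ echo $DRMS_BINS_INSTALL_DIR
$ which show_info
}}}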

As postgres, run two SQL scripts included in the NetDRMS installation, to create the admin and drms schemas and their relations, and the jsoc and sumsadmin database users, data types, and functions:

$ whoami 
postgres
# use psql to execute SQL script
# creates DRMS database tables
$ psql -h <PostgreSQL host> -p 5432 -U postgres -f $DRMS_SCRS_INSTALL_DIR/NetDRMS.sql netdrms
CREATE SCHEMA
GRANT
CREATE TABLE
CREATE TABLE
GRANT
GRANT
CREATE SCHEMA
GRANT
CREATE ROLE
CREATE ROLE
# creates DRMS database functions
$ psql -h <PostgreSQL host> -p 5432 -U postgres -f $DRMS_SCRS_INSTALL_DIR/create_database_functions.sql netdrms

For more information about the purpose of these objects, read the comments in NetDRMS.sql and createpgfuncs.pl.

Make <DRMS DB production user> a DRMS user. This not only makes a database account for <DRMS DB production user> (whose name is <DRMS DB production user>), but it also enters information into the database that allows <DRMS DB production user> to run DRMS modules, such as show_info. In order to do this, you will need to connect to the netdrms database, using the only account that currently exists: <PostgreSQL DB superuser>. Make sure that you've modified postgres's environment as described above to source the DRMS environment file, to set the PATH environment variable, and to also source the Intel environment file (if needed).

DRMS modules connect to the database using the Linux user name as the database user name because, by default, PostgreSQL clients use the operating-system name for the database account. For example, if Linux user netdrms_production runs show_info, then show_info will connect to the DRMS database as database user netdrms_production, so netdrms_production == <DRMS DB production user>. To make netdrms_production a DRMS user, provide netdrms_production for the <DRMS DB production user> argument:

$ whoami
postgres
$ perl $DRMS_SCRS_INSTALL_DIR/newdrmsuser.pl netdrms <PostgreSQL host> 5432 <DRMS DB production user> <initial password> <DRMS DB production user namespace> user 1
Connection to database with 'dbi:Pg:dbname=netdrms;host=drms;port=5432' as user '<DRMS DB production user>' ... success!
executing db statment ==> CREATE USER <DRMS DB production user>
executing db statment ==> ALTER USER <DRMS DB production user> WITH password '<initial password>'
executing db statment ==> GRANT jsoc to <DRMS DB production user>
running cmd-line ==> masterlists dbuser=<DRMS DB production user> namespace=<new DB user namespace> nsgrp=user
Please type the password for database user "postgres":
Connected to database 'netdrms' on host '<PostgreSQL host>' and port '5432' as user 'postgres'.
Created new drms_series...
Created new 'drms_keyword'...
Created new 'drms_link'...
Created new 'drms_segment'...
Created new 'drms_session'...
Created new drms_sessionid_seq sequence...
Commiting...
Done.
executing db statment ==> INSERT INTO admin.sessionns VALUES ('<DRMS DB production user>', '<DRMS DB production user namespace>')

where <DRMS DB production user namespace> is the PostgreSQL namespace dedicated to <DRMS DB production user>. Please see the NOTE in this page for assistance with choosing <DRMS DB production user namespace>. The general naming convention is to prepend the database user name with an abbreviation that identifies the site that owns the data in the namespace, like <site id>_<DRMS DB production user>. The <site id> used here should be used for all NetDRMS users created later in these instructions.

Add database permissions to the <DRMS DB production user>. This will allow <DRMS DB production user> to create schemas in the DRMS and SUMS databases:

$ whoami
postgres
# for the DRMS database
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=# GRANT ALL ON DATABASE netdrms TO <DRMS DB production user>;
GRANT
netdrms=# \q
# for the SUMS database
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=# GRANT ALL ON DATABASE netdrms_sums TO <DRMS DB production user>;
GRANT
netdrms_sums=# \q

As netdrms_production, create a .pgpass file. This file contains the PostgreSQL user account password, obviating the need to manually enter the database password each time a database connection attempt is made:

$ whoami
netdrms_production
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<DRMS DB production user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass
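To confirm that the .pgpass entry is being used, connect as the production user; psql should not prompt for a password:

{{{
$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=> \q
}}}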

Now that we have a production user, <DRMS DB production user>, in the database, we'd like for it to own all the database objects that were created by the NetDRMS.sql and createpgfuncs.pl scripts (these objects are all specific to NetDRMS). These two scripts were run by <PostgreSQL DB superuser> because this super user had the elevated privileges needed to create these objects - plus <PostgreSQL DB superuser> was the only database user in existence at that point. Run the following, as postgres, to alter ownerships:

$ whoami 
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms << EOF
ALTER SCHEMA admin OWNER TO <DRMS DB production user>;
ALTER TABLE admin.ns OWNER TO <DRMS DB production user>;
ALTER TABLE admin.sessionns OWNER TO <DRMS DB production user>;
ALTER SCHEMA drms OWNER TO <DRMS DB production user>;
ALTER TYPE drmskw OWNER TO <DRMS DB production user>;
ALTER TYPE drmsseries OWNER TO <DRMS DB production user>;
ALTER TYPE drmssession OWNER TO <DRMS DB production user>;
ALTER TYPE drmssg OWNER TO <DRMS DB production user>;
ALTER TYPE rep_item OWNER TO <DRMS DB production user>;
EOF
ALTER SCHEMA
ALTER TABLE
ALTER TABLE
ALTER SCHEMA
ALTER TYPE
ALTER TYPE
ALTER TYPE
ALTER TYPE
ALTER TYPE
$ 

Some features of NetDRMS require the installation of python packages included in the NetDRMS distribution: drms_parameters, drms_utils, drms_export, and sums_client. drms_export has a dependency on SunPy/drms, so make sure to install it first (see [ Installing SunPy/drms ]). Each package is contained within a lib/py directory. To install these packages, make sure you are running as netdrms_production (since this user created the netdrms virtual environment) and that you have already activated the netdrms virtual environment. As netdrms_production, cd to each of the lib/py directories. Each one contains a setup.py file that will be used by pip to install the packages that reside in that lib/py directory:

$ cd $DRMS_SRC_INSTALL_DIR/base/libs/py
$ pip install .
$ cd $DRMS_SRC_INSTALL_DIR/base/export/libs/py
$ pip install .
$ cd $DRMS_SRC_INSTALL_DIR/base/sums/libs/py
$ pip install .
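A quick import check of the newly installed packages, run inside the activated netdrms virtual environment (this assumes the importable module names match the package names listed above):

{{{
$ python -c 'import drms_parameters, drms_utils, drms_export, sums_client; print("ok")'
ok
}}}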

We recommend using <DRMS DB production user> as the SUMS database production user <SUMS DB production user>. However, feel free to create a new user if necessary. If the DRMS and SUMS databases reside in different clusters, then you will need to create the <SUMS DB production user>. Again, since PostgreSQL clients automatically use the Linux user name as the PostgreSQL user name when a connection attempt is made, use Linux user netdrms_production for database user <SUMS DB production user>. If you choose a <SUMS DB production user> that is not netdrms_production, then you will need to pass <SUMS DB production user> to both Remote SUMS and SUMS when starting them.

$ # DO THIS ONLY IF <SUMS DB production user> != <DRMS DB production user> OR IF <DRMS database cluster> != <SUMS database cluster>
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# CREATE ROLE <SUMS DB production user> WITH LOGIN;
postgres=# \q
$ 

In addition, you will need to create a SUMS database user that has read-only access to the SUMS database objects:

$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# CREATE ROLE <SUMS DB readonly user> WITH LOGIN;
postgres=# ALTER ROLE <SUMS DB readonly user> WITH PASSWORD 'readonlyuser';
postgres=# \q

where <SUMS DB readonly user> is [SUMS_READONLY_DB_USER]. This database account is used by the Remote SUMS Client, a daemon used to manage the auto-download of SUs for subscriptions. Remote SUMS Client will be run by netdrms_production, so add the password to netdrms_production's .pgpass file:

$ whoami
netdrms_production
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:[SUMS_READONLY_DB_USER]:readonlyuser
ESC
:wq
$ chmod 0600 .pgpass

If you created a new SUMS DB production user, add a password for this user. Ensure that you use the same password that you used for <DRMS DB production user> - you will use the same Linux user when connecting to either database, so the same .pgpass file will be used for authentication. As postgres, run psql to add a password for this new database user:

$ # DO THIS ONLY IF <SUMS DB production user> != <DRMS DB production user>
$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432
postgres=# ALTER ROLE <SUMS DB production user> WITH PASSWORD '<DRMS DB production user password>';
postgres=# \q
$

SUMS stores directory and file information in relations in the SUMS database. To create those relations and initialize tables, as netdrms_production run:

$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> -f $DRMS_SCRS_INSTALL_DIR/postgres/create_sums_tables.sql netdrms_sums
CREATE TABLE
CREATE INDEX
CREATE INDEX
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TABLE
GRANT
CREATE TABLE
CREATE INDEX
GRANT
CREATE SEQUENCE
GRANT
CREATE SEQUENCE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
CREATE TABLE
GRANT
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> netdrms_sums
netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START <min val> RESTART <min val> MINVALUE <min val> MAXVALUE <max val>;
ALTER SEQUENCE
netdrms_sums=> \q
$ 

where <min val> is <drms site code> << 48, and <max val> is <min val> + <maximum unsigned 48-bit integer> - 1, where <drms site code> is the value of [DRMS_LOCAL_SITE_CODE], and <maximum unsigned 48-bit integer> is 2^48 (which is 281474976710656). For the JSOC (site code 0x0000), this ALTER SEQUENCE command looks like:

netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 0 RESTART 0 MINVALUE 0 MAXVALUE 281474976710655;
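
As another worked example, a hypothetical site with site code 0x0001 would have <min val> = 1 << 48 = 281474976710656 and <max val> = 281474976710656 + 281474976710656 - 1 = 562949953421311:

netdrms_sums=> ALTER SEQUENCE sum_ds_index_seq START 281474976710656 RESTART 281474976710656 MINVALUE 281474976710656 MAXVALUE 562949953421311;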

Lastly, as postgres you will need to ensure that [SUMS_READONLY_DB_USER] can read from the sum_partn_alloc table:

$ whoami
postgres
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=# GRANT SELECT ON sum_partn_alloc TO [SUMS_READONLY_DB_USER];
GRANT
netdrms_sums=# \q
$ 

Initializing SUMS Storage

In addition to SUMS database relations, SUMS requires a file system on which SUMS maintains storage areas called SUMS partitions. A SUMS partition is really just a directory that contains SUMS Storage Units (each of which is implemented as a subdirectory inside the SUMS partition).

New Storage

As netdrms_production, create one or more partitions now - although we have had success making them as large as 60 TB, we recommend 40 TB partitions. For example, if you plan on setting aside X TB of SUMS storage, then make approximately N = X / 40 partitions of 40 TB each. The partitions can reside on a file server and be mounted onto all machines that will use NetDRMS, but the following example simply creates directories on a single file system on <SUMS partition host>. First, make the root directory that contains the SUMS partitions (something like /opt/sums):

$ sudo mkdir <SUMS root>
# allow SUMS users to write into SUMS
$ sudo chown netdrms_production:<SUMS users> <SUMS root>
# when a user writes a file into SUMS, make sure that the file's group owner is <SUMS users>
$ sudo chmod g+s <SUMS root>

<SUMS users> is the Linux group that is allowed to write to SUMS. You created it in a previous step. Second, make the SUMS partitions:

$ whoami
netdrms_production
$ hostname
<SUMS partition host>
$ mkdir -p <SUMS root>/partition01
$ mkdir -p <SUMS root>/partition02
...
$ mkdir -p <SUMS root>/partitionN

Initialize the SUMS DB sum_partn_avail table with the names of these partitions. For each SUMS partition run the following:

$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U <SUMS DB production user> netdrms_sums
netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('<SUMS partition path>', <avail bytes>, <avail bytes>, 0, 0);

where <SUMS partition path> is the full path of the partition as seen from <SUMS host> (which is where the SUMS daemon will run) and <avail bytes> is the partition's capacity in bytes, or any smaller number (for example, the number of blocks in the file system multiplied by the number of bytes per block). The exact value does not matter, as long as it does not exceed the total number of bytes actually available - SUMS will adjust this number as needed.
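
For example, using the /opt/sums layout suggested above, a 40 TB partition could be registered as follows (the byte count here is illustrative; any value no larger than the true capacity works):

netdrms_sums=> INSERT INTO sum_partn_avail (partn_name, total_bytes, avail_bytes, pds_set_num, pds_set_prime) VALUES ('/opt/sums/partition01', 40000000000000, 40000000000000, 0, 0);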

Existing Storage

It might be the case that you would like to have a new NetDRMS installation use existing SUMS partitions. To do this, you would enter the paths to the existing partitions into the sum_partn_avail table. At this point, the SUMS database tables that contain the pointers to SUs (sum_main and sum_partn_alloc) do not contain any references to the existing SUs. Most likely the existing SUMS partitions are part of an existing, old NetDRMS, in which case these references are in that old NetDRMS SUMS database. If so, then you can manually dump four database tables/sequences from the old NetDRMS to files that can be ingested into the new NetDRMS installation. To do that, as <OLD NetDRMS production user> run the following:

$ whoami
<OLD NetDRMS production user>
$ psql -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> netdrms_sums
netdrms_sums=> COPY public.sum_main TO '/tmp/sum_main_dump.txt' WITH (ENCODING 'UTF8');
netdrms_sums=> COPY public.sum_partn_alloc TO '/tmp/sum_partn_alloc_dump.txt' WITH (ENCODING 'UTF8');

You will also need to copy two database sequence tables from your existing, old SUMS database. The COPY command does not work for sequences. Instead you will use the pg_dump command provided by the PostgreSQL installation:

$ whoami
<OLD NetDRMS production user>
$ pg_dump -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> -t public.sum_seq netdrms_sums > /tmp/sum_seq_dump.txt
$ pg_dump -h <OLD PostgreSQL host> -p <OLD PostgreSQL port> -t public.sum_ds_index_seq netdrms_sums > /tmp/sum_ds_index_seq_dump.txt

The COPY command saves the table data to a file on the OLD NetDRMS PostgreSQL server. Copy all four dump files to <PostgreSQL host> and then ingest the table data:

$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 netdrms_sums
netdrms_sums=> COPY public.sum_main FROM '/tmp/sum_main_dump.txt' WITH (ENCODING 'UTF8');
netdrms_sums=> COPY public.sum_partn_alloc FROM '/tmp/sum_partn_alloc_dump.txt' WITH (ENCODING 'UTF8');

To ingest the sequence data, you will first need to edit /tmp/sum_ds_index_seq_dump.txt and /tmp/sum_seq_dump.txt. Those files will attempt to create sequences that already exist in your new installation, so you will need to delete the existing sequences first. Before the first CREATE SEQUENCE statement in each file, add a DROP SEQUENCE command:

-- /tmp/sum_ds_index_seq_dump.txt
...
-- ADD THIS DROP STATEMENT BEFORE THE CREATE STATEMENT
DROP SEQUENCE public.sum_ds_index_seq;
--
CREATE SEQUENCE public.sum_ds_index_seq
...
;
...

and

-- /tmp/sum_seq_dump.txt
...
-- ADD THIS DROP STATEMENT BEFORE THE CREATE STATEMENT
DROP SEQUENCE public.sum_seq;
--
CREATE SEQUENCE public.sum_seq
...
;
...

After you have edited these two text files, then ingest them with psql -f:

$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -f /tmp/sum_ds_index_seq_dump.txt netdrms_sums
$ psql -h <PostgreSQL host> -p 5432 -f /tmp/sum_seq_dump.txt netdrms_sums

Creating DRMS User Accounts

For each Linux user <new DRMS user> who will run installed DRMS modules, or who will write new DRMS modules, you will need to set up their environment, create their DRMS account, and create their .pgpass file so they can run DRMS modules without having to manually authenticate to the database. Just like you did for netdrms_production, you will need to unset the PYTHONPATH environment variable and put the PostgreSQL executables into the user's $PATH. You will also need to put netdrms_production's netdrms Python virtual environment into the user's $PATH - each user will use this Python environment, not the base installation. Also set the JSOCROOT, JSOC_MACHINE, JSOC_COMPILER, and JSOC_FCOMPILER environment variables:

# .bashrc

# PostgreSQL executables
export PATH=/usr/pgsql-12/bin:$PATH

# python production virtual environment
unset PYTHONPATH
export PATH=<NetDRMS production home>/.conda/envs/netdrms/bin:$PATH

# NetDRMS binary paths
[ -f /opt/netdrms/drms-env-linux_avx2.bash ] && source /opt/netdrms/drms-env-linux_avx2.bash
PATH=$DRMS_BINS_INSTALL_DIR:$DRMS_SCRS_INSTALL_DIR:$PATH

# set JSOC_COMPILER to icc for the Intel C compiler, or to gcc for the GNU C compiler
export JSOC_COMPILER=<C compiler>

# set JSOC_FCOMPILER to ifort for the Intel Fortran compiler, or to gfortran for the GNU Fortran compiler
export JSOC_FCOMPILER=<Fortran compiler>

You will also need to add them to the <SUMS users> group:

$ sudo usermod -a -G <SUMS users> <new DRMS user>
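
You can verify the group membership with the id command (note that the new membership takes effect at the user's next login):

$ id -nG <new DRMS user>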

To create a DRMS account, you create a database account for the user, plus you add user-specific rows to various DRMS database tables. The script newdrmsuser.pl exists to facilitate these tasks:

$ whoami
netdrms_production
$ perl $DRMS_SCRS_INSTALL_DIR/newdrmsuser.pl netdrms <PostgreSQL host> 5432 <new DRMS user> <initial password> <new DB user namespace> user 1
Connection to database with 'dbi:Pg:dbname=netdrms;host=drms;port=5432' as user '<new DRMS user>' ... success!
executing db statment ==> CREATE USER <new DRMS user>
executing db statment ==> ALTER USER <new DRMS user> WITH password '<initial password>'
executing db statment ==> GRANT jsoc to <new DRMS user>
running cmd-line ==> masterlists dbuser=<new DRMS user> namespace=<new DB user namespace> nsgrp=user
Please type the password for database user "postgres":
Connected to database 'netdrms' on host '<PostgreSQL host>' and port '5432' as user 'postgres'.
Created new drms_series...
Created new 'drms_keyword'...
Created new 'drms_link'...
Created new 'drms_segment'...
Created new 'drms_session'...
Created new drms_sessionid_seq sequence...
Commiting...
Done.
executing db statment ==> INSERT INTO admin.sessionns VALUES ('<new DRMS user>', '<new DB user namespace>')

where <new DB user namespace> is the PostgreSQL namespace dedicated to the new user. A namespace is a logical container that allows a database user to own database objects, like relations, that have the same name as objects owned by other users - items in a namespace need only be uniquely named within the namespace, not between namespaces. For example, the relation drms_series in the namespace su_arta is not the same relation as the drms_series relation in the su_phil namespace - the relations have the same name, but they are different relations. In virtually all PostgreSQL operations, a user can prefix the name of a relation with the namespace: su_arta.drms_series refers to the first relation, and su_phil.drms_series refers to the second relation.

The purpose of <new DB user namespace> is to hold non-production, private data series - sort of a private user space to develop new DRMS modules to create data. If those data should become production-level products, then the data and the code that generates the data need to be moved to a production namespace. At the JSOC, we have several such production namespaces (e.g., aia, hmi, mdi). A site creates production namespaces with a different module (masterlists); newdrmsuser.pl is only for creating non-production namespaces.

Please see the NOTE in this page for assistance with choosing <new DB user namespace>. The general naming convention is to begin the namespace with an abbreviation that identifies the site that owns the data in the namespace. For example, all private data created at Stanford reside in data series whose namespaces start with su_ (Stanford University), regardless of the affiliation of the user who creates data in this namespace. Data created at NASA Ames start with nas_ (NASA Supercomputing Division). Following the underscore is a string to identify a particular user - su_arta for Art, and su_phil for Phil. You can also specify a group as the suffix (e.g., su_uscsolar for a solar group at the University of Southern California that creates data at Stanford). <initial password> is the initial password for this account - the initial password does not matter much since you are going to have the user change it next.

Running newdrmsuser.pl will create a new DRMS database user that has the same name as the user's Linux account name.

Have the user change their password:

$ whoami
<new DRMS user>
$ psql -h <PostgreSQL host> -p 5432 netdrms
netdrms=> ALTER USER <new DRMS user> WITH PASSWORD '<new password>';
netdrms=> \q
$ 

And then have the user create their .pgpass file (to allow auto-login to their database account) and set permissions to 0600:

$ whoami
<new DRMS user>
$ cd $HOME
$ vi .pgpass
i
<PostgreSQL host>:*:*:<new DRMS user>:<new password>
ESC
:wq
$ chmod 0600 .pgpass

Please see the PostgreSQL documentation on password files for additional information on the .pgpass file.

If you plan on creating data that will be publicly distributed, you should also create one or more data-production users. For example, if you plan on making a new public HMI data series, you could create a user named hmi_production. Although you could follow previous steps to create a new Linux account for this database user, you do not necessarily need to. Instead you can use the existing netdrms_production and have it connect as the hmi_production user. To do that, first create the new hmi_production database user by running newdrmsuser.pl as just described. Choose a descriptive namespace that follows the naming guidelines described above, like hmi. Because a .pgpass file already exists for netdrms_production, you want to ADD a new line to .pgpass for this user. Continuing with the hmi_production user example, add a password line for hmi_production:

# .pgpass
<PostgreSQL host>:*:*:hmi_production:<hmi_production password>
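
With this line in place, netdrms_production can connect to the database as hmi_production without a password prompt:

$ whoami
netdrms_production
$ psql -h <PostgreSQL host> -p 5432 -U hmi_production netdrms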

At this point, it is a good idea to test that the new DRMS user can use a basic NetDRMS program. Although your NetDRMS has no DRMS data series, running show_series is a good way to test various components, like authentication, database connection, etc. Test DRMS by running the show_series command:

$ whoami
<new DRMS user>
$ show_series
$ 

Nothing will be printed when this command is run, since your NetDRMS is devoid of data at the moment, but if you see no errors, then life is good. If not, then contact the JSOC for help troubleshooting.

To perform a more thorough test involving SUMS, you will need to have at least one DRMS data series that has SUMS data. You can obtain such a data series by registering for subscriptions.

Running SUMS Services

Before you can use NetDRMS, you, as netdrms_production on <SUMS host>, will need to start SUMS. To launch one or more SUMS daemons, sumsd.py, use the start-mt-sums.py script:

$ whoami
netdrms_production
$ hostname
<SUMS host>
$ python3 $DRMS_SCRS_INSTALL_DIR/start-mt-sums.py daemon=$DRMS_SCRS_INSTALL_DIR/sumsd.py --ports=[SUMSD_LISTENPORT] --logging-level=debug
running /home/netdrms_production/.conda/envs/netdrms/bin/python3 /opt/netdrms-v10.0-rc2/scripts/sumsd.py --listen-port=6100 --logging-level=debug
started instance /opt/netdrms-v10.0-rc2/scripts/sumsd.py:6100 (pid 821705)
{"started": [821705]}

NOTE: as of this writing, ports MUST be the value of [SUMSD_LISTENPORT]. In future releases, this parameter will be made optional, in which case the value will be obtained from [SUMSD_LISTENPORT] in config.local.

This command starts sumsd.py, which then listens for connections from DRMS modules, such as show_info, on port [SUMSD_LISTENPORT]. In the example above, [SUMSD_LISTENPORT] is 6100, which is displayed in the output. sumsd.py creates an instances file and a log file in [SUMLOG_BASEDIR] by default. The instances file is a state file used by start-mt-sums.py and stop-mt-sums.py to manage the running sumsd.py instances. By default, the log file is named sumsd-<PPPPP>-<YYYYMMDD.HHMMSS>.txt, where <PPPPP> is the PID of the sumsd.py process, and <YYYYMMDD.HHMMSS> is the time string representing the time the instance was launched.

The complete usage is:

usage: start-mt-sums.py daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --logging-level=<critical, error, warning, info, or debug>] [ --log-file=<filename> ] [ --quiet ]

optional arguments:
  -h, --help            show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be spawned
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -l LOGLEVEL, --logging-level LOGLEVEL
                        specifies the amount of logging to perform; in order
                        of increasing verbosity: critical, error, warning,
                        info, debug
  -L <file name>, --log-file <file name>
                        the file to which sumsd logging is written
  -q, --quiet           do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to launch

start-mt-sums.py will fork one or more sumsd.py daemon processes. The ports argument identifies the SUMS host ports on which sumsd.py will listen for client (DRMS module) requests. One sumsd.py process will be invoked per port specified. Each process creates a log file in [SUMLOG_BASEDIR] named, by default, sumsd-<port>-<YYYYMMDD>.<HHMMSS>.txt. The -L/--log-file argument allows you to override the path and name of this log file.
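
For example, to direct the daemon log to a specific file (the file name here is purely illustrative):

$ python3 $DRMS_SCRS_INSTALL_DIR/start-mt-sums.py daemon=$DRMS_SCRS_INSTALL_DIR/sumsd.py --ports=[SUMSD_LISTENPORT] --log-file=[SUMLOG_BASEDIR]/sumsd-test.txt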

To stop one or more SUMS services, use the stop-mt-sums.py script:

$ whoami
netdrms_production
$ hostname
<SUMS host>
$ python3 $DRMS_SCRS_INSTALL_DIR/stop-mt-sums.py daemon=$DRMS_SCRS_INSTALL_DIR/sumsd.py

This will stop all running sumsd.py daemons.

The complete usage is:

usage: stop-mt-sums.py [ -h ] daemon=<path to daemon> [ --ports=<listening ports> ] [ --instancesfile=<instances file path> ] [ --quiet ]

optional arguments:
  -h, --help            show this help message and exit
  -p <listening ports>, --ports <listening ports>
                        a comma-separated list of listening-port numbers, one
                        for each instance to be stopped
  -i <instances file path>, --instancesfile <instances file path>
                        the json file which contains a list of all the
                        sumsd.py instances running
  -q, --quiet           do not print any run information

required arguments:
  d <path to daemon>, daemon <path to daemon>
                        path of the sumsd.py daemon to halt

Registering for Subscriptions

A NetDRMS site can optionally register for a data-series subscription to any NetDRMS site that offers subscription service. The JSOC NetDRMS offers subscriptions, but at the time of this writing, no other site does. Once a site registers for a data series subscription, the site will become a mirror for that data series. The subscription process ensures that the mirroring site will receive regular and timely updates made to the data series by the serving site. The subscribing site can configure the interval between updates such that the mirror can synchronize with the server and receive updates within a couple of minutes, keeping the mirror up-to-date in (almost) real time.

To register for a subscription, <Subscription production user> will set up ssh keys (for SU transfer), start daemons, and run the subscription script, subscribe.py. The assumption is that <Subscription production user> is netdrms_production, but you are free to choose a different user. subscribe.py makes subscription requests to the serving site's subscription manager. The process entails the creation of a snapshot of the data-series DRMS database information at the serving site. Those data are downloaded, via HTTP, to the subscribing site, where they are ingested by subscribe.py. get_slony_logs.pl, a client-side cron task, updates the data-series snapshot with any server-side changes that have been made since the snapshot was created. Ingestion of the snapshot results in the creation of the DRMS database objects that maintain and store the data series, and get_slony_logs.pl updates those objects when the server makes changes. At this time, no SUMS data files are downloaded. Instead, and optionally, the IDs for the series' SUMS Storage Units (SU) are saved in a database relation. It is the function of Remote SUMS (rsums.py), another NetDRMS daemon, to download the SUs and ingest them into your SUMS.

Remote SUMS accepts requests from DRMS for SUs. It then communicates with the serving NetDRMS and manages the scp download and ingestion of those SUs. Once Remote SUMS is running, should any DRMS code/module request an SU that is not present in the local NetDRMS, DRMS will send a download request to Remote SUMS. The Remote SUMS Client (rsums-clientd.py), an optional NetDRMS daemon, can automate this process so that when new subscription data are ingested into the DRMS database, it submits requests for the associated SUs to Remote SUMS. In this way, it is possible to pre-fetch the SU files before any user requests them. But pre-fetching is optional. The SUs will be downloaded on-demand as described above. In fact, if the subscribing NetDRMS site were to automatically download an SU, then delete the SU (there is a method to do this, described later), then an on-demand download is the only way to re-fetch the deleted SU. On-demand downloads happen automatically; any DRMS module that attempts to access an SU (like with a show_info -p command) that is not present for any reason will trigger an rsumsd.py request. The module will pause until the SU has been downloaded, then automatically resume its operation on the previously missing SU.

As get_slony_logs.pl uses scp to download update files (Slony logs) and rsumsd.py uses scp to automatically download SUs, SSH public-private keys must be created at the subscribing site, and the public key must be provided to the serving site. Setting this up requires coordinated work at both the subscribing and serving sites. As <Subscription production user> on the subscribing site, run

$ whoami
<Subscription production user>
$ ssh-keygen -t rsa

This will allow you to create a passphrase for the key. If you choose to do this, then save this phrase for later steps. In the home directory of <Subscription production user>, ssh-keygen will create a public key named id_rsa.pub. Provide this public key to the serving site.

The serving site must then add the public key to its list of authorized keys. If the .ssh directory does not exist, then the serving site must first create this directory and give it 0700 permissions. If the authorized_keys file in .ssh does not exist, then it must first be created and given 0644 permissions:

$ whoami
<Subscription manager user>
$ mkdir .ssh
$ chmod 0700 .ssh
$ cd .ssh
$ touch authorized_keys
$ chmod 0644 authorized_keys

Once the .ssh and authorized_keys files exist and have the proper permissions, the serving site administrator can then add the client site's public key to its list of authorized keys:

$ whoami
<Subscription manager user>
$ cd $HOME/.ssh
$ cat <remote-site public key file> >> authorized_keys

Back at the client NetDRMS site, if an ssh passphrase was chosen, then as <Subscription production user> start an ssh-agent instance to automate the passphrase authentication. If no passphrase was provided when ssh-keygen was run, this step can be skipped. Otherwise, run:

$ whoami
<Subscription production user>
$ ssh-agent > $HOME/.ssh-agent
$ source $HOME/.ssh-agent # needed for ssh-add, and also for rsumsd.py and get_slony_logs.pl
$ ssh-add $HOME/.ssh/id_rsa

and provide the passphrase. To keep ingested data series synchronized with changes made at the serving site, a client-side cron tab runs get_slony_logs.pl periodically. This perl script uses scp to download Slony log files - SQL files that insert, delete, or update database relation rows. get_slony_logs.pl communicates with the Slony-I replication software running at the serving site. Slony-I generates these log (SQL) files at the server, and the client then downloads them.
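
Before proceeding, you can confirm that the agent is holding the key; ssh-add -l lists the fingerprints of all loaded keys:

$ whoami
<Subscription production user>
$ ssh-add -l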

To register for a subscription to a new series, run:

$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=subscribe series=<published DRMS data series> --loglevel=debug

The complete usage is:

usage: subscribe.py [ -hjpl ] cfg=<client configuration file> reqtype=<subscribe, resubscribe, unsubscribe> series=<comma-separated list of series> [ --archive=<0, 1>] [ --retention=<number of days>] [ --tapegroup=<tape-group number> ] [ --pg_user=<subscription client DB user> ] [ --logfile=<log-file name> ]

optional arguments:
  -h, --help            show this help message and exit
  archive <series archive flag>, --archive <series archive flag>
                        The tape archive flag for the series - either 0 (do not archive) or 1 (archive).
  retention <series SU disk retention>, --retention <series SU disk retention>
                        The number of days the series SUs remain on disk before becoming subject to deletion.
  tapegroup <series SU tape group>, --tapegroup <series SU tape group>
                        If the archive flag is 1, the number identifying the group of series that share tape files.
  pg_user <subscription client DB user>, --pg_user <subscription client DB user>
                        The DB account the subscription client uses.
  -p, --pause           Pause and ask for user confirmation before applying the downloaded SQL dump file.
  --loglevel LOGLEVEL   Specifies the amount of logging to perform. In increasing order: critical, error, warning, info, debug
  --logfile <file name>
                        The file to which logging is written.
  --filtersus <SU filter>
                        Specifies a series keyword K and a number of days D, comma separated; a remote-SUMS request for an SU will occur only if the keyword K of the record
                        containing the SU has a value that lies within the time interval determined by the days D.

required arguments:
  cfg <client configuration file>, --config <client configuration file>
                        The client-side configuration file used by the subscription service.
  reqtype <request type>, --reqtype <request type>
                        The type of request (subscribe, resubscribe, or unsubscribe).
  series SERIES, --series SERIES
                        A comma-separated list of DRMS series to subscribe/resubscribe to, or to unsubscribe from.

Debug-level logging is not necessary, but if this is your first subscription, we recommend running with verbose debug output - if there are issues, you can send this output to the JSOC for help.

<subscription config file> contains parameters used by both the subscription client subscribe.py and the program that downloads Slony log files, get_slony_logs.pl. We recommend making a directory named subscription in the home directory of netdrms_production and saving the configuration file in that directory. The parameters to be set in <subscription config file> are as follows (a sample configuration sketch appears after this list):

  • node - the name of the subscription client (e.g., jsoc, nsocu, sdac); must be globally unique across all NetDRMS sites; this string will be used in various state files and in file/directory names; to obtain this name, ask the JSOC for the sitename assigned to your site during the NetDRMS installation process.

  • kRSServer - the full domain name of the subscription log server (e.g., jsocport.stanford.edu for a client subscribing to data series published by the JSOC).

  • kRSUser - the account on kRSServer that will be used for data transfer (e.g., jsocexp for a client subscribing to data series published by the JSOC).

  • kRSPort - the port on kRSServer that will be used for data transfer (e.g., 22 for scp); if the JSOC is the serving site, then the port must be 55000.

  • kRSBaseURL - the base URL for all subscription services provided by the subscription server for the Slony cluster (identified by slony_cluster); ask the DRMS site serving the subscriptions for this value - when subscribing to series at the JSOC, use "http://jsoc.stanford.edu/cgi-bin/ajax"

  • pg_host - the client machine that hosts the client PostgreSQL database that will contain the replicated data series - this is <PostgreSQL host>.

  • pg_port - the port on the pg_host machine that will be used for communication with the data-series database - this is 5432.

  • pg_user - the PostgreSQL user that will own the replicated series - this is netdrms_production.

  • pg_dbname - the name of the PostgreSQL database that resides on pg_host - this is netdrms.

  • slony_cluster - the name of the Slony cluster to which this node belongs (e.g., jsoc for a client subscribing to data series published by the JSOC).

  • kLocalLogDir - the client directory that will contain the subscription-process logs; we recommend <NetDRMS production user home>/subscription/log; make sure this path exists.

  • kLocalWorkingDir - the path to a directory for temporary working subscription files; we recommend <NetDRMS production user home>/subscription; make sure this path exists.

  • kSQLIngestionProgram - the path to the script/program that will ingest the site-specific slony logs - usually the path to get_slony_logs.pl (<NetDRMS root>/base/drms/replication/get_slony_logs.pl).

  • kSubService - the URL of the application at the subscription-serving site that accepts new subscription requests (for the JSOC subscription server this is ${kRSBaseURL}/request-subs.py).

  • kPubListService - the URL of the application at the subscription-serving site that lists published data series (for the JSOC subscription server this is ${kRSBaseURL}/publist.py).

  • kSubXfer - the URL of the application at the subscription-serving site where subscription dump files are located (for the JSOC subscription server this is http://jsoc.stanford.edu/subscription).

  • kDeleteSeriesProgram - the path to the program delete_series, which is used to delete DRMS data series on the client when requested (<NetDRMS root>/bin/<architecture>/delete_series).

  • archive - for new subscriptions, the default data series archive action; set to 1 if the NetDRMS site has a tape archive system AND the default is to archive all series obtained by subscription.

  • retention - for new subscriptions, the default number of days to retain SUs (after this many days SUs are marked for deletion and subjected to garbage collection as needed).

  • tapegroup - for new subscriptions, the default archive tape group for the series' SUs (ignored if archive == 0); unless you have a tape backup system, use "0"

  • ingestion_path - the local directory that will contain the ingestion "die" file - used by get_slony_logs.pl; we recommend <NetDRMS production user home>/subscription; make sure this path exists.

  • scp_cmd - the absolute path to the client's scp program.

  • ssh_cmd - the absolute path to the client's ssh program.

  • rmt_slony_dir - the absolute path, accessible from the kRSUser account, on the server to the directory that contains the site-specific slony logs (for the JSOC subscription server, use "/data/pgsql/slon_logs/live/site_logs").

  • slony_logs - the client directory that contains the downloaded site-specific slony logs; we recommend <NetDRMS production user home>/subscription/slon_logs; make sure this path exists.

  • PSQL - the path to the client's psql program, and any flags needed to run psql as the pg_user user, like -h pg_host.

  • email_list - the email account to which error messages will be sent.
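
Below is a minimal sketch of what <subscription config file> might look like for a hypothetical client site named mysite subscribing to the JSOC. Every value is illustrative - substitute your site's actual values, and consult the example configuration file shipped with the NetDRMS distribution for the exact syntax:

# subscription configuration for hypothetical site "mysite"
node=mysite
kRSServer=jsocport.stanford.edu
kRSUser=jsocexp
kRSPort=55000
kRSBaseURL=http://jsoc.stanford.edu/cgi-bin/ajax
pg_host=<PostgreSQL host>
pg_port=5432
pg_user=netdrms_production
pg_dbname=netdrms
slony_cluster=jsoc
kLocalLogDir=/home/netdrms_production/subscription/log
kLocalWorkingDir=/home/netdrms_production/subscription
kSQLIngestionProgram=<NetDRMS root>/base/drms/replication/get_slony_logs.pl
kSubService=${kRSBaseURL}/request-subs.py
kPubListService=${kRSBaseURL}/publist.py
kSubXfer=http://jsoc.stanford.edu/subscription
kDeleteSeriesProgram=<NetDRMS root>/bin/<architecture>/delete_series
archive=0
retention=60
tapegroup=0
ingestion_path=/home/netdrms_production/subscription
scp_cmd=/usr/bin/scp
ssh_cmd=/usr/bin/ssh
rmt_slony_dir=/data/pgsql/slon_logs/live/site_logs
slony_logs=/home/netdrms_production/subscription/slon_logs
PSQL=/usr/pgsql-12/bin/psql -h <PostgreSQL host>
email_list=admin@mysite.example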

You may find that a subscription has gotten out of sync with the serving site's data series for various reasons (accidental deletion of database rows, for example). You can re-register for the subscription to true up: the existing DRMS data will be deleted and replaced with a fresh snapshot. You will be prompted to ask whether you would like to delete the series' SUMS data (SUs, FITS files, etc.); if you are sure you no longer need them, go ahead and say yes.

$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=resubscribe series=<subscription series> --loglevel=debug

Finally, there might come a time when you no longer wish to hold on to a registration. To remove the subscription from your set of registered data series, run:

$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/subscribe.py cfg=<subscription config file> reqtype=unsubscribe series=<subscription series> --loglevel=debug

NOTE: In this case, subscribe.py will prompt you to ask whether you would like to delete the existing DRMS data series. The answer is usually yes. However, if you need to keep the existing data-series snapshot for some reason (e.g., you want to work with the existing data, but simply do not want to ingest new data), then respond no. Keep in mind that you will not be able to register for a subscription to the same series if that series already exists - you will need to run delete_series first.

In all the above commands, the logging level is set to debug. It is a good idea to enable verbose logging like this the first time you run one of these commands, just in case an issue occurs (usually due to some configuration issue). Providing the debug log to the JSOC when troubleshooting will be invaluable.

Once you have successfully registered for subscription to at least one series, you will need to install a crontab to run get_slony_logs.pl, which updates the subscription series, on a regular basis. get_slony_logs.pl has a dependency on the Net::SSH perl package. If you are running CentOS, then there is an rpm package available from the epel/x86_64 yum repository that contains this perl package:

$ yum list | grep perl-Net-SSH
perl-Net-SSH.noarch                        0.09-26.el7            epel
$ sudo yum install perl-Net-SSH
...
Installed:
  perl-Net-SSH.noarch 0:0.09-26.el7

Complete!
$

You will also need String::ShellQuote:

$ yum list | grep ShellQuote
perl-String-ShellQuote.noarch              1.04-10.el7            base
$ sudo yum install perl-String-ShellQuote
...
Installed:
  perl-String-ShellQuote.noarch 0:1.04-10.el7

Complete!

If you cannot find a package, then you can use CPAN.

Once you have the Net::SSH and String::ShellQuote Perl packages installed, run get_slony_logs.pl as <Subscription production user> from a cron tab:

$ whoami
<Subscription production user>
$ crontab -e
*/5 * * * * (. ~/.ssh-agent; $DRMS_SRC_INSTALL_DIR/base/drms/replication/get_slony_logs.pl <subscription config file> >> ~/<path>/get_slony_logs.log 2>&1 )

NOTE: you should manually run get_slony_logs.pl the first time, since an interactive prompt will be displayed the first time an SSH connection is made to [kRSServer].

At the JSOC, Slony log files are created every minute. It is not the case that your series' subscriptions will be updated every minute (this depends on the cadence of the series to which you have subscriptions), but checking frequently - every five minutes in the crontab example above - will minimize the lag between the state of your series and the state of your series at the serving site.

Running Remote SUMS

Your NetDRMS may contain data produced by other, non-local NetDRMSs. Via a variety of means, the local NetDRMS can obtain and ingest the database information for these data series produced non-locally. In order to use the associated data files (typically image files), the local NetDRMS must download the storage units (SUs) associated with these data series too. Remote SUMS, a tool that comes with NetDRMS, downloads SUs as needed - i.e., if a DRMS module or program requests the path to the SU or attempts to read it, and it is not present in the local SUMS yet, Remote SUMS will download the SUs. While the SUs are being downloaded, the initiating module or program will poll waiting for the download to complete.

Several components compose Remote SUMS. On the client side (the local NetDRMS) is a daemon, rsumsd.py, that must be running. There must also exist some database tables, as well as some binaries used by the daemon. On the server side (any NetDRMS site that wishes to act as a source of SUs for the client) is a CGI (rs.sh). This CGI returns file-server information (hostname, port, user, SU paths, etc.) for the SUs the server has available, in response to requests that contain a list of SUNUMs. When the client encounters requests for remote SUs that are not contained in the local SUMS, it sends a request to the Remote SUMS daemon to download those SUs. It does so by inserting a row into the <Remote SUMS requests> database table in the [RS_DBNAME] database. The client code then polls, waiting for the request to be serviced. The daemon in turn sends requests to the rs.sh CGIs at all the relevant providing sites. The owning sites return the file-server information to the daemon, the daemon downloads the requested SUs via scp, and it notifies the client module once the SUs are available for use. The client module then exits its polling code and continues, using the freshly downloaded SUs.

To use Remote SUMS, several config.local parameters must be present. If you followed the steps in this document, then you have already set those parameters. Please see Installing NetDRMS for a description of each one. Each SU that is downloaded has an associated expiration date, a flag indicating whether or not the SU is archived, and, if the SU is archived, the tape group to which the SU belongs. The manner in which the values for these parameters are determined is a bit complicated. When you register for a series subscription, the series is created at your site, and at that time the values for these parameters are determined; the series is then initialized with these values. There are default values for these parameters that are overridden by optional parameters supplied during the call to subscribe.py. In the following list, the lower-numbered item, if present, overrides the higher-numbered item:

  1. the --archive, --retention, --tapegroup command-line arguments to subscribe.py

  2. the archive, retention, tapegroup parameters in subscription config file

Now, when Remote SUMS runs, the values of the parameters for SUs downloaded and ingested are determined in a similar hierarchical fashion, with higher-numbered items overriding lower-numbered items:

  1. the parameters associated with the series, as determined above
  2. the --archive, --expiration, --lifespan, --tapegroup command-line arguments to rsumsd.py

  3. [RS_SU_ARCHIVE], [RS_SU_EXPIRATION], [RS_SU_LIFESPAN], [RS_SU_TAPEGROUP]

To run Remote SUMS, as the NetDRMS production user, run the following to create the requests database table:

$ whoami
netdrms_production
$ python3 $DRMS_SCRS_INSTALL_DIR/rscreatetabs.py op=create tabs=req

Remote SUMS downloads SUs via scp. As such, you will need to create SSH keys, distribute the public one to the site serving the SUs, and start up an ssh-agent if you have not already done so - you should already have done so, though, since all of this is needed by get_slony_logs.pl, a component of the subscription system (please see [ Registering for Subscriptions ]).

To launch Remote SUMS, as netdrms_production create [RS_LOGDIR] and the directory that will contain the Remote SUMS lock file [RS_LOCKFILE], source the ssh-agent environment file, and then run rsumsd.py in the background:

$ whoami
netdrms_production
$ mkdir -p [RS_LOGDIR]
$ mkdir -p <directory containing [RS_LOCKFILE]>
$ source $HOME/.ssh-agent
$ python3 $DRMS_SCRS_INSTALL_DIR/rsumsd.py --logging-level=debug &

For now, we recommend setting the log level to debug; when things appear to be running smoothly, then you can restart with the default level (info). The output log named <rslog_YYYYMMDD.txt> will be written to the directory identified by [RS_LOGDIR], so make sure that directory exists before running rsumsd.py.

To stop rsumsd.py, send a SIGINT signal (kill -2) to the process. Remote SUMS will intercept that signal and shut down cleanly. If you need to shut it down with a SIGKILL signal for any reason, then you will need to clean up manually. To do that, delete the lock file ([RS_LOCKFILE]) and delete all requests in the requests database table (i.e., delete all rows in [RS_REQUEST_TABLE]).
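
For example, the manual cleanup after a SIGKILL might look like the following (run this only while the daemon is down, since the DELETE removes every pending request; adjust host, port, and database to match your configuration):

$ rm [RS_LOCKFILE]
$ psql -h <PostgreSQL host> -p 5432 [RS_DBNAME] -c 'DELETE FROM [RS_REQUEST_TABLE];'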

Running Remote SUMS Client

If you have at least one subscription, and you have Remote SUMS running (not the JMD), then you can automate the download of SUs for the subscriptions. This is optional, however. If you skip this step, then when a NetDRMS user attempts to use one or more SUs that are part of a subscription (and hence not at your NetDRMS site initially), the SUs will be downloaded in an on-demand fashion. However, it may be desirable to pre-fetch the SUs if certain usage patterns hold. If one or more users are going to use a block of SUs for a two-week period, for example, then downloading hundreds or thousands of SUs one at a time would be very inefficient. Also, it is very common for NetDRMS users to use newly produced SUs soon after they are created at the subscription server. In this case, a good strategy might be to automatically download the latest SUs for all subscriptions, knowing that lots of very recent SUs will be popular.

The Remote SUMS Client, rsums-clientd.py, monitors SU references in incoming Slony logs. It groups SUs into batches, and submits requests containing these batches to Remote SUMS, rsumsd.py. rsums-clientd.py then monitors the progress of these requests, logging the results.

In addition to a running rsumsd.py, rsums-clientd.py requires the existence of three components. The capture table is a database table that contains a list of all SUNUMs that are to be downloaded. It is automatically populated as Slony logs are ingested. The capture function is a database function that performs the actual work of inserting rows into the capture table. It is called every time a Slony log inserts a row into a subscribed series' series table. You will need to create a capture trigger (a database trigger) for each series table for which you want to automate SU downloads. These components work together as follows (a schematic example follows the list):

  • each capture trigger watches a series table, and when a row is inserted (due to get_slony_logs.pl ingesting new Slony logs), the trigger runs the capture function

  • this function then inserts a row into the capture table
  • rsums-clientd.py then sees the newly inserted capture-table row and extracts the SUNUM

  • rsums-clientd.py batches several of these SUNUMs, and then makes a rsumsd.py request out of them

  • rsumsd.py processes the requests (multiple in parallel), downloading one or more SUs and ingesting them into the local SUMS

  • rsumsd.py updates a status column in the capture table

  • rsums-clientd.py reads the status and, upon success, removes the rows that contained the SUs that were successfully ingested; upon failure, rsums-clientd.py will log an error, and can optionally re-try one or more times
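
Schematically, a capture trigger resembles the following; the trigger, function, and table names here are entirely hypothetical - use the actual SQL printed by rsums-clientd.py --setup (described below) rather than writing your own:

-- purely illustrative: fire the (hypothetical) capture function
-- after every row inserted into a subscribed series table
CREATE TRIGGER capture_sunums
AFTER INSERT ON <namespace>.<series table>
FOR EACH ROW
EXECUTE PROCEDURE <capture function>();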

IF YOU REGISTERED FOR YOUR FIRST SUBSCRIPTION FROM A >= 9.0 NetDRMS then the capture table and capture function already exist. In addition, a capture trigger has been installed on each of your subscribed series. As netdrms_production, ensure that rsumsd.py is running, and then you can start rsums-clientd.py:

$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py --loglevel=debug &

If you do not have a >= 9.0 NetDRMS OR you were already a subscriber before upgrading to >= 9.0 NetDRMS, then to use rsums-clientd.py, netdrms_production must first create the capture table and the capture function. You must also add the capture trigger to the series table of each DRMS data series for which you want to enable automatic SU downloads. rsums-clientd.py can be run with arguments so that it will print SQL that will create the capture table and function, and the capture triggers on the existing DRMS data series:

$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py [ --setup [ --capturesetup ] [ --seriessetup=<series> ]... ]

--capturesetup creates the capture table and capture function. --seriessetup creates the capture trigger for a single series.

For example, to set up automatic SU downloads for hmi.M_45s and hmi.V_720s at a site that lacks the capture table and capture function, netdrms_production can run:

$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py --setup --capturesetup --seriessetup=hmi.M_45s --seriessetup=hmi.V_720s

Once these capture components are in place, you can start rsumsd.py if it is not already running, then start rsums-clientd.py:

$ whoami
netdrms_production
$ python3 $DRMS_SRC_INSTALL_DIR/base/drms/replication/subscribe_series/rsums-clientd.py --loglevel=debug &

To stop rsums-clientd.py, send a SIGINT signal (kill -2) to the process. Remote SUMS Client will intercept that signal and shut down cleanly. If you need to shut it down with a SIGKILL signal for any reason, then you will need to clean up manually. To do that, delete the lock file ([DRMS_LOCK_DIR]/rsums-client.lck).
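
That is, something like:

$ rm [DRMS_LOCK_DIR]/rsums-client.lck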

Installing the DRMS Python Package

The DRMS package is a python client interface to DRMS. By default, it obtains DRMS data-series information and SUMS storage units / FITS files from the JSOC DRMS via a CGI API. As such, it can be installed and used independently of a local NetDRMS installation. However, it can be configured so that it uses your NetDRMS site directly and not the JSOC.

Two interfaces to your NetDRMS are available: a web-based one, and an ssh-based one. The web-based interface requires setting up a web server and several CGI scripts. The ssh-based one requires a python module that is not currently available online. Instead, the module, securedrms, is included as part of the NetDRMS installation. With either interface, a user can run the python package on a machine that is not the NetDRMS host. Since the ssh interface is much simpler to set up, not requiring a web context, the following describes how to set up the ssh interface.

To install the drms python package, first obtain it from GitHub. If git is not yet installed on your system, use yum to install it:

$ sudo yum install git-all
...
Installed:
  git-all.noarch 0:1.8.3.1-21.el7_7
...
Complete!
$ 

As netdrms_production, use git to clone the Sunpy/drms python package and then install the code into the netdrms virtual environment:

$ whoami
netdrms_production
$ git clone https://github.com/sunpy/drms
Cloning into 'drms'...
...
$ cp <NetDRMS root>/base/libs/py/securedrms.py drms/drms
$ which pip
~/.conda/envs/netdrms/bin/pip
$ pip install ./drms
Processing ./drms
...
Successfully installed drms-0.5.7+1.ga9de281
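
A quick import check confirms that the package landed in the netdrms virtual environment (the version printed should match the one reported by pip above):

$ python3 -c 'import drms; print(drms.__version__)'
0.5.7+1.ga9de281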

securedrms.py has python dependencies on packages/modules that may not be installed. Use conda to install them:

$ whoami
netdrms_production
$ conda install -n netdrms pexpect
Collecting package metadata (current_repodata.json): done
...

This module has a dependency on system ed, so make sure this editor is installed on your system:

$ sudo yum install ed
...
Installed:
  ed.x86_64 0:1.9-4.el7

Complete!

Installing the DRMS Export Web Application

The DRMS Export Web Application is a web interface to the export system. It comprises five components:

  • a set of HTML web pages that contain forms to collect information from the export-system user; the web pages use JavaScript, HTML elements, and AJAX tools to create HTTP AJAX requests

  • browsers/network tools that send the form data contained within the AJAX requests to an HTTP server; the browsers/network tools receive HTTP responses, updating the web pages displayed
  • an HTTP server that acts as a reverse proxy: it receives HTTP requests from browsers/network tools on the internet, forwards the contained data as uwsgi requests to an upstream WSGI server, receives uwsgi responses from the WSGI server, and finally sends HTTP responses back to the originating browsers/tools
  • a WSGI server that receives uwsgi requests from the reverse-proxy server, sends corresponding WSGI requests to the Flask web-application entry point, receives WSGI responses from the Flask web-application entry point, and sends uwsgi responses back to the originating reverse-proxy server
  • a Flask app that services WSGI requests it receives from the WSGI server, and sends WSGI responses back to the WSGI server

HTML Web Pages

The drmsexport web application HTML and JavaScript files are in proj/export/webapps

  1. the static web pages are exportdata.html and export_request_form.html; they contain in-line JavaScript, as well as references to JavaScript contained in separate files

    1. exportdata.html - this file contains JavaScript only; it includes a JavaScript script that contains a single string variable holding a text representation of export_request_form.html

    2. export_request_form.html - this file contains the definitions of the HTML elements that compose the export web page

  2. the export JavaScript files are:

    1. export_request_form.htmlesc - this is a version of export_request_form.html that has been converted into a single JavaScript string variable, where whitespace has been removed and characters have been percent-escaped if necessary; exportdata.html 'includes' this file as a JavaScript script

    2. export_email_verify.js - this file contains code that makes HTTP requests that access the email-registration system

    3. no_processing.js - this file contains a single JavaScript array variable that lists all DRMS data series for which export processing is prohibited

    4. processing.js - this file contains code that makes HTTP requests that cause export processing to occur during the export process

    5. protocols.js - this file contains code that makes HTTP requests that get image-protocol (export file type) parameters

Browser/Network Tool

The export-system root URL is http://solarweb2.stanford.edu:8080/export. Several endpoints are provided to support various export-system requests. The endpoints, which expect arguments to be provided in a single JSON-string object, are (square brackets denote optional arguments, JSON data types are in parentheses):

  1. http://solarweb2.stanford.edu:8080/export/address-registration this endpoint provides access to services that check the registration status of an email address, and register a new email address; arguments:

    1. address (str) the email address to check on/register

    2. [ db-host (str) ] the INTERNAL/PRIVATE database server that hosts the registered export-system user address information

    3. [ db-name (str) ] the name of the database that contains email address and user information

    4. [ db-port (number) ] the port on the database host machine accepting connections

    5. [ db-user (str) ] the name of the database user account to use

    6. [ user-name (str) ] the full name of the export-system user

    7. [ user-snail (str) ] the physical address of the export-system user

  2. http://solarweb2.stanford.edu:8080/export/series-server this endpoint provides access to services that provide information about DRMS data series; arguments:

    1. public-db-host (str) the EXTERNAL/PUBLIC database server that hosts the DRMS data-series data

    2. series (array) the set of DRMS data series for which information is to be obtained

    3. [ client-type (str)] the securedrms client type (ssh, http)

    4. [ db-name (str)] the name of the database that contains DRMS data-series information

    5. [ db-port (number) ] the port on the database host machine accepting connections

    6. [ db-user (str)] the name of the database user account to use

  3. http://solarweb2.stanford.edu:8080/export/record-set this endpoint provides access to services that provide keyword, segment, and link information about DRMS record sets; arguments:

    1. specification (str) the DRMS record-set specification identifying the records for which information is to be obtained

    2. db-host (str) the database server that hosts the DRMS record-set data

    3. [ parse-only (bool)] if true, then parse record-set string only

    4. [ client-type (str)] the securedrms client type (ssh, http)

    5. [ keywords (array) ] the list of keywords for which information is to be obtained

    6. [ segments (array)] the list of segments for which information is to be obtained

    7. [ links (array)] the list of links for which information is to be obtained

    8. [ db-name (str)] the name of the database that contains DRMS record-set information

    9. [ db-port (number) ] the port on the database host machine accepting connections

    10. [ db-user (str) ] the name of the database user account to use

  4. http://solarweb2.stanford.edu:8080/export/series this endpoint provides access to services that provide information about DRMS data series; arguments:

    1. series (str) the DRMS series for which information is to be obtained

    2. db-host (str) the database server that hosts the DRMS data-series information

    3. [ client-type (str)] the securedrms client type (ssh, http)

    4. [ db-name (str)] the name of the database that contains DRMS data-series information

    5. [ db-port (number)] the port on the database host machine accepting connections

    6. [ db-user (str)] the name of the database user account to use

  5. http://solarweb2.stanford.edu:8080/export/new-premium-request this endpoint provides access to services that export DRMS data-series data; the full suite of export options is available; arguments:

    1. address (str) the email address registered for export

    2. db-host (str) the database server that hosts the DRMS data series

    3. export-arguments (json str) the export-request arguments

    4. [ client-type (str)] the securedrms client type (ssh, http)

    5. [ db-name (str)] the name of the database that contains DRMS data-series information

    6. [ db-port (number)] the port on the database host machine accepting connections

    7. [ requestor (str)] the full name of the export-system user

    8. [ db-user (str)] the name of the database user account to use

  6. http://solarweb2.stanford.edu:8080/export/new-mini-request this endpoint provides access to services that export DRMS data-series data; a reduced suite of export options is available to allow for quicker payload delivery; arguments:

    1. address (str) the email address registered for export

    2. db-host (str) the database server that hosts the DRMS data series

    3. export-arguments (json str) the export-request arguments

    4. [ client-type (str)] the securedrms client type (ssh, http)

    5. [ db-name (str)] the name of the database that contains DRMS data-series information

    6. [ db-port (number)] the port on the database host machine accepting connections

    7. [ requestor (str)] the full name of the export-system user

    8. [ db-user (str)] the name of the database user account to use

  7. http://solarweb2.stanford.edu:8080/export/new-streamed-request this endpoint provides access to services that stream exported DRMS data-series data; a reduced suite of export options is available to allow for quicker payload delivery; arguments:

    1. address (str) the email address registered for export

    2. db-host (str) the database server that hosts the DRMS data series

    3. export-arguments (str) the export-request arguments

    4. [ client-type (str)] the securedrms client type (ssh, http)

    5. [ db-name (str)] the name of the database that contains DRMS data-series information

    6. [ db-port (number)] the port on the database host machine accepting connections

    7. [ requestor (str)] the full name of the export-system user

    8. [ db-user (str)] the name of the database user account to use

  8. http://solarweb2.stanford.edu:8080/export/pending-request this endpoint provides access to services that check for the presence of pending requests; arguments:

    1. address (str) the email address registered for export

    2. db-host (str) the database server that hosts the DRMS data series

    3. [ db-name (str)] the name of the database that contains pending-request information

    4. [ db-port (number)] the port on the database host machine accepting connections

    5. [ requestor (str)] the full name of the export-system user

    6. [ db-user (str)] the name of the database user account to use

    7. [ pending_requests_table (str)] the database table of pending requests

    8. [ timeout (number)] after this many minutes have elapsed, a request is no longer considered pending

  9. http://solarweb2.stanford.edu:8080/export/pending-request-status this endpoint provides access to services that return the export status of a pending request; arguments:

    1. address (str) the email address registered for export

    2. db-host (str) the database server that hosts export-request information

    3. request-id (str) the export system request ID

    4. [ client-type (str)] the securedrms client type (ssh, http)

    5. [ db-name (str)] the name of the database that contains export-request information

    6. [ db-port (number)] the port on the database host machine accepting connections

    7. [ db-user (str)] the name of the database user account to use

    8. [ pending_requests_table (str)] the database table of pending requests

    9. [ timeout (number)] after this many minutes have elapsed, a request is no longer considered pending
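
For reference, here is a minimal sketch of querying two of the endpoints above with curl. The db-host value (hmidb) and the record-set specification are hypothetical placeholders; substitute the database server and record set appropriate for your site, and note that array-valued arguments (keywords, segments, links) may need to be JSON-encoded.

$ # ask the record-set endpoint for information about one day of hmi.M_45s records
$ curl 'http://solarweb2.stanford.edu:8080/export/record-set?specification=hmi.M_45s%5B2023.10.01%5D&db-host=hmidb'
$ # ask the series endpoint for information about the hmi.M_45s series
$ curl 'http://solarweb2.stanford.edu:8080/export/series?series=hmi.M_45s&db-host=hmidb'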

Python Environment

DRMS Export uses the securedrms python package, which is included in /base/export/libs/py. If you have not already installed it, do so now. As netdrms_production, cd to <NetDRMS root>/base/export/libs/py and edit securedrms.py to configure the package. Please consult the documentation in the SecureServerConfig class definition in the python module, and then edit the arguments to the SecureServerConfig constructor. In particular, you will need to set ssh_remote_user and ssh_remote_host to a linux account and machine that allow remote access to the NetDRMS binaries (such as show_info, jsoc_info, etc.). You might need to run ssh-keygen on the webserver host and append the public key to the ssh_remote_user@ssh_remote_host:$HOME/.ssh/authorized_keys file so that ssh access does not prompt for a password (see the sketch below).
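
As a minimal sketch, assuming the web server runs as netdrms_production and that drms-host.example.edu stands in for your ssh_remote_host (a hypothetical name), the key setup might look like this:

$ whoami
netdrms_production
$ ssh-keygen -t rsa
$ # copy the public key to the remote account's authorized_keys file
$ ssh-copy-id netdrms_production@drms-host.example.edu
$ # verify password-less access (assumes show_info is on the remote PATH)
$ ssh netdrms_production@drms-host.example.edu which show_info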

Once you have configured securedrms.py, install it:

$ whoami
netdrms_production
$ conda activate netdrms
$ which pip
<NetDRMS production user home dir>/.conda/envs/netdrms/bin/pip
$ cd <NetDRMS root>/base/export/libs/py
$ pip install -e .

The -e flag allows edits to securedrms.py to take effect without having to re-install it.

You will also need to install flask, flask_restful, webargs, and uwsgi:

$ whoami
netdrms_production
$ conda activate netdrms
$ which pip
<NetDRMS production user home dir>/.conda/envs/netdrms/bin/pip
$ conda install -c conda-forge -n netdrms flask==2.0
$ conda install -c conda-forge -n netdrms flask-restful
$ conda install -c conda-forge -n netdrms webargs
$ conda install -c conda-forge -n netdrms uwsgi
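
To verify that the packages landed in the netdrms environment, a quick import check like the following can be run (a sketch; the version printed depends on your installation, and should match the flask version pinned above):

$ conda activate netdrms
$ python -c "import flask, flask_restful, webargs; print(flask.__version__)"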

The async dependency might be needed, depending on future drms-export features.
