Becoming a log-shipping receiver for a Slony cluster

Our Slony logs are presently kept on our solarport machine at /data/pgsql/slon_logs under subdirectories by site subscriber name. If you wish to receive our data, you'll need to become a subscriber.

This wiki page is intended to instruct remote DRMS users about how to receive and subscribe to our exported Slony data. If you're reading this, we assume you want to get a dynamic set of our data and regularly pull updates from us. We assume you have system resources set aside, a postgres database set up, and a copy of the JSOC DRMS code in place. A basic understanding of SSH/SCP will be useful. Pulling Slony logs requires you to have certain trusted privileges on our system via ssh keys, so we want to know who you are. If we don't know you yet, or this is your first time setting up a remote DRMS, please contact either Rick Bogart or Jennifer Spencer at Stanford (see the Contacts page).

Provenance of data is critical to a successful science mission. We must all work together to maintain the integrity of our data namespaces so that data provenance remains unequivocally clear. Namespace is another word for "schema" inside a Postgres database, and our subscription scripts will create requested namespaces in your database when you replicate from us. A popular SDO-reserved namespace is "mdi", for example. If you need to use a namespace similar to an existing reserved namespace, for example to identify analysis or repackaged products based on data from various sources, we recommend that you create one from your institutional identifier and the source identifier, e.g. nso_mdi, sao_mdi, sha_gong, cu_mdi, mps_hmi, etc. The JSOC team reserves the following namespaces, and explicitly forbids remote DRMS users from adding any new data to these spaces; only replicated data received from the JSOC is allowed in a JSOC-reserved namespace. More generally, any data series in a DRMS with one of these reserved namespaces must have been generated at or replicated from the owning site. We expect that remote DRMS sites will have a similar policy about their own namespace designations, and we encourage them to document accordingly. Please see DrmsSeriesNames#namespaces for a complete list of reserved namespaces.

For more information about Slony, please read: SlonyI, and http://jsoc.stanford.edu/netdrms/, and http://www.slony.info. Please note there is more to becoming a remote DRMS than pulling Slony logs. Please read the NetDRMS documentation (link below) for more information as needed. For more on setting up a remote SUMS, please see: http://jsoc.stanford.edu/netdrms/rmtsums.html. To learn more about SSH keys, please see SSHKeyNotes.

The following document assumes you have already successfully set up and tested scp/ssh to and from solarport.stanford.edu. If you cannot do so, you will need to contact us before proceeding. Regularly pulling Slony logs requires that you have a key on our system, and we recommend that you use HPN-SSH, as it is more efficient than regular SSH/SCP (it can be configured to encrypt the password and command exchange but not the bulk data stream).
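As an illustration of that last point, HPN-SSH builds support a "none" cipher that leaves the bulk data stream unencrypted after authentication. The fragment below is a sketch of what a client-side configuration might look like; the option names are from the HPN-SSH patches and the port is taken from the instructions below, but plain OpenSSH will reject these options, so treat this as an assumption about your build:

```
# ~/.ssh/config fragment -- assumes an HPN-SSH client build; stock OpenSSH
# does not recognize the None* options
Host solarport.stanford.edu
    Port 55000
    NoneEnabled yes    # permit the unencrypted "none" cipher for bulk data
    NoneSwitch yes     # switch to it after authentication completes
```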


Subscribe Series

To receive different data series from the JSOC, you'll need to run the script 'subscribe_series'. Below are all the prerequisites and instructions to do that. Obviously, you'll need the subscription code; downloading it is covered in the installation instructions below.

The subscribe_series component of NetDRMS, run with a single configuration file, allows for a subscriber to create or modify its subscription with the webdb host. Webdb is a general name for a machine here at the Stanford SDO group. That machine is a Slony replica of our primary database, and webdb also serves public requests for export data via our web interface (http://jsoc.stanford.edu/ajax/lookdata.html). Slony logs from the webdb replica are available in real time for pickup on the main SDO portal "solarport".

Architecture

Subscribe_series goes through the following steps:

  1. Extracts variables from the subscriber-modified configuration file (the sample is named subscribe_series.cfg.sample).
  2. Checks to see if the to-be-replicated schema and database exist on the subscriber's database.
  3. Stops the log ingestion script, if one is running.
  4. Creates a trigger file to alert SDO's webdb box of a subscription request.
  5. Checks to see if the schema and tables already exist on the subscriber.
    1. Appends the result to the trigger file.
  6. Copies the trigger file to the SDO webdb machine and awaits a response.
  7. When a return trigger file on the SDO webdb box is found, secure-copies over the sql file generated by the subscription manager on the SDO webdb box.
  8. Executes the sql file on the subscriber and gets a success or failure return value.
  9. Creates a trigger file on the SDO webdb box containing a success or failure on sql application and waits for final notification of subscription completion.
  10. Finds the subscription completion trigger file on SDO webdb and restarts (or initially starts) the ingestion script on the subscriber.
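The steps above can be sketched from the subscriber's side as a trigger-file handshake. The script below is an illustrative simulation under stated assumptions: TRIGGER_DIR stands in for the trigger directory on the webdb box, the file names are hypothetical, and in the real flow the files move via scp rather than a local directory:

```shell
#!/bin/sh
# Sketch of the subscriber-side handshake (steps 4, 6, 7); all paths and
# file names are stand-ins, not the actual protocol file names.
TRIGGER_DIR=$(mktemp -d)
NODE=nso

# Steps 4 and 6: write the request trigger into the (simulated) trigger dir.
echo "node $NODE" > "$TRIGGER_DIR/$NODE.subscribe_series.tgr"

# Simulate webdb answering; normally subscription_manager produces this.
touch "$TRIGGER_DIR/$NODE.response.tgr"

# Step 7: poll for the response trigger, a fixed number of attempts apart.
found=no
attempts=5
while [ "$attempts" -gt 0 ]; do
    if [ -f "$TRIGGER_DIR/$NODE.response.tgr" ]; then
        found=yes
        break
    fi
    attempts=$((attempts - 1))
    sleep 1
done
echo "response found: $found"
rm -rf "$TRIGGER_DIR"
```

The polling-with-attempts shape mirrors the `attempts` setting in the configuration file, which bounds how long the real script waits for webdb to respond.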

Pre-requisites

The subscriber needs the following:

Installation

  1. Check out $JSOC_ROOT/base from cvs
  2. Edit the subscribe_series config file:
    • a) cd to $JSOC_ROOT/base/drms/scripts/subscribe_series/etc

      b) copy subscribe_series.cfg.sample to <site_name>.subscribe_series.cfg, where <site_name> is your lower-case site name
      c) Edit the new <site_name>.subscribe_series.cfg file as follows:

      • set pg_host to either localhost or the IP address of your local database server
      • set pg_port to the port of your local db (default = 5432)
      • set pg_user to slony
      • set pg_dbname to the local database name (e.g. nso_drms)
      • set pg_schema to public
      • set ingestion_path to the full path of the directory where the get_slony_logs.pl script lives (i.e. $JSOC_ROOT/base/drms/scripts)
      • set node to the site_name (the same site_name used in renaming subscribe_series.cfg.sample, e.g. nso)
      d) Edit the new <site_name>.subscribe_list.cfg file as follows:

      • add a list of entries for the series names to subscribe to. Each entry has two parts: first <schema>.<series>, and second either "subscribe" or "unsubscribe". NOTE: you can add multiple series names (one per line).

      • Example: su_arta.lev0 subscribe
  3. Create the following directories, where <local_storage> is the full path of top directory where you can store arriving slony logs:

    • <local_storage>/log

    • <local_storage>/slon_logs

    • <local_storage>/slon_logs/<site_name> using the same site name as in the node parameter in the <site_name>.subscribe_series.cfg file

  4. Execute subscribe_series
    • cd $JSOC_ROOT/base/drms/scripts/subscribe_series
    • ./subscribe_series ../etc/<site_name>.subscribe_series.cfg ../etc/<site_name>.subscribe_list.cfg <ssh_agent_rs file> (Note: on most systems the ssh_agent_rs file is likely ~/.ssh-agent_rs)

    • validate that no errors occurred (via output text at the command line)
  5. Setup get_slony_logs.pl
    • a) set up the get_slony_logs.pl hard-coded values -- make sure you do NOT leave any spaces between your quotes.
      • cd $JSOC_ROOT/base/drms/scripts
      • Edit get_slony_logs.pl as follows (in the “#### CONFIGURATION variables” section of the script):
      • set my $rmt_hostname="solarport.stanford.edu";
      • set my $rmt_port="55000"; (Note rmt_port is the port for SCP, not for database, and HPN-SCP is running on SDO port 55000)
      • set my $user="jsocexp";
      • set my $rmt_slony_dir="/data/pgsql/slon_logs/site_logs" (SDO log pickup site);
      • set my $slony_logs="<local_storage>/slon_logs"; (your local log storage area, using the full path)

      • set my $PSQL="/usr/bin/psql -Uslony -p 5432 <dbname>"; where <dbname> is the name of your local database, and -p indicates your postgres port.

      • set my $email_list='<names>'; where <names> is a list of email addresses to be alerted in case of errors (note: this can be set to empty single quotes if no alerts are desired)

      b) create the slony_counter.txt file
      • run this query in your local db (as user slony or another superuser):
        • "select at_counter from _jsoc.sl_archive_tracking;" -- you should get an integer value back (e.g. 756)
        • create the needed slony_counter.txt file as follows:
          • echo "<ID>" > $JSOC_ROOT/base/drms/scripts/slon_logs/<site_name>, where <ID> is the at_counter value from the above query (e.g. 756) and <site_name> is the same site name as in the node parameter in the <site_name>.subscribe_series.cfg file

      c) verify that a manual run of get_slony_logs.pl works; execute get_slony_logs.pl at the command line
      d) set up the get_slony_logs.pl cron entry. For example:

{{{
*/1 * * * * (. ~/.ssh-agent_rs; /home/slony/Scripts/get_slony_logs.pl myconfigfile.cfg >> /home/slony/Scripts/get_slony_logs.jsoc.log 2>&1 )
}}}
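Step 5b above (seeding the counter file) can be collapsed into a few commands. This is an illustrative sketch, not the exact procedure: LOCAL_STORAGE and the site name are assumptions, the psql query needs a live database so it is shown commented out with a stand-in value, and the counter file must end up wherever your get_slony_logs.pl configuration expects it:

```shell
#!/bin/sh
# Sketch of seeding slony_counter.txt from the current Slony archive counter.
LOCAL_STORAGE=$(mktemp -d)   # stands in for your <local_storage> path
SITE=nso                     # same site name as the node parameter
mkdir -p "$LOCAL_STORAGE/slon_logs/$SITE"

# Real query (requires a live database):
#   COUNTER=$(psql -Uslony -p 5432 -t -A -c \
#       "select at_counter from _jsoc.sl_archive_tracking;" <dbname>)
COUNTER=756   # stand-in for the at_counter value

echo "$COUNTER" > "$LOCAL_STORAGE/slon_logs/$SITE/slony_counter.txt"
cat "$LOCAL_STORAGE/slon_logs/$SITE/slony_counter.txt"
```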

Initialization / Configuration

1) The subscribe_series script requires a configuration file and a subscribe list file:

{{{
# Subscribe series file name <Subscription_node_id>.subscribe_series.cfg
# Required connection information for communication between localhost and the subscription server
# webdb_ip # target IP address or hostname (e.g. solarport)
webdb_ip 192.168.0.88
# webdb_dir # target dir
webdb_dir /stage/dev/triggers/
# webdb_user # user to connect to the target ip
webdb_user postgres
# pg_host # host of the subscriber's database
pg_host localhost
# pg_port # port of the subscriber's database
pg_port 5432
# pg_user # user of the subscriber's database
pg_user slony
# pg_dbname # name of the subscriber's database
pg_dbname postgres
# pg_schema # name of the schema the data will live in
pg_schema public
# These three configuration settings are related to the setup of the specific data series' "jsd" series definition.
# archive (archive to tape when retention is up? 0=no, 1=yes -- set to 1 only if you have a tape archiving system for SUMS and wish to archive the data segments in the series)
archive 0
# retention (time in days your SUMS data files live on your server -- see below)
# Default retention time in days for data segments on SUMS disk cache for records in the series; data segments will be flushed no sooner than this many days after having been staged. Note: this value, and the others relating to SUMS segments and archiving, is irrelevant if there are no data segments associated with the series, i.e. if all the data are in the DRMS database.
retention 21
# tapegroup (only applicable for SUMS with tape archive capability; controls which tapes data segments from the series are archived to. Leave this line as is if you don't have a tape group set up.)
# tapegroup 1
# attempts # number of attempts the script checks webdb_ip for notifications. This number may need to be very large if the data set you are waiting to subscribe to is very large. Each attempt is 5 seconds apart.
attempts 50
# ingestion_path # the path to put the ingestion die file to stop the ingestion script.
ingestion_path /home/production/subscribe_series/bin/
# node # node name or id
node node1
}}}

{{{
# schemaname.seriesname subscribe/unsubscribe (if this is a new site, assumes these are "subscribe")
public.admin subscribe
}}}

Subscription_manager

Subscription_manager is a daemon that runs on SDO's webdb machine, constantly checks for certain trigger files, and executes other scripts depending on those files. It executes three other scripts during the subscription process: subscription_update, sql_gen, and subscription_cleanup.

Architecture

Subscription_manager loops continuously looking for two types of trigger files.

  1. Upon startup:
    • subscription_manager creates a subscription_manager.pid file with its process ID, and compares the process ID in the subscription_manager.pid file to its current process ID. The pid file is meant to prevent two subscription_managers running at once, and to let a cron watchdog ensure that it is still running.
    • The program loops every 10 seconds.
    • At the start of every loop, the program looks for a subscription_manager.die file.
    • A .die file indicates that the subscription_manager should exit immediately and not run.
    • Subscription_manager also creates a general log file in a ../log dir parallel to its run location.
  2. Finds a <node name>.subscription_series.tgr file:

    • a) Creates a new configuration file ($subupdatefile) after extracting the following variables:
      • $node
      • $archive
      • $retention
      • $tapegroup
      b) Checks to see if the $node already exists or is a new subscriber:
      • Sets $new_site to either true or false
      • Writes "new_node (true/false)" to the trigger file for future extraction.

      c) Executes ./subscription_update $node $new_site $archive $retention $tapegroup $subupdatefile &

  3. Finds a <node name>.subscription_series.sqlapplied file:

    • a) Creates a new configuration file ($appliedfile) after extracting the following variables:
      • $node
      • $success (refers to whether or not the subscriber applied the sql files successfully)
      • $new_site

      b) Executes ./subscription_cleanup $node $new_site $success $appliedfile &
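The loop described above can be sketched in a few lines of shell. This is an illustrative simulation under stated assumptions: the paths are stand-ins, the real daemon sleeps 10 seconds per pass, and the trigger hand-off to subscription_update is reduced to an echo:

```shell
#!/bin/sh
# Sketch of the subscription_manager loop: pid-file guard, .die-file check,
# then a scan for trigger files. All paths here are simulated stand-ins.
RUN_DIR=$(mktemp -d)
PIDFILE="$RUN_DIR/subscription_manager.pid"

# Pid-file guard: refuse to start if another live instance holds the file.
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "already running"
    exit 1
fi
echo $$ > "$PIDFILE"

touch "$RUN_DIR/node1.subscription_series.tgr"   # simulate one request

passes=0
while :; do
    # A .die file means exit immediately.
    [ -f "$RUN_DIR/subscription_manager.die" ] && break
    for tgr in "$RUN_DIR"/*.subscription_series.tgr; do
        [ -f "$tgr" ] || continue
        echo "found trigger: $(basename "$tgr")"
        rm -f "$tgr"    # real code extracts $node etc. and forks a handler
    done
    passes=$((passes + 1))
    touch "$RUN_DIR/subscription_manager.die"    # simulated shutdown request
    # sleep 10   # real loop sleeps between passes
done
rm -f "$PIDFILE"
```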

Subscription_update

Subscription_update is executed by subscription_manager when an initial subscription request is found.

  1. Executed by ./subscription_update $node $new_site $archive $retention $tapegroup $subupdatefile &

  2. Creates a directory named <node name>.new in the subscribers directory.

    • a) If new_site is false, copies the current directory <node> to <node>.new
      b) If new_site is true, just creates a new, empty directory <node>.new

  3. If new_site is true, adds all subscriptions in the $subupdatefile to the new usertables file
  4. If new_site is false, does two things:
    • a) Removes subscriptions with the tag "unsubscribe" from the user table list
      b) Adds all subscriptions with the tag "subscribe" to the end of the user tables list
      • Checks to see if the subscription already exists; if so, does nothing.
  5. Adds the $node.new directory entry to the slony_parser.cfg file
  6. Executes ./sql_gen $node $new_site $archive $retention $tapegroup $input_file &
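The merge in steps 3 and 4 can be sketched as a couple of passes over the request file. This is an illustrative sketch, not the daemon's actual code; the file contents and paths are stand-ins:

```shell
#!/bin/sh
# Sketch of merging "subscribe"/"unsubscribe" requests into a usertables
# list. Contents are invented examples; real files live per-node.
WORK=$(mktemp -d)
printf 'su_arta.lev0\npublic.admin\n' > "$WORK/usertables"
printf 'public.admin unsubscribe\nsu_arta.lev1 subscribe\n' > "$WORK/requests"

# a) drop entries tagged "unsubscribe" from the user table list
while read -r series action; do
    if [ "$action" = unsubscribe ]; then
        grep -vxF "$series" "$WORK/usertables" > "$WORK/usertables.tmp"
        mv "$WORK/usertables.tmp" "$WORK/usertables"
    fi
done < "$WORK/requests"

# b) append entries tagged "subscribe" that are not already present
while read -r series action; do
    if [ "$action" = subscribe ] && ! grep -qxF "$series" "$WORK/usertables"; then
        echo "$series" >> "$WORK/usertables"
    fi
done < "$WORK/requests"

cat "$WORK/usertables"
```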

sql_gen

Sql_gen is executed by subscription_update after it has completed updating the subscription tables.

  1. Executed by ./sql_gen $node $new_site $archive $retention $tapegroup $input_file &

  2. Removes any "unsubscribe" requests from the subscription list file.
  3. If there are no NEW subscriptions, skips steps 4, 5, and 6.
  4. For each <schema> requested, executes ./createns ns=<schema> nsgroup=user dbusr=<pg user> >> <trigger dir>/<node>.createns.sql

  5. For each <schema>.<series>, executes ./createtabstructure in=<schema>.<table> out=<schema>.<table> archive=<archive> retention=<retention> tapegroup=<tapegroup> owner=slony >> <trigger dir>/<node>.subscribe_series.sql

  6. Executes ./sdo_slony1_dump.sh <pg dbname> jsoc <pg port> <new site (1=true, 0=false)> <all tables requested> >> <trigger dir>/<node>.subscribe_series.sql 2> /dev/null

  7. Creates a trigger file for subscribers to retrieve the sql file
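The assembly order in steps 4-6 can be made concrete with stubs. In this sketch the real tools (createns, createtabstructure, sdo_slony1_dump.sh) are replaced by hypothetical shell functions so only the concatenation into the output files is shown; series names and arguments are invented examples:

```shell
#!/bin/sh
# Sketch of how sql_gen assembles the subscriber's sql files (steps 4-6).
TRIGGER_DIR=$(mktemp -d)
NODE=node1

createns()           { echo "-- schema DDL for: $*"; }   # stub
createtabstructure() { echo "-- table DDL for: $*"; }    # stub
sdo_slony1_dump()    { echo "-- data dump for: $*"; }    # stub

SQL="$TRIGGER_DIR/$NODE.subscribe_series.sql"
# Step 4: schema creation goes to its own <node>.createns.sql file.
createns ns=su_arta nsgroup=user dbusr=slony        >  "$TRIGGER_DIR/$NODE.createns.sql"
# Step 5: table DDL starts the subscribe_series.sql file.
createtabstructure in=su_arta.lev0 out=su_arta.lev0 \
    archive=0 retention=21 tapegroup=1 owner=slony  >  "$SQL"
# Step 6: the data dump is appended to the same file.
sdo_slony1_dump jsocdb jsoc 5432 1 su_arta.lev0     >> "$SQL"

wc -l < "$SQL"
```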

Subscription_cleanup

Subscription_cleanup is executed by subscription_manager when a <node name>.subscription_series.sqlapplied trigger file is found.

  1. Executed by ./subscription_cleanup $node $new_site $success $appliedfile &

  2. Removes the $node.new entry from the slony_parser.cfg file
  3. Depending on the success of the sql application on the subscriber:
    • a) If $success is true and $new_site is true
      • Adds the new subscription listing to slony_parser.cfg
      • Moves the directory and usertable list file from $node.new to just $node
      b) If $success is true and $new_site is false
      • Replaces the old directory and usertable list with the new ones
      c) If $success is false and $new_site is true
      • Removes the new directory and usertable list file; the subscription is completely removed.
      d) If $success is false and $new_site is false
      • Removes the new directory and usertable list file; the previous subscription state remains.
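The four outcomes above reduce to a single decision on $success and $new_site. The sketch below is illustrative only: the directory operations are echoed rather than performed, and the variable values are stand-ins:

```shell
#!/bin/sh
# Sketch of subscription_cleanup's four outcomes as one case statement.
success=true
new_site=false

case "$success,$new_site" in
    true,true)   action="add node to slony_parser.cfg; mv node.new to node" ;;
    true,false)  action="replace old directory and usertable list with new ones" ;;
    false,true)  action="remove node.new; subscription completely removed" ;;
    false,false) action="remove node.new; previous subscription state remains" ;;
esac
echo "$action"
```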

Pre-requisites

Requires Slony to be installed and shipping logs.

Installation

  1. Place the scripts in the desired bin directory
  2. Create the following directories one level up:
    • a) etc
      b) log
      c) triggers
  3. Determine the locations of the subscribers and tables directories for input into the config files

Initialization / Configuration

All scripts on webdb require this single configuration file:

  1. ../etc/subscription_manager.cfg

{{{
# Directory the trigger files appear in from the subscriber
triggerdir=/solarport/pgsql/slon_logs/triggers/
# Directory the subscribers' directories exist in
subscribers_dir=/solarport/pgsql/slon_logs/site_logs/
# File the slon parser uses as input
parser_config=/solarport/pgsql/slon_logs/etc/slon_parser.cfg
# Directory the table files exist in
tables_dir=/solarport/pgsql/slon_logs/etc/
# port of the database
pg_port=5432
# host of the database
pg_host=localhost
# user of the database
pg_user=slony
# name of the database
pg_dbname=jsoc
}}}


From here down requires edit merging... The content below should be superseded by the subscribe_series programs and documentation above.

Make sure you have the get_slony_logs.pl file in your DRMS code scripts directory. If not, you may need to update your code.

{{{
#### CONFIGURATION variables
$Net::SCP::scp="/usr/bin/scp";                           ## path to fast scp instance (HPN-SCP)
my $rmt_hostname="solarport.stanford.edu";               ## remote machine name
my $rmt_port="55000";                                    ## port of the fast scp server on the remote machine
my $user="igor";                                         ## user login on the remote machine
my $rmt_slony_dir="/data/pgsql/slon_logs/site_logs";     ## directory on the remote machine where slony logs get staged (e.g. on solarport)
my $slony_logs="/home/slony/logs";                       ## local slony cluster directory area; wherever you, the remote subscriber, choose to put these logs on arrival
my $PSQL="/usr/bin/psql -Uslony myjsoc_db";              ## psql command (add -h myhostserver if needed)
my $email_list='igor@igorhost.edu';                      ## list of people to email in case a problem occurs
#### END of CONFIGURATION variables
###########################################################################
}}}

Designate a directory where your acquired Slony logs will be stored, with a cluster subdirectory, e.g. /home/slony/slony_logs/mdi_mag (where mdi_mag is the cluster). These logs are not huge files, but there will be a lot of them over time. Slony generates logs at least every two minutes; a log file that documents no change in the system is typically ~700 bytes. Even that, reported every two minutes for weeks, will add up, so choose a good spot for these logs and keep an eye on the growth. Another script, slony_logs_housekeeping.pl, is available to tar.gz your log files in a given directory, one tar.gz file per 5000 logs (the file interval is settable). You may wish to cron the housekeeping script to ensure the logs aren't eating up space too quickly, and delete or archive them eventually.
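To make the growth concrete, a back-of-envelope estimate based on the figures above (one ~700-byte log every two minutes on an idle cluster; 5000 logs per tarball). The numbers are illustrative, and the cron line in the comment is an assumption about the housekeeping script's path and arguments:

```shell
#!/bin/sh
# Back-of-envelope Slony log growth estimate for an idle cluster.
bytes_per_log=700
logs_per_day=$((24 * 60 / 2))                    # one log per 2 minutes
bytes_per_day=$((bytes_per_log * logs_per_day))
logs_per_tarball=5000
days_per_tarball=$((logs_per_tarball / logs_per_day))
echo "logs/day=$logs_per_day bytes/day=$bytes_per_day days/tarball~=$days_per_tarball"
# A weekly housekeeping cron entry might look like this (path is an
# assumption -- check slony_logs_housekeeping.pl for its actual arguments):
#   0 3 * * 0 /home/slony/Scripts/slony_logs_housekeeping.pl
```

So an idle cluster produces roughly half a megabyte of logs per day, and the housekeeping script would roll a tarball about once a week.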

This script assumes you've sourced your .ssh-agent file and will not work properly without doing that first. Typical errors include 'permission denied' or request of a passphrase or password.

The first time you run get_slony_logs.pl, it will create a file called slony_counter.txt (e.g. /home/slony/logs/mdi_mag/slony_counter.txt ). The file contains a number, and that number is the current slony counter for your data set. If you want to double-check the value of the counter.txt file, you can run:

{{{
psql -Uslony myjsoc_db -c 'select at_counter from _mdi_mag.sl_archive_tracking;'
}}}
This should return a number, e.g. 872 or 100043. This number is the current log number for this cluster and should match the number in slony_counter.txt.

We do not recommend editing the slony_counter.txt file. The get_slony_logs.pl script will manage the value in the counter file, and if there is an error, it will double-check with the database for the correct number.

Edit the crontab and add

{{{
*/1 * * * * (. /home/slony/subscribe_series/run_get_slony_logs.sh ~/.ssh-agent_rs >> /home/slony/subscribe_series/log/get_slony_logs.log 2>&1 &)
}}}

Please enter any additional tips and help to other users here in this wiki.

JsocWiki: SetupForSlonyData (last edited 2013-05-01 04:35:24 by localhost)