Diff for "SetupForSlonyData"

Differences between revisions 24 and 25

Becoming a log-shipping receiver for a Slony cluster

This wiki page is intended to instruct remote DRMS users of our exported Slony data clusters. If you're reading this, we assume you want to get a dynamic set of our data and regularly pull updates from us. We assume you have system resources set aside, a postgres database set up, and a copy of the JSOC DRMS code in place. A basic understanding of SSH/SCP will be useful. Pulling Slony logs requires you to have certain trusted privileges on our system via ssh keys, so we want to know who you are. If we don't know you yet, or this is your first time setting up a remote DRMS, please contact either Rick Bogart or Jennifer Spencer at Stanford (see the ["Contacts"] page). BR BR For more information, please read: ["SlonyI"], and [http://jsoc.stanford.edu/netdrms/]. Please note there is more to becoming a remote DRMS than pulling Slony logs. Please read the NetDRMS documentation (link above) for more information if needed.

The following document assumes you have already set up scp/ssh to and from solarport.stanford.edu. If you cannot do so, you will need to contact us before proceeding. To regularly pull Slony logs will require that you have a key on our system and we prefer you use HPN-SSH, as it is more efficient than regular SSH/SCP (it can be configured to encrypts the password and command exchange and not the data stream).

Each Slony cluster represents a discrete set of JSOC data. Pulling logs and data must be done by cluster. It is our intention to have several clusters, each tailored to suit as many users as possible. A Slony cluster is not divisible; we can only replicate our data per cluster, not per user, so you may expect to subscribe to more than one cluster at a time. These instructions are for working with one cluster, and are meant to be extrapolated for use with many clusters.

The official available cluster list:

Cluster name	Namespace.Seriesname
mdi_mag	mdi.fd_m_96m_lev18

Our Slony logs are presently kept at /scr21/jennifer/slony_logs/ under subdirectories by cluster name. The initial JSD and slony1_dump.sh output needed to setup your cluster are found in those subdirectories as well.

Get a Cluster

Go get the JSD file(s) for the cluster.

scp username@solarport.stanford.edu:/scr21/jennifer/slony_logs/mdi_mag/mdi.fd_M_96m_lev18.jsd .

Then if you do not have a database user "slony", you'll need to create one. Check and see if you have one in postgres: {{{psql mydatabase... Welcome to psql 8.x.x, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms

\h for help with SQL commands \? for help with psql commands \g or terminate with semicolon to execute query \q to quit

jsoc=# \du [to describe users in your database]

List of roles

Role name | Superuser | Create role | Create DB | Connections | Member of

apache | no | no | no | no limit | {jsoc_reader} arta | no | no | yes | no limit | {dsdsdata,jsoc,sdodata} ash | no | no | no | no limit | {}

...

slony | yes | yes | yes | no limit | {}

Will appear as above if you have a user slony in your system. Otherwise, you'll need to add one.

\q}}}

To add the 'slony' user, go into your postgres database as user 'postgres' (this document assumes you have the required password). Then create your slony admin-level user. This is the user that will handle running all your slon processes should you ever change from Slony log-shipping mode to "classic" mode, and you do not need to have a Linux/Unix login for this user. create user slony with admin postgres password 'slonypassword' where 'slonypassword' is a unique password for your system. Then make sure that the password for slony is in user postgres' .pgpass file if you are using md5 security. If user postgres does not have a .pgpass file, you'll need to make one in the postgres home directory. For example:

cat ~/.pgpass
postgreshostname:*:*:slony:slonypassword

Don't forget to make sure the permissions are correct for this file: chmod 600 ~/.pgpass

Check your environment variables:

echo $JSOC_DBHOST
echo $JSOC_DBNAME
echo $JSOC_DBUSER

You should get the correct hostname for the database machine, your local JSOC database name, and the user 'slony'.  If not, set these variables for this session:

$ export JSOC_DBHOST=hostname.stanford.edu
$ export JSOC_DBNAME=myjsoc_db
$ export JSOC_DBUSER=slony

(setenv for you cshell users - the above is for BASH)

You will need to create any namespaces in your database referenced by this cluster. In this example, you will need to create an 'mdi' namespace if you do not have one. BR

If you need a namespace similar to an existing reserved namespace, for example to identify analysis or repackaged products based on data from various sources, we recommend that you create one from your institutional identifier and the source identifier, e.g. sha_gong, cu_mdi, mps_hmi, etc. BR Provenance of data is critical to a successful science mission. We must all work together to maintain the integrity of our data namespaces so that data provenance remains unequivocally clear. The JSOC teams reserves the following namespaces, and explicitly forbids remote DRMS users adding any new data to these spaces; only replicated data received from the JSOC is allowed in a JSOC-reserved namespace. To put it in a more general way, any data series in a DRMS with one of these reserved namespaces must have been generated at or replicated from the owning site. We expect that remote DRMS sites will have a similar policy about their own namespace designations, and we encourage them to document accordingly.

Reserved Namespaces

Please see [:DrmsSeriesNames#namespaces] for a complete list of reserved namespaces.

Create your namespace using the DRMS program masterlists:

masterlists dbuser=slony namespace=mdi nsgrp=user

Edit the header of the JSD as needed to reflect your particular setup. Here are some of the header fields you may need to change.

Owner	May be set to any individual Postgres user on your system, but should be user slony if mirror updates are to be handled by slony processing. N.B.: This field in the JSD file is ignored by the create_series program; the owner of the series will be the user running the program, unless changed by the Postgres administrator.
Retention	Default retention time in days for data segments on SUMS disk cache for records in the series; data segments will be flushed no sooner than this many days after having been staged. Note: this value, and the others relating to SUMS segments and archiving, is irrelevant if there are no data segments associated with the series, i.e. if all the data are in the DRMS database.
Archive	Set to 1 only if you have a tape archiving system for SUMS and wish to archive the data segments in the series
Tapegroup	only applicable for SUMS with tape archive capability; controls which tapes data segments from the series are archived to.

Create the series in the cluster as user slony. That way, database user slony will own the series and they can remain read-only to all other local users.

Use the DRMS create_series module. {{{create_series -f mdi.fd_M_96m_lev18.jsd JSOC_DBUSER=slony

Creating new series 'test1.fd_M_96m_lev18'... NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "fd_m_96m_lev18_pkey" for table "fd_m_96m_lev18" }}}

Now install the slony cluster

Go get the cluster setup data. scp username@solarport.stanford.edu:/scr21/jennifer/slony_logs/mdi_mag/mdi_mag_slony_setup.sql .

Ingest this data in psql

psql -Uslony myjsoc_db -f mdi_mag_slony_setup.sql You should some output from this, but not a whole lot. This script will install two tables and three functions as detailed in SlonyDumpFuncs, and it will copy over the data in the cluster into your new series.

To see if it worked, you can enter postgres and check for new tables and functions: {{{psql myjsoc_db -U slony \dt _mdi_mag. (should return two tables, sl_archive_tracking and sl_sequence_offline)

select count(*) from mdi.fd_m_96m_lev18; should return over 65,000 records. }}}

Get ready to start pulling Slony logs

Make sure you have the get_slony_logs.pl file in your DRMS code JSOC/scripts directory. If not, you may need to update your code. Edit that file's variables to be sure it accurately reflects your environment. BR Designate a directory where your acquired Slony logs will be stored, with a cluster subdirectory, e.g. /home/slony/slony_logs/mdi_mag (where mdi_mag is the cluster). BR The first time you run get_slony_logs.pl, it will create a file called slony_counter.txt (e.g. /home/slony/logs/mdi_mag/slony_counter.txt ). The file contains a number, and that number is the current slony counter for your data set. If you want to double-check the value of the counter.txt file, you can run:

psql -Uslony myjsoc_db -c 'select at_counter from _mdi_mag.sl_archive_tracking;'
This should return a number, e.g. 872 or 100043.  This number is the current log number for this cluster, and should be the number in slony_counter.txt.

We do not recommend editing the slony_counter.txt file. The get_slony_logs.pl script will manage the value in the counter file, and if there is an error, it will double-check with the database for the correct number.

Get your logs with a cron

Edit the crontab and add */1 * * * * (. ~/.ssh-agent_rs;/home/slony/Scripts/get_slony_logs.pl mdi_mag >> /home/slony/Scripts/get_slony_logs.mdi_mag.log 2>&1 )

Please enter any additional tips and help to other users here in this wiki.

-  ⇤ ← Revision 24 as of 2009-08-13 05:16:54 → 
  Size: 10210
  Editor: equinox
  Comment:
+   ← Revision 25 as of 2009-08-13 05:28:20 → ⇥
  Size: 10216
  Editor: equinox
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 84:
- * Edit the header of the JSD as needed to reflect your particular setup.  Here are some of the header fields you may need to change.
+ * '''Edit the header of the JSD as needed to reflect your particular setup.'''  Here are some of the header fields you may need to change.