
Slony-1 Documentation

[http://slony.info/ Slony-I Docs]

A Bit about Slony-1

Slony-1 is a database replication engine used for Postgres databases. The JSOC will be running Slony-1 to handle remote DRMS, and possibly for its web-based data querying.

There are two different ways to implement Slony-1: [:Slony1Classic:"Classic"] and [:Slony1_LogShip:"Log-shipping"] (note these are not technical terms and are used for purposes of this explanation).

JSOC has decided to use Slony-1 in "log shipping" mode. We will have to set up one classic mode cluster to create the logs, but it will contain two nodes only. Slony-1 is designed to function with a minimum of two nodes and a defined cluster between them. It will not function with only one node. There is a rough sketch below outlining our plan.

Recipient only, or Log Shipper too? You decide.

Becoming a recipient of our Slony-1 log shipping is far less work than creating your own logs to ship to the JSOC. If you wish to regularly ship your data to the JSOC, you will have to set up Slony-1 classic on two instances of Postgres, define a cluster, define sets, and generally understand how Slony-1 works. To use logs that you have received, you need only apply them to your Postgres instance with "psql -f", in sequence (a minimal example follows; the full looping script appears further down).
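
As a minimal sketch (the database name, host, and user are placeholders; the log file names follow the pattern shown in the sample listing further down), applying received logs looks like this:

{{{
#!/bin/sh
# Apply received Slony-1 logs strictly in numeric order.
psql -d yourdbname -h yourhostname -U slonyadmin_user -f slony1_log_2_00000000000000000001.sql
psql -d yourdbname -h yourhostname -U slonyadmin_user -f slony1_log_2_00000000000000000002.sql
}}}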

To become a remote DRMS, first decide whether you wish to be only a passive receiver or also a log shipper. JSOC will be a log shipper. To generate logs, you must have already installed Slony-1 fully and designated one node as the master node. Then you must create a slave node to receive the master's replicated changes, as a regular Slony-1 node (above). The words master and slave are used here only to give an idea of which node is changing data. Theoretically, you could have two nodes that each change different sets of data and both generate logs to ship, but this is not the model JSOC is adopting. JSOC will designate one master and one slave.

If a remote system (a "replica" or "remote DRMS") wants only to read (not write) these logs, it must contact us, and two basic scripts must be run.

The first script, slony1_dump.sh, is run here on the JSOC database "slave" node. Its input includes the data clusters requested by the remote DRMS site and the slave node's database name, and it must be run as a Postgres administrative user; remote users will therefore need the assistance of the database administrator here, who will run slony1_dump.sh for them. The script writes SQL commands to standard output, which can be redirected to a file (e.g. input_to_newRemoteDRMS.sql). The DBA at JSOC will arrange delivery or posting of this SQL file for the remote DRMS site to pick up. The file must then be run against your target replica database, for example: psql -d yourdbname -h yourhostname -f input_to_newRemoteDRMS.sql -U slonyadmin_user (a sketch of the replica-side steps follows the listing below). The SQL script installs two very basic tables and three functions per cluster (each cluster creates a new namespace), and loads all the data in your cluster:

{{{
_jsoc_example.sl_archive_tracking

   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 at_counter | bigint                      |
 at_created | timestamp without time zone |
 at_applied | timestamp without time zone |

_jsoc_example.sl_sequence_offline

    Column    |  Type   | Modifiers
--------------+---------+-----------
 seq_id       | integer | not null
 seq_relname  | name    | not null
 seq_nspname  | name    | not null

List of functions

     Schema    |           Name           | Result data type |        Argument data types
---------------+--------------------------+------------------+-------------------------------------
 _jsoc_example | archivetracking_offline  | bigint           | bigint, timestamp without time zone
 _jsoc_example | finishtableaftercopy     | integer          | integer
 _jsoc_example | sequencesetvalue_offline | integer          | integer, bigint
}}}
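
On the replica side, the load step described above amounts to something like the following sketch (connection parameters are placeholders; the PL/pgSQL requirement is explained in the next paragraph):

{{{
#!/bin/sh
# Install PL/pgSQL if the target database does not already have it.
createlang -h yourhostname -U slonyadmin_user plpgsql yourdbname
# Load the schema, functions, and full data dump produced by slony1_dump.sh at JSOC.
psql -d yourdbname -h yourhostname -U slonyadmin_user -f input_to_newRemoteDRMS.sql
}}}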

The language PL/pgSQL must be installed on your target database for these tables & functions to be created properly. After the tables and functions are created, input_to_newRemoteDRMS.sql loads the full data available for the cluster (via the "copy in" command) and updates the sl_archive_tracking table to reflect the number of the last log at the time the data was dumped; the database is then ready for the user to begin loading logs. The Slony-1 log files must be scp'd over from JSOC's file server (method still TBD), and a short looping script employed to apply them. That's where the second script comes in, a variant of the following:

{{{
#!/bin/sh -f
# Change this to reflect your cluster name, log file location, host, port number,
# and your user and db name!!!
SLONYCLUSTER=jsoc_example
ARCHIVEDIR=/surge/slony/jsoc_slave_logs
REPLICATIONUSER=postgres
PGDATABASE=jsoc
PGPORT=5444
PGHOST=myhost.stanford.edu
# End your changes
FILEPREFIX=slony1_log_2_
NEXTQUERY="select at_counter+1 from \"_${SLONYCLUSTER}\".sl_archive_tracking;"
nextseq=`psql -d ${PGDATABASE} -p ${PGPORT} -U ${REPLICATIONUSER} -h ${PGHOST} -A -t -c "${NEXTQUERY}"`
#last_counter=400000
while true; do
#while [ $nextseq -lt $last_counter ]; do
    # Could do fewer selects/psqls I suppose, but better safe than sorry for the time being.
    NEXTQUERY="select at_counter+1 from \"_${SLONYCLUSTER}\".sl_archive_tracking;"
    nextseq=`psql -d ${PGDATABASE} -p ${PGPORT} -U ${REPLICATIONUSER} -h ${PGHOST} -A -t -c "${NEXTQUERY}"`
    filespec=`printf "${FILEPREFIX}%020d.sql" $nextseq`
    echo 'Looking for next file: ' ${filespec}
    if [ -f ${ARCHIVEDIR}/${filespec} ]
    then
        echo "Found it! Applying it to the log slave database..."
        echo $nextseq
        psql -d ${PGDATABASE} -p ${PGPORT} -U ${REPLICATIONUSER} -h ${PGHOST} -f ${ARCHIVEDIR}/${filespec}
    else
        echo "Not found. Going to sleep for a bit while we wait for the file to arrive..."
        sleep 10
    fi
done
}}}
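
One way this loop might be run in practice (the script name and output file are hypothetical):

{{{
#!/bin/sh
# Keep the log-applying loop running unattended and capture its output.
nohup ./apply_slony_logs.sh >> /surge/slony/apply_slony_logs.out 2>&1 &
}}}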

If a remote replica/DRMS wishes to ship new changes back to the JSOC, it would have to install a traditional Slony-1 setup with a master and a log-shipping slave node, and ship only new data back to the JSOC (i.e. not something the JSOC is already sending as a log to the remote DRMS site). There can be only one origin node for any specific namespace and table (series).

In a traditional Slony-1 setup, changes are replicated across nodes very quickly, at least once per minute. In log shipping, a new log file can be generated once per minute, or more or less frequently. As we define the clusters of the JSOC, we will define the frequency of replication log output as well. The log file for a small database with no new activity is ~800 bytes. A log file directing the insertion of 100,000 records into a table with five columns was 9 MB. The speed of log-shipping replication is determined by how often remote users download their logs, the amount of activity in the data tables in the cluster, and network traffic.

A sample of the log file listings for a quiescent sample database:

{{{
-bash-3.00$ /bin/ls -l | head -10
total 275812
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:40 slony1_log_2_00000000000000000001.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:40 slony1_log_2_00000000000000000002.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:40 slony1_log_2_00000000000000000003.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:40 slony1_log_2_00000000000000000004.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:41 slony1_log_2_00000000000000000005.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:41 slony1_log_2_00000000000000000006.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:41 slony1_log_2_00000000000000000007.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:41 slony1_log_2_00000000000000000008.sql
-rw-rw-r-- 1 postgres postgres 698 Jul 24 17:41 slony1_log_2_00000000000000000009.sql
}}}

The following figure is a rough sketch of our present implementation plan for Slony-1 (as of September 2008). attachment:slony_map1.jpg

Difference Between a Slony-1 Replica and a Warm Standby Server

1. Scope: per-table vs whole database

  • A Warm Standby preserves all user account settings, all database meta-data, and other system information. Schema changes are automatically included. A replica will not incorporate any user information or schema changes (e.g. "create series" and "delete series") unless specifically directed to do so.

2. Availability: available vs standby

  • A replica is available for use even if it is reading in logs. A standby is exactly that: standing by. A "feature" of Postgres is that while it is reading in transaction logs (different from Slony-1 logs), it is unavailable to all users. It considers the database in "recovery" or "standby" mode while it is reading transaction logs. To keep the standby up-to-date, it must constantly read any available transaction logs. Technically, there may be breaks where it is available because it's finished reading the current log, but don't count on it.
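
For context only, a warm standby of this kind is driven by a recovery.conf file in the standby's data directory; a minimal sketch follows (the data directory and WAL archive path are hypothetical, and this is entirely separate from anything Slony-1 does):

{{{
#!/bin/sh
# The standby replays archived WAL segments fetched by restore_command,
# and stays in recovery mode (unavailable to users) while doing so.
cat > /var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'cp /surge/wal_archive/%f "%p"'
EOF
}}}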

Replication Lag Requirement

From Phil's notes:

1. Access to quicklook data:

  • For Ops at LMSAL, this should have a lag of a few minutes from real time. For NOAA, JSC, Air Force, etc., it can probably be roughly 5-10 minutes.

2. Clone for web access:

  • This can have a lag of tens of minutes, maybe an hour or more, except for quicklook images and movies of certain series. I wonder whether we could prepare some data for general-purpose, wide access via the web and move that to a web server that does not have full database access. That could be done in parallel with a full-access clone of DRMS that has a lag and is used for searches of archived data for export.

3. Remote replicas:

Locations of Remote Replicas

0. LM

1. MSSL

2. Germany

3. SAO

4. India

Test Tasks

1. Log shipping

  • Please see above description "A Bit about Slony-1"

2. What happens when DB server goes down and when DB server reboots

  • If the master DB is down, a different server, with a different node, would have to become the new master. This is a well-defined task in the Slony-1 literature (a rough slonik sketch follows). During a reboot, the node would be unavailable, and no logs would be created. Slony-1 runs as a separate daemon, so it will keep retrying until it is able to communicate with the required node; otherwise it fails gracefully.
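
A minimal sketch of the emergency case in slonik, with hypothetical node numbers and connection strings (for a planned switch while both nodes are still up, the Slony-1 docs describe LOCK SET and MOVE SET instead):

{{{
#!/bin/sh
# Declare node 1 (the lost origin) failed and promote backup node 2 in its place.
slonik <<_EOF_
cluster name = jsoc_example;
node 1 admin conninfo = 'dbname=jsoc host=oldmaster.stanford.edu port=5444 user=postgres';
node 2 admin conninfo = 'dbname=jsoc host=newmaster.stanford.edu port=5444 user=postgres';
failover (id = 1, backup node = 2);
_EOF_
}}}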

3. How to add tables to replication set while slon daemon is running?

  • Slony-1 will not allow users to add a table directly to an existing replication set while replication is running. Instead, one must create a new replication set, add the table(s) in question to it, subscribe it, and then merge that new set with the existing one (a rough slonik sketch follows). Additionally, Slony-1 will not automatically create any tables or propagate schema changes. Those must be handled outside of Slony-1, or it will not be able to process data pertaining to the tables/changes in question. Scripts are being written at the JSOC to handle these changes for Slony-1 automatically. These scripts would be run after a user has decided that a data set needs to be included in a replication set, not just created as a new series in JSOC.
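
A minimal sketch of that set-merge approach in slonik, with hypothetical set and table IDs, table name, and connection strings (the table must already exist, with the same definition, on both nodes, and in practice one waits for the new subscription to be fully active before merging):

{{{
#!/bin/sh
# Put the new table in a temporary set, subscribe the slave to it, then fold it into set 1.
slonik <<_EOF_
cluster name = jsoc_example;
node 1 admin conninfo = 'dbname=jsoc host=master.stanford.edu port=5444 user=postgres';
node 2 admin conninfo = 'dbname=jsoc host=slave.stanford.edu port=5444 user=postgres';
create set (id = 2, origin = 1, comment = 'temporary set for new series');
set add table (set id = 2, origin = 1, id = 10, fully qualified name = 'myns.my_new_series');
subscribe set (id = 2, provider = 1, receiver = 2, forward = no);
merge set (id = 1, add id = 2, origin = 1);
_EOF_
}}}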

4. Replica DB maintenance? e.g., rebuild index, vacuum, etc.

  • Replica DBs will require regular maintenance. The changes made to replica databases arrive as SQL commands, just as they do on the original master host, but if we clean up the master host there is no way for the replica to do the same (those clean-up commands are outside the purview of Slony-1). Vacuuming is done by Slony-1 at every log ingest, at least, but for major maintenance it is recommended that all replicas consider their own maintenance needs and plan accordingly (a small sketch follows). Maintenance cannot be propagated remotely.
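
A small sketch of the kind of maintenance a replica might schedule for itself, e.g. from cron (database, host, user, and table names are placeholders):

{{{
#!/bin/sh
# Routine local maintenance on the replica; none of this is propagated by Slony-1.
psql -d yourdbname -h yourhostname -U slonyadmin_user -c "VACUUM ANALYZE;"
psql -d yourdbname -h yourhostname -U slonyadmin_user -c "REINDEX TABLE myns.my_series;"
}}}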

5. How to copy table schema and role information

  • Table schema information will be handled as above, in point #3. Role information, user information, and namespace information will require additional small scripts similar to those written for step #3, above.
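
One possible way to carry role definitions over by hand, outside of Slony-1 (host names, users, and the output file are placeholders):

{{{
#!/bin/sh
# Dump only global objects (roles, tablespaces) from the master, then load them on the replica.
pg_dumpall -g -h master.stanford.edu -U postgres > globals.sql
psql -d yourdbname -h yourhostname -U slonyadmin_user -f globals.sql
}}}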

6. Is there a way to subscribe to part of a series?

  • No. You must subscribe to the entire series. The quantum of replication in Slony-1 is the database table, and a series is one database table. Theoretically, a replica could write its own script to parse a Slony-1 log before it is ingested and remove "irrelevant" entries, but this is not advised, as it could cause referential integrity issues. Instead, it may be better to delete "irrelevant" data from the series directly, via scripts run on Postgres (a sketch follows). Either way, removal of data is up to the replica site and will not be supported by the JSOC.
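
A minimal sketch of pruning unwanted rows locally on the replica after the logs have been applied (the series name, column, and cutoff are entirely hypothetical):

{{{
#!/bin/sh
# Hypothetical local pruning job; such deletions stay on the replica and are never shipped back.
psql -d yourdbname -h yourhostname -U slonyadmin_user \
     -c "DELETE FROM myns.my_series WHERE obs_date < '2008-01-01';"
}}}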

Procedure to Become a Replica

1. Decide whether you wish to simply receive logs, or whether you wish to receive logs and send logs back to the JSOC or another replica. The former is easier for you, but could be limiting. If you want to decide later, and just start with receiving logs, that's fine. You will not lock yourself out of log-shipping later.

2. You will need to understand basic SQL and basic shell scripting, and you will need to have Postgres v8.3 or higher installed. It is recommended that you have a basic understanding of how Postgres works, how to log into it, and how to check its health.
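
For example, a quick check that your Postgres instance is reachable and reports the expected version (connection parameters are placeholders):

{{{
#!/bin/sh
psql -d yourdbname -h yourhostname -U slonyadmin_user -c "SELECT version();"
}}}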

3. Figure out what data sets you would like to receive and whether you have the space to do so. Make a list of those data sets, by namespace.series_name. Soon, the JSOC team will be able to respond to your request and set up a slon daemon for your job. If you are already a replica site receiving Slony-1 logs from JSOC, then you need to check first with the person who is managing that process.

4. More To Be Determined as we at JSOC progress with implementing Slony-1 log shipping.
