Notes
1. Backup
2. archive_command
3. file-system-level backup
4. Restore
5. restore_command
6. Example
6.1 Example dcs0
7. pg_dump
NOTE: This assumes that we're not using any tablespaces. NOTE: Where does slony fit into this?
1. BACKUP    (Top)
We will do an On-line backup and point-in-time recovery (PITR) for our Postgres Data Base.(http://www.postgresql.org/docs/8.1/interactive/backup-online.html)
This consists of combining a file-system-level backup with backup of the Write Ahead Log (WAL) files. If recovery is needed, we restore the file backup and then replay from the backed-up WAL files to bring the backup up to current time. So we first have to set up a procedure for archiving the WAL files. The system physically divides the WAL sequence into WAL segment files, which are 16MB apiece, and located in /c/pgsql/pg_xlog. We must set up a shell command to archive completed WAL segment files. (NOTE: The /c/pgsql/pg_xlog files are also backed up to tape as part of the normal nightly incremental backup procedure.) The shell command to use is specified by the archive_command configuration parameter, which in practice will always be placed in the postgresql.conf file (/d/pgsql/data/postgresql.conf). In this string, any %p is replaced by the absolute path of the file to archive, while any %f is replaced by the file name only. Write %% if you need to embed an actual % character in the command. The archive command will be executed under the ownership of the same user that the PostgreSQL server is running as. It is important that the archive command return zero exit status if and only if it succeeded. Upon getting a zero result, PostgreSQL will assume that the WAL segment file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds. The archive command should generally be designed to refuse to overwrite any pre-existing archive file. (NOTE: We do not currently backup partial WAL segments.) If you are concerned about being able to recover right up to the current instant, you may want to take additional steps to ensure that the current, partially-filled WAL segment is also copied someplace. This is particularly important if your server generates only little WAL traffic (or has slack periods where it does so), since it could take a long time before a WAL segment file is completely filled and ready to archive. One possible way to handle this is to set up a cron job that periodically (once a minute, perhaps) identifies the current WAL segment file and saves it someplace safe. Then the combination of the archived WAL segments and the saved current segment will be enough to ensure you can always restore to within a minute of current time. NOTE: Seperate backups must be made of manually edited configuration files: /d/pgsql/data/postgresql.conf /d/pgsql/data/pg_hba.conf /d/pgsql/data/pg_ident.conf This is automatically done in "2. tar the data dir:" below. But, if they are subsequently changed, they should be saved.
2. archive_command    (Top)
**************************************************************************** Here is our archive_command found in /d/pgsql/data/postgresql.conf: archive_command = 'test ! -f /var/lib/pgsql/backups/wal_arch/%f.gz && gzip -1 -c %p > /var/lib/pgsql/backups/wal_arch/%f.gz' ****************************************************************************
3. file-system-level backup    (Top)
Now that we have archiving of the WAL segment files, we must make a
file-system-level backup. Here's what to do:
1. Login as the postgres user (su - postgres):
> psql -h hmidb jsoc
jsoc=# SELECT pg_start_backup('2006_05_30_JSOC');
2. tar the data dir:
cd /d/pgsql; gtar -czf /d/pgsql/backups/base/backup_2006_05_30.tgz data
NOTE: Some backup tools that you might wish to use emit warnings or errors if
the files they are trying to copy change while the copy proceeds. This
situation is normal, and not an error, when taking a base backup of an active
database.
NOTE: The normal nightly incremental backup to tape will copy /d/pgsql/backups
files, so the .tgz file and the archived WAL files will be on tape in case
of catastrophic disk failures.
NOTE: A seperate backup tape for off-site storage will be made whenever
the tar of the data dir in done.
Copy this tar file to another machine for extra safety, e.g.:
(you might have to source the ssh-agent info first)
source /tmp/pgaginfo
scp -p /d/pgsql/backups/base/backup_2006_05_30.tgz jim@xim:/scr111/postgres
3. Connect to the database as a superuser and do:
> psql -h hmidb jsoc
jsoc=# SELECT pg_stop_backup();
To make use of this backup, you will need to keep around all the WAL segment
files generated during and after the file system backup. To aid you in doing
this, the pg_stop_backup function creates a backup history file that is
immediately stored into the WAL archive area. This file is named after the
first WAL segment file that you need to have to make use of the backup. For
example, if the starting WAL file is 0000000100001234000055CD the backup
history file will be named something like
0000000100001234000055CD.007C9330.backup. (The second number in the file name
stands for an exact position within the WAL file, and can ordinarily be
ignored.) Once you have safely archived the file system backup and the WAL
segment files used during the backup (as specified in the backup history
file), all archived WAL segments with names numerically less are no longer
needed to recover the file system backup and may be deleted. However, you
should consider keeping several backup sets to be absolutely certain that you
can recover your data. Keep in mind that only completed WAL segment files are
archived, so there will be delay between running pg_stop_backup and the
archiving of all WAL segment files needed to make the file system backup
consistent.
It's also worth noting that the pg_start_backup function makes a file named
backup_label in the database cluster directory, which is then removed again by
pg_stop_backup. This file will of course be archived as a part of your backup
dump file. The backup label file includes the label string you gave to
pg_start_backup, as well as the time at which pg_start_backup was run, and the
name of the starting WAL file. In case of confusion it will therefore be
possible to look inside a backup dump file and determine exactly which backup
session the dump file came from.
4. Restore    (Top)
1. su - postgres 2. Stop the postmaster, if it's running. -bash-3.00$ kill `head -1 /var/lib/pgsql/data/postmaster.pid` If this doesn't work do this as root: /sbin/service postgresql stop ^D 3. Copy the data directory and pg_xlog dir to a temporary location in case you need them later: cp -rp /d/pgsql/data /c/pgsql/data_tmp cp -rp /c/pgsql/pg_xlog /c/pgsql/data_tmp NOTE: If /c does not have enough space then copy off to another machine with adequate space. If this really starts to get large, then at least copy /c/pgsql/pg_xlog, as it may contain logs which were not archived before the system went down. 4. Clean out all existing files and subdirectories under the data directory: /bin/rm -rf /d/pgsql/data 5. Restore the database files from your backup dump: cd /d/pgsql; gtar -xzf /d/pgsql/backups/base/backup_2006_05_30.tgz data 6. If you had unarchived WAL segment files in /c/pgsql/data_tmp/pg_xlog, copy them zipped to /var/lib/pgsql/backups/wal_arch/%f.gz. 7. Create a recovery command file recovery.conf in /d/pgsql/data containing the restore_command as follows: (You may also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery has worked.)restore_command    (Top)
**************************************************************************** restore_command = 'test -f /var/lib/pgsql/backups/wal_arch/%f.gz && gunzip -1 -c /var/lib/pgsql/backups/wal_arch/%f.gz > %p' **************************************************************************** 8. Start the postmaster. The postmaster will go into recovery mode and proceed to read through the archived WAL files it needs. Upon completion of the recovery process, the postmaster will rename recovery.conf to recovery.done. #Do as root: #/sbin/service postgresql start #^D As user postgres: postmaster -D /var/lib/pgsql/data > & /tmp/postmaster.log & 9. Let in the users by restoring pg_hba.conf to normal. NOTE: If you were running a sum_svc, you must now stop and restart it so that it may connect to the new instance of the DB.
6. Example    (Top)
(See below for example for Data Capture systems dcs0-dcs2)
Start the Postgres (if not already running) as user postgres and save
a baseline tar file of the data directory:
hmidb:/var/lib/pgsql> postmaster -D /var/lib/pgsql/data > & /tmp/postmaster.log&
hmidb:/var/lib/pgsql> psql -h hmidb jsoc
joc=# SELECT pg_start_backup('2006_09_01_JSOC');
pg_start_backup
-----------------
2C/EEEFB8C0
(1 row)
jsoc=# \q
hmidb:/var/lib/pgsql> cd /d/pgsql
hmidb:/d/pgsql> gtar -czf /d/pgsql/backups/base/backup_2006_09_01_JSOC.tgz data
hmidb:/d/pgsql> psql -h hmidb jsoc
jsoc=# SELECT pg_stop_backup();
pg_stop_backup
----------------
2C/EEEFB910
(1 row)
jsoc=# \q
hmidb:/d/pgsql>
###########################################################################
To later restore:
Do as root:
[postgres@hmidb pgsql]# /sbin/service postgresql stop
Stopping postgresql service:
postgres@hmidb pgsql]# exit
hmidb:/var/lib/pgsql> whoami
postgres
Save current data & logs:
hmidb:/var/lib/pgsql> cp -rp /d/pgsql/data /c/pgsql/data_tmp
hmidb:/var/lib/pgsql> cp -rp /c/pgsql/pg_xlog /c/pgsql/data_tmp
Clean out existing file:
hmidb:/var/lib/pgsql> /bin/rm -rf /d/pgsql/data
Restore the baseline data:
hmidb:/var/lib/pgsql> cd /d/pgsql
hmidb:/d/pgsql> gtar -xzf /d/pgsql/backups/base/backup_2006_09_01_JSOC.tgz data
Zip any WAL segment not archived yet, e.g.:
> cd /c/pgsql/data_tmp/pg_xlog
> gzip -1 -c 000000010000002C000000EE > /var/lib/pgsql/backups/wal_arch/000000010000002C000000EE.gz
Make sure you have the restore command in /d/pgsql/data/recovery.conf:
restore_command = 'test -f /var/lib/pgsql/backups/wal_arch/%f.gz && gunzip
-1 -c /var/lib/pgsql/backups/wal_arch/%f.gz > %p'
Start the Postgres:
hmidb:/d/pgsql> postmaster -D /var/lib/pgsql/data > & /tmp/postmaster.log &
6.1 Example dcs0-2    (Top)
This is done on each Data Capture system in turn: dcs0, dcs1, dcs2.
Start the Postgres (if not already running) as user postgres and save
a baseline tar file of the data directory:
dcs0:/var/lib/pgsql> postmaster -D /var/lib/pgsql/data > & /tmp/postmaster.log&
dcs0:/var/lib/pgsql> psql jsocdc
jocdc=# SELECT pg_start_backup('2008_01_31_JSOCDC_dcs0');
pg_start_backup
-----------------
0/CAAB5DC0
(1 row)
jsocdc=# \q
dcs0:/var/lib//pgsql> gtar -czf /var/lib/pgsql/backups/base/backup_2008_01_31_JSOCDC_dcs0.tgz data
(NOTE: if get an error like this, it is ok)
gtar: data/base/16384/16385: file changed as we read it
dcs0:/var/lib/pgsql> psql jsocdc
jsocdc=# SELECT pg_stop_backup();
pg_stop_backup
----------------
0/CAB06F28
(1 row)
jsoc=# \q
dcs0:/var/lib/pgsql>
NOTE: Also do for dcs1 and dcs2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Also do below according to which dcs is the current backup,
e.g. if dcs1 is doing HMI and dcs2 is doing AIA, then dcs0 is the
current backup, and so you would do like this:
Make sure that this cron job is running on dcs1 which
will rsync the dcs1 database backups every 15 minutes onto dcs0:
0,15,30,45 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs0.pl
And likwise on dcs2:
0,15,30,45 * * * * /var/lib/pgsql/rsync_pg_dcs2_to_dcs0.pl
Also make sure that this cron job is running on dcs1 which
will rsync the dcs1 /var/lib/pgsql/backups to
hmidb:/var/lib/pgsql/backups/dcs1
00 6,18 * * * /var/lib/pgsql/rsync_pg_dcs1_to_hmidb.pl
And likewise on dcs2:
00 6,18 * * * /var/lib/pgsql/rsync_pg_dcs2_to_hmidb.pl
NOTE: If dcs1 or dcs2 is rebooted, please update as appropriate
/var/lib/pgsql/rsync_pg_dcs1_to_dcs0.source and
/var/lib/pgsql/ssh_rsync_pg_dcs1_to_hmidb.source to set appropriately
SSH_AGENT_PID=11970
SSH_AUTH_SOCK=/tmp/ssh-BbZpg11969/agent.11969
where the values are obtained by doing:
> eval `ssh-agent`
> ssh-add ~/.ssh/id_rsa
and then echo $SSH_AUTH_SOCK and $SSH_AGENT_PID to get the
correct values to put into /var/lib/pgsql/rsync_pg_dcs1_to_dcs0.source
and /var/lib/pgsql/ssh_rsync_pg_dcs1_to_hmidb.source.
###########################################################################
To later restore:
Do as root:
dcs0 lib/pgsql# whoami
root
dcs0 lib/pgsql# /sbin/service postgresql stop
Shutting down PostgreSQLpostmaster stopped
dcs0 lib/pgsql# exit
dcs0:/var/lib/pgsql> whoami
postgres
Save current data & logs:
dsc0:/var/lib/pgsql> cp -rp /var/lib/pgsql/data /var/lib/pgsql/data_tmp
dsc0:/var/lib/pgsql> cp -rp /var/lib/pgsql/data/pg_xlog /var/lib/pgsql/data_tmp
Clean out existing file:
dsc0:/var/lib/pgsql> /bin/rm -rf /var/lib/pgsql/data
Restore the baseline data:
dcs0:/var/lib/pgsql> gtar -xzf /var/lib/pgsql/backups/base/backup_2008_01_31_JSOCDC_dcs0.tgz data
Zip any WAL segment not archived yet, e.g.:
> cd /var/lib/pgsql/data_tmp/pg_xlog
> gzip -1 -c 000000010000002C000000EE > /var/lib/pgsql/backups/wal_arch/000000010000002C000000EE.gz
Make sure you have the restore command in /var/lib/pgsql/data/recovery.conf:
restore_command = 'test -f /var/lib/pgsql/backups/wal_arch/%f.gz && gunzip
-1 -c /var/lib/pgsql/backups/wal_arch/%f.gz > %p'
Start the Postgres:
dcs0:/var/lib/pgsql> postmaster -D /var/lib/pgsql/data &
7. pg_dump    (Top)
In the beginning, the DB is save with pg_dump so that if lost entirely, all the tables and indexes and permissions, etc. can be restored. postgres@dcs1:~> pwd /var/lib/pgsql postgres@dcs1:~> whoami postgres postgres@dcs1:~> pg_dump --create jsocdc > pg_dump.create.jsocdc In order to restore, first drop the database, then: put the script, pg_dump.create.jsocdc, into psql as user postgres. Also do a --clean pg_dump: postgres@dcs1:~> pg_dump --clean jsocdc > pg_dump.clean.jsocdc
Last modified @ 09:40 Tue Mar 18 2008