File: [Development] / JSOC / doc / whattodo_sum_partn_on_off.txt
Revision: 1.1, Wed Jan 9 20:15:42 2013 UTC by production
Branch: MAIN
CVS Tags: Ver_LATEST, Ver_DRMSLATEST, Ver_9-5, Ver_9-41, Ver_9-4, Ver_9-3, Ver_9-2, Ver_9-1, Ver_9-0, Ver_8-8, Ver_8-7, Ver_8-6, Ver_8-5, Ver_8-4, Ver_8-3, Ver_8-2, Ver_8-12, Ver_8-11, Ver_8-10, Ver_8-1, Ver_8-0, NetDRMS_Ver_LATEST, NetDRMS_Ver_9-5, NetDRMS_Ver_9-41, NetDRMS_Ver_9-4, NetDRMS_Ver_9-3, NetDRMS_Ver_9-2, NetDRMS_Ver_9-1, NetDRMS_Ver_9-0, NetDRMS_Ver_8-8, NetDRMS_Ver_8-7, NetDRMS_Ver_8-6, NetDRMS_Ver_8-5, NetDRMS_Ver_8-4, NetDRMS_Ver_8-3, NetDRMS_Ver_8-2, NetDRMS_Ver_8-12, NetDRMS_Ver_8-11, NetDRMS_Ver_8-10, NetDRMS_Ver_8-1, NetDRMS_Ver_8-0, HEAD
initial

		/home/production/cvs/JSOC/doc/whattodo_sum_partn_on_off.txt

If a file server is down, its /SUM partitions should be taken
offline so that SUMS does not try to allocate storage from them
and then hang on a mkdir that it is unable to complete.

Here are the /SUM partitions on each file server:

d02:  /SUM0 thru /SUM20s
d03:  /SUM30s
d04:  /SUM40s
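
The mapping above can be sketched as a small shell function, which helps
when checking which partitions belong to the server that is down. This is
only an illustration: partn_server is a hypothetical name, not part of SUMS,
and it assumes "/SUM0 thru /SUM20s" means /SUM0 through /SUM29.

```shell
# Hypothetical helper (not part of SUMS): print the file server that holds
# a /SUM partition, assuming d02 = /SUM0-/SUM29, d03 = /SUM30s, d04 = /SUM40s.
partn_server() {
    case $1 in
        /SUM*) n=${1#/SUM} ;;               # strip the /SUM prefix
        *) echo unknown; return 1 ;;
    esac
    case $n in
        ''|*[!0-9]*) echo unknown; return 1 ;;   # suffix must be all digits
    esac
    if [ "$n" -lt 30 ]; then
        echo d02
    elif [ "$n" -lt 40 ]; then
        echo d03
    elif [ "$n" -lt 50 ]; then
        echo d04
    else
        echo unknown
        return 1
    fi
}
```

With these assumptions, partn_server /SUM3 prints d02 and partn_server /SUM30
prints d03, so the two names are on different servers despite the shared prefix.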

The way to take a /SUM partition offline is to set its 
pds_set_num = -1. This is done in the sum_partn_avail db table
which looks like:

 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime 
------------+----------------+---------------+-------------+---------------
 /SUM37     | 25000000000000 | 1349566595072 |           0 |             0
 /SUM4      | 33000000000000 | 1600150855680 |           0 |             0
 /SUM2      | 33000000000000 | 1599634276352 |           0 |             0
 /SUM20     | 33000000000000 | 1599828013056 |           0 |             0
 /SUM21     | 33000000000000 | 1600223768576 |           0 |             0
 /SUM22     | 33000000000000 | 1599709511680 |           0 |             0
 /SUM3      | 33000000000000 | 1599469428736 |           0 |             0
[etc.]

Do this as user production on a machine with psql (e.g. n02).
The example below takes d03 offline; adjust the partition names
for the file server that is actually down:

> psql -h hmidb -p 5434 jsoc_sums
jsoc_sums=> select * from sum_partn_avail where partn_name like '/SUM3%';
 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime 
------------+----------------+---------------+-------------+---------------
 /SUM3      | 33000000000000 | 1599825543168 |           0 |             0
 /SUM30     | 22000000000000 | 1199689433088 |           0 |             0
 /SUM31     | 22000000000000 | 1199430434816 |           0 |             0
 /SUM32     | 22000000000000 |  747345281024 |           0 |             0
 /SUM33     | 22000000000000 |  786056609792 |           0 |             0
 /SUM34     | 25000000000000 | 1349826641920 |           0 |             0
 /SUM35     | 25000000000000 | 1349841321984 |           0 |             0
 /SUM36     | 25000000000000 | 1349845516288 |           0 |             0
 /SUM37     | 25000000000000 | 1349560303616 |           0 |             0
(9 rows)

jsoc_sums=> update sum_partn_avail set pds_set_num=-1 where 
jsoc_sums-> partn_name in ('/SUM30', '/SUM31', '/SUM32', '/SUM33',
jsoc_sums(> '/SUM34', '/SUM35', '/SUM36', '/SUM37');

[Notice that we did not set '/SUM3'. It matches the '/SUM3%' pattern in the
select, but it is on the d02 file server, not d03.]

jsoc_sums=> \q

Now force all the sum_svc processes to reread this new sum_partn_avail table.

> sumrepartn

When d03 is back online, rerun the update command shown above with
pds_set_num=0, then run sumrepartn again:

> sumrepartn
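
The offline and online updates differ only in the pds_set_num value, so the
statement can be generated rather than retyped. Below is a minimal sketch,
assuming a POSIX shell; partn_update_sql is a hypothetical helper name, not
part of SUMS. It only prints the SQL and does not touch the database.

```shell
# Hypothetical helper (not part of SUMS): print the UPDATE statement that
# sets pds_set_num for the listed partitions. It does not run any SQL.
# usage: partn_update_sql <pds_set_num> <partn_name> [<partn_name> ...]
partn_update_sql() {
    if [ "$#" -lt 2 ]; then
        echo "usage: partn_update_sql <pds_set_num> <partn_name>..." >&2
        return 1
    fi
    num=$1
    shift
    list=
    for p in "$@"; do
        # Build a quoted, comma-separated list: '/SUM30', '/SUM31', ...
        list="${list:+$list, }'$p'"
    done
    printf "update sum_partn_avail set pds_set_num=%s where partn_name in (%s);\n" \
        "$num" "$list"
}
```

For example, the output of "partn_update_sql -1 /SUM30 /SUM31" could be
reviewed and then piped to "psql -h hmidb -p 5434 jsoc_sums"; the same call
with 0 instead of -1 produces the statement that brings the partitions back.
Run sumrepartn after either change, as described above.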


