/home/production/cvs/JSOC/doc/whattodo_sum_partn_on_off.txt

If a file server is down, its /SUM partitions should be taken
offline so that SUMS does not try to allocate storage from them
and then hang on a mkdir that it is unable to do.

Here are the /SUM partitions on each file server:

d02:  /SUM0 thru /SUM20s
d03:  /SUM30s
d04:  /SUM40s

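As a rough aid for deciding which partitions belong to a downed
server, the list above can be written as a lookup. This is only a
sketch: the function name file_server_for is hypothetical (not part
of SUMS), and it assumes that "/SUM0 thru /SUM20s" means partition
numbers 0 through 29 and that the "30s" and "40s" mean 30-39 and 40-49.

```python
import re

def file_server_for(partn_name):
    """Return the file server assumed to host a /SUM partition, or None."""
    match = re.fullmatch(r"/SUM(\d+)", partn_name)
    if match is None:
        return None
    number = int(match.group(1))
    # Assumed ranges, read off the list above; adjust if the layout differs.
    if 0 <= number <= 29:
        return "d02"
    if 30 <= number <= 39:
        return "d03"
    if 40 <= number <= 49:
        return "d04"
    return None

print(file_server_for("/SUM3"))   # d02
print(file_server_for("/SUM37"))  # d03
```

Note that /SUM3 and /SUM37 land on different servers even though
they share a name prefix.
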
To take a /SUM partition offline, set its pds_set_num to -1
in the sum_partn_avail db table, which looks like:

 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM37     | 25000000000000 | 1349566595072 |           0 |             0
 /SUM4      | 33000000000000 | 1600150855680 |           0 |             0
 /SUM2      | 33000000000000 | 1599634276352 |           0 |             0
 /SUM20     | 33000000000000 | 1599828013056 |           0 |             0
 /SUM21     | 33000000000000 | 1600223768576 |           0 |             0
 /SUM22     | 33000000000000 | 1599709511680 |           0 |             0
 /SUM3      | 33000000000000 | 1599469428736 |           0 |             0
[etc.]

Run the following as user production on a machine that has psql
(e.g. n02). The example takes d03 offline; adjust the partition
names for the file server that is actually down:

> psql -h hmidb -p 5434 jsoc_sums
jsoc_sums=> select * from sum_partn_avail where partn_name like '/SUM3%';
 partn_name |  total_bytes   |  avail_bytes  | pds_set_num | pds_set_prime
------------+----------------+---------------+-------------+---------------
 /SUM3      | 33000000000000 | 1599825543168 |           0 |             0
 /SUM30     | 22000000000000 | 1199689433088 |           0 |             0
 /SUM31     | 22000000000000 | 1199430434816 |           0 |             0
 /SUM32     | 22000000000000 |  747345281024 |           0 |             0
 /SUM33     | 22000000000000 |  786056609792 |           0 |             0
 /SUM34     | 25000000000000 | 1349826641920 |           0 |             0
 /SUM35     | 25000000000000 | 1349841321984 |           0 |             0
 /SUM36     | 25000000000000 | 1349845516288 |           0 |             0
 /SUM37     | 25000000000000 | 1349560303616 |           0 |             0
(9 rows)

jsoc_sums=> update sum_partn_avail set pds_set_num=-1 where
jsoc_sums-> partn_name in ('/SUM30', '/SUM31', '/SUM32', '/SUM33',
jsoc_sums(> '/SUM34', '/SUM35', '/SUM36', '/SUM37');

[Notice that '/SUM3' is left out of the update: the '/SUM3%' pattern
matches it, but it is not on the d03 file server.]

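Because the '/SUM3%' pattern also returns /SUM3, the partition list
for the update has to be filtered before it is used. A minimal sketch
of that filtering, assuming the d03 partitions are exactly the names
of the form /SUM3 followed by one more digit. The helper names
partitions_in_decade and build_update are hypothetical, not part of
SUMS; the generated text is meant to be pasted at the jsoc_sums=>
prompt, not run automatically.

```python
import re

def partitions_in_decade(names, tens_digit):
    """Keep names like /SUM30../SUM39 for tens_digit=3; /SUM3 itself is dropped."""
    pattern = re.compile(rf"/SUM{tens_digit}\d")
    return [n for n in names if pattern.fullmatch(n)]

def build_update(names, pds_set_num):
    """Build the UPDATE statement text for the given partitions."""
    quoted = ", ".join("'%s'" % n for n in names)
    return ("update sum_partn_avail set pds_set_num=%d "
            "where partn_name in (%s);" % (pds_set_num, quoted))

# Names as returned by the select above.
selected = ["/SUM3", "/SUM30", "/SUM31", "/SUM32", "/SUM33",
            "/SUM34", "/SUM35", "/SUM36", "/SUM37"]
d03 = partitions_in_decade(selected, 3)
print(build_update(d03, -1))
```

The same build_update call with pds_set_num 0 gives the statement
for putting the partitions back in service.
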
jsoc_sums=> \q

Now force all the sum_svc processes to reread this new sum_partn_avail table.

> sumrepartn

When d03 is back online, rerun the update command shown above with
pds_set_num=0, then run sumrepartn again:

> sumrepartn

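The restore step can be laid out as a short command plan. The sketch
below only builds the argument lists and prints them; it runs nothing.
It assumes psql's standard -c option for passing a single statement
in place of the interactive session shown above. The function name
restore_commands is hypothetical, and the host, port and database
defaults are the ones used in the psql command in this document.

```python
def restore_commands(names, host="hmidb", port=5434, db="jsoc_sums"):
    """Return argv lists for putting partitions back online (pds_set_num=0)."""
    quoted = ", ".join("'%s'" % n for n in names)
    sql = ("update sum_partn_avail set pds_set_num=0 "
           "where partn_name in (%s);" % quoted)
    return [
        # One-shot psql call instead of the interactive prompt.
        ["psql", "-h", host, "-p", str(port), "-c", sql, db],
        # Make the sum_svc processes reread sum_partn_avail.
        ["sumrepartn"],
    ]

d03 = ["/SUM30", "/SUM31", "/SUM32", "/SUM33",
       "/SUM34", "/SUM35", "/SUM36", "/SUM37"]
for argv in restore_commands(d03):
    print(argv)
```

Keeping the steps as data makes it easy to review them before anyone
runs them as user production.
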
