1 /home/production/cvs/JSOC/doc/whattodolev0.txt 16Oct2008
2
3
4 ------------------------------------------------------
5 Running Datacapture & Pipeline Backend lev0 Processing
6 ------------------------------------------------------
7
8
9 NOTE: For now, this is all done from the xim w/s (Jim's office)
10
11 Datacapture:
12 --------------------------
13
14 NOTE:IMPORTANT: Please keep in mind that each data capture machine has its
15 own independent /home/production.
16
17 1. The Datacapture system for aia/hmi is by convention dcs0/dcs1 respectively.
18 If the spare dcs2 is to be put in place, it is renamed dcs0 or dcs1, and the
19 original machine is renamed dcs2.
20
21 1a. The spare dcs2 normally serves as a backup destination for the postgres
22 running on dcs0 and dcs1. You should see this postgres cron job on dcs0
23 and dcs1, respectively:
24
25 0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
26 0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl
27
28 For this to work, this must be done on dcs0, dcs1 and dcs2, as user
29 postgres, after any reboot:
30
31 > ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
32 > chmod 600 /var/lib/pgsql/ssh-agent.env
33 > source /var/lib/pgsql/ssh-agent.env
34 > ssh-add
35 (The password is written on my whiteboard (same as production's))
36
37 2. Log in as user production via j0. (password is on Jim's whiteboard).
38
39 3. Postgres must be running; it is started automatically on boot:
40
41 > ps -ef |grep pg
42 postgres 4631 1 0 Mar11 ? 00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data
43
44 4. The root of the datacapture tree is /home/production/cvs/JSOC.
45 The production user runs as user id 388.
46
47 5. The sum_svc is normally running:
48
49 > ps -ef |grep sum_svc
50 388 26958 1 0 Jun09 pts/0 00:00:54 sum_svc jsocdc
51
52 Note the SUMS database is jsocdc. This is a separate DB on each dcs.
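
The `ps -ef | grep` checks in steps 3 and 5 also match the grep command itself. The sketch below is one way to avoid that: it reads `ps -ef` output on stdin and matches only the command name in column 8. The function name `proc_listed` is hypothetical and not part of the JSOC tree; it assumes the eight-column `ps -ef` layout shown in the examples above.

```shell
# proc_listed NAME
# Reads `ps -ef` output on stdin and succeeds if some process has a command
# (column 8, leading directory stripped) that is exactly NAME.
# Hypothetical helper for illustration only.
proc_listed() {
    awk -v want="$1" '
        { cmd = $8; sub(/.*\//, "", cmd) }   # /usr/bin/postmaster -> postmaster
        cmd == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use on a dcs machine (commented out so the sketch has no side effects):
#   ps -ef | proc_listed postmaster || echo "postgres is NOT running"
#   ps -ef | proc_listed sum_svc    || echo "sum_svc is NOT running"
```

Because the match is on the command column rather than the whole line, a `grep sum_svc` process in the listing is not mistaken for sum_svc itself.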
53
54 6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:
55
56 > sum_start_dc
57 sum_start at 2008.06.16_13:32:23
58 ** NOTE: "soc_pipe_scp jsocdc" still running
59 Do you want me to do a sum_stop followed by a sum_start for you (y or n):
60
61 You would normally answer 'y' here.
62
63 7. To run the datacapture gui that will display the data, mark it for archive,
64 optionally extract lev0 and send it on to the pipeline backend, do this:
65
66 > cd /home/production/cvs/JSOC/proj/datacapture/scripts
67 > ./socdc
68
69 Normally all you need to do is hit "Start Instances for HMI" (or AIA),
70 depending on which datacapture machine you are on.
71
72 8. To optionally extract lev0 do this:
73
74 > touch /usr/local/logs/soc/LEV0FILEON
75
76 To stop lev0:
77
78 > /bin/rm /usr/local/logs/soc/LEV0FILEON
79
80 The last 100 images for each VC are kept in /tmp/jim.
81
82 NOTE: If you turn lev0 on, you are going to be data sensitive and you
83 may see things like this, in which case you have to restart socdc:
84
85 ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
86 kill: no process ID specified
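
The lev0 switch in step 8 is just the presence or absence of a flag file. The sketch below wraps that in three small functions so the current state can be checked before toggling. The function names and the `LEV0_FLAG` override variable are hypothetical; only the flag path comes from this document.

```shell
# Flag file from step 8. LEV0_FLAG may be overridden to try this elsewhere.
LEV0_FLAG="${LEV0_FLAG:-/usr/local/logs/soc/LEV0FILEON}"

# Report whether lev0 extraction is currently switched on.
lev0_status() {
    if [ -e "$LEV0_FLAG" ]; then
        echo "lev0 extraction: ON"
    else
        echo "lev0 extraction: OFF"
    fi
}

# Same effect as the touch and /bin/rm commands in step 8.
lev0_on()  { touch "$LEV0_FLAG"; }
lev0_off() { rm -f "$LEV0_FLAG"; }
```

Running `lev0_status` before and after `lev0_on` or `lev0_off` confirms that the flag changed as intended.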
87
88 9. The datacapture machines automatically copy DDS input data to the
89 pipeline backend on /dds/socdc living on d01. This is done by the program:
90
91 > ps -ef |grep soc_pipe_scp
92 388 21529 21479 0 Jun09 pts/0 00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30
93
94 This requires that an ssh-agent be running. If you reboot a dcs machine do:
95
96 > ssh-agent | head -2 > /tmp/ssh-agent.env
97 > chmod 600 /tmp/ssh-agent.env
98 > source /tmp/ssh-agent.env
99 > ssh-add
100 (The password is written on my whiteboard)
101
102 NOTE: cron jobs use this /tmp/ssh-agent.env file
103
104 If you want another window to use the ssh-agent that is already running do:
105 > source /tmp/ssh-agent.env
106
107 NOTE: on any one machine, user production should have just one ssh-agent
108 running.
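
Before starting a new ssh-agent it helps to check whether the one recorded in /tmp/ssh-agent.env is still alive, so that a second agent is not started by accident. The following is a minimal sketch. The function name `agent_alive` is hypothetical, and it assumes the env file has the two-line form written by `ssh-agent | head -2` (one line setting SSH_AUTH_SOCK, one setting SSH_AGENT_PID).

```shell
# agent_alive [ENVFILE]
# Succeeds only if ENVFILE is readable and the agent PID recorded in it
# belongs to a live process owned by the current user.
agent_alive() {
    envfile="${1:-/tmp/ssh-agent.env}"
    [ -r "$envfile" ] || return 1
    # Use a subshell so a stale file cannot clobber this shell's variables.
    (
        unset SSH_AGENT_PID
        . "$envfile" >/dev/null
        [ -n "$SSH_AGENT_PID" ] && kill -0 "$SSH_AGENT_PID" 2>/dev/null
    )
}

# Typical use (commented out so the sketch has no side effects):
#   if agent_alive; then . /tmp/ssh-agent.env; else echo "start a new ssh-agent"; fi
```

`kill -0` sends no signal; it only tests whether the process exists and can be signalled.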
109
110
111 If you see that a dcs has asked for a password, the ssh-agent has failed.
112 You can probably find an error msg on d01 like 'invalid user production'.
113 You should exit the socdc. Make sure there is no soc_pipe_scp still running.
114 Restart the socdc.
115
116 If you find that there is a hostname for production that is not in the
117 /home/production/.ssh/authorized_keys file then do this on the host that
118 you want to add:
119
120 Pick up the entry in /home/production/.ssh/id_rsa.pub
121 and put it in this file on the host that you want to have access to
122 (make sure that it's all one line):
123
124 /home/production/.ssh/authorized_keys
125
126 NOTE: DO NOT do a ssh-keygen or you will have to update every host's
127 authorized_keys with the new public key you just generated.
128
129 If not already active, then do what's shown above for the ssh-agent.
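
Copying the public key by hand risks line wrapping or duplicate entries. The sketch below appends a key to an authorized_keys file only if that exact line is not already present, and refuses input that is not exactly one line. The function name `add_key` is hypothetical, and both file paths are passed in as arguments rather than assumed.

```shell
# add_key PUBFILE AUTHFILE
# Append the single-line public key in PUBFILE to AUTHFILE unless the same
# line is already there. Fails if PUBFILE is empty or has more than one line.
add_key() {
    pub="$1"; auth="$2"
    [ -s "$pub" ] || return 1
    # The key must be all on one line.
    awk 'END { exit (NR == 1 ? 0 : 1) }' "$pub" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1
    # -x whole line, -F fixed string, -f take the pattern from PUBFILE.
    grep -qxF -f "$pub" "$auth" || printf '%s\n' "$(cat "$pub")" >> "$auth"
}

# Typical use, after copying id_rsa.pub to the target host (commented out):
#   add_key /tmp/id_rsa.pub /home/production/.ssh/authorized_keys
```

Running it twice with the same key leaves a single entry, so it is safe to repeat.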
130
131
132 10. There should be a cron job running that will archive to the T50 tapes.
133 Note the names are asymmetric for dcs0 and dcs1.
134
135 30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do
136
137 00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1
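
To confirm that the archive cron job is installed and not commented out, the crontab listing can be filtered as sketched below. The function name `cron_has` is hypothetical. It reads `crontab -l` output on stdin and assumes the standard five time fields followed by the command path.

```shell
# cron_has SCRIPT
# Reads a crontab listing on stdin and succeeds if an active (uncommented)
# entry runs a command whose file name is exactly SCRIPT.
cron_has() {
    awk -v script="$1" '
        /^[[:space:]]*#/ { next }              # skip commented-out entries
        NF >= 6 {
            n = split($6, parts, "/")          # field 6 is the command path
            if (parts[n] == script) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use (commented out so the sketch has no side effects):
#   crontab -l | cron_has tapearc_do      || echo "tapearc_do cron job missing"
#   crontab -l | cron_has tapearc_do_dcs1 || echo "tapearc_do_dcs1 cron job missing"
```

The match is on the exact file name, so `tapearc_do` and `tapearc_do_dcs1` are told apart.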
138
139 11. The t50view program should be running to display/control the
140 tape operations.
141
142 > t50view -i jsocdc
143
144 The -i means interactive mode, which will allow you to change tapes.
145
146 12. Every 2 days, inspect the t50 display for the button on the top row
147 called 'Imp/Exp'. If it is nonzero (and yellow), then some full tapes can be
148 exported from the T50 and new tapes put in for further archiving.
149 Hit the 'Imp/Exp' button.
150 Follow explicitly all the directions.
151 The blank L4 tapes are in the tape room in the computer room.
152
153 13. Other background info is in:
154
155 http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html
156
157
158
159 Level 0 Backend:
160 --------------------------
161
162 1. As mentioned above, the datacapture machines automatically copy DDS input
163 data to the pipeline backend on /dds/socdc living on d01.
164
165 2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
166 which has d01:/dds mounted. cl1n001 can be accessed through j1.
167
168 3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
169 /home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl
170
171 If you want to start afresh, kill any ingest_lev0 running (will later be
172 automated). Then do:
173
174 > cd /home/production/cvs/JSOC/proj/lev0/apps
175 > start_lev0.pl
176
177 You will see 4 instances started and the log file names can be seen.
178 You will be advised that to cleanly stop the lev0 processing, run:
179
180 > stop_lev0.pl
181
182 It may take a while for all the ingest_lev0 processes to get to a point
183 where they can stop cleanly.
184
185 For now, every hour, the ingest_lev0 processes are automatically restarted.
186
187
188 4. The output is for the series:
189
190 hmi.tlmd
191 hmi.lev0d
192 aia.tlmd
193 aia.lev0d
194
195 #It is all save in DRMS and archived.
196 Only the tlmd is archived. (see below if you want to change the
197 archiving status of a dataseries)
198
199 5. If something in the backend goes down such that you can't run
200 ingest_lev0, then you may want to start this cron job that will
201 periodically clean out the /dds/socdc dir of the files that are
202 coming in from the datacapture systems.
203
204 > crontab -l
205 # DO NOT EDIT THIS FILE - edit the master and reinstall.
206 # (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
207 # (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.4 2008/10/16 18:33:06 production Exp $)
208 #0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm
209
210 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
211
212 Starting and stopping SUMS on d02:
213
214 Login as production on d02
215 sum_start_d02
216
217 (if sums is already running it will ask you if you want to halt it.
218 you normally say 'y'.)
219
220 sum_stop_d02
221 if you just want to stop sums.
222
223 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
224
225 SUMS archiving:
226
227 Currently SUM is archiving continuously. The script is:
228
229 /home/production/cvs/JSOC/base/sums/scripts/tape_do.pl
230
231 To halt it do:
232
233 touch /usr/local/logs/tapearc/TAPEARC_ABORT
234
235 Try to keep it running, as there is still much to be archived.
236
237 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
238
239 Change archiving status of a dataseries:
240
241 > psql -h hmidb jsoc
242
243 jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
244 UPDATE 1
245 jsoc=> \q
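
Typing the update by hand invites quoting mistakes. The sketch below builds the statement from a series name and a flag value and prints it, so it can be reviewed before being piped to psql. The function name `archive_sql` is hypothetical. It also assumes that the table to update is always `<namespace>.drms_series`, where the namespace is the part of the series name before the dot. That matches the hmi.lev0c example above, but it is an assumption for other namespaces.

```shell
# archive_sql SERIES FLAG
# Print the SQL that sets the archive flag for SERIES (e.g. hmi.lev0c).
# Rejects names that are not of the form namespace.name, and non-numeric flags.
archive_sql() {
    series="$1"; flag="$2"
    case "$flag" in
        [0-9]|-[0-9]) ;;
        *) echo "flag must be a small integer" >&2; return 1 ;;
    esac
    case "$series" in
        ""|*[!A-Za-z0-9_.]*|*.*.*|.*|*.) echo "bad series name" >&2; return 1 ;;
        *.*) ;;
        *) echo "bad series name" >&2; return 1 ;;
    esac
    ns="${series%%.*}"      # namespace: text before the first dot
    printf "update %s.drms_series set archive=%s where seriesname='%s';\n" \
        "$ns" "$flag" "$series"
}

# Typical use (commented out so the sketch has no side effects):
#   archive_sql hmi.lev0c 0               # review the statement first
#   archive_sql hmi.lev0c 0 | psql -h hmidb jsoc
```

Restricting the series name to letters, digits, underscore and a single dot keeps stray quotes out of the generated statement.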
246