/home/production/cvs/JSOC/doc/whattodolev0.txt      16Oct2008


------------------------------------------------------
Running Datacapture & Pipeline Backend lev0 Processing
------------------------------------------------------


NOTE: For now, this is all done from the xim workstation (in Jim's office).

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

1. The Datacapture systems for aia/hmi are by convention dcs0/dcs1 respectively.
If the spare dcs2 is to be put in place, it is renamed dcs0 or dcs1, and the
original machine is renamed dcs2.

1a. The spare dcs2 normally serves as a backup destination for the Postgres
databases running on dcs0 and dcs1. You should see one of these Postgres cron
jobs on dcs0 and on dcs1, respectively:

0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, this must be done on dcs2 after any reboot:

> ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
> chmod 600 /var/lib/pgsql/ssh-agent.env
> source /var/lib/pgsql/ssh-agent.env
> ssh-add
(The password is written on my whiteboard (same as production's))

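A quick way to confirm the agent on dcs2 is usable (a convenience check, not
part of the official procedure):

> source /var/lib/pgsql/ssh-agent.env
> ssh-add -l        (should list the key; "The agent has no identities." means ssh-add was not done)
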
2. Login as user production via j0. (The password is on Jim's whiteboard.)

3. Postgres must be running; it is started automatically on boot:

> ps -ef |grep pg
postgres 4631 1 0 Mar11 ? 00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

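If the postmaster is not running after a reboot, it can usually be started as
below; the init-script path is an assumption about these machines rather than
something documented here:

> /etc/init.d/postgresql start                 (as root, stock init script)
> pg_ctl start -D /var/lib/pgsql/data          (or, as the postgres user)
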
4. The root of the datacapture tree is /home/production/cvs/JSOC.
The production user runs as user id 388.

5. The sum_svc is normally running:

> ps -ef |grep sum_svc
388 26958 1 0 Jun09 pts/0 00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture GUI that will display the data, mark it for archive,
optionally extract lev0, and send it on to the pipeline backend, do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI" (or for AIA),
depending on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.

NOTE: If you turn lev0 on, processing becomes sensitive to the incoming data,
and you may see errors like the following, in which case you have to restart
socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

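To quickly see whether lev0 extraction is currently on, and whether images are
arriving, you can check the flag file and /tmp/jim (just a convenience check):

> ls -l /usr/local/logs/soc/LEV0FILEON        (present = lev0 extraction is on)
> ls -lt /tmp/jim | head                      (most recently extracted images)
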
9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc living on d01. This is done by the program:

> ps -ef |grep soc_pipe_scp
388 21529 21479 0 Jun09 pts/0 00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /tmp/ssh-agent.env
> chmod 600 /tmp/ssh-agent.env
> source /tmp/ssh-agent.env
> ssh-add
(The password is written on my whiteboard)

NOTE: cron jobs use this /tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /tmp/ssh-agent.env

NOTE: On any one machine there should be just one ssh-agent running for
user production.

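To confirm the agent is actually good, you can list the loaded keys and try a
passwordless connection to the backend host (d01i here, matching the
soc_pipe_scp arguments above; this is just a check, not part of the startup
procedure):

> source /tmp/ssh-agent.env
> ssh-add -l                          (should list the production key)
> ssh -o BatchMode=yes d01i true      (should return with no password prompt)
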
If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error msg on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

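To make sure no soc_pipe_scp is left over before restarting socdc:

> ps -ef | grep soc_pipe_scp | grep -v grep
> kill <pid>        (only if an old soc_pipe_scp shows up; <pid> is from the ps output)
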
If you find that there is a hostname for production that is not in the
/home/production/.ssh/authorized_keys file, then do this on the host that
you want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and put it in this file on the host that you want to have access to
(make sure that it's all one line):

/home/production/.ssh/authorized_keys

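One way to append the key in a single step, so it stays on one line (a sketch
only; replace <host> with the host you want to have access to):

> cat /home/production/.ssh/id_rsa.pub | ssh production@<host> 'cat >> ~/.ssh/authorized_keys'
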
NOTE: DO NOT do a ssh-keygen or you will have to update all the hosts'
authorized_keys files with the new public key you just generated.

If not already active, then do what's shown above for the ssh-agent.


10. There should be a cron job running that will archive to the T50 tapes.
Note the script names differ between dcs0 and dcs1:

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

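To confirm which of the two entries is installed on the machine you are on:

> crontab -l | grep tapearc_do
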
11. The t50view program should be running to display/control the tape
operations:

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50view display for the button on the top row
called 'Imp/Exp'. If it is non-zero (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.
Hit the 'Imp/Exp' button.
Follow all the directions exactly.
The blank L4 tapes are in the tape room in the computer room.

13. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html



Level 0 Backend:
--------------------------

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend on /dds/socdc living on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 that is running (this will
later be automated; see the check at the end of this step). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> start_lev0.pl

You will see 4 instances started, and the log file names will be shown.
You will be advised that to cleanly stop the lev0 processing, run:

> stop_lev0.pl

It may take a while for all the ingest_lev0 processes to get to a point
where they can stop cleanly.

For now, every hour, the ingest_lev0 processes are automatically restarted.

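A quick way to check for (or, when starting afresh, get rid of) leftover
ingest_lev0 processes; stop_lev0.pl is still the clean way to shut them down:

> ps -ef | grep ingest_lev0 | grep -v grep     (normally 4 instances, one per VC)
> pkill -u production ingest_lev0              (only when you really want to start afresh)
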

4. The output goes to the series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

#It is all saved in DRMS and archived.
Only the tlmd is archived. (See below if you want to change the
archiving status of a dataseries.)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job, which will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems (it is currently commented out):

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.3 2008/10/16 18:29:18 production Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

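To enable that entry, edit the production crontab and uncomment the pipefe_rm
line:

> crontab -e        (remove the leading '#' from the pipefe_rm line, then save)
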
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Login as production on d02, then run:

sum_start_d02

(If SUMS is already running, it will ask you if you want to halt it.
You normally say 'y'.)

Run sum_stop_d02 if you just want to stop SUMS.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do.pl

To halt it do:

touch /usr/local/logs/tapearc/TAPEARC_ABORT

Try to keep it running, as there is still much to be archived.

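The abort request is just a flag file, so you can see whether a halt has been
requested with the check below. (Whether the flag must be removed by hand
before tape_do.pl is restarted is an assumption here, not documented above.)

> ls -l /usr/local/logs/tapearc/TAPEARC_ABORT      (present = a halt has been requested)
> /bin/rm /usr/local/logs/tapearc/TAPEARC_ABORT    (clear it before restarting tape_do.pl)
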
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q
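
To check the current setting before or after the update, from the same psql
session (table and column names as in the update above):

jsoc=> select seriesname, archive from hmi.drms_series where seriesname='hmi.lev0c';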