
		/home/production/cvs/JSOC/doc/whattodolev0.txt  25Nov2008

------------------------------------------------
WARNING!! Some of this is outdated. 3Jun2010
Please see more recent what*.txt files, e.g.
whattodo_start_stop_lev1_0_sums.txt
------------------------------------------------

	------------------------------------------------------
	Running Datacapture & Pipeline Backend lev0 Processing
	------------------------------------------------------


NOTE: For now, this is all done from the xim w/s (Jim's office)

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

FORMERLY: 1. The Datacapture system for aia/hmi is by convention dcs0/dcs1
respectively. If the spare dcs2 is to be put in place, it is renamed dcs0
or dcs1, and the original machine is renamed dcs2.

1. The datacapture machine serving for AIA or HMI is determined by
the entries in:

/home/production/cvs/JSOC/proj/datacapture/scripts/dcstab.txt

This is edited or listed by the program:

/home/production/cvs/JSOC/proj/datacapture/scripts> dcstab.pl -h
Display or change the datacapture system assignment file.
Usage: dcstab [-h][-l][-e]
       -h = print this help message
       -l = list the current file contents
       -e = edit with vi the current file contents

For dcs3 the dcstab.txt would look like:
AIA=dcs3
HMI=dcs3
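
For example, to list the current assignments without opening an editor
(using the -l option documented in the usage above):

> dcstab.pl -l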

1a. The spare dcs2 normally serves as a backup destination for the postgres
databases running on dcs0 and dcs1. You should see one of these postgres
cron jobs on dcs0 and dcs1, respectively:

0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, the following must be done on dcs0, dcs1 and dcs2, as
user postgres, after any reboot:

> ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
> chmod 600 /var/lib/pgsql/ssh-agent.env
> source /var/lib/pgsql/ssh-agent.env
> ssh-add
(The password is written on my whiteboard (same as production's))
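
To confirm the key actually got loaded into the agent (a standard ssh-add
option, not part of the original procedure):

> ssh-add -l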

2. Login as user production via j0. (password is on Jim's whiteboard).

3. Postgres must be running and is started automatically on boot:

> ps -ef |grep pg
postgres  4631     1  0 Mar11 ?        00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
The production user runs as user id 388.

5. The sum_svc is normally running:

> ps -ef |grep sum_svc
388      26958     1  0 Jun09 pts/0    00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture gui that will display the data, mark it for archive,
optionally extract lev0 and send it on to the pipeline backend, do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI" (or AIA),
depending on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON
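
Since the flag is simply the presence of that file, a quick way to check
the current state:

> ls -l /usr/local/logs/soc/LEV0FILEON
(if the file is listed, lev0 extraction is on; "No such file" means off)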

The last 100 images for each VC are kept in /tmp/jim.

NOTE: If you turn lev0 on, the processing becomes sensitive to the incoming
data, and you may see failures like this, in which case you have to
restart socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc living on d01. This is done by the program:

>  ps -ef |grep soc_pipe_scp
388      21529 21479  0 Jun09 pts/0    00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /var/tmp/ssh-agent.env
> chmod 600 /var/tmp/ssh-agent.env
> source /var/tmp/ssh-agent.env
> ssh-add	(or for sonar: ssh-add /home/production/.ssh/id_rsa)
(The password is written on my whiteboard)

NOTE: cron jobs use this /var/tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /var/tmp/ssh-agent.env

NOTE: on any one machine for user production there should be just one
ssh-agent running.
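
To verify there is only one:

> ps -ef |grep ssh-agent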

If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error msg on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

If you find that there is a hostname for production that is not in the
/home/production/.ssh/authorized_keys file then do this on the host that
you want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and put it in this file on the host that you want to have access to
(make sure that it's all one line):

/home/production/.ssh/authorized_keys
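
One way to append the key in a single step (a sketch; <target-host> is a
placeholder, and this assumes you can still ssh to that host, e.g. with a
password):

> cat /home/production/.ssh/id_rsa.pub | ssh production@<target-host> 'cat >> /home/production/.ssh/authorized_keys'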

NOTE: DO NOT do a ssh-keygen or you will have to update all the hosts'
authorized_keys with the new public key you just generated.

If not already active, then do what's shown above for the ssh-agent.


10. There should be a cron job running that will archive to the T50 tapes.
Note the names are asymmetric for dcs0 and dcs1.

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

In the beginning of the world, before any sum_start_dc, the T50 should have
a supply of blank tapes in its active slots (1-24). A cleaning tape must
be in slot 25. The imp/exp slots (26-30) must be vacant.
To see the contents of the T50 before startup do:

> mtx -f /dev/t50 status
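
The output looks roughly like this (illustrative only; the slot states and
volume tags will differ):

  Storage Changer /dev/t50:1 Drives, 30 Slots ( 5 Import/Export )
Data Transfer Element 0:Empty
      Storage Element 1:Full :VolumeTag=000684L4
      Storage Element 2:Empty
      ...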

Whenever sum_start_dc is called, all the tapes are inventoried and added
to the SUMS database if necessary.
When a tape is written full by the tapearc_do cron job, the t50view
display (see 11. and 12. below) 'Imp/Exp' button will increment its
count. Tapes should be exported before the count gets above 5.

11. The t50view program should be running to display/control the tape
operations.

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is non-0 (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.

Hit the 'Imp/Exp' button.
Follow explicitly all the directions.
The blank L4 tapes are in the tape room in the computer room.

When the tape drive needs cleaning, hit the "Start Cleaning" button on
the t50view gui.

13. There should be a cron job running as user production on both dcs0 and
dcs1 that will set the Offsite_Ack field in the sum_main DB table.

20 0 * * * /home/production/tape_verify/scripts/set_sum_main_offsite_ack.pl

Where:
#/home/production/tape_verify/scripts/set_sum_main_offsite_ack.pl
#
#This reads the .ver files produced by Tim's
#/home/production/tape_verify/scripts/run_remote_tape_verify.pl
#A .ver file looks like:
## Offsite verify offhost:dds/off2ds/HMI_2008.06.11_01:12:27.ver
## Tape   0=success 0=dcs0(aia)
#000684L4 0         1
#000701L4 0         1
##END
#For each tape that has been verified successfully, this program
#sets the Offsite_Ack to 'Y' in the sum_main for all entries
#with Arch_Tape = the given tape id.
#
#The machine names where AIA and HMI processing live
#is found in dcstab.txt which must be on either dcs0 or dcs1
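
Per the description above, the net effect for each successfully verified
tape is an update like this on the dcs's jsocdc database (a sketch only;
the script issues these itself, and lowercase column names are assumed):

> psql jsocdc
jsocdc=> update sum_main set offsite_ack='Y' where arch_tape='000684L4';
jsocdc=> \q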

14. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html

***************************dcs3*********************************************
NOTE: dcs3 (i.e. offsite datacapture machine shipped to Goddard Nov 2008)

At Goddard the dcs3 host name will be changed. See the following for
how to accommodate this:

/home/production/cvs/JSOC/doc/dcs3_name_change.txt

This cron job must be run to clean out the /dds/soc2pipe/[aia,hmi]:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/production/cvs/JSOC/proj/datacapture/scripts/rm_soc2pipe.pl

Also on dcs3 the offsite_ack check and safe tape check are not done in:
/home/production/cvs/JSOC/base/sums/libs/pg/SUMLIB_RmDo.pgc

Also on dcs3, because there is no pipeline backend, no .arc file is ever
made for the DDS.
***************************dcs3*********************************************

Level 0 Backend:
--------------------------

!! Make sure to run Phil's script for watchlev0 in the background on cl1n001:
/home/production/cvs/JSOC/base/sums/scripts/get_dcs_times.csh
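
For example, to launch it in the background:

> /home/production/cvs/JSOC/base/sums/scripts/get_dcs_times.csh &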

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend on /dds/socdc living on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 running (this will later
be automated). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> doingestlev0.pl     (actually a link to start_lev0.pl)

You will see 4 instances started, along with their log file names.
You will be advised that to cleanly stop the lev0 processing, run:

> stop_lev0.pl

It may take awhile for all the ingest_lev0 processes to get to a point
where they can stop cleanly.
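
To see which instances are still up while waiting (same ps idiom as used
elsewhere in this file):

> ps -ef |grep ingest_lev0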

For now, every hour, the ingest_lev0 processes are automatically restarted.


4. The output is for the series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

It is all saved in DRMS, but only the tlmd series are archived. (See below
if you want to change the archiving status of a dataseries.)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job that will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems. (The entry is shown commented
out below; uncomment it via crontab -e to enable it.)

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.8 2009/08/03 18:24:23 production Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Login as production on d02
sum_start_d02

(if sums is already running it will ask you if you want to halt it.
You normally say 'y'.)

sum_stop_d02
if you just want to stop sums.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do_0.pl  (and _1, _2, _3)

To halt it do:

touch /usr/local/logs/tapearc/TAPEARC_ABORT0   (likewise TAPEARC_ABORT1,
TAPEARC_ABORT2 for the other instances)

Try to keep it running, as there is still much to be archived.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q
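
To check a series' current setting first, a select on the same table
works (a sketch):

jsoc=> select seriesname, archive from hmi.drms_series where seriesname='hmi.lev0c';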

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The modified dcs reboot procedure is in ~kehcheng/dcs.reboot.notes.
