
/home/production/cvs/JSOC/doc/whattodolev0.txt  07Nov2008


	------------------------------------------------------
	Running Datacapture & Pipeline Backend lev0 Processing
	------------------------------------------------------


NOTE: For now, this is all done from the xim w/s (Jim's office)

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each datacapture machine has its
own independent /home/production.

1. The datacapture machines for aia/hmi are, by convention, dcs0/dcs1
respectively. If the spare dcs2 is to be put in place, it is renamed dcs0
or dcs1, and the original machine is renamed dcs2.

1a. The spare dcs2 normally serves as a backup destination for the postgres
running on dcs0 and dcs1. You should see these postgres cron jobs on dcs0
and dcs1, respectively:

0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, this must be done on dcs0, dcs1 and dcs2, as user
postgres, after any reboot:

> ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
> chmod 600 /var/lib/pgsql/ssh-agent.env
> source /var/lib/pgsql/ssh-agent.env
> ssh-add
(The password is written on my whiteboard (same as production's))

2. Login as user production via j0. (password is on Jim's whiteboard).

3. Postgres must be running; it is started automatically on boot:

> ps -ef | grep pg
postgres  4631     1  0 Mar11 ?        00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
Production runs as user id 388.

5. The sum_svc is normally running:

> ps -ef | grep sum_svc
388      26958     1  0 Jun09 pts/0    00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture gui that will display the data, mark it for
archive, optionally extract lev0, and send it on to the pipeline backend,
do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI" (or AIA),
depending on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.

NOTE: If you turn lev0 on, the processing becomes sensitive to the data
content, and you may see things like this, in which case you have to
restart socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc, which lives on d01. This is done by the
program:

> ps -ef | grep soc_pipe_scp
388      21529 21479  0 Jun09 pts/0    00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /tmp/ssh-agent.env
> chmod 600 /tmp/ssh-agent.env
> source /tmp/ssh-agent.env
> ssh-add
(The password is written on my whiteboard)

NOTE: cron jobs use this /tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /tmp/ssh-agent.env

NOTE: on any one machine, for user production, there should be just one
ssh-agent running.

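The env-file trick works because ssh-agent prints shell commands that set
SSH_AUTH_SOCK and SSH_AGENT_PID (the `head -2` keeps just those two lines);
any later shell or cron job can source the saved file to reach the same
agent. A minimal sketch of the pattern using a stand-in file (the socket
value below is made up for illustration):

```shell
# Demonstrate the saved-agent-environment pattern with a stand-in file.
# A real ssh-agent emits lines shaped like the one echoed below.
ENVFILE=$(mktemp)                     # stands in for /tmp/ssh-agent.env
echo 'SSH_AUTH_SOCK=/tmp/agent.12345; export SSH_AUTH_SOCK;' > "$ENVFILE"
chmod 600 "$ENVFILE"                  # keep the socket path private
. "$ENVFILE"                          # what 'source /tmp/ssh-agent.env' does
echo "agent socket: $SSH_AUTH_SOCK"   # any process sourcing the file sees it
rm -f "$ENVFILE"
```

Sourcing the same file from a second window (or a cron job) is all it takes
to reuse the one running agent.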
If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error message on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

If there is a host that user production cannot reach because it is missing
from the /home/production/.ssh/authorized_keys file, then do this on the
host that you want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and put it in this file on the host that you want to have access to
(make sure that it's all one line):

/home/production/.ssh/authorized_keys

NOTE: DO NOT do a ssh-keygen or you will have to update all the hosts'
authorized_keys files with the new public key you just generated.

If not already active, then do what's shown above for the ssh-agent.

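The key-copying step above amounts to appending one line (the contents of
id_rsa.pub) to authorized_keys on the target host. A sketch of the
mechanics using temporary stand-in files (the key text and paths are
placeholders, not the real production key):

```shell
# Append a (fake) one-line public key to a stand-in authorized_keys file.
KEY_SRC=$(mktemp)   # stands in for /home/production/.ssh/id_rsa.pub
AUTH=$(mktemp)      # stands in for authorized_keys on the target host
echo 'ssh-rsa AAAAB3...placeholder production@dcs0' > "$KEY_SRC"
cat "$KEY_SRC" >> "$AUTH"   # the entry must remain a single line
chmod 600 "$AUTH"           # sshd is picky about key file permissions
COUNT=$(grep -c '^ssh-rsa' "$AUTH")
echo "$COUNT key line(s) installed"
rm -f "$KEY_SRC" "$AUTH"
```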
10. There should be a cron job running that will archive to the T50 tapes.
Note the names are asymmetric for dcs0 and dcs1:

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

In the beginning of the world, before any sum_start_dc, the T50 should have
a supply of blank tapes in its active slots (1-24). A cleaning tape must
be in slot 25. The imp/exp slots (26-30) must be vacant.
To see the contents of the T50 before startup do:

> mtx -f /dev/t50 status

Whenever sum_start_dc is called, all the tapes are inventoried and added
to the SUMS database if necessary.
When a tape is written full by the tapearc_do cron job, the 'Imp/Exp'
button on the t50view display (see 11. and 12. below) will increment its
count. Tapes should be exported before the count gets above 5.

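As a quick sanity check before startup, the `mtx ... status` listing can be
scanned for occupied slots. A sketch against a canned sample in the general
shape mtx prints (the sample inventory is invented, not from a real T50):

```shell
# Count occupied storage slots in an mtx-status-style listing.
# SAMPLE only illustrates the line shape; real output comes from
# 'mtx -f /dev/t50 status'.
SAMPLE='  Storage Element 1:Full :VolumeTag=000001L4
  Storage Element 2:Empty
  Storage Element 3:Full :VolumeTag=000003L4'
FULL=$(printf '%s\n' "$SAMPLE" | grep -c ':Full')
echo "$FULL occupied slot(s)"
```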
11. The t50view program should be running to display/control the tape
operations:

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is non-zero (and yellow), then some full tapes can
be exported from the T50 and new tapes put in for further archiving.

Hit the 'Imp/Exp' button.
Follow all the directions exactly.
The blank L4 tapes are in the tape room in the computer room.

When the tape drive needs cleaning, hit the "Start Cleaning" button on
the t50view gui.

13. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html

***************************dcs3*********************************************
NOTE: dcs3 (i.e. the offsite datacapture machine shipped to Goddard Nov 2008)

At Goddard the dcs3 host name will be changed. See the following for how to
accommodate this:

/home/production/cvs/JSOC/doc/dcs3_name_change.txt

This cron job must be run to clean out /dds/soc2pipe/[aia,hmi]:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/production/cvs/JSOC/proj/datacapture/scripts/rm_soc2pipe.pl

Also on dcs3, the offsite_ack check and safe tape check are not done in:
/home/production/cvs/JSOC/base/sums/libs/pg/SUMLIB_RmDo.pgc

Also on dcs3, because there is no pipeline backend, no .arc file is ever
made for the DDS.
***************************dcs3*********************************************

Level 0 Backend:
--------------------------

1. As mentioned above, the datacapture machines automatically copy DDS
input data to the pipeline backend on /dds/socdc, which lives on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 running (this will
later be automated). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> start_lev0.pl

You will see 4 instances started, and the log file names can be seen.
You will be advised that to cleanly stop the lev0 processing, run:

> stop_lev0.pl

It may take a while for all the ingest_lev0 processes to get to a point
where they can stop cleanly.

For now, every hour, the ingest_lev0 processes are automatically restarted.

4. The output is for the series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

It is all saved in DRMS, but only the tlmd series are archived. (See below
if you want to change the archiving status of a dataseries.)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job that will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems.

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.5 2008/10/16 18:46:58 production Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Login as production on d02, then:

> sum_start_d02

(If SUMS is already running, it will ask you if you want to halt it.
You normally say 'y'.)

If you just want to stop SUMS:

> sum_stop_d02

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do.pl

To halt it do:

> touch /usr/local/logs/tapearc/TAPEARC_ABORT

Try to keep it running, as there is still much to be archived.

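tape_do.pl halts by noticing the abort file created above; the same
touch/remove flag-file pattern can be sketched with a stand-in path (the
real flag is /usr/local/logs/tapearc/TAPEARC_ABORT):

```shell
# Flag-file pattern: a long-running job periodically checks for the file.
FLAG=$(mktemp -u)   # stand-in path; -u gives a name without creating it
check() { [ -f "$FLAG" ] && echo "halt requested" || echo "keep archiving"; }
check               # prints "keep archiving" (no flag yet)
touch "$FLAG"       # equivalent of touching TAPEARC_ABORT
check               # prints "halt requested"
rm -f "$FLAG"       # removing the flag lets archiving be restarted
```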
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q

Karen Tian