
/home/production/cvs/JSOC/doc/whattodolev0.txt  16Oct2008


	------------------------------------------------------
	Running Datacapture & Pipeline Backend lev0 Processing
	------------------------------------------------------


NOTE: For now, this is all done from the xim w/s (Jim's office)

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

1. The Datacapture system for aia/hmi is by convention dcs0/dcs1 respectively.
If the spare dcs2 is to be put in place, it is renamed dcs0 or dcs1, and the
original machine is renamed dcs2.

1a. The spare dcs2 normally serves as a backup destination for the postgres
databases running on dcs0 and dcs1. You should see these postgres cron jobs
on dcs0 and dcs1, respectively:

0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs0_to_dcs2.pl
0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, the following must be done on dcs2 after any reboot:

> ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
> chmod 600 /var/lib/pgsql/ssh-agent.env
> source /var/lib/pgsql/ssh-agent.env
> ssh-add
(The password is written on my whiteboard (same as production's))

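
If you want to confirm the agent is usable from a new shell (an optional
check, not part of the original procedure), source the env file and list the
loaded key:

> source /var/lib/pgsql/ssh-agent.env
> ssh-add -l
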
2. Login as user production via j0. (password is on Jim's whiteboard).

3. Postgres must be running; it is started automatically on boot:

> ps -ef |grep pg
postgres  4631     1  0 Mar11 ?        00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
The production user runs as user id 388.

5. The sum_svc is normally running:

> ps -ef |grep sum_svc
388      26958     1  0 Jun09 pts/0    00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

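
A quick way to confirm that the jsocdc database exists on the local postgres
(an optional check; assumes user production can connect locally):

> psql -l | grep jsocdc
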
6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture GUI that will display the data, mark it for archive,
optionally extract lev0, and send it on to the pipeline backend, do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI", or the AIA
equivalent, depending on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.

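
To check whether lev0 extraction is currently enabled (a small convenience
check based on the flag file above):

> test -f /usr/local/logs/soc/LEV0FILEON && echo "lev0 extraction ON" || echo "lev0 extraction OFF"
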
NOTE: If you turn lev0 on, the processing becomes sensitive to the incoming
data, and you may see errors like this, in which case you have to restart
socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc living on d01. This is done by the program:

>  ps -ef |grep soc_pipe_scp
388      21529 21479  0 Jun09 pts/0    00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /tmp/ssh-agent.env
> chmod 600 /tmp/ssh-agent.env
> source /tmp/ssh-agent.env
> ssh-add
(The password is written on my whiteboard)

NOTE: cron jobs use this /tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /tmp/ssh-agent.env

NOTE: on any one machine for user production there should be just one
ssh-agent running.

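
A quick way to verify that only one agent is running for production (an
optional check):

> ps -ef | grep ssh-agent | grep production
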

If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error message on d01 like 'invalid user production'.
You should exit the socdc, make sure there is no soc_pipe_scp still running,
and then restart the socdc.

If a host that production needs to reach does not yet have our key in its
/home/production/.ssh/authorized_keys file, do this on the host that you
want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and append it to this file on the host that you want to have access to
(make sure it stays all on one line):

/home/production/.ssh/authorized_keys

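
For example, the key can be appended from the dcs in one step (a sketch; the
target hostname is a placeholder, and you may be prompted for the password
this one time):

> cat /home/production/.ssh/id_rsa.pub | ssh production@<target-host> 'cat >> /home/production/.ssh/authorized_keys'
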
NOTE: DO NOT do an ssh-keygen, or you will have to update every host's
authorized_keys with the new public key you just generated.

If not already active, then do what's shown above for the ssh-agent.


10. There should be a cron job running that will archive to the T50 tapes.
Note that the script names differ between dcs0 and dcs1:

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

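
To confirm the job is installed on the dcs you are on (an optional check):

> crontab -l | grep tapearc
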
11. The t50view program should be running to display and control the
tape operations:

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is non-zero (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.
Hit the 'Imp/Exp' button.
Follow all the directions exactly.
The blank L4 tapes are in the tape room in the computer room.

13. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html



Level 0 Backend:
--------------------------

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend on /dds/socdc living on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any running ingest_lev0 processes (this
will later be automated). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> start_lev0.pl

You will see 4 instances started, along with their log file names.
You will also be advised that, to cleanly stop the lev0 processing, you
should run:

> stop_lev0.pl

It may take a while for all the ingest_lev0 processes to get to a point
where they can stop cleanly.

For now, the ingest_lev0 processes are automatically restarted every hour.

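
To see how many ingest_lev0 instances are currently running (normally 4, one
per VC):

> ps -ef | grep ingest_lev0 | grep -v grep
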

4. The output goes to these series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

It is all saved in DRMS, but only the tlmd series are archived. (See below
if you want to change the archiving status of a dataseries.)

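
To check which of these series are currently flagged for archiving (a hedged
example; hmi.drms_series is the same table used in the 'Change archiving
status of a dataseries' section below, and the aia series presumably live in
an analogous aia.drms_series table):

> psql -h hmidb jsoc
jsoc=> select seriesname, archive from hmi.drms_series where seriesname in ('hmi.tlmd','hmi.lev0d');
jsoc=> \q
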
5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job, which will
periodically clean out the /dds/socdc dir of the files coming in
from the datacapture systems:

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.3 2008/10/16 18:29:18 production Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

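
To enable it (a hedged note: presumably this is just a matter of uncommenting
the entry), edit production's crontab and remove the leading '#' so the line
reads as in the original crontab above:

> crontab -e
0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm
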
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Login as production on d02 and run:

sum_start_d02

(If SUMS is already running, it will ask you if you want to halt it;
you normally say 'y'.)

If you just want to stop SUMS, run:

sum_stop_d02

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do.pl

To halt it do:

touch /usr/local/logs/tapearc/TAPEARC_ABORT

Try to keep it running, as there is still much to be archived.

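
A quick check on whether the archiving script is still running (an optional
check):

> ps -ef | grep tape_do | grep -v grep
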
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q

