/home/production/cvs/JSOC/doc/whattodolev0.txt  28Jul2008


        ------------------------------------------------------
        Running Datacapture & Pipeline Backend lev0 Processing
        ------------------------------------------------------


NOTE: For now, this is all done from the xim workstation (Jim's office).

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

1. The Datacapture systems for aia/hmi are by convention dcs0/dcs1 respectively.
If the spare dcs2 is to be put in place, it is renamed dcs0 or dcs1, and the
original machine is renamed dcs2.

2. Log in as user production via j0. (The password is on Jim's whiteboard.)

3. Postgres must be running; it is started automatically on boot:

> ps -ef |grep pg
postgres  4631     1  0 Mar11 ?        00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
Production runs as user id 388.

5. The sum_svc is normally running:

> ps -ef |grep sum_svc
388      26958     1  0 Jun09 pts/0    00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.
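
Note that "ps -ef | grep name" can also list the grep command itself. As a
minimal sketch (not part of the JSOC scripts), the checks in steps 3 and 5
could be wrapped in a helper that matches on the exact command name instead.
The function name is_running is hypothetical, and the helper assumes a ps
that accepts -e and -o comm=, as the Linux procps version does.

```shell
#!/bin/sh
# Hypothetical helper (not part of the JSOC tree): succeed if a process
# whose command name is exactly $1 is running. Matching the whole command
# name avoids the "grep matches itself" problem of `ps -ef | grep name`.
is_running() {
    # -F: fixed string, -x: whole line must match, -q: no output
    ps -e -o comm= | grep -qxF -- "$1"
}

# Report the two daemons that steps 3 and 5 expect to find.
for name in postmaster sum_svc; do
    if is_running "$name"; then
        echo "$name: running"
    else
        echo "$name: NOT running"
    fi
done
```

Command names are truncated by the kernel (15 characters on Linux), so this
only works for short names such as the two used here.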

6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture gui that will display the data, mark it for archive,
optionally extract lev0 and send it on to the pipeline backend, do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI" or AIA, depending
on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.

NOTE: If you turn lev0 on, processing becomes sensitive to the data content,
and you may see errors like this, in which case you have to restart socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified
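
The on/off switch in step 8 is just the presence of a flag file. As a rough
sketch, the touch and rm commands could be wrapped so that the current state
can also be queried. The function names and the LEV0_FLAG override variable
are hypothetical; socdc itself is only known (from the text above) to use
/usr/local/logs/soc/LEV0FILEON.

```shell
#!/bin/sh
# Path of the flag file from step 8. LEV0_FLAG is a hypothetical override
# (not read by socdc) so the helpers can be tried on a scratch path.
lev0_flag() {
    echo "${LEV0_FLAG:-/usr/local/logs/soc/LEV0FILEON}"
}

# Print "on" if the flag file exists, "off" otherwise.
lev0_status() {
    if [ -e "$(lev0_flag)" ]; then echo on; else echo off; fi
}

# Create or remove the flag file, like the touch and rm commands in step 8.
lev0_on()  { touch "$(lev0_flag)"; }
lev0_off() { rm -f "$(lev0_flag)"; }

# Querying the state does not change anything.
echo "lev0 extraction is $(lev0_status)"
```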

9. The datacapture machines automatically copy DDS input data to the
pipeline backend on /dds/socdc living on d01. This is done by the program:

>  ps -ef |grep soc_pipe_scp
388      21529 21479  0 Jun09 pts/0    00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /tmp/ssh-agent.env
> chmod 600 /tmp/ssh-agent.env
> source /tmp/ssh-agent.env
> ssh-add
(The password is written on my whiteboard)

NOTE: cron jobs use this /tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /tmp/ssh-agent.env

NOTE: On any one machine there should be just one ssh-agent running for
user production.
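
To help keep to one agent per machine, it can be useful to test whether the
agent recorded in /tmp/ssh-agent.env still answers before starting another.
The following is a sketch only: agent_alive is a hypothetical name, and it
assumes the env file is in Bourne-shell syntax (if production's login shell
is csh or tcsh, ssh-agent writes setenv lines and this would need adjusting).
It relies on the documented exit status of "ssh-add -l": 0 or 1 when an
agent answers, 2 when none can be contacted.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the agent described by the env file
# (default /tmp/ssh-agent.env, as above) can still be contacted.
agent_alive() {
    env_file="${1:-/tmp/ssh-agent.env}"
    [ -r "$env_file" ] || return 1
    command -v ssh-add >/dev/null 2>&1 || return 1
    (
        # Load SSH_AUTH_SOCK / SSH_AGENT_PID in a subshell only.
        . "$env_file" >/dev/null 2>&1
        ssh-add -l >/dev/null 2>&1
        rc=$?
        # 0 = keys loaded, 1 = agent answers but holds no keys,
        # 2 = no agent could be contacted.
        [ "$rc" -eq 0 ] || [ "$rc" -eq 1 ]
    )
}

if agent_alive; then
    echo "existing ssh-agent is usable: source /tmp/ssh-agent.env"
else
    echo "no usable ssh-agent: start one as shown above"
fi
```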


If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error message on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

If you find that a host is missing from production's
/home/production/.ssh/authorized_keys file, then do this on the host that
you want to add:

Pick up the entry in /home/production/.ssh/id_rsa.pub
and put it in this file on the host that you want to have access to
(make sure that it's all one line):

/home/production/.ssh/authorized_keys

NOTE: DO NOT do a ssh-keygen or you will have to update every host's
authorized_keys with the new public key you just generated.
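
Copying the key by hand makes it easy to wrap the line or add it twice. A
minimal sketch of an append that guards against both is given here; the
function name add_key_once is hypothetical, and the key file is assumed to
have been copied to the target host already.

```shell
#!/bin/sh
# Hypothetical helper: append the public key in file $1 to the
# authorized_keys file $2 unless that exact line is already present.
add_key_once() {
    pub="$1"
    auth="$2"
    # A public key file should hold exactly one newline-terminated line.
    if [ "$(wc -l < "$pub")" -ne 1 ]; then
        echo "$pub: expected exactly one line" >&2
        return 1
    fi
    # Create the target if needed and keep it private to the owner.
    touch "$auth" && chmod 600 "$auth" || return 1
    # -F fixed string, -x whole line, -f take the pattern from the key file.
    grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"
}
```

On the target host it would be called as, for example,
add_key_once /tmp/id_rsa.pub /home/production/.ssh/authorized_keys
where /tmp/id_rsa.pub is a placeholder for wherever the key was copied.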

If not already active, then do what's shown above for the ssh-agent.


10. There should be a cron job running that will archive to the T50 tapes.
Note the names are asymmetric for dcs0 and dcs1.

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

11. The t50view program should be running to display/control the
tape operations.

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is nonzero (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.
Hit the 'Imp/Exp' button.
Follow all the directions exactly.
The blank L4 tapes are in the tape room in the computer room.

13. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html



Level 0 Backend:
--------------------------

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend on /dds/socdc living on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 running (will later be
automated). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> start_lev0.pl

You will see 4 instances started and the log file names can be seen.
You will be advised that to cleanly stop the lev0 processing, run:

> stop_lev0.pl

For now, every hour (might be 1/2hr) the ingest_lev0 processes are
automatically restarted.


4. The output is for the series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

#It is all saved in DRMS and archived.
Only the tlmd is archived. (See below if you want to change the
archiving status of a dataseries.)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job that will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems. The entry is shown commented
out; remove the leading '#' to enable it.

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Log in as production on d02 and run:
sum_start_d02

(If SUMS is already running it will ask you if you want to halt it.
You normally say 'y'.)

If you just want to stop SUMS, run:
sum_stop_d02

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do.pl

To halt it do:

touch /usr/local/logs/tapearc/TAPEARC_ABORT

Try to keep it running, as there is still much to be archived.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q
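
The same update may be needed for other series. As a sketch only, the
statement can be generated from the series name rather than retyped. The
helper name archive_sql is hypothetical, and it assumes that the drms_series
table lives in the namespace given by the part of the series name before the
dot, as it does for hmi.lev0c above.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that sets the archive flag of a series.
# Assumption: <namespace>.drms_series holds the row for <namespace>.<name>.
archive_sql() {
    series="$1"
    flag="$2"
    # Accept only namespace.name made of letters, digits and underscores,
    # so nothing unexpected ends up inside the quoted SQL string.
    case "$series" in
        *[!A-Za-z0-9_.]*|*.*.*|.*|*.) echo "bad series name: $series" >&2; return 1 ;;
        *.*) ;;
        *) echo "bad series name: $series" >&2; return 1 ;;
    esac
    # The archive value must be an integer (the example above uses 0).
    case "$flag" in
        ''|-|*[!0-9-]*|?*-*) echo "archive value must be an integer" >&2; return 1 ;;
    esac
    ns="${series%%.*}"
    printf "update %s.drms_series set archive=%s where seriesname='%s';\n" \
        "$ns" "$flag" "$series"
}

# Reproduces the statement typed at the jsoc=> prompt above.
archive_sql hmi.lev0c 0
```

The printed statement can be pasted at the jsoc=> prompt, or piped into the
same psql command shown above: archive_sql hmi.lev0c 0 | psql -h hmidb jsoc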