		/home/production/cvs/JSOC/doc/whattodolev0.txt  08Oct2008


	------------------------------------------------------
	Running Datacapture & Pipeline Backend lev0 Processing
	------------------------------------------------------


NOTE: For now, this is all done from the xim workstation (Jim's office).

Datacapture:
--------------------------

NOTE: IMPORTANT: Please keep in mind that each data capture machine has its
own independent /home/production.

1. The Datacapture systems for aia/hmi are by convention dcs0/dcs1 respectively.
If the spare dcs2 is to be put in place, it is renamed dcs0 or dcs1, and the
original machine is renamed dcs2.

1a. The spare dcs2 normally serves as a backup destination for the postgres
databases running on dcs0 and dcs1. You should see this postgres cron job:

0,20,40 * * * * /var/lib/pgsql/rsync_pg_dcs1_to_dcs2.pl

For this to work, the following must be done on dcs2 after any reboot:

> ssh-agent | head -2 > /var/lib/pgsql/ssh-agent.env
> chmod 600 /var/lib/pgsql/ssh-agent.env
> source /var/lib/pgsql/ssh-agent.env
> ssh-add
(The password is written on my whiteboard (same as production's))

2. Login as user production via j0. (The password is on Jim's whiteboard.)

3. Postgres must be running; it is started automatically on boot:

> ps -ef |grep pg
postgres  4631     1  0 Mar11 ?        00:06:21 /usr/bin/postmaster -D /var/lib/pgsql/data

4. The root of the datacapture tree is /home/production/cvs/JSOC.
The production user runs as user id 388.

5. The sum_svc is normally running:

> ps -ef |grep sum_svc
388      26958     1  0 Jun09 pts/0    00:00:54 sum_svc jsocdc

Note the SUMS database is jsocdc. This is a separate DB on each dcs.

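A side note on the checks in steps 3 and 5: "ps -ef | grep NAME" can also
match the grep process itself. The following is only a sketch of a stricter
check. It reads "ps -ef" output on stdin and compares just the command name
in column 8. The name has_proc is hypothetical (it is not part of the JSOC
tree), and the sketch assumes the "ps -ef" column layout shown above.

```shell
#!/bin/sh
# has_proc NAME: hypothetical helper, not part of the JSOC tree.
# Reads `ps -ef` output on stdin and succeeds (exit 0) if any line's
# command (column 8, with any directory part stripped) is exactly NAME.
has_proc() {
    awk -v name="$1" '
        { n = split($8, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example use on a dcs machine:
#   ps -ef | has_proc sum_svc    || echo "sum_svc is not running"
#   ps -ef | has_proc postmaster || echo "postgres is not running"
```

Because only the command name is compared, a "grep sum_svc" line does not
count as a match.
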
6. To start/restart the sum_svc and related programs (e.g. tape_svc) do:

> sum_start_dc
sum_start at 2008.06.16_13:32:23
** NOTE: "soc_pipe_scp jsocdc" still running
Do you want me to do a sum_stop followed by a sum_start for you (y or n):

You would normally answer 'y' here.

7. To run the datacapture gui that will display the data, mark it for archive,
optionally extract lev0 and send it on to the pipeline backend, do this:

> cd /home/production/cvs/JSOC/proj/datacapture/scripts
> ./socdc

All you would normally do is hit "Start Instances for HMI" or the AIA
equivalent, depending on which datacapture machine you are on.

8. To optionally extract lev0 do this:

> touch /usr/local/logs/soc/LEV0FILEON

To stop lev0:

> /bin/rm /usr/local/logs/soc/LEV0FILEON

The last 100 images for each VC are kept in /tmp/jim.

NOTE: If you turn lev0 on, processing becomes sensitive to the data content
and you may see errors like this, in which case you have to restart socdc:

ingest_tlm: /home/production/cvs/EGSE/src/libhmicomp.d/decompress.c:1385: decompress_undotransform: Assertion `N>=(6) && N<=(16)' failed.
kill: no process ID specified

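The LEV0FILEON flag file in step 8 can be wrapped so that its state is easy
to query. This is only a sketch: the function name lev0_flag and the
LEV0_FLAG override variable are hypothetical, and the only thing taken from
this document is the flag path itself.

```shell
#!/bin/sh
# lev0_flag on|off|status: hypothetical wrapper around the flag file in step 8.
# LEV0_FLAG may be overridden for dry runs; the default is the path in step 8.
LEV0_FLAG=${LEV0_FLAG:-/usr/local/logs/soc/LEV0FILEON}

lev0_flag() {
    case "$1" in
        on)     touch "$LEV0_FLAG" ;;
        off)    rm -f "$LEV0_FLAG" ;;
        status) if [ -e "$LEV0_FLAG" ]; then echo on; else echo off; fi ;;
        *)      echo "usage: lev0_flag on|off|status" >&2; return 2 ;;
    esac
}
```

"lev0_flag status" prints on or off; it does not tell you whether socdc
itself is healthy.
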
9. The datacapture machines automatically copy DDS input data to the
pipeline backend in /dds/socdc, which lives on d01. This is done by the program:

> ps -ef |grep soc_pipe_scp
388      21529 21479  0 Jun09 pts/0    00:00:13 soc_pipe_scp /dds/soc2pipe/hmi /dds/socdc/hmi d01i 30

This requires that an ssh-agent be running. If you reboot a dcs machine do:

> ssh-agent | head -2 > /tmp/ssh-agent.env
> chmod 600 /tmp/ssh-agent.env
> source /tmp/ssh-agent.env
> ssh-add
(The password is written on my whiteboard)

NOTE: cron jobs use this /tmp/ssh-agent.env file

If you want another window to use the ssh-agent that is already running do:
> source /tmp/ssh-agent.env

NOTE: On any one machine there should be just one ssh-agent running for
user production.


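Since only one ssh-agent should be running, it helps to test the saved env
file before starting another agent. The following is a sketch, not an
existing JSOC script. The name agent_status is hypothetical, and it assumes
the env file is in sh syntax (which is what "ssh-agent -s" writes). It
relies on the documented exit status of "ssh-add -l": 0 when identities are
listed, 1 when the agent holds none, 2 when the agent cannot be contacted.

```shell
#!/bin/sh
# agent_status ENVFILE: hypothetical helper, not part of the JSOC tree.
# Prints one of:
#   missing  - ENVFILE does not exist or is unreadable
#   dead     - ENVFILE exists but no agent answers on its socket
#   nokeys   - an agent is running but holds no identities (run ssh-add)
#   ok       - an agent is running and has at least one identity loaded
agent_status() {
    env_file=$1
    if [ ! -r "$env_file" ]; then
        echo missing
        return 0
    fi
    # Use a subshell so the caller's environment is left untouched.
    (
        . "$env_file" >/dev/null 2>&1
        ssh-add -l >/dev/null 2>&1
        case $? in
            0) echo ok ;;
            1) echo nokeys ;;
            *) echo dead ;;   # 2 = cannot contact agent; also covers a missing ssh-add
        esac
    )
}

# Example use:
#   agent_status /tmp/ssh-agent.env
```

Only when the result is missing or dead is it worth repeating the
ssh-agent steps above.
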
If you see that a dcs has asked for a password, the ssh-agent has failed.
You can probably find an error msg on d01 like 'invalid user production'.
You should exit the socdc. Make sure there is no soc_pipe_scp still running.
Restart the socdc.

If you find that the production key for a host is not in the
/home/production/.ssh/authorized_keys file, then do this on the host that
you want to add:

Copy the entry in /home/production/.ssh/id_rsa.pub
and append it to this file on the host that you want to have access to
(make sure that it is all on one line):

/home/production/.ssh/authorized_keys

NOTE: DO NOT run ssh-keygen, or you will have to update every host's
authorized_keys file with the new public key you just generated.

If the ssh-agent is not already active, then do what's shown above for the ssh-agent.


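The "all on one line" requirement above is easy to get wrong when pasting.
Here is a sketch of a check that a public key is present, verbatim and
unwrapped, in an authorized_keys file. The name key_authorized is
hypothetical, and both files must be on the same machine (copy id_rsa.pub
over first). An entry that was stored with extra options or a different
comment will not match.

```shell
#!/bin/sh
# key_authorized PUBFILE AUTHFILE: hypothetical helper, not part of the JSOC tree.
# Exit 0: the key in PUBFILE appears verbatim as a whole line in AUTHFILE.
# Exit 1: it does not.
# Exit 2: PUBFILE is not exactly one line (probably wrapped by a paste).
key_authorized() {
    pub_file=$1
    auth_file=$2
    lines=$(awk 'END { print NR }' "$pub_file")
    [ "$lines" -eq 1 ] || return 2
    # -x whole-line match, -F fixed string (no regex), -q quiet
    grep -qxF -- "$(cat "$pub_file")" "$auth_file"
}

# Example use:
#   key_authorized /tmp/id_rsa.pub.dcs0 /home/production/.ssh/authorized_keys \
#       || echo "key missing or wrapped"
```
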
10. There should be a cron job running that will archive to the T50 tapes.
Note the names are asymmetric for dcs0 and dcs1.

30 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do

00 0-23 * * * /home/production/cvs/jsoc/scripts/tapearc_do_dcs1

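In the entries above, "30 0-23 * * *" means minute 30 of every hour, every
day, and "00 0-23 * * *" means the top of every hour. The following sketch
checks that an entry is actually active (not commented out) in "crontab -l"
output. The name has_cron_entry is hypothetical, and it assumes the usual
five schedule fields followed by the command.

```shell
#!/bin/sh
# has_cron_entry SCRIPT: hypothetical helper, not part of the JSOC tree.
# Reads `crontab -l` output on stdin and succeeds if an uncommented entry
# runs a command whose file name is exactly SCRIPT.
has_cron_entry() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                  # skip comments and disabled entries
        NF >= 6 {
            n = split($6, parts, "/")        # field 6 is the command path
            if (parts[n] == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example use:
#   crontab -l | has_cron_entry tapearc_do || echo "tapearc_do is not scheduled"
```

The match is on the exact file name, so tapearc_do and tapearc_do_dcs1 are
told apart.
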
11. The t50view program should be running to display/control the
tape operations.

> t50view -i jsocdc

The -i means interactive mode, which will allow you to change tapes.

12. Every 2 days, inspect the t50 display for the button on the top row
called 'Imp/Exp'. If it is nonzero (and yellow), then some full tapes can be
exported from the T50 and new tapes put in for further archiving.
Hit the 'Imp/Exp' button.
Follow all the directions exactly.
The blank L4 tapes are in the tape room in the computer room.

13. Other background info is in:

http://hmi.stanford.edu/development/JSOC_Documents/Data_Capture_Documents/DataCapture.html



Level 0 Backend:
--------------------------

1. As mentioned above, the datacapture machines automatically copy DDS input
data to the pipeline backend in /dds/socdc, which lives on d01.

2. The lev0 code runs as ingest_lev0 on the cluster machine cl1n001,
which has d01:/dds mounted. cl1n001 can be accessed through j1.

3. All 4 instances of ingest_lev0 for the 4 VCs are controlled by
/home/production/cvs/JSOC/proj/lev0/apps/doingestlev0.pl

If you want to start afresh, kill any ingest_lev0 running (will later be
automated). Then do:

> cd /home/production/cvs/JSOC/proj/lev0/apps
> start_lev0.pl

You will see 4 instances started and the log file names can be seen.
You will be advised that to cleanly stop the lev0 processing, run:

> stop_lev0.pl

It may take a while for all the ingest_lev0 processes to get to a point
where they can stop cleanly.

For now, every hour, the ingest_lev0 processes are automatically restarted.


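Because stop_lev0.pl may take a while to take effect, a small polling loop
can tell you when all ingest_lev0 processes have really exited. This is a
sketch only. The name wait_until_gone is hypothetical, and it assumes that
pgrep is installed on cl1n001 and that the process name is ingest_lev0 as
shown above.

```shell
#!/bin/sh
# wait_until_gone NAME [TIMEOUT_SECS]: hypothetical helper, not part of the JSOC tree.
# Polls every 5 seconds until no process named exactly NAME remains.
# Returns 0 once all are gone, 1 if TIMEOUT_SECS (default 300) elapses first.
wait_until_gone() {
    name=$1
    timeout=${2:-300}
    waited=0
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Example use after stop_lev0.pl:
#   wait_until_gone ingest_lev0 600 || echo "ingest_lev0 still running"
```
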
4. The output goes to the series:

hmi.tlmd
hmi.lev0d
aia.tlmd
aia.lev0d

#It is all saved in DRMS and archived.
Only the tlmd is archived. (See below if you want to change the
archiving status of a dataseries.)

5. If something in the backend goes down such that you can't run
ingest_lev0, then you may want to start this cron job that will
periodically clean out the /dds/socdc dir of the files that are
coming in from the datacapture systems.

> crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXVnxDO9 installed on Mon Jun 16 16:38:46 2008)
# (Cron version V5.0 -- $Id: whattodolev0.txt,v 1.1 2008/09/30 17:45:29 production Exp $)
#0,20,40 * * * * /home/jim/cvs/jsoc/scripts/pipefe_rm

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Starting and stopping SUMS on d02:

Login as production on d02
sum_start_d02

(If SUMS is already running, it will ask you if you want to halt it.
You normally say 'y'.)

sum_stop_d02
if you just want to stop SUMS.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SUMS archiving:

Currently SUMS is archiving continuously. The script is:

/home/production/cvs/JSOC/base/sums/scripts/tape_do.pl

To halt it do:

touch /usr/local/logs/tapearc/TAPEARC_ABORT

Try to keep it running, as there is still much to be archived.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Change archiving status of a dataseries:

> psql -h hmidb jsoc

jsoc=> update hmi.drms_series set archive=0 where seriesname='hmi.lev0c';
UPDATE 1
jsoc=> \q

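To avoid typing the UPDATE by hand, the statement can be generated from the
series name. This is a sketch under one assumption taken from the example
above: that the record for a series lives in the table
<namespace>.drms_series, where <namespace> is the part of the series name
before the first dot. The name archive_sql is hypothetical. Only archive=0
is shown in this document; the meaning of other values is not covered here.

```shell
#!/bin/sh
# archive_sql SERIES FLAG: hypothetical helper, not part of the JSOC tree.
# Prints (does not run) the UPDATE statement that sets archive=FLAG for SERIES.
archive_sql() {
    series=$1
    flag=$2
    # Accept only simple names with a namespace, e.g. hmi.lev0c, so that
    # nothing unexpected can end up inside the quoted SQL string.
    case "$series" in
        *[!A-Za-z0-9_.]*) echo "bad series name: $series" >&2; return 2 ;;
        *.*) ;;
        *) echo "series must look like namespace.name" >&2; return 2 ;;
    esac
    # FLAG must be an integer (an optional leading minus is allowed).
    digits=${flag#-}
    case "$digits" in
        ''|*[!0-9]*) echo "flag must be an integer" >&2; return 2 ;;
    esac
    namespace=${series%%.*}
    printf "update %s.drms_series set archive=%s where seriesname='%s';\n" \
        "$namespace" "$flag" "$series"
}

# Example: review the statement first, then feed it to psql.
#   archive_sql hmi.lev0c 0
#   archive_sql hmi.lev0c 0 | psql -h hmidb jsoc
```

Printing the statement instead of running it keeps a review step before
anything touches the database.
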
