HK Level Zero Code Debug Guide (version 2, 5/11/2010)
Item # (priority) |
Module Component/ Subcomponent/ Executable |
Error or Warning |
Total Message |
Problem Described |
Solution/ Actions Required |
1 (high) |
->HK Level 0 ->MOC dayfile save, decode of keyword data for the asd dayfile, and save of keywords to the asd HK by APID series (sdo.lev0_asd_0004) ->dsdf.pl cron job on production@j0. |
Email message with this subject:
JSOC: WARNING: Ingesting MOC dayfiles: status:No files loaded today |
JSOC: WARNING: Ingesting MOC dayfiles: status:No files loaded today
Warning Message:
-->Received count of <0> hkt files and count of <0> xml files from directory </tmp21/jsoc/sdo/mocprods/lzp>.
-->When executing script </home/production/cvs/JSOC/proj/lev0/scripts/hk/dsdf.pl> from cron job.
|
Dayfiles were not retrieved by the process that retrieves the Level 0 dayfiles from the MOC Product Server. Retrieval should be done by 9PM PDT. The dayfile save process (dsdf.pl) then looks for dayfiles. If none are there, an email is sent to jsoc_ops. The missing dayfiles cause other processes to fail, because they look for dayfiles at 11PM.
|
(1)Confirm that the process that retrieves dayfiles from MOC is working. Check with Art or Hao.
(2-option a)Once it is working, the process will pick up dayfiles for the lost day(s) and the current day, and the dayfiles will be saved at 10PM PDT.
(2-option b)If needed, we can start the save of dayfiles for the lost day as soon as the files have been retrieved by the process that gets files from the MOC Product Server.
->Log in to j0 as production.
% cd /home/production/cvs/JSOC/proj/lev0/scripts/hk
% dsdf.pl moc
-->View the log file to confirm that the dayfiles were saved and the asd dayfile was decoded.
% view log-df-moc
(3)Some processes will have missed getting and processing the dayfiles. These temporary, quickly written processes run as cron jobs between 11 and 11:30PM PDT on production@j0. Check the log files (log-doAPID17, log-doAPID19, log-doAPID21, log-doAPID38, log-doAPID40) to see whether the runs were successful.
(a)Keyword data for APID 19 (for Sebastien and Jesper)
->Go to the directory to update the script manually.
% cd /home/production/cvs/JSOC/proj/lev0/scripts/hk
->Change the start day and end day in the script to the day(s) that did not have dayfiles. For example, change the $i start value (5) and the end value (6) shown below:
#(1)hardcode end day for month
#for october 2008 use 1-31 days
$i=5;
while ($i < 6) {
-->Go into the file and make the change:
% vi getFilesFromDayfile_019v2.pl
-->Run:
% getFilesFromDayfile_019v2.pl
(b)APIDs 17, 21, 38, and 40 are less critical, since these series are used by the Master Pointing series to gather data. Since the master pointing series does not change HK lookup times often, this should not cause a problem until we update the master pointing series. To do these updates, repeat the steps described in 3a: edit the files with vi and then run them. Remember that the v2 version in the file name is for hand editing and the v3 version should not be touched (it is used for cron jobs). |
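The file counts reported in the item 1 warning can be checked by hand before rerunning dsdf.pl. The following is a minimal sketch, assuming the dayfiles and their XML companions can be recognized by `.hkt` and `.xml` file name suffixes (an assumption; dsdf.pl may select files differently). `count_dayfiles` is a hypothetical helper, not one of the JSOC scripts, and it is written for sh/bash rather than for the csh-style `%` prompt shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper, not part of the JSOC scripts.
# Prints how many *.hkt and *.xml files sit directly in a directory.
# The suffixes are an assumption about how the dayfiles are named.
count_dayfiles() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # -maxdepth 1 keeps the count to the directory itself (no subdirectories).
    hkt=$(find "$dir" -maxdepth 1 -type f -name '*.hkt' | wc -l | tr -d ' ')
    xml=$(find "$dir" -maxdepth 1 -type f -name '*.xml' | wc -l | tr -d ' ')
    echo "hkt=$hkt xml=$xml"
}
```

Running `count_dayfiles /tmp21/jsoc/sdo/mocprods/lzp` and seeing `hkt=0 xml=0` corresponds to the condition reported in the warning email.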
2 (high) |
->HK Level 0 ->Real time Decode and save of keywords to asd series using RTMON Dayfile ->On production@j0 at /home/production/cvs/JSOC/proj/lev0/scripts/hk.
% gdfrt.pl apid=129 pkt_size=128 > LOG-RT-APID0x081-2010.05.11.09_10:00:00
% monitor_df_rtmon.pl apid=129 |
(1) Email from admins that the j0 server is down or has crashed
(2) Email with subject:
JSOC:WARNING:INGESTING REALTIME DAYFILE FOR APID 129:Dayfile does not exist |
(1)j0 server down or crashed.
(2)Email message body: Error Message:
-->Dayfile </hmisdp-mon/log/packets/0x0081/20100204.0x0081> is not probably there.
-->This monitor script will continue running and resend another email notice if the data file is there and data starts flowing and stops again.
-->To restart monitor enter command: /home/production/cvs/JSOC/proj/lev0/scripts/hk/monitor_df_rtmon.pl apid=129
-->To stop monitor run command(user=production):/home/production/cvs/JSOC/proj/lev0/scripts/hk/stop_monitor_df_rmon.pl apid=129 |
Possible problems: (1)The j0 server goes down because of a crash or because it runs out of memory. (2)The j0 server loses its NFS connection to the /hmisdp-mon/log/packets directory. |
(1)The solution is to restart the gdfrt.pl process with a new LOG file marked with the time the process was restarted, and then restart the monitor_df_rtmon.pl process.
a)Log in to j0 as the production user in a terminal window. Execute: % gdfrt.pl apid=129 pkt_size=128 > LOG-RT-APID0x081-2010.05.11.09_10:00:00
This process decodes the dayfile one minute of data at a time and saves keywords to the sdo.lev0_asd_0004 series.
b)Log in to j0 in another terminal window and execute: % monitor_df_rtmon.pl apid=129
This script monitors whether the dayfile is getting updated every 5 minutes; if it is not, the script sends an email warning.
c)View the log to check that processing is passing. Note that one issue is shown in the log file as failed when more than 1 minute of data is processed at the beginning of a run. This is OK. % tail -f LOG-RT-<whatever>
d)Confirm in the LookData tool that data records are getting added to sdo.lev0_asd_0004.
(2)Check that j0 can see the /hmisdp-mon/log/packets directory.
|
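The restart command in item 2 takes apid=129 (decimal) while the log name and packet directory use the hexadecimal form (0x081, 0x0081); 129 decimal is 81 hexadecimal. The following is a minimal sketch of building a restart log name from the decimal APID. `rt_log_name` is a hypothetical helper, and the timestamp layout it produces is an assumption: the guide only asks that the log name be marked with the restart time.

```shell
#!/bin/sh
# Hypothetical helper, not part of the JSOC scripts.
# $1 = APID in decimal (129 for the asd packet)
# $2 = optional timestamp; defaults to the current local time.
# The timestamp layout is an assumption, not a required format.
rt_log_name() {
    apid_dec=$1
    stamp=${2:-$(date +%Y.%m.%d_%H:%M:%S)}
    # %03x prints the APID as zero-padded lowercase hex, so 129 becomes 081.
    printf 'LOG-RT-APID0x%03x-%s\n' "$apid_dec" "$stamp"
}
```

Under sh or bash the restart could then be written as `gdfrt.pl apid=129 pkt_size=128 > "$(rt_log_name 129)"`. The `%` prompt in this guide suggests a csh-style login shell, where the function would need to live in a small sh script instead.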
3 (low) |
->HK Level 0 ->RTMON Dayfile Save of only ASD Dayfile ->dsdf.pl cronjob on production@j0.
|
Email message with this subject.
JSOC:WARNING:Ingesting RTMON dayfiles: status:Possible error ingesting dayfile. |
Warning Message:
-->Received count of <1> hkt files and count of <1> xml files from directory </hmisdp-mon/log/packets>.
-->When executing script </home/production/cvs/JSOC/proj/lev0/scripts/hk/dsdf.pl> from cron job.
-->Check if there is problem. The dayfiles to ingest into data series and delete from directory did not match the count of the dayfiles received.
-->Possibly a problem ingesting dayfiles in series because of bad setting of SUMSERVER parameter or SUMS could be not available.
-->Possibly not an issue which was caused by the dayfiles not being ingested on previous day(s) therefore the file count received today does not match files ingested.
|
The RT dayfile was not saved to the dayfile series.
Possible problems:
(1)SUMS is down.
(2)The path set up to pick up or drop off the dayfile is not correct.
(3)There is no dayfile on the /hmi-sdp server.
(4)j0 has no access to the hmi-sdp server.
|
(1)Verify the issue: Check whether the RT dayfile is in the sdo.hk_dayfile series for the date of the problem with source=rtmon. Check the log file log-df-rtmon for information on the warning. Check for the dayfile in the drop directory defined in dsdf.pl and in the pickup directory defined in ingest_dayfile.pl.
(2)Manually save the RT dayfile that failed last night. Execute the manual command to save the asd rtmon dayfile to sdo.hk_dayfile. ssh or log in to j0 as the production user. Since we are saving only asd files, use apid=81 (the APID in hex). Set merged to 0, which means save the dayfile only. Use src=rtmon because the file is in the rt filename format. The dsnlist is a map that tells the script where to save the dayfile.
a)Save the dayfile: % cd /home/production/cvs/JSOC/proj/lev0/scripts/hk
% ingest_dayfile.pl apid=81 dsnlist=./df_apid_ds_list_for_rtmon src=rtmon merged=0
b)Verify that the files were saved by checking sdo.hk_dayfile with the LookData tool.
c)Clean up files:
% cd /tmp21/production/lev0/hk_rtmon_dayfile
% rm <files saved>
% vi DF_DELETE_FILE_LIST_RTMON (Remove the files already saved)
(3)If needed, fix the issue from the list of possible problems in the Problem Described column. Verify that no warning email is sent tomorrow. Verify that the log has no errors.
|
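Before running the manual save in item 3, it can help to rule out the path and access problems listed under Problem Described. The following is a minimal sketch of such a pre-check. `check_pickup_dir` is a hypothetical helper, not one of the JSOC scripts, and the directory to pass in is whichever drop or pickup directory is defined in dsdf.pl and ingest_dayfile.pl.

```shell
#!/bin/sh
# Hypothetical helper, not part of the JSOC scripts.
# Reports whether a dayfile directory is missing, unreadable, empty,
# or holds files, so path and access problems can be told apart.
check_pickup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    # A directory needs both read and search (x) permission to be listed.
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "unreadable"
        return 1
    fi
    n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$n" -eq 0 ]; then
        echo "empty"
        return 1
    fi
    echo "ok $n"
}
```

For example, `check_pickup_dir /tmp21/production/lev0/hk_rtmon_dayfile` reporting `missing` or `unreadable` points at problems (2) or (4), while `empty` points at problem (3). It does not test whether SUMS is up.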