Art's archive
8-16-2012
- Fix a crash in show_info - the crash used to happen when DRMS timed-out waiting for SUMS to do a tape read (and it took more than 2 hours to complete). I modified lower-level code to pass up the call stack a new error code, DRMS_ERROR_SUMSTRYLATER. Then I had the various functions in the call stack handle this, with show_info finally exiting upon encountering this code.
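Note: a minimal sketch of the status-propagation pattern described above. Only the names DRMS_ERROR_SUMSTRYLATER and show_info come from this entry; the numeric values, the DRMS_SUCCESS placeholder as defined here, and all function names are hypothetical, and the real call stack is deeper than this.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values: the real status codes live in the DRMS headers. */
enum {
    DRMS_SUCCESS = 0,
    DRMS_ERROR_SUMSTRYLATER = -1
};

/* Lowest level: stands in for the code that waits on a SUMS tape read. */
static int sketch_wait_for_sums(int timed_out)
{
    return timed_out ? DRMS_ERROR_SUMSTRYLATER : DRMS_SUCCESS;
}

/* Middle level: passes the status up instead of using results that never arrived. */
static int sketch_stage_records(int timed_out)
{
    int status = sketch_wait_for_sums(timed_out);

    if (status != DRMS_SUCCESS) {
        return status;
    }
    /* ... work with the staged storage units would go here ... */
    return DRMS_SUCCESS;
}

/* Top level: plays the role of show_info; returns the process exit code. */
int sketch_show_info(int timed_out)
{
    int status = sketch_stage_records(timed_out);

    if (status == DRMS_ERROR_SUMSTRYLATER) {
        fprintf(stderr, "SUMS is busy (tape read pending); try again later.\n");
        return 1;
    }
    return 0;
}
```

Calling sketch_show_info(1) simulates the time-out and produces a clean non-zero exit rather than a crash.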
7-26-2012
- Adding support for a new jsoc-main command-line argument, --DRMS_JSDRETENTION. This flag, when set, sends a tdays retention value to SUMS equal to the value in the series jsd. If SUM_get() is operating on SUs from multiple series, then the retention value sent to SUMS is the max() of all series jsd retention values. The effect of this is to ensure that SUs so accessed have at least the jsd retention value. If an SU is retrieved from tape, then it will have the jsd retention value.
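Note: the max() rule for multi-series requests can be sketched as below. The function name and the plain int array are assumptions for illustration; the real code reads retention values from each series' jsd information.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest jsd retention (in days) among the series touched by one
   SUM_get()-style request. Returns 0 for an empty list. */
int sketch_max_jsd_retention(const int *retention_days, size_t nseries)
{
    int max_days = 0;
    size_t i;

    assert(retention_days != NULL || nseries == 0);

    for (i = 0; i < nseries; i++) {
        if (retention_days[i] > max_days) {
            max_days = retention_days[i];
        }
    }
    return max_days;
}
```

With jsd retentions of 10, 60, and 30 days, the value sent would be 60, so no SU in the request ends up with less than its own series' jsd retention.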
7-5-2012
- I fixed a bug in create_series. The code that saves the default value to the db was using the keyword's format jsd field to convert the keyword's value to a string. When writing the default value for a keyword to the drms_keyword table, consistently convert values to strings - do not use the keyword's format jsd field, which can vary between users. Instead use the conversions in drms_sprintfval.
6-28-2012
- Changed retention of site-specific slony logs to 30 days (from 10).
- Start work on a script to make parts of the production cvs JSOC trees.
- Move the maintainlogs.pl code that updates the lastarch field. Instead of calling it only if to-be-archived logs exist, call it always (unless an error prevents it from being called). It should always be updated if the script runs successfully to completion.
- Modified jsoc_export_as_fits to reject the export of Rice-compressed floating-point images.
- Started work on modify_series to make it properly handle the addition of new keywords to a DRMS series.
6-21-2012
- I put all export-related logs into /home/jsoc/exports/logs and hooked them all up with a cron-driven system that archives old logs.
- I published four series for Todd.
6-14-2012
- Wrote and documented the functions drms_appendcomment(), drms_appendhistory(), and drms_appkey_string().
- I modified our make system so that libstats.a now links to all DRMS module/exe types.
- Found a problem with mdi.vw_V. It looks like there was some re-processing of recent records. These records all have bad images associated with them. One of these records is number 8257849. The records' data are stored in TAS files. I see a TAS file with 321 good records and 39 bad records. The bad ones all have recnums in the 82***** range, and the good ones have recnums in the 74***** range. It looks like somebody reprocessed records, and these re-processed ones are bad, but the non-reprocessed ones are still good.
- Helped Serena Criscuoli transfer data from Stanford to NSO. I wrote a script that uses the export CGI to get paths to several FITS files, and gave that to her.
- Modify the export code so that export of records from TAS files creates files with .fits extensions.
- Fixed broken export system that I caused when I modified the export code for TAS files with .fits extensions.
- Fixed daily build cron script.
- Helped Isabelle by providing her my exportfits.pl script and showing her how to use it. I also had to fix a bug in the way it URI-escapes the cgi URI, and I made it first check the record-set query to ensure that it resolves into a non-empty set of records.
6-7-2012
- Finished transtoproc.pl. This script queries the selected database to obtain information about all currently running transactions. Then it locates all processes running these transactions and prints a report.
- Worked on updating/transferring my paper to-do list to an electronic one. I've still got several pages of the original paper one to review.
10-13-2011
- Fixed a problem with duplicate records in the DRMS_Record_t struct. This struct maintains an array of pointers to records, and it is possible to have the same pointer more than once in this array. This can happen because different record subsets can overlap in their record pointers. The fix was to refcount records in the record cache, removing/freeing a record only if the refcount decreases to 0.
- Committed that work-around for the DSDS fits reader not converting integer data to floating-point data when the bzero and bscale keywords are missing.
10-06-2011
- Investigate purported drms_copykeys() issue. When the source record contains linked keywords, these links are not being "followed". drms_copykeys() calls drms_copykeyB(), and this latter function is not resolving linked keywords. Modified drms_copykeys() to call drms_copykey(), which properly follows links by reading keyword values with drms_getkey_p().
- Fixed a crash in drms_close_records() due to the existence of duplicate record pointers in the record-set being freed. Duplicates arise when sub-record-set-queries, in a comma-separated list of such queries, resolve into overlapping sets of records. To fix this, I added a refcount field to the DRMS_Record_t struct. Whenever a record is inserted into the record-cache, its refcount is set to 1. And whenever a record pointer is set to point to this record in the cache, the refcount is incremented. Whenever a pointer to this record in the cache is discarded, the refcount is decremented. And if the refcount reaches 0, then the record in the cache is removed from the cache, and freed.
- Investigate slow db queries. There were many sequential scans of the admin.ns table. I added an index to the name column, and then ensured that code in drms_record.c used this index.
- Fixed crash in drms_stage_records() due to a missing check for a NULL container.
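Note: a minimal sketch of the refcount scheme described in the drms_close_records() item above. The struct and function names are hypothetical stand-ins (the real field lives in DRMS_Record_t), and for brevity the sketch counts the first pointer at insert time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached record; only the refcount matters here. */
typedef struct SketchRecord {
    long long recnum;
    int refcount;
} SketchRecord;

/* Insert a record into the "cache": its refcount starts at 1. */
SketchRecord *sketch_cache_insert(long long recnum)
{
    SketchRecord *rec = malloc(sizeof *rec);

    if (rec != NULL) {
        rec->recnum = recnum;
        rec->refcount = 1;
    }
    return rec;
}

/* A second record-set pointer now refers to the same cached record. */
SketchRecord *sketch_record_retain(SketchRecord *rec)
{
    assert(rec != NULL && rec->refcount > 0);
    rec->refcount++;
    return rec;
}

/* Discard one pointer. The record is freed only when the last pointer goes
   away. Returns 1 if the record was freed, 0 otherwise. */
int sketch_record_release(SketchRecord *rec)
{
    assert(rec != NULL && rec->refcount > 0);
    if (--rec->refcount == 0) {
        free(rec);
        return 1;
    }
    return 0;
}
```

When two overlapping sub-queries both hold the record, the first release leaves it in place and only the second one frees it, which is what prevents the double free.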
09-08-2011
- Added 'filters' to dlsource.pl. If you list files or directories in the filter variables, then the implied files will be stripped from exports and file-set listings. Filters are not available for checkouts or file tagging.
- Added a new flag to dlsource.pl that works in conjunction with the -r and -R flags. When this new flag, -F, is set, then if a file to be checked-out is missing the requested tag, the head version of this file is checked-out.
- Rick, Jim, and I have been working on NetDRMS release 2.7. This release contains the new multi-SUMS. There are several SUMS localization items that are incomplete and that I'm working on for the next release (2.8). At this point, Jim is working on non-SU-specific scripts to start and stop multi-SUMS.
- I'm spending LOTS of time trying to improve our SQL queries. We have a "shadow" table that does make some queries faster, but the most important queries are currently running more slowly. We are trying to make shadow tables work.
- Modify the record-set parser to not go into an infinite loop if the user provides an empty-string record-set query.
- More SUMS issues investigations.
08-11-2011
- Re-wrote the Rules.mk for base/sums/apps. It had gotten too "junky", which made it hard to modify to support the new multi-SUMS architecture. After trying unsuccessfully to get it to work properly as it was, I gave up and re-wrote it.
- Export-system fix - The utility function that creates filenames was not properly distinguishing between source and target segments for linked segments. I added a new argument to this function that accepts the target segment, to be used only if the source segment is linked.
- Export-system fix - There was a hung jsoc_export_manage process that was blocking exports. I killed this process, but I have not yet identified why jsoc_export_manage was hung in the first place. This has happened before and needs to be fixed.
- Export-system fix - There was a problem in jsoc_export_as_is.c. It was not checking the return value of a drms_segment_lookup(). If a segment file is missing, then the path returned by SUMS is the empty string. I modified jsoc_export_as_is.c to examine this path, and if the path is empty, then write a No Data File value in the output record.
- Export-system - I updated ~jsoc/cvs/Development/JSOC to incorporate an hg_patch fix (it had special code to handle aia.lev1 - the code finds the "closest" observation given a start and stop time, and a cadence. The bug was that this code worked for only aia.lev1, and not any of the many series that overlay aia.lev1).
- I modified gen_init.csh,
- Worked on the query-optimization changes.
- Added code to lib DRMS in preparation for future query-optimization code. This prep code will prevent users from adding records to series that have associated summary tables. When I commit the query-optimization code and we have a mix of "new" code and "old" code, the old code will not be able to add records to series that have summary tables. This is good, because if a summary table exists, then we need the new code to update the summary tables, and the old code does not have this update functionality.
- Helped JGB with cfitsio funpack and imcopy. He was trying to uncompress an image and create a new fits with this image in the primary HDU. I figured out that you can specify the correct HDU (#1) in the INPUT fits file. If you do this, then the output fits file will have only a single HDU with the image in it.
- I created the jsocprod db user and the associated su_jsocprod namespace. In addition to read/write privilege in its own namespace, it has the privileges of dsdsdata, mdidata, and sdodata.
- I helped Yang gain permission to create series in the hmi_test namespace.
- There is some problem with the code that creates aia.lev1_nrt2. It isn't properly setting bzero/bscale keywords in the database.
- There was a problem with the SUMS client code getting out-of-sync with the SUMS server code. Jim rebuilt and ran SUMS and then data stopped getting ingested into SUMS. Phil sync'd ~jsoc/cvs/Development/JSOC/base/ and rebuilt and this caused data production to resume.
- I answered Igor's questions about slony. He set up a slony SERVER with "log-shipping" at NSO. I think the idea is for him to distribute GONG data to us via slony logs.
- We're going to do a full JSOC release early next week. The NetDRMS release will follow later when Rick has time to do it.
03-10-2011
- Wrote makelcindices.pl - This script adds lower-case indices to 4 drms_* tables. We have queries that run billions of times a day on these tables, and the queries were resulting in sequential scans through these tables. By adding these lower-case indices, and making a minor change to the queries, the queries do index scans, which run ~30 times faster than the sequential scans.
- Made changes to queries that access drms_* series to improve efficiency by taking advantage of the recently created lower-case indices. OLD: "from %s.%s where seriesname ~~* '%s' order by linkname", namespace, DRMS_MASTER_LINK_TABLE, template->seriesinfo->seriesname. NEW: "from %s.%s where lower(seriesname) = '%s' order by linkname", namespace, DRMS_MASTER_LINK_TABLE, lcseries.
- In exputil.c, instead of using drms_getkey_string() to get the value of a keyword that appears as a substring in an exported file's filename, use drms_keyword_snprintfval(). The latter uses the format field of the keyword when generating the keyword value, but the former uses some default that can result in spaces appearing in the value (which is bad for filenames). This code is used in the export workflow to generate exported files' filenames.
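Note: a sketch of how the lower-cased series name can be prepared before it goes into the new where clause from the query change above. The function names and the 128-byte buffer are assumptions, and the sketch does no SQL escaping, so it is only suitable for trusted series names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst in lower case. Returns 0 on success, -1 if dst is too small. */
int sketch_lowercase_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);
    size_t i;

    if (len + 1 > dstsize) {
        return -1;
    }
    for (i = 0; i < len; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
    dst[len] = '\0';
    return 0;
}

/* Build a where clause that compares against lower(seriesname), so an index on
   that expression can be used. Returns 0 on success, -1 on overflow. */
int sketch_build_series_filter(char *out, size_t outsize, const char *seriesname)
{
    char lcseries[128];
    int n;

    assert(out != NULL && seriesname != NULL);

    if (sketch_lowercase_copy(lcseries, sizeof lcseries, seriesname) != 0) {
        return -1;
    }
    n = snprintf(out, outsize, "where lower(seriesname) = '%s'", lcseries);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

The comparison is an equality on an expression, which a lower-case index can serve, whereas the old case-insensitive pattern match forced a sequential scan.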
03-03-2011
- I really have to have some discipline and get back to writing these progress reports.
- I helped Priya test her changes to the export system (to support movie/image exports). I re-figured out how to test code in Priya's environment (code in her home directory). You have to run as Priya, but connect to the db as user production. This is required because the manager program (jsoc_export_manage) needs to write to series that only production can write to.
- Using the tables provided by Consistent State, I found that we are doing sequential scans on all drms_* tables when we read series from disk. To fix this, I had to create new indices on these tables. The key is 'seriesname' and the index has to be a lower-case index. I also had to modify lib DRMS to change the where clause from seriesname ilike 'mdi.fd_m_96m_lev18' to lower(seriesname) = 'mdi.fd_m_96m_lev18'.
02-03-2011
- I've been neglecting to write down my weekly progress reports again.
- I helped Rick track down a problem with his time_travel code (or something like that). We were getting an error message from CFITSIO (although it printed an "unknown error" message). We ran valgrind and found another bug - an uninitialized status variable in drms_segment_writeinternal(). I haven't fixed that yet. We also found that the CFITSIO problem was due to a bug in the output series' segment specifications. They were missing the empty string for the cparams field. The create_series code was then using the next field, which was "0.0" (bzero). Then CFITSIO was trying to use "0.0" as a compression argument, which is not valid.
01-28-2011
- I tracked down a problem with drms_log. The bug was created, in drms_log.c, on 4/17/2009 (by tplarson), so it has been there a long time. He changed the psql select statements to add more fields. This changed the position of the sunum field (from 0-based index 2 to 6), but he forgot to fetch the sunum from the query result from this new position (so he was fetching the user name and trying to convert it to a long long and use it as a sunum).
01-21-2011
- Created a new db user, thartlep, and a new namespace, su_thartlep.
- Help Carl set-up iristest machine with customized full-JSOC build. We created a config.local for use with iristest. There were some build problems, most likely related to Carl's use of icc 9.1, instead of a more current one (we generally use 11.1).
- Make the compiler autoconfiguration configurable so that the user can turn it off.
01-14-2011
- I added a composite index to hmi.lev1. If you now compare show_coverage hmi.lev1 low=15795318 high=15795637 block=32 key=FSN vs. show_coverage hmi.lev0a low=15795318 high=15795637 block=32 key=FSN, the run time is pretty comparable (I made the hmi.lev1 query faster). They both now run in sub-second time (the hmi.lev1 query that last took me 1:40 now takes less than a millisecond!). And I'm running debug code, so expect quicker run time on release code. My analysis of the query planner was correct. I created a new composite index on hmi.lev1 (leaving the indexes on FSN and T_OBS_index intact), and this composite index now gets used so that the sort now runs very quickly (because it doesn't have to actually do any sorting - the composite key has the records sorted in the order needed by the query).
- I added an index to hmi.cosmic_rays that fixes the slow queries on hmi.cosmic_rays. The query now takes less than a millisecond (instead of 2 seconds). I did the same for sdo.lev0_asd_0004. Queries went from about 2.5 seconds to 0.1 seconds. In both cases, there were ONLY composite indexes, which perform much slower than non-composite ones.
- Added new DRMS API functions, starting with drms_clone_records() and working down the call stack, that are guaranteed NOT to connect to SUMS. If they need to connect with SUMS to complete correctly, then they return a status code of DRMS_ERROR_NEEDSUMS. The no-sums version of drms_clone_records() is drms_clone_records_nosums(). The main purpose of this was to allow set_info to change keyword values without requiring fetching of SUs from SUMS (there is no reason to fetch SUs if you're just modifying keywords and sharing segments). I handed this off to Hao who says this is working fine.
01-13-2011
- Fix memory corruption problem in set_info.c.
- Worked with Priya to get her jsoc_export_as_images and jsoc_export_as_movie changes into our production environment. I had her check in her scripts that get called from the drms_run script. We fixed a couple of minor bugs in them. I made the changes to jsoc_export_manage to support new protocols (JPEG, PNG, MPEG). I merged all of Priya's exportdata.html changes with Phil's recent changes to the same file. I made a couple of minor bug fixes in Priya's changes to exportdata.html. I spent a long time chasing a red herring while testing. I thought that there was a problem with jsoc_fetch adding a new request record to jsoc.export_new. But I was looking in jsoc.export, the wrong series, thinking that is where jsoc_fetch writes the initial request record. Sat with Priya and debugged this problem, to find out I was looking at the wrong series. I had to fix a couple of bugs I made in jsoc_export_manage to get JPEG and MPEG export working (forgetting the "== 0" in a strcasecmp for the JPEG export; accidentally putting single quotes around $REQDIR in the export cmds). Got help from Phil to use jsoc_fetch_test properly to capture the stdin that gets sent to jsoc_fetch. Tested a bit with Priya after it all worked.
01-06-2011
- Restored the doxygen web pages by re-creating /web/jsoc/htdocs/doxygen_html, then I re-ran the doxygen program on the head versions of the code files in cvs. I set the group that owns /web/jsoc/htdocs/doxygen_html to 'jsoc' (since user jsoc, the user that runs doxygen, wasn't a member of 'www').
- Verified that we didn't miss any MOC server FDS files during the 10-day d02 downtime at the end of December.
- The HMI observable code checked into cvs was not building. I tracked down 3 problems: 1. A couple of changes to make files were not checked in - Sebastien checked those in. 2. The Rules.mk for the observable code didn't have the dependency between the observable modules and the interpolation libraries quite right. I fixed that. 3. The "sock" versions of the observable code weren't building. I fixed that.
- I reorganized the processing/protocol code in jsoc_export_manage.c. I moved the processing code so that it executes before the protocol-export code. Before the change, the protocol-export code was duplicated - once for no_op processing, and once for hg_patch processing.
12-09-2010
- Move call to SUM_delete_series() to location above calls to delete DRMS db tables/entries.
12-02-2010
- Worked with Priya on her jsoc_export_as_movie and jsoc_export_as_images. Came up with a plan for running the movie-making software on the output of render_image (a DRMS module). She wrote a bash script to be called by the drms run script produced by jsoc_export_manage. The bash script generates the index.txt file. We still need to modify jsoc_export_manage and jsoc_export_make_index to accommodate the new export products.
- Helped Priya get her bash script running on the qsub cluster.
- Wrote code to support DRMS restarts of SUMS connections when the connection to SUMS dies.
- Worked with Rick to publish several series.
- Used the tools provided by Brian F. and worked with him and Keh-Cheng to upgrade slony to version 1.2.21.
- Tried to track down problem where a bunch of records had their retention times drastically reduced.
- Worked with Jim to fix a localization problem in the sum_rm code. I modified files and sent them to UCLAN, they tried them out, but had troubles compiling the code. The next step is for me to examine their make logs and figure out what went wrong. I need to resolve this before the next JSOC release.
11-19-2010
- Modified publish_series.sh so that only one instance may run at any time. Also, added a function to tee output to both a logfile and to stdout. Modified logfile name to include PID of running publish_series.sh.
- Worked with Priya on design of jsoc_export_as_movie. She will use the jsoc.export SU to put all her temporary files in. She is going to create links to the .png files output by render_image - the link names will be of the form [0-9][0-9][0-9][0-9].png, the form needed by ffmpeg. She is also going to parameterize the script to accept Stanford-specific arguments (like paths). She will modify jsoc_export_make_index to accommodate the new movie and image protocol.
- Helped John Beck investigate a failure with his fits-writing code - ds9 crashes trying to read a file he generated.
- Helped Hanlong to regain his ability to run ingest_fits.
- Investigate our use of the serializable transaction level: we are running the db in the serializable isolation level. In general, the module (transaction) that modifies data really doesn't have to worry much about race conditions during writes. There will generally be just a single instance of a single module being run by a single person trying to write records to a db table. So we never see the obvious error that COULD happen with the serializable isolation level - we never have the case where the first transaction changes a row which is then also changed by a second transaction (which would result in an error, with the second transaction rolling back). We also don't generally do row updates, just inserts, and half of what the serializable isolation level is good for is preventing updates to the same row (which doesn't happen in our system). So, we actually have no need for serializability at all in most cases (we don't access our tables for writing concurrently). We could actually run outside of a transaction 95% of the time and things would be fine (as long as we are careful not to run 2 or more instances of a module running on the same output records, which would cause a rollback in serializable mode anyway). We could probably get away with removing our BEGIN/COMMIT statements altogether. We could also do as Joe has suggested (and as we've discussed here several times) doing calculations outside the transactions, chunks at a time, committing the chunks as they complete in a transaction. PG also supports locks of several kinds (table locks, row locks, etc.) - if we needed, we could lock certain tables to stop race conditions, although since we don't have concurrent writes, the point is moot.
- Ticket #319 - Modified drms_link_getpidx(DRMS_Record_t *rec) to return a status code that is used by drms_insert_series(). If drms_link_getpidx() fails - in the reported case, because the linked series did not exist - then drms_insert_series() fails with an error.
- Track down problem with hg_patch exports. It appears that the exportdata html page was using the full-disc series when erroneously deriving a filenamefmt string to use for the corresponding hg_patch series, instead of using the actual hg_patch series.
- Helped Rick update hmi.drms_keyword (on hmidb2) to fix a mistake he made on 3 series he already published.
- Helped track down a problem with failed slony log production after midnight on 11/22/2010. The problem was a bug in slony that was uncovered by a publication Rick performed. The publication went fine and entered data into the _jsoc tables correctly, but the slon daemon that reads from that table was overflowing a buffer in the code that writes the slon daemon log (on hmidb2). I am working with Brian F. at Consistent State to install an upgrade to slony that will fix this problem. We are going to install the patch on CS VMs so we can test out the fix thoroughly before applying it to our own system.
- Looked at Elena's machine - it has an old OS and things aren't working on it. Brian R. will install a new OS when he has time.
- Work with Jim on getting localized db@host:port into SUMLIB_RmDo.pgc.
11-18-2010
- Add 'RECNUM' to set of FITS keywords exported during export of FITS files; modify jsd parser to print a warning if provided external keyword name (in keyword description) is RECNUM, since this will collide with auto-export of RECNUM.
- Meet with Priya to research what is needed to create a 'jsoc_export_as_movie'. Priya is modifying exportdata.html to add 2 new items to the protocol drop-down: movie and jpeg. There would most likely be no changes to jsoc_fetch as the new protocol strings would fit in the existing Protocol column of the jsoc.export_new table. I will modify jsoc_export_manage to write a drms_run script that calls jsoc_export_as_movie (a script that calls render_image followed by ffmpeg, the program that creates movies from images).
- Ticket #317 - Tracked down an issue with the allocation of slot number to storage units. The current code isn't properly freeing slots whose records were freed. Also, there is a problem whereby a slot dir could be deleted when in fact the slot was shared between records (via drms_clone_records).
- Tracked down a problem with hmi.web_images. The data segment was declared constant, but it should have been declared variable.
- Created a new DRMS db user account and namespace for Kaori. Also helped track down a problem with her .pgpass.
- Tracked down a problem with create_series (drms_insert_series()) not working properly when the series being created has a link to a series that does not exist. The drms_insert_series() code gets the list of target-series prime keys from the target series, and if it doesn't exist, it doesn't create the proper set of db columns in the series (being created) table.
11-04-2010
- Working on set_info. There are many unforeseen issues to address, also there were a few import bugs to fix.
- Track down problem with Sudeepto's inability to run delete_series. The problem was that he was not a member of the db group 'jsoc', so he couldn't read from _jsoc.sl_table, and this permission is needed when the drms_replicated() db function is executed. Modified delete_series to print a warning message when this lack of permission exists, since it took a long time to track down.
- I created the NetDRMS 2.4 release CVS tags - Rick and Igor are testing/evaluating the release.
- Help Yang track down memory corruption problems in his synchronic map module.
- Wrote/submitted script to maintain monitoring software configuration tables.
- Phil developed new plan for keeping AIA data online. LMSAL would keep new disks locally (a sufficient number to hold all AIA data). Stanford would mount those disks over a fast network, so they would appear to be local to Stanford and Stanford's SUMS. Stanford writes to these disks and LMSAL (and all other users) obtains the data files from these disks.
- We published LOS observable series hmi.M_45s, hmi.V_45s, hmi.Ic_45s, hmi.Ld_45s, and hmi.Lw_45s - they contain records from middle of September to end of October.
- Jim is fixing some localization issues in SUMS (things like the port number used for the db connection). In other words, he's removing hard-coding of things to make localization easier at remote sites.
10-21-2010
- I have been less than diligent lately about writing progress reports.
- Friday - help Priya.
- Yesterday, spent most of the day tracking down a problem with DRMS/SUMS use of SUM_put(). Duplicate SUNUMs are ending up in SUMS. I was not able to find any way that DRMS could potentially send duplicate SUNUMs to SUMS. I added logging to drms_server.c that would print all the SUNUMs passed to SUMS. Jim then built his lev0 code with this logging, and found no duplicates sent, but still found duplicate SUNUMs in SUMS. So the problem is most likely in SUMS.
- Monday - sick, took day off of work.
- Created a new db account for Peter Williams and helped him get set up to use DRMS.
- Wednesday - fixed a problem with SUMLIB_Main_Update.pgc - there was an extra declaration for SUM_Main_Update() which differed from the declaration in sum_rpc.h. Removed the declaration in SUMLIB_Main_Update.pgc. Updated the NetDRMS_Ver_2-4 tag so that the files in the set match those in the 2-5 set (plus the version of SUMLIB_Main_Update.pgc with the above fix in it).
10-07-2010
- Took a bit of a respite from writing progress reports.
- Modified Slony-replication configuration to use 'production' code in /home/jsoc/cvs/Development/JSOC.
- Investigated jsoc_fetch code against VSO's requirements. Awaiting Igor's feedback so we can modify as needed.
09-16-2010
- Ticket #293 - When determining if the per-segment keyword is relevant to the segment being exported, use %03d for the format, not %d, when converting the segment number into the search string. The code looks for the presence of the search string in the keyword name - eg, it is looking for 002 in TOTALVALS_001. If the search string is present, then the keyword is relevant to the segment being exported. The bug was the use of %d for the format string, instead of %03d. If you use %d, and the segnum is 0, then you are looking for "0" in TOTALVALS_000, TOTALVALS_001, etc., and all keywords will match the search string.
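Note: the difference between the two format strings can be reproduced with a short sketch. The function names are hypothetical; only the keyword names and the %d versus %03d formats come from the ticket description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buggy variant: "%d" turns segment 0 into "0", which occurs in every
   zero-padded keyword suffix. */
int sketch_key_matches_segment_buggy(const char *keyname, int segnum)
{
    char search[16];

    assert(keyname != NULL);
    snprintf(search, sizeof search, "%d", segnum);
    return strstr(keyname, search) != NULL;
}

/* Fixed variant: "%03d" turns segment 0 into "000", which matches only the
   keyword that belongs to that segment. */
int sketch_key_matches_segment(const char *keyname, int segnum)
{
    char search[16];

    assert(keyname != NULL);
    snprintf(search, sizeof search, "%03d", segnum);
    return strstr(keyname, search) != NULL;
}
```

For segment 0, the buggy variant reports TOTALVALS_001 as relevant, while the fixed variant accepts only TOTALVALS_000.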
08-12-2010
- Fixed problem with export code - there was an uninitialized variable in drms_segment_filename().
- Helped track down a problem in Rick's script that runs mtrack.
- Implemented DRMS side of SUM_put() multi-SUNUM arguments. I'm planning on testing the changes on our private SUMS on shoom.
- Investigating the inefficiencies in SUMS key.c code. Keys are stored in a linked list. Lots of sums code needs to find particular keys in the list, and that is achieved by a linear search in the list. I added a new field to the KEY structure - a pointer to a hash table. Every list has a single hash table, and all the hash pointers in all the KEYs of a list point to the same hash structure. This hash is used when finding a particular KEY so that it can be found in constant time, not linear time.
- Investigated inefficiencies in jsoc_info. There are two main problems: 1. The data passed from jsoc_info to lookdata.html (used to fill out the record table) is not organized by record, even though the data are read in record order (by iterating through a record set). First is an array of a subset of record info, then an array of keywords, with subelements containing an array of per-record keyword values, then an analogous array of segments, followed by an analogous array of links. In other words, the info gleaned from iterating through records gets all jumbled up and printed out of record-order. The result is that all the information must be held in memory until the very last bit of output is printed, resulting in a large memory footprint. The lookdata.html wants the data in record-order anyway, so lookdata.html has to make several passes through the data to put it back into record order. 2. There are no checks to reject requests for a very large number of records.
- Document slony replication.
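Note: a rough sketch of the idea behind the key.c change above, pairing a linked list with a hash index so that a lookup searches one short bucket chain instead of the whole list. All names and the layout are hypothetical: here the list header owns the buckets, whereas the change described above stores a pointer to the shared hash in each KEY. The hash function is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NBUCKETS 64

typedef struct SketchKey {
    const char *name;               /* must outlive the list */
    int value;
    struct SketchKey *next;         /* original list order */
    struct SketchKey *bucket_next;  /* chain within one hash bucket */
} SketchKey;

typedef struct SketchKeyList {
    SketchKey *head;
    SketchKey *buckets[SKETCH_NBUCKETS];
} SketchKeyList;

static unsigned sketch_hash(const char *s)
{
    unsigned h = 5381u;

    while (*s != '\0') {
        h = h * 33u + (unsigned char)*s++;
    }
    return h % SKETCH_NBUCKETS;
}

/* Prepend a key to the list and index it. Returns 0 on success, -1 on failure. */
int sketch_key_add(SketchKeyList *list, const char *name, int value)
{
    SketchKey *key = malloc(sizeof *key);
    unsigned b;

    assert(list != NULL && name != NULL);
    if (key == NULL) {
        return -1;
    }
    b = sketch_hash(name);
    key->name = name;
    key->value = value;
    key->next = list->head;
    list->head = key;
    key->bucket_next = list->buckets[b];
    list->buckets[b] = key;
    return 0;
}

/* Lookup walks a single bucket chain, which is constant time on average. */
SketchKey *sketch_key_find(const SketchKeyList *list, const char *name)
{
    SketchKey *key = list->buckets[sketch_hash(name)];

    while (key != NULL && strcmp(key->name, name) != 0) {
        key = key->bucket_next;
    }
    return key;
}

/* Free every key and reset the list and its index. */
void sketch_key_free_all(SketchKeyList *list)
{
    SketchKey *key = list->head;
    size_t i;

    while (key != NULL) {
        SketchKey *next = key->next;
        free(key);
        key = next;
    }
    list->head = NULL;
    for (i = 0; i < SKETCH_NBUCKETS; i++) {
        list->buckets[i] = NULL;
    }
}
```

The list pointers are kept so that code iterating over all keys is unaffected; only the find path changes.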
08-05-2010
- Fix for jsoc_export_as_fits bailing out with an error if a segment file is missing from an SU of a multi-segment record.
- Call drms_stage_records() before calling the lower-level seg-specific export function so that we don't have to call SUM_get() on each record - just once for all records; drms_segment_filename no longer assumes that rec->su exists - if it doesn't, it calls SUM_get(), and if it still doesn't, it sets the filename to the empty string.
- Prepare for presentation at local helio meeting at Stanford. Give presentation.
07-29-2010
- Restore -lssl flag when building SUMS apps - needed for MD5 stuff. The build was failing on shoom, and I discovered that this flag was missing. I'm not sure how this built on all machines other than shoom to begin with.
- Use sunum as the sunum list argument for jsoc_fetch's exp_su mode - there was a bug in the ds argument, which will be fixed, but for now, use sunum, which makes more sense anyway.
- Document drms_run.
- Help track down memory leak in SUMS. Keh-Cheng found the major problem first - the users of get_effdate() were not freeing the returned string, and get_effdate() was called for every SUMS request. Keh-Cheng thought libecpg was leaking query strings, but I don't think that is the case. He was dumping the heap memory and seeing a bunch of 'COMMIT' strings, but I think that he was looking at COMMIT in unallocated heap blocks - the heap was large and 'holey' because of the thousands of leaked get_effdate() strings.
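Note: the ownership rule behind the get_effdate() leak, sketched with hypothetical stand-ins. The real get_effdate() signature and date format are not reproduced here; the point is only that a helper returning a heap-allocated string needs a matching free() in every caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper in the style of get_effdate(): returns a newly allocated
   string that the caller owns. The date format is arbitrary. */
char *sketch_effdate(int year, int month, int day)
{
    char *buf = malloc(40);

    if (buf != NULL) {
        snprintf(buf, 40, "%04d%02d%02d", year, month, day);
    }
    return buf;
}

/* Per-request caller: uses the string, then frees it. Without the free(),
   every request leaks one small block. Returns the string length, or -1 if
   the allocation failed. */
int sketch_handle_request(int year, int month, int day)
{
    char *effdate = sketch_effdate(year, month, day);
    int len;

    if (effdate == NULL) {
        return -1;
    }
    len = (int)strlen(effdate);
    free(effdate);  /* the call that was missing in the leaking callers */
    return len;
}
```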
07-22-2010
- Ticket #297 - Modify DRMSKeyValToFITSKeyVal() and cfitsio_append_key() to pass the keyword format string from the DRMS keyword struct to the FITS keyword struct. Then use the keyword format field when printing FITS floating-point keyword values into the FITS header.
- Respond to Romeo with questions about Kevin's next subcontract.
- Ticket #301 - jsoc_export_as_fits was deriving the FITS keyword name and the FITS keyword value from keyword structures from the target of linked series. Instead, it needed to use the target keyword structures for the FITS keyword values, but use the source keyword structures for the FITS keyword names.
- Try and track down the cause of that damn slony replication failure - driving me crazy - we can't seem to make any progress on it. Something happened on May 9 around 1-2am where about 1000 records were added to 5 series in hmi_test in the master database, but the records were not replicated to the slave database.
- Add Isabelle's limbfit code to cvs.
- Work with Isabelle to provide her DRMS and CVS help this week.
- Work with Brian Fehrle of consistent state to install slony auditing code onto hmidb. Put consistent-state auditing code into our CVS tree too.
07-15-2010
- Instead of /tmp21, use /tmp22 to hold product files downloaded from the MOC Product Server.
- Help Xudong check in files in ~xudong/cvs/JSOC/proj/mag/ident to the CVS repository.
- Go through/review 9 days of emails that accumulated while I was on vacation.
07-01-2010
- dllzp.pl - On 6/25/2010 Carl found another script that needed editing. He needed to hard-code the home directory to /home/production instead of using the $HOME env var in this script. He did that and checked the script into CVS, then I updated that script in the live environment (/home/jsoc/cvs/Development/JSOC) around 9:30pm. At 10:10 pm I noticed a failure in the master download script (dllzp.pl). There was a typo in a variable name (was $scriptpath, should have been $scriptPath). I fixed that around 11:00pm. Then dllzp.pl ran again at 11:10pm, and this time it ran correctly (the script runs every hour from 10:10pm to 6:10am until it gets new files, or until 6:10am if no new files are found). We got an error message via email the next morning. The problem was in the clean-up phase of the ingestion of the new files, and the failure did not prevent successful ingestion. One of Carl's scripts was trying to overwrite a status file that user jsoc had no permission to write to. I fixed the permission issue, then Carl manually cleaned up the files that had been successfully ingested. We then manually ran dllzp.pl around 10:30am on 6/25 so that it would download 2 files again (for apid 5). dllzp.pl successfully re-downloaded those files and called dsdf.pl successfully. dsdf.pl successfully ran end-to-end this time - no warnings or error messages were produced and the 2 downloaded files were re-ingested successfully. At this point, we're pretty confident that the code is working fine.
06-24-2010
- Added a new DRMS call (drms_getsuinfo) to SUM_infoEx(). Supports both the direct-connect and socket-connect module methods. It handles duplicate SUNUMs, by creating a list of SUs devoid of duplicates, and sending this list to SUMS. Upon return, the DRMS code creates one SUM_info_t per original SUNUM, regardless of duplicates.
- Added drms_record_getinfo() that calls drms_getsuinfo().
- Modified show_info to use this
- Modified the script (dllzp.pl) and cron job that downloads the MOC Product lzp files. The script now looks for new lzp files only in the window in which we expect the files to arrive (6ut - 12ut the next day). If it doesn't find at least one new file by the end of the time window (12ut), then it prints an error message to the log that gets mailed to a mailing list. I modified the cron job to check once an hour during this time window.
- Track down problem where DSUN_OBS was lacking precision - problem was in build_lev1, where it was converting a double (from iorbit.c) to float.
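A rough Python sketch of the duplicate-SUNUM handling described in the drms_getsuinfo entry above. It is not the DRMS C code: `query_sums` is a hypothetical stand-in for the SUMS request, and plain dicts stand in for SUM_info_t.

```python
def get_su_info(sunums, query_sums):
    """Return one info dict per requested SUNUM, in request order.

    `query_sums` (hypothetical) takes a duplicate-free list of SUNUMs and
    returns a mapping {sunum: info}.
    """
    # Drop duplicates but keep first-seen order, so the request is minimal.
    unique = list(dict.fromkeys(sunums))
    info_by_sunum = query_sums(unique)
    # Fan the answers back out: every original SUNUM, duplicate or not,
    # gets its own copy of the info.
    return [dict(info_by_sunum[s]) for s in sunums]
```

Copying the info per original SUNUM means a caller can free or modify one entry without affecting the entry for a duplicate.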
06-10-2010
- All replication code: do not check for the existence of the subscribe lock-file after releasing the lock - doing that safely would require holding the lock, since another part of the code could have acquired it in the meantime.
- Unpublish should call delete_series with the new -k flag - this tells the delete-series code to not tell SUMS to delete the SUdirs.
Modify drms_delete_series to take a new parameter, keepsums, that, if set, will skip the call to SUMS that causes the SUdirs to be marked for deletion. Fix a crash in drms_record_directory - don't try to use record->su if it is NULL (because the record has no SU associated with it). Add some debugging information, in verbose mode, to the drms_delete_series call. Add publish code that properly identifies which transaction is blocking a publication. Add some more logging to the unpublish code.
- Fix for make warning complaining about multiple definitions of libstat.a target commands.
- TRAC #275 - I verified on 6/9 that the exported fits files contain linked keywords (when the record being exported belongs to a series that contains linked keywords).
- TRAC #94- This was implemented some time ago - drms_merge_record() is the API function call.
- TRAC #67 - All the Rules.mk files that caused fpic obj files to be created have been modified so that this fpic code isn't uselessly generated. But references to the fpic libraries still exist, albeit they are orphaned. Probably leave it this way since at least one site (IAS) was asking for this code.
06-03-2010
- Wrote a program for Keiji to copy his data (keywords plus images) from a badly defined series to a new one.
- Spent time trying to track down slony-replication failure (for hmi_test.V_45s). There are records that exist on hmidb (slony master) that do not exist on hmidb2 (slony slave). All missing records were created somewhere between 2 - 3 am on Sunday May 9. The program creating the records was run with logging turned off. One other set of logs that might have been helpful was also not present. And the logs we did have don't have the information needed to pinpoint the problem. We're making changes to preserve this logging, and to add additional monitoring. The fix will be to re-publish the series - all sites will have to re-subscribe to the series once publication has happened.
- Looking at some bugs reported by MPS in the remotesums_master.pl.
- More investigation of long-running, hung jsoc_fetch's on solarweb.
05-27-2010
- Fix record-set-query parser so that it properly tests for mixed query case - at least one non-prime-key query + at least one prime-key query.
- Fix minor memory leak in base/drms/apps/accessreplogs.c.
- Modify archivelogs.pl to use cmd-line tar, not the Archive::Tar perl module. The latter needs to hold the entire tar contents in memory. We have lots of data in our tar files, and we cannot hold all the data in memory.
- Use case-insensitive comparison when comparing the series name in the seriesname columns of the drms_ tables with the series name passed into the createtabstructure module.
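The mixed-query test from the record-set-query parser entry above can be sketched in Python as follows. Representing each filter as a boolean is an assumption made for this sketch; the real parser works on its own filter structures.

```python
def is_mixed_query(filters):
    """True if at least one filter is a prime-key query and at least one is not.

    `filters` is a list of booleans (hypothetical representation):
    True for a prime-key filter, False for a non-prime-key filter.
    """
    filters = list(filters)
    # Every filter is examined; checking only the first one would miss
    # a non-prime-key filter that appears later in the list.
    return any(filters) and not all(filters)
```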
05-20-2010
- Single quotes in SQL where clauses changed into double quotes in lookdata; in DRMS, the parser changes them back to single quotes.
- Make sure that the slon start and stop scripts wait until the slon PID files appear (in the case of a start) or disappear (in the case of a shut down) before releasing the slon-daemon lock.
- When parsing record-set query skip values, for slotted keywords, must convert skip value into a whole number of slots to skip.
- Do performance reviews (as supervisor).
- Fix for improper test for mixed query case (a query involving where clauses on both prime keys and non-prime keys). drms_names.c was looking only at the very first record-set filter and looking for a where clause - it needed to look at all record-set filters.
- Investigate John S's problem with aia_test.synoptic - looks like he ingested records improperly (the T_REC_index doesn't make sense given the supposed epoch of 1993.01, it makes sense for an epoch of 1977.01).
- Fix broken build - hg_patch make files were not 100 percent correct.
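A small Python sketch of the skip-value conversion mentioned in the slotted-keyword entry above. The log does not say how a skip value that is not an exact multiple of the slot step is handled; rounding to the nearest slot, with a minimum of one slot, is an assumption of this sketch, as are the function and parameter names.

```python
def skip_to_slots(skip_seconds, step_seconds):
    """Convert a skip interval into a whole number of slots.

    Assumption: round to the nearest slot and never skip fewer than one.
    """
    if step_seconds <= 0:
        raise ValueError("slot step must be positive")
    # Add 0.5 and truncate to round half up (avoids round()'s half-to-even rule).
    slots = int(skip_seconds / step_seconds + 0.5)
    return max(1, slots)
```

For example, with a 45-second slot step a 90-second skip becomes 2 slots and a 720-second skip becomes 16 slots.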
05-13-2010
- Change jsoc_export_as_fits so that it reports export-payload size in MB, not bytes.
- Fix compiler warning for drms_keyword.h, drms_segment.h, drms_link.h - header defining XASSERT was not #included.
- Fix the subscription retry code - it was using the ps -p command to get the basename of the command that started the update process, but this basename is truncated to 15 chars. The result was a failed comparison, and the execution of multiple update processes for the same subscription file, which is bad.
- Modify subscribe_series to disallow the running of simultaneous instances.
Don't ask users whether they want to delete a series that they are trying to subscribe to when they already have that series on their site. Instead, tell them that they can't resubscribe to a series they are already subscribed to, and that they should unsubscribe first if they want to re-subscribe to it.
- Publish 6 series for Phil. Went not so smoothly. Needed to manually shut down the slon processes a couple of times. The subscribetemplate file was not up to date, which was killing the publish. I often had to manually delete series on hmidb2 when the publish failed (do so as db user postgres). Also, people had very long running transactions, which made it difficult to get anything published.
- Add a lock to subscribe_series so that there isn't more than one instance running from a given node.
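The 15-character truncation behind the subscription-retry bug described above can be illustrated with a short Python sketch. The script name in the usage note is hypothetical, and this is not the code used in the fix; it only shows why comparing the full basename against the ps command name fails, and one way to compare consistently.

```python
COMM_MAX = 15  # ps reports at most 15 characters of the command name


def same_command(ps_comm, script_path):
    """Compare a (possibly truncated) ps command name with a script path."""
    basename = script_path.rsplit("/", 1)[-1]
    # Truncate the expected name the same way before comparing.
    return ps_comm == basename[:COMM_MAX]
```

With a hypothetical script /home/x/subscription_update, ps would report "subscription_up"; a direct comparison with the full basename fails, while same_command() matches.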
05-06-2010
- Modified get_slony_logs.pl - clean up downloaded tar files and unneeded log files extracted from tar files. It is possible that this script will download a tar ball of logs, but that only some of the logs in that tar ball are needed. This will happen if the script runs and downloads/ingests logs that are then later on tarred into a tar ball which is then downloaded by this script. Those logs previously ingested are not needed. Previously, this script would not clean up those files, since clean-up happens in the function that does the ingestion, and that function is not called for the not-needed logs.
04-15-2010
- Changed SetKeyInternal so that it returns an error if an attempt is made to set a constant keyword.
- Changed drms_copykeys so that it skips target keywords that are constant.
- Fixed a bug in modify_series - it was attempting to create a record prototype (for use in creating a new series) from an existing record template by modifying the template. This didn't work (you shouldn't even modify a record template for starters) - the modify_series code was deleting the template! I changed this behavior so that now the template is first copied into a new, stand-alone record, and then this record is modified and used as input to the create series call.
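The copy-then-modify approach from the modify_series entry above, as a minimal Python sketch. Nested dicts stand in for the DRMS record template, and the function and field names are hypothetical.

```python
import copy


def make_prototype(template, new_keywords):
    """Build a stand-alone prototype from a template without touching the template."""
    # Deep copy, so nested structures are not shared with the template.
    prototype = copy.deepcopy(template)
    prototype["keywords"].update(new_keywords)
    return prototype
```

The template stays intact for every other user of it, and only the copy is handed to the series-creation step.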
04-08-2010
- Worked M - W with Kevin, the dbase consultant. We focused on getting publish_series and unpublish_series working. The main issue we saw was that long-running transactions can prevent the creation of a second replication set, and they can also prevent the merging of the second set with the first one. We modified publish_series so that it does NOT try to do the set merging - instead Kevin will write a cron script that attempts this "clean up" on a regular basis. Having more than one replication set has no impact from the subscriber's point of view. publish_series will also make multiple attempts to create the second replication set, in case the first attempt fails because of a long-running transaction.
- Work on Kevin's contract - finalized the amendment for April, and talked with Romeo about a plan for May and on.
04-01-2010
- Add the slon -x command to the slave slon. This command is used to generate a flag file every time slony creates a log. The flag file indicates which log files are safe to parse.
- We (vso people, Jennifer, I) tracked down a problem in the slony-log parsing code (an uncaught error and race condition). Saw several other problems in that code and have asked Igor to fix the problems.
- Fixed a problem in the slony-log archiving code - a needed env variable wasn't getting set, and the check on return codes from a system code wasn't being done properly.
03-25-2010
- Fix broken build. Forgot to make dependency of hg_patch_sock on libastro, so I added that in the hg_patch Rules.mk.
- Add read permissions on schema admin and tables admin.* to public user on jsoc database on hmidb2.
- create su_apache namespace tables on hmidb2 - drms_* tables; these are needed so that user apache can access hmidb2.
- createtabstructure.c - Do not automatically attempt to override the archive, tapegroup, owner, and retention values for the series - only do so if the user provides values for those fields.
- Combine redundant variables shared by get_slony_logs.pl and subscribe_series. Also, put these variables into a single configuration file - repclient.template.cfg.
03-18-2010
- Add instructions to config.local.template detailing how to build SU-proj dirs at NetDRMS sites.
- Fix crash in show_info - a deleted SU will cause rec->su to be NULL. Code wasn't checking rec->su and was dereferencing it.
- Do 2 new JSOC releases - 5.7 and 5.8.
- Needed to allocate a keymap structure before calling drms_keymap_parsefile(). This was a problem Tom was seeing - a crash when calling jsoc_export_as_fits() with a kmfile arg.
03-11-2010
- Rename the base and supplementing record parameter names; also fix a bug in drms_link_set() - code was using the template keyword to obtain a value, but it should have been using a keyword instance.
- Fix bug, in drms_names.c, where a string val was not initialized causing code to attempt to free the uninitialized pointer.
- Modify the new drms_link_set() API function to make the identity of the two record-parameters clearer (baserec and supplementingrec).
- Incorporate Igor's jsoc_fetch changes - VSO needs to see json output, even if SUMS is down; I modified his changes to allow a new request to be processed if SUMS is not down, and not all SUs are online (his changes had blocked the new request from happening). Also, I prevent non-VSO requests from proceeding if SUMS is down.
- Add new CVS directory for farside project.
- Help Bala, Richard, Anthony S., Rick with various DRMS things.
- Move interpolation code from JSOC/base/libs/interpolate to JSOC/proj/libs/interpolate.
- Add Jesper's fresize.[ch] code to interpolate library.
- Working on making Stanford "proj" directories available and buildable by NetDRMS sites (via an entry to the config.local file).
- Allow the caller to provide a flag, NOASYNCREQUEST, that prevents jsoc_fetch from doing an asynchronous request - which involves writing an entry to jsoc.export
- Commit exportdata, which apparently never had been committed, to cvs. Also, make /web/jsoc/htdocs/ajax point to the ~jsoc versions of the webapps.
03-04-2010
- Track down and fix problem with the writing of partial fits files. The problem was in fitsrw - an hcontainer was being modified while an iterator was operating on it. Also, add a missing DRMS_Env_t *env parameter to 3 fitsrw APIs. Also, clean up the fits-writing code (including removing unused functions in cfitsio.c)
- Ensure that you don't try to write checksums for FITS files that are read-only.
- Modify drms_copykeys() to skip target keywords if they are linked keywords.
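The fitsrw problem noted above (a container modified while an iterator is walking it) is a general pattern. A Python sketch of one common way around it follows: collect the entries to remove first, then remove them. This is an illustration under that assumption, not the actual fitsrw fix, and the names are hypothetical.

```python
def close_finished(open_files, is_finished):
    """Remove finished entries from `open_files` and return their names.

    The dict is never modified while it is being iterated.
    """
    # Pass 1: iterate and only record what should go.
    finished = [name for name, handle in open_files.items() if is_finished(handle)]
    # Pass 2: modify the container once iteration is over.
    for name in finished:
        del open_files[name]
    return finished
```

Python raises a RuntimeError if a dict changes size during iteration; a C container typically gives no such warning, which makes this kind of bug harder to spot.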
02-25-2010
- Add the call to copy the tar files into SUMS in archivelogs.pl (slony)
- dlMOCDataFiles.pl - If a previous download failed, try again up to 4 times. If it still fails, then a manual download must be performed.
- dlMOCDataFiles.pl - Fix logic that determines when to retry a file download that failed to download previously.
- Ported the new archivelogs.pl and accessreplogs to hmidb2. There were a number of problems with accessreplogs accessing SUMS - user postgres wasn't a member of unix group SOI, hmidb2 wasn't on yellowpages, etc. Keh-Cheng and Brian made changes to allow all this, although they suggested that this isn't good from a security perspective (they think that hmidb2 should be more isolated). I offered to change the architecture (ie, run accessreplogs somewhere else, using file locking to coordinate access to the slony logs by the processes running on hmidb2), but Keh-Cheng said I should just run accessreplogs from hmidb2, so I did.
- Show Bala how to call drms_setlink_dynamic when the target keyword is a prime, slotted keyword.
- Remove all but a single module from the default make in the cookbook project.
- Review Kevin's invoice.
- Fix a bug in iorbit.c. In some cases, the orb_rec string for a particular target time wasn't being set. This was due to the fact that I was capturing one orb_rec string per grid vector. But there could be fewer grid vectors than requested target times. Change the way that the orb_rec record is located in the orbit series. Use the indexmap returned by GetGridVectors() - it contains the indices into the array of grid vectors of the vectors that lie immediately below the target times. indexmap maps target time into an index into the grid vectors.
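A minimal Python sketch of what an index map like the one in the iorbit.c entry above contains. This is not the GetGridVectors() implementation; treating "immediately below" as "at or below", and using -1 for targets earlier than every grid point, are assumptions of this sketch.

```python
from bisect import bisect_right


def build_indexmap(grid_times, target_times):
    """For each target time, give the index of the grid vector at or below it.

    `grid_times` must be sorted in ascending order.
    """
    # bisect_right returns the insertion point after any equal entries,
    # so subtracting 1 gives the last grid time that is <= the target.
    return [bisect_right(grid_times, t) - 1 for t in target_times]
```

Several target times can map to the same grid vector, which is why one string per grid vector is not enough to label every target time.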
02-18-2010
- Created new script (archivelogs.pl) for archiving original slony logs into .tar.gz files. This will be called by a cron tab entry in hmidb2. The script, after tarring, will call accessreplogs to ingest the tar files into SUMS for permanent storage.
- Add module to store/retrieve slony logs from a series.
- Make the base ones not stanford-specific, put those in proj (slony_env.h, publish_series.sh)
- Export iorbit_carrcoords for sebastien.
- Modify the path to toolbox.pm so that it is relative to prep_slon_logs, not the working directory.
- Install the wrapper script drms_flock.pl on hmidb2. This script takes a script as an argument, and executes it, but only if it succeeds in obtaining an exclusive file lock.
- Install the new archivelogs.pl on hmidb2.
- Fix development download of lzp data - previous failures required a force run (-f arg) of the download script.
- Fix bug in copyfile (if used a second time in a program, it forgot to malloc); also fix a memory leak in the function.
- Fix bug in drms_stage_records - was not staging if the number of records was less than 2, but it should have staged if there was at least one record.
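The lock-then-run behavior of the drms_flock.pl wrapper described above can be sketched in Python. This is not the Perl script: the function name is hypothetical, and failing immediately (rather than waiting) when the lock is busy is an assumption.

```python
import fcntl
import subprocess


def run_if_locked(lock_path, argv):
    """Run `argv` only if an exclusive lock on `lock_path` can be taken.

    Returns the command's exit status, or None if the lock was busy.
    """
    with open(lock_path, "a") as lock_file:
        try:
            # LOCK_NB makes the call fail at once instead of blocking.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # someone else holds the lock; do not run
        try:
            return subprocess.call(argv)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A cron entry would call the wrapper with the real script as its argument, so overlapping cron runs cannot execute the script at the same time.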
02-11-2010
- Incorporate Igor's changes to subscribe_series - adding wrappers for ssh, scp, etc., doing some additional testing on files, and writing more logging.
- Add RM function (and use it) to subscribe_series.
- Make the series name in createtabstructure (the name actually stored in the ns.drms_* tables) case-insensitive. subscribe_manage was calling with the wrong case, but as it turns out, the series names are stored in a case-insensitive way.
- Modify subscribe_series: it no longer will apply the createns.sql file if the schema of the series being subscribed to is already set up properly.
- Add more logging to subscribe_series.
- Add some robustness to subscribe_series (slony): 1. Will now recover from file download interruptions. Will recognize partial file downloads and resume by downloading only the bytes needed to complete the download; 2. Add a state file (yes, another file) that allows the client to resume from exactly where it left off, should there be some kind of interruption (like a ctrl-c or machine goes down).
- More subscribe_series changes: Figure out if an unsubscription is happening. If so, run get_slony_logs.pl to flush out remaining sql log files. Also, make changes to support namespace-specific createns.sql files.
- Changes to sql_gen and subscribe_series (slony) - Write out individual createns.sql files - one for each schema since the client must decide whether or not it needs to apply those files.
- subscribe_series (slony) - Add a little more logging to help debug a problem (but the problem wasn't reproducible with these changes); also move the RM command that deletes the createns.sql - only delete the file if the schema checks don't fail.
- Implement client side of unsubscribe-series. Downloads all sql files not yet downloaded before unsubscription happened, then applies them, then deletes the series being removed from subscription.
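The partial-download recovery mentioned in the subscribe_series robustness entry above (download only the bytes needed to complete the file) can be sketched in Python. `read_range` is a hypothetical stand-in for whatever transfer mechanism is used; the real script's wrappers are not shown in this log.

```python
import os


def resume_download(read_range, local_path, remote_size):
    """Append only the missing bytes to `local_path`.

    `read_range(offset, length)` (hypothetical) returns `length` bytes of
    the remote file starting at `offset`. Returns the number of bytes fetched.
    """
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if have > remote_size:
        raise ValueError("local file is larger than the remote file")
    missing = remote_size - have
    if missing:
        # Open in append mode so the bytes already on disk are kept.
        with open(local_path, "ab") as out:
            out.write(read_range(have, missing))
    return missing
```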
02-04-2010
- Rename GetTableOID to avoid a conflict with a function of the same name in DRMS.
- Do release 5.6 of JSOC and 2.0 of NetDRMS.
- New jsds for the *.fds series - these include the new DATE keyword and the newer mechanism for the _step keywords.
01-28-2010
- Fix code that improperly set isconstant field in drms_keyword tables when creating a series - the isconstant field was not set correctly for index, slot, and ts_eq types of keywords.
- Fix code in drms_record.c, drms_fitsrw.c that didn't use the proper method for obtaining sorted keywords, links, and segments.
- Remove print statement that displays when parse_zone fails; this function needs to be called silently from certain places.
- Don't automatically override the owner field of the jsd (for socket-connect modules).
- Add support for time intervals - keywords that are time type with a time-field that is a recognizable time interval unit.
- Ensure the link->info->rank field gets set when the template record is created.
- Fix for bug when saving the isconstant column in the drms_keyword table.
- Update the sdo_ground.fds and sdo_ground.fds_orbit_vectors series: add the DATE keyword and conform to the new style for specifying time intervals; also specify keyword ranks.
- Add support for Fortran files with .f90 extension.
- Fix for redundant definitions killing build in proj/libs/dr/dr.c.
- Calculate more metadata: rsun_obs, crlt_obs, crln_obs, car_rot, and the orb-record id which is a string that identifies, for each input time value, the orbit-vector data point that lies immediately below the input time. Also, modify the test program that uses the iorbit library.
- Add newchunk parameter to drms_recordset_fetchnext - previously, newchunk was one of the status codes the function could return, but we needed the ability to return newchunk on top of the status codes.
- Fix some crashes in extract_fds_statev - it was the case that when fetching a chunk, it might have only 1 record. In that case, the status returned was "last record in rs", but this was masking the "new chunk" status. We want to return both status codes, but you can only return one. So I made a new parameter that returns whether or not a new chunk was started. Then I had the extract code use this new return status.
- Move the fds-product processing to the sdo_ground namespace.
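The newchunk change to drms_recordset_fetchnext described above separates two facts that a single status code could not carry at once. A Python sketch of that idea follows; the status strings and the generator are illustrative only and are not the DRMS API.

```python
OK = "ok"
LAST_IN_RS = "last record in rs"


def fetch_all(chunks):
    """Yield (record, status, newchunk) for every record in `chunks`.

    `chunks` is a list of lists of records. `newchunk` is reported
    separately from `status`, so a one-record final chunk can be both
    "new chunk" and "last record" at the same time.
    """
    total = sum(len(chunk) for chunk in chunks)
    seen = 0
    for chunk in chunks:
        for pos, record in enumerate(chunk):
            seen += 1
            status = LAST_IN_RS if seen == total else OK
            yield record, status, pos == 0
```

With chunks [["a", "b"], ["c"]], the record "c" comes back with status "last record in rs" and newchunk set, which is the case that was being masked in extract_fds_statev.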
01-21-2010
- Track down problem with one of Rick's series - su_rsb.rdVfitsf_sp. The series definition was invalid - you can't use vardim with generic segments, and you can't have a vardim segment with no dimensions.
- Lev1 - modify the series definitions for sdo.fds and sdo.fds_orbit_vectors. Also, move the pre-launch data to the sdo_ground namespace.
- Track down crash in extract_fds_statev.
- Examine feedback from Joe H. and Igor - things to incorporate into our slony code (like using a conf file to specify log and working directories, double-quoting all shell commands, etc.).
01-14-2010
- Wrote manage_logs.pl - This is for slony/subscribe_series. It is a new script that tars/cleans up the parsed sql logs from the site-specific directories on solarweb.
- Added the Net::SCP put subroutine and used it to save counter file on server. manage_logs.pl will use the counter value to know what sql logs/tars can be tarred or deleted.
- Fix the order of keywords that jsoc_info and show_info use - they use the jsd order.
- Merge Phil's jsoc_info_test with the cvs version of jsoc_info.
- Fix typo in value-table code: wrong index used for links (was using segment index, but should have been link index).
- Fix various problems in jsoc_info.c (the drms_record_nextkey and drms_record_nextseg functions always followed links, but in many cases you can't do this because the link has never been set. So I added a parameter to these functions, "followlink", to allow the user to specify whether the link should be followed. Then I did a sweep of the entire cvs tree to make sure all code is using the new parameter correctly.)
- Fix autoadjusting of segment paths in value table of lookdata.
01-07-2010
- Fix merge of jsoc_fetch (version with changes for VSO and changes for supporting lookdata2.html).
- Got Kevin to make a new proposal for work beyond the original 511 hours. Sent it to Romeo, who sent it to Karen. Awaiting the response.
- Help Rick with linked keywords.
- Meeting with Kevin to discuss next steps for finishing up slony work.
- Implemented sorting keywords by record table order: 1. For new series, use the persegment field in ns.drms_keyword to store keyword rank (determined at jsd-parsing time). There is a new field in the keyword structure that has rank. 2. For old series, use record-table (eg., su_arta.myseries) column order for variable keys, use the ns.drms_keyword order for constant keywords, and put the constant keywords before the variable ones.
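The ordering rule for old series in the keyword-sorting entry above (constant keywords first, in ns.drms_keyword order, then variable keywords in record-table column order) can be sketched in Python. The input representation is hypothetical.

```python
def order_keywords(keywords, table_columns):
    """Order the keywords of an old-style series.

    `keywords` is a list of (name, is_constant) pairs in ns.drms_keyword
    order; `table_columns` is the record-table column order.
    """
    constants = [name for name, is_constant in keywords if is_constant]
    variable_names = {name for name, is_constant in keywords if not is_constant}
    # Record-table columns that are not keywords (e.g. recnum) are skipped.
    variables = [col for col in table_columns if col in variable_names]
    return constants + variables
```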
12-03-2009
- Ticket #89 - Remove error message that occurs when there is an attempt to read a segment from a record that has no SU (because it got deleted), or when there is an attempt to read a segment file that doesn't exist (because the user never wrote the segment file - no data for that segment for that record). Ensure that valid error codes are returned in these cases (DRMS_ERROR_NOSTORAGEUNIT for the first case and DRMS_ERROR_INVALIDFILE for the second).
- Ticket #231 - Clean up cache between calls of iorbit_getinfo() when using the don't-cache mode.
- Ticket #232 - Modify parse_zone() to recognize GPS as a valid time zone.
- Ticket #235 - removed the insert into admin.sessionns.
- Help several people with several issues - Bala, Rick, Jim, Rock, Phil, Kevin.
- Meeting with Kevin on 12/3 to discuss final issues before he goes live with subscribe_series().
11-26-2009
- Ticket #182 - Support the retrieval of all arguments (named or unnamed) by number with cmdparams_getargument().
- Ticket #182 - Ensure that the order of the cmd-line arguments is preserved when parsing them into the cmdparams struct. There was a bug where this was not the case.
- Ticket #182 - Save the original cmd-line argument values that were parsed in cmdparams. Pass those on to the drms_run script by checking all arguments to see if they originated from the cmd-line and if they have not been accessed. If they were from the original cmd-line, but not accessed, then pass them on to the drms_run script. Update cptest to print out all cmd-line arguments via the cmdparams_getargument() API. Add a new parameter to this function to return the cmd-line str (if relevant) for the argument.
- Ticket #229 - Modify jsoc_fetch so that it handles HTTP POST requests. Located a CGI library, qDecoder, that actually encapsulates GET and POST requests so that a single API can be used to retrieve arguments of both types of requests. Ideally, this API would be used for both types of requests, and I may change that at some point since it is more robust than the current method, which involves code in jsoc_fetch trying to figure if it is being run in an HTTP GET context, and then manually retrieving the arguments.
11-19-2009
- Ticket #227 - Don't call SUM_put() for an SU that has no files in it
- Fix some compiler warnings in drms_storageunit.c
- Modify arithtool to test out the changes to drms_storageunit.c
- Ticket #200 - Remove the warning message that drms_copykeys() previously issued
- Ticket #72 - Add a Rules.mk to proj/cookbook to test out the use of libdrms.a and libdrms_sock.a by these cookbook modules.
- Ticket #220 - incorrect use of the segment's bzero/bscale when calculating the tile size of vardim files. Use the output array's bzero/bscale, not the segment's bzero/bscale.
- Ticket #214 - this was fixed a few weeks ago.
- Ticket #216 - this was a duplicate of #220.
- Ticket #186 - configure now checks for existence of the custom.mk before attempting to rm it.
- Ticket #222 - For the seg_bzero and seg_bscale keywords, change the format field from %f to %g.
- Ticket #224 - Added 3 new columns to the 'Get Keyword and Segment Values Here' section of lookdata. For each segment, there are: seg_cparms: < seg > , seg_bzero: < seg > , and seg_bscale: < seg >. Also removed hidden (implicit) keywords from the "List of All Keywords and Segments" section.
- Fix bug where lookdata.html was not checking for an undefined var value before checking its string length.
- Rework cmdparams so that it is known which args were accessed by the code linking to cmdparams. cmdparams now uses a hash of CmdParams_Arg_t structures - one for each argument, regardless of the type of argument (named, unnamed, etc.). Previously, named arguments and unnamed arguments were stored in different places. The accessed field will be set to 1 when user code accesses the argument. This is all in preparation for two tickets - #182 having to do with passing along unused arguments to a subprocess, and another that calls for tracking which arguments have been used in a program/module.
- Ticket #182 - Changes so that every time a cmdparams user accesses an arg - via the cmdparams_get... functions - the arg's 'accessed' flag is set. In a future submission, drms_run.c will pass any un-accessed args to its subscript.
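The accessed-flag bookkeeping described in the cmdparams entries above can be sketched in Python. The class and method names are hypothetical, and only name=value arguments are handled; the real cmdparams library is C and supports more argument types.

```python
class CmdParams:
    """Toy argument store that remembers which arguments were read."""

    def __init__(self, argv):
        # name -> [value, accessed]; dicts keep command-line order.
        self._args = {}
        for token in argv:
            name, _, value = token.partition("=")
            self._args[name] = [value, False]

    def get(self, name):
        entry = self._args[name]
        entry[1] = True  # mark as accessed on every read
        return entry[0]

    def unaccessed(self):
        """Arguments never read, in their original order."""
        return ["%s=%s" % (name, value)
                for name, (value, used) in self._args.items() if not used]
```

A wrapper in the style of drms_run could read the arguments it needs with get() and then hand the result of unaccessed() to the script it launches.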
11-12-2009
- New module (createns) that creates SQL that, when ingested into the DRMS database by user postgres, creates a new namespace, along with the ns.drms_* tables and sequences. This module is needed for Kevin's subscribe_series() program.
- Spent most of the week updating the 326a document.
- Work with Bala to review her code under valgrind. Found some problems with a failure to dereference some pointers, and also a 16MB leak.
11-5-2009
- New module (createtabstructure) that creates SQL that duplicates the PG table structure for a given series. It allows the caller to override some of the series-specific information (archive, retention, tapegroup)
- Document createtabstructure
- Fix a couple of tags for the NetDRMS release (some removed files had the NetDRMS beta tags associated with them before they were removed).
- Fix jsoc_export_make_index - A couple of bug fixes: add missing right angle brackets in html end cell; modify code to look for tab delimiter between record query and URL to data file.
- Change the links in /web/jsoc/cgi-bin/ajax to point to the binaries in /home/jsoc/cvs/Development/JSOC/bin (instead of pointing to the binaries in Phil's personal home dir).
10-22-2009
- Modify jsoc_main, jsoc_main_sock and drms_server to use signal handlers to do asynchronous shutdown (effected through signals). They were using signal handlers, and then using non-async-signal-safe functions inside the handler. Also, make a DRMS lock for client modules (socket-connect mods) to synchronize access to the environment. And make a new client lock to serialize socket module access to drms_server DRMS library.
10-08-2009
- 4 fixes for time parsing issues: 1. Fix parse_zone so it handles timezone strings that have trailing non-tz chars; 2. Fix in parse_zone to handle the +- time offsets; 3. Fix for time parser looking for JD anywhere in the time string when checking for a Julian Day time string (should only have MJD or JD at the beginning of the time string); 4. In DRMS, when parsing a time range given as a string of the form 2009.02.05_12:00:00-2009.02.05_15:00:00, associate -2009 with the second time string, not the time zone of the first.
- Help Keiji with his Rules.mk file.
- Fix for SUMS make not being able to find libecpg.a. There was a variable removed from make_basic.mk that was still being used by SUMS apps.
- Add -lssl to the list of library dependencies needed for building SUMS apps
- Add script that loads other SQL files that create PostgreSQL functions and return types.
- gen_init.csh - Remove the lines that create the SQL files that create the drms_series and drms_session PG functions.
- createpgfuncs.pl - Add parameter that accepts database name; pass that name to the psql -f calls
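The time-range ambiguity from item 4 of the time-parsing entry above (is "-2009" a zone offset of the first time or the start of the second time?) can be illustrated with a Python sketch. The rule used here, that a hyphen followed by four digits and a dot starts the second time string, is a heuristic assumed for this sketch and is not the DRMS parser.

```python
import re

# A '-' followed by a 4-digit year and '.' is treated as the range separator.
_RANGE_SEP = re.compile(r"-(?=\d{4}\.)")


def split_time_range(text):
    """Split 'start-end' into (start, end); end is None if there is no range."""
    parts = _RANGE_SEP.split(text, maxsplit=1)
    start = parts[0]
    end = parts[1] if len(parts) == 2 else None
    return start, end
```

A trailing numeric offset such as -0800 is not followed by a dot, so it stays attached to the time string it belongs to.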
10-01-2009
- Fix for buffer overrun in cmdparams_conv2type().
- Remove fortran interface from base build if no acceptable fortran compiler is found.
- Remove jpe from ia32 build (make it build only on x86_64 linux)
- Ensure that base/include and include have been created before calling gen_init.csh, which requires that they exist; remove localization.h-link creation from gen_init.csh; move non-clean code into appropriate place in configure.
- Handle missing values provided as input target times, especially if all input is missing target times (iorbit function iorbit_getinfo())
09-24-2009
- Write a function that maps the sums_url string (in jsoc.drms_sites) to the scp command used by sum_export_svc. This allows DRMS sites to localize the scp command they run.
- The sock module drms_log_sock was calling a server-only function - make it call the client version of the function, which calls into drms_server to call the server-only function.
- Revamp some make-system files to handle third-party libraries better. There is no longer a lib_third_party directory. The paths to these libraries are specified in the configure script (for Stanford defaults) or in the config.local file (to override defaults and for non-Stanford sites). The only third-party libraries supported in the base system are PostgreSQL and CFITSIO. To use other third-party libraries, project-specific Rules.mk files need to be modified.
- Use a global define that lists the path to the JSOC tree library directory so that every module that is built knows where its library subtree is (the subtree that is part of the tree that contains the module).
- Fully implement the ability to override the output/object directory (eg, use N02 instead of linux_x86_64) and to use machine-specific make variables (for things like third-party library locations).
09-17-2009
- Modify drms_copy_keys to not fail if one or more of the target keywords don't get set.
- Update wiki export documentation to reflect changes to the exp_su and exp_status jsoc_fetch commands needed for exporting SU. These commands now always print the DATA section, with 5 fields per SU (SUNUM, owning series, path, status, size).
- Implementation of drms_segment_fopen() and drms_segment_fclose() - functions to open and close a FILE * so that, especially for the generic segment protocol, you can write to a FILE *.
- Update the documentation to describe more accurately the difference between the internal file name and the original external one.
- Modify the default extension for generic-protocol-segment files - used to be .generic, but now it is the empty string. There is no such thing as a .generic file type.
- Fix buffer overrun in cmdparams in the code that saves the array argument values.
- Track down a problem in Rick's code that compiles a C module that calls into Fortran code.
09-10-2009
- Edit the configuration files for dlfds and dllzp so that the development and release versions both use the same ssh-agent.
- Create a wiki account for Kevin Kempter - account is KevinKempter.
- Write perl script to create 'fake' data for sdo.fds_orbit_vectors. Jim needs data for time ranges that this series didn't have, so I took the data from 2009.03.24 and replicated it all the way back to 2007.09.01. The script creates data files, which are then ingested into sdo.fds, and then parsed and ingested into sdo.fds_orbit_vectors.
- Fix problem whereby the drms_run script was completing (with an error), but the recnum file wasn't being written, and the recnum file is used by the qsub script to determine when drms_run has completed.
- Attend HMI Team meeting, present on first day.
09-03-2009
- Fix a bug in the code that writes images with all blank values - shorts, ints, and long longs were not being done correctly.
- Add a workaround to overcome a fitsio bug where it does not properly update the TFORM1 keyword when writing compressed images.
- Work on jcc.pl - about 2/3 of the way done.
- Meet with Rick, Igor and Jennifer to resolve issues regarding the customization of the string used to create the scp command used by sum_export_svc.
- Meet with Rick, Igor, and Jennifer to discuss the method by which VSO can get a list of sunum, status (online, offline, bad), path, and SU size via our export cgis. We decided to develop a new interface (eg, like jsoc_fetch), that does what we want. Between jsoc_fetch, and show_info, there should be something that does what we want.
08-27-2009
- Change from caching ALL series in the series_cache during module startup to lazy loading of the cache (on-demand). We have over 43K series and caching all of them, despite most of them never being used, was inefficient.
- Spend 2 days with Kevin Kempter, the psql expert.
08-20-2009
- Allow the jsd to specify the format of the text string that represents the default value of keywords (this is how it was originally), but then fix places that read that text - it needs to be able to handle hexadecimal strings.
- Disable the suppression of warning messages from the release build. Also, disable suppression of specific warning messages (some errors were suppressed from the debug build). So now all warning messages are displayed. Added support for JSOC_WARNICC to pass a string to compiler (for custom actions on warnings).
- Don't copy the description from the slotted key to the index key.
- Fix for not properly copying from one string to another when generating a fits name from a drms name and vice versa.
- Tracked down a problem with export not working if you export all files in a series to fits and select the name of the segment to export. If you just select "ALL" segments, then it worked.
- Help track down a problem with the filename not being properly created when exporting as is (needed to add single quotes around the filename template being passed to jsoc_export_as_fits).
- When ingesting FITS files with drms_fitsrw_read(), put the original FITS keyword name in the keyword comment field, but only if it is different from the DRMS keyword name, or if the FITS keyword is of type LOGICAL.
- Modify remotesums export so that sites can override the sums_url field in jsoc.drms_sites to specify their own copy program to run.
- Help Jennifer get remotesums working on JILA again.
- Help Bala with her port of Xuepu's mag code.
08-13-2009
- Update the default code that generates a DRMS keyname from a FITS keyname so that it follows Phil's scheme.
- Export two functions in the API - base_fitskeycheck and base_drmskeycheck to check for invalid fits and drms keywords and also check for reserved keywords
- Don't put [<fitsname>:<cast>] into the comment field of a keyword upon FITS file import, unless the FITS keyname couldn't be used as a DRMS keyname, or if the FITS keytype is logical; also check for reserved keyword names when generating FITS or DRMS keyword names from DRMS or FITS keyword names.
- Fix problem where drms_insert_series() isn't properly converting the default value from the DRMS_Keyword_t structures to text string representations. It was using key->info->format to go from value->text. But then drms_template_record() was using atoi to go from text->value. This won't work if key->info->format is %x since this causes the text representation of a pointer to be stored, which won't convert correctly back to an int using atoi.
- Vacation for 3 days in Portland.
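The default-value round-trip problem in the 08-13-2009 entry (value->text with a user-chosen format, text->value with atoi) can be sketched as follows. The helpers parse_default_int and format_default_int are hypothetical, not DRMS functions; the sketch assumes integer keywords and uses only standard C library calls. Note that strtol with base 0 recognizes hexadecimal only when the text carries a "0x" prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse default-value text. Base 0 lets strtol accept
 * decimal and "0x"-prefixed hexadecimal (a leading 0 means octal). */
long parse_default_int(const char *text)
{
    assert(text != NULL);
    return strtol(text, NULL, 0);
}

/* Hypothetical helper: always store defaults in one fixed decimal format,
 * independent of the keyword's display format. Returns snprintf's result. */
int format_default_int(char *buf, size_t len, long value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%ld", value);
}
```

For example, the value 255 written with %x becomes the text ff, which atoi reads back as 0; written with format_default_int it becomes 255 and parses back unchanged.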
08-06-2009
- Fixed ticket #159. The problem was that, regardless of the location of the prime keys in the "PrimeKeys:" or "Index:" lines of the jsd, index keys (the ones associated with slotted keys) always needed to be placed before all other keys in a record-set query string.
- Fix for release build of o2helio failing on n00 - problem was that the icc -xW flag was set, but it shouldn't have been since it is relevant only on x86_64.
- Add support for the zltr time zones (A-I, K-Z).
- Help Rick find problem with his build of ia32 mtrack - was crashing on a drms_segment_read() call. The problem was due to a configuration issue that left him without a fitsio.h header file. And with the way our icc is set up, the compiler issues only a warning, not an error.
- Help Bala with her code.
- Make warning #266 an error if the compiler is icc - not declaring a function can cause very subtle and hard to find bugs.
- Disable all of globalhs if the c compiler is gcc.
- Added one too many instances of printing zero for naxis when protocol is generic - remove the extra one.
- Fix for ticket # 177 - there were a couple of bugs: timeio.c wasn't handling invalid time strings properly, drms_missing_time wasn't comparing against the JD_0 time, and cmdparams had the wrong value for JD_0 time.
- Update for solaris, which is what G Linford is using - parse the list of directories returned by the sftp ls -l command because solaris prints results differently than linux does
- revert changes to reserve DATE keyword (so that users cannot specify it in a jsd)
07-30-2009
- In show_info -j, if the segment has an naxis of 0 (e.g., for generic segments), then still print 0 in the naxis field, since create_series requires the naxis field to be present.
- Fix jhelio2mlat make files to fix build.
- Fix for jv2helio not building. Problem was missing dependency libmkl_lapack in the 32-bit build of the module.
- ifort doesn't have an option for generating dependency information when compiling - so don't include dep files for ifort-built object files in DEP_$(d).
- Fix for the wrong combination of the ifort ftrapuv flag and optimization
- Simplify generation of DEBUGOUT flag in globalhs. Instead of generating a dummy file, debug.h, that contains a define that gets included everywhere, simply use the compiler's -D flag to conditionally set DEBUGOUT.
- Remove obsolete files for idl and a previous version of the remotedrms sql fetcher.
- Remove obsolete man-page generation script.
- Fix bug when determining the length of the missing-value string used in jsds.
07-09-2009
- Help several people with their modules - Rick, Xudong, Bala
- Fix for 'Potentially invalid time string' message erroneously appearing when creating any series. The code that was checking for an invalid time string was bad - fixed that and deferred part of the check to higher-level functions.
- Give practice FORR presentation
- Update FORR slides (continually doing this).
- Make a new implicit keyword - DATE_imp - in drms_parser.c. This will be optionally written by module code (this is up to the user) by calling drms_keyword_setdate(). Ensure that DATE_imp is a reserved keyword that is not allowed in a .jsd. Also need to ensure it IS exported when exporting to a fits file (normally HIDDEN keywords are not allowed to be exported). And, finally, make DATE a reserved keyword so that if a series has a DATE keyword, it doesn't get exported.
- Add a function, drms_keyword_setdate(), that will set the DATE keyword with the current date value.
07-02-2009
- Spent some time with my new macbook - figured out how to use it to directly edit files in my home dir, also had to figure out problems, with Brian's help, in copying text from various windows (basically x-windows copying doesn't work, so we figured out a work-around using non-UI emacs and iTerm, which does copy nicely to mac's clipboard).
- Provided more help to people trying to get their code working in CVS, like Yang and Bala.
- Helped Rick find some bugs in his mtrack code.
- Attended VSO status meeting with Joe G.
- Wrote some more of jcc.pl. This is a script that will compile module code (.c or .f file) and link it against the release binaries in /home/jsoc.
- Reviewed some more of the FORR slides; incorporated Craig W's comments.
- Started preparing my talk part of the FORR.
06-25-2009
- Spent most of this week preparing for FORR, which involved getting information from several people, like Jim and Keh-Cheng and Jennifer and Carl. Created slides, met with Phil and Rock in iterations. Sent slides to LMSAL people on Friday.
- Helped various people with CVS issues so they could get their work done that needs to be done for the FORR.
06-18-2009
- In Boulder this week at AAS SPD meeting. Did poster sessions on Monday and Wednesday late afternoon (4-6:30pm). Attended VSO meeting on Friday (6/19) all day until 3:30pm. Attended several impromptu meetings with VSO gang. Gathered in Deborah's lab a few times.
- Finalized poster with Rick for SPD meeting.
06-11-2009
- Modify make system to accommodate non-ifort fortran compiler. All the flags previously defined were specific to ifort. I also added a script that determines which C and Fortran compilers are installed, and then sets those in the make system. You can override the compilers with environment variables - CUSTOM_COMPILER and CUSTOM_FCOMPILER.
- Fix broken gcc build in several files in proj/lev0.
- Cleaned up make_basic.mk - removed some things that were never used; fixed some wrong flags (claimed to be link flags, but were compile flags); added comments; reorganized into more distinct sections.
- Help Junwei work in CVS.
- Write a section for Igor for his VSO poster for SPD.
06-04-2009
- Help Rick track down a memory-corruption bug in his ingest_track.
- Fix a bug in fitsrw_getfptr - wasn't properly removing cache items from gFFiles. This function was trying to remove the same value again and again. The solution was to save, while iterating through all the items, all the keys of the items being removed. Then outside the iteration loop, removing all the items from gFFiles.
- Help Yang set up his environment (.setJSOCenv) so that he can access the CVS tree.
- Write up my sections for NetDRMS poster for SPD.
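The fitsrw_getfptr fix in the 06-04-2009 entry (collect the keys while iterating, then remove the items after the loop) can be sketched as below. This is an illustration under assumptions: the array-backed Cache stands in for the real gFFiles container, and every type and function name here is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 8
#define KEY_LEN 32

typedef struct {
    char key[KEY_LEN];
    int in_use;
    int stale;   /* marks items that should be purged */
} CacheItem;

typedef struct {
    CacheItem items[CACHE_SLOTS];
} Cache;

/* Insert a key; returns 0 on success, -1 if the cache is full. */
int cache_insert(Cache *c, const char *key, int stale)
{
    int i;

    assert(c != NULL && key != NULL && strlen(key) < KEY_LEN);
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (!c->items[i].in_use) {
            strcpy(c->items[i].key, key);
            c->items[i].in_use = 1;
            c->items[i].stale = stale;
            return 0;
        }
    }
    return -1;
}

/* Remove one key; returns 0 if it was present, -1 otherwise. */
int cache_remove(Cache *c, const char *key)
{
    int i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (c->items[i].in_use && strcmp(c->items[i].key, key) == 0) {
            c->items[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

/* Count the items currently stored. */
int cache_count(const Cache *c)
{
    int i, n = 0;

    for (i = 0; i < CACHE_SLOTS; i++) {
        n += c->items[i].in_use ? 1 : 0;
    }
    return n;
}

/* Two-phase purge: phase 1 only records the keys of stale items, phase 2
 * removes them after iteration has finished. Returns the number removed. */
int cache_purge_stale(Cache *c)
{
    char doomed[CACHE_SLOTS][KEY_LEN];
    int ndoomed = 0, removed = 0, i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (c->items[i].in_use && c->items[i].stale) {
            strcpy(doomed[ndoomed++], c->items[i].key);
        }
    }
    for (i = 0; i < ndoomed; i++) {
        if (cache_remove(c, doomed[i]) == 0) {
            removed++;
        }
    }
    return removed;
}
```

With a hash-based container the separation matters more than it does for this array, since removing an entry can invalidate the iterator that is walking the container.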
05-28-2009
- Documented jsoc_export_as_fits.
- Modified show_info and jsoc_info to not show implicit keys (like segname_bzero) to the public - these are hidden keywords and should only be observable from psql.
- Don't export implicit keywords when exporting FITS files
- Don't require bzero/bscale to be present in jsd for floating-point data type segments.
- Rebuilt the development /home/jsoc cvs code so that lookdata export has latest changes.
- Help Jennifer with modify_series. There was a problem with drms_delete_series not working because the series had already been renamed.
- Several meetings with Richard/Rock about creating the flatfield series.
- Wrote RemoteSUMS text for SPD meeting poster.
- Help Tim L. with make file issues - converting a library to just .os that link with a module.
05-21-2009
- I finished drms_fitsrw_read() and checked it into CVS. This required making a function that imports FITS keywords into DRMS keywords. It runs through the code that knows how to convert a FITS keyword name into a DRMS keyword name. FITS has a LOGICAL type of keyword, but DRMS doesn't. To handle such discrepancies, I added a 'casting' feature. If the DRMS Y keyword's description field has [X:LOGICAL], then when DRMS exports keyword Y to a FITS keyword, the FITS keyword has name Y and it gets cast to LOGICAL. With this method, when a FITS logical keyword is imported into DRMS, it actually gets saved into a DRMS_TYPE_CHAR keyword - a value of 1 means 'T' and 0 means 'F'. Then when the DRMS keyword gets exported back to FITS, it gets converted back to 'T' or 'F' in the FITS file.
- Fix several large leaks due to drms_server_end_transaction() not cleaning everything up allocated by drms_server_begin_transaction().
- Fix a crash where static memory was being freed.
- When importing FITS keywords into DRMS keywords, not sure what to do about duplicate FITS keywords. The standard does not prohibit them, but it also does not say how they should be handled - that is completely under control of the application reading the FITS file. Is our policy that the value of the first instance of such a keyword is the value of the DRMS keyword, or is it our policy that the last instance overrides previous instances? Or do we retain all instances in some manner? I decided to go with the policy that the first instance is used.
- Need a plan for converting COMMENT and HISTORY FITS keywords into DRMS keywords. Resolution: append each instance to the previous instance, separated by a newline character.
- Fix bug in delete_series - the code was insisting that the vector of SUNUMs was of length greater than zero, but of course this isn't necessarily so as the series could have no records in it.
- Add FITS compression-string parameter, cparms, to jsoc_export_as_fits. The parameter is a comma-separated list of compression strings, one for each segment being exported. These strings are passed directly to the CFITSIO file-write functions.
- Add a status flag to drms_commit_all_units() to communicate lower-level problems back to caller - this allows drms_server_commit() to rollback the DRMS dbase if problems happen during commit.
- delete_series now calls drms_replicated() - the psql pl/pgsql function - to determine if the table being deleted is currently being replicated. If that table is being replicated, then delete_series fails to delete the series.
- Add a check for permission to delete records from the series at the beginning of drms_delete_series
05-14-2009
- Don't have SUMS remove SUNUMS before trying to delete DRMS psql tables - if the DRMS table deletion is going to fail, you don't want to have SUMS remove the SUNUMS.
- Fix bug in drms_server_dropseries - was trying to receive, from socket, the series name twice, but that name was written only once.
- Fix 9 leaks in DRMS/cmdparams and arithtool.
- Help Bala, Rick, Tim L., John B., Raymond (master_series crash)
- Track down with Hao a crash in hmi_import_egse_lev0_sock.
- Commit Rick's changes to drms_binfile.c. These changes add support for bzero and bscale values (the header was modified).
- Support a -L flag (which means create a SUDIR for the stdout and stderr log files) in drms_server. Also, support the same flag in drms_run, and if the flag is set in the args to drms_run, then pass that arg to drms_server. One big problem was that the drms_server.c code handling env->session->sudir (which contains the path to the SUDIR containing the log files) did not expect this to be NULL. libdrmsclient.a was okay with that being NULL, but not drms_server. So, I had to check all the places where env->session->sudir was used in drms_server and handle a NULL value.
- Fix crash in db_disconnect. The function assumed that dbin->db_lock was not null. So the last check-in added a pthread_mutex_destroy(dbin->db_lock), which crashes if dbin->db_lock == NULL.
05-07-2009
- Remove truncation of double value in parse_duration() - it doesn't seem to be correct.
- Fix create_series jsd parser so that it properly tracks the line number in the .jsd when reporting errors. It now also reports a more useful drms_parser.c line number when an error occurs.
- Change the way that delete_series works - it now passes a filepath to SUM_delete_series(). This path contains a list of SUNUMs that SUMS must handle. The filepath points to a file that lives in SUMS. The SUDIR containing the file is temporary and not archived. In the process of doing this, I added Phil's function that fetches vectors of keywords given a record-set query. SUMS will parse this file of SUNUMs and set the appropriate dbase table entries for those SUNUMs.
- Help many people with their DRMS issues: tim l., todd, phil, john b., jennifer, sebastien, jim, rick.
- Add and write documentation for drms_record_getvector(). This was originally Phil's function for fetching a subset of the columns that a drms_open_records() returns.
- Modify drms_binfile.c to set bzero/bscale to 0/1. So right now binfile supports 0/1 only, but in the future, Rick is going to add new headers to binfiles to have bzero/bscale
04-30-2009
- If the SUNUM is not local, then it is not an error if a user asks for attributes like whether the data are online, what the retention is, etc.
- Help Jennifer resolve problem with scp running in expect script. You have to source the ssh-agent env file in the shell that gets forked with the exec or spawn calls - the environment is NOT automatically passed onto the new shell. Also, I figured out the syntax for running a new shell with the tcl exec call.
- Spend a lot of time with Rick getting the SUMS code working on remote sites.
- Fix jsoc_export_as_fits not following segment links.
- drms_segment_mapexport_tofile() now supports GENERIC segments - if it encounters such a segment, it will simply copy the generic file as is to the output file.
- Make sure that when iterating over keywords, use drms_record_nextkey(); support all segment protocols in drms_segment_mapexport_tofile().
- Fix record-set-query-name parsing error - if the trailing ] was missing in a filter, the code would enter an infinite loop.
- Remove code to update drms_session with status during module run. Only update that table when opening and closing a DRMS session.
- Fix a problem with DRMS modules not calling PQfinish(), the API function that causes the db client to disconnect from the db. I also removed db_abort() since it was essentially redundant with db_disconnect(), and I made it so that drms_disconnect() is callable only by clients.
- Export drms_names_parseduration() in DRMS library API. Also, add documentation to header file.
- Investigate slowness in calling "show_info -P aia.lev0d'[1391927-1675045]' key=fsn". The problem was a call in the loop on sums requests in drms_sums_thread(). It was updating the drms_session table on every pass through the loop. If you look at hmidb, you'll see a lot of CPU usage. When we remove this update to drms_session, things really speed up. I found a very interesting result. I put the offending code (updating drms_session) back, and put a timer around it. Then I ran show_info -P aia.lev0d'[1391927-1521940]' key=fsn. At first, it always took less than a millisecond to complete - this includes going from DRMS all the way through libpq and the dbase. But by the time we got to FSN 1450000, it was taking about 6 milliseconds to complete. Then by FSN 1500000, it was taking about 0.010 seconds. By the end (FSN 1520000), it was taking about 0.015 seconds. Clearly, this is causing the long execution time of show_info - if it takes about 0.015 seconds, on average, to update the table, then it would take over an hour for show_info to run. So, it takes longer and longer to update the table. The slowness must not be coming from DRMS (since that doesn't change), but from the dbase itself. But why? What is it about updating the same row over and over in a single table that causes the dbase to "slow down"? Is the dbase busy hoarding memory (like the deleted old records) during this time in a way such that it gets slower and slower to store memory as the amount of memory it is storing increases (like we're filling up a tree)?
- Add Igor's script to CVS that copies Slony files from SU to NSO.
04-23-2009
- Fixes for remotesums not properly doing the asynchronous download. lib DRMS was blocking on remotesums_master.pl, but it should not have blocked. Also, show_info wasn't properly handling the 'trylater' status code generated by remotesums. Needed to separate the record-chunk status codes from the regular drms status codes because code in show_info needed to look at both (and we don't combine status codes in drms_statuscodes.h).
- Fix drms_server deadlock - was due to the server-side accept call blocking, even when there aren't any clients.
- Update instructions for tagging the repository files for a release. Previously, there wasn't a way to cleanly do this, but with a recent change to the modules file, this is now possible.
- Update tag-creation notes in release howto.
04-16-2009
- Final fix for specifying db port.
- Incorporate fixes for _GNU_SOURCE. Add that define to all linux builds.
- Do JSOC 5.1 release, write up release notes.
- Made some make file changes to select the correct fortran compiler on Mac (if one exists).
- Remove the -V flag to drms_run in jsoc_export_manage
- More fixes to make_basic.mk - remove unused MATHLIBS variable and replace with FMATHLIBS
- Document drms_copykeys
- Clean up proj/example/apps - make fortran module variable name have _sock in it, since only sock fortran modules can be made
- Document the make system on wiki; remove obsolete documentation.
- Remove all references to old, crufty DRMS code (drms_compress, drms_fits, drms_tasfile)
- Move private API functions (not to be used in code outside of the DRMS library) to _priv.h.
- Update section 3 in the doxygen documentation - add all code section, with several sub-sections
04-09-2009
- Recover from accidentally deleting all my .c and .h files from my cvs tree.
- Move the port-number parsing (from the hostname) in drms_connect_direct_toport() and similar functions to a lower-level function. Other lower-level code needed this parsing functionality.
- Add a flag to drms_copykeys() to indicate which record, the source or target, supplies the names of the keywords whose values will be copied.
- Add comments to remotesums_master.pl to explain how the whole remotesums process works.
- Fixes for the case where source SUs are NOT online. remotesums_master.pl must poll for status until stats == 0, then it is okay to use the resulting text file to obtain information to pass to remotesums_ingest.
- Finish port of drms_run from csh script to C code: added support for DRMS_RETENTION and copying drms_server log file from the specified directory into the log SU; add more flags that mean verbose; add --help flag; don't send TERM signal if the socket-connect modules have set the abort flag, which causes drms_server to terminate immediately
- Fix a deadlock when socket-connect modules abort - their drms_server threads send drms_server a TERM signal, the problem was that the clientcounter didn't get decremented, so drms_server was waiting for one drms_server thread to terminate, but it already had, but drms_server wasn't told about this.
- Clean up the make files a bit - in the examples folder, an exe was re-inventing the wheel, which already existed in make_basic.mk.
- Work on dependency issue in make system. It appears that having links in the o.d to header files is NOT a problem. I hard-coded a dependency to a link (which links to a header file) in the list of object files in proj/export/apps/Rules.mk. I edited the target file (test.h) and make jsoc_export_as_fits caused a recompile/link. The problem turned out to be an error in several of the Rules.mk files. The value for the DEP_$(d) variable was incorrect - it was DEP_$(d) := $(OBJ_$(d):%=%.o.d) (there is an extra '.o' before the last '.d'). Fixed all the Rules.mk files.
- Submit changes for replacing the old drms_run (csh script) with the new one (C executable).
- Start documenting the JSOC make system.
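The port-number parsing mentioned in the 04-09-2009 entry can be sketched as a small stand-alone helper. This assumes the host specification has the form host[:port]; the function name split_host_port and its return conventions are hypothetical and do not reflect the actual DRMS lower-level function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split "host[:port]" into host and port.
 * Returns 0 on success, -1 on malformed input or if host does not fit.
 * When no port is given, *port is set to -1 so the caller can apply a
 * default. */
int split_host_port(const char *spec, char *host, size_t hostlen, int *port)
{
    const char *colon;
    char *end = NULL;
    size_t n;
    long val;

    assert(spec != NULL && host != NULL && port != NULL);
    colon = strchr(spec, ':');
    n = colon ? (size_t)(colon - spec) : strlen(spec);
    if (n == 0 || n >= hostlen) {
        return -1;
    }
    memcpy(host, spec, n);
    host[n] = '\0';

    if (colon == NULL) {
        *port = -1;
        return 0;
    }

    /* Require the whole remainder to be a number in the valid port range. */
    val = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || val < 1 || val > 65535) {
        return -1;
    }
    *port = (int)val;
    return 0;
}
```

Keeping the parsing in one low-level function lets every connection path share the same validation.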
04-02-2009
- Fix several problems with drms_server shutdown. Remove all calls to Exit(1) and exit(0) and _exit(0) from all non-main threads. Remove busy loop from the main thread. This loop was entered when a signal shutdown was received by the signal thread. Instead, let main get into a good state (finish all server threads, don't start new ones, kill the signal thread), then call either drms_server_abort() or drms_server_commit() and then exit.
- Track down performance bottleneck when dbase transactions occur at a quick rate - SUM_open() calls saturate; add some more descriptive print statements to drms_server and simple_drmsrun
- Add new timer functions. There are a lot of shortcomings of the original functions. They rely upon globals that can't be shared very well by different blocks of code. They don't work with threads at all. Also, with the original functions, you must not call PPopTimer(). You can't time a function that gets repeatedly called, saving the timer between calls. Also, the original functions only work if you always call PushTimer/PopTimer every time you want to time a block of code. But typically, this is not desired. What you want to do is to start a timer, then check on the elapsed time at many times and locations after starting the timer - you don't want to continually create and destroy the timer.
- Fix broken build - a define was accidentally removed from SUM.h.
- Investigate remotesums not working when the requested SU is not online. The problem is a bug in jsoc_fetch. Talked with Phil and he fixed it. While I was at it, refamiliarized myself with how all the parts of jsoc export work, and documented them (in remotesums_master.pl for now).
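The timer requirements listed in the 04-02-2009 entry (no globals, start once, query the elapsed time many times) suggest a caller-owned timer object. The following is a sketch under that assumption, using the POSIX clock_gettime call; the Stopwatch type and function names are hypothetical and are not the new DRMS timer API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Caller-owned timer state: no globals, so independent blocks of code and
 * independent threads can each keep their own timer. */
typedef struct {
    struct timespec start;
} Stopwatch;

/* Record the start time. */
void stopwatch_start(Stopwatch *sw)
{
    assert(sw != NULL);
    clock_gettime(CLOCK_MONOTONIC, &sw->start);
}

/* Seconds elapsed since stopwatch_start(); may be called any number of
 * times without stopping or destroying the timer. */
double stopwatch_elapsed(const Stopwatch *sw)
{
    struct timespec now;

    assert(sw != NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - sw->start.tv_sec) +
           (double)(now.tv_nsec - sw->start.tv_nsec) / 1e9;
}
```

Because the state lives in the Stopwatch, a repeatedly called function can keep one timer across calls and report the running total whenever needed.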
03-26-2009
- Fix crash in drms_server that occurred after sending drms_server a TERM signal. The semaphore to handle clean shutdown wasn't set up in drms_server.c (it was set up in jsoc_main.c though). Also, in base/drms/libs/api/drms_server.c, look for the existence of the shutdown semaphore - if it wasn't set up, don't use it.
- simple_drmsrun.c - created this.
- Added function to copy drms keywords from one record to another. Has a parameter to indicate the class of keywords to copy (like all non-implicit keywords).
- Fix a bug where when parsing slotted keys, there was an attempt to convert slotted key value to index value, even if the user was using the FIRST_VALUE or LAST_VALUE notation.
- Fix for freed record slots not being deleted.
- More fixes for drms_server not shutting down properly: don't allow main thread to be interrupted by SIGUSR2 while the thread is holding a lock.
- I did a fair amount of work on making sure that drms_server shuts down properly and that simple_drmsrun works well with drms_server. The original timeout of 10 seconds was too short - many drms_server processes did not write their env files within 10 seconds. In fact, even with large timeouts, this was often the case - the env file couldn't be found. Modified simple_drmsrun to print output to one file per drms_server (qsub just lumps the output of all simple_drmsrun scripts together so it is not possible to figure out what is happening).
03-19-2009
- Re-write simulate.pl (the script to test hmidb performance) to issue qsub commands that run drms_run scripts. The drms_run script itself runs a csh script that contains one or more show_info or set_keys commands.
- Update simulate2.pl to have latest cadences. Added a new parameter, l15mult, that allows you to control the read rate of all lev15 series.
- Found a performance problem with drms_open_nrecords(). It does a group by on the entire ordered (by prime key) data series table, then picks N records from the top or bottom of the result. This was very slow. My change was to limit (4 * N) the size of the original table, then do a group by. This resulted in a query that ran in about 1/(10^4) of the time that the original query did.
- Change the flag that requests the module version number to 'jsocmodver' - just like the flag for direct-connect modules.
- Fixed a bug in the code that copies dbase binary query results to C structures. If a dbase column was filled with NULLs, then DRMS would crash because DRMS would malloc(0), which would return a non-null ptr, and then it would write to the ptr, which you are not supposed to do.
- Bugs in drms_run caused it to fail to shutdown drms_server sometimes. The $status csh var was being used to ask if drms_server started in background properly - sometimes it reported non-zero, but drms_server started. Since drms_run thought it didn’t start, it didn’t bother shutting it down. Rewrote drms_run in C code – simple_drmsrun.c. Simply runs a script – doesn’t do the other things that drms_run does.
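The malloc(0) crash described in the 03-19-2009 entry can be avoided by treating an empty result as a separate case before allocating. A minimal sketch, assuming integer column data; copy_column and its status convention are hypothetical and not the actual DRMS database code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy n values from src into a newly allocated array.
 * For n == 0, return NULL with *status == 0 instead of calling malloc(0),
 * whose result (NULL or a pointer that must not be dereferenced) is
 * implementation-defined. On allocation failure, *status is -1. */
int *copy_column(const int *src, size_t n, int *status)
{
    int *dst;

    assert(status != NULL);
    if (n == 0) {
        *status = 0;
        return NULL;
    }

    dst = malloc(n * sizeof(*dst));
    if (dst == NULL) {
        *status = -1;
        return NULL;
    }
    memcpy(dst, src, n * sizeof(*dst));
    *status = 0;
    return dst;
}
```

Callers then distinguish an empty column (NULL with status 0) from an allocation failure (NULL with status -1) and never write through a zero-length buffer.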
03-12-2009
- Changes to remotesums to support a new app, sum_export_svc, that executes scp cmds on the behalf of users that trigger the remotesums process. Developed/Tested on the Test SUMS (on d00). Worked a while with Jim to figure out the cause of problems preventing SUM_export() from succeeding. One problem was that we needed to run sum_export_svc as user production (so that the files could be easily deleted while testing). Another was that ssh-agent wasn't set up. I needed to put production's public key in arta's .ssh/authorized_keys file (because, for now, the user@server that exports the SUs is arta@j0).
- Remove check for existing ssh-agent. This check must be performed by the owner of sum_export_svc since the ssh-agent process will be accessible by the owner of sum_export_svc, and by nobody else, which is most likely different than the owner of remotesums_master.pl
- Wrote up a document to describe the set up involved, both on the source and destination, to use remote sums.
- Implement the base and step features in the sql-query-generation code for index range sets.
- Fix the subquery that finds a value for the second prime key given a restricted set of records specified by the first prime key.
- Implement value-range skip ('@') constructs, and fix a bug in the parsing of value range queries involving float/double prime keys that have ranges and/or skip values.
03-05-2009
- Add some comments to the FDS orbit-getting function call; make iorbit_getinfo return an array of info vectors, not a linked list, add a new parameter to iorbit_getinfo that allows the caller to flush the grid-vector cache. Met with Keh-Cheng and Rock to evaluate the iorbit_getinfo() function and a new one for getting the pointing information.
- Work with Tim L. to finalize the drms naming document. Talk with Phil to fully understand the 'index range set' notation (#X).
- Help Yang get a psql/drms account set up.
- Make it hard to accidentally use the 'print the module version' flag - set_keys was taking a 'version = xx' cmd-line arg, and this was causing the set_keys version to print; now the flag to cause a version print is --jsocmodver
- Make a new flag for set_keys, '-l', that indicates that the keyword names in the key=value pair cmd-line args are all lower-case, regardless of the case of the actual names. This allows you to not have to know the case when using set keys. Also, if verbose, print some more diagnostic information - I used this for the db perf test.
- Add a new verbose flag to show_info - used for the db perf test I did; if the verbose flag is set, print the query
02-26-2009
- Forgot to #include the header for the drmssite_info stuff in drms_server.c before I left on vacation, so I added that.
- Went through all my email that stacked up while I was on vacation.
- Fix [!!] not working with drms_open_recordset. The problem was that the value of the 'allvers' flag was not being passed onto drms_query_string() in some cases.
- Debug a problem in jpeq.c. It was modifying the argv filled in by cmdparams_get_argv(), but that argv is a pointer to a field inside the cmdparams structure and shouldn't be touched. To help catch this problem in the future, I modified cmdparams_get_argv() so that if a user declares an argv that allows manipulation of the underlying strings, icc displays an error.
- Bug #148 - I believe this is fixed, Jim has an old libdrms. I can't reproduce the crash that he was seeing.
02-05-2009
- Modified remotesums_ingest.c to use SUM_open(). This API communicates with sum_export_svc to issue scp calls to j0.stanford.edu. At this point, haven't figured out how to use it correctly. SUM_open() hangs.
- Made JSOC release 5.0.
- Help Jim with makes, doxygen.
- Worked with Rick to make a Fortran module for Graham and Ashley - the module is a wrapper around a bunch of Fortran code. Made the Fortran interface module, fdrms.mod, build by the default make. Wrote a couple of functions to return the number of records returned from the f_drms_open_records() call.
- Help Rick to get NetDRMS 2.0a ready
01-29-2009
- Met with Igor/Jennifer to test out newest remotesums code.
- Met with Tim L. to work on some of his crashes he was seeing.
- Met with Rick/Jim to
- Met with Keh-Cheng/Jim to discuss plan how to scp from each remote SUMS site to Stanford's j0. For each remote site, we will create a single account at Stanford through which scp commands will be allowed. And only a single remote-site user will be allowed to login to the corresponding Stanford account. For each site, the authorized_keys file for the site's Stanford account will contain a single public key - the key for a single remote-site user. The SUMS sum_svc at each site will be run as that user, which will allow sum_svc to access j0. All other remote-site users must pass their j0-access request through sum_svc.
- Remove the sourcing of the ssh-agent configuration file from remotesums_ingest.c - move it to remotesums_master.pl.
- Beef-up the checking for proper ssh-agent running in remotesums_master.pl; also handle the case where the user may be running in Bourne shell.
- Fix for buffer overrun in drms_server_dropseries_su.
- Fix arithtool to use the DRMS API for getting an array of the prime keys. Also, if the code fails to read a file (perhaps it isn't present or something like that), it will simply skip that file and go onto the next. Previously, the program would stop and fail.
01-22-2009
- It turns out that the modification to show_info I made to fetch remote SUs worked for the direct-connect of the module, but it didn't work for the socket-connect version. In fact, the socket-connect version can't even call the DRMS code to fetch SUs (because this code is accessible to servers only, not sock modules). So, I had to write that layer that communicates from the sock module to drms_server. The sock module requests the SU from drms_server, which handles the call to remotesums_master.pl.
- Fix for broken show_info caused by changes for remotesums; add the passing of series names from DRMS to remotesums_master.pl.
- Initial commit of remotesums_ingest - a module that calls SUM_alloc2() for each SUNUM and copies a storage unit into the created su dir.
- Make the ingest module write to stdout error codes for DRMS to read; comment out test code in sum_open; make a function to commit (SUM_put()) the SU created in the ingest module.
- Fix a couple of crashes in the drms_su_getsudir() code (using the request structure after it has been freed, which caused double freeing).
- remotesums: get the multiple-SUNUM case working; fix problem where last item written to remotesums pipe (DRMS/remotesums_master.pl) was used as return value - changed to first item written; fixed problem in remotesums_ingest where utility output was being sent to remotesums pipe - changed so that all output written to /dev/null.
- remotesums: fix a couple of bugs in the code that handles multiple SU requests when some of the SUs do exist in SUMS, but some do not.
01-15-2009
- Got involved in the vacuuming discussion. It looks like autovacuum will remove old rows that cannot be accessed by any existing transaction, but we need to test (Jennifer is going to do this). Full vacuum does compaction and release of disk space to the os/fs. Other than this disk cleanup, it is identical to standard vacuum.
- Got remotesums_master.pl downloading files. As a test, the test SUMS on d00 is used by a show_info that asks for a SUNUM unknown to the test SUMS. This triggers the remotesums code, which calls jsoc_fetch which accesses the real SUMS. The real SUMS does know about the sunum. It passes the SU path back to remotesums_master.pl. The latter then copies the file to a local dir. Next: need to write the ingest program to obtain a path to the test SUMS, then modify remotesums_master.pl to copy the files directly into the test SUMS SU.
- sick 2 days.
01-08-2009
- Fix: when the day of year has fewer than 3 digits AND the 3-month window spans more than one year, pad the day of year with leading zeros to 3 digits.
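For illustration only, the zero-padding itself can be sketched in C as below. The function name format_doy() is hypothetical; the log does not say which script or language the actual fix was made in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a day of year (1..366) as a fixed-width,
 * zero-padded 3-digit string, e.g. 7 becomes "007". `buf` must hold at
 * least 4 bytes. Returns 0 on success, -1 if the day is out of range or
 * the buffer is too small. */
int format_doy(int doy, char *buf, size_t buflen)
{
    assert(buf != NULL);

    if (doy < 1 || doy > 366 || buflen < 4) {
        return -1;
    }
    snprintf(buf, buflen, "%03d", doy);
    return 0;
}
```

Fixed-width day numbers also sort correctly as strings, which matters when names built from them are compared across a year boundary.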
- Help Rock with jsoc_export stuff - he was having troubles getting export as fits to run. Fix for jsoc_export_as_fits failure. Problem was collision between jsoc_export_as_fits cmd-line arg 'version' and the same but newly reserved arg in jsoc_main. Renamed version to expversion.
- Re-create the sensitivity image (a disc used for calibration) for a particular magnetogram for Yang. We're trying to fix a problem.
- Made some good progress on remote sums. Finished the communication between DRMS and remotesums_master.pl. Also started fleshing out remotesums_master.pl.
- Modify fdmagcal so that it errors-out if no table (sensitivity image) is provided, or if a bad path to the table is provided.
- sick a day
12-18-2008
- Ticket #12 - Added the call to SUM_delete_series() per Jim's instructions. Had to fix a bug in the chunked record set cursor close call - the cursor on a table was remaining open, which was causing the "drop table" query to fail. There was DRMS client and server code to write.
- Add date to build failure message in jsoc_build.pl. Also catch more error cases.
12-11-2008
- Wrote a perl script to emulate our standard pipeline processing, from lev0 to lev1.5. It takes a defined set of tables with read and write cadences, and then calls show_info #$ (read) and set_keys (write) at the appropriate cadences. When calling set_keys, it provides test data (from su_jennifer.hmiground_lev0) by randomly selecting from 10000 records.
- Fixed test data series (su_jennifer.hmiground_lev0, port 5444 on hmidb) so that missing test values have the word "BLANK". This facilitated the pipeline emulation script.
- Updated wiki to show how to create a wiki group. Changed all instances of "karen" to "art".
- Figured out how to fix the ns.drms_keyword.persegment field. Most of these fields were bad.
- Ticket #129 - Wrote a script, /home/arta/Projects/JSD/makeupdatecmds.pl, to obtain a list of all namespaces in admin.ns, then to apply an SQL command (provided on the cmd-line to makeupdatecmds.pl) to ns.drms_keyword. Used this to update all external prime keywords (some of which are internal prime as well). Jennifer manually changed several other keywords' persegment values so that they are correct now, so I believe we are done.
- Ticket #113 - DBIndex field no longer has index keyword name, has slotted keyword name.
12-4-2008
- More fixes with show_info -j: this time, needed to mark the implicit cparms and bzero/bscale keywords as implicit.
- Ticket #129 - When printing a jsd, if the dbindex contains an index keyword, don't print it. Instead print the corresponding slotted keyword name. Also, pass the double quotes surrounding the fits compression string to inner parsing code so that the entire compression string gets saved, not just the first non-whitespace word.
- Finish reworking cmdparams and module_args documentation.
11-27-2008
- Ticket #47 - I ran several tests to confirm that any description, as well as certain other fields (like "unit" field) can have commas in them, as long as such fields are surrounded by quotes.
- Wrote /home/arta/Projects/DBPerfTest/simulate.pl, which runs a loop that queries and inserts records into test dataseries that are representative of the entire pipeline.
- Ticket #96 - Implemented - if no version is specified, then parser puts current version into the seriesinfo. Upon printing a .jsd (with show_info -j for example), no version field is printed, because the output .jsd is consistent with the syntax of the current-version .jsd.
- Ticket #99 - The solution is to append a trailing '/' to the directory. It turns out that there was already a char buffer with the slash appended, but the original, non-slash-terminated string was sent to dsds.
- Ticket #132 - This was not implemented, and it was trying to modify static memory in a bad way. I re-wrote parse_range_float() to properly handle closed and open floating point intervals. I added the check for out-of-range in the code that calls parse_range_float().
11-20-2008
- Ticket #130 - Problem was that jpe was using a 'reserved' jsoc_main flag '-A'. I added a new cmdparams api, cmdparams_reserve(), to allow jsoc_main to say what params are reserved. If a module tries to define a parameter that has been reserved, a warning message displays. Also removed the "-A" flag from jsoc_main. If you want to tell DRMS to always archive your output series, then use --archive (case insensitive).
- Did a thorough test of the quadratic interpolation of getorbitinfo. Rock and I get identical results.
- Document (Doxygenify) drms_open_recordset().
- Ticket #53 - 1. Disallow setting retention times via the environment. 2. You must own the series in order to decrease retention. If you pass a negative retention value to SUM_get() or SUM_put(), this will ensure that you don't decrease retention. A positive value can decrease it. 3. By default, SUM_get() will pass a -3 retention to SUMS. This can be changed by adding a line to the configure script. This default can be overridden by specifying a value on the cmd-line (with the DRMS_RETENTION argument). A positive value can be passed only if you own the series. If you try to set a positive value, and you don't own the series, the value gets multiplied by -1. If the data are offline and you pass a negative value, SUMS sets the absolute value. 4. By default, SUM_put() will pass the jsd value. This default can be overridden by specifying a value on the cmd-line (with the DRMS_RETENTION argument). Since only the series owner can call SUM_put(), it doesn't matter if the retention value is positive or negative - it will be passed as is to SUMS.
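The SUM_get() rules in item 3 can be sketched as a small decision function. This is only an illustration: the function name and parameters are hypothetical, and -3 is simply the default quoted above (which, per the entry, can be changed via the configure script).

```c
#include <stddef.h>

/* Hypothetical illustration of the SUM_get() rules in Ticket #53, item 3.
 *   requested   - pointer to the DRMS_RETENTION value given on the
 *                 cmd-line, or NULL if none was given
 *   owns_series - non-zero if the caller owns the series
 * Returns the retention value that would be passed to SUMS. */
int sum_get_retention(const int *requested, int owns_series)
{
    const int default_retention = -3;   /* default quoted in the entry */

    if (requested == NULL) {
        return default_retention;
    }
    if (*requested > 0 && !owns_series) {
        return -(*requested);           /* non-owners cannot decrease retention */
    }
    return *requested;
}
```

For example, a non-owner asking for 10 ends up sending -10, while the series owner asking for 10 sends 10 unchanged.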
- Track down a problem in Jim's jpe code - it was using a function that wasn't declared, so it was assumed that the function returns an int. The return of this function was actually a pointer. So, if the pointer being returned was a 64-bit number, this got truncated to 32 bits because of the assumed int return value. Then the 32-bit number was cast to 64 bits, which causes the upper 32 bits to get filled with ones (sign extension, when bit 31 of the truncated value is set).
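The truncate-then-sign-extend effect can be reproduced with plain integers. The sketch below is not the jpe code; through_implicit_int() is a hypothetical name, and it assumes a platform with a 32-bit int and the usual two's-complement wraparound on conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Mimic what happens to a 64-bit pointer value that passes through an
 * undeclared function's assumed `int` return type: the value is cut down
 * to its low 32 bits, then widened back to 64 bits as a signed number. */
uint64_t through_implicit_int(uint64_t ptr_bits)
{
    assert(sizeof(int) == 4);                  /* sketch assumes 32-bit int */

    int truncated = (int)(uint32_t)ptr_bits;   /* keep only the low 32 bits */
    return (uint64_t)(int64_t)truncated;       /* widen with sign extension */
}
```

With an input of 0x00007f12a0001000, the low half 0xa0001000 has bit 31 set, so the result is 0xffffffffa0001000, which is an invalid address. Declaring the function (including its header) removes the assumed int and with it the truncation.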
- Ticket #131 - change module archive flag from -A to DRMS_ARCHIVE=val. When parsing from cmd-line or jsd, ensure that archive value is -1, 0, or 1. Fix problem in drms_commit_all_units() return archive flag - it wasn't properly assessing whether the data were archived or not.
11-13-2008
- Modified fdmagcal and fixplatescale_v0 (SOI modules) to bump the BLDVER18 version to the version in those modules. It will only do this if the module version is greater than the current version. Also, added the OUTBITS=0 flag to a map file used to run fdmagcal and verified that the output was indeed 32 bit float data.
- Ticket #126 - There were two problems: 1. The DRMS code assumes no line had more than LINE_MAX chars (LINE_MAX is 2048). 2. When looking for the right bracket in a record-set query, there was no check to see if the end of the input had been reached.
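The second problem is the classic unbounded scan. A minimal sketch of a bounded search follows; find_right_bracket() is a hypothetical name, and the real record-set parser presumably also has to deal with nesting and quoting, which this ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return a pointer to the first ']' at or after `p`,
 * or NULL if the end of the string is reached first. The check for the
 * terminating NUL is what keeps the scan from running off the end of the
 * input when the bracket is missing. */
const char *find_right_bracket(const char *p)
{
    assert(p != NULL);

    while (*p != '\0') {
        if (*p == ']') {
            return p;
        }
        p++;
    }
    return NULL;   /* unterminated bracket: let the caller report an error */
}
```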
- Spent a couple of days finalizing the FDS orbit data interpolation code. It was particularly tricky dealing with caching of data. Ended up having to use drms_open_recordset() and traversing in the forward direction only (which means I had to sort the target times in increasing order).
- A day and a half sick time.
- Fix a bug in cmdparams where string values were being truncated at 255 chars. There is another problem somewhere in cmdparams or the hash code.
11-06-2008
- Ticket #76 - Created a new API, drms_open_nrecords(), that uses a modified version of the psql query that limits the number of psql records returned to DRMS to n. Modified show_info to use this new API.
- Ticket #120 - The scripts now use the environment variables to use the PID to find /proc/<pid>/cmdline. They parse the cmdline to see if it contains ssh-agent.
- Modified iorbit.c to provide a vector of data (over time) to the interpolation function, which more efficiently interpolates a vector of data values.
- Ticket #121 - Made dlMOCDataFiles.pl (the master script that actually downloads files from the MOC product server to Stanford) retry scp'ing of files when scp fails. Retries a max of 5 times.
- Ticket #122 - Don't use 'R' flag for printing module version number. Instead use --ver, --vers, --version, --about, or --vn. Also, make cmdparams allow flags of the form --XXX that don't require a value to be associated with the flag. Used by jsoc_main so that --version can indicate 'print version'.
- Work with Hao/Jim to create a map file that will convert 1.8.0 magnetograms to 1.8.2. But first, we just got a mapfile to get 1.8.0 processed by fdmagcal.
10-30-2008
- Help Jennifer with problem with create_series not working on a .jsd produced by show_info -j. It actually did work.
- Fix daily doxygen generation. There was a permission problem with the existing html and man files. I manually created those as user arta (group www), which creates files/directories that aren't accessible by user jsoc (who isn't a member of www). jsoc is the daily user who executes the doxygen generation script.
- Add a new param to drms_recordset_query() so that it returns a flag indicating whether or not the query contains a [! ... !] in it. Needed by the show_info() family so that it calls drms_query_string() correctly.
- Manually run the dlfds.pl script to retry a previous scp error.
- Ticket #114 - Fixed by removing the call to the following from drms_strval() so that the template-creation code doesn't reject JD_0 timestrings:
if (time_is_invalid (val->time_val)) { fprintf (stderr, "ERROR: Invalid time string %s\n", str); XASSERT(0); }
- Ticket #115 - Comparison to missing time in drms_types, and in timeio.c, used exact (equality) comparison of doubles, which means that a missing time might not look like a missing time. I changed to compare to a range about JD_0: t > JD_0 - 10e-5 && t < JD_0 + 10e-5.
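The range test from Ticket #115 can be sketched as a small predicate. The function name is hypothetical, and the missing-time value is passed in as a parameter rather than hard-coded, since the numeric value of JD_0 is not given here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when t lies strictly within +/- tol of the
 * missing-time sentinel (JD_0 in the log entry). An exact == comparison
 * can fail after a value has been through string conversion or
 * arithmetic; a small window around the sentinel is more forgiving. */
bool time_is_missing(double t, double missing, double tol)
{
    assert(tol >= 0.0);

    return t > missing - tol && t < missing + tol;
}
```

With the tolerance from the entry this would be called as time_is_missing(t, JD_0, 10e-5).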
- Fix more permission issues with the nightly build script.
- Make a template doxygen file that can be used to document new DRMS modules. Also, for drms_record put the doxygen function documentation at the end of the file so that it doesn't interfere with the list of functions
- Ticket #74 - There was a maximum of 128 record-sets allowed in the @filename file. I changed this to dynamically allow as many as needed (within memory limits) by using linked lists.
10-23-2008
- JSOC release 4.7, NetDRMS release 1.1
- Help John Serafin with .jsd syntax.
- Track down problems with jsoc_update.pl not working. It calls 'df', and this was hanging on n00 because n00 was trying to mount some filesystems from a machine that wasn't exporting those filesystems.
- Ticket #107 - The way this works, sscan_time() will see that it DOESN'T recognize NaN. So then it will return JD_0. The JSD parser was seeing that sscan_time() didn't consume any chars, so it thought that NaN was an invalid value. Fixed - drms_sscanf2() now does not consider it an error if sscan_time() doesn't consume any chars in the time string.
- Ticket #108 - The keyword template code does not recognize 'Z' as a valid time zone. This is because it is using a timeio function that didn't recognize 'Z'. drms now uses the parse_zone function, and that function correctly recognizes 'Z'. The function was also expanded to recognize timezones of the form +800. Also, the zone_isvalid() function was modified to recognize 'Z' as valid.
- Ticket #109 - Add a new flag, -R, that causes modules to print out the JSOC version of the module, then return 0.
- Ticket #110 - Resurrect Karen's build script. I made some change so that it will print, in the email sent to jsoc_dev, the compilation error messages. I also had to fix the way it detected an error - Jim had added a function that Karen's code thought meant a compile error (but it was just a function with 'Error' in its name). The script will be run by user jsoc on maelstrom.
- Revive Karen's daily build script, run as jsoc@maelstrom.
- Reorganize and fix doxygen tags in our modules. Make the 'module' tab page look better.
- Ticket #57 - implemented this. If any [? ... ?] appears anywhere in record-set query, the 'prime-key logic' is disabled.
10-16-2008
- Several fixes for DRMS string-value parsing - in the recordset-parsing code, in the jsd parsing code, in the drms_open_records() code, and in cmdparams. cmdparams will now handle a couple of escape characters, like \t. If you put these on the cmd-line, cmdparams will convert those two characters into the actual esc character that they represent. Make drms_sscanf_str() more robust.
- drms_sscanf_int() was improperly calculating the number of chars processed when the input was a time string with leading whitespace. Fixed that.
- Ticket #77 - I thoroughly reworked all this drms_sscanfXXX stuff. The only functions that are exported at this point are drms_sscanf2() and drms_sscanf_str(). The original drms_sscanf() is now a static function named drms_sscanf_int(). drms_sscan2() and drms_sscanf2() now handle strings better. They can accept strings that contain commas, brackets, spaces - anything. You can specify a delimiter to drms_sscanf2() that will cause parsing to end when that delimiter is first encountered. For example, the original drms_sscanf() used to stop parsing when it encountered a right bracket (']'). To retain this behavior when using drms_sscanf2(), provide "]" as the delimiter.
- Ticket #92 - fixed when I implemented record-chunking.
- Bug Fix - (for broken build) helio2mlat_j called a function whose signature changed and helio2mlat_j wasn't updated to use this new signature
- Bug Fix - Fix problems when parsing record-set queries that contain time strings. Some of the problems fixed - timestrs that begin with '-', timestrs with spaces before or after the timestr, timestrs that end in a tz with a space after that (got interpreted as bad tz, and replaced with UT).
- Ticket #84 -
- Ticket #97 -
- Ticket #88 - Set the DEBUG flag back to 0. We no longer need to have debug by default for debugging instability issues.
- Fix SUMS build warning - duplicate make rule for padata.o file.
10-09-2008
- Track down TAS-file write inefficiency issues. Use strace and debugging of libcfitsio.a. Problems included cfitsio file buffering limitations (too few buffers) and not saving vital keywords, like bitpix, in a structure. Also, there was a conflict between cfitsio buffering and stdio file buffering that Keh-Cheng discovered. He turned off stdio buffering in cfitsio, and then updated the cfitsio in ~jsoc with this unbuffered version. With the way that cfitsio does file buffering, it is relatively easy for all buffers to get used up. When that happens, cfitsio "thrashes" when it reads new data - it basically has to flush a buffer to make room for the new data. So, if the buffer to be flushed is dirty, it has to write it out, then it has to read data from disk into the just-flushed buffer. By "relatively easy", I mean that the following is sufficient to cause all buffers to get used up on every iteration:
for (1 to n) {
    fits_get_img_param(fileptr, ...); fits_write_img(fileptr, ...);
}
fits_get_img_param() wants to read a couple of keywords from the header. fits_write_img() wants to see and write data. On each iteration, the fits_get_img_param() call will cause a read miss - the data that you want aren't in a buffer because cfitsio has already flushed that buffer. So there must be a file seek followed by a read. The fits_write_img() call will use up all the buffers so that the next fits_get_img_param() call will be a miss again. BTW, fits_write_img() does some reading as well as writing, presumably for the same reason that it needs more buffers than are available. Unlike the fits_get_img_param() call, I haven't traced exactly when this happens - doing that would be a bit time consuming, and I think the issue is the same as with the fits_get_img_param() call. To verify my suspicions, I did the first call 2x in a row:
fits_get_img_param(fileptr, ...); fits_get_img_param(fileptr, ...);
The first time, there was a read miss (and hence disk read). But the second time, there wasn't. That is because with these calls, not all buffers get used up (all the data needed by this call lives in one record, and that is easily contained in a buffer, whereas with fits_write_img(), a large number of buffers needs to be used, even on my test which was 192 x 192 images).
- Improve efficiency when reading and writing TAS slices. First, the fits_write_subset() call can do a lot of file seeking and reading. Instead, and if the image to be written is contiguous in memory, use the fits_write_img() call. Second, cache the desirable parts of the fits file header. If you don't do this, and then you request something like bitpix from cfitsio, it might have to seek to the header part of the file and re-read in the bitpix part of the header. This could happen easily as cfitsio buffers file access, and the number of buffers is small. cfitsio doesn't save the header in a structure - it lives only in these ephemeral buffers. With release code on maelstrom, I'm now seeing less than 2 tenths of a millisec per TAS/fits slice write (for 192 x 192 float slices) when writing slices, and a total read overhead/inefficiency a little less than 1% (to write 54MB, cfitsio reads .5 MB from the output file). Keh-Cheng saw 0% overhead, but I'm not quite seeing that, but close.
- Manually downloaded LZP day 250 from MOC Product Server. They reused that day to hold sim #4 data, but my scripts had already downloaded the original day 250 files. In that case, my script won't re-download the same files. They did not change the version number of the files, so my scripts didn't re-download them.
- Finished implementation of record-chunking. I had to remove a LIMIT statement from the query part of the cursor declaration. The cursor should be defined on the entire set of records, but the limit statement was preventing the entire set of records from being included in the response to the query. So, if the original query was 150K records, the limit was causing only 140K records to be present in the overall record set. Then when you iterate through all records, 10K records are missing (rs->n == 150K, but the cursor only knows about 140K).
- Met with Rock and Carl to discuss how to store information in a record of a series of Carl's so that he can find the source record in the future. I also went over how to write doxygen documentation for modules.
- Help Jim use libsoi.so. For my own edification, you can simply add "-lsoi" to your link command line, and also do setenv LD_LIBRARY_PATH <path>, and that will cause the linker to link against the dynamic library at build time! Then you don't need to call code to look up entry points and assign them to function pointers, etc.
10-02-2008
- Ticket #54 and #58 - If there are no non-TAS segments, then the slot directories are not created. This involves adding a flag to the various "newslots" functions, and in the case of socket-connect code, passing a new parameter via the socket connection. The flag is 1 if the slot dirs should be created. If there are no non-TAS segments, then the slot directories are not printed by drms_record_directory(). Otherwise, they are.
- Implementation of record-set-chunking. Currently, works only if iterating in the forward direction.
- Fix some bugs in record-chunking.
- Make extract_fds_statev use record-chunking in a couple of places to try it out.
09-25-2008
- Fix for control-c for direct-connect modules. The database connection was being broken by the signal thread without regard to what state the main thread was in. Often, the main thread could be trying to access the dbase, but the signal thread pulls the plug on the dbase in the middle of this. Now, the SIGUSR2 signal is sent from the signal thread to the main thread, and the latter goes quiescent before the signal thread disconnects from the dbase.
- Bug Fix: for ingest_lev0 creating a fits file with headers that doesn't have the bzero/bscale keywords set. Create a new drms_segment_writewithkeys() function that adds fits keywords to resulting fits file.
- Met with Tim and discussed next steps in global helio pipeline. Need to discuss cmd-line, env saving again. Tim needs record-chunking.
- Track down problems with extract_fds_statev(). 2 problems: committing one record at a time (need to combine records into a record-set that can be closed with drms_close_records()), and searching for one record at a time from dbase. Need to chunk records together.
- Bug Fix: .jsds that contained slotted keywords didn't always result in a series that has an index. The index should be the corresponding index keyword, but if the .jsd explicitly described the index keyword, no index was created.
- Ticket #82 - There were several problems that I fixed: jsds with 'index' keywords were resulting in series with no db index, several of the jsd fields needed surrounding double quotes, don't print to jsd implicitly created keywords, don't allow explicit creation of index keywords, don't reject jsds that have a unit size of 0 but no segments, set the implicit keyword flag on index keywords, ensure index keywords are part of the set of db index keywords, don't include in the set of db index keywords slotted keywords.
09-18-2008
- Discuss the setting of SU retention time. It isn't quite doing what we want. I wrote up a 'truth table' with the results of discussions with others and my research on how the code works. I sent it out to Phil and Jim to review - this will eventually go to Jesper for review.
- Started writing up a job description for the PSQL expert.
- Ticket #53 - implemented the part about removing the ability to set retention time by specifying an environment variable (DRMS_RETENTION).
- Ticket #53 and #39 - Added functions to drms_series.c that allow the user to check whether they have permission to create a new record, delete an existing record, or update an existing record for a given series (drms_series_cancreaterecord(), drms_series_candeleterecord(), and drms_series_canupdaterecord()).
- Talked with Jennifer about dbase table permissions, and the drms_open_records() sql that gets evaluated.
- getorbitinfo: Fix bug in routine that finds the grid point greater than the target point; also handle the situation where the FDS data just doesn't contain enough data to satisfy the request for the target point.
- cmdparams now stores the original cmd-line in 2 new CmdParams_t fields - argc and argv. These are completely analogous to the argc and argv parameters in the main() function. Access these via the cmdparams_get_argv(&cmdparams, &argv, &argc) call.
- getorbitinfo: For each interpolated time, add calculation of solar vr, vw, vn and dsun_obs. Also, return a list of info structures that contain hciX, hciY, hciZ, hciVX, hciVY, hciVZ, dsun_obs, vr, vw, and vn.
- getorbitinfo: In the demo program, getorbitinfo, iterate through returned info structure, and print out the contents. Also, fix a bug in the atan() function used - I needed to use the one that allows you to specify what quadrant you want based on the values of x and y.
- Ticket #80 - I worked with Jim on this. This was a DRMS problem, not in SUMS - I fixed it at some point, but Jim's show_info hadn't been updated with that fix yet. So, this is fixed.
09-11-2008
- Increase performance (decrease run time) of jsoc_info, and fix some leaks. It now takes less than 1/200th the time it originally took. There was a bad use of realloc to resize a buffer - it was being resized continually, and by adding only a few bytes at a time. Also, there was an n^2 algorithm.
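The usual remedy for the realloc pattern described above is to grow the buffer geometrically instead of a few bytes at a time. The sketch below shows that general technique; it is not the actual jsoc_info change, and the StrBuf type, the function name, and the 64-byte starting size are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer, not a type from jsoc_info. */
typedef struct {
    char  *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} StrBuf;

/* Append `s` to the buffer, doubling the capacity when more room is
 * needed, so the number of realloc() calls grows only logarithmically
 * with the final size. Returns 0 on success, -1 on allocation failure
 * (the buffer is left unchanged in that case). */
int strbuf_append(StrBuf *b, const char *s)
{
    assert(b != NULL && s != NULL);

    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < b->len + n + 1) {
            newcap *= 2;
        }
        char *p = realloc(b->data, newcap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the NUL as well */
    b->len += n;
    return 0;
}
```

Starting from a zero-initialized StrBuf, a thousand 3-byte appends trigger only seven reallocations (64 up to 4096 bytes) rather than a thousand.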
- Next step in lev0, getorbitinfo processing. Start making module that figures out what range of data to fetch from DRMS given the desired output times. Also, cache this information.
- Create several new trac tickets from notes in my spiral binder.
- Bug fix (trac ticket #80): It turns out there were a couple of problems. -P wasn't working when either the -A flag was set or the seg=XXX list was provided. Also, Jim is indicating that SUMS isn't always looking at the SUM request mode (which contains the RETRIEVE/NORETRIEVE flag). I've fixed the first problem (revision 1.23 of show_info.c), now reassigning it to Jim to look at the SUMS issue.
- Bug fix (Ticket #78) - Fix crash in db_bin_query(). Use PQgetlength() to determine length of data to copy from the psql result to the DRMS internal representation of a db query, not the fixed maximum value length.
09-04-2008
- Created a module, getorbitinfo, that fetches FDS orbit vector information from sdo.fds_orbit_vectors. For now it calls test code. This code compares FDS HCI values with HCI values that are derived from an earth ephemeris and the FDS GCI values. This is a verification of the FDS data provided by NASA (and it looks fine).
- JSOC version 4.6 release.
- Met with Rock to go over next steps in creation of getorbitinfo.
- Do some review and research what projects are left to be done, and what the status is of those projects.
- Reviewed all old active/open tickets, assigning all unassigned ones when appropriate.
08-28-2008
- Fix bug in code that reads binfile and zipfile segment files. drms_segment_read() was assuming that the bin reading code set bzero and bscale. But the binfile doesn't have that information, so they were garbage values. The garbage values were then used to convert the data ==> garbage in --> garbage out.
- Added implicit bzero/bscale keyword generation to binfile and zipfile segment protocols. bzero and bscale are NOT stored in these files at all.
- Fix jsoc_update.pl. On the remote machine (eg., n00), it was not cd'ing to the JSOC tree before running make.
- Test Phil's make_vw_V using a combination of both ds_mdi.XXX JSOC dataseries, and prog: MDI dataseries as input. Works fine.
- Started working on next JSOC release - creating release notes.
- Implement drms_segment_readslice() and drms_segment_writeslice() functions. Previously, there was no drms_segment_writeslice(), and the drms_segment_readslice() function read the entire file into memory, and then trimmed out the undesired parts.
- Finished up TAS file implementation and testing.
- Several make file fixes in SUMS, DSDSMIGR, and lev0 - most of the problems were due to redundant obj file or exe file rules.
08-21-2008
- Modify show_info -j code (actually drms_jsd_print()) so that the regular record template is not used. The record template expands per-segment keywords into multiple keywords, with the per-segment flag set. Then if you try to create a series from the printed .jsd, each of these keywords in the set expands into multiple keywords (geometric growth). Instead prevent the expansion from happening if the template is going to be used to create a .jsd.
- Started porting soho_orbit.c to JSOC land. Spent some time trying to figure out what it does. The plan is to grab enough of it to use on our data in sdo.fds_orbit_vectors and see that the results we get using the old code and the new code jibe.
- Tried unsuccessfully to use Phil's ingest_dsds_a to ingest dsds vw_V data into a new test series that uses the TAS protocol. It looks like I'm missing PEQ.
- Got Phil's ingest_dsds_a to work on DSDS data, ingesting from a "prog:" specification.
- Implemented TAS for floats with no compression.
- Modified drms_open_records() so that it recognizes "dsds.XXX" and "ds_mdi.XXX" data series. These have a generic segment that points to a VDS directory. When drms_open_records() sees these data series, it calls drms_open_dsdsrecords() so that libdsds.so can handle them. libdsds.so calls peq to get the soi keylist which VDS knows how to use. And Jim has modified peq so that if it sees a non-prog spec, it will ask SUMS for the data (which is where the "dsds.XXX" and "ds_mdi.XXX" data reside). Unfortunately, in this case, peq doesn't work well - the key list it returns isn't useable by vds_open(). Instead, I had to extract the VDS directory (a SUMS directory) from the keylist, and then create a new keylist from this VDS directory. Then vds_open() was able to use this second keylist to access the data.
- Met with Jesper to discuss FDS orbit vectors and how to use them to provide interpolated vector information for level 0.5. Don't assume that the desired times for which we want orbit info are evenly spaced. Linear interpolation most likely won't work. Whenever FDS data are used by the getorbitinfo function, cache the range of FDS data retrieved from DRMS - this function will be called every second. I'll write the outer shell of the function, Jesper will write the math part. I will still write the code that tests HCI values provided in FDS - start with GCI provided in FDS, run Jesper's code that converts to HCI, then compare to the HCI provided in FDS data.
- Restarted ssh-agent and exportmanage.pl and recovered following the power outage on Monday.
08-14-2008
- Worked mostly on FITS slices. Got the FITS implementation of TAS working, including bzero/bscale issues. Added a cache of open fitsfile pointers so that subsequent uses of a fits file within the same DRMS session do not have to call fopen(). The cache gets purged if too many fitsfile pointers are open. The file changes are discarded on module abort. Tested with simple cases thoroughly.
- Started working on fixing the "show_info -j" code. This code makes a .jsd from a series' template record structure. Reworked the per_segment and isdrmsprime fields of the DRMS_Keyword_t structure. per_segment was renamed to "kwflags", which is a bit field. The first bit is now the per_segment flag, two bits are for the isdrmsprime flag (which was expanded to not DRMS prime, DRMS-internal prime, and DRMS-external prime), and one bit indicates whether a keyword was implicitly created. When creating a jsd from the jsd template record, we don't want to include certain implicit keys (like index keywords). However, the "implicitness" of the keyword was not being saved before I added the kwflags structure.
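Sketch of a kwflags-style bit field as described above: one bit for per-segment, two bits for the three prime-key states, one bit for implicit keywords. The exact bit positions and all KW_* and kw_* names here are assumptions for illustration; the log does not give the real layout.

```c
/* Hypothetical bit layout; the real kwflags positions may differ. */
enum {
    KW_PERSEG_BIT   = 0x1,       /* bit 0: per-segment keyword */
    KW_PRIME_SHIFT  = 1,
    KW_PRIME_MASK   = 0x3 << 1,  /* bits 1-2: prime-key kind */
    KW_IMPLICIT_BIT = 0x8        /* bit 3: keyword was created implicitly */
};

typedef enum {
    KW_NOT_PRIME      = 0,
    KW_INTERNAL_PRIME = 1,
    KW_EXTERNAL_PRIME = 2
} KwPrime;

/* Return flags with the two prime bits replaced by kind. */
static inline unsigned kw_set_prime(unsigned flags, KwPrime kind)
{
    return (flags & ~(unsigned)KW_PRIME_MASK) |
           (((unsigned)kind << KW_PRIME_SHIFT) & (unsigned)KW_PRIME_MASK);
}

static inline KwPrime kw_get_prime(unsigned flags)
{
    return (KwPrime)((flags & (unsigned)KW_PRIME_MASK) >> KW_PRIME_SHIFT);
}

static inline int kw_is_perseg(unsigned flags)
{
    return (flags & KW_PERSEG_BIT) != 0;
}

static inline int kw_is_implicit(unsigned flags)
{
    return (flags & KW_IMPLICIT_BIT) != 0;
}
```

With the implicit bit stored alongside the others, code that prints a .jsd can skip any keyword for which kw_is_implicit() is true.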
08-07-2008
- Worked mostly on the FITS TAS format. Copied some of Tim's code into my cvs tree and got the DRMS wrapper around it working. But several issues regarding bzero/bscale arose. Further progress is pending on meeting to resolve issues. Had several discussions with Phil, Rick and Keh-Cheng.
- Met with Phil and Rick to resolve bzero/bscale issues, and to finalize FITS-slicing plan.
- Fix drms_stage_records(). It was possible to request greater than the maximum number of SUs from SUMS, which caused failure and a crash. Fix for drms_su_getdirs() requesting more than the maximum number of storage units that can be requested from SUMS. Request MAXSUMREQCNT chunks from SUMS, looping until all SUs are requested.
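Sketch of the chunked-request loop described above for drms_su_getdirs(): never hand more than the maximum count to a single request, and loop until every SU has been requested. SKETCH_MAXREQ is a made-up stand-in for MAXSUMREQCNT (the real value is not given here), and the callback type, request_in_chunks(), and record_chunk() are hypothetical helpers, not the DRMS or SUMS API.

```c
#include <stddef.h>

#define SKETCH_MAXREQ 4  /* hypothetical stand-in for MAXSUMREQCNT */

/* Callback that issues one request for n storage units; returns 0 on success. */
typedef int (*ChunkFn)(const long long *sunums, size_t n, void *ctx);

/* Request all storage units, at most SKETCH_MAXREQ per call.
 * Stops at the first failing chunk and returns its error code. */
int request_in_chunks(const long long *sunums, size_t total,
                      ChunkFn fn, void *ctx)
{
    size_t done = 0;

    while (done < total) {
        size_t n = total - done;
        if (n > SKETCH_MAXREQ) {
            n = SKETCH_MAXREQ;
        }
        int rc = fn(sunums + done, n, ctx);
        if (rc != 0) {
            return rc;
        }
        done += n;
    }
    return 0;
}

/* Example callback that only records what it was asked for. */
typedef struct {
    size_t calls;
    size_t largest;
    size_t items;
} ChunkStats;

int record_chunk(const long long *sunums, size_t n, void *ctx)
{
    ChunkStats *s = ctx;
    (void)sunums;
    s->calls++;
    s->items += n;
    if (n > s->largest) {
        s->largest = n;
    }
    return 0;
}
```

With 10 SUs and a limit of 4, the callback runs three times (4, 4, 2) and never sees more than the limit.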
07-31-2008
- Met with Igor to discuss VSO access of SU database. The big issue is speed - according to Igor, our system isn't optimally designed for speedy queries. He is developing a plan to work around this limitation. It may involve a new database table, new database columns, or a new database. He will collect some empirical data on SU system responses before developing a complete plan.
- Fix broken build.
- Fix drms_segment_read() not working when the segment protocol is 'bin'.
- Give DRMS presentation to Time-Distance meeting.
- 1 day attending a wedding.
07-24-2008
- Fix jsoc_export failure. Problem was maelstrom rebooted, but the script that normally must run that repeatedly calls jsoc_export_manage wasn't restarted.
- Made a DRMS module that shows how to send signals to a specific thread in a DRMS module. This works despite being inside the DRMS multi-threaded environment. To build this, you ‘make threadsigs’, or ‘make examples’ (which will make all the example modules). In a nutshell, you need to create a new thread to handle the signal you want to send, e.g., SIGALRM. The signal gets sent to this thread. Then when this thread handles the signal, it sets a global var that the main thread can then see. You have to be careful and use a mutex whenever you handle this global variable, otherwise you could get a race condition.
- Add td_createalarm/td_destroyalarm, in libthreadutil.a, functions that allow the calling thread to receive ALRM signals. Also added a new example module, threadalrm, that uses these new functions.
- Bug fix: Fix for ingest_lev0 not building. add_small_image.c was including png.h instead of mypng.h.
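Sketch of the dedicated-signal-thread pattern described above, using plain POSIX threads rather than the DRMS environment. It is not the threadsigs module: all names are hypothetical, and the use of sigwait() instead of an installed handler is an assumption, since the log does not say which the module uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_alarm_seen = 0;  /* shared flag, always accessed under g_lock */

/* Dedicated thread: waits for SIGALRM, then sets the shared flag. */
static void *alarm_thread(void *arg)
{
    sigset_t *set = arg;
    int sig = 0;

    if (sigwait(set, &sig) == 0 && sig == SIGALRM) {
        pthread_mutex_lock(&g_lock);
        g_alarm_seen = 1;
        pthread_mutex_unlock(&g_lock);
    }
    return NULL;
}

/* Main-thread side: read the flag under the same mutex. */
int alarm_was_seen(void)
{
    pthread_mutex_lock(&g_lock);
    int seen = g_alarm_seen;
    pthread_mutex_unlock(&g_lock);
    return seen;
}

/* Block SIGALRM (new threads inherit the mask), start the signal
 * thread, send it SIGALRM, and wait for it to finish.
 * Returns 0 on success, -1 on any failure. */
int run_alarm_demo(void)
{
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) return -1;
    if (pthread_create(&tid, NULL, alarm_thread, &set) != 0) return -1;
    if (pthread_kill(tid, SIGALRM) != 0) return -1;
    if (pthread_join(tid, NULL) != 0) return -1;
    return 0;
}
```

Blocking the signal before creating the thread matters: otherwise the default SIGALRM action could terminate the process before sigwait() is reached.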
07-17-2008
- Ensure that the global build macro, CDIR, doesn't have the /auto/homeX stuff in it.
- Bug fix: print out prime keys correctly in jsoc_export - was using the prime key templates instead of the real instances of the prime keys.
- Fix ingest_lev0 build, which was erroring out.
- Change drms_getkey_string() to use the format/unit provided in the keyword to format the time string output.
- Remove unnecessary prime keyword (FDS_DATA_PRODUCT) from sdo.fds, fdsIngest.pl, and extract_fds_statev.c. Make the potential values for the DATA_FORMAT field single chars to enhance efficiency.
- Make the idHELIO and idGEO keywords in sdo.fds_orbit_vectors more compact. Rearrange the order in which the contents are saved so that a user can simply run 'show_info -p' on these strings and have DRMS return the path to the original data files.
- Changes to the master MOC Product Server download scripts so that they download 'live' data (i.e., they use j0 to download data from the server). Now both 'live' and 'dev' data (data downloaded through maelstrom) are being downloaded daily.
- Start using ssh-agent as a means to provide pass-phraseless use of priv/pub keys when downloading files from the MOC Product Server.
- Put the path to the .ssh-agent configuration files in the MOC Product Server download configuration files so that the download finds the ssh-agent that provides the private keys. The cron job environment doesn't provide the necessary env variables so the ssh-agent server information has to be placed in a .ssh-agent file which then must be sourced by the download scripts directly.
- JSOC Version 4.5/NetDRMS Version 1.0 release. It took some time to track down all the issues. The good news is that many of the hacks needed to get remote users up and running have been obviated.
- Document storage-unit archive and retention concepts, both in the wiki and in Doxygen.
- Met with Rock to discuss the next step in the fds_get_orbit module.
07-10-2008
- Fix bug where show_info was trying to get the record-directory, even for DSDS and plainfile data sets (which have no record-directory). Add support to libdsds.so and libdrms to provide file path for DSDS and plainfile type record-set queries.
- Modified libdrms: upon export, convert TIME keywords to string keywords.
- Track down some leaks in drms_opendsds_records() and other locations.
- Fix bug in fitsrw: String keywords should have their values surrounded by single quotes (base/libs/fitsrw/cfitsio.c).
- During export of fits files, convert TIME keywords to string keywords.
- Add new DRMS API for ingest_lev0: drms_export_tofitsfile() - takes DRMS_Array_t, keyword, and compression parms. Add a module to test out the new drms_mapexport_tofitsfile() API.
07-03-2008
- Add function to libmisc.a that safely (or more safely) concatenates strings: base_strlcat().
- Add jsoc_export_as_is() make rule to proj/export/apps.
- Make the index.txt file produced by jsoc_export.c compatible with the one parsed by jsoc_export_make_index; Add code in jsoc_export_manage.c so that it calls jsoc_export.c for the case of exporting to fits file.
- Pass jsoc bin/script root to jsoc_export_manage as a cmd-line param, then pass this root via qsub to the qsub script. This allows the user to specify the JSOCROOT of the exes and scripts to be used during export (you may want to use bins other than ones rooted at /home/jsoc/cvs/JSOC). These get set in the environment that calls jsoc_export_manage.
- Add support for specifying the db user in jsoc_export_manage - this will be needed if running as user jsoc.
- Make the export-packing-list file (index.txt) have lower-case keywords.
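Sketch of a bounded string concatenation in the spirit of base_strlcat() mentioned above. The log does not give base_strlcat()'s signature or return value, so this follows the BSD strlcat() convention as an assumption; sketch_strlcat() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Append src to dst without writing past dst[size - 1].
 * Returns the length the full result would have had (excluding the
 * terminator); a return value >= size means the result was truncated. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen = strlen(src);

    /* Find the end of dst, but never look past the buffer. */
    while (dlen < size && dst[dlen] != '\0') {
        dlen++;
    }
    if (dlen == size) {
        return size + slen;  /* dst is not terminated; leave it untouched */
    }

    size_t room = size - dlen - 1;
    size_t ncopy = slen < room ? slen : room;
    memcpy(dst + dlen, src, ncopy);
    dst[dlen + ncopy] = '\0';
    return dlen + slen;
}
```

Callers can detect truncation by comparing the return value with the buffer size instead of overrunning the buffer, which is what plain strcat() would do.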
06-26-2008
- Met with Karen to continue the brain dump.
- Another meeting with Karen, Keh-Cheng, Jim and Phil to discuss what happens after Karen is gone and to discuss the present state of our database projects moving forward.
- Met with Phil to finalize jsoc_export plans.
- Added a new case for 'fits' file export to jsoc_export_manage.c. This calls jsoc_export with the appropriate set of arguments. Spent a long time getting jsoc_export_manage to work - the dbase control arguments, like JSOC_DBNAME, DO NOT WORK WITH SOCK MODULES when they connect to an existing drms_server process. You have to run drms_server with JSOC_DBNAME set, then all sock modules use that setting.
- Implemented jsoc_export per meeting with Phil. Made it run so that it can accept a record-set query from either the cmd-line, or from a keyword in the jsoc.exports series (or any series). It also creates the "packing list" file now.
- Implemented a way to read generic text files that define constants. These text files are not compiled – they are read into memory during runtime. So, you don’t have to re-compile to change the definitions. I added just one such definition file for now: /proj/export/apps/data/export.defs. But the idea is that this would be used for the “Configuration” file concept (put d02 into a config file, not hard-coded in SUM.h). The call drms_defs_register(DEFS_MKPATH("/data/export.defs")) reads the file into memory (path relative to your .c file), then when you call drms_defs_getval("kPackListFileName"), for ex., you get the definition associated with an id string named kPackListFileName.
- Worked on make files so that ingest_lev0 can run as a sock module. But the current design of ingest_lev0 prohibits a conversion to a sock module. Because it uses drms_server_begin_transaction()/drms_server_end_transaction(), it must be a direct-connect module.
06-19-2008
- Friday 6/13 PTO
- Bug fix: fix parsing of record-query segment list that was leading to an infinite loop; move the code that removes unneeded segments a little downstream - the removed segments were still needed.
- Test out Keh-Cheng's patched cfitsio - ensure DRMS works with it. It does work and it fixes the problem we were seeing earlier with reading compressed signed char, short, and int images.
- Test out new ds9 - seems to work.
- Bug fix in FITSRW: Was not properly reading the essential fits keywords (eg, NAXIS, BZERO, etc.) from FITS files. Keywords like BZERO could be int, but they could be float. The FITSRW code was assuming int, when in fact sometimes the value was a float. So I added several conversion functions (convert to int from any keyword type) and used those in the place where FITSRW was failing.
- Bug fix in FITSRW: The cfitsio_append_key() function was not casting string keywords properly. It was assuming that the string value passed in was a char *, but it was a char ** (a pointer to a string, not a pointer to a char).
- Pull out all top-level jsoc export code from drms_record.c and put into the jsoc_export module.
- Met with Karen to get a brain dump on postgres layout for DRMS, SUMS interface with DRMS, various DRMS threads, and socket-connect vs. direct-connect stuff.
- Met with lev0 gang to discuss next steps for getting to level 0.5.
06-12-2008
- Released version 4.4 of JSOC. No NetDRMS release was created. Tracked down a few issues before finalizing the release.
- Wrote up RFC for the Configure File. This file would contain all our currently hard-coded paths and defines. Code would then read from this file to obtain this information.
- Add code to verify that keyword format fields in the .jsd are compatible with the data type of the keyword. If an incompatibility is detected, a warning is printed, but the module will continue to run to completion.
- Fix a bug in zone_adjustment_inner() I added a couple of days ago. Forgot to check for a NULL int * before setting that int.
- DRMS now recognizes time strings with format 2008.05.12_TAI.
- Investigate several DRMS issues: the configure script making links to ALL .h files in your source tree (don't put extraneous stuff in there), buffer overrun in ingest_lev0 when reading a keyword, drms_run not working when doing drms_export stuff, investigate a build issue for Charles Baldner - he wasn't linking to lapack correctly when using icc; Carl was having a link problem because he was using gcc to try and link against /home/production/cvs/jsoc/lib/saved/linux_x86_64/libhmicomp_egse.a, but this library was built using icc so he was seeing unresolved dependencies.
- Fix a bug in the FDS/LZP download scripts. LZP was dumping in a location where the FDS ingest script was reading from. As a result, files not recognized by the FDS ingestion script were being rejected and causing failure.
- Fix 2 small memory leaks in drms_open_records().
- Track down a bug in ingest_lev0.c where it was not calling drms_free_array() after calling drms_segment_read().
- Thursday 6/12 PTO
06-05-2008
- Worked with Tim L. on TS_SLOT slotted keys. He was having problems getting the right slot given a time string query, but that all seemed to work once he got to my office. We tested it out and it seems to work.
- Modified extract_fds_statev (the module that ingests helio- and geo-centric FDS orbit vector files into sdo_dev.fds_orbit_vectors) to skip adding a new record if there is an old record with exactly the same information in it.
- Finalized FDS Moc Product Server file download scripts and modules, except that we should be using j0.stanford.edu as the machine that runs the scripts. I sent a request form to NASA last week, but haven't gotten approval yet. Was able to add production@maelstrom and jsoc@maelstrom to the authorized_keys list on the MOC product server (had to work around not being able to ssh to that machine). FDS files are now both downloaded and ingested into sdo_dev.fds automatically, and then the helio- and geo-centric orbit vector files are ingested into sdo_dev.fds_orbit_vectors automatically.
- Worked with Rick on creating a 'cookbook' of JSOC modules that are basically examples of increasing complexity. This cookbook is part of both the JSOC and NetDRMS releases. I worked on the make files.
- Investigated an error Jim was seeing while reading a fits file. He somehow got a fits file saved whose data type/dimensions don't match what the segment specifies.
- Ported Phil's TS_SLOT implementation from his home dir to mine, and then into cvs after testing. Also investigated an issue with how rounding of durations was done in Phil's implementation (durations are queries of the form <start time>/<duration>). The rounding wasn't quite working with odd <duration> values (values that aren't multiples of the slot width) - fixed that minor bug. Also, print a warning if a user uses an invalid <duration>.
- If a time string is missing a time zone, or has an invalid time zone, DRMS now interprets the time zone to be the one in the keyword->info->unit field.
- Bug fix in record-set query parsing code - Phil found the problem, Karen found the problem code and fixed it. I just verified that Karen got it right. Way to go Karen! It isn't easy following the parsing function I wrote.
05-29-2008
- Did latest JSOC release (version 4.3).
- Tested FITSRW/cfitsio carefully (bzero/bscale too) after integrating Tim H's changes and my changes.
- Fixed a bug in the 4.3 JSOC release - libfitsrw.a was never added to libdrms.a (a 'meta' library that contains the code of all other client JSOC libraries). This library is for users who work outside of the DRMS system. Rebuilt 0.9 NetDRMS release.
- Tracked down a few user problems (lookdata.html, a problem Rick was having, my own problems, etc.) to a change to the SUMSERVER definition in SUM.h. This was changed after the Ver 4.3 JSOC release. Not only did this define change, but the server changed as well. This means all existing JSOC modules would fail unless they got the new SUM.h and rebuilt their binaries.
- Changed SUM.h in the 4.3 release to accommodate a post-release change in SUMS that caused jsoc modules to not find the SUMS server. Did 2 things: modified .setJSOCenv and .setJSOCuser_env to manually set the SUMSERVER env variable (this overrides the value defined in SUM.h), and I also modified SUM.h and re-did the 4.3 release and the NetDRMS 0.9 release.
- Help Tim L. get his "peak bagging" C module that uses Fortran heavily to build and run. I made numerous make file changes and also tracked down a lot of missing dependent files (from SOI).
- Memorial Day holiday.
05-22-2008
- Spent more time testing cfitsio. I tested all of the bzero/bscale plan developed by Phil, Rick, Karen, and me.
- Fix a slotted keys bug found by Tim L. SetkeyInternal() was not using the correct slotted keyword value when mapping to the corresponding index keyword value. It was using the UNconverted slotted keyword value, but it should have been using the one that was the same type as the slotted key.
- Fix a bug in jsoc_info. drms_sprintfval_format() was being used improperly - you can't provide key->info->format as the format parameter if the keyword is a TIME keyword. This became a problem after I modified code to use key->info->format/unit as the sprint_time unit/time zone.
- Fix two bugs I found in drms_array.c. All the functions that convert from a float data type value to an integer value did not do rounding and range check correctly. So, now we first round (using the Linux round() function - round away from zero if the value is 1/2 way to the next integer or greater, otherwise round toward zero); then we check range, and if the range falls outside a valid integer range, the destination value gets set to missing.
- Fixed the following bugs in FITSRW: 1. was not handling CHAR type correctly; it was not converting between signed char (from DRMS) to unsigned char (from fits file) and back correctly. 2. Was not reading SIMPLE and EXTEND keywords properly in FITS file. 3. was not differentiating between image type and data type when writing images - this was a problem for the CHAR type.
- Helped Tim Larson get peak bagging code ported to DRMS. This involved fixing the Rules.mk files and finding all the dependencies (there were a lot of .f, .c files, and libraries).
- Worked with Tim H. to get his FITSRW compression code into DRMS. Made changes to his files so that it built in DRMS. It wasn't handling blank values properly. Spent a long time trying to figure out what the problem was. Eventually got Keh-Cheng to help see that the problem was a bug in cfitsio.
- Developed work-around to get FITSRW compression working (had to work around 2 cfitsio bugs).
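Sketch of the round-then-range-check conversion described above for drms_array.c. It is not the DRMS code: the function name is hypothetical, INT_MIN as the "missing" value for int is an assumption, and the half-away-from-zero rounding that round() performs is written out by hand so the sketch does not depend on libm.

```c
#include <limits.h>

#define SKETCH_MISSING_INT INT_MIN  /* assumed sentinel for "missing" */

/* Convert a double to int: round half away from zero first, then
 * check the range; anything unrepresentable becomes "missing". */
int sketch_double_to_int(double v)
{
    /* NaN, or too large to round safely through long long. */
    if (v != v || v > 1.0e18 || v < -1.0e18) {
        return SKETCH_MISSING_INT;
    }

    long long r = (long long)v;   /* truncates toward zero */
    double frac = v - (double)r;  /* exact: v and r are close together */
    if (frac >= 0.5) {
        r += 1;
    } else if (frac <= -0.5) {
        r -= 1;
    }

    /* INT_MIN itself is reserved for the missing value. */
    if (r > INT_MAX || r <= INT_MIN) {
        return SKETCH_MISSING_INT;
    }
    return (int)r;
}
```

The order matters: 2147483647.4 rounds to INT_MAX and is kept, while 2147483647.5 rounds to one past INT_MAX and becomes missing.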
05-15-2008
- Finish extract_fds_statev. Re-wrote sdo.fds_orbit_vectors with the correct slotting parameters (epoch and step).
- Track down and remove an 'order by' statement from the select statement that selects the keyword information to be placed into the template record. Did this because re-ordering means that DRMS is out of sync with psql, and so that the order in which the .jsd specifies keywords matches what appears in the keyword HContainer_t.
- Met with Todd to briefly discuss magnetic pipeline stuff - how SOI works (and how is Hao maintaining it). Met with Yang to track down why old remapped images are on the website. I think that a script did not get updated with new v2helio parameters.
- Tim L. found a bug when ingesting his test data (with ingest_dsds_a). I investigated and found, by using psql, that anything that was type DRMS ‘long long’ was not ingested correctly. The problem was in libdsds.so. It assumed that SDS_LONGs passed from SDS were 64-bit types. But that is only true on 64-bit machines. On 32-bit machines, like n00 which is what Tim is using, the SDS_LONGs are 32-bit. (so basically, SDS_LONG means ‘long’, not ‘long long’). I made the fix on Tim’s machine and he checked it in. To use it, you just need ‘make dsds’. The problem should only exist if you ingested data on a 32-bit machine, and your input fits files have SDS_LONGs in them (yes, SDS converts small values like -1 to long instead of int).
- Talked with Jennifer about slotted times - what they are used for and some of the details (like slot straddles epoch).
- Rebuilt /home/jsoc binaries for hmidb to hmidb2 change (and back again).
- To test fits file reading/writing, I modified arithtool. I added support for in-memory data types other than double; added new bzero/bscale parameters.
- Implemented final bzero/bscale plan. Most of the changes were in drms_segment_write(). Removed drms_segment_setscaling() and drms_segment_getscaling(), which were actually just get/set bzero/bscale keywords. We will need two new API functions with similar names that mean 'read/modify the DRMS_Array_t parameter'.
- Spent quite a while tracking down a bug in FITSRW. It wasn't working on CHAR data types. The problem was that DRMS was passing signed char data, but FITSRW was expecting unsigned char data. I changed the cfitsio image type to SBYTE_IMG and the data type to TSBYTE to accommodate signed data. cfitsio does not natively support signed BYTES - it just adds 128 to data and adjusts bzero. So I had to set a bzero value that compensated for this.
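Sketch of the arithmetic behind the signed-char note above. Under the standard FITS convention, 8-bit pixels are stored unsigned and signed bytes are represented with BZERO = -128 and BSCALE = 1, so physical = stored + BZERO. The helper names are hypothetical and no cfitsio calls are made; this only illustrates the 128 offset, not the exact bzero bookkeeping done in FITSRW.

```c
/* FITS stores 8-bit pixels as unsigned; signed bytes use BZERO = -128. */
#define SKETCH_SBYTE_BZERO (-128)

/* Signed value as DRMS holds it -> unsigned byte as stored in the file. */
unsigned char sbyte_to_stored(signed char v)
{
    return (unsigned char)((int)v - SKETCH_SBYTE_BZERO);  /* v + 128 */
}

/* Unsigned byte read from the file -> signed value handed back. */
signed char stored_to_sbyte(unsigned char s)
{
    return (signed char)((int)s + SKETCH_SBYTE_BZERO);    /* s - 128 */
}
```

The mapping sends -128 to 0, 0 to 128, and 127 to 255, and it round-trips for every signed char value.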
05-08-2008
- Record-chunking. Wrote just the C-wrapper around the actual SQL FETCH statement that downloads the next chunk (so currently, drms_recordset_fetchnext() actually doesn't get called - the entire record-set is downloaded, for now).
- Work on drms module to extract helio- and geo-centric orbit vectors from FDS data products. I had already written the code to do helio-centric orbits - so I added the geo-centric orbits.
- Worked with Carl on the hk_dayfile .jsd. Wrote up the keyword .jsd descriptions for the DATE keyword. Made the _SDO_to_DRMS_time() function static inline, and changed the epoch call - use the #define that is a number, not the #define that is a function call.
- Met with Carl, Jennifer, Rock for more lev 0 generation discussions about how to obtain and ingest FDS and housekeeping files. Helped Carl with the jsd descriptions of his series' time-slotting keywords.
- Spent a fair amount of time re-writing extract_fds_statev (a module). Needed to think about how to handle ingesting orbit files from sdo.fds (which contains disparate data - data with differing cadences, formats, etc.). It was more complicated than you'd think. Basically, as soon as orbit files are downloaded, they get ingested into sdo.fds_orbit_vectors. Because this series is a combination of two different FDS products, I used a temporary table to match up helio data from one file with geo data from another file, and create an output record in sdo.fds_orbit_vectors.
- Found and investigated a couple of bugs while doing extract_fds_statev (transient records were not working for direct-connect modules - Karen fixed.)
- Sick with flu on Monday.
05-01-2008
- Fixed a bug in drms_insert_series(). At some point, we switched the meaning of the format and unit fields of TIME keywords. This meant that calls to sprint_time() sometimes had to be adjusted. I missed one of those calls. This change was fixing one of those calls - before my change, the 'format' field was being used, when it should have been the 'unit' field.
- Wrote up detailed documentation for download of MOC Product Server files. Created a place on the wiki for the MOC Product Server.
- Helped Carl get the SDO_to_DRMS_time() function put into a library to be shared by several users of the function. Before this, the same function was defined many times. Carl will then switch out the current references to the duplicate functions to use the library version.
- Performance review with Phil. Finalized Phil's review form.
- Met with Rock, Carl, Jennifer to discuss 0.3 generation from FDS data, house-keeping data.
- Started porting helio2mlat from SOI to DRMS. Got about 1/2 way through.
04-24-2008
- Vacation most of Thursday/Friday.
- Resolve code merge issues with one of the SUMS Rules.mk.
- Work on FDS/LZP download scripts. Added logging and the mailing of bad error messages to several people (Rock, Jennifer, Art). Clean up - remove unused static LZP file specification (instead the specification file is generated dynamically and depends on the current date). Fix cronjob table so that the path to the executables/scripts used by the download scripts can be found.
- Work With Tim Larson. Finalize initial port of v2helio to DRMS (o2helio). Fixed some crashes involving code that manages memory (ensures that leaks don't happen). Help track down problem where o2helio's apodize() function was producing output that differed at the 1e-5 level from v2helio's apodize() function. The problem was in a statement like: float f = (float)(d); float f2 = d; f != f2 (where d is a double). The compiler, upon seeing the float cast, may change the way it holds intermediate products (intermediate may be floats, not doubles).
- Implement SLOT slotted keywords. I had implemented TS_EQ a while ago. I made the additions/changes to support SLOT.
- Filled out performance review documents.
04-17-2008
- Added a TSEQ_EPOCH flag/string to DRMS so that you can specify TSEQ_EPOCH for a time variable's value and drms_parser.c will understand this and replace it with the correct num secs since the DRMS epoch.
- Worked with Rick to get SUMS working on JILA's linux machine.
- Investigated how to use CVS to lock files. Sent results via email to the Thursday group. Went on record opposing using file locks.
- Met with Rick and Jim and D. Haber to help her get SUMS running at JILA. Most changes had to do with modifying paths hard-coded into SUMS code.
- Modified the code that populates keyword info - if the keyword is a TIME keyword, then do some checks to see how the format and unit fields are being used. People have been using these improperly and inconsistently. So, format and/or unit may be changed under the hood.
- Added a "version" field to the .jsd. This allows us to add new fields to the .jsd without having to retrofit ALL existing series. This version gets saved in the *.drms_series table. That will allow us to know what version of .jsd the series was created with. The first use of this versioning is to add a "cparms" field to the segment specification (see below).
- Added a "cparms" field to the segment part of the .jsd. This is a string that specifies what type of compression the segment will use. It is a string passed directly to cfitsio. It gets saved as a segment-specific keyword - this allows us to specify a different cparms for each data file in a series.
- Spent a long time debugging a problem when I added a "version" column to the *.drms_series tables. Not sure what the problem is, but a plpgsql script is failing, presumably because of a return type mismatch between various *.drms_series tables. Karen is now helping me track down the problem.
- Discussed with Phil a proposal to chunk recordset queries. Wrote up the proposal and sent it out to jsoc_dev.
04-10-2008
- Add ringfit_ssw.f in the proj/examples/apps directory - this is a Fortran module that calls D. Haber's ringanalysis Fortran function.
- Did the JSOC Version 4.2 release.
- Worked with Rick on the changes necessary for building JSOC/DRMS code on Mac. This included incorporating Joe Hourcle's changes. Got DRMS built, but SUMS is a problem. Stopped working on this pending a decision about how important this mac port is.
- Cleaned up move of lookdata into CVS: removed jsoc_support.js from cvs (this is no longer used), added prototype*.js to cvs (and link from /web/jsoc/htdocs/ajax to ~jsoc's directory containing this file), removed jsoc_info.csh and show_series.csh from cvs since they were for testing only.
- Remove hard-coding of compiler choice for sums apps and libs. Now, the compiler chosen in make_basic.mk (gcc or icc, default is icc) will be used to make these SUMS binaries.
- Meeting with Karen and Tim H. to discuss plan for incorporating fits-compression specification into .jsd files so that .jsd writers can specify what type of compression to use.
- Updated my jsoc_export module (JSOC/proj/export/apps/jsoc_exports.c) to work in the jsoc database (it was working on the jsoc_test database previously). I noticed that the RequestID is an int in db jsoc, and I was assuming it was a string in db jsoc_test (we decided in a meeting that it would be a string). But I changed lib drms to assume a long long.
- Added the ability to append a segment list to the record set query: ds=<recset>{seg1,seg2,seg3…}. drms_open_records() recognizes this syntax – the rec->segments container contains only segs that you request (this doesn’t affect the template segment though).
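Sketch of pulling the segment names out of a trailing {seg1,seg2,...} list like the one described above. This is a simplified stand-in, not the drms_open_records() parser: the function name and size limits are hypothetical, and it assumes no whitespace inside the braces and no braces elsewhere in the query.

```c
#include <string.h>

#define SKETCH_MAXSEGS 16  /* hypothetical limits for this sketch */
#define SKETCH_MAXNAME 32

/* Copy the names inside the last {...} of query into names[].
 * Returns the number of names, 0 if there is no list, -1 if malformed. */
int sketch_parse_seglist(const char *query,
                         char names[][SKETCH_MAXNAME], int maxnames)
{
    const char *open = strrchr(query, '{');
    if (open == NULL) {
        return 0;
    }
    const char *close = strchr(open, '}');
    if (close == NULL) {
        return -1;
    }

    int count = 0;
    const char *p = open + 1;
    while (p < close) {
        const char *comma = memchr(p, ',', (size_t)(close - p));
        const char *end = comma ? comma : close;
        size_t len = (size_t)(end - p);

        if (len == 0 || len >= SKETCH_MAXNAME || count >= maxnames) {
            return -1;  /* empty name, name too long, or too many names */
        }
        memcpy(names[count], p, len);
        names[count][len] = '\0';
        count++;
        p = end + 1;
    }
    return count;
}
```

A caller would then keep only the listed segments in the record's segment container and leave the template untouched, as the entry above describes.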
04-03-2008
- Updated the script that the maelstrom cron job calls to ingest the FDS data into the series sdo.moc_fds. So, the cron job calls mocDlFds.csh, which in turn calls dlMOCDataFiles.pl to download the files to /surge/sdo/mocprods. Then mocDlFds.csh calls fdsIngest.pl to ingest the files into sdo.moc_fds. As files are successfully ingested (the ingest script compares the source file and the ingested file), they are deleted from /surge.
- Move the 0th slot so that its CENTER corresponds to the epoch.
- Meet with Rick, Debra, Paul to talk about how to get netDRMS SUMS localizations into next JSOC release.
- Incorporate Joe Hourcle's mac changes into our CVS tree.
- Sigh. Fix jsoc_sync.pl yet again to work around lame CVS. I need to vent - I can't express how difficult CVS is. There, much better. Call cvs update followed by cvs checkout to get the desired effect (add/remove/update all the files that are in the user's working directory, followed by checking out NEW files within the module - the NEW files were added by a CVS user).
- Updated configure script so that the check for 3rd-party libs is now $JSOC_MACHINE-dependent.
- Check into CVS the jsoc_info app and supporting web apps (lookdata.html, jsoc_support.js, etc.). Updated the files/directories on /web that contained these files to point to these files in their new CVS locations. Updated the CVS tree rooted at ~jsoc/cvs/JSOC to use these new files.
- Move time mapping (from date strings or enum vals to doubles) to drms_types.c since TIME is one of the drms types. There are only a couple of time functions so keep them merged in drms_types.c, not a separate new file.
- Mess up Phil's CVS working directory by making a lot of his files owned by me. Then attempt to fix the problem, but call Brian to have him chown the files back to Phil.
- Move the definitions of various epochs (MDI_EPOCH, SDO_EPOCH, etc.) to timeio.h. Also, get rid of JSOC_EPOCH. Code that wants to use the MDI_EPOCH will need to access series that have been created with MDI_EPOCH as the epoch.
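The slot change noted above (moving the 0th slot so that its CENTER corresponds to the epoch) can be sketched as follows. This assumes a uniform slot width and a half-open slot interval; both the function names and the choice of which boundary is inclusive are assumptions, not the DRMS code.

```python
import math


def slot_index(t, epoch, step):
    """Index of the slot containing time t when slot 0 is centered on epoch.

    Slot k is assumed to cover [epoch + (k - 0.5) * step,
    epoch + (k + 0.5) * step).
    """
    return math.floor((t - epoch) / step + 0.5)


def slot_center(index, epoch, step):
    """Time at the center of the given slot."""
    return epoch + index * step
```

With a 60-second step, times from 30 seconds before the epoch up to (but not including) 30 seconds after it all land in slot 0, and slot_center(0, epoch, step) is the epoch itself.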
03-27-2008
- Checked in initial implementation of drms export.
- Update drms_sscanf() to accept the string "DRMS_MISSING_VALUE". Now, when the jsd parser, for example, sees this string, the data-type-specific missing value will be set.
- Change the implementation of the FITS and FITZ data segment protocols to use the cfitsio library wrapper FITSRW. The old implementations now exist in new protocols, DRMS_FITSDEPCRECATED and DRMS_FITZDEPCRECATED.
- Fix the drms_protocol stuff - adding protocols was a confusing experience. There were two enums that had the same items in them, but in different orders. The conversion from string to protocol was inefficient, etc. Did this in preparation for the other work on deprecating the old protocols.
- Fix the drms_parser code that would not accept an empty string.
- Work on using the new FITS protocol to write out float data into an integer data segment. Right now, there is a crash.
- Found a problem with FITSRW. It thinks that C type int is the same thing as fitsio's TLONG. But TLONG is of type long, which is 32 bits on a 32-bit machine, and 64 bits on a 64-bit machine. Corrected that problem. Added support for fitsio type TINT - which meshes with DRMS_TYPE_INT.
- Attended the team meeting in Napa on Wednesday.
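A rough sketch of the "DRMS_MISSING_VALUE" handling described above: when the scanner sees that string, it returns the missing value for the requested data type instead of parsing a number. The sentinel values, type names, and function name below are illustrative assumptions only (apart from NaN for float/double, they are not taken from DRMS), and this is not the drms_sscanf() code.

```python
import math

# ASSUMPTION: illustrative per-type sentinels, not the real DRMS definitions.
MISSING = {
    "char": -128,
    "short": -32768,
    "int": -2147483648,
    "longlong": -9223372036854775808,
    "float": math.nan,
    "double": math.nan,
    "string": "",
}

CONVERTERS = {
    "char": int,
    "short": int,
    "int": int,
    "longlong": int,
    "float": float,
    "double": float,
    "string": str,
}


def scan_value(type_name, text):
    """Convert text to a value of the named type (hypothetical helper).

    The literal string DRMS_MISSING_VALUE maps to the type's sentinel.
    """
    if type_name not in CONVERTERS:
        raise ValueError("unknown type: " + type_name)
    if text.strip() == "DRMS_MISSING_VALUE":
        return MISSING[type_name]
    return CONVERTERS[type_name](text)
```

A jsd-style default such as "DRMS_MISSING_VALUE" then produces NaN for a float keyword and the integer sentinel for an int keyword, while ordinary strings such as "42" are parsed as usual.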
03-20-2008
- drms_names.c was not properly handling a record query that contained a prime key value without specifying the prime key name itself (eg, su_production.tlm_test[VC05_2008_030_16_42_56_200872a1918_1c298_00]). The problem was that an acceptable string value was limited to 32 chars, but in fact any length string (up to the query limit) should have been allowed.
- Created su_arta and jsoc namespaces on the jsoc_test database. I need these to test out the new FITSIO protocol, and to test out drms export.
- Made some changes to FITSRW so that it will work with drms export. Although it used a keyword container nominally called a 'list', it was not a linked list but an array. I added rudimentary support for linked lists (creating, inserting, freeing). I need this because in general we don't know ahead of time how many keywords will be in the list. You iterate through keywords, and if they are suitable, then you add an item to the FITSRW keyword list. This new implementation is used for drms export. I did not change the existing uses of the keyword array.
- Fixed the jsoc_update.pl script - it wasn't running the configure script due to a typo involving a missing semicolon.
- Briefly investigated the make system to see if binaries that need to be built are building. Phil was thinking that sometimes things don't build that need to build, but in fact what I saw was that things that don't need rebuilding DO rebuild. I have not figured out why this is the case, but this is a minor issue - we inefficiently rebuild when not necessary.
- Met after last Thursday's jsoc meeting to discuss some more drms export issues. We chopped up some of the work and divided it up amongst people. The drms export specification is starting to take some real form.
- Investigated the ability to be able to specify the filename for a generic data segment that gets saved in SUMS. Right now, it is the source file base name. I suggested always using the segment name in an attempt to reduce complexity, but it looks like this won't happen. And it cannot happen since a lot of what we've already ingested doesn't use the segment name. Will hand this off to Tim, but I'm not sure what the resolution is yet.
- Discussed scaling issues during drms export. The resolution is that we will NOT scale the data upon export - the data bits in SUMS will be the data bits exported. Of course, the BITPIX, BSCALE, etc. drms keywords will be put in the exported FITS header.
- Met with Carl to make his lev0 packet_time housekeeping series slotted. Then troubleshot problems.
- Spent a few days working on drms export. Got it completely implemented. Now I'm creating a module to test out the DRMS calls that export the data. I'm testing with the jsoc_test database.
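The linked-list keyword container described above (needed because the number of suitable keywords is not known ahead of time) might look roughly like this. The class and function names are hypothetical; this is a sketch of the create/insert/free pattern, not the FITSRW code.

```python
class KeyNode:
    """One keyword entry in the singly linked list."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.next = None


class KeyList:
    """Append-only linked list of keywords (hypothetical container)."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def append(self, name, value):
        node = KeyNode(name, value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.count += 1

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def clear(self):
        """Drop all nodes (the analogue of freeing the list)."""
        self.head = None
        self.tail = None
        self.count = 0


def collect_exportable(keywords, is_suitable):
    """Walk (name, value) pairs and keep only the suitable ones."""
    result = KeyList()
    for name, value in keywords:
        if is_suitable(name, value):
            result.append(name, value)
    return result
```

The caller iterates once over the record's keywords and the list grows as needed, so no keyword count has to be known before the loop starts.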
03-13-2008
- Finish adding code to drms_segment.c to read/write FITS files using fitsrw (cfitsio wrapper). Code checked-in, but is only accessible via new FITSIO segment protocol.
- Met with Tim to discuss changes needed in fitsrw to support my code in drms_segment.c. Then met one more time to integrate his changes into CVS.
- 'Fix' some build problems. Actually, just keep the build building - some changes needed to be changed.
- Overhaul of drms_ismissing() based on email thread. Broke up this function into several type-specific ones: drms_ismissing_char(), etc. Use isnan() for drms_ismissing_float/double(). drms_ismissing_time() checks for isnan() and JD_0. These functions are all static inline functions so that type-checking is performed.
- Modified slot-keys. If a slotted-key duration falls on a slot boundary, then do not include the next-higher slot as part of the duration. Also, if a time falls very close to the upper slot boundary, move it into the upper slot (the rationale is that imprecision in float could cause what should have been on the boundary to fall below it).
- Reviewed TAS/array slicing.
- Meeting to discuss drms_export(). I'm going to do the export of drms records to fits files. This will involve using Tim's fitsrw.
- Make-file changes: set the various -L and -l flags so that icc and gcc can build and find cfitsio library. Make icc code link to icc-specific libraries, but gcc code not link to them.
- TAS/slicing meeting. We discussed what needs to be done to use CFITSIO to do the slicing. Tim will figure out how to use CFITSIO to write/compress blocks (tiles). I will figure out how to slip this into the existing TAS framework. Probably just use the existing TAS, but at lowest level call into FITSRW (wrapper around CFITSIO). TAS currently deals with writing partial slices by concatenating until full, then writing at the end of the TAS file. This is tricky - causes fragmentation, problems tracking partial blocks, etc. We will abandon this for now and just have the user write FITS blocks directly. We will revisit partial writes later.
- Made some minor changes to FITSRW so that it works with drms_segment(). Debugged FITSIO protocol - the drms_segment_write() is working, but drms_segment_read() is resulting in image data that is all NaN.
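The slot-key boundary rules described above can be sketched as follows: a time within a small tolerance below a slot's upper boundary is moved into the upper slot, and a duration that ends exactly on a boundary does not include the next-higher slot. The tolerance value, the `base` parameter (lower edge of slot 0), and the function names are assumptions for illustration, not the DRMS implementation.

```python
import math

# ASSUMPTION: tolerance expressed as a fraction of one slot width.
EPSILON = 1e-6


def slot_of(t, base, step, eps=EPSILON):
    """Slot containing t; values just below a boundary go to the upper slot."""
    return math.floor((t - base) / step + eps)


def slots_in_duration(start, duration, base, step, eps=EPSILON):
    """Slots covered by [start, start + duration].

    When the end time sits on a slot boundary (within eps), the slot that
    begins at that boundary is excluded.
    """
    first = slot_of(start, base, step, eps)
    x_end = (start + duration - base) / step
    nearest = round(x_end)
    if abs(x_end - nearest) < eps:
        last = nearest - 1
    else:
        last = math.floor(x_end)
    return list(range(first, max(first, last) + 1))
```

With 10-second slots starting at 0, a 20-second duration starting at 0 covers slots 0 and 1 only, while a 25-second duration also reaches slot 2.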
03-06-2008
- Add all previous revisions of hk_config_file files AND all previous revisions of hk_jsd_file files.
- Another fix to libdsds.so. If the data type of the fits file being ingested is not double, then make the type float. This is the conversion that VDS/SDS is going to apply.
- Add fdsIngest.pl to the cron job that automatically downloads FDS files from the MOC Product Server. This script ingests all FDS products of interest into a single DRMS series, sdo.moc_fds (this is in progress).
- Met with Carl to review plan for migrating lev0 scripts and code from EGSE and jsoc trees to JSOC tree (tables already migrated). We will do the next JSOC release without Carl's lev0 scripts and code. He will do the migration on his own after the next release.
- Sat down with Tim H. and integrated his cfitsio work into our CVS system. I did all the make file work to make this happen. His library now builds when make is run. Had further discussions of the next steps to take. Tim will start working on files in CVS; when that is done, we will work together on drms_segment.c to call into his library.
- Worked with Jim on JSOC release. There were SUMS problems in UC JSOC - writing past the edge of a buffer, double-freeing, etc. By end of day Friday 2/29 Jim had resolved those.
- Spent all day Monday working on release. Jim, Karen, and Carl all had more changes they wanted in the build - I synched those to my machine. Created a script, base/util/scripts/extcvscomm.pl, to provide all commit comments between a specified date and the current date. I used the output of that script and solicited comments to include in the release notes. Tested the latest build with various simple modules/commands.
- Updated the documentation to discuss how to access level-0 tables.
- Dealt with several release issues: 1) the current drms_sscanf() didn't work on some of Rick's series; he changed it, which broke other DRMS features. The two were not compatible. Met with Rick, resolved the issues, and put the fix into the JSOC V 4.1 release. 2) ia64. Worked on making ia64 build on d02. The JSOC code is not ia64-ready. Made a small number of modifications so that SUMS built on ia64 (some third-party lib headers/libraries were missing - Keh-Cheng installed those, some compiler-code-compatibility issues needed to be resolved). 3) Postgres can be installed in different locations.
- Finalized Version 4.1 JSOC Release.
- Met with Tim H. again and worked together on drms_segment.c code to use his libraries. Refined the FITSRW APIs needed by drms_segment.c. I did more make modifications so that drms and modules build (we are now intimately tied to cfitsio.a).
02-28-2008
- drms_open_records() was returning ‘series not found’. This was due to DSDS records that were missing keyword and file data. A series got created (on the fly) from a record with a double keyword. Then later, DRMS was using a DSDS record to fill-in a DRMS record. However, that DSDS record’s keyword data was missing (represented as an empty string). So, the series was expecting a double keyword, but in fact DRMS was trying to set that double keyword with an empty string. And you can’t do that. The fix was to 1) resolve the situation where DSDS keyword types conflict across records by making the keyword type STRING; and 2) when opening DSDS data, do not use records that have no data file. When that happens, the keyword values are all empty strings.
- There was duplicate code in dsds.c (libdsds.so). A correction was in one copy of the code, but not the other. Factored out into functions and had calling functions use those new functions.
- Fixed the MANPATH issue for JSOC users. Modified .setJSOCenv and .setJSOCuser_env to set the MANPATH environment variable. We no longer rely upon MAN finding the manpath based upon $PATH entries (which is specific to linux).
- Put all of Carl’s level-0 HK tabular data files into our new CVS tree (TBL_JSOC). Some files went into CVS, some went into /surge (temporary dayfiles), and some went into /home/production/lev0.
- Created new CVS modules for level 0 processing. ‘cvs co LEV0TBLS’ will put all these files in $CVSROOT/TBL_JSOC. This is largely for examining/changing files outside of production. ‘cvs co PROD_LEV0TBLS’ from /home/production/ will put these files into /home/production/lev0. This latter command is what production needs to do.
- Tracked down problems in the production build of LC jsoc (Jeneen saw that some MDI ingestion code hadn’t been running since December). Somebody changed a bunch of files, and then didn’t check them in. And they changed them in a way that made the build break. I tracked down which changes we wanted to keep, and which we wanted to discard. Re-built LC jsoc.
- Synchronized change to LC (lower-case) jsoc with changes to UC JSOC in preparation for a JSOC release that has SUMS/LEV0 code in it.
- Various smaller investigations/help for others – make issues, the MDI-code/endian issue.
- Met with Tim H. regarding next step of cfitsio integration. Decided to have Tim integrate his code into CVS, in $JSOCROOT/base/libs/fitsiowrap. I’ll work on the make files. Then we work together on modifying drms_segment.c to use his library. We won’t touch drms_fits.c. Goal is to be able to use his new library to read fits that currently reside in SUMS.
- Worked on tracking down problem in latest build. Was due to integration from LC jsoc to UC JSOC of a double-free bug in sum_open.c. Karen came up with a fix – waiting for Jim to bless it. The original bug is still in LC jsoc.
- Modified jsoc_sync.pl and jsoc_update.pl to use a file "modulespec.txt" that lists CVS modules to 'track'. If this optional file exists, then CVS will always operate only on those modules. Currently we have JSOC, DRMS, LEV0TBLS, PROD_LEV0TBLS, and EGSE. Created a script, /home/jsoc/checkoutJSOC.pl, that takes 'DRMS' or 'JSOC' as a parameter. This script will check out either the base or full set of code files, and then create the modulespec.txt file for you (which you can subsequently modify).
- Meet with Jim to finalize latest JSOC release. Need to convert all hard-coded paths to the newest locations.
- Meetings – JSOC, Lev0, Tim H., data export, Aloise.
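The DSDS keyword fix described above (conflicting keyword types across records become STRING, and records with no data file are skipped) can be sketched like this. The record layout, type names, and function name are hypothetical; this only illustrates the rule, not the libdsds/DRMS code.

```python
def merge_keyword_types(records):
    """Derive one keyword type per name from a set of DSDS-like records.

    Each record is assumed to be a dict with:
      'datafile' - path of the data file, or None/'' when there is none
      'keywords' - mapping of keyword name -> type name
    Records without a data file are ignored, since their keyword values
    are all empty strings. A type conflict resolves to 'string'.
    """
    merged = {}
    for rec in records:
        if not rec.get("datafile"):
            continue
        for name, ktype in rec["keywords"].items():
            if name not in merged:
                merged[name] = ktype
            elif merged[name] != ktype:
                merged[name] = "string"
    return merged
```

A keyword seen as double in one record and int in another ends up as string, so a later record can always be stored without a type mismatch.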