Art's archive
10-09-2008
- Track down TAS-file write inefficiency issues. Use strace and debugging of libcfitsio.a. Problems included cfitsio file buffering limitations (too few buffers) and not saving vital keywords, like bitpix, in a structure. Also, there was a conflict between cfitsio buffering and stdio file buffering that Keh-Cheng discovered. He turned off stdio buffering in cfitsio, and then updated the cfitsio in ~jsoc with this unbuffered version. With the way that cfitsio does file buffering, it is relatively easy for all buffers to get used up. When that happens, cfitsio "thrashes" when it reads new data - it basically has to flush a buffer to make room for the new data. So, if the buffer to be flushed is dirty, it has to write it out, then it has to read data from disk into the just-flushed buffer. By "relatively easy", I mean that the following loop is sufficient to cause all buffers to get used up on every iteration:
    for (i = 0; i < n; i++) {
        fits_get_img_param(fileptr, ...);
        fits_write_img(fileptr, ...);
    }
fits_get_img_param() wants to read a couple of keywords from the header. fits_write_img() wants to see and write data. On each iteration, the fits_get_img_param() call will cause a read miss - the data that you want aren't in a buffer because cfitsio has already flushed that buffer. So there must be a file seek followed by a read. The fits_write_img() call will use up all the buffers so that the next fits_get_img_param() call will be a miss again. BTW, fits_write_img() does some reading as well as writing, presumably for the same reason that it needs more buffers than are available. Unlike the fits_get_img_param() call, I haven't traced exactly when this happens - doing that would be a bit time consuming, and I think the issue is the same as with the fits_get_img_param() call. To verify my suspicions, I did the first call 2x in a row:
fits_get_img_param(fileptr, ...); fits_get_img_param(fileptr, ...);
The first time, there was a read miss (and hence disk read). But the second time, there wasn't. That is because with these calls, not all buffers get used up (all the data needed by this call lives in one record, and that is easily contained in a buffer, whereas with fits_write_img(), a large number of buffers needs to be used, even on my test which was 192 x 192 images).
- Improve efficiency when reading and writing TAS slices. First, the fits_write_subset() call can do a lot of file seeking and reading. Instead, if the image to be written is contiguous in memory, use the fits_write_img() call. Second, cache the desirable parts of the fits file header. If you don't do this, and then you request something like bitpix from cfitsio, it might have to seek to the header part of the file and re-read the bitpix part of the header. This could happen easily as cfitsio buffers file access, and the number of buffers is small. cfitsio doesn't save the header in a structure - it lives only in these ephemeral buffers. With release code on maelstrom, I'm now seeing less than 2 tenths of a millisec per TAS/fits slice write (for 192 x 192 float slices), and a total read overhead/inefficiency of a little less than 1% (to write 54 MB, cfitsio reads 0.5 MB from the output file). Keh-Cheng saw 0% overhead; I'm not quite seeing that, but I'm close.
- Manually downloaded LZP day 250 from MOC Product Server. They reused that day to hold sim #4 data, but my scripts had already downloaded the original day 250 files. Because they did not change the version number of the files, my scripts did not re-download them.
- Finished implementation of record-chunking. I had to remove a LIMIT statement from the query part of the cursor declaration. The cursor should be defined on the entire set of records, but the LIMIT statement was preventing the entire set of records from being included in the response to the query. So, if the original query was 150K records, the limit was causing only 140K records to be present in the overall record set. Then when you iterate through all records, 10K records are missing (rs->n == 150K, but the cursor only knows about 140K).
- Met with Rock and Carl to discuss how to store information in a record of a series of Carl's so that he can find the source record in the future. I also went over how to write doxygen documentation for modules.
- Help Jim use libsoi.so. For my own edification, you can simply add "-lsoi" to your link command line, and also do setenv LD_LIBRARY_PATH <path>, and that will cause the linker to resolve the entry points in the dynamic library at link time! Then you don't need to call code to look up entry points and assign them to function pointers, etc.
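The buffer thrashing described in the first item above can be reproduced with a toy model. This is only a sketch under stated assumptions, not cfitsio's real buffer manager: the pool size NBUF, the least-recently-used eviction policy, and the idea that one buffer holds one file record are all made up for illustration. header_misses() counts how often the header record has to be re-read when each iteration reads the header once and then touches data_recs new data records.

```c
#define NBUF 4            /* hypothetical pool size; cfitsio's real count differs */

typedef struct {
    long record[NBUF];    /* which file record each buffer holds, -1 if empty */
    long stamp[NBUF];     /* last-use time, used for least-recently-used eviction */
    long clock;           /* logical time, incremented on every access */
    long misses;          /* accesses that would have needed a disk read */
} Pool;

void pool_init(Pool *p) {
    for (int i = 0; i < NBUF; i++) { p->record[i] = -1; p->stamp[i] = 0; }
    p->clock = 0;
    p->misses = 0;
}

/* Touch one file record; count a miss if it is not already buffered. */
void pool_access(Pool *p, long rec) {
    int victim = 0;
    p->clock++;
    for (int i = 0; i < NBUF; i++) {
        if (p->record[i] == rec) { p->stamp[i] = p->clock; return; }  /* hit */
        if (p->stamp[i] < p->stamp[victim]) victim = i;
    }
    p->misses++;                      /* miss: reuse the least recently used buffer */
    p->record[victim] = rec;
    p->stamp[victim] = p->clock;
}

/* Model n iterations of "read header record 0, then write data_recs records". */
long header_misses(int n, int data_recs) {
    Pool p;
    long hdr_misses = 0;
    long next = 1;                    /* data records are numbered 1, 2, 3, ... */
    pool_init(&p);
    for (int it = 0; it < n; it++) {
        long before = p.misses;
        pool_access(&p, 0);                   /* stands in for fits_get_img_param() */
        hdr_misses += p.misses - before;
        for (int k = 0; k < data_recs; k++)   /* stands in for fits_write_img() */
            pool_access(&p, next++);
    }
    return hdr_misses;
}
```

In this model, a write that touches at least NBUF records evicts the header every time, so header_misses(10, 8) reports a miss on all 10 iterations, while a write that fits in the remaining buffers (header_misses(10, 2)) misses only on the first.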
10-02-2008
- Ticket #54 and #58 - If there are no non-TAS segments, then the slot directories are not created. This involves adding a flag to the various "newslots" functions, and in the case of socket-connect code, passing a new parameter via the socket connection. The flag is 1 if the slot dirs should be created. If there are no non-TAS segments, then the slot directories are not printed by drms_record_directory(). Otherwise, they are.
- Implementation of record-set-chunking. Currently, works only if iterating in the forward direction.
- Fix some bugs in record-chunking.
- Make extract_fds_statev use record-chunking in a couple of places to try it out.
09-25-2008
- Fix for control-c for direct-connect modules. The database connection was being broken by the signal thread without regard to what state the main thread was in. Often, the main thread could be trying to access the dbase, but the signal thread pulls the plug on the dbase in the middle of this. Now, the SIGUSR2 signal is sent from the signal thread to the main thread, and the latter goes quiescent before the signal thread disconnects from the dbase.
- Bug fix for ingest_lev0 creating a fits file whose header doesn't have the bzero/bscale keywords set. Created a new drms_segment_writewithkeys() function that adds fits keywords to the resulting fits file.
- Met with Tim and discussed next steps in global helio pipeline. Need to discuss cmd-line, env saving again. Tim needs record-chunking.
- Track down problems with extract_fds_statev(). 2 problems: committing one record at a time (need to combine records into a record-set that can be closed with drms_close_records()), and searching for one record at a time from dbase. Need to chunk records together.
- Bug Fix: .jsds that contained slotted keywords didn't always result in a series that has an index. The index should be the corresponding index keyword, but if the .jsd explicitly described the index keyword, no index was created.
- Ticket #82 - There were several problems that I fixed: jsds with 'index' keywords were resulting in series with no db index, several of the jsd fields needed surrounding double quotes, don't print to jsd implicitly created keywords, don't allow explicit creation of index keywords, don't reject jsds that have a unit size of 0 but no segments, set the implicit keyword flag on index keywords, ensure index keywords are part of the set of db index keywords, don't include in the set of db index keywords slotted keywords.
09-18-2008
- Discuss the setting of SU retention time. It isn't quite doing what we want. I wrote up a 'truth table' with the results of discussions with others and my research on how the code works. I sent it out to Phil and Jim to review - this will eventually go to Jesper for review.
- Started writing up a job description for the PSQL expert.
- Ticket #53 - implemented the part about removing the ability to set retention time by specifying an environment variable (DRMS_RETENTION).
- Ticket #53 and #39 - Added functions to drms_series.c that allow the user to check whether they have permission to create a new record, delete an existing record, or update an existing record for a given series (drms_series_cancreaterecord(), drms_series_candeleterecord(), and drms_series_canupdaterecord()).
- Talked with Jennifer about dbase table permissions, and the drms_open_records() sql that gets evaluated.
- getorbitinfo: Fix bug in routine that finds the grid point greater than the target point; also handle the situation where the FDS data just doesn't contain enough data to satisfy the request for the target point.
- cmdparams now stores the original cmd-line in 2 new CmdParams_t fields - argc and argv. These are completely analogous to the argc and argv parameters in the main() function. Access these via the cmdparams_get_argv(&cmdparams, &argv, &argc) call.
- getorbitinfo: For each interpolated time, add calculation of solar vr, vw, vn and dsun_obs. Also, return a list of info structures that contain hciX, hciY, hciZ, hciVX, hciVY, hciVZ, dsun_obs, vr, vw, and vn.
- getorbitinfo: In the demo program, getorbitinfo, iterate through returned info structure, and print out the contents. Also, fix a bug in the atan() function used - I needed to use the one that allows you to specify what quadrant you want based on the values of x and y.
- Ticket #80 - I worked with Jim on this. This was a DRMS problem, not in SUMS - I fixed it at some point, but Jim's show_info hadn't been updated with that fix yet. So, this is fixed.
09-11-2008
- Increase performance (decrease run time) of jsoc_info, and fix some leaks. It now takes less than 1/200th the time it originally took. There was a bad use of realloc to resize a buffer - it was being resized continually, and by adding only a few bytes at a time. Also, there was an n^2 algorithm.
- Next step in lev0, getorbitinfo processing. Start making module that figures out what range of data to fetch from DRMS given the desired output times. Also, cache this information.
- Create several new trac tickets from notes in my spiral binder.
- Bug fix (trac ticket #80): It turns out there were a couple of problems. -P wasn't working when either the -A flag was set or the seg=XXX list was provided. Also, Jim is indicating that SUMS isn't always looking at the SUM request mode (which contains the RETRIEVE/NORETRIEVE flag). I've fixed the first problem (revision 1.23 of show_info.c), now reassigning it to Jim to look at the SUMS issue.
- Bug fix (Ticket #78) - Fix crash in db_bin_query(). Use PQgetlength() to determine length of data to copy from the psql result to the DRMS internal representation of a db query, not the fixed maximum value length.
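On the jsoc_info realloc problem noted above: a common way to avoid resizing a buffer a few bytes at a time is to double its capacity whenever it fills, so the number of realloc() calls grows with the logarithm of the final size instead of linearly. The sketch below illustrates that general technique; it is not the actual jsoc_info fix, and StrBuf, strbuf_append() and the starting capacity of 64 are hypothetical names and values chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;      /* NUL-terminated contents, or NULL before the first append */
    size_t len;       /* bytes in use, not counting the terminating NUL */
    size_t cap;       /* bytes allocated */
    size_t nrealloc;  /* number of realloc() calls made (kept for illustration) */
} StrBuf;

/* Append s, doubling the capacity when needed.
 * Returns 0 on success, -1 if memory could not be allocated. */
int strbuf_append(StrBuf *b, const char *s) {
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < b->len + n + 1) newcap *= 2;
        char *p = realloc(b->data, newcap);
        if (!p) return -1;            /* the old buffer is still valid here */
        b->data = p;
        b->cap = newcap;
        b->nrealloc++;
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the NUL as well */
    b->len += n;
    return 0;
}

void strbuf_free(StrBuf *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Starting from a zero-initialized StrBuf, appending a 4-byte string 10000 times needs only about a dozen reallocations with this scheme, compared with one per append when the buffer grows by exactly what is needed each time.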
09-04-2008
- Created a module, getorbitinfo, that fetches FDS orbit vector information from sdo.fds_orbit_vectors. For now it calls test code. This code compares FDS HCI values with HCI values that are derived from an earth ephemeris and the FDS GCI values. This is a verification of the FDS data provided by NASA (and it looks fine).
- JSOC version 4.6 release.
- Met with Rock to go over next steps in creation of getorbitinfo.
- Do some review and research what projects are left to be done, and what the status is of those projects.
- Reviewed all old active/open tickets, assigning all unassigned ones when appropriate.
08-28-2008
- Fix bug in code that reads binfile and zipfile segment files. drms_segment_read() was assuming that the bin reading code set bzero and bscale. But the binfile doesn't have that information, so they were garbage values. The garbage values were then used to convert the data ==> garbage in --> garbage out.
- Added implicit bzero/bscale keyword generation to binfile and zipfile segment protocols. bzero and bscale are NOT stored in these files at all.
- Fix jsoc_update.pl. On the remote machine (eg., n00), it was not cd'ing to the JSOC tree before running make.
- Test Phil's make_vw_V using a combination of both ds_mdi.XXX JSOC dataseries, and prog: MDI dataseries as input. Works fine.
- Started working on next JSOC release - creating release notes.
- Implement drms_segment_readslice() and drms_segment_writeslice() functions. Previously, there was no drms_segment_writeslice(), and the drms_segment_readslice() function read the entire file into memory, and then trimmed out the undesired parts.
- Finished up TAS file implementation and testing.
- Several make file fixes in SUMS, DSDSMIGR, and lev0 - most of the problems were due to redundant obj file or exe file rules.
08-21-2008
- Modify show_info -j code (actually drms_jsd_print()) so that the regular record template is not used. The record template expands per-segment keywords into multiple keywords, with the per-segment flag set. Then if you try to create a series from the printed .jsd, each of these keywords in the set expands into multiple keywords (geometric growth). Instead prevent the expansion from happening if the template is going to be used to create a .jsd.
- Started porting soho_orbit.c to JSOC land. Spent some time trying to figure out what it does. The plan is to grab enough of it to use on our data in sdo.fds_orbit_vectors and see that the results we get using the old code and the new code jibe.
- Tried unsuccessfully to use Phil's ingest_dsds_a to ingest dsds vw_V data into a new test series that uses the TAS protocol. It looks like I'm missing PEQ.
- Got Phil's ingest_dsds_a to work on DSDS data, ingesting from a "prog:" specification.
- Implemented TAS for floats with no compression.
- Modified drms_open_records() so that it recognizes "dsds.XXX" and "ds_mdi.XXX" data series. These have a generic segment that points to a VDS directory. When drms_open_records() sees these data series, it calls drms_open_dsdsrecords() so that libdsds.so can handle them. libdsds.so calls peq to get the soi keylist which VDS knows how to use. And Jim has modified peq so that if it sees a non-prog spec, it will ask SUMS for the data (which is where the "dsds.XXX" and "ds_mdi.XXX" data reside). Unfortunately, in this case, peq doesn't work well - the key list it returns isn't useable by vds_open(). Instead, I had to extract the VDS directory (a SUMS directory) from the keylist, and then create a new keylist from this VDS directory. Then vds_open() was able to use this second keylist to access the data.
- Met with Jesper to discuss FDS orbit vectors and how to use them to provide interpolated vector information for level 0.5. Don't assume that the desired times for which we want orbit info are evenly spaced. Linear interpolation most likely won't work. Whenever FDS data are used by the getorbitinfo function, cache the range of FDS data retrieved from DRMS - this function will be called every second. I'll write the outer shell of the function, Jesper will write the math part. I will still write the code that tests HCI values provided in FDS - start with GCI provided in FDS, run Jesper's code that converts to HCI, then compare to the HCI provided in FDS data.
- Restarted ssh-agent and exportmanage.pl and recovered following the power outage on Monday.
08-14-2008
- Worked mostly on FITS slices. Got the FITS implementation of TAS working, including bzero/bscale issues. Added a cache of open fitsfile pointers so that subsequent uses of a fits file within the same DRMS session do not have to call fopen(). The cache gets purged if too many fitsfile pointers are open. The file changes are discarded on module abort. Tested with simple cases thoroughly.
- Started working on fixing the "show_info -j" code. This code makes a .jsd from a series' template record structure. Reworked the per_segment and isdrmsprime fields of the DRMS_Keyword_t structure. per_segment was renamed to "kwflags", which is a bit field. The first bit is now the per_segment flag, two bits are for the isdrmsprime flag (which was expanded to not DRMS prime, DRMS-internal prime, and DRMS-external prime), and one bit indicates whether a keyword was implicitly created. When creating a jsd from the jsd template record, we don't want to include certain implicit keys (like index keywords). However, the "implicitness" of the keyword was not being saved before I added the kwflags structure.
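The kwflags layout described in the last item above (one per-segment bit, two bits for the prime state, one implicit bit) might look something like the following. The bit positions, macro names and helper functions here are assumptions made for illustration; the real definitions in the DRMS headers may differ. kw_in_jsd() is also a simplification: it skips every implicit keyword, whereas the note above says only certain implicit keys are excluded.

```c
/* Hypothetical bit layout for a keyword flags field. */
#define KW_PERSEGMENT   (1u << 0)   /* bit 0: per-segment keyword */
#define KW_PRIME_SHIFT  1           /* bits 1-2: prime-key state */
#define KW_PRIME_MASK   (3u << KW_PRIME_SHIFT)
#define KW_IMPLICIT     (1u << 3)   /* bit 3: keyword was created implicitly */

typedef enum {
    PRIME_NONE     = 0,   /* not DRMS prime */
    PRIME_INTERNAL = 1,   /* DRMS-internal prime */
    PRIME_EXTERNAL = 2    /* DRMS-external prime */
} PrimeState;

/* Return flags with the two prime bits replaced by s; other bits are untouched. */
unsigned kw_set_prime(unsigned flags, PrimeState s) {
    return (flags & ~KW_PRIME_MASK) | ((unsigned)s << KW_PRIME_SHIFT);
}

/* Extract the prime state stored in bits 1-2. */
PrimeState kw_get_prime(unsigned flags) {
    return (PrimeState)((flags & KW_PRIME_MASK) >> KW_PRIME_SHIFT);
}

/* Simplified rule: only explicitly declared keywords are printed to a .jsd. */
int kw_in_jsd(unsigned flags) {
    return (flags & KW_IMPLICIT) == 0;
}
```

Packing the states into one field keeps the structure small, and because the "implicit" bit is stored with the keyword, the code that prints a .jsd can consult it later instead of having to guess.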
08-07-2008
- Worked mostly on the FITS TAS format. Copied some of Tim's code into my cvs tree and got the DRMS wrapper around it working. But several issues regarding bzero/bscale arose. Further progress is pending a meeting to resolve these issues. Had several discussions with Phil, Rick and Keh-Cheng.
- Met with Phil and Rick to resolve bzero/bscale issues, and to finalize FITS-slicing plan.
- Fix drms_stage_records(). It was possible to request more than the maximum number of SUs from SUMS, which caused failure and a crash. The fix is in drms_su_getdirs(), which was requesting more than the maximum number of storage units that can be requested from SUMS at once. It now requests the SUs in chunks of at most MAXSUMREQCNT, looping until all SUs have been requested.
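The chunked request loop from the drms_stage_records() fix above has roughly the following shape. This is a sketch of the looping pattern only: MAX_REQ stands in for MAXSUMREQCNT (whose real value is not given here), and RequestFn and count_units() are hypothetical stand-ins for the actual SUMS call.

```c
#include <stddef.h>

#define MAX_REQ ((size_t)4)   /* stand-in for MAXSUMREQCNT; real value not shown here */

/* Hypothetical callback representing one SUMS request for up to MAX_REQ units.
 * Returns 0 on success. */
typedef int (*RequestFn)(const long long *sunums, size_t count, void *ctx);

/* Request all n storage units in chunks of at most MAX_REQ.
 * Returns the number of requests issued, or -1 if any request fails. */
int request_in_chunks(const long long *sunums, size_t n, RequestFn fn, void *ctx) {
    int calls = 0;
    for (size_t start = 0; start < n; start += MAX_REQ) {
        size_t left = n - start;
        size_t count = left < MAX_REQ ? left : MAX_REQ;   /* last chunk may be short */
        if (fn(sunums + start, count, ctx) != 0) return -1;
        calls++;
    }
    return calls;
}

/* Example callback: rejects oversized requests and tallies the units it was given. */
int count_units(const long long *sunums, size_t count, void *ctx) {
    (void)sunums;
    if (count > MAX_REQ) return -1;
    *(size_t *)ctx += count;
    return 0;
}
```

With MAX_REQ set to 4, ten storage units are requested in three calls (4, 4 and 2 units), and no single call ever exceeds the limit.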
07-31-2008
- Met with Igor to discuss VSO access of SU database. The big issue is speed - according to Igor, our system isn't optimally designed for speedy queries. He is developing a plan to work around this limitation. It may involve a new database table, new database columns, or a new database. He will collect some empirical data on SU system responses before developing a complete plan.
- Fix broken build.
- Fix drms_segment_read() not working when the segment protocol is 'bin'.
- Give DRMS presentation to Time-Distance meeting.
- 1 day attending a wedding.
07-24-2008
- Fix jsoc_export failure. The problem was that maelstrom rebooted, but the script that must be running to repeatedly call jsoc_export_manage wasn't restarted.
- Made a DRMS module that shows how to send signals to a specific thread in a DRMS module. This works despite being inside the DRMS multi-threaded environment. To build this, you ‘make threadsigs’, or ‘make examples’ (which will make all the example modules). In a nutshell, you need to create a new thread to handle the signal you want to send, e.g., SIGALRM. The signal gets sent to this thread. Then when this thread handles the signal, it sets a global var that the main thread can then see. You have to be careful and use a mutex whenever you handle this global variable, otherwise you could get a race condition.
- Add td_createalarm/td_destroyalarm functions to libthreadutil.a that allow the calling thread to receive ALRM signals. Also added a new example module, threadalrm, that uses these new functions.
- Bug fix: Fix for ingest_lev0 not building. add_small_image.c was including png.h instead of mypng.h.
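The thread-signal recipe in the threadsigs item above can be sketched with plain POSIX threads. This is not the threadsigs module or the td_createalarm() code; it is a minimal illustration that assumes the dedicated thread waits for the signal with sigwait() rather than installing a handler, and all the names (alarm_thread, run_alarm_demo, alarm_seen) are hypothetical. Build with the compiler's pthread option (for example -pthread with gcc).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static pthread_mutex_t flag_lock = PTHREAD_MUTEX_INITIALIZER;
static int alarm_seen = 0;        /* shared flag; only touched while holding flag_lock */

/* Dedicated thread: waits synchronously for SIGALRM, then sets the shared flag. */
void *alarm_thread(void *arg) {
    sigset_t *set = arg;
    int sig = 0;
    if (sigwait(set, &sig) == 0 && sig == SIGALRM) {
        pthread_mutex_lock(&flag_lock);
        alarm_seen = 1;
        pthread_mutex_unlock(&flag_lock);
    }
    return NULL;
}

/* Returns 1 if the dedicated thread saw the signal, 0 if not, -1 on setup failure. */
int run_alarm_demo(void) {
    sigset_t set;
    pthread_t tid;
    int seen;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    /* Block SIGALRM in this thread first; threads created afterwards inherit
     * the mask, so the signal can only be consumed by sigwait(). */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0) return -1;
    if (pthread_create(&tid, NULL, alarm_thread, &set) != 0) return -1;

    pthread_kill(tid, SIGALRM);   /* direct the signal at that one thread */
    pthread_join(tid, NULL);

    pthread_mutex_lock(&flag_lock);
    seen = alarm_seen;            /* the main thread reads the flag under the mutex */
    pthread_mutex_unlock(&flag_lock);
    return seen;
}
```

Blocking the signal before creating any threads is what makes delivery predictable: because SIGALRM is blocked in the target thread, pthread_kill() leaves it pending there until sigwait() picks it up.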
07-17-2008
- Ensure that the global build macro, CDIR, doesn't have the /auto/homeX stuff in it.
- Bug fix: print out prime keys correctly in jsoc_export - was using the prime key templates instead of the real instances of the prime keys.
- Fix ingest_lev0 build, which was erroring out.
- Change drms_getkey_string() to use the format/unit provided in the keyword to format the time string output.
- Remove unnecessary prime keyword (FDS_DATA_PRODUCT) from sdo.fds, fdsIngest.pl, and extract_fds_statev.c. Make the potential values for the DATA_FORMAT field single chars to enhance efficiency.
- Make the idHELIO and idGEO keywords in sdo.fds_orbit_vectors more compact. Rearrange the order in which the contents are saved so that a user can simply run 'show_info -p' on these strings and have DRMS return the path to the original data files.
- Changes to the master MOC Product Server download scripts so that they download 'live' data (i.e., they use j0 to download data from the server). Now both 'live' and 'dev' data (data downloaded through maelstrom) are being downloaded daily.
- Start using ssh-agent as a means to provide pass-phraseless use of priv/pub keys when downloading files from the MOC Product Server.
- Put the path to the .ssh-agent configuration files in the MOC Product Server download configuration files so that the download finds the ssh-agent that provides the private keys. The cron job environment doesn't provide the necessary env variables so the ssh-agent server information has to be placed in a .ssh-agent file which then must be sourced by the download scripts directly.
- JSOC Version 4.5/NetDRMS Version 1.0 release. It took some time to track down all the issues. The good news is that many of the hacks needed to get remote users up and running have been obviated.
- Document storage-unit archive and retention concepts, both in the wiki and in Doxygen.
- Met with Rock to discuss the next step in the fds_get_orbit module.
07-10-2008
- Fix bug where show_info was trying to get the record-directory, even for DSDS and plainfile data sets (which have no record-directory). Add support to libdsds.so and libdrms to provide file path for DSDS and plainfile type record-set queries.
- Modified libdrms: upon export, convert TIME keywords to string keywords.
- Track down some leaks in drms_opendsds_records() and other locations.
- Fix bug in fitsrw: String keywords should have their values surrounded by single quotes (base/libs/fitsrw/cfitsio.c).
- During export of fits files, convert TIME keywords to string keywords.
- Add new DRMS API for ingest_lev0: drms_export_tofitsfile() - takes DRMS_Array_t, keyword, and compression parms. Add a module to test out the new drms_mapexport_tofitsfile() API.
07-03-2008
- Add function to libmisc.a that safely (or more safely) concatenates strings: base_strlcat().
- Add jsoc_export_as_is() make rule to proj/export/apps.
- Make the index.txt file produced by jsoc_export.c compatible with the one parsed by jsoc_export_make_index; Add code in jsoc_export_manage.c so that it calls jsoc_export.c for the case of exporting to fits file.
- Pass jsoc bin/script root to jsoc_export_manage as a cmd-line param, then pass this root via qsub to the qsub script. This allows the user to specify the JSOCROOT of the exes and scripts to be used during export (you may want to use bins other than ones rooted at /home/jsoc/cvs/JSOC). These get set in the environment that calls jsoc_export_manage.
- Add support for specifying the db user in jsoc_export_manage - this will be needed if running as user jsoc.
- Make the export-packing-list file (index.txt) have lower-case keywords.
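For the base_strlcat() item above, a bounded concatenation function in the style of BSD strlcat() could look like this. The actual signature and behavior of base_strlcat() are not given in these notes, so the function below (named safe_strlcat to avoid implying it is the real one) is an assumption modeled on the BSD semantics.

```c
#include <stddef.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst without ever writing more
 * than size bytes in total. Returns the length the full result would have
 * had; a return value >= size means the output was truncated. */
size_t safe_strlcat(char *dst, const char *src, size_t size) {
    size_t dlen = 0;
    size_t slen = strlen(src);

    /* Find the end of dst, but never look beyond size bytes. */
    while (dlen < size && dst[dlen] != '\0') dlen++;
    if (dlen == size) return size + slen;   /* dst was not terminated within size */

    size_t room = size - dlen - 1;          /* bytes available before the NUL */
    size_t ncopy = slen < room ? slen : room;
    memcpy(dst + dlen, src, ncopy);
    dst[dlen + ncopy] = '\0';
    return dlen + slen;
}
```

A caller detects truncation by comparing the return value with the buffer size, for example: if (safe_strlcat(buf, piece, sizeof buf) >= sizeof buf), the piece did not fit.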
06-26-2008
- Met with Karen to continue the brain dump.
- Another meeting with Karen, Keh-Cheng, Jim and Phil to discuss what happens after Karen is gone and to discuss the present state of our database projects moving forward.
- Met with Phil to finalize jsoc_export plans.
- Added a new case for 'fits' file export to jsoc_export_manage.c. This calls jsoc_export with the appropriate set of arguments. Spent a long time getting jsoc_export_manage to work - the dbase control arguments, like JSOC_DBNAME, DO NOT WORK WITH SOCK MODULES when they connect to an existing drms_server process. You have to run drms_server with JSOC_DBNAME set, then all sock modules use that setting.
- Implemented jsoc_export per meeting with Phil. Made it run so that it can accept a record-set query from either the cmd-line, or from a keyword in the jsoc.exports series (or any series). It also creates the "packing list" file now.
- Implemented a way to read generic text files that define constants. These text files are not compiled – they are read into memory during runtime. So, you don’t have to re-compile to change the definitions. I added just one such definition file for now: /proj/export/apps/data/export.defs. But the idea is that this would be used for the “Configuration” file concept (put d02 into a config file, not hard-coded in SUM.h). The call drms_defs_register(DEFS_MKPATH("/data/export.defs")) reads the file into memory (path relative to your .c file), then when you call drms_defs_getval("kPackListFileName"), for ex., you get the definition associated with an id string named kPackListFileName.
- Worked on make files so that ingest_lev0 can run as a sock module. But the current design of ingest_lev0 prohibits a conversion to a sock module. Because it uses drms_server_begin_transaction()/drms_server_end_transaction(), it must be a direct-connect module.
06-19-2008
- Friday 6/13 PTO
- Bug fix: fix parsing of record-query segment list that was leading to an infinite loop; move the code that removes unneeded segments a little downstream - the removed segments were still needed.
- Test out Keh-Cheng's patched cfitsio - ensure DRMS works with it. It does work and it fixes the problem we were seeing earlier with reading compressed signed char, short, and int images.
- Test out new ds9 - seems to work.
- Bug fix in FITSRW: Was not properly reading the essential fits keywords (eg, NAXIS, BZERO, etc.) from FITS files. Keywords like BZERO could be int, but they could be float. The FITSRW code was assuming int, when in fact sometimes the value was a float. So I added several conversion functions (convert to int from any keyword type) and used those in the place where FITSRW was failing.
- Bug fix in FITSRW: The cfitsio_append_key() function was not casting string keywords properly. It was assuming that the string value passed in was a char *, but it was a char ** (a pointer to a string, not a pointer to a char).
- Pull out all top-level jsoc export code from drms_record.c and put into the jsoc_export module.
- Met with Karen to get a brain dump on postgres layout for DRMS, SUMS interface with DRMS, various DRMS threads, and socket-connect vs. direct-connect stuff.
- Met with lev0 gang to discuss next steps for getting to level 0.5.
06-12-2008
- Released version 4.4 of JSOC. No NetDRMS release was created. Tracked down a few issues before finalizing the release.
- Wrote up RFC for the Configure File. This file would contain all our currently hard-coded paths and defines. Code would then read from this file to obtain this information.
- Add code to verify that keyword format fields in the .jsd are compatible with the data type of the keyword. If an incompatibility is detected, a warning is printed, but the module will continue to run to completion.
- Fix a bug in zone_adjustment_inner() I added a couple of days ago. Forgot to check for a NULL int * before setting that int.
- DRMS now recognizes time strings with format 2008.05.12_TAI.
- Investigate several DRMS issues: the configure script making links to ALL .h files in your source tree (don't put extraneous stuff in there), buffer overrun in ingest_lev0 when reading a keyword, drms_run not working when doing drms_export stuff, investigate a build issue for Charles Baldner - he wasn't linking to lapack correctly when using icc; Carl was having a link problem because he was using gcc to try and link against /home/production/cvs/jsoc/lib/saved/linux_x86_64/libhmicomp_egse.a, but this library was built using icc so he was seeing unresolved dependencies.
- Fix a bug in the FDS/LZP download scripts. LZP was dumping in a location where the FDS ingest script was reading from. As a result, files not recognized by the FDS ingestion script were being rejected and causing failure.
- Fix 2 small memory leaks in drms_open_records().
- Track down a bug in ingest_lev0.c where it was not calling drms_free_array() after calling drms_segment_read().
- Thursday 6/12 PTO
06-05-2008
- Worked with Tim L. on TS_SLOT slotted keys. He was having problems getting the right slot given a time string query, but that all seemed to work once he got to my office. We tested it out and it seems to work.
- Modified extract_fds_statev (the module that ingests helio- and geo-centric FDS orbit vector files into sdo_dev.fds_orbit_vectors) to skip adding a new record if there is an old record with exactly the same information in it.
- Finalized FDS Moc Product Server file download scripts and modules, except that we should be using j0.stanford.edu as the machine that runs the scripts. I sent a request form to NASA last week, but haven't gotten approval yet. Was able to add production@maelstrom and jsoc@maelstrom to the authorized_keys list on the MOC product server (had to work around not being able to ssh to that machine). FDS files are now both downloaded and ingested into sdo_dev.fds automatically, and then the helio- and geo-centric orbit vector files are ingested into sdo_dev.fds_orbit_vectors automatically.
- Worked with Rick on creating a 'cookbook' of JSOC modules that are basically examples of increasing complexity. This cookbook is part of both the JSOC and NetDRMS releases. I worked on the make files.
- Investigated an error Jim was seeing while reading a fits file. He somehow got a fits file saved whose data type/dimensions don't match what the segment specifies.
- Ported Phil's TS_SLOT implementation from his home dir to mine, and then into cvs after testing. Also investigate issue with how rounding of durations was done in Phil's implementation (durations are queries of the form <start time>/<duration>). The rounding wasn't quite working with odd <duration> values (values that aren't multiples of the slot width) - fixed that minor bug. Also, print a warning if a user uses an invalid <duration>.
- If a time string is missing a time zone, or has an invalid time zone, DRMS now interprets the time zone to be the one in the keyword->info->unit field.
- Bug fix in record-set query parsing code - Phil found the problem, Karen found the problem code and fixed it. I just verified that Karen got it right. Way to go Karen! It isn't easy following the parsing function I wrote.
05-29-2008
- Did latest JSOC release (version 4.3).
- Tested FITSRW/cfitsio carefully (bzero/bscale too) after integrating Tim H's changes and my changes.
- Fixed a bug in the 4.3 JSOC release - libfitsrw.a was never added to libdrms.a (a 'meta' library that contains the code of all other client JSOC libraries). This library is for users who work outside of the DRMS system. Rebuilt 0.9 NetDRMS release.
- Tracked down a few user problems (lookdata.html, a problem Rick was having, my own problems, etc.) to a change to the SUMSERVER definition in SUM.h. This was changed after the Ver 4.3 JSOC release. Not only did this define change, but the server changed as well. This means all existing JSOC modules would fail unless they got the new SUM.h and rebuilt their binaries.
- Changed SUM.h in the 4.3 release to accommodate a post-release change in SUMS that caused jsoc modules to not find the SUMS server. Did 2 things: modified .setJSOCenv and .setJSOCuser_env to manually set the SUMSERVER env variable (this overrides the value defined in SUM.h), and I also modified SUM.h and re-did the 4.3 release and the NetDRMS 0.9 release.
- Help Tim L. get his "peak bagging" C module that uses Fortran heavily to build and run. I made numerous make file changes and also tracked down a lot of missing dependent files (from SOI).
- Memorial Day holiday.
05-22-2008
- Spent more time testing cfitsio. I tested all of the bzero/bscale plan developed by Phil, Rick, Karen, and me.
- Fix a slotted keys bug found by Tim L. SetkeyInternal() was not using the correct slotted keyword value when mapping to the corresponding index keyword value. It was using the UNconverted slotted keyword value, but it should have been using the one that was the same type as the slotted key.
- Fix a bug in jsoc_info. drms_sprintfval_format() was being used improperly - you can't provide key->info->format as the format parameter if the keyword is a TIME keyword. This became a problem after I modified code to use key->info->format/unit as the sprint_time unit/time zone.
- Fix two bugs I found in drms_array.c. None of the functions that convert from a float data type value to an integer value did rounding and range checking correctly. So, now we first round (using the Linux round() function - round away from zero if the value is 1/2 way to the next integer or greater, otherwise round toward zero); then we check range, and if the rounded value falls outside the valid integer range, the destination value gets set to missing.
- Fixed the following bugs in FITSRW: 1. was not handling CHAR type correctly; it was not converting between signed char (from DRMS) to unsigned char (from fits file) and back correctly. 2. Was not reading SIMPLE and EXTEND keywords properly in FITS file. 3. was not differentiating between image type and data type when writing images - this was a problem for the CHAR type.
- Helped Tim Larson get peak bagging code ported to DRMS. This involved fixing the Rules.mk files and finding all the dependencies (there were a lot of .f, .c files, and libraries).
- Worked with Tim H. to get his FITSRW compression code into DRMS. Made changes to his files so that it built in DRMS. It wasn't handling blank values properly. Spent a long time trying to figure out what the problem was. Eventually got Keh-Cheng to help see that the problem was a bug in cfitsio.
- Developed work-around to get FITSRW compression working (had to work around 2 cfitsio bugs).
05-15-2008
- Finish extract_fdv_statev. Re-wrote sdo.fds_orbit_vectors with the correct slotting parameters (epoch and step).
- Track down and remove an 'order by' statement from the select statement that selects the keyword information to be placed into the template record. Did this because re-ordering means that DRMS is out of sync with psql, and so that the order in which the .jsd specifies keywords matches what appears in the keyword HContainer_t.
- Met with Todd to briefly discuss magnetic pipeline stuff - how SOI works (and how is Hao maintaining it). Met with Yang to track down why old remapped images are on the website. I think that a script did not get updated with new v2helio parameters.
- Tim L. found a bug when ingesting his test data (with ingest_dsds_a). I investigated and found, by using psql, that anything that was type DRMS ‘long long’ was not ingested correctly. The problem was in libdsds.so. It assumed that SDS_LONGs passed from SDS were 64-bit types. But that is only true on 64-bit machines. On 32-bit machines, like n00 which is what Tim is using, the SDS_LONGs are 32-bit. (so basically, SDS_LONG means ‘long’, not ‘long long’). I made the fix on Tim’s machine and he checked it in. To use it, you just need ‘make dsds’. The problem should only exist if you ingested data on a 32-bit machine, and your input fits files have SDS_LONGs in them (yes, SDS converts small values like -1 to long instead of int).
- Talked with Jennifer about slotted times - what they are used for and some of the details (like slot straddles epoch).
- Rebuilt /home/jsoc binaries for hmidb to hmidb2 change (and back again).
- To test fits file reading/writing, I modified arithtool. I added support for in-memory data types other than double; added new bzero/bscale parameters.
- Implemented final bzero/bscale plan. Most of the changes were in drms_segment_write(). Removed drms_segment_setscaling() and drms_segment_getscaling() which were actually just get/set bzero/bscale keywords. We will need two new API functions with similar names that mean 'read/modify the DRMS_Array_t parameter'.
- Spent quite a while tracking down a bug in FITSRW. It wasn't working on CHAR data types. The problem was that DRMS was passing signed char data, but FITSRW was expecting unsigned char data. I changed the cfitsio image type to SBYTE_IMG and the data type to TSBYTE to accommodate signed data. cfitsio does not natively support signed BYTES - it just adds 128 to data and adjusts bzero. So I had to set a bzero value that compensated for this.
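Sketch of the offset arithmetic behind the CHAR fix above. FITS 8-bit images are unsigned, so signed bytes are conventionally stored shifted by 128 and recovered through physical = stored * BSCALE + BZERO with BZERO = -128 and BSCALE = 1. The helper names are hypothetical, and the log does not state the exact bzero value used in the real fix.

```c
#include <stdint.h>

/* FITS scaling convention: physical = stored * BSCALE + BZERO. */
#define SBYTE_BZERO  (-128.0)
#define SBYTE_BSCALE (1.0)

/* Map a signed byte to the unsigned byte stored on disk (adds 128). */
static uint8_t sbyte_to_stored(int8_t physical)
{
    return (uint8_t)((int)physical + 128);
}

/* Recover the signed value from the stored byte using BZERO/BSCALE. */
static int8_t stored_to_sbyte(uint8_t stored)
{
    return (int8_t)((double)stored * SBYTE_BSCALE + SBYTE_BZERO);
}
```

With this mapping -128 is stored as 0 and 127 as 255, so the full signed range round-trips through the unsigned on-disk type.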
05-08-2008
- Record-chunking. Wrote just the C-wrapper around the actual SQL FETCH statement that downloads the next chunk (so currently, drms_recordset_fetchnext() actually doesn't get called - the entire record-set is downloaded, for now).
- Work on drms module to extract helio- and geo-centric orbit vectors from FDS data products. I had already written the code to do helio-centric orbits - so I added the geo-centric orbits.
- Worked with Carl on the hk_dayfile .jsd. Wrote up the keyword .jsd descriptions for the DATE keyword. Made the _SDO_to_DRMS_time() function static inline, and changed the epoch call - use the #define that is a number, not the #define that is a function call.
- Met with Carl, Jennifer, Rock for more lev 0 generation discussions about how to obtain and ingest FDS and housekeeping files. Helped Carl with the jsd descriptions of his series' time-slotting keywords.
- Spent a fair amount of time re-writing extract_fdv_statv (a module). Needed to think about how to handle ingesting orbit files from sdo.fds (which contains disparate data - data with differing cadences, formats, etc.). It was more complicated than you'd think. Basically, as soon as orbit files are downloaded, they get ingested into sdo.fds_orbit_vectors. Because this series is a combination of two different FDS products, I used a temporary table to match up helio data from one file with geo data from another file, and create an output record in sdo.fds_orbit_vectors.
- Found and investigated a couple of bugs while doing extract_fdv_statv (transient records were not working for direct-connect modules - Karen fixed.)
- Sick with flu on Monday.
05-01-2008
- Fixed a bug in drms_insert_series(). At some point, we switched the meaning of the format and unit fields of TIME keywords. This meant that calls to sprint_time() sometimes had to be adjusted. I missed one of those calls. This change was fixing one of those calls - before my change, the 'format' field was being used, when it should have been the 'unit' field.
- Wrote up detailed documentation for download of MOC Product Server files. Created a place on the wiki for the MOC Product Server.
- Helped Carl get the SDO_to_DRMS_time() function put into a library to be shared by several users of the function. Before this, the same function was defined many times. Carl will then switch out the current references to the duplicate functions to use the library version.
- Performance review with Phil. Finalized Phil's review form.
- Met with Rock, Carl, Jennifer to discuss 0.3 generation from FDS data, house-keeping data.
- Started porting helio2mlat from SOI to DRMS. Got about 1/2 way through.
04-24-2008
- Vacation most of Thursday/Friday.
- Resolve code merge issues with one of the SUMS Rules.mk.
- Work on FDS/LZP download scripts. Added logging and the mailing of bad error messages to several people (Rock, Jennifer, Art). Clean up - remove unused static LZP file specification (instead the specification file is generated dynamically and depends on the current date). Fix cronjob table so that the path to the executables/scripts used by the download scripts can be found.
- Work with Tim Larson. Finalize initial port of v2helio to DRMS (o2helio). Fixed some crashes involving code that manages memory (ensures that leaks don't happen). Help track down problem where o2helio's apodize() function was producing output that differed at the 1e-5 level from v2helio's apodize() function. The problem was in a statement like: float f = (float)(d); float f2 = d; f != f2 (where d is a double). The compiler, upon seeing the float cast, may change the way it holds intermediate products (intermediates may be floats, not doubles).
- Implement SLOT slotted keywords. I had implemented TS_EQ a while ago. I made the additions/changes to support SLOT.
- Filled out performance review documents.
04-17-2008
- Added a TSEQ_EPOCH flag/string to DRMS so that you can specify TSEQ_EPOCH for a time variable's value and drms_parser.c will understand this and replace it with the correct num secs since the DRMS epoch.
- Worked with Rick to get SUMS working on JILA's linux machine.
- Investigated how to use CVS to lock files. Sent results via email to the Thursday group. Went on record opposing using file locks.
- Met with Rick and Jim and D. Haber to help her get SUMS running at JILA. Most changes had to do with modifying paths hard-coded into SUMS code.
- Modified the code that populates keyword info - if the keyword is a TIME keyword, then do some checks to see how the format and unit fields are being used. People have been using these improperly and inconsistently. So, format and/or unit may be changed under the hood.
- Added a "version" field to the .jsd. This allows us to add new fields to the .jsd without having to retrofit ALL existing series. This version gets saved in the *.drms_series table. That will allow us to know what version of .jsd the series was created with. The first use of this versioning is to add a "cparms" field to the segment specification (see below).
- Added a "cparms" field to the segment part of the .jsd. This is a string that specifies what type of compression the segment will use. It is a string passed directly to cfitsio. It gets saved as a segment-specific keyword - this allows us to specify a different cparms for each data file in a series.
- Spent a long time debugging a problem when I added a "version" column to the *.drms_series tables. Not sure what the problem is, but a plpgsql script is failing, presumably because of a return type mismatch between various *.drms_series tables. Karen is helping me now track down the problem.
- Discussed with Phil a proposal to chunk recordset queries. Wrote up the proposal and sent it out to jsoc_dev.
04-10-2008
- Add ringfit_ssw.f in the proj/examples/apps directory - this is a Fortran module that calls D. Haber's ringanalysis Fortran function.
- Did the JSOC Version 4.2 release.
- Worked with Rick on the changes necessary for building JSOC/DRMS code on Mac. This included incorporating Joe Hourcle's changes. Got DRMS built, but SUMS is a problem. Stopped working on this pending a decision about how important this mac port is.
- Cleaned up move of lookdata into CVS: removed jsoc_support.js from cvs (this is no longer used), added prototype*.js to cvs (and link from /web/jsoc/htdocs/ajax to ~jsoc's directory containing this file), removed jsoc_info.csh and show_series.csh from cvs since they were for testing only.
- Remove hard-coding of compiler choice for sums apps and libs. Now, the compiler chosen in make_basic.mk (gcc or icc, default is icc) will be used to make these SUMS binaries.
- Meeting with Karen and Tim H. to discuss plan for incorporating fits-compression specification into .jsd files so that .jsd writers can specify what type of compression to use.
- Updated my jsoc_export module (JSOC/proj/export/apps/jsoc_exports.c) to work in the jsoc database (it was working on the jsoc_test database previously). I noticed that the RequestID is an int in db jsoc, and I was assuming it was a string in db jsoc_test (we decided in a meeting that it would be a string). But I changed lib drms to assume a long long.
- Added the ability to append a segment list to the record set query: ds=<recset>{seg1,seg2,seg3…}. drms_open_records() recognizes this syntax – the rec->segments container contains only segs that you request (this doesn’t affect the template segment though).
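Sketch of how the <recset>{seg1,seg2,seg3} syntax above could be split into the record-set part and the segment names. This is not the drms_open_records() parser - split_segment_list(), the buffer limits, and the example series name are all assumptions for illustration.

```c
#include <string.h>

#define MAX_SEGS    16   /* hypothetical limits for this sketch */
#define MAX_SEGNAME 64

/* Split "recset{seg1,seg2}" into the record-set part and segment names.
 * Returns the number of segment names found (0 means no list was given),
 * or -1 on malformed input or overflow. */
static int split_segment_list(const char *query,
                              char *recset, size_t recset_size,
                              char segs[MAX_SEGS][MAX_SEGNAME])
{
    const char *open = strrchr(query, '{');
    size_t qlen = strlen(query);
    size_t rlen;
    const char *p;
    int nsegs = 0;

    if (open == NULL) {
        /* No segment list: the whole string is the record-set query. */
        if (qlen + 1 > recset_size)
            return -1;
        memcpy(recset, query, qlen + 1);
        return 0;
    }
    if (query[qlen - 1] != '}')
        return -1;                     /* '{' without a closing '}' */

    rlen = (size_t)(open - query);
    if (rlen + 1 > recset_size)
        return -1;
    memcpy(recset, query, rlen);
    recset[rlen] = '\0';

    /* Walk the comma-separated names between the braces. */
    p = open + 1;
    while (*p != '}') {
        size_t n = strcspn(p, ",}");
        if (n == 0 || n >= MAX_SEGNAME || nsegs == MAX_SEGS)
            return -1;
        memcpy(segs[nsegs], p, n);
        segs[nsegs][n] = '\0';
        nsegs++;
        p += n;
        if (*p == ',')
            p++;
    }
    return nsegs;
}
```

A caller would open the records with the record-set part and then keep only the named segments in each record's segment container.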
04-03-2008
- Updated the script that the maelstrom cron job calls to ingest the FDS data into the series sdo.moc_fds. So, the cron job calls mocDlFds.csh, which in turn calls dlMOCDataFiles.pl to download the files to /surge/sdo/mocprods. Then mocDlFds.csh calls fdsIngest.pl to ingest the files into sdo.moc_fds. As files are successfully ingested (the ingest script compares the source file and the ingested file), they are deleted from /surge.
- Move the 0th slot so that its CENTER corresponds to the epoch.
- Meet with Rick, Debra, Paul to talk about how to get netDRMS SUMS localizations into next JSOC release.
- Incorporate Joe Hourcle's mac changes into our CVS tree.
- Sigh. Fix jsoc_sync.pl yet again to work around lame CVS. I need to vent - I can't express how difficult CVS is. There, much better. Call cvs update followed by cvs checkout to get the desired effect (add/remove/update all the files that are in the user's working directory, followed by checking out NEW files within the module - the NEW files were added by a CVS user).
- Updated configure script so that the check for 3rd-party libs is now $JSOC_MACHINE-dependent.
- Check into CVS the jsoc_info app and supporting web apps (lookdata.html, jsoc_support.js, etc.). Updated the files/directories on /web that contained these files to point to these files in their new CVS locations. Updated the CVS tree rooted at ~jsoc/cvs/JSOC to use these new files.
- Move time mapping (from date strings or enum vals to doubles) to drms_types.c since TIME is one of the drms types. There are only a couple of time functions so keep them merged in drms_types.c, not a separate new file.
- Mess up Phil's CVS working directory by making a lot of his files owned by me. Then attempt to fix the problem, but call Brian to have him chown the files back to Phil.
- Move the definitions of various epoch (MDI_EPOCH, SDO_EPOCH, etc.) to timeio.h. Also, get rid of JSOC_EPOCH. Code that wants to use the MDI_EPOCH will need to access series that have been created with MDI_EPOCH as the epoch.
03-27-2008
- Checked in initial implementation of drms export.
- Update drms_sscanf() to accept the string "DRMS_MISSING_VALUE". Now, when the jsd parser, for example, sees this string, the data-type-specific missing value will be set.
- Change the implementation of the FITS and FITZ data segment protocols to use the cfitsio library wrapper FITSRW. The old implementations now exist in new protocols, DRMS_FITSDEPCRECATED and DRMS_FITZDEPCRECATED.
- Fix the drms_protocol stuff - adding protocols was a confusing experience. There were two enums that had the same items in them, but in different orders. The conversion from string to protocol was inefficient, etc. Did this in preparation for the other work on deprecating the old protocols.
- Fix the drms_parser code that would not accept an empty string.
- Work on using the new FITS protocol to write out float data into an integer data segment. Right now, there is a crash.
- Found a problem with FITSRW. It thinks that C type int is the same thing as fitsio's TLONG. But TLONG is of type long, which is 32 bits on a 32-bit machine, and 64 bits on a 64-bit machine. Corrected that problem. Added support for fitsio type TINT - which meshes with DRMS_TYPE_INT.
- Attended the team meeting in Napa on Wednesday.
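Sketch illustrating the int vs. long issue behind the TLONG fix above: the width of C long depends on the platform, so code that equates it with int only works where the two happen to match. c_type_for_width() is a hypothetical helper, not part of FITSRW or cfitsio.

```c
#include <stddef.h>

/* Return the name of the first standard C integer type whose size
 * matches nbytes on this platform, or NULL if none matches.
 * On typical 64-bit Linux, 8 bytes maps to "long"; on typical 32-bit
 * Linux it maps to "long long" because long is only 4 bytes there. */
static const char *c_type_for_width(size_t nbytes)
{
    if (nbytes == sizeof(int))
        return "int";
    if (nbytes == sizeof(long))
        return "long";
    if (nbytes == sizeof(long long))
        return "long long";
    return NULL;
}
```

The same reasoning applies to the SDS_LONG problem in libdsds.so noted on 05-15-2008: any code that assumes long is 64 bits breaks on a 32-bit machine.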
03-20-2008
- drms_names.c was not properly handling a record query that contained a prime key value without specifying the prime key name itself (eg, su_production.tlm_test[VC05_2008_030_16_42_56_200872a1918_1c298_00]). The problem was that an acceptable string value was limited to 32 chars, but in fact any length string (up to the query limit) should have been allowed.
- Created su_arta and jsoc namespaces on the jsoc_test database. I need these to test out the new FITSIO protocol, and to test out drms export.
- Made some changes to FITSRW so that it will work with drms export. Although it used a keyword container nominally called a 'list', it was not a linked list but an array. I added rudimentary support for linked lists (creating, inserting, freeing). I need this because in general we don't know ahead of time how many keywords will be in the list. You iterate through keywords, and if they are suitable, then you add an item to the FITSRW keyword list. This new implementation is used for drms export. I did not change the existing uses of the keyword array.
- Fixed the jsoc_update.pl script - it wasn't running the configure script due to a typo involving a missing semicolon.
- Briefly investigated the make system to see if binaries that need to be built are building. Phil was thinking that sometimes things don't build that need to build, but in fact what I saw was that things that don't need rebuilding DO rebuild. I have not figured out why this is the case, but this is a minor issue - we inefficiently rebuild when not necessary.
- Met after last Thursday's jsoc meeting to discuss some more drms export issues. We chopped up some of the work and divided it up amongst people. The drms export specification is starting to take some real form.
- Investigated the ability to be able to specify the filename for a generic data segment that gets saved in SUMS. Right now, it is the source file base name. I suggested always using the segment name in an attempt to reduce complexity, but it looks like this won't happen. And it cannot happen since a lot of what we've already ingested doesn't use the segment name. Will hand this off to Tim, but I'm not sure what the resolution is yet.
- Discussed scaling issues during drms export. The resolution is that we will NOT scale the data upon export - the data bits in SUMS will be the data bits exported. Of course, the BITPIX, BSCALE, etc. drms keywords will be put in the exported FITS header.
- Met with Carl to make his lev0 packet_time housekeeping series slotted. Then troubleshot problems.
- Spent a few days working on drms export. Got it completely implemented. Now I'm creating a module to test out the DRMS calls that export the data. I'm testing with the jsoc_test database.
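Sketch of the rudimentary linked-list support (create, insert, free) described above for the FITSRW keyword list. The struct layout and the kw_list_* names are assumptions - the log does not show the real FITSRW types.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword node; the real FITSRW structure is not shown. */
typedef struct kw_node {
    char name[9];              /* FITS keyword names are at most 8 chars */
    char value[72];            /* arbitrary size chosen for this sketch */
    struct kw_node *next;
} kw_node_t;

typedef struct {
    kw_node_t *head;
    kw_node_t *tail;
    int count;
} kw_list_t;

/* Create an empty list; returns NULL on allocation failure. */
static kw_list_t *kw_list_create(void)
{
    return calloc(1, sizeof(kw_list_t));
}

/* Append a keyword; returns 0 on success, -1 on allocation failure. */
static int kw_list_insert(kw_list_t *list, const char *name, const char *value)
{
    kw_node_t *node = calloc(1, sizeof(*node));

    if (node == NULL)
        return -1;
    /* calloc zero-fills, so copying size-1 bytes keeps the NUL terminator. */
    strncpy(node->name, name, sizeof(node->name) - 1);
    strncpy(node->value, value, sizeof(node->value) - 1);

    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    list->count++;
    return 0;
}

/* Free every node and the list itself; accepts NULL. */
static void kw_list_free(kw_list_t *list)
{
    kw_node_t *node = list ? list->head : NULL;

    while (node != NULL) {
        kw_node_t *next = node->next;
        free(node);
        node = next;
    }
    free(list);
}
```

Unlike the fixed array, the list grows one node at a time, which fits the export case where the number of suitable keywords is not known until the iteration finishes.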
03-13-2008
- Finish adding code to drms_segment.c to read/write FITS files using fitsrw (cfitsio wrapper). Code checked-in, but is only accessible via new FITSIO segment protocol.
- Met with Tim to discuss changes needed in fitsrw to support my code in drms_segment.c. Then met one more time to integrate his changes into CVS.
- 'Fix' some build problems. Actually, just keep the build going - some changes needed to be changed.
- Overhaul of drms_ismissing() based on email thread. Broke up this function into several type-specific ones: drms_ismissing_char(), etc. Use isnan() for drms_ismissing_float/double(). drms_ismissing_time() checks for isnan() and JD_0. These functions are all static inline functions so that type-checking is performed.
- Modified slot-keys. If a slotted-key duration falls on a slot boundary, then do not include the next-higher slot as part of the duration. Also, if a time falls very close to the upper slot boundary, move it into the upper slot (the rationale is that imprecision in float could cause what should have been on the boundary to fall below it).
- Reviewed TAS/array slicing.
- Meeting to discuss drms_export(). I'm going to do the export of drms records to fits files. This will involve using Tim's fitsrw.
- Make-file changes: set the various -L and -l flags so that icc and gcc can build and find cfitsio library. Make icc code link to icc-specific libraries, but gcc code not link to them.
- TAS/slicing meeting. We discussed what needs to be done to use CFITSIO to do the slicing. Tim will figure out how to use CFITSIO to write/compress blocks (tiles). I will figure out how to slip this into the existing TAS framework. Probably just use the existing TAS, but at lowest level call into FITSRW (wrapper around CFITSIO). TAS currently deals with writing partial slices by concatenating until full, then writing at the end of the TAS file. This is tricky - causes fragmentation, problems tracking partial blocks, etc. We will abandon this for now and just have the user write FITS blocks directly. We will revisit partial writes later.
- Made some minor changes to FITSRW so that it works with drms_segment(). Debugged FITSIO protocol - the drms_segment_write() is working, but drms_segment_read() is resulting in image data that is all NaN.
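Sketch of the type-specific missing-value checks described in the drms_ismissing() overhaul above. The function names and integer sentinels here are assumptions, not the DRMS definitions; the time variant is omitted because the JD_0 value is not given in the log.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical sentinels; the real DRMS missing values may differ. */
#define MISSING_CHAR SCHAR_MIN
#define MISSING_INT  INT_MIN

/* One static inline function per type, so the compiler checks the
 * argument type at each call site. */
static inline int ismissing_char(signed char v)
{
    return v == MISSING_CHAR;
}

static inline int ismissing_int(int v)
{
    return v == MISSING_INT;
}

/* Floating-point types use NaN as the missing value. */
static inline int ismissing_float(float v)
{
    return isnan(v) != 0;
}

static inline int ismissing_double(double v)
{
    return isnan(v) != 0;
}
```

NaN never compares equal to anything, including itself, so a plain == test against a NaN sentinel would always fail; isnan() is the reliable check.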
03-06-2008
- Add all previous revisions of hk_config_file files AND all previous revisions of hk_jsd_file files.
- Another fix to libdsds.so. If the data type of the fits file being ingested is not double, then make the type float. This is the conversion that VDS/SDS is going to apply.
- Add fdsIngest.pl to the cron job that automatically downloads FDS files from the MOC Product Server. This script ingests all FDS products of interest into a single DRMS series, sdo.moc_fds (this is in progress).
- Met with Carl to review plan for migrating lev0 scripts and code from EGSE and jsoc trees to JSOC tree (tables already migrated). We will do the next JSOC release without Carl's lev0 scripts and code. He will do the migration on his own after the next release.
- Sat down with Tim H. and integrated his cfitsio work into our CVS system. I did all the make file work to make this happen. His library now builds when make is run. Had further discussions of the next steps to take. Tim will start working on files in CVS; when that is done, we will work together on drms_segment.c to call into his library.
- Worked with Jim on JSOC release. There were SUMS problems in UC JSOC - writing past the edge of a buffer, double-freeing, etc. By end of day Friday 2/29 Jim had resolved those.
- Spent all day Monday working on release. Jim, Karen, and Carl all had more changes they wanted in the build - I synched those to my machine. Created a script, base/util/scripts/extcvscomm.pl, to provide all commit comments between a specified date and the current date. I used the output of that script and solicited comments to include in the release notes. Tested the latest build with various simple modules/commands.
- Updated the documentation to discuss how to access level-0 tables.
- Dealt with several release issues: 1) current drms_sscanf() didn't work on some of Rick's series; he changed it, which broke other DRMS features. The two were not compatible. Met with Rick and resolved the issues and put the fix into the JSOC V 4.1 release. 2) ia64. Worked on making ia64 build on d02. The JSOC code is not ia64-ready. Made a small number of modifications so that SUMS built on ia64 (some third-party lib headers/libraries were missing - Keh-Cheng installed those, some compiler-code-compatibility issues needed to be resolved). 3) Postgres can be installed in different locations.
- Finalized Version 4.1 JSOC Release.
- Met with Tim H. again and worked together on drms_segment.c code to use his libraries. Refined the FITSRW APIs needed by drms_segment.c. I did more make modifications so that drms and modules build (we are now intimately tied to cfitsio.a).
02-28-2008
- drms_open_records() was returning ‘series not found’. This was due to DSDS records that were missing keyword and file data. A series got created (on the fly) from a record with a double keyword. Then later, DRMS was using a DSDS record to fill-in a DRMS record. However, that DSDS record’s keyword data was missing (represented as an empty string). So, the series was expecting a double keyword, but in fact DRMS was trying to set that double keyword with an empty string. And you can’t do that. The fix was to 1) resolve the situation where DSDS keyword types conflict across records by making the keyword type STRING; and 2) when opening DSDS data, do not use records that have no data file. When that happens, the keyword values are all empty strings.
- There was duplicate code in dsds.c (libdsds.so). A correction was in one copy of the code, but not the other. Factored out into functions and had calling functions use those new functions.
- Fixed the MANPATH issue for JSOC users. Modified .setJSOCenv and .setJSOCuser_env to set the MANPATH environment variable. We no longer rely upon MAN finding the manpath based upon $PATH entries (which is specific to linux).
- Put all of Carl’s level-0 HK tabular data files into our new CVS tree (TBL_JSOC). Some files went into CVS, some went into /surge (temporary dayfiles), and some went into /home/production/lev0.
- Created new CVS modules for level 0 processing. ‘cvs co LEV0TBLS’ will put all these files in $CVSROOT/TBL_JSOC. This is largely for examining/changing files outside of production. ‘cvs co PROD_LEV0TBLS’ from /home/production/ will put these files into /home/production/lev0. This latter command is what production needs to do.
- Tracked down problems in the production build of LC jsoc (Jeneen saw that some MDI ingestion code hadn’t been running since December). Somebody changed a bunch of files, and then didn’t check them in. And they changed them in a way that made the build break. I tracked down which changes we wanted to keep, and which we wanted to discard. Re-built LC jsoc.
- Synchronized change to LC (lower-case) jsoc with changes to UC JSOC in preparation for a JSOC release that has SUMS/LEV0 code in it.
- Various smaller investigations/help for others – make issues, the MDI-code/endian issue.
- Met with Tim H. regarding next step of cfitsio integration. Decided to have Tim integrate his code into CVS, in $JSOCROOT/base/libs/fitsiowrap. I’ll work on the make files. Then we work together on modifying drms_segment.c to use his library. We won’t touch drms_fits.c. Goal is to be able to use his new library to read fits that currently reside in SUMS.
- Worked on tracking down problem in latest build. Was due to integration from LC jsoc to UC JSOC of a double-free bug in sum_open.c. Karen came up with a fix – waiting for Jim to bless it. The original bug is still in LC jsoc.
- Modified jsoc_sync.pl and jsoc_update.pl to use a file "modulespec.txt" that lists CVS modules to 'track'. If this optional file exists, then CVS will always operate only on those modules. Currently we have JSOC, DRMS, LEV0TBLS, PROD_LEV0TBLS, and EGSE. Created a script, /home/jsoc/checkoutJSOC.pl, that takes 'DRMS' or 'JSOC' as a parameter. This script will check out either the base or full set of code files, and then create the modulespec.txt file for you (which you can subsequently modify).
- Meet with Jim to finalize latest JSOC release. Need to convert all hard-coded paths to the newest locations.
- Meetings – JSOC, Lev0, Tim H., data export, Aloise.
