Phil's archive

Anchor(mostrecent)


13 March 2008

This week I worked on the spec for the basic set of export cgi-bin programs and discussed the overall plan for the initial export process with Rick, Art, and Karen. Rick has created the series jsoc.exports, and I am summarizing the plan below.

Discussed with Karen how to extract the second half of the dataseries query from the drms_open_records code. There are now two parts: the first looks up the prime keywords and converts the query to a normalized form; the second writes the query that implements the prime key versioning logic. The second part takes an argument giving the form of query desired: full recordset, count only, or selected keywords only. Additional code is then needed to process the query return values, as already exists in drms_open_records. I made a version that counts records; it will be checked in soon. Still needed is code to pack keyword values as vectors into DRMS_Array_t objects.
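
To picture what the count-only form of the versioning query has to compute, here is an in-memory C sketch. It assumes a single integer prime key per record and treats the highest recnum for each prime key value as the current version. RecRow and latest_versions are hypothetical names for illustration, not DRMS functions; the real code does this in the generated database query, not in C.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one row of the query result:
   a single integer prime key value and the DRMS record number. */
typedef struct {
    long primekey;
    long recnum;
} RecRow;

/* Order by prime key, then by recnum, so each key's newest version is last. */
static int cmp_row(const void *a, const void *b)
{
    const RecRow *x = a, *y = b;
    if (x->primekey != y->primekey)
        return (x->primekey < y->primekey) ? -1 : 1;
    return (x->recnum > y->recnum) - (x->recnum < y->recnum);
}

/* Keep only the latest version (highest recnum) of each prime key value,
   compacting rows in place.  The return value is the "count only" answer. */
static size_t latest_versions(RecRow *rows, size_t n)
{
    size_t out = 0;
    if (n == 0)
        return 0;
    qsort(rows, n, sizeof *rows, cmp_row);
    for (size_t i = 0; i < n; i++) {
        /* the last row of each prime key group carries the highest recnum */
        if (i + 1 == n || rows[i + 1].primekey != rows[i].primekey)
            rows[out++] = rows[i];
    }
    return out;
}
```

For example, rows (10,5), (20,3), (10,7), (30,1), (20,2) reduce to three current records with recnums 7, 3, and 1.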

Made a program, proj/myproj/apps/pt_average.c, that reads a recordset and makes a new record in an output series containing the mean, standard deviation, and count of values for each "pixel" of the input series. My first use of reading and writing through segments!

Worked with Jim, Carl, and Rock to get the "value added" code working for HMI lev0 processing.

I am keeping the export notes here until I find a better place for them (soon, I hope).

Plan for export of data.
Exports will be managed in the series jsoc.exports.
Each request will be given a RequestID, which can be used to track its status.
Data to be exported can be requested via several methods.  The requested
data will be collected (e.g. into a tar file) and placed in the Data segment
of the jsoc.exports record, then exported directly from its SUMS directory.

Series: jsoc.exports will have no archive and a retention time of 7 days,
so the Requestor will have one week to fetch the data.  The record itself
will be retained as a history log of exports.

P% show_info jsoc.exports -l
Prime Keys are:
        RequestID
DB Index Keys are:
        RequestID
All Keywords for series jsoc.exports:
        DataSet         (string)        Dataset requested
        ExpTime         (time)  Time of export
        FilenameFmt     (string)        File basname format
        Format          (int)   Data format code
        Notify          (string)        Notification address
        ReqTime         (time)  Time of request
        RequestID       (int)   Export request identifier
        Requestor       (string)        Name of requestor
        Size            (int)   Volume of data requested
        Status          (int)   Status of request
Segments for series jsoc.exports:
        Data                 NA generic VAR     Exported data
 
One of the access tools for initiating an export will be a set of web
cgi-bin modules using the GET and POST methods.  These will allow simple
browser access via direct HTML or JavaScript, or shell access via wget.
Later we may use higher-level tools such as GWT to build more capable
interfaces.

These web modules will exchange data in JSON format (roughly the functionality
of a stripped-down XML), with XML added later if needed.  They are not meant
to be usable directly as browser tools; they are components to be used by
web pages that will be provided separately.

To support exports, several related functions are needed:

 * There will be a function to write a FITS file from a record into a jsoc.exports segment.

 * There will be a way to generate usable filenames that identify the data by
   series, record, and segment.
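
One possible filename scheme, sketched in C under the assumption that a record is identified by its prime key values: join series name, prime key values, and segment name with '.', append .fits, and replace characters that are awkward in filenames (':' in time strings, '/', brackets) with '_'. The layout and the names export_filename and sanitize_filename are assumptions, not a settled FilenameFmt.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Replace characters that are awkward in filenames with '_'. */
static void sanitize_filename(char *s)
{
    for (; *s; s++)
        if (!isalnum((unsigned char)*s) && *s != '.' && *s != '-' && *s != '_')
            *s = '_';
}

/* Build "<series>.<pk1>...<pkN>.<segment>.fits" into buf.
   Returns 0 on success, -1 if buf is too small. */
static int export_filename(char *buf, size_t size, const char *series,
                           const char *const *pkvals, int npk,
                           const char *segment)
{
    size_t used;
    int n = snprintf(buf, size, "%s", series);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;
    for (int i = 0; i < npk; i++) {
        n = snprintf(buf + used, size - used, ".%s", pkvals[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, ".%s.fits", segment);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    sanitize_filename(buf);
    return 0;
}
```

For example, series "hmi.lev0", prime key value "2008.03.13_00:00:00_TAI", and segment "image" give hmi.lev0.2008.03.13_00_00_00_TAI.image.fits.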


Draft proposal for the jsoc_get cgi-bin program.

These could be operations in a single program selected by an "opcode" first parameter, as shown
here, or several separate programs, each with a single purpose.
The browser manages getting the desired record set, or some other means supplies a list of SUs.
The modules here can be used to get a record set.

The user (browser, wget, etc.) makes an export request, which returns online|nearline|offline status and size info.
If the size exceeds some threshold, no staging is initiated, and the returned status says so.
The request includes the preferred export method.  If the data is online and the method is http, then
the data is returned immediately.

Some of the functions are simple information requests and do not need to identify the requestor
or keep track of responses.  Other functions will need to manage a RequestID to track progress;
those functions will expect requestor info such as contact details.




Synopsis:

jsoc_get op=<command> {other arguments as specified below in Expects list}
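
Since every call is selected by the op= argument, the program first has to pull named values out of the CGI query string. A minimal C sketch, assuming a GET request whose arguments arrive in the standard CGI QUERY_STRING environment variable; cgi_param and hexval are hypothetical helper names. It undoes the %XX and '+' encoding that a browser or wget applies.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (isdigit(c))
        return c - '0';
    c = tolower(c);
    return (c >= 'a' && c <= 'f') ? c - 'a' + 10 : -1;
}

/* Look up name=value in a CGI query string and URL-decode the value
   into out.  Returns 1 if found, 0 if absent, -1 if out is too small. */
static int cgi_param(const char *qs, const char *name, char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    if (outsz == 0)
        return -1;
    while (qs && *qs) {
        const char *end = strchr(qs, '&');
        if (!end)
            end = qs + strlen(qs);
        if ((size_t)(end - qs) > nlen && strncmp(qs, name, nlen) == 0 && qs[nlen] == '=') {
            size_t used = 0;
            for (const char *v = qs + nlen + 1; v < end; v++) {
                int c = (unsigned char)*v;
                if (c == '+') {
                    c = ' ';                     /* form encoding for a space */
                } else if (c == '%' && end - v > 2 &&
                           hexval((unsigned char)v[1]) >= 0 &&
                           hexval((unsigned char)v[2]) >= 0) {
                    c = 16 * hexval((unsigned char)v[1]) + hexval((unsigned char)v[2]);
                    v += 2;                      /* skip the two hex digits */
                }
                if (used + 1 >= outsz)
                    return -1;
                out[used++] = (char)c;
            }
            out[used] = '\0';
            return 1;
        }
        qs = *end ? end + 1 : end;               /* move to the next name=value pair */
    }
    return 0;
}
```

A dispatcher would call cgi_param(getenv("QUERY_STRING"), "op", ...) and branch on the value (series_list, series_struct, rs_summary, ...). The POST calls (rs_image, exp_request, exp_su) would read their JSON body from stdin instead.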


Description:

Each of these commands can be executed locally, via a browser (presumably from a JavaScript
 program), or via "wget" or a similar program.


 exp_kinds - get the list of export methods with their rules and limits of use.
   Usage:  use this call to get the list of export methods and protocols and their restrictions.  Use it to
           inform the user of choices that will be needed later.  Could be expanded to
           be part of a login handshake with the Requestor to establish a preferred method
           and verify limits for that user.  Implement when better understood.
           Methods might be: ftp (for push), http (immediate), email (delayed), tape (mail), url (for pickup)
           Protocols might be: FITS_tar, FITS_zip, jpg, mov, etc.
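
The method list above could start out as a static table that exp_kinds serializes. A C sketch using only the methods and modes named above; per-user limits are not modeled because they are not yet defined, and ExportMethod and find_method are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One candidate export method from the exp_kinds notes. */
typedef struct {
    const char *name;  /* value of the method argument, e.g. "http" */
    const char *mode;  /* how the data reaches the requestor */
    int immediate;     /* 1 if data can be returned in the same reply */
} ExportMethod;

static const ExportMethod export_methods[] = {
    { "ftp",   "push",      0 },
    { "http",  "immediate", 1 },
    { "email", "delayed",   0 },
    { "tape",  "mail",      0 },
    { "url",   "pickup",    0 },
};

/* Look up a method by name; NULL if the method is unknown. */
static const ExportMethod *find_method(const char *name)
{
    for (size_t i = 0; i < sizeof export_methods / sizeof export_methods[0]; i++)
        if (strcmp(export_methods[i].name, name) == 0)
            return &export_methods[i];
    return NULL;
}
```

exp_request could use the same lookup to reject an unknown method before doing any other work.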



 series_list - get list of series matching specified filter, like show_series.
   Usage:  use this call to get a list of target series for further examination
   Method:  GET
   Expects:
     *  a ds parameter containing a series filter.
   Returns:
     *  status - returns 0 if OK, 1 if series not found.
        if status is 1 returns element "error" containing error message
     *  n - count of the seriesnames matching the query
     *  names - an array of series information containing:
        *  name - series name
        *  primekeys - an array containing list of prime key names
        *  note - descriptive text for the series
  NOTE: this is fully implemented by show_series.  Simply make a symlink in
        cgi-bin to the proper show_series binary.  Test from the command line by adding the "-z" flag.
        From a browser, use cgi-bin/show_series?ds=<filter> (omit the ? and argument to get all series).
        wget may also be used, e.g.:
           wget -O list.json http://jsoc.stanford.edu/cgi-bin/ajax/show_series



 series_struct - structure info for "seriesname": the list of keywords as from show_info -l and show_info -s combined
   Usage:  use this call to get the structural contents of a series with a summary of data coverage.  This
           provides the info needed to formulate a request for keyword values or data arrays.
   Method:  GET
   Expects: 
     *  ds param with seriesname
   Returns:
     *  status - returns 0 if OK, 1 if series not found.
        if status is non-zero returns element "error" containing error message
     *  primekeys - an array containing list of prime key names
     *  dbindex - an array containing list of DBindex key names
     *  keywords - an array of keyword info containing:
        *  name - keyword name
        *  type - keyword type, e.g. int, string
        *  note - descriptive text for the keyword
     *  segments - an array of segment info containing:
        *  name - segment name
        *  units - data units
        *  protocol - storage protocol for segment file
        *  dims - segment array dimensions
        *  note - descriptive text for segment
     *  links - an array of link info containing
        *  name - link name
        *  target - series name for target of the link
        *  kind - "static" or "dynamic"
        *  note - descriptive text for the link
     *  interval - a struct containing first and last info
        *  FirstRecord - contains a query that will match the first record based on the first prime key
        *  FirstRecnum - contains the recnum of the FirstRecord
        *  LastRecord - contains a query that matches the final record based on the first prime key
        *  LastRecnum - contains the recnum of the LastRecord
        *  MaxRecnum - contains the highest recnum in the series.
   NOTE:  This operation should be fast; the user can expect a prompt reply from the server.
   NOTE: this function is now implemented as jsoc_info, for example:
      jsoc_info ds=mdi.vw_V_lev18 op=series_struct
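
Record set queries contain characters such as '[' and '/' that must be percent-encoded before they go into a URL for a browser or wget. A C sketch, assuming jsoc_info is reachable at http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info next to show_series (an assumption) and that op values are plain identifiers needing no encoding; url_encode and build_url are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Percent-encode s for use as a query-string value; RFC 3986 unreserved
   characters (letters, digits, - . _ ~) pass through unchanged.
   Returns 0 on success, -1 if out is too small. */
static int url_encode(const char *s, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;
    if (outsz == 0)
        return -1;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        int plain = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || strchr("-._~", c) != NULL;
        if (used + (plain ? 1 : 3) >= outsz)
            return -1;                   /* keep room for the final NUL */
        if (plain) {
            out[used++] = (char)c;
        } else {
            out[used++] = '%';
            out[used++] = hex[c >> 4];
            out[used++] = hex[c & 0x0F];
        }
    }
    out[used] = '\0';
    return 0;
}

/* Build "<base>?op=<op>&ds=<encoded ds>".  Returns 0, or -1 on overflow. */
static int build_url(char *buf, size_t size, const char *base,
                     const char *op, const char *ds)
{
    char enc[1024];
    int n;
    if (url_encode(ds, enc, sizeof enc) != 0)
        return -1;
    n = snprintf(buf, size, "%s?op=%s&ds=%s", base, op, enc);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a hypothetical query mdi.fd_V[2008.03.01/1d] and op series_struct, build_url produces .../jsoc_info?op=series_struct&ds=mdi.fd_V%5B2008.03.01%2F1d%5D, which can be handed to wget as-is.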

 rs_summary - get recordset summary info for "record_query", used to refine the query.
   Usage:  use this call to probe the expected return for a given query.  Can be used to
           judge whether the query is appropriate for the job at hand.  With extensions it can
           be used to probe completeness, etc.
           NOTE: this call may be slow if the series is large.  The user should be patient.
   Method:  GET
   Expects:
     *  ds containing simple record_query (i.e. only one series spec).
   Returns:
        The server gives the count of records and some other info (online status, size, etc.),
        plus some coverage statistics based on completeness within the recordset, maybe
        a bar plot of coverage in some bins.  Start with just the count of records.
     *  status - returns 0 if OK, 1 if series not found.
        if status is non-zero returns element "error" containing error message
     *  count - number of records matching query.



 rs_list - get a recordset list expanded with selected keyword and segment values.
           Basically like show_info with key= and seg= arguments, plus -P when seg= is given.
   Usage:  use this call to get detailed information from DRMS: record names, keyword values,
           full paths to online data, etc.  Can be the final query for tasks where keyword
           values are sufficient.  Can also provide a list of records that can be further
           sub-selected based on keyword values, etc.
   Method:  GET
   Expects:
     * ds - containing recordset query
     * key - optional keyword name list
     * seg - optional segment name list
   Returns:
     * status - 0=OK, 1=query failed.
        if status is non-zero returns element "error" containing error message
     * count - number of records returning values
     * keywords - array of keyword names for which info is returned
     * segments - array of segment names for which info is returned
     * recinfo - object of record info containing arrays of length <count>:
        * record - array of names as in the # line of show_info -k
        * online - array of online status, 0=online, 1=nearline, 2=offline
        * <keyname> - array of <count> values for keyword <keyname>
        ...
        * <segname> - array of <count> filenames or paths for segment <segname>
        ...
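
The key and seg arguments are lists of names, and each name becomes one array in recinfo. A C sketch of splitting such a list, assuming the names are comma-separated (the separator is an assumption, as are the keyword names in the example); split_names is a hypothetical helper. strtok modifies its input and is not reentrant; strtok_r is the POSIX alternative.

```c
#include <string.h>

/* Split a comma-separated key= or seg= list in place, e.g.
   "T_REC,T_OBS" -> {"T_REC", "T_OBS"}.  Empty items are skipped and at
   most maxnames names are stored.  Returns the number stored. */
static int split_names(char *list, char **names, int maxnames)
{
    int n = 0;
    for (char *tok = strtok(list, ","); tok != NULL && n < maxnames;
         tok = strtok(NULL, ","))
        names[n++] = tok;
    return n;
}
```

The returned count gives the number of <keyname> (or <segname>) arrays in the reply, each of length <count>.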



 rs_image - get raw data or thumbnails for a recordset for selected segments
          This call generates a request in jsoc.exports only if needed to get space to
          make images.  We need to develop a mechanism to avoid lots of duplicate image making.
   Usage:  can be used along with rs_list to get selection info to better define the
          desired data.  Can also be used to get the direct URL of online generic data.
   Method:  POST
   Expects: (as JSON)
     * ds - recordset spec for single series
     * seg - list of segments for which thumbnails are requested.  Can be omitted for all.
     * protocol - file as is: nop; single images: gif, jpg, png; or movie: mov, etc.
   Returns:
     * status - 0=OK, 1=failed.
        if status is non-zero returns element "error" containing error message
     * count - number of records returning images.
     * segments - array of segment names for which images are available
     * recinfo - array of record info containing:
        * record - name of record as query
        * <segname> - array of <count> URLs pointing to files or thumbnail images, or the URL of a movie.
        ...
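
One possible way to avoid duplicate image making, offered only as a sketch since the mechanism is still to be designed: name each generated image after a hash of (record, segment, protocol) and check for an existing file with that name before making a new one. This uses 64-bit FNV-1a, a simple non-cryptographic hash; image_cache_hash and image_cache_name are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

#define FNV64_OFFSET 14695981039346656037ULL  /* FNV-1a 64-bit offset basis */
#define FNV64_PRIME  1099511628211ULL         /* FNV-1a 64-bit prime */

/* Fold the bytes of s into a 64-bit FNV-1a hash h. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hash of (record, segment, protocol): identical requests give identical
   hashes, so previously generated output can be found and reused. */
static uint64_t image_cache_hash(const char *record, const char *segment,
                                 const char *protocol)
{
    uint64_t h = FNV64_OFFSET;
    h = fnv1a(h, record);
    h = fnv1a(h, "\x1f");  /* separator: ("ab","c") must differ from ("a","bc") */
    h = fnv1a(h, segment);
    h = fnv1a(h, "\x1f");
    return fnv1a(h, protocol);
}

/* Format the cache file name as "<16 hex digits>.<protocol>".
   Returns 0, or -1 if buf is too small. */
static int image_cache_name(char *buf, size_t size, const char *record,
                            const char *segment, const char *protocol)
{
    int n = snprintf(buf, size, "%016llx.%s",
                     (unsigned long long)image_cache_hash(record, segment, protocol),
                     protocol);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A 64-bit hash makes accidental collisions very unlikely but not impossible; storing the full (record, segment, protocol) string next to the image would let the server confirm a match.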



 exp_request - request export of a recordset with data.
      The server examines the request:
         if the data is immediately available and the method is http
             then send status and data;
         else if it is small enough and online but needs export processing, then initiate processing
             and return status and DataCartID;
         else estimate processing and/or size info and return status along with options and DataCartID.
      Note that the query can be a concatenation of several record sets, i.e. the user can build a data cart.
   Usage:  Primary data request tool when data arrays desired.
   Method:  POST
   Expects: (as JSON)
     * ds - contains recordset query
     * requestor - ID of user, can be random or known Requestor.
     * notify - email address of user
     * method - name of export method, e.g. ftp, http, email, tape, url
   Returns:
     * status - 0=OK immediate data available or delayed request in queue,
                1=processing,
                2=large request needs manual confirm,
                3=bad recordset. 
        if status is 3 returns element "error" containing error message
     * requestid - RequestID of record in jsoc.exports
     * data - URL of the requested data if it is available now.
     * size - bytes of data to be returned if positive, or -1 if not known yet.
     * wait - estimated seconds until data is available if status==1.
     * contact - name and email address to contact if status==2.  The user should contact them citing the RequestID.
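
A C sketch of one possible reading of the server's decision rules, mapped onto the status codes in the Returns list. The size limit (max_bytes), the SU_* state codes (borrowed from the rs_list online array), and the function name are assumptions; the "delayed request in queue" case of status 0 and the DataCartID bookkeeping are not modeled.

```c
#include <string.h>

/* Reply status codes as listed for exp_request and exp_status. */
enum { EXP_OK = 0, EXP_PROCESSING = 1, EXP_NEED_CONFIRM = 2, EXP_BAD_RECSET = 3 };

/* Storage state, using the same codes as the rs_list "online" array. */
enum { SU_ONLINE = 0, SU_NEARLINE = 1, SU_OFFLINE = 2 };

/* nrecs:     records matched by ds (<= 0 means a bad recordset)
   state:     worst storage state among the matched records
   size:      bytes to export, or -1 if not yet known
   max_bytes: hypothetical limit above which manual confirmation is needed
   send_now:  set to 1 when the data can go back in this same reply */
static int exp_request_status(long nrecs, int state, long long size,
                              const char *method, long long max_bytes,
                              int *send_now)
{
    *send_now = 0;
    if (nrecs <= 0)
        return EXP_BAD_RECSET;
    if (size > max_bytes)
        return EXP_NEED_CONFIRM;      /* too large: no staging is started */
    if (state == SU_ONLINE && size >= 0 && strcmp(method, "http") == 0) {
        *send_now = 1;                /* online + http: return data immediately */
        return EXP_OK;
    }
    return EXP_PROCESSING;            /* stage and/or package; client polls exp_status */
}
```

Checking the size limit before anything else matches the rule that no staging is initiated for an oversized request.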



 exp_status - request status of pending request.
   Usage:  part of handshake if exp_request returned status==1.
   Method:  GET
   Expects:
     * requestid - RequestID of pending request
   Returns:
     * status - 0=OK immediate data available or delayed request in queue,
                1=processing,
                2=large request needs manual confirm,
                3=bad recordset. 
        if status is 3 returns element "error" containing error message
     * requestid - RequestID of record in jsoc.exports
     * data - URL of the requested data if it is available now and the export method is a staged type.
     * size - bytes of data to be returned if positive, or -1 if not known yet.
     * wait - estimated seconds until data is available if status==1.
     * contact - name and email address to contact if status==2.  The user should contact them citing the RequestID.
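
A client drives the exp_request/exp_status handshake by polling until status leaves 1. A C sketch of the per-poll decision, assuming the client honors the server's wait estimate; the 10 s fallback, the 300 s cap, max_polls, and the name next_poll_delay are arbitrary choices for illustration.

```c
/* Decide how long to wait before the next exp_status call.
   status and wait come from the previous exp_status reply.
   Returns seconds to sleep, or -1 when polling should stop. */
static long next_poll_delay(int status, long wait, int polls_done, int max_polls)
{
    if (status != 1 || polls_done >= max_polls)
        return -1;          /* ready (0), needs a human (2), failed (3), or gave up */
    if (wait <= 0)
        return 10;          /* server has no estimate yet: check again soon */
    return wait > 300 ? 300 : wait;  /* re-check long jobs at least every 5 minutes */
}
```

The client loop calls exp_status, passes status and wait to next_poll_delay, sleeps for the returned number of seconds, and stops on -1; status 0 then means the data URL is ready, and status 2 means contacting the listed person.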



 exp_su - this call initiates export of a StorageUnit to a remote DRMS
   Usage:  used by a remote DRMS to get a needed SU.  Follow up with exp_status calls.
   Method:  POST
   Expects:
     * requestor - name of remote DRMS site.
     * method - name of export method, e.g. ftp, tape, url
     * sunum - storage unit number
   Returns:
     * status - 0=OK immediate data available or delayed request in queue,
                1=processing,
                2=large request needs manual confirm,
                3=bad SU
        if status is 3 returns element "error" containing error message
     * requestid - RequestID of record in jsoc.exports
     * data - URL of the requested data if it is available now and the export method is a staged type.
     * size - bytes of data to be returned if positive, or -1 if not known yet.
     * wait - estimated seconds until data is available if status==1.
     * contact - name and email address to contact if status==2.  The user should contact them citing the RequestID.



  exp_history - this call gives a remote requestor a log of prior requests.
   Usage:  Used by remote users to manage their requests.
   Expects:
     * requestor - Requestor ID of previous originator of data export requests. 
     * activeonly - Boolean; if present, only requests that have not yet had status=0
             returned from an exp_request, exp_status, or exp_su call are listed.
     * requestid - if present, only info for this requestid is returned, provided the requestor matches.
   Returns:
     * status - 0=OK, 1=requestor unknown.
        if status is 1 returns element "error" containing error message
     * count - number of returned RequestIDs
     * requests - array of request descriptions containing
        * requestid - RequestID
        * ds - recordset query used
        * exptime - time of export
        * FilenameFmt - file basename format (string)
        * Format - data format code (int)
        * Notify - notification address (string)
        * ReqTime - time of request (time)
        * Size - volume of data requested (int)
        * Status - status of request (int)
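
A C sketch of the exp_history filtering rules, applied to rows already read from jsoc.exports. It assumes a requestid of 0 or less means "not given" and takes "active" literally as "Status is not 0"; ExpRequest and exp_history_filter are hypothetical names, and only the fields needed for filtering are modeled.

```c
#include <stddef.h>
#include <string.h>

/* Subset of one jsoc.exports record needed for filtering. */
typedef struct {
    long requestid;
    const char *requestor;
    int status;            /* Status keyword: 0 means completed OK */
} ExpRequest;

/* Copy matching requests into out and return how many matched
   (the reply's "count").  The requestor must always match. */
static size_t exp_history_filter(const ExpRequest *in, size_t n,
                                 const char *requestor, int activeonly,
                                 long requestid, ExpRequest *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(in[i].requestor, requestor) != 0)
            continue;                    /* never show other users' requests */
        if (activeonly && in[i].status == 0)
            continue;                    /* finished requests are not active */
        if (requestid > 0 && in[i].requestid != requestid)
            continue;                    /* single-request lookup */
        out[k++] = in[i];
    }
    return k;
}
```

Checking the requestor first means a requestid belonging to someone else yields an empty list rather than another user's request.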