AJAX Interface for JSOC
Plan:
- Implement basic access to keywords and segment information via CGI-BIN GET and POST server programs to allow Javascript access to basic JSOC functions.
- Provide basic browser side html to demonstrate capability and serve as a starting point for simple use and development.
- Publish the spec for the interface to allow others to build better Client-side tools.
Overview of Export Management
Exports will be managed in the series jsoc.exports. Each request will be given a RequestID which can be used to track status. Data to be exported can be requested via several methods. The data requested will be collected (e.g. with tar) and placed in a data segment in a record in jsoc.exports. It will be exported (i.e. picked up by the requestor) directly from the SUMS directory.
The jsoc.exports series will not be archived and will have a retention time of 7 days. Thus the Requestor will have one week to fetch the data. The export record will be retained as a history log of exports.
The jsoc.exports series looks like:
P% show_info jsoc.exports -l
Prime Keys are:
RequestID
DB Index Keys are:
RequestID
All Keywords for series jsoc.exports:
DataSet (string) Dataset requested
ExpTime (time) Time of export
FilenameFmt (string) File basname format
Format (int) Data format code
Notify (string) Notification address
ReqTime (time) Time of request
RequestID (int) Export request identifier
Requestor (string) Name of requestor
Size (int) Volume of data requested
Status (int) Status of request
Segments for series jsoc.exports:
Data NA generic VAR Exported data
{ - need to add keywords for export method (URL, tape, etc) and export function requested (i.e. convert to fits, rebin, extract, etc.)}
Status of prototype implementation of the draft plan
Many of the functions described below are now implemented in series_info and in a new module called jsoc_info which has heritage in show_info but emits JSON responses. Several open-source tools have been used in developing jsoc_info as well as http://jsoc.stanford.edu/ajax/lookdata.html which is a test client-side javascript code to use jsoc_info. These include:
Prototype Javascript Framework at http://www.prototypejs.org/ which provides AJAX style javascript functions enabling much easier javascript coding.
The JSON home which describes the XML-like JSON data protocols with links to software.
M'sJSON JSON parsing and generating code which is used in jsoc_info.
And hints for compatibility found in w3schools DOM information pages.
The operations now supported are:
series_list -- Implemented within "show_series". Expects a single parameter "filter" which is used in regex to limit the number of seriesnames returned. See the man page for show_series. If e.g. all series with the word "mdi" are desired then the URL will be: http://jsoc.stanford.edu/cgi-bin/ajax/show_series?filter=mdi
series_struct -- Implemented in jsoc_info. Expects parameters as described below. Example URL for e.g. the series mdi.vw_V_lev18 is: http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info?op=series_struct&ds=mdi.vw_V_lev18 which returns the JSON version of the information in the shell commands "show_info -l mdi.vw_V_lev18" and "show_info -s mdi.vw_V_lev18" combined.
rs_summary -- Implemented in jsoc_info. Expects parameters as below. Example URL is http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info?op=rs_summary&ds=mdi.vw_V_lev18
rs_list -- Implemented in jsoc_info. Expects parameters as below. The lookdata call limits the recordset to 10,000 records. The present implementation (due to mjson library performance) can take a minute or more for 10,000 records. Use care. This and following functions expect an explicit recordset query which will resolve to a subset of the records in the given series. The example URL here returns a few keywords for 5 minutes (5 records) from the mdi.vw_V_lev18 data. http://jsoc.stanford.edu//cgi-bin/ajax/jsoc_info.csh?ds=mdi.vw_V_lev18%5B1996.05.01_12%3A00%2F5m%5D&op=rs_list&key=DATAMEAN%2CT_OBS&seg=**NONE**
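The example URLs above can also be assembled by a script rather than typed by hand. The following is a minimal Python sketch, not part of the JSOC software, that rebuilds the query string of the rs_list example using only the standard library. The helper name `jsoc_info_url` is hypothetical, and the base address is assumed to be the jsoc_info URL used in the series_struct and rs_summary examples.

```python
from urllib.parse import urlencode

# Base address taken from the series_struct and rs_summary examples above.
JSOC_INFO_URL = "http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info"


def jsoc_info_url(op, ds, key=None, seg=None, base=JSOC_INFO_URL):
    """Build a GET URL for jsoc_info (hypothetical helper, for illustration).

    key and seg are optional lists of keyword and segment names; they are
    joined with commas as in the rs_list example URL.
    """
    params = {"ds": ds, "op": op}
    if key is not None:
        params["key"] = ",".join(key)
    if seg is not None:
        params["seg"] = ",".join(seg)
    # safe="*" keeps flags such as **NONE** readable; the other special
    # characters ([, ], :, / and ,) are percent-encoded.
    return base + "?" + urlencode(params, safe="*")


url = jsoc_info_url(
    "rs_list",
    "mdi.vw_V_lev18[1996.05.01_12:00/5m]",
    key=["DATAMEAN", "T_OBS"],
    seg=["**NONE**"],
)
print(url)
```

The resulting URL can be passed to wget or opened from a browser in the same way as the hand-written examples.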
Draft Implementation Plan
One of the access tools to initiate an export will be a set of web cgi-bin modules using get and post methods. These will allow simple browser access via direct html, javascript, or shell access via wget. Later we may use higher level tools such as GWT to build more capable interfaces.
These web modules will communicate using the JSON protocols (the functionality of a stripped down XML) and later XML if needed. These are not expected to be directly usable browser tools; they are components to be used by web pages to be provided separately.
To support exports several related functions are needed:
- There will be a function to write a FITS file from a record into a jsoc.exports segment.
- There will be a way to generate usable filenames that identify the data by series, record, and segment.
Draft suggestion for JSOC cgi-bin export programs
A set of basic operations is needed to allow identification of series, lists and meaning of keyword metadata in those series, the range of dates present, etc. Another set of operations is needed to allow access to keyword values and direct access to the files containing the bulk of the data. These could be operations in a single program with an "opcode" first parameter as shown here or perhaps as several separate programs with a single purpose for each. In the draft implementation several server-side programs have been implemented.
Usage
The basic idea is that the user (here meaning browser javascript or shell script wget calls) will make a sequence of requests to build up a fully specified JSOC DRMS "record set" query. Then that query will be used to fetch some desired data or metadata. If only metadata is needed, the response will be "immediate". If file data is needed and is online then the request can also be provided immediately. If the online status is not known or is known to be offline then an export request can be submitted, a RequestID returned to the user, then that RequestID can be used in subsequent polling to determine when the data is actually available and to get the link for data access. In both the immediate and request-respond methods the data will in the end be provided via a URL. At the beginning only http access can be used to fetch the data. At a later date sftp access should also be available. Also, at some later date alternate delivery methods will be provided for large requests.
At some point this process can be expanded to allow the user to build up an "export cart" like a shopping cart which will contain a compound recordset query. The first implementation described below will only support exports from a single series per request. This is sufficient if the "export cart" is managed on the user side.
Synopsis
jsoc_{something} op=<command> {other arguments as specified below in Expects list}
Description:
Each of these commands can be executed locally, via a browser (presumably from a Javascript program), or via "wget" or a similar program.
op=exp_kinds - get list of export methods with rules and limits of use.
Usage: use this call to get list and restrictions for export methods and protocols. Use to
inform user of choices that will be needed later. Could be expanded to
be part of login handshake with Requestor to establish preferred method
and verify limits for that user. Implement when better understood.
Methods might be: ftp(for push), http(immediate), email(delayed), tape(mail), url(for pickup)
Protocols might be: FITS_tar, FITS_zip, jpg, mov, etc.
NOTE: This does not exist yet. Probably obsolete. Do not expect it soon.
op=series_list - get list of series matching specified filter, like show_series.
Usage: use this call to get a list of target series for further examination
Method: GET
Expects:
* a ds parameter containing a series filter.
Returns:
* status - returns 0 if OK, 1 if series not found.
if status is 1 returns element "error" containing error message
* n - count of the seriesnames matching the query
* names- an array of series information containing:
* name - series name
* primekeys - an array containing list of prime key names
* note - descriptive text for the series
NOTE: The ds filter is a regular expression to match seriesnames. A prefix of "NOT" will exclude names matching the filter. This function is fully implemented as a call to show_series. From a browser use cgi-bin/show_series?ds=<filter> (the ? and argument may be omitted to list all series). wget may also be used, e.g.:
wget -O list.json http://jsoc.stanford.edu/cgi-bin/ajax/show_series?ds=hmi
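Once the reply has been saved (for example as list.json by the wget call above), it can be read with any JSON parser. The sketch below is a minimal Python illustration, assuming the reply has exactly the fields given in the Returns list; the sample text, its series names, and the helper name `series_names` are made up for the example.

```python
import json

# Hypothetical reply text, shaped after the "Returns" list above.
# The series names and notes are invented for illustration.
SAMPLE = """
{
  "status": 0,
  "n": 2,
  "names": [
    {"name": "hmi.example_a", "primekeys": ["T_REC"], "note": "first example"},
    {"name": "hmi.example_b", "primekeys": ["T_REC", "CAMERA"], "note": "second example"}
  ]
}
"""


def series_names(text):
    """Return the series names from a series_list reply, or raise on error."""
    reply = json.loads(text)
    if reply["status"] != 0:
        # The spec says an "error" element accompanies a non-zero status.
        raise RuntimeError(reply.get("error", "series_list failed"))
    return [entry["name"] for entry in reply["names"]]


print(series_names(SAMPLE))
```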
op=series_struct - info for "seriesname" gives list of keywords as show_info -l and show_info -s combined
Usage: use this call to get structural contents of a series with summary of data coverage. This
info can provide info needed to formulate a request for contents of key values or data arrays.
Method: GET
Expects:
* ds param with seriesname
Returns:
* status - returns 0 if OK, 1 if series not found.
if status is non-zero returns element "error" containing error message
* note - descriptive text for the series
* archive - 1 means data is archived, 0 means not archived
* retention - number of days data retained in SUMS
* tapegroup - SUMS tapegroup number
* unitsize - max number of records per SUMS storage unit
* primekeys - an array containing list of prime key names
* dbindex - an array containing list of DBindex key names
* keywords - an array of keyword info containing:
* name - keyword name
* type - keyword type, e.g. int, string
* units - keyword units
* note - descriptive text for the keyword
* segments - an array of segment info containing:
* name - segment name
* units - data units
* protocol - storage protocol for segment file
* dims - segment array dimensions
* note - descriptive text for segment
* links - an array of link info containing
* name - link name
* target - series name for target of the link
* kind - "static" or "dynamic"
* note - descriptive text for the link
* interval - a struct containing first and last info
* FirstRecord - contains query that will match the first record based on the first prime key
* FirstRecnum - contains the recnum of the FirstRecord
* LastRecord - contains query that matches the final record based on the first primekey.
* LastRecnum - contains the recnum for LastRecord
* MaxRecnum - contains the highest recnum in the series.
NOTE: This operation should be fast; the user can expect a prompt reply from the server.
NOTE: This function is now implemented in jsoc_info, for example:
- jsoc_info ds=mdi.vw_V_lev18 op=series_struct
or
- http://jsoc.stanford.edu/cgi-bin/ajax/jsoc_info?op=series_struct&ds=mdi.vw_V_lev18
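A client will typically use the series_struct reply to learn which keywords exist before building a query. The following Python sketch shows one way to pull the keyword types and the first/last record queries out of a parsed reply. It assumes the field names are spelled exactly as in the Returns list above; the sample values and both helper names are hypothetical.

```python
def keyword_types(struct):
    """Map keyword name -> keyword type for a parsed series_struct reply."""
    if struct["status"] != 0:
        raise RuntimeError(struct.get("error", "series_struct failed"))
    return {kw["name"]: kw["type"] for kw in struct["keywords"]}


def record_span(struct):
    """Return the (first, last) record queries from the interval struct."""
    interval = struct["interval"]
    return interval["FirstRecord"], interval["LastRecord"]


# Hypothetical, heavily shortened reply; real replies carry more fields.
sample = {
    "status": 0,
    "note": "example series",
    "primekeys": ["T_OBS"],
    "keywords": [
        {"name": "T_OBS", "type": "time", "units": "TAI", "note": "observation time"},
        {"name": "DATAMEAN", "type": "float", "units": "m/s", "note": "mean of data"},
    ],
    "interval": {
        "FirstRecord": "example.series[1996.05.01_00:00]",
        "LastRecord": "example.series[1996.05.02_00:00]",
    },
}

print(keyword_types(sample))
print(record_span(sample))
```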
op=rs_summary - get recordset summary info for "record_query", use to refine query.
Usage: use this call to probe the expected return for a given query. Can be used to
estimate appropriateness of the query for the job at hand. With extensions can
be used to probe completeness, etc.
NOTE: this call may be slow if the series is large. The user should be patient.
Method: GET
Expects:
* ds containing simple record_query (i.e. only one series spec).
Returns:
server gives count of records and some other info, online, size, etc.
some coverage statistics based on completeness within recordset. Maybe
a bar plot of coverage in some bins. Start with just count of records.
* status - returns 0 if OK, 1 if series not found.
if status is non-zero returns element "error" containing error message
* count - number of records matching query.
NOTE: this operation is implemented in jsoc_info taking op=rs_summary and ds=recordset
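Because rs_list can be slow for large recordsets, a client may want to check the rs_summary count first. The Python sketch below illustrates that guard. The 10,000 record limit is the one quoted for lookdata in the list of supported operations; applying the same limit in other clients is an assumption, and the helper name `small_enough_to_list` is hypothetical.

```python
# Limit quoted for the lookdata client; reusing it here is an assumption.
MAX_RECORDS = 10000


def small_enough_to_list(summary, limit=MAX_RECORDS):
    """Return True if a parsed rs_summary reply is small enough for rs_list."""
    if summary["status"] != 0:
        raise RuntimeError(summary.get("error", "rs_summary failed"))
    # int() tolerates the count arriving either as a number or as a string.
    return int(summary["count"]) <= limit


# Hypothetical parsed replies.
print(small_enough_to_list({"status": 0, "count": 5}))
print(small_enough_to_list({"status": 0, "count": 250000}))
```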
op=rs_list - get recordset list expanded with selected keyword and segment values.
Basically like show_info with key= and seg= and if seg= then -P args.
Usage: use this call to get detailed information from DRMS, record names, keyword values,
full paths to online data, etc. Can be final query for some tasks where keyword
values are sufficient. Can provide a list of records that can be further
sub-selected based on keyword values, etc.
Method: GET
Expects:
* ds - containing recordset query, required
* key - keyword name list, optional, if not present all keys are processed
* seg - segment name list, optional: if not present all segs are processed.
Returns:
* status - 0=OK, 1=query failed.
if status is non-zero returns element "error" containing error message
* count - number of records returning values
* keywords - array of objects containing
* name - keyword name
* values - array of <count> values for that keyword.
* segments - array of objects containing
* name - name of segment
* dims - array of <count> strings containing segment file array dimensions
* values - array of <count> pathnames to segment file
* recinfo - array of count objects containing:
* name- name of record as query, as in the # line of show_info -k
* online - 1=online 0=offline
NOTE: Now implemented in jsoc_info. jsoc_info understands keyword or segment names of "**NONE**" in the key and seg parameter lists as flags that no keys or segs are wanted. jsoc_info also recognizes "**ALL**" to mean all keys or all segs, in either or both lists. jsoc_info also recognizes the special keyword names "*recnum*", "*sunum*", and "*logdir*" and returns the recnum values, sunum values, and log directories, respectively, if those are specified. The "**ALL**" flag does not prevent explicit keywords from being listed too; in that case those keywords will appear twice. Starting in Oct 2008 all keyword values are returned as JSON strings. This allows floating-point NaNs to be returned as "nan" instead of a bunch of "9"s. It also accommodates octal and hex formats.
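The rs_list reply is column-oriented: each keyword carries an array with one value per record. Many clients will prefer one object per record. The following Python sketch shows one possible pivot, assuming the field names in the Returns list above. The sample values and the helper name `rows_from_rs_list` are hypothetical. Values are left as strings, as returned by the server, and converted only where the caller knows the type.

```python
import math


def rows_from_rs_list(reply):
    """Pivot a parsed rs_list reply into a list with one dict per record."""
    if reply["status"] != 0:
        raise RuntimeError(reply.get("error", "rs_list failed"))
    rows = [{} for _ in range(int(reply["count"]))]
    for column in reply.get("keywords", []):
        # Each column holds <count> values, in record order.
        for row, value in zip(rows, column["values"]):
            row[column["name"]] = value
    return rows


# Hypothetical reply for two records; all values arrive as strings.
sample = {
    "status": 0,
    "count": 2,
    "keywords": [
        {"name": "T_OBS", "values": ["1996.05.01_12:00:30_TAI", "1996.05.01_12:01:30_TAI"]},
        {"name": "DATAMEAN", "values": ["123.5", "nan"]},
    ],
    "segments": [],
}

rows = rows_from_rs_list(sample)
# float() understands the "nan" string used for missing floating values.
means = [float(row["DATAMEAN"]) for row in rows]
print(rows)
print(means)
```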
op=rs_image - get raw data or thumbnails for recordset for selected segments
This call generates a request in jsoc.exports only if needed to get space to
make images. We need to develop a mechanism to avoid lots of duplicate image making.
Usage: - can be used along with rs_list to get selection info to better define
desired data. Can also be used to get direct URL of online generic data.
Method: POST
Expects: (as JSON)
* ds - recordset spec for single series
* seg - list of segments for which thumbnails are requested. Can be omitted for all.
* protocol - file as is: nop; single images: gif, jpg, png; or movie: mov, etc.
Returns:
* status - 0=OK, 1=failed.
if status is non-zero returns element "error" containing error message
* count - number of records returning images.
* segments - array of segment names for which images are available
* recinfo - array of record info containing:
* record - name of record as query
* <segname> - array of <count> URLs pointing to file or thumbnails images or URL of movie.
...
Note: no development of this option yet.
op=exp_request - request export of recordset with data.
server examines request
if immediately available and method is url_quick
then send status and data
else if small enough and is online but needs export processing then initiate processing
and return status and RequestID.
else estimate processing and/or size info and return status along with options and RequestID.
Note the query can be concatenation of several record sets. I.e. user can make datacart.
Usage: Primary data request tool when data files desired.
Method: POST (or GET?)
Expects: (as JSON)
About the data:
* ds - contains recordset query
* process - Requested processing prior to export in desired protocol, default is "no_op". See below.
* protocol - file conversion request. At present options are: fits, as_is.
* filenamefmt - rule for export filenames, default: {seriesname}.{recnum:%ld}.{segment_filename}
About the communication:
* format - format of returned information, defaults to "json", options will be: json, txt, html, maybe xml
* method - name of export method, e.g.: url, url_quick, and later: ftp, http, email, tape.
About the user:
* requestor - ID of user, can be random or known Requestor.
* notify - email address of user. May be omitted unless method is "email" or "tape".
Returns in the specified format:
* status - 0=OK immediate data available or queue managed data is complete
1=request received and action is pending, i.e. in processing
2=queued for processing
3=request too large for automatic requests
4=request not formed correctly, bad series, etc.
5=request old, results requested after data timed out.
if status is > 2 returns element "error" containing error message
* requestid - RequestID of record in jsoc.exports
* dir - Directory in the JSOC system where exported data is located.
* data - an array of count objects containing:
* record - name of record as query with segment name as suffix
* filename - file or link name of the requested data.
* size - bytes of data to be returned if positive, or -1 if not known yet.
* count - number of files returned in the data array.
* method - copied from input, but url_quick may be reported as "url" if applicable.
* protocol - copied from input.
* wait - estimated seconds until data is available if status==1.
* error - message, only present if status > 2
* contact - email address, name to contact if status >2. user should contact with RequestID.
Note: This is now implemented via "jsoc_fetch". The "process" field allows passing one of the approved on-demand processing requests for JSOC data. The default is "no_op" meaning that no processing will be done beyond protocol conversion if needed. Other standard processing is TBD. Note on method=url_quick: If the protocol is as-is and the full RecordSet is online, data will contain a JSON array of pairs of query names and full URLs, one for each record. This is similar to the op=rs_list format for segments. If the data is not all online the url_quick will be treated as if it were "url". If the "url_quick" request was successful, there will be no record of the export in jsoc.exports and RequestID will be empty, and Notify and Requestor will have been ignored. For normal exports, i.e. method=url, the data may be accessed by creating the URL from "http://jsoc.stanford.edu/" + the contents of the "dir" variable + "/" + the contents of a "filename" variable.
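The URL construction rule for method=url exports can be written out as a short function. This is a minimal Python sketch of that rule, not the jsoc_fetch implementation. The helper name `export_urls` and the sample reply are hypothetical, and trimming slashes from "dir" is a defensive assumption that is not stated in the spec.

```python
JSOC_ROOT = "http://jsoc.stanford.edu/"


def export_urls(reply, root=JSOC_ROOT):
    """Return (record, url) pairs for a completed method=url export reply."""
    if reply["status"] != 0:
        raise ValueError("export not complete, status=%s" % reply["status"])
    # Assumption: trim slashes so that root + dir + "/" + filename never
    # contains a doubled slash, whichever way "dir" is reported.
    directory = reply["dir"].strip("/")
    return [
        (item["record"], root + directory + "/" + item["filename"])
        for item in reply["data"]
    ]


# Hypothetical reply; the directory, record name, and filename are invented.
sample = {
    "status": 0,
    "requestid": 1234,
    "dir": "SUM0/D0000/S00000",
    "count": 1,
    "data": [
        {"record": "example.series[1996.05.01_12:00]{Data}",
         "filename": "example.series.42.data.fits"},
    ],
}

for record, url in export_urls(sample):
    print(record, url)
```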
op=exp_status - request status of pending request.
Usage: part of handshake if exp_request returned status==1.
Method: GET
Expects:
* requestid - RequestID of pending request
* format - format of returned information, defaults to "json", options will be: json, txt, html, xml
Returns in the specified format:
* status - 0=OK immediate data available or delayed request in queue,
1=processing,
2=queued for processing,
2=large request needs manual confirm,
3=bad recordset.
if status is 3 returns element "error" containing error message
* requestid - RequestID of record in jsoc.exports
* data - an array of count objects containing:
* record - name of record as query with segment name as suffix
* filename - file or link name of the requested data.
* dir - URL tail of directory containing the returned data.
* size - bytes of data to be returned if positive, or -1 if not known yet.
* count - number of files returned in the data array.
* method - copied from input, but url_quick may be reported as "url" if applicable.
* protocol - copied from input.
* wait - estimated seconds until data is available if status==1.
* contact - email address, name to contact if status==2. user should contact with RequestID.
Note: this is now implemented in "jsoc_fetch". Note on dir URL contents: The provided URL will be a directory containing some files with standard names and an "index.html" that will provide information about the exported data in web form. The directory will contain a "packing-list" file which will be a table with a row for each data file. The row will contain the DRMS record query that resolves to the record and a filename which can be concatenated onto the string "http://jsoc.stanford.edu/" and the "dir" string to be a URL for the file. If the data is "as-is" the files will be links to the actual segment files for the selected records. If some processing has been done, the data will be in files possibly tarred together and the packing-list will be a catalog of the tar file. The files index.html, index.json, index.txt (and later maybe index.xml) will all be present, containing the same information. The index.json file contents will be the same as the returned json text if format=json. Additional files will be present in the form {RequestID}.{extension} where the present extensions are "qsub", "drmsrun", and "env" which contain the qsub script and drms_run scripts run to make the export, and the shell environment during the drms_run session.
op=exp_su - this call initiates export of a StorageUnit to a remote DRMS
Usage: Used by remote DRMS to get needed SU. Complete with exp_status calls.
Method: POST
Expects:
* requestor - name of remote DRMS site.
* method - name of export method, e.g. ftp, tape, url
* sunum - storage unit number
Returns:
* status - 0=OK immediate data available or delayed request in queue,
1=processing,
2=large request needs manual confirm,
3=bad SU
if status is 3 returns element "error" containing error message
* requestid - RequestID of record in jsoc.exports
* data - URL of requested data if is available now and staged type.
* size - bytes of data to be returned if positive, or -1 if not known yet.
* wait - estimated seconds until data is available if status==1.
* contact - email address, name to contact if status==2. user should contact with RequestID.
Note: This will be implemented via a Perl script with possibly a different interface.
op=exp_history - this call gives a remote requestor a log of prior requests.
Usage: Used by remote users to manage their requests.
Expects:
* requestor - Requestor ID of previous originator of data export requests.
* activeonly - Boolean, if present request will only respond with requests that
have not had a status=0 returned from an exp_request, exp_status, or exp_su call.
* requestid - if present info for only this requestid will be returned, if requestor matches.
Returns:
* status - 0=OK, 1=requestor unknown.
if status is 1 returns element "error" containing error message
* count - number of returned RequestIDs
* requests - array of request descriptions containing
* requestid - RequestID
* ds - recordset query used
* exptime - Time of export request
* FilenameFmt (string) File basname format
* Format (int) Data format code
* Notify (string) Notification address
* ReqTime (time) Time of request
* Size (int) Volume of data requested
* Status (int) Status of request
NOTE: we need to establish some password protection for this request. The names and email addresses of requestors along with the details of their export requests will not be public information. It will be maintained for statistical purposes and to allow notification to the requestors if the data they have exported is found to be faulty, poorly calibrated, etc.
Client Side
A sample tool is now at http://jsoc.stanford.edu/ajax/lookdata.html that functions with show_series, jsoc_info and jsoc_fetch (all in http://jsoc.stanford.edu/cgi-bin/ajax/). lookdata supports calls to all of the above functions that are noted to have show_series, jsoc_info, and jsoc_fetch implementations.
Lookdata may be used as a working example that provides building blocks for a more capable and more friendly user experience.
Each of the jsoc_info, show_series, and jsoc_fetch operations implemented has also been verified to work via wget calls. A useful application built using wget will need a JSON parser compatible with the scripting language chosen. The www.json.org page links to a number of available implementations. A jsoc_fetch script should do one exp_request call then loop on the return value of status in exp_status calls until a "0" is returned. The return containing the status=0 will also provide the json with the full result. At some point, the option of plain text for easier shell scripting will be available.
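The request-then-poll pattern described above might look like the following Python sketch. It is an illustration under stated assumptions, not a finished client. The function name `wait_for_export` is hypothetical, the caller supplies `get_status` (any callable that performs the exp_status call and returns the parsed JSON reply), and a status above 2 is treated as a failure, following the exp_request status list. The polling interval and retry limit are arbitrary choices.

```python
import time


def wait_for_export(request_id, get_status, poll_seconds=10.0,
                    max_polls=100, sleep=time.sleep):
    """Poll exp_status until status is 0, then return the final reply.

    get_status takes a RequestID and returns the parsed exp_status reply;
    how it reaches the server (wget, urllib, ...) is left to the caller.
    """
    for _ in range(max_polls):
        reply = get_status(request_id)
        status = int(reply["status"])
        if status == 0:
            # The status=0 reply also carries the full export result.
            return reply
        if status > 2:
            raise RuntimeError(
                reply.get("error", "export failed with status %d" % status))
        sleep(poll_seconds)
    raise TimeoutError("export %s not ready after %d polls"
                       % (request_id, max_polls))


# Demonstration with canned replies instead of a real server.
canned = iter([{"status": 2}, {"status": 1}, {"status": 0, "count": 1}])
final = wait_for_export(1234, lambda rid: next(canned), sleep=lambda s: None)
print(final)
```

Passing `sleep` and `get_status` as arguments keeps the loop easy to try without network access; a real script would leave `sleep` at its default.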
