SUM API                   Storage Unit Management Subsystem (SUMS)
-------------------------------------------------------------

/* SUM.h */

#ifndef SUM_INCL
#define SUM_VERSION_NUM    (1.0)
#define DBCONNECT "DSOWNER/HMI4SDO@hmidb"

#include
#include
/* need to repeat this for pro-c precompiler */
typedef uint64_t SUMID_t;

/* Bitmap of modes set in a SUM_t structure */
#define ARCH        1    /* archive the storage unit to tape */
#define TEMP        2    /* the storage unit is temporary */
#define PERM        4    /* the storage unit is permanent */
#define TOUCH       8    /* tdays gives the storage unit retention time */
#define RETRIEVE   16    /* retrieve from tape */
#define NORETRIEVE 32    /* don't retrieve from tape */
#define FULL     1024    /* also set this to get full info from DB query */

#define SUM_INCL
#endif

-------------------------------------------------------------------
This is found in sum_rpc.h:

typedef struct SUM_struct
{
  SUMID_t uid;
  CLIENT *cl;            /* client handle for calling sum_svc */
  SUM_info_t *sinfo;     /* info from sum_main for SUM_info() call */
  int debugflg;          /* verbose debug mode if set */
  int mode;              /* bit map of various modes */
  int tdays;             /* touch days for retention */
  int group;             /* group # for the given dataseries */
  int storeset;          /* assign storage from JSOC, DSDS, etc. Default JSOC */
  int status;            /* return status on calls.
                            1 = error, 0 = success */
  double bytes;
  char *dsname;          /* dataseries name */
  char *username;        /* user's login name */
  char *history_comment; /* history comment string */
  int reqcnt;            /* # of entries in arrays below */
  uint64_t *dsix_ptr;    /* ptr to array of dsindex uint64_t */
  char **wd;             /* ptr to array of char * */
} SUM_t;

typedef struct SUMEXP_struct
{
  SUMID_t uid;
  int reqcnt;            /* # of entries in arrays below */
  char *host;            /* hostname target of scp call */
  char **src;            /* ptr to char * of source dirs */
  char **dest;           /* ptr to char * of destination dirs */
} SUMEXP_t;

-------------------------------------------------------------------
This is found in sum_info.h:

struct SUM_info_struct
{
  struct SUM_info_struct *next;
  uint64_t sunum;           //aka ds_index
  char   online_loc[81];
  char   online_status[5];
  char   archive_status[5];
  char   offsite_ack[5];
  char   history_comment[81];
  char   owning_series[81];
  int    storage_group;
  double bytes;
  char   creat_date[32];
  char   username[11];
  char   arch_tape[21];
  int    arch_tape_fn;
  char   arch_tape_date[32];
  char   safe_tape[21];
  int    safe_tape_fn;
  char   safe_tape_date[32];
  int    pa_status;
  int    pa_substatus;
  char   effective_date[20];
};
typedef struct SUM_info_struct SUM_info_t;

--------------------------------------------------------------------------
SUM_t *SUM_open(char *server, char *db, int (*history)(const char *fmt, ...))

A DRMS instance opens a session with SUMS. The server argument names the
server to connect to. If none is given, it defaults to the SUMSERVER
environment variable, else to the compile-time SUMSERVER define.
The db name is deprecated and has no effect. The DB will be the one that
sum_svc was started with, e.g. sum_svc hmidb.
The history argument is a printf-style logging function.
Returns a pointer to a SUM handle that identifies this user for this
session. Returns NULL on failure.
Currently the dsix_ptr[] and wd[] arrays are malloc'd to size SUMARRAYSZ (512).
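The server-name defaulting described for SUM_open() can be sketched as a small helper. This is a minimal illustration, not the library's code. `resolve_sum_server()` and `sum_server_default` are hypothetical names, and the placeholder default value is invented. The sketch also assumes that "not given" means a NULL or empty server argument.

```c
#include <stdlib.h>

/* Hypothetical placeholder for the compile-time SUMSERVER define;
 * the real value is fixed when the SUM API library is built. */
static const char sum_server_default[] = "sumserver-default";

/* Hypothetical helper: pick the server name SUM_open() would connect to.
 * Order: explicit argument, then $SUMSERVER, then the compile-time default.
 * Assumes a NULL or empty string means "not given". */
static const char *resolve_sum_server(const char *server)
{
    const char *env;

    if (server != NULL && server[0] != '\0')
        return server;               /* caller named a server */
    env = getenv("SUMSERVER");
    if (env != NULL && env[0] != '\0')
        return env;                  /* fall back to the environment */
    return sum_server_default;       /* last resort: built-in default */
}
```

The db argument plays no part here because, as noted above, the DB is whatever sum_svc was started with.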
--------------------------------------------------------------------------
int SUM_close(SUM_t *sum, int (*history)(const char *fmt, ...))

Closes the session. Returns 0 on success, else an error code.
Releases all read-only storage, releases all uncommitted allocated storage,
and frees any other resources for this SUM handle.

--------------------------------------------------------------------------
int SUM_get(SUM_t *sum, int (*history)(const char *fmt, ...))

Gets the location of the storage units given by the dsindexes.
Marks the storage units as open for read.
Returns 0 on success with data available, 1 on error, or RESULT_PEND when
the data will come from tape (call SUM_poll() or SUM_wait() to get the
completion msg. NOTE: The caller must check sum->status for any errors
after SUM_poll() or the final SUM_wait()).

NOTE: You can request any number of storage units in one call (reqcnt).
One completion message will be received when all the units are online.
If you get back an error status, you will not know whether any particular
storage unit failed: all reqcnt storage units stand or fall as a team.
If you want resolution at the individual storage unit level, make
separate SUM_get() calls.
If you make another SUM_get() call before you do a SUM_wait(), two
completion messages will be pending. SUM_wait() will return after the
first one, you will not know which request completed, and the other one
will still be pending. So keep the SUM_get()/SUM_wait() calls paired,
unless you want to explicitly program for something more complex.

The caller sets:
  SUMID   = the open id
  mode    = RETRIEVE or NORETRIEVE, to get any offline dataunits from tape
            storage or not. Also TOUCH if you want to change the online
            retention time.
  tdays   = touch days for online retention. Always used, regardless of
            TOUCH mode, if the SU was read from tape.
  reqcnt  = number of dsindex values given below to get
  dsix_ptr= pointer to an array of reqcnt uint64_t giving the DB index of
            the dataunits

The function returns:
   0 on success w/ data available,
   1 on error,
   4 on connection reset by peer (sum_svc probably gone),
  32 RESULT_PEND. Reading from tape; the answer will be sent later.
     The user should call SUM_wait() or SUM_poll() to get the answer.
If success then:
  wd = array of char * pointing to the wd of each dsindex given.
       The value is an empty string for any non-existent storage unit, or
       for an offline storage unit when mode = NORETRIEVE.

--------------------------------------------------------------------------
int SUM_alloc(SUM_t *sum, int (*history)(const char *fmt, ...))

Assigns storage from /SUM, does the mkdir, and reports the wd.
The dir is owned by the calling user.
This is used when an application wants to make datarecords and put them in
the managed /SUM storage. The application makes a SUM_alloc() call for each
storage unit that it wants to output datarecords to.
Also used internally to allocate storage for dataunits being retrieved
from tape.
(An application can make multiple SUM_alloc() calls, whereas previously
there was only one alloc for any pe map file.
Note that there is no longer any subdir naming template, as prog:, level:,
and series: no longer exist. Also, the dsindex is now assigned at the start
rather than at the end of storage unit (dataset) creation.)
NOTE: Currently you are restricted to only one alloc at a time,
i.e. reqcnt must be 1.

The caller sets:
  SUMID   = the open id
  bytes   = number of bytes to allocate

The function returns:
  Error code, else 0 on success with:
  dsix_ptr= pointer to the dsindex assigned to this storage unit.
            The application associates this dsindex with every datarecord
            that it creates in this storage unit.
  wd      = pointer to a string giving the allocated wd.
            It is of the form /SUM2/D123456/, where D123456 is a unique
            number supplied by the DB for each SUM_alloc() call (actually
            this can be the dsindex).
            The datasegment records are created under this wd by the
            application with file names of the form
            record_666.segment_001.fits, where 666 represents the unique
            record number assigned by the JSOC Data Record Management
            System and 001 represents the first of possibly multiple
            datasegments written. (Check with Rasmus.)

--------------------------------------------------------------------------
int SUM_alloc2(SUM_t *sum, uint64_t sunum, int (*history)(const char *fmt, ...))

Assigns storage from /SUM for the given sunum (i.e. ds_index), does the
mkdir, and reports the wd.
NOTE: This is designed to replicate, locally, data from a remote SUMS.
The sunum must not be from the range of sunums assigned to the caller's
local SUMS. The sunum is first validated as not belonging to this SUMS.
The dir is owned by the calling user.
The application makes a SUM_alloc2() call for each storage unit that it
wants to replicate data segments to.
NOTE: Currently you are restricted to only one alloc at a time,
i.e. reqcnt must be 1.

The caller sets in sum:
  SUMID   = the open id
  bytes   = number of bytes to allocate

The function returns:
  Error code, else 0 on success with:
  dsix_ptr= pointer to the dsindex assigned to this storage unit.
            In this case, it will be the given sunum.
  wd      = pointer to a string giving the allocated wd.
            It is of the form /SUM2/D123456/, where 123456 is the given sunum.

--------------------------------------------------------------------------
int SUM_put(SUM_t *sum, int (*history)(const char *fmt, ...))

Puts storage units from allocated storage into the DB catalog.
Upon success the wds are owned by production.
NOTE: All the mode, tdays, group, DB index and wd values must be for the
same dsname.
The caller sets:
  SUMID   = the open id
  mode    = [ARCH | TEMP | PERM] + TOUCH for a normal, temporary or
            permanent cataloging, with the touch option to give tdays below
  tdays   = if TOUCH applies, number of days to retain the storage unit
  dsname  = dataseries name
  group   = the storage group # for this dataseries
  reqcnt  = number of dsindex values given below to put
  dsix_ptr= pointer to an array of reqcnt uint64_t giving the DB index of
            the dataunits
  wd      = array of char * pointing to the wd of each dsindex given.
            The value is NULL for any missing dataset.

The function returns non-0 on error. A 1 is a fatal error; see the sum_svc
log file. A 2 means a null wd was given and skipped. Normally the caller
would not send blank wds.

Sample:
  uint64_t *dsixpt;
  char **cptr;
  int i;

  sum->reqcnt = 3;
  dsixpt = sum->dsix_ptr;
  *dsixpt++ = index0;          /* ds_index of alloced data segment */
  *dsixpt++ = index1;
  *dsixpt = index2;
  cptr = sum->wd;
  *cptr = (char *)malloc(64);  /* each alloc_wd must fit in 64 bytes */
  strcpy(*cptr, alloc_wd0);
  cptr++;
  *cptr = (char *)malloc(64);
  strcpy(*cptr, alloc_wd1);
  cptr++;
  *cptr = (char *)malloc(64);
  strcpy(*cptr, alloc_wd2);
  if(SUM_put(sum, printf)) {   /* save the data segments for archiving */
    printf("Error: on SUM_put()\n");
  }
  else {
    cptr = sum->wd;
    for(i=0; i < sum->reqcnt; i++) {
      printf("The put wd = %s\n", *cptr++);
    }
  }

--------------------------------------------------------------------------
int SUM_poll(SUM_t *sum)   (Normally only used by DRMS)

Checks whether a previous request is complete.
  * Return 0 = msg complete, the sum has been updated
  * TIMEOUTMSG = msg still pending, try again later
  * ERRMSG = fatal error
NOTE: Upon a msg complete return, sum->status != 0 if there was an error
anywhere in the path of the request that initially returned the
RESULT_PEND status.
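The poll-then-check-status rule for SUM_poll() can be sketched as a loop. Everything here is a self-contained stand-in. `demo_sum_t`, `demo_poll()`, `poll_until_done()` and the `DEMO_` codes are hypothetical, and `demo_poll()` only simulates SUM_poll() so the control flow can run without sum_svc. In real code the call would be SUM_poll(sum), with the TIMEOUTMSG/ERRMSG values taken from the SUMS headers.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the real TIMEOUTMSG and ERRMSG values come from
 * the SUMS headers and are not reproduced here. */
#define DEMO_TIMEOUTMSG 1
#define DEMO_ERRMSG     2

/* Minimal stand-in for the parts of SUM_t this sketch touches. */
typedef struct {
    int status;        /* like sum->status: non-0 if the request failed */
    int pending_polls; /* stub state: polls left before "completion" */
} demo_sum_t;

/* Stub for SUM_poll(): reports "still pending" until pending_polls hits 0. */
static int demo_poll(demo_sum_t *sum)
{
    if (sum->pending_polls > 0) {
        sum->pending_polls--;
        return DEMO_TIMEOUTMSG;
    }
    return 0;
}

/* Poll a RESULT_PEND request to completion.
 * Returns 0 only if the message completed AND sum->status is 0;
 * -1 on ERRMSG (or any other unexpected code), -2 if the request itself
 * failed, -3 if still pending after max_polls attempts. */
static int poll_until_done(demo_sum_t *sum, int max_polls, int *polls_used)
{
    int i, rc;

    for (i = 1; i <= max_polls; i++) {
        rc = demo_poll(sum);
        if (polls_used != NULL)
            *polls_used = i;
        if (rc == DEMO_TIMEOUTMSG)
            continue;          /* real code would sleep or do other work */
        if (rc != 0)
            return -1;         /* ERRMSG: fatal */
        return (sum->status == 0) ? 0 : -2;
    }
    return -3;
}
```

The check has two levels. A 0 from the poll only means the completion message arrived, and sum->status says whether the request itself succeeded. SUM_wait() replaces the loop with one blocking call, but the final status check is the same.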
--------------------------------------------------------------------------
int SUM_wait(SUM_t *sum)   (Normally only used by DRMS)

Waits until the previous request is complete.
  * Return 0 = msg complete, the sum has been updated
  * ERRMSG = fatal error
NOTE: Upon a msg complete return, sum->status != 0 if there was an error
anywhere in the path of the request that initially returned the
RESULT_PEND status.

--------------------------------------------------------------------------
int SUM_info(SUM_t *sum, uint64_t sunum, int (*history)(const char *fmt, ...))

Returns the sum_main table info for the given sunum (i.e. ds_index).

Sample use:
  SUM_info_t *sinfo;

  if(SUM_info(sum, 2650355, printf)) {
    printf("Fail on SUM_info() in main3\n");
  }
  else {
    sinfo = sum->sinfo;
    printf("sum_info online_loc = %s\n", sinfo->online_loc);
    printf("sum_info online_status = %s\n", sinfo->online_status);
    printf("sum_info archive_status = %s\n", sinfo->archive_status);
    printf("sum_info creat_date = %s\n", sinfo->creat_date);
    printf("sum_info arch_tape = %s\n", sinfo->arch_tape);
    printf("sum_info arch_tape_fn = %d\n", sinfo->arch_tape_fn);
    printf("sum_info arch_tape_date = %s\n", sinfo->arch_tape_date);
  }

The function returns:
  Error code, else 0 on success with sum->sinfo pointing to a SUM_info_t
  (struct SUM_info_struct, as defined in sum_info.h):

struct SUM_info_struct
{
  struct SUM_info_struct *next;
  uint64_t sunum;           //aka ds_index
  char   online_loc[81];
  char   online_status[5];
  char   archive_status[5];
  char   offsite_ack[5];
  char   history_comment[81];
  char   owning_series[81];
  int    storage_group;
  double bytes;
  char   creat_date[32];
  char   username[11];
  char   arch_tape[21];
  int    arch_tape_fn;
  char   arch_tape_date[32];
  char   safe_tape[21];
  int    safe_tape_fn;
  char   safe_tape_date[32];
  int    pa_status;
  int    pa_substatus;
  char   effective_date[20];
};
typedef struct SUM_info_struct SUM_info_t;

--------------------------------------------------------------------------
int SUM_infoEx(SUM_t *sum, int (*history)(const char *fmt, ...))

Returns the sum_main/sum_partn_alloc table info for the given sunums
(up to 512).
NOTE: A max of 64 is advised, to prevent excessive keylist scanning.
NOTE: If you use this in conjunction with DRMS, there may be a problem
with crashing sum_svc. See Art's or Phil's code for the way to handle
using this with DRMS.

Sample use:
  SUM_info_t *sinfo;
  uint64_t *dsixpt;

  sum->reqcnt = 4;
  sum->sinfo = NULL;            //allow auto malloc
  dsixpt = sum->dsix_ptr;
  *dsixpt++ = 6379855;          //fill in sunums
  *dsixpt++ = 40954592;
  *dsixpt++ = 1433435;
  *dsixpt = 40350694;
  if(SUM_infoEx(sum, printf)) {
    printf("\nFail on SUM_infoEx()\n");
  }
  else {
    sinfo = sum->sinfo;
    while(sinfo) {
      printf("\nsum_info username = %s\n", sinfo->username);
      printf("sum_info online_loc = %s\n", sinfo->online_loc);
      printf("sum_info online_status = %s\n", sinfo->online_status);
      printf("sum_info archive_status = %s\n", sinfo->archive_status);
      printf("sum_info owning_series = %s\n", sinfo->owning_series);
      printf("sum_info creat_date = %s\n", sinfo->creat_date);
      printf("sum_info arch_tape = %s\n", sinfo->arch_tape);
      printf("sum_info arch_tape_fn = %d\n", sinfo->arch_tape_fn);
      printf("sum_info arch_tape_date = %s\n", sinfo->arch_tape_date);
      printf("sum_info pa_status = %d\n", sinfo->pa_status);
      printf("sum_info pa_substatus = %d\n", sinfo->pa_substatus);
      printf("sum_info effective_date = %s\n", sinfo->effective_date);
      sinfo = sinfo->next;
    }
    SUM_infoEx_free(sum);
  }

NOTES: Returns data in the order of the sunums given. Do not give
duplicate sunums, or this order is violated.
If you give a non-existent sunum, the returned online_loc is NULL.

--------------------------------------------------------------------------
int SUM_infoArray(SUM_t *sum, uint64_t *dxarray, int reqcnt,
                  int (*history)(const char *fmt, ...))

Returns the sum_main/sum_partn_alloc table info for the given sunums,
up to MAXSUNUMARRAY (65536).
NOTE: Always allocates the memory needed for the answers at sum->sinfo.
NOTE: For reqcnt > 128, this is much faster than SUM_infoEx().
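The size guidance for the two multi-sunum calls can be turned into a simple planning rule. For SUM_infoEx() the limit is 512 sunums per call, with a max of 64 advised. SUM_infoArray() accepts up to MAXSUNUMARRAY and is much faster for reqcnt > 128. This is a hypothetical helper, not part of the SUM API. `plan_info_calls()`, `info_plan_t` and the threshold macros are names made up for the sketch, and their values are taken from the notes in this document. The helper only decides which call to use and how many calls a request would need.

```c
#include <stddef.h>

/* Values from the notes in this document; names are hypothetical. */
#define INFOEX_ADVISED_MAX   64     /* advised batch size for SUM_infoEx() */
#define INFOARRAY_THRESHOLD  128    /* above this, SUM_infoArray() is faster */
#define INFOARRAY_MAX        65536  /* MAXSUNUMARRAY: per-call limit */

typedef enum { USE_INFOEX, USE_INFOARRAY } info_api_t;

typedef struct {
    info_api_t api;   /* which call to use */
    size_t batch;     /* sunums per call */
    size_t calls;     /* number of calls needed (ceiling division) */
} info_plan_t;

/* Hypothetical planner: small requests go to SUM_infoEx() in batches of
 * the advised size; large ones go to SUM_infoArray(). */
static info_plan_t plan_info_calls(size_t nsunums)
{
    info_plan_t p;

    if (nsunums > INFOARRAY_THRESHOLD) {
        p.api = USE_INFOARRAY;
        p.batch = INFOARRAY_MAX;
    } else {
        p.api = USE_INFOEX;
        p.batch = INFOEX_ADVISED_MAX;
    }
    p.calls = (nsunums + p.batch - 1) / p.batch;
    return p;
}
```

The SUM_infoEx() batch size of 64 (rather than 512) follows the advisory note. The DRMS caveat in the notes still applies whichever call is chosen.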
NOTE: If you use this in conjunction with DRMS, the same note as for
SUM_infoEx() may apply, i.e. see Art's or Phil's code for the way to
handle using this with DRMS.

Sample use:
  SUM_info_t *sinfo;
  uint64_t dxarray[MAXSUNUMARRAY];
  uint64_t sunum;
  int i;

  sum->reqcnt = 1024;
  sunum = 187699530;
  for(i=0; i < sum->reqcnt; i++) {
    dxarray[i] = sunum++;
  }
  if(SUM_infoArray(sum, dxarray, sum->reqcnt, printf)) {
    printf("\nFail on SUM_infoArray()\n");
  }
  else {
    sinfo = sum->sinfo;
    for(i=0; i < sum->reqcnt; i++) {
      printf("\nsum_info sunum = %llu\n", (unsigned long long)sinfo->sunum);
      printf("sum_info username = %s\n", sinfo->username);
      printf("sum_info online_loc = %s\n", sinfo->online_loc);
      printf("sum_info online_status = %s\n", sinfo->online_status);
      printf("sum_info archive_status = %s\n", sinfo->archive_status);
      printf("sum_info owning_series = %s\n", sinfo->owning_series);
      printf("sum_info bytes = %g\n", sinfo->bytes);
      printf("sum_info creat_date = %s\n", sinfo->creat_date);
      printf("sum_info arch_tape = %s\n", sinfo->arch_tape);
      printf("sum_info arch_tape_fn = %d\n", sinfo->arch_tape_fn);
      printf("sum_info arch_tape_date = %s\n", sinfo->arch_tape_date);
      printf("sum_info pa_status = %d\n", sinfo->pa_status);
      printf("sum_info pa_substatus = %d\n", sinfo->pa_substatus);
      printf("sum_info effective_date = %s\n", sinfo->effective_date);
      sinfo = sinfo->next;
    }
    SUM_infoArray_free(sum);   //must free when done, else double free in close
  }

NOTE: Returns non-0 on error. Error 4 is connection reset by peer
(sum_svc probably gone).
NOTES: Always returns data in the order requested. Handles duplicate
sunums. If you give a non-existent sunum, the returned online_loc is NULL.

--------------------------------------------------------------------------
int SUM_delete_series(char *filename, int (*history)(const char *fmt, ...))

/* Called by the delete_series program before it deletes the series table.
 * Called with a pointer to a filename that has the sunums
 * that are associated with the series about to be deleted.
 * Returns 1 on error, else 0.
 */
This will mark all the given storage units as delete pending with a
substatus of DADPDELSU, so that no Records.txt processing is done for a
storage unit when it is deleted, as the DRMS may have reused the record
numbers in the Records.txt file.

--------------------------------------------------------------------------
int SUM_export(SUMEXP_t *sumexp, int (*history)(const char *fmt, ...))

/* Will take a request (typically from remotesums_ingest)
 * and do an scp for the given host, source and target dirs.
 * The ssh-agent must be set up properly for this scp to complete.
 * Returns 0 on success, else 1.
 */
Example of use is in:
/home/production/cvs/JSOC/base/sums/apps/main2.c

--------------------------------------------------------------------------
int SUM_nop(SUM_t *sum, int (*history)(const char *fmt, ...))

/* See if sum_svc is still alive. Returns 0 if ok, 1 on timeout,
 * 4 on error (like unable to connect, i.e. the sum_svc is gone),
 * 5 if tape_svc is gone (new 03Mar2011).
 */

--------------------------------------------------------------------------
EXAMPLE OF USE:

See cvsroot/PROTO/src/SUM/main.c with the corresponding make in
Makelinuxia64.mk and Makelinux4.mk.
The SUM API library is in:
cvsroot/PROTO/src/libSUMAPI.d (OLD) and cvs/jsoc/src/base/sumsapi

--------------------------------------------------------------------------
DISCUSSION:

SUMS runs as a server, sum_svc, which SUM_open() connects to via a
socket. The server can decide how to serialize calls to the DB.

--------------------------------------------------------------------------
20Jan2005

Here's how the SUMS might be used.
When the DRMS first starts, it calls SUM_open().
When the DRMS gets a call to write a record, it needs a wd to write the
record to. If none is assigned, it calls SUM_alloc() to get a storage unit,
and it also gets back a dsindex to associate with this data record.
Subsequent records are written to this wd until the storage unit holds
the number of records given by the series definition, which the DRMS knows.
So when the next record is to be written, another SUM_alloc() must be done.
When the application finally returns to pe, it indicates all the storage
units that were allocated and that need a SUM_put() done on them.
This happens on the return to pe so that the module can choose, at the
end, to abort and release everything.
All the storage units are put one by one with a SUM_put() call, with the
archive mode, retention time and other ancillary info.
This would allow a module to produce separate archivable, temporary and
permanent datasets, unlike mdi, which requires that all output ds have
the same archive properties.
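The allocate-as-you-go, put-or-release-at-the-end pattern described above can be sketched as bookkeeping on the DRMS side. This is a hypothetical model, not DRMS code. `demo_writer_t`, `demo_write_record()`, `demo_finish()` and `DEMO_MAX_UNITS` are invented for the sketch, and a counter stands in for the dsindex that SUM_alloc() would return, so the sketch runs without sum_svc.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_UNITS 16   /* hypothetical cap for this sketch */

/* Hypothetical bookkeeping a DRMS-like writer might keep. */
typedef struct {
    int records_per_unit;             /* from the series definition */
    int records_in_unit;              /* records in the current unit */
    size_t nunits;                    /* storage units allocated so far */
    uint64_t dsindex[DEMO_MAX_UNITS]; /* dsindex of each allocated unit */
    uint64_t next_dsindex;            /* stub: what "SUM_alloc()" hands out */
} demo_writer_t;

/* Write one record, allocating a new storage unit (SUM_alloc() in real
 * code) when there is none yet or the current one is full.
 * Returns the dsindex the record belongs to, or 0 if the unit table is
 * full (the sketch assumes 0 is never a valid dsindex). */
static uint64_t demo_write_record(demo_writer_t *w)
{
    if (w->nunits == 0 || w->records_in_unit == w->records_per_unit) {
        if (w->nunits == DEMO_MAX_UNITS)
            return 0;
        w->dsindex[w->nunits++] = w->next_dsindex++; /* "SUM_alloc()" */
        w->records_in_unit = 0;
    }
    w->records_in_unit++;
    return w->dsindex[w->nunits - 1];
}

/* At the return to pe: on commit every allocated unit gets a SUM_put();
 * on abort none does and all are released. Returns the number of
 * SUM_put() calls that would be made, then clears the bookkeeping. */
static size_t demo_finish(demo_writer_t *w, int commit)
{
    size_t puts = commit ? w->nunits : 0;

    w->nunits = 0;
    w->records_in_unit = 0;
    return puts;
}
```

Deferring the SUM_put() calls to the end is what makes the abort option cheap. Until then the units are only allocated, and SUM_close() releases any uncommitted allocated storage.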