Creating an ad hoc data extract configuration

You can adjust the data extract process to retrieve data from the production backup database for your ad hoc requirements.

About this task

If your query might fetch a large volume of data, and it is not possible to refine the query or pagination does not work, you can use the Data Extract Service in an ad hoc manner to retrieve the data. Although this performs a blind extract of the table data, it does not impact application server or database performance.

You can create an ad hoc data extract configuration, identified by a TaskId, for the tables and columns that your requirement calls for, set a high FrequencyInHours value so that the task is not triggered automatically, and specify how many days of older data you need. Do not create separate agent criteria for the CoC Data Extract agent. Instead, the ad hoc task runs under the same agent criteria that are configured for the regular data extracts. The extracted data is compressed and sent to the same remote location that is configured for the regular data extracts.

It is preferable to trigger ad hoc extracts when the regular extract is not running. The memory calculations assume that only one data extract service runs on the server or JVM, and if a larger heap is not provided, the task takes more time. In addition, if regular extracts are also running, you must ensure that your consuming application does not pick up compressed files or tasks that are meant for the ad hoc extract; one way to do that by file name is sketched below.
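The extract file name begins with the TaskId (the full file name pattern is described in step 6 of the procedure), so a consuming application that selects files by name can skip ad hoc files. The following is a minimal Python sketch under that assumption; the landing directory and the set of ad hoc task IDs are hypothetical examples, not product values.

    # Skip extract files that were produced by ad hoc tasks, keyed on the
    # TaskId that starts each file name:
    # <TASK_ID>__<TABLE_NAME>__<STARTTS>_<ENDTS>_m_of_n.zip
    import os

    ADHOC_TASK_IDS = {"yfs_order_line_adhoc_extract_1"}  # hypothetical ad hoc task IDs
    EXTRACT_DIR = "/data/extracts"                       # hypothetical landing directory

    def is_adhoc_extract(file_name):
        """Return True if the file name starts with a known ad hoc TaskId."""
        task_id = file_name.split("__", 1)[0]
        return task_id in ADHOC_TASK_IDS

    regular_files = [name for name in os.listdir(EXTRACT_DIR)
                     if name.endswith(".zip") and not is_adhoc_extract(name)]
    print(regular_files)  # hand these to your existing consumption logic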

Procedure

  1. See the following sample to create an ad hoc configuration. This sample creates a configuration to extract the specified columns from YFS_ORDER_LINE for the last 90 days. The frequency is set to 2400 hours so that the task is not triggered automatically for the next 100 days.
    <DataExtractConfig Action="ADD" 
    TaskId="yfs_order_line_adhoc_extract_1"
    TableName="YFS_ORDER_LINE"
    Columns="ORDER_LINE_KEY,ORDER_HEADER_KEY,PRIME_LINE_NO,ITEM_ID,UOM,ORDERED_QTY" 
    FrequencyInHours="2400" FirstRunExtractInDays="90"/>
    
  2. Run the following query from the Database Query Client to identify the data extract runs in the recent past. This query returns the past extract jobs, with the most recent one at the top of the list. Check the value of the RUN_AT column for the regular extract task record at the top of the list. Together with the criteria trigger interval, this lets you deduce the approximate time of the next trigger.
    SELECT CFG.TASK_ID, CFG.TABLE_NAME, JOB.RUN_AT, JOB.STATUS FROM YFS_DATA_EXTR_CFG CFG, 
    YFS_DATA_EXTR_JOB JOB WHERE CFG.DATA_EXTR_CFG_KEY=JOB.DATA_EXTR_CFG_KEY ORDER BY 
    JOB.RUN_AT DESC
  3. Alternatively, you can see the next trigger time for the criteria from the Server Dashboard (https://<ip>:<port>/sma/sma/nghome.do). Click the server that is configured for the regular data extract and check the Next Schedule Trigger time. If you are close to the trigger interval, you can wait for the scheduled run to pick up your configuration and extract the data. Otherwise, you can trigger the criteria manually from the same screen.
    Each run of the data extract agent goes over all the extract configurations and triggers them based on the last extract time and the frequency interval. For a first-time extract, it goes by the FirstRunExtractInDays attribute in the configuration; this decision is sketched below. Therefore, for ad hoc extracts, it is advisable to delete the configuration after the extract or to keep the frequency high.
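    The following Python sketch mirrors that scheduling decision as described above. It is illustrative only, not the product code; the function signature and the configuration shape are assumptions for the sketch.

    # Decide whether an extract task is due, and from when to extract.
    from datetime import datetime, timedelta

    def should_trigger(config, last_extract_time, now):
        """Return (is_due, extract_from) for one extract configuration."""
        if last_extract_time is None:
            # First run: extract the trailing FirstRunExtractInDays window.
            start = now - timedelta(days=int(config["FirstRunExtractInDays"]))
            return True, start
        # Later runs: fire only after FrequencyInHours has elapsed.
        due = last_extract_time + timedelta(hours=int(config["FrequencyInHours"]))
        return now >= due, last_extract_time

    config = {"FrequencyInHours": "2400", "FirstRunExtractInDays": "90"}
    print(should_trigger(config, None, datetime.now()))            # first run: due now
    print(should_trigger(config, datetime.now(), datetime.now()))  # just ran: not due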
  4. The input XML to delete the data extract configuration task that is created above is as follows:
    <DataExtractConfig Action="DELETE" TableName="YFS_ORDER_LINE" 
    TaskId="yfs_order_line_adhoc_extract_1"/>
  5. If this configuration is for a single extract, you can create it directly on the production database by using a REST call and delete it after use. However, if you have an ongoing need for the same tables and columns and want only the new or modified records in subsequent runs, you can create this configuration as a regular extract in the Master Config environment and move it to the Production environment by using the CDT. Sample REST calls are listed in Table 1, and a minimal invocation sketch follows the table.
    Table 1. Sample REST URLs for creating and deleting the extract configuration

    Requirement: Create Data Extract Configuration
    Method: POST
    URL: https://<ip>:<port>/smcfs/restapi/executeFlow/CocDataExtractConfig
    Body:
    {
      "Action":"ADD",
      "TaskId":"yfs_order_line_adhoc_extract_1",
      "TableName":"YFS_ORDER_LINE",
      "Columns":"ORDER_LINE_KEY,ORDER_HEADER_KEY,PRIME_LINE_NO,ITEM_ID,UOM,ORDERED_QTY",
      "FrequencyInHours":"2400",
      "FirstRunExtractInDays":"90"
    }
    Headers:
    Content-Type: application/json
    X-XAPI-Tag: DataExtractConfig

    Requirement: Delete Data Extract Configuration
    Method: POST
    URL: https://<ip>:<port>/smcfs/restapi/executeFlow/CocDataExtractConfig
    Body:
    {
      "Action":"DELETE",
      "TableName":"YFS_ORDER_LINE",
      "TaskId":"yfs_order_line_adhoc_extract_1"
    }
    Headers:
    Content-Type: application/json
    X-XAPI-Tag: DataExtractConfig
    
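    The following Python sketch shows the create call from Table 1, using the third-party requests library. Host, port, authentication, and certificate handling are deployment-specific assumptions; replace the <ip> and <port> placeholders and add credentials as your environment requires.

    # Create the ad hoc extract configuration through the REST endpoint.
    import requests  # third-party: pip install requests

    url = "https://<ip>:<port>/smcfs/restapi/executeFlow/CocDataExtractConfig"
    payload = {
        "Action": "ADD",
        "TaskId": "yfs_order_line_adhoc_extract_1",
        "TableName": "YFS_ORDER_LINE",
        "Columns": "ORDER_LINE_KEY,ORDER_HEADER_KEY,PRIME_LINE_NO,ITEM_ID,UOM,ORDERED_QTY",
        "FrequencyInHours": "2400",
        "FirstRunExtractInDays": "90",
    }
    headers = {"Content-Type": "application/json", "X-XAPI-Tag": "DataExtractConfig"}

    response = requests.post(url, json=payload, headers=headers)  # add auth=... if required
    response.raise_for_status()
    print(response.status_code, response.text)

    To delete the configuration afterward, send the DELETE body from Table 1 to the same URL.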
  6. After the criteria are triggered, run the following query from the Database Query Client to identify whether the ad hoc extract was successful and to get information about the extracted file.
    SELECT CFG.TASK_ID, CFG.TABLE_NAME, JOB.STATUS, JOB.RUN_AT, JOB.STARTTS, JOB.ENDTS, 
    JOB.FILE_XFR_SIZE, JOB.AVG_ROW_LENGTH FROM YFS_DATA_EXTR_CFG CFG, YFS_DATA_EXTR_JOB 
    JOB WHERE CFG.DATA_EXTR_CFG_KEY=JOB.DATA_EXTR_CFG_KEY AND CFG.TASK_ID='<provide task_id>' 
    ORDER BY JOB.RUN_AT DESC
    The extract is successful if the STATUS column displays SUCCESS in the result. The extracted data is compressed and sent to the remote FTP location that is configured for regular data extracts. The extract file pattern is <TASK_ID>__<TABLE_NAME>__<STARTTS>_<ENDTS>_m_of_n.zip; the TASK_ID, TABLE_NAME, STARTTS, and ENDTS values from the query result identify the extracted file. A parsing sketch follows the sample file name below. For example, a sample run produced the following extract file:
    yfs_inbox_adhoc_extract_3__YFS_INBOX__20170808054131_20171116053131_1_of_1.zip
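    If you need to locate or validate the extract file programmatically, the following Python sketch splits a file name into its components according to the pattern above. It assumes the two timestamp segments are 14-digit values, as in this sample; adjust the expression if your environment differs.

    # Parse <TASK_ID>__<TABLE_NAME>__<STARTTS>_<ENDTS>_m_of_n.zip into parts.
    import re

    PATTERN = re.compile(
        r"^(?P<task_id>.+?)__(?P<table>.+?)__"
        r"(?P<startts>\d{14})_(?P<endts>\d{14})_(?P<part>\d+)_of_(?P<total>\d+)\.zip$"
    )

    name = "yfs_inbox_adhoc_extract_3__YFS_INBOX__20170808054131_20171116053131_1_of_1.zip"
    match = PATTERN.match(name)
    print(match.groupdict())
    # {'task_id': 'yfs_inbox_adhoc_extract_3', 'table': 'YFS_INBOX',
    #  'startts': '20170808054131', 'endts': '20171116053131',
    #  'part': '1', 'total': '1'}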