Data extract agent

A time-triggered agent is configured for the IBM® Sterling Order Management System. This agent runs at a configured interval to extract data. You can configure this interval, or frequency, to ensure that data is extracted when you need it.
You can also configure the agent to run with multiple threads and JVMs.
  • The agent can be configured to be triggered every n minutes. By default, the agent is triggered every 10 minutes.
  • The agent can also be configured to run with multiple parallel threads and multiple JVMs. By default, the agent runs with 20 threads.

You must run the data extract by using only a single data extract agent criteria on a dedicated agent server at a given time. Do not run the data extract with multiple agent criteria at the same time.

For more information about configuring the data extract agent to work within your environments, see Configuring the data extract.

Data extract process

When the data extract agent is triggered, the following steps occur:
  1. The data extract agent causes the extract tool to run and read the extract table configurations from the production database. The status of each valid table configuration is checked to determine which tables need data to be extracted. Before the data extract process begins, if the backup database is down or incorrectly configured, the extract tool displays a warning message.
    Notes:
    • When you run the extract tool for the first time, if the warning message is displayed, verify your backup database configuration and run the extract tool again. Otherwise, wait for some time for the data extract tool to establish a connection with the backup database. If you still encounter issues, contact the IBM Support team.
    • Data extraction is delayed when the backup database is in the maintenance or replay mode.
    When the agent runs the first time, the query includes all records that were modified within the last n days, where n is the value of the FirstRunExtractInDays attribute. On each subsequent run, the extract agent uses the end time of the previous run as the start time for the query. The end time is calculated as the current time minus 1 hour. This 1-hour lag ensures that all data that is being extracted is already copied to the backup database and that the extract tool does not put load on the most recent data segment. A sketch of this window calculation follows this procedure.
    Note: By default, the agent uses the 1-hour interval unless you override it with a value greater than 1 in the yfs.yfs.coc.dataextract.fetch.maxhoursold property. You can also reduce the interval to less than 1 hour by using the yfs.yfs.coc.dataextract.fetch.maxminsold property.
  2. The extract tool then forms queries based on the table configuration.
  3. The extract tool connects to the backup production database and runs the SQL query against the database to retrieve the data that matches the query. The data extract tool extracts data from only the backup transaction database, not the primary production database. Because data is extracted from the backup database, the extract process does not affect the performance of the primary database.

    The extract tool extracts the data that falls between the start and end times for the extract process. These times are based on the last modification time stamp of the last record that was extracted the previous time the extract process ran.

    If the extract involves a large amount of data, the data might be extracted in separate jobs with different start and end time ranges. For each table configuration, the extract tool checks the corresponding amount of data to be extracted. The agent then creates the necessary number of jobs to extract the data.

  4. The extract tool writes the query results to CSV-formatted output files and includes each file in a compressed (.zip) file. Each output file has the same name as its corresponding compressed file for transferring the data. The file names use the following naming convention, as shown in the sketch after this procedure:
    • taskID__tableName__extractQueryStart_extractQueryEnd_jobNumber_of_totalNumberOfJobs.csv
    Where
    taskID
    The ID of the task.
    tableName
    The database table name.
    extractQueryStart
    The time stamp when the extract process started.
    extractQueryEnd
    The time stamp when the extract process completed.
    jobNumber_of_totalNumberOfJobs
    The serial number for the data extract job in relation to the total number of data extract jobs.
  5. The extract tool sends the compressed files in a stream to your file transfer server. The files are not written to the file system within an IBM Sterling Order Management System environment. When the last job is processed, the agent checks whether the file transfer is completed for all files of all jobs for every table configuration that is run. If the transfers are completed, the agent creates and transfers a zero-byte .DONE file for each table configuration.
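
The window calculation in step 1 and the naming conventions in steps 4 and 5 can be summarized in code. The following Java sketch is illustrative only; the class and method names are assumptions, not the actual Sterling Order Management System implementation.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative sketch only; names are assumptions, not product code. */
public class ExtractWindowSketch {

    // Produces time stamps such as 20160623050000, as in the sample file names.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    /** The query end time is the current time minus the lag (1 hour by default). */
    static Instant queryEnd(Instant now, Duration lag) {
        return now.minus(lag);
    }

    /** Builds a CSV file name that follows the convention in step 4. */
    static String csvFileName(String taskId, String tableName, Instant start,
                              Instant end, int jobNumber, int totalJobs) {
        return taskId + "__" + tableName + "__" + STAMP.format(start) + "_"
                + STAMP.format(end) + "_" + jobNumber + "_of_" + totalJobs + ".csv";
    }

    /** Builds the zero-byte marker file name from step 5. */
    static String doneFileName(String taskId, String tableName, Instant start, Instant end) {
        return taskId + "__" + tableName + "__" + STAMP.format(start) + "_"
                + STAMP.format(end) + ".DONE";
    }

    public static void main(String[] args) {
        Instant end = queryEnd(Instant.now(), Duration.ofHours(1));
        Instant start = end.minus(Duration.ofHours(24)); // previous run's end time
        System.out.println(csvFileName("OrderExt", "YFS_ORDER_HEADER", start, end, 1, 4));
        System.out.println(doneFileName("OrderExt", "YFS_ORDER_HEADER", start, end));
    }
}
```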

Output file format

The output of each data extract job is formatted as a CSV file with the following characteristics:
  • Comma-separated values for each extracted column, in the same order as defined in the configuration management service
  • No header row of column names
  • Each value is enclosed within double quotation marks. Any double quotation marks within a value are escaped by another double quotation mark.
  • Any new line characters within a value are removed.
The generated CSV files are then included within a compressed (.zip) file for transferring to the configured file transfer server.
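
A minimal sketch of these field rules, assuming a simple per-value formatter (the class and method names are illustrative, not the actual extract tool implementation):

```java
/** Illustrative sketch only; the class and method names are assumptions. */
public class CsvFieldSketch {

    /** Wraps a value in double quotation marks, doubling any embedded
     *  quotation marks and removing any new line characters. */
    static String toCsvField(String value) {
        String cleaned = value.replace("\r", "").replace("\n", ""); // strip new lines
        return "\"" + cleaned.replace("\"", "\"\"") + "\"";         // escape embedded quotes
    }

    public static void main(String[] args) {
        // Prints: "He said ""hello""","line1line2"
        System.out.println(toCsvField("He said \"hello\"") + "," + toCsvField("line1\nline2"));
    }
}
```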

Sample run scenario

An extract is configured to run with the following conditions and settings (assuming a current date of 6/24/2016):
  • The frequency for the task OrderExt and the database table YFS_ORDER_HEADER is 24 hours. The data extract occurs daily at approximately 8:00 AM.
  • 24,000 records exist between 7:00 AM of the previous day and 7:00 AM of the current day.
  • An internal memory calculation limits the extract process to extracting 7000 records at a time.
When the data extract agent is triggered on or just after 8:00 AM of the current day, the data extract tool creates four jobs to extract the records. Each job is processed independently. For each job, the required data is fetched and included within a generated output file. The files are then transferred to the configured file transfer server.
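
The job count follows from ceiling division: 24,000 records at 7000 records per job yields 4 jobs that contain 7000, 7000, 7000, and 3000 records. The following Java sketch shows this split, assuming a simple even division by the maximum job size (the names are illustrative, not the product implementation):

```java
import java.util.Arrays;

/** Illustrative sketch only; names are assumptions, not product code. */
public class JobSplitSketch {

    /** Splits a record count into jobs of at most maxRecordsPerJob records each. */
    static int[] jobSizes(int totalRecords, int maxRecordsPerJob) {
        int jobs = (totalRecords + maxRecordsPerJob - 1) / maxRecordsPerJob; // ceiling division
        int[] sizes = new int[jobs];
        for (int i = 0; i < jobs; i++) {
            int remaining = totalRecords - i * maxRecordsPerJob;
            sizes[i] = Math.min(maxRecordsPerJob, remaining);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 24000 records at 7000 records per job -> [7000, 7000, 7000, 3000]
        System.out.println(Arrays.toString(jobSizes(24000, 7000)));
    }
}
```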

The generated output files resemble the following files. Each file includes the timestamp of the extraction process within the file name.

The following table displays sample output from the data extract tool.
# File name Rows
1 OrderExt__YFS_ORDER_HEADER_20160623050000_20160623070000_1_of_4.zip 7000
2 OrderExt__YFS_ORDER_HEADER_20160623050000_20160623070000_2_of_4.zip 7000
3 OrderExt__YFS_ORDER_HEADER_20160623050000_20160623070000_3_of_4.zip 7000
4 OrderExt__YFS_ORDER_HEADER_20160623050000_20160623070000_4_of_4.zip 3000
5 OrderExt__YFS_ORDER_HEADER_20160623050000_20160623070000.DONE  

If one or more files for a particular table fail to be extracted or transferred to the configured file storage service while other files are successfully extracted or transferred, the entire table extract is processed again. Files that were already successfully extracted and transferred are extracted and transferred again.

As an example, in this sample extract scenario, if jobs 1 and 4 fail and jobs 2, 3, and 5 succeed, the entire process runs again. When the process runs again, the files for all five jobs are extracted and transferred, with the files for jobs 2, 3, and 5 transferred again.
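
A minimal sketch of this all-or-nothing retry rule, assuming a hypothetical Job record with a transfer status (Java 16 or later; not the actual product data model):

```java
import java.util.List;

/** Illustrative sketch only; the Job record is an assumption, not the product data model. */
public class RetrySketch {

    /** Hypothetical representation of one data extract job for a table. */
    record Job(int number, boolean transferred) {}

    /** If any job for a table failed, the whole table extract is rerun. */
    static boolean mustRerunTableExtract(List<Job> jobs) {
        return jobs.stream().anyMatch(job -> !job.transferred());
    }

    public static void main(String[] args) {
        // Jobs 1 and 4 failed, so all five jobs are extracted and transferred again.
        List<Job> jobs = List.of(new Job(1, false), new Job(2, true),
                new Job(3, true), new Job(4, false), new Job(5, true));
        System.out.println(mustRerunTableExtract(jobs)); // prints: true
    }
}
```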

Cleaning up data extract jobs

If you use data extract extensively, the data extract jobs table fills up fast. Data in this table is useful for triaging failed extracts that happened in the past, so deleting data from the immediate past is not recommended. However, the bulk of the older data must be cleaned up from the main table to improve the performance of queries on this table.

To clean up the table before you start the data extract jobs, you can specify the age, in days, beyond which records are deleted from the table by setting the yfs.yfs.coc.dataextract.jobpurge.leadDays property.

This property is not set by default; that is, the job records are not cleaned up by default. The value for this property must be greater than 120 days.

If this property is set to a value that is less than or equal to 120, an error is thrown; records from the most recent 120 days cannot be deleted. If the cleanup process fails for any reason, the data extract process still continues with its primary task of extracting data.
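
A minimal sketch of this validation rule, assuming a hypothetical helper that reads the property value (not the actual product implementation):

```java
/** Illustrative sketch only; the class and method names are assumptions. */
public class JobPurgeSketch {

    /** Values of 120 or less are rejected; records from the last 120 days are kept. */
    static void validateLeadDays(Integer leadDays) {
        if (leadDays == null) {
            return; // property not set: job records are not cleaned up
        }
        if (leadDays <= 120) {
            throw new IllegalArgumentException(
                    "yfs.yfs.coc.dataextract.jobpurge.leadDays must be greater than 120");
        }
    }

    public static void main(String[] args) {
        validateLeadDays(365); // valid: records older than 365 days can be purged
        try {
            validateLeadDays(90); // invalid: the last 120 days cannot be deleted
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```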