Archive and Purge using API
You can use the APIs provided by API Gateway to archive and purge data. This method is easy and helps you automate the archive and purge operations.
Archive transaction and Audit data
You can archive outdated data (for example, data that is older than a year) for forensic analysis and decision making. Archived data is not the same as backup data. Software AG recommends that you use a proper naming convention for every archive that you create. You can specify the period (like from 1 June to 1 July) and the type of events to be archived. For the list of events that you can archive, see List of events that can be archived or purged.
You can use the following API to archive data for a given time interval:
curl -X POST -u "Administrator:manage" -H "content-type:application/json" -H "Accept:application/json" \
"http://localhost:5555/rest/apigateway/apitransactions/archives?<time interval>"
For example, the following command archives all events recorded between 3 June 2021 and 4 June 2021:
curl -X POST -u "Administrator:manage" -H "content-type:application/json" -H "Accept:application/json" \
"http://localhost:5555/rest/apigateway/apitransactions/archives?from=2021-06-03%2000:00:00&until=2021-06-04%2000:00:00&eventType=ALL"
You can schedule archiving using cron jobs or any other scheduling methods. Archiving is a resource-intensive operation. You must not schedule it during peak hours.
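The archive request can be wrapped in a small script so that a scheduler such as cron can run it. The following is a minimal sketch, not an official tool: the function names `archive_url` and `archive_events`, the `APIGW_URL` and `APIGW_CREDS` variables, the script path, and the crontab timing are assumptions for illustration. Only the endpoint and the `from`, `until`, and `eventType` parameters come from the archive request documented in this section.

```shell
# Hypothetical defaults; override them in the environment for a real deployment.
APIGW_URL="${APIGW_URL:-http://localhost:5555}"
APIGW_CREDS="${APIGW_CREDS:-Administrator:manage}"

# Build the archive URL. Timestamps use the "YYYY-MM-DD HH:MM:SS" form;
# spaces are encoded as %20, as in the documented request.
archive_url() {
  from_enc=$(printf '%s' "$1" | sed 's/ /%20/g')
  until_enc=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf '%s/rest/apigateway/apitransactions/archives?from=%s&until=%s&eventType=%s\n' \
    "$APIGW_URL" "$from_enc" "$until_enc" "${3:-ALL}"
}

# Start an archive job. curl --fail turns HTTP errors into a non-zero exit code,
# which a scheduler or wrapper script can act on.
archive_events() {
  curl --fail -sS -X POST -u "$APIGW_CREDS" \
    -H "content-type:application/json" -H "Accept:application/json" \
    "$(archive_url "$1" "$2" "$3")"
}

# Example crontab entry (assumed path and schedule: 02:00 daily, outside peak hours):
# 0 2 * * * /opt/scripts/archive_events.sh
```

A call such as `archive_events "2021-06-03 00:00:00" "2021-06-04 00:00:00" ALL` sends the same request as the documented example.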
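To check the status of an archive job, use the following API with the ID of the job:
curl -X GET -u "Administrator:manage" -H "content-type:application/json" -H "Accept:application/json" \
"http://localhost:5555/rest/apigateway/apitransactions/jobs/9c0eefde-dc26-4cb7-b0eb-dfe8f3a8a545"
The API returns output similar to the following: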
{
"status":"Completed",
"Filename":"\\default-2021-06-14-1623648788446"
}
If the archive job fails, the status field in the above output displays Failed. You must configure alerts so that you are notified about failed archive jobs. Common reasons for failure include the health of the Elasticsearch cluster and the load on the system. You can examine the server logs to analyze the reasons for the failure.
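Checking the job status can be scripted so that a failed job raises an alert. The sketch below assumes the job response is a flat JSON object with a `status` field, as in the sample output. The function names are hypothetical, the `echo` to standard error stands in for whatever alerting mechanism you use, and treating any status other than Completed or Failed as "still running" is an assumption.

```shell
# Hypothetical defaults; override them in the environment for a real deployment.
APIGW_URL="${APIGW_URL:-http://localhost:5555}"
APIGW_CREDS="${APIGW_CREDS:-Administrator:manage}"

# Print the value of the "status" field from a JSON document on stdin.
# This is a simple text match for a flat response, not a general JSON parser.
job_status_from_json() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Fetch the status of a job by ID.
# Returns 0 for Completed, 1 for Failed or no status, 2 for any other status.
check_job() {
  status=$(curl -sS -u "$APIGW_CREDS" -H "Accept:application/json" \
    "$APIGW_URL/rest/apigateway/apitransactions/jobs/$1" | job_status_from_json)
  case "$status" in
    Completed) return 0 ;;
    Failed|"")
      # Placeholder alert: replace with mail, a webhook, or your monitoring agent.
      echo "ALERT: job $1 has status '${status:-unknown}'" >&2
      return 1 ;;
    *)
      echo "job $1 status: $status"
      return 2 ;;
  esac
}
```

Keeping the parsing in its own function makes it possible to test the status extraction without a running API Gateway.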
Purging data
You can schedule and automate the purge process. You can also purge data manually through the API Gateway UI. To learn more about how to purge data manually, refer to the Archive and Purge using UI section. You can purge the following data using commands.
- Analytics data
- Backup snapshots
- Obsolete or empty indexes
- Expired OAuth tokens
- Archive data
Purge Analytics Data
You can purge analytics data based on timeline or size. As an example of timeline-based purging, you can purge data older than a year. As an example of size-based purging, you can purge data greater than 100 GB.
Timeline-based purging
You can use the following API to purge the analytics data of the specified event type and period:
curl -X DELETE -u "Administrator:manage" -H "Accept:application/json" \
"http://localhost:5555/rest/apigateway/apitransactions?action=purge&eventType=eventtype&olderThan=timeline"
For the list of events that you can specify with the API, see List of events that can be archived or purged.
The olderThan parameter specifies the timeline and can have one of the following values:
Timeline Name | Syntax | Example | Usage |
---|---|---|---|
Year | <number>Y | 1Y | Purges all data older than 1 year |
Month | <number>M | 1M | Purges all data older than 1 month |
Days | <number>d | 1d | Purges all data older than 1 day |
Time | <number>h<number>m<number>s | 14h30m2s | Purges all data older than the given time |
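Because a malformed olderThan value is easy to pass by mistake, a script can validate it before sending the purge request. The sketch below assumes that the Syntax column of the table is exhaustive and that the time form always carries all three components. The function names and the `APIGW_URL` variable are hypothetical.

```shell
# Succeed if the argument matches one of the olderThan forms in the table:
# <number>Y, <number>M, <number>d, or <number>h<number>m<number>s.
is_valid_older_than() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+[YMd]|[0-9]+h[0-9]+m[0-9]+s)$'
}

# Print the purge URL for an event type ($1) and an olderThan value ($2).
# Refuses to build a URL for a value that does not match the table.
purge_url() {
  is_valid_older_than "$2" || { echo "invalid olderThan value: $2" >&2; return 1; }
  printf '%s/rest/apigateway/apitransactions?action=purge&eventType=%s&olderThan=%s\n' \
    "${APIGW_URL:-http://localhost:5555}" "$1" "$2"
}
```

The resulting URL can be passed to the documented `curl -X DELETE` command in place of the hand-written one.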
You can check the status of the purge job using the following API:
http://localhost:5555/rest/apigateway/apitransactions/jobs/<job_id>
If the purge is successful, you get the following output.
{
"status": "Completed",
"Filename": "File_name"
}
Purge based on size
You can purge data based on size. When the size of an index exceeds the specified limit (25 GB in this example), you must roll over the index. Rolling over creates a new index, after which you can purge the old indexes. For example, if you set the maximum size for analytics data to 300 GB and the maximum size of an index to 25 GB, and your data grows to 325 GB, you have 13 indexes of 25 GB each.
Each index contains a primary shard and a replica shard. When the primary shard of an index reaches 12.5 GB, the replica shard is also 12.5 GB, so the total size of the index is 25 GB. Hence, you must check the size of the primary shard of an index to decide whether the index needs to be rolled over.
You must regularly monitor the indexes that need to be purged. For information on calculating index size, see Calculating index size.
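The primary shard check described above can be sketched as follows. The `_cat/indices` columns `index` and `pri.store.size` and the `bytes=b` option are standard Elasticsearch options. The function names and the `ES_URL` variable are hypothetical, and the rule "primary size times two" assumes exactly one replica of equal size, as in the 25 GB example.

```shell
# Hypothetical default; override it in the environment for a real deployment.
ES_URL="${ES_URL:-http://localhost:9240}"

# List each analytics index with the size of its primary shards in bytes.
list_primary_sizes() {
  curl -sS "$ES_URL/_cat/indices/gateway_default_analytics_transactionalevents*?h=index,pri.store.size&bytes=b"
}

# Succeed if the primary size ($1, bytes) has reached half of the total
# index limit ($2, bytes): one primary plus one replica of equal size.
needs_rollover() {
  [ $(( $1 * 2 )) -ge "$2" ]
}
```

For example, `needs_rollover 13421772800 26843545600` succeeds, because a 12.5 GB primary shard corresponds to a 25 GB index.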
If you regularly roll over indexes, it becomes easier to find the oldest indexes and purge them. Purging older and obsolete indexes ensures the quick recovery of disk space.
- Find the oldest index using the following API:
curl -X GET "http://localhost:9240/_cat/indices/gateway_default_analytics_transactionalevents*?h=i&s=i:desc"
Note: The above API returns the list of indexes in descending order of index name. API Gateway follows the pattern gateway_default_transactionalevents_epoch_00000n, where epoch is the date and time in epoch format and n is a number that starts at 1 and increments with each rollover. When no target index suffix parameter is provided during rollover, API Gateway returns the pattern aliasname_yyyyMMddhhmm-00000n. If a target index suffix parameter is provided during rollover, API Gateway returns aliasname_<targetIndexSuffix>.
- Delete the index returned in the previous step using the following API:
curl -X DELETE http://localhost:9240/indexname
- To ensure that an index is deleted, run the following command with the required index name:
curl -v -X GET http://localhost:9240/indexname
If the deletion is successful, the above API returns the status code 404.
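The three steps above can be sketched as a script. It assumes that index names sort chronologically, so that the last entry of the descending list is the oldest. The function names and the `ES_URL` variable are hypothetical.

```shell
# Hypothetical default; override it in the environment for a real deployment.
ES_URL="${ES_URL:-http://localhost:9240}"

# Read a list of index names sorted in descending order on stdin and
# print the last non-empty entry, which is taken to be the oldest index.
oldest_index() {
  awk 'NF { last = $1 } END { if (last != "") print last }'
}

# Delete an index, then confirm that a follow-up GET returns status code 404.
delete_and_verify() {
  curl -sS -X DELETE "$ES_URL/$1" >/dev/null
  code=$(curl -sS -o /dev/null -w '%{http_code}' "$ES_URL/$1")
  [ "$code" = "404" ]
}
```

The output of the listing request can be piped into `oldest_index`, and the resulting name passed to `delete_and_verify` after you have reviewed it.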
Purge backup snapshots
Data backups are created to safeguard the data in a repository so that it can be restored after a disaster. The backup snapshots created over a period of time occupy considerable disk space. Hence, it is essential to purge backup snapshots that are older than the data retention period.
For information on purging backup snapshots, see Deleting a Backup File.
Purge obsolete or empty indexes
API Gateway may have empty indexes due to rollover and purge operations. It is essential to clean up the empty indexes. You can delete an index if there are multiple indexes and the index to be deleted is not a write index. Software AG recommends that you perform the purge operation after the scheduled backups.
You can use the following API to check the documents stored by the indexes:
curl -X GET "http://localhost:9240/_cat/indices/gateway_default_analytics_transactionalevents*?s=docs.count:asc&h=i,docs.count"
The API returns the following response:
gateway_default_analytics_transactionalevents_202106111722 0
gateway_default_analytics_transactionalevents_1622725174422-000001 2
If the value returned for an index is more than 0, the index is not empty and must not be deleted. You can use the following API to check whether an index is a write index:
curl -X GET "http://localhost:9240/_cat/aliases/gateway_default_analytics_transactionalevents?h=i,is_write_index&s=is_write_index:asc"
The above API returns the following response:
gateway_default_analytics_transactionalevents_1622725174422-000001 false
gateway_default_analytics_transactionalevents_202106111722 true
If an index has the value true, it is a write index and must not be deleted.
To delete an index that is empty and is not a write index, use the following API:
curl -X DELETE http://localhost:9240/indexname
To ensure that the index is deleted, run the following command with the required index name:
curl -v -X GET http://localhost:9240/indexname
If the deletion is successful, the above API returns the status code 404. You must configure alerts for failed purge jobs. When a purge job fails, you must check the Elasticsearch logs to troubleshoot.
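The two checks above (document count and write index) can be combined before any index is deleted. The sketch below assumes that the output of each query has been saved to a file with one index per line, in the two-column form shown in the sample responses. The function name and file arguments are hypothetical. It is deliberately conservative: an index is listed only if its document count is 0 and the alias query explicitly reports false for it.

```shell
# $1: file with "<index> <docs.count>" lines (output of the document count query)
# $2: file with "<index> <is_write_index>" lines (output of the alias query)
# Prints the indexes that hold 0 documents and are reported as not being a write index.
deletable_indexes() {
  awk 'FILENAME == ARGV[1] { if ($2 == "false") nonwrite[$1] = 1; next }
       $2 == 0 && ($1 in nonwrite) { print $1 }' "$2" "$1"
}
```

With the sample responses shown above, the function prints nothing: one index is empty but is the write index, and the other still holds two documents.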
Purge expired OAuth tokens
You can remove expired OAuth tokens by invoking the following service:
https://<>/invoke/pub.oauth:removeExpiredAccessTokens
You can schedule the purge operation of indexes using a cron job or some other scheduling method. You can schedule OAuth token purging on a daily basis. You must configure alerts for failed purge jobs. When a purge job fails, you must check the server logs.
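A scheduled purge can be wrapped so that a failure produces an alert. The sketch below is an assumption-laden illustration: `run_with_alert`, `purge_expired_tokens`, `APIGW_BASE_URL`, `APIGW_CREDS`, the script path, and the crontab timing are all hypothetical, and the alert is only a message on standard error that you would replace with your own notification mechanism. The host part of the service URL is not given above, so the script requires you to set it.

```shell
# Run a command; on failure, print an alert line to stderr and return its exit code.
run_with_alert() {
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "ALERT: '$*' failed with exit code $rc" >&2
  fi
  return "$rc"
}

# Invoke the token purge service. APIGW_BASE_URL has no default and must be
# set to the scheme, host, and port of your installation.
purge_expired_tokens() {
  [ -n "$APIGW_BASE_URL" ] || { echo "APIGW_BASE_URL is not set" >&2; return 2; }
  curl --fail -sS -u "${APIGW_CREDS:-Administrator:manage}" \
    "$APIGW_BASE_URL/invoke/pub.oauth:removeExpiredAccessTokens"
}

# Example crontab entry (assumed path and schedule: once a day at 01:30):
# 30 1 * * * /opt/scripts/purge_tokens.sh
```

In the scheduled script, the call would be `run_with_alert purge_expired_tokens`.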
Purge Archive Data
You must delete the archive data after it reaches the maximum retention period. There is no API to clear the archive data. You must delete archives manually. You can delete archives on a daily basis.
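Because archives must be deleted manually, a retention cleanup can be scripted at the file-system level. The sketch below assumes that archives are ordinary files in a directory you know; the archive location is not stated in this section, so the directory argument is a placeholder, and the function name is hypothetical. It uses the file modification time as the archive age.

```shell
# Delete regular files under directory $1 whose modification time is more
# than $2 days in the past. Each path is printed before it is removed.
purge_old_archives() {
  [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
  find "$1" -type f -mtime +"$2" -print -exec rm -f {} \;
}
```

For example, `purge_old_archives /path/to/archives 365` removes archive files older than one year, where `/path/to/archives` stands for your archive directory.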
If you want to free up the disk space immediately after initiating the purge process, use the following REST API:
http://Elasticsearch_host:Elasticsearch_port/target_indexes/_forcemerge?only_expunge_deletes=true
where,
- target_indexes. Comma-separated list of data streams, indexes, and aliases used to limit the request. This parameter supports wildcards (*). To target all data streams and indexes, exclude this parameter or use * or _all.
- only_expunge_deletes. Boolean parameter. When set to true, it expunges only those segments containing document deletions.
For example:
http://localhost:9240/gateway_default_analytics_transactionalevents/_forcemerge?only_expunge_deletes=true
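The force merge request can be sent from a script as follows. Elasticsearch expects this request as a POST. The function names are hypothetical, and the host, port, and target values are supplied by the caller.

```shell
# Build the force merge URL for a host ($1), port ($2), and a
# comma-separated list of target indexes ($3).
forcemerge_url() {
  printf 'http://%s:%s/%s/_forcemerge?only_expunge_deletes=true\n' "$1" "$2" "$3"
}

# Send the force merge request; curl --fail makes HTTP errors return non-zero.
forcemerge() {
  curl --fail -sS -X POST "$(forcemerge_url "$1" "$2" "$3")"
}
```

For example, `forcemerge localhost 9240 gateway_default_analytics_transactionalevents` issues the request for the analytics transactional events alias on a local Elasticsearch instance.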
To learn more about this API, see Elasticsearch documentation on Force Merge API.
List of events that can be archived or purged
- Transaction events (gateway_default_transactionalevents)
- Lifecycle events (gateway_default_lifecycleevents)
- Performance metrics (gateway_default_performancemetrics)
- Monitor events (gateway_default_monitorevents)
- Threat Protection events (gateway_default_threatprotectionevents)
- Policy violation events (gateway_default_policyviolationevents)
- Error events (gateway_default_errorevents)
- Audit events (gateway_default_auditlogs)
- Application logs (gateway_default_log)
- Mediator trace span (gateway_default_mediatortracespan and gateway_default_serverlogtracespans)