Data Housekeeping
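This section provides guidelines and procedures for housekeeping your data. It covers the requirements to gather, the housekeeping approaches for API transaction data, capacity sizing, archive and purge considerations, and the archive and purge methods.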

Requirements
- API transactions considerations:
  - What data retention period must be set (for example, 90 days or 120 days)?
  - Do you need a copy of the older data for long-term retention (for example, for legal compliance)?
  - Would you ever need to restore data for a particular period (for example, for forensic analysis)?
  - Which storage locations would you use if you have to export archived data?
- Server log considerations. What data retention period must be set (for example, 90 days or 120 days)?
- Audit log considerations. What data retention period must be set (for example, 90 days or 120 days)?
Housekeeping approaches for API transaction data
- Archive. Archive is the process of moving data that is no longer actively used to a separate storage location for long-term retention. You may require archived data for future reference, forensic analysis, or regulatory compliance.
- Purge. Purge is the process of freeing up space in Elasticsearch by deleting obsolete data that the system no longer requires, that is, data older than the defined retention period.
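The relationship between the retention period and the data that becomes eligible for purge can be illustrated with a minimal Python sketch. The function names `retention_cutoff` and `split_by_retention` are hypothetical and are not part of API Gateway; the sketch only shows how a retention period in days translates into a cutoff timestamp.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, retention_days):
    """Return the timestamp before which data falls outside the retention period."""
    return now - timedelta(days=retention_days)


def split_by_retention(timestamps, now, retention_days):
    """Split event timestamps into (retained, obsolete) lists.

    Obsolete entries are older than the retention period and are the
    candidates for archive (if long-term retention is needed) and purge.
    """
    cutoff = retention_cutoff(now, retention_days)
    retained = [t for t in timestamps if t >= cutoff]
    obsolete = [t for t in timestamps if t < cutoff]
    return retained, obsolete


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [now - timedelta(days=d) for d in (10, 95, 200)]
    retained, obsolete = split_by_retention(sample, now, retention_days=90)
    print(len(retained), "retained,", len(obsolete), "eligible for purge")
```

With a 90-day retention period, only events from the last 90 days stay in the active store; everything older is a candidate for archive, purge, or both.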
Capacity sizing
If you decide to set up archiving, you must first analyze the capacity sizing requirements.
The memory size and the required storage depend on how much data is stored for every archive interval and on the data retention period. Consider the following questions:
- What is the archive interval?
The archive interval is how frequently you run the archive process. This factor impacts memory sizing.
- Should the archives include API payload?
Including API payload details such as headers, parameters, requests, and responses impacts memory and disk sizing.
- What is the archive retention period?
This factor impacts disk sizing.
You must also consider any other factors that arise from your data archival requirements.
Purge does not require any additional capacity sizing.
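As a rough illustration of how these factors combine, the following Python sketch estimates the data volume per archive interval and the total archive disk footprint. It assumes a simple linear model (daily volume multiplied by average event size) and ignores compression, index overhead, and growth. The function name and all input values are hypothetical and must be replaced with measurements from your own environment.

```python
def estimate_archive_sizing(events_per_day, avg_event_bytes, avg_payload_bytes,
                            include_payload, archive_interval_days,
                            archive_retention_days):
    """Return (bytes_per_archive_interval, total_archive_disk_bytes).

    Assumption: size grows linearly with the number of events. This is a
    rough planning aid, not a sizing formula defined by the product.
    """
    per_event = avg_event_bytes + (avg_payload_bytes if include_payload else 0)
    # Data handled in one archive run: drives memory sizing.
    per_interval = events_per_day * archive_interval_days * per_event
    # Data kept on the archive storage for the whole retention period: drives disk sizing.
    total_disk = events_per_day * archive_retention_days * per_event
    return per_interval, total_disk


if __name__ == "__main__":
    per_interval, total_disk = estimate_archive_sizing(
        events_per_day=1_000_000,      # hypothetical traffic
        avg_event_bytes=1_000,         # hypothetical metadata size per event
        avg_payload_bytes=4_000,       # hypothetical payload size per event
        include_payload=True,
        archive_interval_days=1,
        archive_retention_days=730,    # for example, 2 years
    )
    print(f"Per archive run: {per_interval / 1e9:.1f} GB")
    print(f"Total archive disk: {total_disk / 1e12:.2f} TB")
```

The sketch makes the dependencies visible: a longer archive interval or the inclusion of payloads increases the data handled per run, while the archive retention period only affects the disk estimate.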
Archive considerations
- Use a dedicated storage area for archives that is located outside Elasticsearch.
- Schedule the archive process to run during non-peak periods, because it is generally resource intensive and may affect performance.
- Delete older archives after the defined retention period (for example, after 2 years), either manually or using scripts.
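The last item, deleting older archives with a script, could look like the following minimal Python sketch. It assumes that archives are stored as plain files in a single directory and that the file modification time reflects the archive date; both are assumptions about your environment. The function names and the directory path are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired_archives(archive_dir, retention_days, now=None):
    """Return archive files whose modification time is older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(archive_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_expired_archives(archive_dir, retention_days, dry_run=True, now=None):
    """Delete expired archives and return the list of affected files.

    With dry_run=True (the default) nothing is deleted, so the list can be
    reviewed first.
    """
    expired = find_expired_archives(archive_dir, retention_days, now)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired


if __name__ == "__main__":
    # Hypothetical archive location; 730 days corresponds to roughly 2 years.
    archive_dir = Path("/path/to/archives")
    if archive_dir.is_dir():
        for path in delete_expired_archives(archive_dir, 730, dry_run=True):
            print("Would delete:", path)
```

Running the script in dry-run mode first lets you verify the selection before any archive is removed.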
Purge considerations
- If you have long-term data retention needs, archive the older data before you initiate the purge to prevent data loss.
- Schedule the purge process to run during non-peak periods.
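Both considerations can be expressed as simple guards in a housekeeping script. The following Python sketch is illustrative only: `in_non_peak_window` and `archive_then_purge` are hypothetical helpers, the 01:00 to 05:00 window is an assumed example, and the `archive` and `purge` arguments stand in for whatever mechanism you use to trigger the operations.

```python
from datetime import datetime, time as dtime


def in_non_peak_window(current, start=dtime(1, 0), end=dtime(5, 0)):
    """Return True if `current` (a datetime.time) lies in the non-peak window.

    The default window (01:00 to 05:00) is an assumed example. A window that
    wraps past midnight, such as 22:00 to 04:00, is also supported.
    """
    if start <= end:
        return start <= current < end
    return current >= start or current < end


def archive_then_purge(archive, purge, long_term_retention=True):
    """Run the purge only after a successful archive when long-term retention is needed.

    `archive` is a callable that returns True on success; `purge` is a
    callable that performs the purge. Both are supplied by the caller.
    """
    if long_term_retention and not archive():
        return "archive failed; purge skipped"
    purge()
    return "purged"


if __name__ == "__main__":
    if in_non_peak_window(datetime.now().time()):
        result = archive_then_purge(
            archive=lambda: True,   # placeholder for the real archive step
            purge=lambda: None,     # placeholder for the real purge step
        )
        print(result)
    else:
        print("Outside the non-peak window; housekeeping skipped")
```

Skipping the purge when the archive step fails keeps the older data in Elasticsearch until a copy exists elsewhere.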
Archive and purge methods
You can automate the archive and purge operations using REST APIs. Alternatively, you can archive and purge manually from the webMethods API Gateway UI. Automating the archive and purge process through the APIs is the recommended approach.
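An automation script typically prepares an authenticated HTTP request for each operation. The following Python sketch shows only that preparation step, using the standard library. The URLs are placeholders, the HTTP methods are examples, and the use of HTTP Basic authentication is an assumption; take the actual endpoints, methods, parameters, and authentication scheme from the API Gateway REST API documentation. The helper name `build_request` is hypothetical.

```python
import base64
import urllib.request


def build_request(url, method, username, password):
    """Build an HTTP request with a Basic authentication header.

    The URL and method are supplied by the caller; this sketch does not
    know the real archive or purge endpoints.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method=method)
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    # Placeholder URL: replace with the documented archive endpoint.
    archive_request = build_request(
        "https://gateway.example.com/placeholder-archive-endpoint",
        "POST", "user", "pass",
    )
    print(archive_request.get_method(), archive_request.full_url)
    # To send the request: urllib.request.urlopen(archive_request)
```

A scheduler such as cron can then run the script during non-peak periods, with the archive request sent before the purge request.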