Planning archiving, storage capacity, and scheduling

Learn about archiving and purging, storage capacity planning, and scheduling, so that you can configure archiving and purging on your system.

Archiving and Purging

Archiving and purging data on a regular basis is essential for the health of your Guardium® system. Archiving preserves information for future use. Purging frees up space and speeds up access operations on the internal database. For the best performance, archive and purge all data that is not needed on the system. For example, if you need only three months of data on the Guardium appliance, archive and purge all data that is older than 90 days.

The Guardium archive function creates signed, encrypted files that cannot be tampered with. Archive files are transferred and stored on external systems such as file servers or storage systems. For more information about the encryption used for archives, see the File backup cipher section of Cipher suites.

If both archive and purge are scheduled, purge runs after archive.

Data that was archived on a collector can be restored either on another collector or an aggregator server. Data that was archived on an aggregator cannot be restored on a collector.

The export, archive, and purge functions can work on the same data, but not the same date ranges. For example, you may want to export and archive all information older than one day and purge all information older than one month, thereby always leaving one month of data on the sending unit.
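The following Python sketch classifies one day of data by its age under the thresholds in this example: export and archive after one day, purge after one month (taken here as 30 days). The function `data_state` and its parameters are hypothetical, not Guardium settings. It also assumes that "older than N days" means at least N whole days old, and that archiving ran every day, so any purged day was already archived.

```python
from datetime import date


def data_state(record_day: date, today: date,
               archive_after_days: int = 1,
               purge_after_days: int = 30) -> str:
    """Classify one day of data by age (hypothetical helper).

    Assumes "older than N days" means at least N whole days old, and that
    the archive ran every day, so any purged day was archived earlier.
    """
    age = (today - record_day).days
    if age >= purge_after_days:
        return "archived and purged"
    if age >= archive_after_days:
        return "archived, still online"
    return "online only"


today = date(2024, 7, 31)
print(data_state(date(2024, 7, 31), today))  # online only
print(data_state(date(2024, 7, 30), today))  # archived, still online
print(data_state(date(2024, 7, 1), today))   # archived and purged
```

Data between one and 29 days old is both safely archived and still available online, which is the overlap the example describes.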

Data retention

Data retention policies vary widely depending on your system and your needs. Factors to consider include:
  • Required retention time
  • Amount of data stored per day
  • Disk size
  • Disk size

You could decide to keep seven days of data on the collectors, and to maintain the data on the aggregators for a much longer period.

For disaster recovery, keep a rolling two weeks' worth of daily archives from the managed collectors.

Note: If you have stand-alone collectors, maintain daily archives according to your data retention policy.

For historical investigation or auditing purposes, maintain daily archives from the collectors for the period that is required by your auditing or corporate data-retention policies.
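A rolling window means that each new daily archive pushes the oldest one out of the retained set. The following Python sketch shows only the selection logic for a 14-day window; the function `archives_to_delete`, the 14-day default, and the assumption that each archive file is identified by the date of the data it contains are illustrative, and no Guardium or storage API is involved.

```python
from datetime import date, timedelta


def archives_to_delete(archive_days: list[date], today: date,
                       keep_days: int = 14) -> list[date]:
    """Return archive dates that fall outside a rolling window (hypothetical helper).

    The window holds the keep_days most recent daily archives, ending with
    yesterday's data, because a daily archive covers the previous day.
    """
    oldest_kept = today - timedelta(days=keep_days)
    return sorted(d for d in archive_days if d < oldest_kept)


today = date(2024, 7, 31)
# One archive per day for the last 20 days (data ages 1 through 20).
archives = [today - timedelta(days=age) for age in range(1, 21)]
print(archives_to_delete(archives, today))  # six dates, 2024-07-11 through 2024-07-16
```

For auditing retention, the same logic applies with `keep_days` set to the period that your policy requires.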

Archiving from collectors versus aggregators

Archiving is required if you need to store more data than your disk allows. You can archive from either collectors or aggregators, and both approaches have advantages and disadvantages. The right choice depends mainly on how much long-term storage you have and how quickly you need to restore data, as the following comparison shows. After data is successfully archived, it can be purged from the Guardium appliance.

Audit data is stored in normalized form in an internal database of a Guardium system. Audit data that changes constantly is referred to as dynamic data. Audit data that stays relatively constant is referred to as static data. The user login time is a good example of dynamic data: it is unique for every user, for each login. The user name is an example of static data: it stays the same for every login of this user to the database.

Archives from an aggregator are full archives that contain both static and dynamic data, which simplifies the restore process.

Archives from collectors use an incremental archive strategy. Dynamic audit data is archived when it is observed. Static audit data is archived only when it is observed for the first time. This incremental approach reduces the size of archive files dramatically. The tradeoff is that a single archive file might not contain all of the audit data that you want to restore. To compensate, the archive process generates a full (not incremental) archive file the first time it runs, and then on the first day of every month. For example, to restore 28 June, either restore 1 June through 28 June, or restore 28 June and 1 July.
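The incremental strategy can be modeled in a few lines. This Python sketch is a simplified model, not Guardium's archive format: each day's records are hypothetical (static, dynamic) pairs such as (user name, login time). A static value is written the first time it is seen, and all known static values are written again on the first run and on the 1st of each month; dynamic values are written every day.

```python
from datetime import date


def build_archives(days: dict[date, list[tuple[str, str]]]) -> dict[date, dict]:
    """Simulate incremental archiving of (static, dynamic) audit records.

    Simplified model: dynamic values are archived on the day they are
    observed; a static value is archived the first time it is observed,
    and every known static value is archived again on the first run and
    on the 1st of each month (full archives).
    """
    seen_static: set[str] = set()
    archives: dict[date, dict] = {}
    for i, day in enumerate(sorted(days)):
        records = days[day]
        full = i == 0 or day.day == 1
        todays_static = {static for static, _ in records}
        new_static = todays_static - seen_static
        seen_static |= todays_static
        archives[day] = {
            "full": full,
            "static": sorted(seen_static if full else new_static),
            "dynamic": [dynamic for _, dynamic in records],
        }
    return archives


# Static value: user name; dynamic value: login time.
observed = {
    date(2024, 6, 28): [("alice", "08:00")],
    date(2024, 6, 29): [("alice", "09:00"), ("bob", "09:05")],
    date(2024, 7, 1): [("bob", "07:30")],
}
for day, archive in build_archives(observed).items():
    print(day, archive)
```

In this model, the 29 June file holds only the new static value "bob", so restoring that day alone loses "alice"; adding either the earlier files or the 1 July full archive fills the gap.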

Archiving on the collectors:
  • Incremental daily information for both static and dynamic data.
  • Static data is archived in full during the archive on the first day of each month.
  • Data is archived from the original source.
  • Uses less long-term storage.
  • To restore data for a specific day, restore the archives for all days of that month up to the target day.
  • A collector's archive file can be restored directly on an aggregator.
Archiving on the aggregators:
  • Incremental daily information for dynamic data.
  • Static data is archived in full daily.
  • Verify that imports are complete before starting the archive; otherwise, some data can be missing in the archive.
  • Uses more long-term storage (static data is included every day).
  • Faster restore: only one file needs to be restored for the target day.
Figure 1. Archiving from collectors to an external server, or exporting to an aggregator and archiving from the aggregator
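The restore rules in the two lists above can be expressed as a selection of daily archive files for a target day. This Python sketch is illustrative only: the function `files_to_restore` and its `source` values are hypothetical, and it assumes one archive file per day with the monthly full archive on the 1st.

```python
from datetime import date, timedelta


def files_to_restore(target: date, source: str) -> list[date]:
    """Return the dates of the daily archive files to restore (hypothetical helper).

    Assumes one archive file per day and a full archive on the 1st of each month.
    """
    if source == "aggregator":
        # Static data is archived in full daily, so one file is enough.
        return [target]
    if source == "collector":
        # Incremental archives: start from the monthly full archive on the 1st.
        first = target.replace(day=1)
        return [first + timedelta(days=n) for n in range((target - first).days + 1)]
    raise ValueError(f"unknown source: {source!r}")


print(len(files_to_restore(date(2024, 6, 28), "collector")))  # 28
print(files_to_restore(date(2024, 6, 28), "aggregator"))      # [datetime.date(2024, 6, 28)]
```

The sketch does not model the shorter collector alternative of restoring the target day plus the next month's full archive, nor the case where the first-ever archive run falls in the middle of a month.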

Storage capacity

The following are only estimates of backup and archive file sizes for auxiliary storage capacity planning purposes. The actual sizes vary depending on the volume and granularity of the database activity that is logged on the Guardium collectors, and the retention period of the backup files.

Daily Archives:
  • Collector: approximately 40 MB (privileged user monitoring) to 1 GB (comprehensive monitoring with full details logged on all traffic).
  • Aggregator: roughly the per-collector size multiplied by the number of collectors; for example, 10 collectors at 40 MB each produce about 400 MB.
Monthly system backups, assuming a 50% full database on a 600 GB disk (the backup file is compressed at roughly 1:8):
  • Collector: 7 – 10 GB
  • Aggregator: 16 – 20 GB
  • Central Manager (no aggregation): < 1 GB

Results Archives: Depends on the number and frequency of audit processes implemented.
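To turn these estimates into a capacity figure for the external storage, multiply the daily archive size by the number of days you retain archives, and add the monthly backups you keep. The following Python sketch does that arithmetic. The function `estimate_storage_gb` is hypothetical; the retention periods, the 12 retained backups, and the use of the upper backup estimates are assumed planning inputs, not Guardium defaults; and 1 GB is taken as 1024 MB.

```python
def estimate_storage_gb(daily_archive_mb: float, retention_days: int,
                        monthly_backup_gb: float, backups_kept: int) -> float:
    """Rough auxiliary storage needed for one system's archives and backups.

    Hypothetical planning helper: retention_days and backups_kept are your
    own policy choices; 1 GB is taken as 1024 MB.
    """
    archives_gb = daily_archive_mb * retention_days / 1024
    backups_gb = monthly_backup_gb * backups_kept
    return archives_gb + backups_gb


# Aggregator fed by 10 collectors at ~40 MB each: 400 MB of archives per day.
# Assumed policy: one year of daily archives and 12 monthly backups of ~20 GB.
print(round(estimate_storage_gb(10 * 40, 365, 20, 12), 1))  # 382.6

# Collector: 14 days of ~40 MB archives plus 12 monthly backups of ~10 GB.
print(round(estimate_storage_gb(40, 14, 10, 12), 1))  # 120.5
```

Results archives are left out because their size depends on your audit processes; add your own measured figure for them.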

The amount of data kept online is constrained by the size of the database on each Guardium system. The Purge process helps to manage how much data is kept online. Purge is coordinated with the Daily Archive so that all data is retained as required. Keep the minimum amount of data necessary on your Guardium system to avoid filling up the database and to maintain database performance.

Guardium recommends keeping 15 days of data online on collectors and 30 days on aggregators. The actual period depends on how much data is recorded, for example, the number of S-TAPs, policy rules, and collectors.
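Because the right online retention depends on how fast the internal database grows, it can help to work backward from capacity. This Python sketch assumes that you have measured the average daily growth of the internal database and chosen a maximum fill level to stay under; the function `max_retention_days`, the 50% default, and the example figures are hypothetical planning inputs, not Guardium thresholds.

```python
import math


def max_retention_days(db_capacity_gb: float, daily_growth_gb: float,
                       target_fill: float = 0.5) -> int:
    """Longest whole-day retention that keeps the internal database at or
    below target_fill of its capacity (hypothetical planning helper)."""
    if daily_growth_gb <= 0:
        raise ValueError("daily_growth_gb must be positive")
    return math.floor(db_capacity_gb * target_fill / daily_growth_gb)


# Assumed inputs: 300 GB database, 8 GB of new audit data per day.
print(max_retention_days(300, 8))  # 18
```

With these inputs, a 15-day purge setting fits. If the result is shorter than the retention you need, the remaining levers are to archive and purge more often or to reduce the volume of logged activity.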

Control activity volume

Controlling the volume of activity monitored (on the database server) and logged (on the collector) helps to:
  • Reduce network usage.
  • Reduce the Guardium system's database disk consumption.
  • Improve the overall capacity and performance of the system.
This control is primarily achieved in the policy rules, and in the inspection engine configuration, by using these guidelines:
  • Avoid specifying port ranges in inspection engines.
  • Identify all trusted applications and batch programs (these programs generally generate the bulk of the database activity). If possible, ignore or skip their activity by using the Ignore S-TAP® Session or Skip Logging actions.
  • Unless necessary, avoid the Log Full Details action in your policies.
  • If possible, use the Selective Audit policy (with the Ignore S-TAP session rules) to minimize network traffic.
  • If no extrusion rules are used (that is, result sets are not examined), consider using the Ignore Responses per Session action to prevent result sets from being sent to the Guardium system.
  • Establish a process to periodically review and update policy rules, including groups, to accommodate new databases and applications.
  • Establish a process to periodically monitor SQL errors and report them to the DBA and application development teams for remediation.

Scheduling

Default schedule times are set when the system is built, and you can change them as needed. Schedule the data management tasks at less busy times, for example, overnight, and space them out so that they do not overlap; that is, each task should complete before the next one starts.

In a typical scenario, the archive includes data older than one day and ignores data older than two days, so each run archives the previous day's data. It is critical that the data import completes before the data archive starts, so that the archive captures the data that is one day old.
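The two settings in this scenario define an archive window that is exactly one day wide. This Python sketch computes which calendar days a run covers; the function `archived_days` is hypothetical, and it assumes that "older than N days" means at least N whole days before the run date.

```python
from datetime import date, timedelta


def archived_days(run_day: date, older_than: int = 1,
                  ignore_older_than: int = 2) -> list[date]:
    """Calendar days captured by one archive run (hypothetical helper).

    Assumes "older than N days" means at least N whole days before run_day,
    so the run covers ages older_than through ignore_older_than - 1.
    """
    return [run_day - timedelta(days=age)
            for age in range(older_than, ignore_older_than)]


print(archived_days(date(2024, 7, 2)))  # [datetime.date(2024, 7, 1)]
```

Because each run covers only one day, a day whose data is imported after its archive run is not picked up by the next run either: by then the window has moved on to the following day.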

If the Data Archive runs before the Data Import from other collectors or aggregators, the archive does not contain the imported data that should be archived. For example, suppose Data Archive runs at 00:30 and Data Import runs at 06:00. When the archive runs, yesterday's data is not yet present on the system, because it has not been imported yet. Scheduling Data Archive after all Data Import jobs have finished ensures that the archive contains yesterday's data.

Scheduling the archive too early in the day can cause another problem. Suppose the export on one of your collectors failed, and you successfully rerun it in the morning. You would then need to manually rerun all processes related to this data: the import to process the file, possibly your audit processes, and the archive, because the archive file is missing data from that collector. When the archive is scheduled for 7:00 PM, you do not need to rerun any tasks.

The timing of the archive is flexible, as long as it runs late in the day and finishes on the same day.

It's recommended to run import and audit tasks early in the morning, so that you have current reports available at the start of the workday.

The following tables provide a summary of the key schedules to be configured on your Guardium systems.

Use the Aggregation/Archive log to review the time and status of these processes when you adjust your schedule.

Table 1. Typical collector schedule
Function | Schedule
Data export (to the aggregators) and purge | Daily: 12:30 AM. Purge is initially set to 15 days.
Data Archive (stand-alone systems) | Daily: 03:00 AM
Audit/Workflow jobs | Daily: 03:00 AM (stand-alone systems)
CSV/CEF export to the SCP/FTP server | Daily: 05:00 AM, if configured in the audit jobs, after the audit jobs complete
IP-to-Hostname Aliasing | Daily: 06:00 AM
Policy Reinstallation | Daily: 11:00 PM
System backups | Monthly: first Sunday of each month at 7:00 AM
Table 2. Typical aggregator schedule
Function | Schedule
Data Archive and Purge | Daily: 7:00 PM. Purge is initially set to 60 days. Archives are scheduled during a quiet period, after the daily import of data from all collectors (including collectors in different time zones) and any audit processes are complete, so that the imported data is included in the archive.
Data Import (from the collectors) | Daily: 2:30 AM
Audit/Workflow jobs | Daily: 03:30 AM
CSV/CEF export to the SCP/FTP server | Daily: 05:15 AM, if configured in the audit jobs, after the audit jobs complete
IP-to-Hostname Aliasing | Daily: 06:00 AM
System backups | Monthly: first Sunday of each month at 7:00 AM
Note: Do not schedule tasks before 12:15 AM, to avoid conflicts with the internal start-of-day processing on each Guardium system.
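Before applying a schedule like the one in Table 2, you might check it mechanically against the rules in this section: nothing starts before 12:15 AM, tasks do not overlap, every task finishes on the same day, and the archive starts only after the import and audit jobs finish. This Python sketch is a simplified check, not a Guardium feature: the function `check_schedule`, the task keys, and the durations are hypothetical, and in practice you would take durations from the run times that the Aggregation/Archive log shows.

```python
from datetime import date, datetime, time, timedelta

EARLIEST_START = time(0, 15)  # internal start-of-day processing runs before this


def check_schedule(tasks: dict[str, tuple[time, int]]) -> list[str]:
    """Return rule violations for one day's tasks.

    tasks maps a task name to (start time, expected duration in minutes).
    The keys "import", "audit", and "archive" are hypothetical names used
    to check ordering.
    """
    day = date(2000, 1, 1)  # arbitrary reference day for time arithmetic
    spans = {}
    for name, (start, minutes) in tasks.items():
        begin = datetime.combine(day, start)
        spans[name] = (begin, begin + timedelta(minutes=minutes))

    problems = []
    latest_end, latest_name = None, None
    for name, (begin, end) in sorted(spans.items(), key=lambda kv: kv[1][0]):
        if begin.time() < EARLIEST_START:
            problems.append(f"{name} starts before 12:15 AM")
        if end.date() != day:
            problems.append(f"{name} does not finish the same day")
        # Any overlap with an earlier task means starting before the latest end so far.
        if latest_end is not None and begin < latest_end:
            problems.append(f"{name} overlaps {latest_name}")
        if latest_end is None or end > latest_end:
            latest_end, latest_name = end, name

    if "archive" in spans:
        for dep in ("import", "audit"):
            if dep in spans and spans["archive"][0] < spans[dep][1]:
                problems.append(f"archive starts before {dep} finishes")
    return problems


# Table 2 start times, with assumed durations in minutes.
aggregator = {
    "import": (time(2, 30), 45),
    "audit": (time(3, 30), 90),
    "csv_export": (time(5, 15), 30),
    "aliasing": (time(6, 0), 20),
    "archive": (time(19, 0), 120),
}
print(check_schedule(aggregator))  # []

# The early-archive example from this section: archive at 00:30, import at 06:00.
print(check_schedule({"archive": (time(0, 30), 60), "import": (time(6, 0), 45)}))
# ['archive starts before import finishes']
```

The check only reflects the durations you supply, so rerun it when the log shows that a task has started taking longer.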