Planning archiving, storage capacity, and scheduling

Planning for archiving, purging, storage capacity, and scheduling is essential for managing your Guardium data and preserving it for future use.

Archiving and Purging

Archiving and purging data regularly is essential for the health of your Guardium® system. Archive data to preserve information for future use. Purge data to free up space and speed up access operations on the internal database. For the best performance, archive and purge all data that is not needed on the appliance. For example, if you need only three months of data on the Guardium appliance, archive and purge all data that is older than 90 days.

The Guardium archive function creates signed, encrypted files that cannot be tampered with. Archive files are transferred and stored on external systems such as file servers or storage systems. For more information about the encryption used for archives, see the File backup cipher section of Cipher suites.

If both archive and purge are scheduled, the purge runs after the archive.

Data that was archived on a collector can be restored either on another collector or an aggregator server. Data that was archived on an aggregator cannot be restored on a collector.

The export, archive, and purge functions can work on the same data, but each function can use its own age threshold. For example, you can export and archive all information older than one day and purge all information older than one month, which always leaves one month of data on the sending unit.
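As an illustration of how the age thresholds relate, the following Python sketch models each function as acting on data older than a configurable number of days and flags settings where data could be purged before it is exported or archived. The AgeThresholds class and threshold_warnings function are hypothetical; they are not part of any Guardium API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeThresholds:
    """Hypothetical model: each function acts on data older than N days."""
    export_days: int
    archive_days: int
    purge_days: int


def threshold_warnings(t: AgeThresholds) -> list[str]:
    """Flag settings where data could be purged before it is exported or archived."""
    warnings = []
    if t.purge_days < t.archive_days:
        warnings.append("data may be purged before it is archived")
    if t.purge_days < t.export_days:
        warnings.append("data may be purged before it is exported")
    return warnings


# The example from the text: export and archive after 1 day, purge after 30 days.
example = AgeThresholds(export_days=1, archive_days=1, purge_days=30)
print(threshold_warnings(example))  # []
# About purge_days worth of data (roughly one month) stays on the sending unit.
```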

Data retention

Data retention policies vary widely depending on your system and your needs. Factors to consider are:
  • Required retention time
  • The amount of data stored in a day
  • Disk size
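These factors can be weighed against each other with simple arithmetic. The following Python sketch, using hypothetical figures and helper names, estimates how many days of audit data fit in the free database space and whether a retention requirement calls for archiving. It does not model headroom for growth or internal overhead, so treat the result as an optimistic estimate.

```python
import math


def days_that_fit(free_db_gb: float, daily_gb: float) -> int:
    """Whole days of audit data that fit in the free database space."""
    if daily_gb <= 0:
        raise ValueError("daily_gb must be positive")
    return math.floor(free_db_gb / daily_gb)


def needs_archiving(required_days: int, free_db_gb: float, daily_gb: float) -> bool:
    """True when the retention requirement exceeds what the database can hold."""
    return required_days > days_that_fit(free_db_gb, daily_gb)


# Hypothetical figures: 200 GB free, 5 GB of audit data per day, 90-day requirement.
print(days_that_fit(200, 5))        # 40
print(needs_archiving(90, 200, 5))  # True
```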

You can decide to keep seven days of data on the collectors and maintain the data on the aggregators for a longer period.

For disaster recovery, keep a rolling two weeks' worth of daily archives from the managed collectors.

Note: If you have stand-alone collectors, maintain daily archives according to your data retention policy.

To meet the requirements of your auditing or corporate data-retention policies, maintain daily archives from the collectors during the investigation or auditing period.

Archiving from collectors versus aggregators

Archiving is required if you need to store more data than your disk allows. You can archive from either collectors or aggregators. Each approach has advantages and disadvantages, summarized below, and the better choice depends on your storage and restore requirements. After data is successfully archived, it can be purged from the Guardium appliance.

Audit data is stored in normalized form in an internal database of a Guardium system. Audit data that changes constantly is referred to as dynamic data. Audit data that stays relatively constant is referred to as static data. The user login time is an example of dynamic data; it is unique for every login by every user. The user name is an example of static data; it stays the same every time this user logs in to the database.

Archives from an aggregator are full archives: they contain both static and dynamic data, which simplifies the archive restore process.

Archives from collectors use an incremental archive strategy. Dynamic audit data is archived when it is observed. Static audit data is archived only when it is observed for the first time. This incremental approach reduces the size of archive files dramatically. The tradeoff is that a single archive file might not contain all the audit data that you want to restore. To compensate, the archive process generates a full (not incremental) archive file the first time it runs, and then on the first day of every month. For example, to restore 28 June, either restore the archives for 1 June through 28 June, or restore the archives for 28 June and 1 July (the 1 July archive contains the full static data).
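The restore rule in the example can be expressed as a small function. This Python sketch assumes, as described above, a full collector archive on the first day of each month and incremental archives on the other days. archives_to_restore is a hypothetical helper that only lists archive dates; it does not perform a restore.

```python
from datetime import date, timedelta


def archives_to_restore(target: date, use_next_month_full: bool = False) -> list[date]:
    """Dates of the collector archive files to restore for one target day.

    Assumes a full archive on the first day of every month and incremental
    archives on the other days (hypothetical model of the rule above).
    """
    if use_next_month_full:
        # Day 28 plus 4 days always lands in the next month; snap to its first day,
        # whose full archive carries the complete static data.
        first_of_next = (target.replace(day=28) + timedelta(days=4)).replace(day=1)
        return [target, first_of_next]
    # Otherwise restore every day from the first of the month up to the target.
    first = target.replace(day=1)
    return [first + timedelta(days=i) for i in range((target - first).days + 1)]


print(len(archives_to_restore(date(2024, 6, 28))))  # 28
print(archives_to_restore(date(2024, 6, 28), use_next_month_full=True))
```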

Archiving on the collectors:
  • Incremental daily information for both static and dynamic data.
  • Static data is archived in full during the archive on the first day of each month.
  • Archiving data from the original source.
  • Less usage of long-term storage.
  • To restore data for a specific day: restore data for all days of that month up to the target day.
  • Collector’s archive file can be restored directly into the aggregator.
Archiving on the aggregators:
  • Incremental daily information for dynamic data.
  • Static data is archived in full daily.
  • Verify that imports are complete before you start the archive; otherwise, some data can be missing in the archive.
  • More usage of long-term storage (static data is included every day).
  • Faster restore: need to restore only one file for the specific target day.
Figure 1. Archiving from collectors to an external server, or exporting to an aggregator and archiving from the aggregator
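To show why collector archives use less long-term storage, the following Python sketch counts the static records written over a month under the two strategies listed above. It is a simplified, hypothetical model: static data is represented as sets of values such as user names, day 1 stands in for the monthly full archive, and the aggregator case writes all known static values every day. Dynamic data, which both strategies archive incrementally, is left out.

```python
def static_records_archived(daily_static: list[set[str]], incremental: bool) -> int:
    """Count static records written to archives over one month (hypothetical model).

    daily_static[i] holds the static values observed on day i + 1.
    """
    known: set[str] = set()
    total = 0
    for day, observed in enumerate(daily_static, start=1):
        new_values = observed - known
        known |= observed
        if incremental and day > 1:
            total += len(new_values)  # collector style: only first-time values
        else:
            total += len(known)       # full static data: everything known so far
    return total


# Two users every day; a third user first appears on day 10 of a 30-day month.
month = [{"alice", "bob"}] * 9 + [{"alice", "bob", "carol"}] * 21
print(static_records_archived(month, incremental=True))   # 3
print(static_records_archived(month, incremental=False))  # 81
```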

Storage capacity

The following are only estimates of backup and archive file sizes for auxiliary storage capacity planning purposes. The actual sizes vary depending on the volume and granularity of the database activity that is logged on the Guardium collectors, and the retention period of the backup files.

Daily Archives:
  • Collector: approximately 40 MB (privileged user monitoring) to 1 GB (comprehensive monitoring with full details logged on all traffic).
  • Aggregator: roughly the number of collectors multiplied by the daily archive size per collector; for example, the number of collectors multiplied by 40 MB.
Monthly system backups, assuming a database that is 50% full on a 600 GB disk:
Note: Backup files are compressed at a ratio of roughly 1:8.
  • Collector: 7 – 10 GB
  • Aggregator: 16 – 20 GB
  • Central Manager (no aggregation): < 1 GB
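The aggregator rule of thumb and the two-week disaster-recovery window mentioned earlier translate into a quick storage estimate. In this Python sketch the helper names are hypothetical, and the 40 MB default is the low end of the collector estimate above; substitute your own measured archive sizes.

```python
def aggregator_daily_mb(collectors: int, per_collector_mb: float = 40) -> float:
    """Rough daily aggregator archive size: collectors times per-collector size."""
    return collectors * per_collector_mb


def rolling_archive_mb(daily_mb: float, days_kept: int) -> float:
    """External storage needed to keep a rolling window of daily archives."""
    return daily_mb * days_kept


print(aggregator_daily_mb(10))         # 400
# Two weeks of daily archives from 10 collectors at 40 MB each:
print(rolling_archive_mb(10 * 40, 14))  # 5600
```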

The size of results archives depends on the number and frequency of the audit processes that are implemented.

The size of the database on each Guardium system constrains the amount of data kept online. The Purge process helps to manage how much data is kept online. Purge is coordinated with the Daily Archive to make sure that all data is retained according to the required duration. Keep the minimum amount of data necessary on your Guardium system to avoid filling up the database and to maintain database performance.

Guardium recommends keeping 15 days of data on collectors and 30 days on aggregators. The actual length depends on how much data is recorded (for example, the number of S-TAPs, policy rules, and collectors).

Control activity volume

Controlling the volume of activity monitored (on the database server) and logged (on the collector) helps to:
  • Reduce network usage.
  • Reduce the Guardium system's database disk consumption.
  • Improve the overall capacity and performance of the system.
This control is primarily achieved in the policy rules, and in the inspection engine configuration, by using these guidelines:
  • Avoid specifying port ranges in inspection engines.
  • Identify all trusted applications and batch programs (these programs generally generate the bulk of the database activity). If possible, ignore, or skip their activity by using the Ignore S-TAP Session or Skip Logging actions.
  • Unless necessary, avoid the Log Full Details action in your policies.
  • If possible, use the Selective Audit policy (with the Ignore S-TAP session rules) to minimize network traffic.
  • If no extrusion rules are used (that is, result sets are not examined), use the Ignore Responses per Session action to prevent result sets from being sent to the Guardium system.
  • Establish a process to periodically review and update policy rules, including groups to accommodate new databases and applications.
  • Establish a process to periodically monitor SQL errors and report them to the DBA and application development teams for remediation.

Scheduling

Default schedule times are supplied when the unit is built, and you can change them. Schedule the Data Management tasks at less busy times, for example, overnight. Space the tasks out so that they do not overlap; that is, each task must complete before the next one starts.
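One way to confirm that tasks do not overlap is to compare their expected run windows. The following Python sketch assumes that you know each task's start time and a typical duration (for example, from the Aggregation/Archive log). The find_overlaps helper and the durations in the example are hypothetical, and windows that cross midnight are not handled.

```python
from datetime import datetime, timedelta


def find_overlaps(tasks: dict[str, tuple[str, int]]) -> list[tuple[str, str]]:
    """Return pairs of tasks whose run windows overlap.

    tasks maps a task name to (start time "HH:MM", expected duration in minutes).
    """
    windows = []
    for name, (start, minutes) in tasks.items():
        begin = datetime.strptime(start, "%H:%M")
        windows.append((begin, begin + timedelta(minutes=minutes), name))
    windows.sort()  # order by start time
    overlaps = []
    for i, (_, end_a, name_a) in enumerate(windows):
        for begin_b, _, name_b in windows[i + 1:]:
            # b starts no earlier than a, so they overlap if b starts before a ends.
            if begin_b < end_a:
                overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical durations: a 90-minute import runs into a 03:30 audit job.
example = {
    "Data Import": ("02:30", 90),
    "Audit jobs": ("03:30", 60),
    "CSV export": ("05:15", 30),
}
print(find_overlaps(example))  # [('Data Import', 'Audit jobs')]
```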

In a typical scenario, the archive is configured to archive data older than one day and to ignore data older than two days, so that each run captures one day of data. The data import must complete before the data archive starts; otherwise, the archive misses imported data that is one day old.
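The effect of the two age settings can be sketched as a date window. This Python example assumes day-level granularity: a run archives each calendar day whose age is at least archive_older_than days and less than ignore_older_than days. The function name and the exact boundary handling are assumptions, not documented Guardium behavior.

```python
from datetime import date, timedelta


def archive_window(run_day: date, archive_older_than: int = 1,
                   ignore_older_than: int = 2) -> list[date]:
    """Calendar days of data that one archive run captures (hypothetical model)."""
    # Ages from archive_older_than up to, but not including, ignore_older_than.
    return [run_day - timedelta(days=age)
            for age in range(archive_older_than, ignore_older_than)]


# With the typical settings, a run on 29 June captures only 28 June.
print(archive_window(date(2024, 6, 29)))
```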

If the Data Archive runs BEFORE the Data Imports from one or more collectors or aggregators, the archive does NOT contain the imported data that needs to be archived. For example, suppose Data Archive runs at 00:30 and Data Import runs at 06:00. When the archive runs, yesterday's data is not yet present in the system because the import of yesterday's data has not occurred. By scheduling the Data Archive to run AFTER all Data Imports finish, the archive contains yesterday's data.
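The ordering requirement reduces to a simple comparison. In this Python sketch, archive_after_imports is a hypothetical helper, and the 06:45 import end time is an assumed duration added to the example above.

```python
from datetime import time


def archive_after_imports(archive_start: time, import_ends: list[time]) -> bool:
    """True if the archive starts only after every import has finished (same day)."""
    return all(end <= archive_start for end in import_ends)


# The example from the text: archive at 00:30, import at 06:00 (assumed to end 06:45).
print(archive_after_imports(time(0, 30), [time(6, 45)]))  # False
# Moving the archive to 7:00 PM puts it after the import.
print(archive_after_imports(time(19, 0), [time(6, 45)]))  # True
```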

Another issue can arise if the archive is scheduled too early in the day. Suppose that the export on one of your collectors failed, and you successfully rerun the export in the morning. You then need to manually rerun all processes that are related to this data: the import to process the file, your audit processes, and the archive, because the archive file didn't have data from that collector. If the archive is scheduled for 7:00 PM instead, you don't need to rerun it, because it runs after the recovered data has been processed.

The exact archive time is flexible, provided that the archive runs late in the day and finishes on the same day.

It's recommended to run import and audit tasks early in the morning so that you have current reports available at the start of the workday.

The following tables provide a summary of the key schedules to be configured on your Guardium systems.

Use the Aggregation/Archive log, which records the time and status of these processes, to help you adjust your scheduling times.

Table 1. Typical collector schedule
Function | Schedule
Data export (to the aggregators) and purge | Daily: 12:30 AM. Purge is initially set to 15 days.
Data Archive (stand-alone) | Daily: 3:00 AM
Audit/Workflow jobs | Daily: 3:00 AM (for stand-alone systems)
CSV/CEF export to the SCP/FTP server | Daily: 5:00 AM, if configured in the audit jobs, after the audit jobs complete
IP-to-Hostname Aliasing | Daily: 6:00 AM
Policy Reinstallation | Daily: 11:00 PM
System backups | Monthly: first Sunday of each month at 7:00 AM
Table 2. Typical aggregator schedule
Function | Schedule
Data Archive and Purge | Daily: 7:00 PM. Purge is initially set to 60 days. Archives are scheduled during a quiet period after the following tasks are complete:
  • The daily import of data from the collectors, so that the data is included in the archive. This import must include data from collectors in different time zones.
  • Any audit processes.
Data Import (from the collectors) | Daily: 2:30 AM
Audit/Workflow jobs | Daily: 3:30 AM
CSV/CEF export to the SCP/FTP server | Daily: 5:15 AM, if configured in the audit jobs, after the audit jobs complete
IP-to-Hostname Aliasing | Daily: 6:00 AM
System backups | Monthly: first Sunday of each month at 7:00 AM
Note: Do not schedule tasks before 12:15 AM, to avoid conflicts with the internal start-of-day processing on each Guardium system.
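The typical aggregator schedule can be checked against two rules from this section: no task starts before 12:15 AM, and the archive starts after the import and audit jobs. This Python sketch compares start times only (it does not model durations) and omits the monthly system backup; the schedule_problems helper is hypothetical.

```python
from datetime import time

# Daily start times from the typical aggregator schedule (Table 2).
AGGREGATOR_SCHEDULE = {
    "Data Import": time(2, 30),
    "Audit/Workflow jobs": time(3, 30),
    "CSV/CEF export": time(5, 15),
    "IP-to-Hostname Aliasing": time(6, 0),
    "Data Archive and Purge": time(19, 0),
}

START_OF_DAY_CUTOFF = time(0, 15)  # avoid the internal start-of-day processing


def schedule_problems(schedule: dict[str, time]) -> list[str]:
    """List rule violations in a schedule of daily start times."""
    problems = [f"{name} starts before 00:15"
                for name, start in schedule.items() if start < START_OF_DAY_CUTOFF]
    archive = schedule["Data Archive and Purge"]
    for prerequisite in ("Data Import", "Audit/Workflow jobs"):
        if schedule[prerequisite] >= archive:
            problems.append(f"{prerequisite} does not start before the archive")
    return problems


print(schedule_problems(AGGREGATOR_SCHEDULE))  # []
```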