Archive, Purge and Restore

Archive and purge operations should be run on a scheduled basis. Use Data Archive and Results Archive to store captured data and results for auditing. Amazon S3 Archive and Backup in Guardium is described at the end of this topic.

Data Archive and Results Archive can be found by clicking Manage > Data Management.

In an aggregation environment, data can be archived from the collector, from the aggregator, or from both locations. Most commonly, the data is archived only once, and the location from where it is archived varies depending on your requirements.

Scheduled export operations send data from Guardium® collector units to a Guardium aggregation server. On its own schedule, the aggregation server executes an import operation to complete the aggregation process. On either or both units, archive and purge operations are scheduled to back up and purge data regularly (both to free up space and to speed up access operations on the internal database).

Archive files can be sent using SCP or FTP protocol, or to an EMC Centera or TSM storage system (if configured). You can define a single archiving configuration for each Guardium system.

Guardium’s archive function creates signed, encrypted files that cannot be tampered with. DO NOT change the names of the generated archive files. The archive and restore operations depend on the file names that are created during the archiving process.

Archive and export activities use the system shared secret to create encrypted data files. Before information encrypted on one system can be restored on another, the restoring system must have the shared secret that was used on the archiving system when the file was created.

Whenever you archive data, verify that the operation completes successfully. To do this, open the Aggregation/Archive Log by clicking Manage > Reports > Data Management > Aggregation/Archive Log. Multiple activities are listed for each archive operation, and the status of each activity should show as completed.

Perform System Backup tasks by clicking Manage > Data Management > System Backup. You can also perform backup tasks from the CLI. See File handling CLI commands for further information.

Default Purging

  • The default value for purge is 60 days
  • The default purge activity is scheduled every day at 5:00 AM.
  • For a new install, a default purge schedule is installed that is based on the default value and activity.
  • When a unit type is changed to a managed unit or back to a standalone unit, the default purge schedule is applied.
  • The purge schedule will not be affected during an upgrade.
  • When purging a large number of records (10 million or more), a large batch size setting (500,000 to 1 million) is most effective. A smaller batch size, or NULL, can make the purge take hours longer. Smaller purges finish quickly, so a large batch size setting is relevant only for large purges.
Note: Setting batch size is not available in the UI. Use the GuardAPI command grdapi set_purge_batch_size batchSize to set batch size.
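As a rough illustration of why batch size matters only for large purges, the sketch below (plain Python, not Guardium internals; the function name is hypothetical) counts how many delete batches a purge needs at a given batch size:

```python
# Illustrative sketch: purging a large record set in fixed-size batches,
# conceptually similar to the batch size set via grdapi set_purge_batch_size.
def purge_in_batches(total_records, batch_size):
    """Return the number of delete batches needed to purge total_records."""
    batches = 0
    remaining = total_records
    while remaining > 0:
        deleted = min(batch_size, remaining)  # one bounded delete per pass
        remaining -= deleted
        batches += 1
    return batches

# 10 million records at a 500k batch size -> 20 batches
print(purge_in_batches(10_000_000, 500_000))
```

A small purge fits in a single batch regardless of the setting, which is why the batch size only becomes relevant for large purges.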

How to determine what days are not archived

Use the Report Builder to view the list of all files with archive dates. Open the Report Builder by clicking Manage > Reports > Report Builder. From the Query menu, select Location View. Dates not on this report indicate that those dates have not been archived. Run archive for the dates not on the list, if required.
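The gap check itself is a simple set comparison. A sketch in Python (the helper name is hypothetical) that lists the dates in a range that do not appear among the archived dates shown by the Location View report:

```python
from datetime import date, timedelta

# Hypothetical helper: given the archive dates that appear in the Location
# View report, list the dates in a range that are missing (not archived).
def missing_archive_dates(archived, start, end):
    archived = set(archived)
    day, missing = start, []
    while day <= end:
        if day not in archived:
            missing.append(day)
        day += timedelta(days=1)
    return missing

archived = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
# The dates that are not archived: Jan 3 and Jan 5
print(missing_archive_dates(archived, date(2024, 1, 1), date(2024, 1, 5)))
```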

Configure Data Archive and Purge

  1. Open the Data Archive by clicking Manage > Data Management > Data Archive.
  2. To archive, check the Archive check box. Additional fields will appear in the Configuration panel.
  3. For Archive data older than, enter a value and select a unit of time from the menu. To archive data starting with yesterday’s data, enter the value 1, and select Day(s) from the menu.
  4. Use Ignore data older than to control how many days of data are archived. Any value that is specified here must be greater than the Archive data older than value.
    Note: If you leave this field blank, you archive data for all days older than the value specified in Archive data older than. This means that if you archive daily and purge data older than 30 days, you archive each day of data 30 times (before it is purged on the 31st day).
  5. Check the Archive Values check box to include values from SQL strings in the archived data. If this box is cleared, values are replaced with question mark characters on the archive (and hence the values will not be available following a restore operation).
  6. Select a Protocols option, and fill in the appropriate information. Depending on how your Guardium system has been configured, one or more of these buttons might not be available. For a description of how to configure the archive and backup storage methods, see the description of the show and store storage-system commands.
  7. Perform the appropriate procedure, depending on the storage method selected:
    • Configure SCP or FTP Archive or Backup
    • Configure EMC Centera Archive or Backup
    • Configure TSM Archive or Backup
  8. Check the Purge check box to define a purge operation.

    IMPORTANT: The Purge configuration is used by both Data Archive and Data Export. Changes that are made here apply to any executions of Data Export and vice versa. In the event that purging is activated and both Data Export and Data Archive are run on the same day, the first operation that runs will likely purge any old data before the second operation's execution.

    For this reason, any time that Data Export and Data Archive are both configured, the purge age must be greater than both the age at which to export and the age at which to archive.

  9. If purging data, use the Purge data older than field to specify a starting day for the purge operation as a number of days, weeks, or months before the current day, which is day zero. All data from the specified day and all older days are purged, except as noted. Any value that is specified for the starting purge date must be greater than the value specified for the Archive data older than value. In addition, if data exporting is active, the starting purge date that is specified here must be greater than the Export data older than value. See the IMPORTANT note.
    Note:

    There is no warning when you purge data that has not been archived or exported by a previous operation.

    The purge operation does not purge restored data whose age is within the do not purge restored data timeframe that is specified on a restore operation.

  10. Use the Scheduling section to define a schedule for running this operation on a regular basis.
  11. Click Save to save the configuration changes. The system attempts to verify the configuration by sending a test data file to that location.
    • If the operation fails, an error message is displayed and the configuration will not be saved.
    • If the operation succeeds, the configuration is saved.
  12. Click Run Once Now to run the operation once.
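The age constraints from steps 8 and 9 can be summarized in a small sketch (plain Python; the function is illustrative, not part of Guardium): the purge age must exceed the archive age and, if exporting is active, the export age.

```python
# Sketch of the constraint: purge age must be greater than both the
# archive age and (if data export is active) the export age.
def valid_purge_age(purge_days, archive_days, export_days=None):
    if purge_days <= archive_days:
        return False
    if export_days is not None and purge_days <= export_days:
        return False
    return True

print(valid_purge_age(purge_days=30, archive_days=1))                  # True
print(valid_purge_age(purge_days=30, archive_days=1, export_days=45))  # False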

Configure SCP or FTP Archive or Backup

After selecting SCP or FTP in an archive or backup configuration panel, the following information must be provided:

  1. For Host, enter the IP address or host name of the host to receive the archived data.
  2. For Directory, identify the directory in which the data is to be stored. How you specify this depends on whether the file transfer method used is FTP or SCP.
    • For FTP: Specify the directory relative to the FTP account home directory.
    • For SCP: Specify the directory as an absolute path.
  3. For Port, enter the port that is used to send files over SCP or FTP. The default port for SSH/SCP/SFTP is 22; the default port for FTP is 21.
    Note: A value of zero (0) indicates that the default port is being used and does not need to be changed.
  4. For Username and Password, enter the credentials for the user logging on to the SCP or FTP server. This user must have write/execute permissions for the directory that is specified in Directory.

    For Windows, a domain user is accepted with the format of domain\user

  5. Click Save to save the configuration.

Configure EMC Centera Archive or Backup

This backup or archiving task copies files to an off-site EMC Centera storage system. A license, with a user name and password from EMC, is needed. Four main actions are needed for this task:

  1. Establish account with an EMC Centera on the network (IP addresses and a ClipID are needed)
  2. Configure the data and/or configuration files from a Guardium system
  3. Define and export a library
  4. Confirm that your files are stored on the EMC Centera storage system.

CLI action

From the CLI, run these commands:

store storage-system centera backup ON
show storage-system

Configure Centera Archive or Backup

Open System Backup by clicking Manage > Data Management > System Backup. After you select EMC Centera, provide the following information:

  1. For Retention, enter the number of days to retain the data. The maximum is 24855 (68 years). If you want to save it for longer, you can restore the data later and save it again.
  2. For Centera Pool Address, enter the Centera Pool Connection String; for example: 10.2.3.4,10.6.7.8?/var/centera/us1_profile1_rwe.pea txt
    Note: The IP address and the .PEA file come from EMC Centera. The question mark is required when configuring the path. The .../var/centera/... path name is important; the backup might fail if the path name is not followed. The .PEA file provides permissions, user name, and password authentication for each Centera backup request.
  3. Click Upload PEA File to upload a Centera PEA file to be used for the connection string. The Centera Pool Address is still needed.
    Note: If the message Cannot open the pool at this address.. appears, check the length of the Guardium system host name. A timeout issue has been reported with Centera when host names are fewer than four characters in length.
  4. Click Save to save the configuration. The system attempts to verify the Centera address by opening a pool using the connection string specified. If the operation fails, you will be informed and the configuration will not be saved.
  5. Click Run Once Now to perform the backup using the uploaded .PEA file.

Confirm that your files have been copied to the EMC Centera. The name of the files and a ClipID are required for this task.

Configure TSM Archive or Backup

Before archiving to a TSM server, a dsm.sys configuration file must be uploaded to the Guardium system via the CLI. Use the import tsm config CLI command. After you select TSM in an archive or backup configuration panel, provide the following information:

  1. For Password, enter the TSM password that this Guardium system uses to request TSM services, and re-enter it in the Re-enter Password box.
  2. Optionally, enter a Server name matching a servername entry in your dsm.sys file.
  3. Optionally, enter an As Host name.
  4. Click Save to save the configuration. When you click the Save button, the system attempts to verify the TSM destination by sending a test file to the server using the dsmc archive command. If the operation fails, you will be informed and the configuration will not be saved.
  5. Return to the archiving or backup procedure to complete the configuration.

Configure Results Archive

  1. Open the Results Archive by clicking Manage > Data Management > Results Archive (Audit).
  2. In the fields following Archive results older than, specify a starting day for the archive operation as a number of days, weeks, or months before the current day, which is day zero. To archive results starting with yesterday’s data, enter the value 1, and select Day(s) from the list.
  3. Optionally, use the fields following Ignore results older than to control how many days of results are archived. Any value that is specified here must be greater than the Archive results older than value.
  4. Select a storage method from the radio buttons. Depending on how the Guardium system has been configured, one or more of these buttons might not be available. For a description of how to configure the archive and backup storage methods, see the description of the show storage-system and store storage-system commands in Configuration and Control CLI Commands.
    • EMC CENTERA
    • TSM
    • SCP
    • FTP
  5. Perform the appropriate procedure depending on the storage method selected:
    • Configure SCP or FTP Archive or Backup
    • Configure EMC Centera Archive or Backup
    • Configure TSM Archive or Backup
    • Amazon S3 Archive and Backup in Guardium
  6. Use the Scheduling section to define a schedule for running this operation on a regular basis.
  7. Click Save to verify and save the configuration changes. The system attempts to verify the configuration by sending a test data file to that location.
    • If the operation fails, an error message is displayed and the configuration will not be saved.
    • If the operation succeeds, the configuration is saved.
  8. Click Run Once Now to run the operation once.

Restore Data

If this system is not the system that generated the archive to be restored, you must create a location entry in the catalog, either via Catalog Archive and then clicking Add (reference: Guardium catalog) or via GuardAPI (reference: CLI and API > GuardAPI Reference > GuardAPI Catalog Entry Functions). When the Data Restore is started, this information is used to transfer the file to the system before the data is processed.

Before Restoring Data
  • Before restoring from TSM, a dsm.sys configuration file must be uploaded to the Guardium system, via the CLI. Use the import tsm config CLI command.
  • Before restoring from EMC Centera, a pea file must be uploaded to the Guardium system, via the Data Archive panel.
  • Before restoring or importing a file that was encrypted by a different Guardium system, make sure that the system shared secret used by the Guardium system that encrypted the file is available on this system (otherwise, it will not be able to decrypt the file). See About the System Shared Secret in System Configuration.
  • Before restoring on a Guardium collector, run the CLI command stop inspection-core to stop the inspection-core process.
    Note: Data cannot be captured during the restore process.

To restore data:

  1. Open Data Restore by clicking Manage > Data Management > Data Restore.
  2. Enter a date in From to specify the earliest date for which you want data.
  3. Enter a date in To to specify the latest date for which you want data.
  4. For Host Name, optionally enter the name of the Guardium system from which the archive originated.
  5. Click Search.
  6. In the Search Results panel, check the Select check box for each archive you want to restore.
  7. In the Don't purge restored data for at least field, enter the number of days that you want to retain the restored data on the system.
  8. Click Restore.
  9. Click Done when you are finished.
Note: The restore of data archived from a collector should be done only to: the same collector; an aggregator; or, a different collector dedicated to investigation that is not part of an aggregation cluster. In the case of a crashed collector, a system backup can be restored onto a new, clean collector.

Amazon S3 Archive and Backup in Guardium

Use this feature to archive and back up data from Guardium to Amazon S3.

Amazon S3 (Amazon Simple Storage Service) provides a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, inexpensive infrastructure that Amazon uses to run its own web sites.

Prerequisites

  1. An Amazon account.

  2. Register for S3 service

  3. Amazon S3 credentials are required in order to access Amazon S3. These credentials are:
    • Access Key ID - identifies the user as the party responsible for service requests. It must be included in each request. It is not confidential and does not need to be encrypted (20-character alphanumeric sequence).
    • Secret Access Key - associated with the Access Key ID and used to calculate a digital signature that is included in each request. The Secret Access Key is a secret; only the user and AWS should have it (40-character sequence). It is a long string of characters (not a file).
  • Data Archive backs up the data that has been captured by the system, for a given time period.

  • Results Archive backs up audit tasks results (reports, assessment tests, entity audit trail, privacy sets, and classification processes) as well as the view and sign-off trails and the accommodated comments from work flow processes.

When Guardium data is archived, there is a separate file for each day of data.

Archive data file name format:

 <time>-<hostname.domain>-w<run_datestamp>-d<data_date>.dbdump.enc 
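The components of an archive file name can be pulled apart with a simple pattern. A sketch in Python, assuming purely numeric time and date stamps (the exact stamp formats are an assumption and may vary by version):

```python
import re

# Hypothetical parser for the documented archive file name format:
#   <time>-<hostname.domain>-w<run_datestamp>-d<data_date>.dbdump.enc
ARCHIVE_NAME = re.compile(
    r"^(?P<time>\d+)-(?P<host>.+)-w(?P<run_datestamp>\d+)-d(?P<data_date>\d+)"
    r"\.dbdump\.enc$"
)

name = "120000-g1.example.com-w20240102-d20240101.dbdump.enc"  # example name
m = ARCHIVE_NAME.match(name)
print(m.group("host"), m.group("data_date"))
```

Because restore depends on these names, a parser like this is for inspection only; never rename the files themselves.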

Guardium's archive function creates signed, encrypted files that cannot be tampered with. The names of the generated archive files should not be changed. The archive operation depends on the file names that are created during the archiving process.

System backups are used to back up and store all the data and configuration values necessary to restore a server in case of hardware corruption.

All configuration information and data is written to a single encrypted file and sent to the specified destination, using the transfer method that is configured for backups on this system.

Backup system file format:

<data_date>-<time>-<hostname.domain>-SQLGUARD_CONFIG-9.0.tgz
<data_date>-<time>-<hostname.domain>-SQLGUARD_DATA-9.0.tgz

Use the Aggregation/Archive Log report in Guardium to verify that the operation completes successfully. Open the Aggregation/Archive Log by clicking Manage > Reports > Data Management > Aggregation/Archive Log. There should be multiple activities that are listed for each Archive operation, and the status of each activity should be Succeeded.

Regardless of the destination for the archived data, the Guardium catalog tracks where every archive file is sent, so that it can be retrieved and restored on the system with minimal effort, at any point in the future.

A separate catalog is maintained on each system, and a new record is added to the catalog whenever the system archives data or results.

Catalog entries can be transferred between appliances by one of the following methods:

  • Aggregation - Catalog tables are aggregated, which means that the aggregator will have the merged catalog of all of its collectors

  • Export/Import Catalog - These functions can be used to transfer catalog entries between collectors, or to backup a catalog for later restoration, etc.

  • Data Restore - Each data restore operation contains the data of the archived day, including the catalog of that day. So, when restoring data, the catalog is also being updated.

When catalog entries are imported from another system, those entries will point to files that have been encrypted by that system. Before restoring or importing any such file, the system shared secret of the system that encrypted the file must be available on the importing system.

Enable Amazon S3 from the Guardium CLI

The Amazon S3 archive and backup option is not enabled by default in the Guardium GUI. To enable Amazon S3 via the Guardium CLI, run the following commands:

store storage-system amazon_s3 archive on
store storage-system amazon_s3 backup on

Amazon S3 requires that the clock time of the Guardium system be correct (within 15 minutes). Otherwise, Amazon returns an error: if there is too large a difference between the request time and the current time, the request is not accepted.

If the Guardium system time is not correct, set the correct time using the following CLI commands:
show system ntp server
store system ntp server (An example is ntp server: ntp.swg.usma.ibm.com)
store system ntp state on
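The 15-minute rule is just a clock-skew comparison. A minimal sketch in Python (illustrative only, not the actual S3 check):

```python
from datetime import datetime, timedelta

# Sketch of the S3 requirement: a request is rejected if the sender's clock
# differs from the actual time by more than 15 minutes.
def within_s3_skew(local_time, reference_time, max_skew_minutes=15):
    return abs(local_time - reference_time) <= timedelta(minutes=max_skew_minutes)

ref = datetime(2024, 1, 1, 12, 0, 0)
print(within_s3_skew(datetime(2024, 1, 1, 12, 10), ref))  # True
print(within_s3_skew(datetime(2024, 1, 1, 12, 20), ref))  # False
```

Keeping NTP enabled, as shown above, avoids this failure mode entirely.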

User Interface

Use System Backup to configure the backup. Open System Backup by clicking Manage > Data Management > System Backup.

The following user input is required:

  • S3 Bucket Name (Every object that is stored in Amazon S3 is contained in a bucket. Buckets partition the namespace of objects that are stored in Amazon S3. Within a bucket, you can use any names for your objects, but bucket names must be unique across all of Amazon S3.)

  • Access Key ID

  • Secret Access Key

If the bucket does not exist, it is created.

The Secret Access Key is encrypted when it is saved to the database.

Check that files were uploaded to Amazon S3

  1. Log on to the AWS Management Console (http://aws.amazon.com/console/) using your email address and password.

  2. Click S3.

  3. Click the bucket that you specified in the Guardium UI.

How to purge data from the Guardium appliance

Two areas on a Guardium appliance can fill up, which can cause the GUI to stop:

  • The internal database

  • The filesystem itself (usually the /var partition)

As the CLI user, check whether the database is full with this CLI command:

support show db-status free %

If this comes back with 10% or less, the database is 90% full or more.

To check whether the /var partition (file system) is 90% full or more, run a must gather command from the CLI:

support must_gather system_db_info

Use fileserver to check the df -k output in the system_output.txt file, which is located at

must_gather/system_logs/system_output.txt

or can be extracted from the system.<datetime>.tgz file once you have downloaded it.

In the following example, the /var partition is 65% full.

==========2016-11-30 08:36:09 ... Output of df command:==========

Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda3 10154020 2272668 7357232 24% /
/dev/sda2 28571320 17384504 9712052 65% /var
/dev/sda1 505604 33476 446024 7% /boot
tmpfs 6169768 0 6169768 0% /dev/shm
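If you want to scan such df output programmatically, a small sketch in Python (illustrative; it parses the Capacity column of the captured text) flags partitions at or above a threshold:

```python
# Sketch: parse `df` output (as captured in system_output.txt) and report
# any partition at or above a fill threshold.
DF_OUTPUT = """\
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda3 10154020 2272668 7357232 24% /
/dev/sda2 28571320 17384504 9712052 65% /var
/dev/sda1 505604 33476 446024 7% /boot
tmpfs 6169768 0 6169768 0% /dev/shm
"""

def partitions_over(df_text, threshold_pct):
    full = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        pct = int(fields[4].rstrip("%"))   # Capacity column, e.g. "65%"
        if pct >= threshold_pct:
            full.append((fields[5], pct))  # (mount point, % used)
    return full

print(partitions_over(DF_OUTPUT, 60))  # [('/var', 65)]
```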

Later Guardium versions have a safety catch that stops the main processes from collecting any more data when the database or file system reaches a certain level.

The default is to stop the processes when the database or the file system reaches 90% full (as documented for v10.1). You can check the current value of the safety catch via the CLI:

CLI> show auto_stop_services_when_full

Note: If auto_stop_services_when_full is switched off, the appliance may go on to fill the system to 100%, preventing you from accessing the system at all.

You should never set auto_stop_services_when_full to OFF except temporarily, in the specific circumstance described below, and you should switch it back to ON once you have resolved the space problem.

Note: You must stop inspection-core before switching the auto stop off; this prevents the system from filling any further.

In this case, the system automatically stops inspection-core and other processes when the file system or database is 90% full. This includes the GUI, so you cannot connect to the GUI at that point.

If you attempt to restart stopped services with the following command, the system (and GUI) is likely to stop again after 5 minutes for the same reason:

restart stopped_services

Note: This command should only be used once you are sure that space has been recovered.

Before the database or the file system fills to the auto stop level, you should receive warnings in the system log (messages file).

Alerts can be configured to email you about space problems before the auto stop is triggered. See Guardium Full Database Alert.

You can run a must_gather command and look inside the compressed file that is created to check the latest messages file:

support must_gather system_db_info

Purging data from the internal database when the GUI is down

If the auto stop has been triggered, services such as the GUI are stopped, which prevents you from making an emergency purge of data via the Run Once Now purge option.

To make that emergency purge, do the following:

  • Make sure that inspection-core is switched off on collectors to stop more data from flooding into the appliance:

stop inspection-core

  • Check that no database commands are running except show processlist (if needed, let any running commands finish before the next step):

support show db-processlist running

You should then be able to simply restart gui to gain access to the GUI and perform the purge, as described in What can I do if I see my Guardium Appliance getting full?

If the GUI keeps going down every 5 minutes, you can consider switching auto_stop_services_when_full off TEMPORARILY to allow you to restart the GUI and purge some data. Restarting the GUI on its own might keep it running for only 5 minutes; the main nanny process might stop the services again before enough data is purged or before you have had time to start the purge.

Note: If auto_stop_services_when_full is switched off, the appliance may go on to fill the system to 100%, preventing you from accessing the system at all.

You should never set auto_stop_services_when_full to OFF except temporarily, in the specific circumstance described here, and you should switch it back to ON once you have resolved the space problem.

You must stop inspection-core before switching the auto stop off; this prevents the system from filling any further.

CLI> store auto_stop_services_when_full off

CLI> show auto_stop_services_when_full [off | restart | gui ]

Now you can go to the GUI, open Manage > Data Management > Data Archive, and set a purge running to clear some data.

Keep checking the database free space; the Aggregation/Archive Log will show when the purge process is finished.

Once it is finished and you have space on the system, set the auto stop back on and then restart the stopped services:

store auto_stop_services_when_full on

restart stopped_services

If needed, start inspection-core again.

Now data should start to be collected again.

If the system has filled up it usually means that too much activity is being recorded.