CAS start-up and failover

Various failover and connect parameters can be modified through S-TAP Control Change Auditing.

When the CAS client starts on the host, it looks for a checkpoint file that it may have written during a previous run. This file tells CAS what it was doing the last time it was running. CAS then connects to its Guardium® system.

If CAS finds a checkpoint file, it asks the Guardium system to verify its version of the monitoring assignment against what is stored in the Guardium database, because the assignment may have changed while the CAS client and the Guardium system were disconnected. When any differences are resolved, CAS resumes monitoring.

If CAS does not find a checkpoint file, it asks the Guardium system what it should do. If the Guardium system finds the CAS host in its database, the associated template sets are sent to the CAS client and expanded into monitored items, and monitoring begins. If the Guardium system cannot find the CAS host in its database, it adds the host to the database and sends the default template set for the CAS host operating system.
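
The start-up decision described above can be summarized in a few lines. This is only an illustrative sketch, not the CAS implementation: the function name plan_startup and the returned labels are hypothetical and chosen for readability.

```python
def plan_startup(has_checkpoint: bool, host_known_to_guardium: bool) -> str:
    """Return a label for the start-up path described in the text.

    The labels are hypothetical; they only name the three documented outcomes.
    """
    if has_checkpoint:
        # Guardium verifies the client's assignment against its database,
        # differences are resolved, and monitoring resumes.
        return "verify_assignment_then_resume"
    if host_known_to_guardium:
        # No checkpoint, but the host is already in the Guardium database.
        return "send_associated_template_sets"
    # No checkpoint and an unknown host: the host is added to the database
    # and receives the default template set for its operating system.
    return "add_host_and_send_default_template_set"
```

For example, plan_startup(False, False) corresponds to a first-time installation on a host that the Guardium system has never seen.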

When connectivity between the CAS client and the Guardium system is lost, it can take up to five minutes (the length of time a CAS client waits for an expected message from the Guardium system) to discover that the client has lost contact with the primary Guardium system. The loss may be discovered sooner if a communication error is detected.

If the CAS client loses its connection to the Guardium system or cannot make an initial connection, it opens a failover file and begins writing to it the messages that it would have sent to the Guardium system. The path to this failover file is stored in guard_tap.ini under the name cas_fail_over_file. When communication is reestablished, the CAS client shuts down and restarts, sends all messages stored in the failover file to the Guardium system, and deletes the file. If the CAS client was unable to make the initial connection, it uses the checkpoint file to determine what to monitor and continues doing what it was doing before communication failed.
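
The buffer-then-replay behavior can be pictured with a short sketch. The on-disk format of the real failover file is not described here, so the one-message-per-line format, the function names, and the send callback are all assumptions made for illustration.

```python
import os


def buffer_message(failover_path: str, message: str) -> None:
    """Append one unsent message to the failover file (assumed: one per line)."""
    with open(failover_path, "a", encoding="utf-8") as fh:
        fh.write(message + "\n")


def replay_and_delete(failover_path: str, send) -> int:
    """Send every buffered message in order, then delete the failover file.

    `send` is a hypothetical callable standing in for delivery to Guardium.
    Returns the number of messages replayed.
    """
    if not os.path.exists(failover_path):
        return 0
    with open(failover_path, encoding="utf-8") as fh:
        messages = [line.rstrip("\n") for line in fh]
    for message in messages:
        send(message)
    os.remove(failover_path)
    return len(messages)
```

In this sketch the file is deleted only after every message has been passed to send, which mirrors the order of operations in the paragraph above.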

When communication is lost, the client also starts a thread that periodically tries to reconnect with the primary Guardium system. The number of times CAS attempts to reconnect and the average time interval between reconnect attempts are configurable parameters. The client tries to reconnect to the primary server for the period of time set in guard_tap.ini under the name cas_server_failover_delay. After that time has passed, the client also tries to connect to any secondary servers identified in guard_tap.ini. The secondaries are tried in the order of the value of the primary attribute listed in the SQL_Guard sections of guard_tap.ini; a server whose primary value is not 1 is a secondary. While the client is connected to a secondary server, it continues to try to reconnect to the primary server.
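
To make the ordering rule concrete, the sketch below reads a sample configuration and separates the primary from the secondaries. The sample text is invented: the section names (SQL_Guard_0 and so on), the sqlguard_ip key, and the addresses are assumptions, and ascending order of the primary value is one reading of "in the order of the value". Check your own guard_tap.ini for the actual names.

```python
import configparser

# Hypothetical sample; real section and key names may differ.
SAMPLE_INI = """
[SQL_Guard_0]
sqlguard_ip = 10.0.0.1
primary = 1

[SQL_Guard_1]
sqlguard_ip = 10.0.0.3
primary = 3

[SQL_Guard_2]
sqlguard_ip = 10.0.0.2
primary = 2
"""


def order_servers(ini_text: str):
    """Return (primary_hosts, secondary_hosts), secondaries sorted by `primary`."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    servers = []
    for name in parser.sections():
        if name.lower().startswith("sql_guard"):
            rank = parser.getint(name, "primary")
            servers.append((rank, parser.get(name, "sqlguard_ip")))
    servers.sort()  # ascending by the value of `primary` (assumed order)
    primary = [host for rank, host in servers if rank == 1]
    secondaries = [host for rank, host in servers if rank != 1]
    return primary, secondaries
```

With the sample above, the server whose primary value is 2 is tried before the one whose value is 3, regardless of the order in which the sections appear in the file.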

If the reconnect attempt limit is met, the CAS client stops trying to reconnect, but continues to write data to a failover file. To cap disk space requirements on the database server, there are actually two failover files. CAS writes to one file until it reaches its maximum failover file size (which is configurable), and then switches to the other, overwriting any previous data in that file. The default failover file size is 50 MB for each of the files.
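
The two-file scheme bounds disk usage at roughly twice the maximum file size. A minimal sketch of that rotation follows. The class name, the .0 and .1 file suffixes, and the reading of 50 MB as 50 * 1024 * 1024 bytes are assumptions, not details taken from the product.

```python
import os


class TwoFileFailoverLog:
    """Alternate between two size-capped files, overwriting the older one."""

    def __init__(self, base_path: str, max_bytes: int = 50 * 1024 * 1024):
        # Hypothetical naming: <base>.0 and <base>.1
        self.paths = [base_path + ".0", base_path + ".1"]
        self.max_bytes = max_bytes
        self.active = 0
        open(self.paths[0], "wb").close()  # start with an empty first file

    def write(self, message: str) -> None:
        data = (message + "\n").encode("utf-8")
        current = self.paths[self.active]
        if os.path.getsize(current) + len(data) > self.max_bytes:
            # Active file is full: switch to the other file and discard
            # whatever it held from an earlier rotation.
            self.active = 1 - self.active
            current = self.paths[self.active]
            open(current, "wb").close()
        with open(current, "ab") as fh:
            fh.write(data)
```

The trade-off is that the oldest buffered messages are lost once both files have filled, in exchange for a fixed ceiling on disk space.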

You can specify one or more secondary Guardium systems when configuring the CAS client. In failover mode, CAS only tries to reconnect to its primary server until the time specified by cas_server_failover_delay in guard_tap.ini is exceeded. At that time, CAS begins trying to connect to any of the secondary servers, as well as its primary server (which is always the first server it tries to connect with during any reconnect attempt). While it is connected to a secondary server, CAS continues to try to reconnect to its primary server.
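
The effect of cas_server_failover_delay on which servers are tried can be sketched as a small function. The function name and its arguments are hypothetical, and the delay is assumed to be expressed in minutes.

```python
def reconnect_candidates(minutes_disconnected, primary, secondaries,
                         failover_delay_minutes=60):
    """Return the servers to try, in order, on one reconnect attempt.

    The primary is always first. Secondaries are added only after the
    failover delay has been exceeded.
    """
    if minutes_disconnected <= failover_delay_minutes:
        return [primary]
    return [primary] + list(secondaries)
```

For example, with a 60-minute delay, an attempt made 10 minutes after the outage targets only the primary, while an attempt made at 61 minutes targets the primary and then each secondary.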

Changes to the CAS client configuration can be made only from the primary server and only while the host is online. Whenever the configuration of the CAS client is changed on the primary server and the Guardium system is in a stand-alone configuration, an export file is saved on the host. If the CAS client connects to a secondary server, the saved export file is imported from the host to the secondary server.

There is no need to separately maintain configurations on both primary and secondary servers. However, if on the primary server, the parameters for an individual monitored item have been changed from those defined in the template, then these changes will not be transferred to the secondary server. For example, even if the test interval on a particular file was changed from the template default of 1 hr to 10 min, the test interval on the secondary server will again be 1 hr. Essentially, monitored items are regenerated from the templates of the imported configuration.

The delay before searching for secondary servers is based directly on time rather than failover file size. The delay is set with the cas_server_failover_delay parameter in guard_tap.ini and has a default of 60 minutes.

As with S-TAP, CAS connectivity outages create exceptions on the Guardium system, so alerts can be issued within moments of detecting the outage.

Setting Up and Maintaining Secondary Servers

In the S-TAP/CAS configuration file on the database server system, one or more secondary Guardium servers can be defined. If the primary Guardium server becomes unavailable, CAS on that database server system will connect to a secondary Guardium system (as described previously, see Start Up and Failover).

Rules of Failover

Rule #   Guardium system   Fails over to                 Valid
1        stand-alone       stand-alone                   Yes
2        managed           managed (same manager)        Yes
3        managed           managed (different manager)   No
4        managed           stand-alone                   No
5        stand-alone       managed                       No
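
The five rules reduce to one comparison: failover is valid only when the source and target are both stand-alone or both managed by the same manager. The sketch below encodes that reading of the table. The function name and the convention of using None for a stand-alone system are assumptions.

```python
from typing import Optional


def failover_is_valid(source_manager: Optional[str],
                      target_manager: Optional[str]) -> bool:
    """Return True if the failover rules allow this source/target pair.

    None means the system is stand-alone; a string names its manager.
    """
    return source_manager == target_manager
```

For example, failover_is_valid("mgr-a", "mgr-b") is False (rule 3), and failover_is_valid(None, None) is True (rule 1).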

CAS Failover Limitations

  1. CAS instances will not be relocated to the failed-over Guardium system when the source Guardium system is a managed unit and the target Guardium system is either:
    • a stand-alone Guardium system
    • a managed unit which is being managed by a different manager
  2. CAS import/export option will be limited to manager and stand-alone machines only.

Exporting CAS Hosts

  1. Click Manage > Aggregation & Archive > Export to open the Definitions Export panel. Select CAS Hosts from the Type menu, select the definitions to be exported from the Definitions to Export menu, and click Export.
  2. A file named exp_<date>_<time>.sql is saved on your system. This file will contain the definitions of all CAS hosts selected, and the definitions of any template sets used by those CAS hosts.

Importing CAS Hosts

  1. Click Manage > Aggregation & Archive > Import to open the Definitions Import panel.
  2. Use the Browse and Upload buttons to select files and upload them, then select the definition from the Import Uploaded Definitions pane.
  3. Click the Import this set of definitions icon to import the definition.
  4. Confirm or cancel the selected action.
    Note: An import operation does not overwrite an existing definition. If you attempt to import a definition with the same name as an existing definition, you are notified that the item was not replaced. If you want to overwrite an existing definition with an imported one, you must delete the existing definition before performing the import operation.

Maintaining Secondary Servers for a CAS Host

CAS configurations can also be maintained through the use of export and import operations. Since the import operation will not replace an existing definition, on each secondary server you must delete the old CAS host definition before importing the new one.

Be sure to perform this procedure only while the selected CAS host is connected to its primary server.
  1. Export the definition of the CAS host (see the previous section).
  2. On each secondary server:
    • Delete the old CAS host definition that you want to replace.
    • Import the definitions that were exported from the primary server (see Importing CAS Hosts, previous).

CAS Client Ignore Change Alerts

The CAS client agent can suppress change notifications to the CAS server based on predefined settings.

The CAS client agent looks for the ignore_change_alerts parameter in its cas.client.config.properties configuration file.

If the parameter is not found or not set, the Ignore change alerts functionality is not enabled and the CAS client behaves as before (that is, it alerts on any file change).

If the parameter is set, the CAS client agent does not send change notifications for the change-types specified in the parameter value.

The possible change-types are:

PERMISSION, SIZE, OWNER, GROUP, TIMESTAMP

To ignore multiple change-types, concatenate any of the listed change-types with + as the delimiter.

For example:

To avoid sending change notifications for OWNER and GROUP changes, set the parameter as follows:

ignore_change_alerts=OWNER+GROUP

Note: After the initial installation or when a new template is defined, the first scan of the files is performed and these files appear in the CAS changes report regardless of the Ignore change alerts setting.
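
A sketch of how a value such as OWNER+GROUP could be interpreted is shown below. It is not the CAS client's parser. The function names are hypothetical, and two behaviors are assumptions: an unknown change-type raises an error, and a change is still reported when it includes at least one change-type that is not ignored.

```python
VALID_CHANGE_TYPES = {"PERMISSION", "SIZE", "OWNER", "GROUP", "TIMESTAMP"}


def parse_ignore_change_alerts(value):
    """Turn a '+'-delimited parameter value into a set of change-types."""
    if value is None or not value.strip():
        return set()  # parameter missing or empty: nothing is ignored
    types = {part.strip() for part in value.split("+")}
    unknown = types - VALID_CHANGE_TYPES
    if unknown:
        # Assumed behavior; the real client may handle this differently.
        raise ValueError("unknown change-type(s): " + ", ".join(sorted(unknown)))
    return types


def should_alert(detected_changes, ignored):
    """Alert if any detected change-type is not in the ignored set (assumed)."""
    return bool(set(detected_changes) - set(ignored))
```

With ignore_change_alerts=OWNER+GROUP, this sketch would suppress an alert for an OWNER-only change but still alert when both OWNER and SIZE changed.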

Correcting an invalid non-IP hostname

If the CAS agent is installed with an invalid tap_ip (guard_tap.ini parameter) or CAS_TAP_IP (GIM parameter), Windows datasources defined for that host might be unusable for any activity that requires access to the remote database.

If this happens, delete the datasource and change the tap_ip parameter to the correct database server hostname or IP address.