Distributed Report Builder

This Central Manager feature automatically gathers data from all or a subset of the Guardium managed units that are associated with this Central Manager. Distributed reports are designed to provide a high-level view, to correlate data from across data sources, and to summarize views of the data. Continue to use aggregators for row-level data gathering across collectors.

This capability addresses an issue that can arise in complex enterprise environments, where users do not always know which managed unit has the data that is required for a particular report. This can happen because the link between Guardium collectors and databases can change over time, depending on configuration options such as load balancing. The situation is further complicated by considerations such as the time period and the data retention policy on the Aggregators and Collectors.

Creating a distributed report is straightforward: define it on the Distributed Report Builder screen, add it to a pane, and it is ready for use.

Furthermore, this feature optionally makes use of data marts on the Central Manager to enable scheduled collection of aggregated data over time. In essence, the distributed report data is stored on the Central Manager as a flat table, so no complex joins are required to create the report you want, which can significantly improve response time for these enterprise reports.

Distributed report data can be gathered from Collectors, Aggregators, and even Central Managers. The default distributed versions of the reports include the host name of the unit responsible for that data.

The following are predefined distributed reports:

Running Distributed reports: Immediate or scheduled

When you define a distributed report, you can run it immediately or schedule it to run in the background and gather the results to the Central Manager:

  • Immediate: This mode gathers data on demand (upon execution via the GUI) and displays results while gathering the results from the relevant managed units. The distributed report includes a status indicator that data is still in transit or that all data has been received from a particular managed unit. In this mode, data is not saved on the Central Manager. As soon as the report is closed, the data is gone.

  • Scheduled: This mode gathers data in advance to enable an instant response. At the time interval that you specify in the scheduler, all relevant aggregated data from the specified managed units is sent to a designated data mart table on the Central Manager machine, and a default report is created against this table. This table also has its own domain and entity, so you can create additional queries and reports with the query builder. Those reports can be added to an audit process to run the process periodically and assign its results to a Role, User, or User Group for review or sign-off.

Planning considerations for distributed reports
  • In a mixed environment where the Central Manager is 32-bit and managed units are 64-bit, the Distributed Report will not show information from the 64-bit systems. To see information in this situation, the Central Manager needs to be upgraded to 64-bit.

  • Because data sent to the Central Manager must be coordinated, it is critical that the clock on each managed unit is set to the correct local time for the time zone where that unit is located. Even a difference of ten minutes between the Central Manager and the managed units can impact the performance and reliability of distributed reports.

  • Scheduled distributed report definitions can be exported and imported; immediate distributed report definitions cannot. The schedule itself is not included in the exported and imported definition. Keep a record of the definitions and schedules in case you need to re-create them on another system, such as a backup or test Central Manager. A system backup does include distributed report configurations.

  • If you specify that report data is collected from both aggregators and collectors, it is conceivable that the default distributed report includes duplicate data (although the Guardium host name is different). In this case, it is best to specify only collectors or only aggregators for the distributed report configuration.

  • Distributed reports are based on existing non-distributed reports. When defining a distributed report in scheduled mode, if the original query includes run-time parameters, then you will be asked to provide those values (or wildcards, %).

  • Distributed report data now resides on your Central Manager in a database where it did not before, so plan for the resulting operational changes to purging, upgrades, and backup.
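
The clock requirement in the list above can be checked with a small script. The following is a minimal sketch, assuming you have already collected the current time of each unit as a timezone-aware timestamp. The function name, the host names, and the five-minute threshold are illustrative assumptions, not Guardium settings.

```python
from datetime import datetime, timedelta, timezone

def units_out_of_sync(reference, unit_clocks, max_skew=timedelta(minutes=5)):
    """Return the hosts whose clock differs from the reference by more than max_skew.

    reference and every value in unit_clocks must be timezone-aware, so that
    units in different time zones are compared on the same absolute timeline.
    """
    flagged = []
    for host, clock in sorted(unit_clocks.items()):
        if abs(clock - reference) > max_skew:
            flagged.append(host)
    return flagged

# Hypothetical readings: the Central Manager in UTC, two units at UTC-5.
cm_time = datetime(2014, 3, 31, 13, 0, tzinfo=timezone.utc)
est = timezone(timedelta(hours=-5))
clocks = {
    "collector-ny-1": datetime(2014, 3, 31, 8, 1, tzinfo=est),   # 1 minute ahead
    "collector-ny-2": datetime(2014, 3, 31, 8, 10, tzinfo=est),  # 10 minutes ahead
}
print(units_out_of_sync(cm_time, clocks))
```

Comparing timezone-aware values means a unit in another time zone is not flagged merely because its wall-clock time differs from that of the Central Manager.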

Creating a distributed report

Distributed report building is available only from an appliance that is configured as a Central Manager. To access the distributed report builder when logged in as an administrator, go to Reports > Report Configuration Tools > Distributed Report Builder.

From the Distributed Report Builder, you can select from a list of existing reports to modify the configuration or add to a pane, or click New to create a new distributed report. In general, any existing report on the Central Manager can be distributed immediately or run on a schedule (or both).

Creating a new distributed report

From the Report Builder, select New, which clears any existing data in the report builder. In the Based on Report pulldown, select one of the existing reports that are available for distribution. Each report from the list can be distributed once as immediate and once as scheduled. Those that are defined to be distributed immediately have the term (Immediate) appended to the distributed report name.

Select an existing report to create one distributed report.

In the Gather Data From section of the builder, choose All Managed Units (that the Central Manager is managing) or specify certain Groups and/or specific Managed Units.

Note: You can define managed unit groups from the Central Manager. Examples of groups are: Group of collectors versus aggregators; groups that are based on application, responsibility, or geography.

In the Operation Mode section of the builder, choose the report operation mode:

  • Immediate: Run the report when the user requests it. When you select this option there are no additional options to consider. You can click Apply to save the changes and then optionally click Add to Pane to add the report to the GUI.

  • Schedule: Run in a batch that prepares and gathers the data in advance.

With the Scheduled Report option, you specify the following additional values:

  • Time Granularity: Specify the time period for which the Data Mart is captured. The Data Mart extract is done at the next Time Granularity interval boundary and covers the time interval specified. The Data Mart extract for a DAYS Granularity starts at Midnight and runs every X days. The Data Mart extract for a HOURS Granularity starts at the next hour boundary and runs every X hours. The Data Mart extract for a MINUTES Granularity starts at the next X minute boundary and runs every X minutes. For example, if you specify a Time Granularity of 1-hour for the Count Of Failed Logins report, the count is based on an hourly aggregation of failed logins.

  • Purge After: Specify how long to keep the report data in the data mart before it is automatically purged.

  • Runtime parameters: Depending on what report you are basing the distributed report on, you must specify the runtime parameters. To see valid values for these fields, examine the query for the original report, or specify the wildcard, %.
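
The Time Granularity boundaries described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the sketch assumes that "Midnight" means the next midnight and that MINUTES boundaries are aligned to the top of the hour, which the text does not state explicitly.

```python
from datetime import datetime, timedelta

def first_extract_time(now, unit, x):
    """Return the first extract time strictly after now for a granularity of x units."""
    if unit == "DAYS":
        # Assumption: the extract starts at the next midnight.
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + timedelta(days=1)
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    if unit == "HOURS":
        return top_of_hour + timedelta(hours=1)
    if unit == "MINUTES":
        # Assumption: X-minute boundaries are counted from the top of the hour.
        completed = now.minute // x
        return top_of_hour + timedelta(minutes=(completed + 1) * x)
    raise ValueError(f"unknown granularity unit: {unit}")

def extract_interval(unit, x):
    """Return how often the extract repeats: every x days, hours, or minutes."""
    return timedelta(**{unit.lower(): x})

now = datetime(2014, 3, 31, 10, 7, 30)
for unit, x in (("DAYS", 1), ("HOURS", 1), ("MINUTES", 15)):
    print(unit, x, first_extract_time(now, unit, x), extract_interval(unit, x))
```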

Click Apply. When the system is done saving the distributed report configuration, Modify Schedule and Roles are activated.

To create the schedule, click Modify Schedule, which takes you into the general-purpose scheduler.

The schedule definition is pushed down to the managed units and tells each managed unit when and how often to send the aggregated data to the Central Manager.

To specify which roles can see this distributed report, click Roles.

Modifying an existing distributed report

For existing distributed reports, you can:

  • Change the configuration, including managed units, schedule details, or runtime parameters

  • Add a report to a dashboard

  • Delete a distributed report

  • Create a scheduled report that is based on an existing immediate report. This option replaces the immediate report. You cannot create an immediate report from an existing scheduled report.

To select an existing report, use the text search box or scroll through the list of existing reports and select the one you want to modify.

Viewing distributed reports

The following additional columns are included in distributed reports:

  • Source: The Guardium system where the data was gathered from.

  • TZ: Time Zone - because the Guardium system might be located in a different time zone from the Central Manager.

  • Date: This column shows the Start Period time for scheduled reports and enables grouping of results by hour or day. For Immediate mode, this column shows the start time and is not meaningful.
    Note: Only a maximum of three date fields are permitted.
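
To show how these columns can be used, the following sketch totals a value by Date and by Source. The row layout, host names, and the "Failed Logins" column are hypothetical stand-ins for whatever your base report returns; only the Source, TZ, and Date column names come from the list above.

```python
from collections import defaultdict

# Hypothetical consolidated rows, one per source and period.
rows = [
    {"Source": "collector-ny-1", "TZ": "EST", "Date": "2014-03-31 08:00", "Failed Logins": 4},
    {"Source": "collector-ca-1", "TZ": "PST", "Date": "2014-03-31 08:00", "Failed Logins": 2},
    {"Source": "collector-ny-1", "TZ": "EST", "Date": "2014-03-31 09:00", "Failed Logins": 1},
]

def totals_by(rows, key, value="Failed Logins"):
    """Sum the value column for each distinct entry in the key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

# Grouping by Date combines rows that share a local period label.
# The TZ column is a reminder that 08:00 on two sources may be different instants.
print(totals_by(rows, "Date"))
print(totals_by(rows, "Source"))
```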

Edit and update

To change a distributed report, edit and update the base report first, and then update the distributed report to match the updated report structure.

If you change the columns in a base report, or add or remove the where clause in the base report, save and regenerate the base report. Then click "Save report changes" on the existing distributed report for the changes to take effect.

To update an existing report parameter, first click "Apply report changes", then update the parameter value, and then click "Save report changes" for the updates to take effect.

More about time

When running a report, the report customizer lets you specify an absolute time window for the query (from 3-31-2014 8:00am to 3-31-2014 11:00am) or a relative time window (NOW -3 HOUR).

For absolute time, each Guardium system will run in its local time. For example, if a distributed report gathers data from Guardium systems in Eastern Standard Time (EST) and Pacific Standard Time (PST), then each system will execute the query based on local time. In the example (useful for checking morning peak hours, midnight or any specific absolute time), a system in New York will gather the results from 08:00 - 11:00 EST and a system in California will gather the results from 08:00 – 11:00 PST.

For a relative time specification, each system runs NOW -N according to the current time on that system. This is important for real-time reports. Absolute time cannot be used for real-time or near real-time reports. Use the Immediate mode for real-time monitoring.
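
The difference between the two kinds of time window can be made concrete with a short sketch. It assumes fixed standard-time offsets for EST (UTC-5) and PST (UTC-8) and ignores daylight saving time; the function names are hypothetical and only model the behavior described above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets, no daylight saving adjustments.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")

def absolute_window(tz, start_hour, end_hour, day=(2014, 3, 31)):
    """Window with the same wall-clock hours, interpreted in the local zone of a system."""
    return (datetime(*day, start_hour, tzinfo=tz), datetime(*day, end_hour, tzinfo=tz))

def relative_window(now, hours_back):
    """Window that ends at the current time on the system (NOW -N HOUR)."""
    return (now - timedelta(hours=hours_back), now)

ny = absolute_window(EST, 8, 11)
ca = absolute_window(PST, 8, 11)
# Same local hours, but the California window starts three hours later in absolute terms.
print(ca[0] - ny[0])

now_utc = datetime(2014, 3, 31, 16, 0, tzinfo=timezone.utc)
# With synchronized clocks, relative windows cover the same instants on both systems.
print(relative_window(now_utc.astimezone(EST), 3) == relative_window(now_utc.astimezone(PST), 3))
```

This is why a relative window suits near real-time reports: every system covers the same recent period, provided that the clocks are correctly set.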

Viewing Distributed Report Status

Every distributed report is accompanied by a status report that shows which machines succeeded in returning results and which did not. The link to access the status report is highlighted when you navigate to the report in the GUI.

For scheduled reports, clicking a line in the status report lets you run an API that reruns the report on the specific units.

If a specific run of a distributed report in Scheduled mode returns an error, you can rerun the report from the status report as follows:

  1. Double click on one of the rows in the status report to bring up the Invoke menu. Click on Invoke.

  2. Click the selection, rerun_distributed_report.

  3. This will open up a pop-up screen that lets you choose the specific run to rerun. Any row of the report can be opened, but only rows with ERROR status can be rerun.
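
The rule in the last step, that only rows with ERROR status can be rerun, amounts to a simple filter. The sketch below uses a hypothetical row layout (the "unit" and "status" keys and the "OK" value are placeholders); only the ERROR status comes from the text above.

```python
def rerunnable(status_rows):
    """Keep only the status rows that are eligible for a rerun."""
    return [row for row in status_rows if row["status"] == "ERROR"]

# Hypothetical status report contents.
status_rows = [
    {"unit": "collector-ny-1", "status": "OK"},
    {"unit": "collector-ca-1", "status": "ERROR"},
    {"unit": "aggregator-eu-1", "status": "ERROR"},
]

for row in rerunnable(status_rows):
    print("can rerun on", row["unit"])
```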

GuardAPI for Rerun Distributed Report

The rerun command that is invoked from the status report in the GUI can also be run as a GuardAPI command.

Syntax

grdapi rerun_distributed_report

Distributed Report - Immediate

This diagram illustrates the process to run an Immediate Distributed Report.

Distributed Report - Scheduled

This diagram illustrates the process to schedule a Distributed Report.

Distributed Report enhancement - set Target system to any Guardium system

The Distributed Report feature distributes the query request to the specified Guardium systems, gathers the data into the Target system, consolidates the results, and provides views of the consolidated results. The results are available in the Query Builder for defining additional queries.

The Distributed Report feature can now set the Target system to any Guardium system. Previous versions did not allow setting the Target system; results always went to the Central Manager (CM).

Requirement justification

In many cases the CM is overloaded (regardless of the Distributed Report feature), and the CM is sometimes also used as an Aggregator, which adds load to the CM.

In those cases it is much more efficient to let the user choose the target system.

Solution

  • A target System can be set for each Distributed Report. A CLI command is available to set the optional Target systems. The list set via the CLI is shown in the Distributed Report builder GUI.

  • Important note: This change affects the Distributed Report Scheduled mode only. The Immediate mode is not included in this change! This means that the ad-hoc distributed report result viewer is accessible via the CM only.

  • The Distributed Report definition is still editable via the CM only.

GUI Change

  • A new field "Send Data To" is added to the Distributed Report Builder screen to enable the user to set the target System(s) (either Collector(s) or Aggregator(s)) for the Distributed Report.

  • This field is relevant only in case of Scheduled Mode (otherwise, it is disabled).

  • The default is set to the CM.

  • The list of available Target Systems is limited to the Systems that were set via the CLI (see CLI list below).

  • The Distributed Report definition is editable via the CM and View-Only via the target.

  • The "Add To Pane" of the report (adding the report viewer to the menu) is available from the definition screen on the Target System and CM.

  • This option is available on the CM even if the CM is not the Target System for that report. This makes it possible to view the Distributed Report Status on the CM, although no data is displayed in the report itself.

The CLI commands (available via the CM only)

1. Set System as a Target System

grdapi set_distributed_report_target target_host_name=[unit host name]

2. Cancel a System as a Target System

grdapi cancel_distributed_report_target target_host_name=[unit host name]

If any distributed reports still use this unit as their target, the command returns an error and the list of those reports.

3. Get the list of Target Systems

grdapi get_distributed_report_target_info
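
If you script these commands, the command lines can be assembled as in the sketch below. It only formats the three commands shown above and models the cancel rule in memory; it does not call Guardium, and the helper names, host names, and report names are hypothetical.

```python
def target_command(action, host=None):
    """Build one of the three grdapi command lines listed above."""
    if action == "list":
        return "grdapi get_distributed_report_target_info"
    if action not in ("set", "cancel"):
        raise ValueError(f"unknown action: {action}")
    if not host:
        raise ValueError("a unit host name is required")
    return f"grdapi {action}_distributed_report_target target_host_name={host}"

def reports_blocking_cancel(host, report_targets):
    """Return the reports that still send data to host, sorted by name.

    A non-empty result corresponds to the case where the cancel command
    returns an error and the list of such reports.
    """
    return sorted(name for name, target in report_targets.items() if target == host)

# Hypothetical mapping of scheduled distributed reports to their target systems.
report_targets = {"Failed Logins (EU)": "agg-eu-1", "Failed Logins (US)": "agg-us-1"}

print(target_command("set", "agg-eu-1"))
print(reports_blocking_cancel("agg-eu-1", report_targets))
```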

Additional CLI commands

For scheduled distributed reports, store or show the maximum number of rows per unit.

show scheduled_distributed

store scheduled_distributed

The store command has one parameter, maximum_rows_per_unit. If the value of that parameter is greater than 15,000 or is equal to 0 (no limit), the user sees a warning message:

"Depending on number of collectors, setting maximum number of rows per unit to a high value might have negative impact on performance".
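
The condition that triggers this warning can be written as a one-line check. The function and constant names below are hypothetical; the threshold of 15,000 and the meaning of 0 come from the description above.

```python
WARNING_THRESHOLD = 15_000  # values above this trigger the warning

def needs_warning(maximum_rows_per_unit):
    """Return True when the value is 0 (no limit) or greater than the threshold."""
    return maximum_rows_per_unit == 0 or maximum_rows_per_unit > WARNING_THRESHOLD

for value in (0, 500, 15_000, 15_001):
    print(value, needs_warning(value))
```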