Distributed Report Builder
This central manager feature provides a way to automatically gather data from all or a subset of the Guardium managed units that are associated with this particular central manager. Distributed reports are designed to provide a high-level view, to correlate data from across data sources, and to summarize views of the data. Continue to use aggregators for the row level data gathering across collectors.
This capability addresses a problem that arises in complex enterprise environments: users do not always know which managed unit holds the data that is required for a particular report. This can happen because the link between Guardium collectors and databases can change over time, for example depending on configuration options such as load balancing. The time period of the report and the data retention policy on the aggregator and collectors complicate this further.
To create a distributed report, define it on the Distributed Report Configuration page and add it to My Dashboard.
Distributed reports optionally make use of data marts on the central manager to enable scheduled collection of aggregated data over time. The distributed report data is stored as a flat table, so no complex joins are required to create the report you want. The report aggregates summarized and analyzed data from all units to enable a high-level/corporate view in a reasonable response time.
Distributed report data can be gathered from collectors, aggregators, and even central managers. The default distributed versions of the reports include the host name of the unit responsible for that data.
- Immediate reports present, on demand, a limited amount of data from each unit in the report's "gather data from" list.
- Scheduled reports run in the background as defined by their schedule and time granularity, and save data in a table on the primary target and, if defined, the secondary target.
- Enterprise S-TAP Verification
- Aggregation/Archive Log
- Failed User Login Attempts
- Scheduled Jobs Exceptions
Prerequisite: create a group of managed units on the Central Management screen.
- Create Distributed Report.
- Review the data gathered.
- Create additional summary reports on the data gathered.
- Running Distributed reports: Immediate or scheduled
When you define a distributed report, you can run it immediately or schedule it to run in the background and gather the results on the Central Manager:
- Immediate: This mode gathers data on demand (upon execution via the GUI) and displays results
while gathering the results from the relevant managed units. The distributed report includes a
status indicator that data is still in transit or that all data has been received from a particular
managed unit. In this mode, data is not saved on the Central Manager. As soon as the report is
closed, the data is gone. Results for immediate reports are limited to 100
rows.
Immediate reports have two export options: Download Display records and Download as PDF.
- Scheduled: This mode gathers data in advance to enable an instant response. At the time interval you specify in the scheduler, all relevant aggregated data from the specified managed units is sent to a designated data mart table on the Central Manager machine, and a default report is created against this table. This table also has its own domain and entity, so you can create additional queries and reports with the query builder. Those reports can be added to an audit process to run the process periodically and assign its results to a Role, User, and/or User Group for review or sign-off. Results for scheduled reports are limited to 10000 rows by default, but the limit is configurable with the store scheduled_distributed CLI command.
Scheduled reports have these export options: Download all records, Download display records, Full printable report, and Download as PDF.
- Planning considerations for distributed reports
- In a mixed environment where the Central Manager is 32-bit and managed units are 64-bit, the Distributed Report will not show information from the 64-bit systems. To see information in this situation, the Central Manager needs to be upgraded to 64-bit.
- Because data sent to the Central Manager must be coordinated, it is critical that the clock on every managed unit is set to the correct local time for the time zone where that unit is located. Even a difference of ten minutes between the Central Manager and the managed units can affect the performance and reliability of distributed reports.
- Scheduled distributed report definitions can be exported and imported; immediate distributed report definitions cannot. The schedule itself is not included in the exported and imported definition. Keep a record of the definitions and schedules in case you need to re-create them on another system, such as a backup or test Central Manager. System backup does include distributed report configurations.
- If you specify that report data is collected from both aggregators and collectors, it is conceivable that the default distributed report includes duplicate data (although the Guardium host name is different). In this case, it is best to specify only collectors or only aggregators for the distributed report configuration.
- Distributed reports are based on existing non-distributed reports. When defining a distributed report in scheduled mode, if the original query includes run-time parameters, then you will be asked to provide those values (or wildcards, %).
- You can choose the target Guardium system for each scheduled distributed report. By default, the target is the central manager. The list of available target systems is set with the GuardAPI command grdapi set_distributed_report_target target_host_name=[unit host name]. The target system will hold data in its database that it did not have before it became a target. Plan ahead for the resulting operational changes to purging, upgrades, and backup.
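The duplicate-data consideration in the list above can be illustrated with a small Python sketch. It assumes the gathered rows are available as dictionaries and that the host name column is called `guardium_host`; the column names and sample rows are hypothetical and are not taken from the product.

```python
from collections import defaultdict

def find_cross_unit_duplicates(rows, host_key="guardium_host"):
    """Group rows that are identical except for the reporting host name."""
    groups = defaultdict(list)
    for row in rows:
        # Build a hashable signature from every column except the host name.
        signature = tuple(sorted((k, v) for k, v in row.items() if k != host_key))
        groups[signature].append(row[host_key])
    # Keep only signatures that were reported by more than one unit.
    return {sig: hosts for sig, hosts in groups.items() if len(hosts) > 1}

# Hypothetical sample: the same record reported by a collector and by its aggregator.
rows = [
    {"guardium_host": "collector1", "db_user": "scott", "failed_logins": 3},
    {"guardium_host": "aggregator1", "db_user": "scott", "failed_logins": 3},
    {"guardium_host": "collector2", "db_user": "admin", "failed_logins": 1},
]
duplicates = find_cross_unit_duplicates(rows)
```

If a check like this finds matches, that suggests the report is gathering from both tiers, and restricting it to only collectors or only aggregators is the simpler fix.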
- Edit and update
To change a distributed report, first edit and update its base report, then update the distributed report to match the updated report structure.
If you change the columns in a base report, or add or remove the where clause, and then save and regenerate the report, you only need to click "Save report changes" on the existing distributed report for the changes to take effect.
To also update an existing report parameter, first click "Apply report changes", then update the parameter value, then click "Save report changes" for the updates to take effect.
- More about time
When running a report, the report customizer lets you specify an absolute time window for the query (from 3-31-2014 8:00am to 3-31-2014 11:00am) or a relative time window (NOW -3 HOUR).
For absolute time, each Guardium system will run in its local time. For example, if a distributed report gathers data from Guardium systems in Eastern Standard Time (EST) and Pacific Standard Time (PST), then each system will execute the query based on local time. In the example (useful for checking morning peak hours, midnight or any specific absolute time), a system in New York will gather the results from 08:00 - 11:00 EST and a system in California will gather the results from 08:00 – 11:00 PST.
For a relative time specification, each system runs NOW -N according to the current time on that system. This is important for real-time reports. Absolute time cannot be used for real-time or near real-time reports. Use the Immediate mode for real-time monitoring.
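The difference between the two kinds of time window can be sketched in Python. This is only an illustration of the behavior described above, not Guardium code: the function names are hypothetical, and EST and PST are modeled as fixed UTC offsets with daylight saving time ignored.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for the two zones named in the text (daylight saving is ignored here).
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")

def absolute_window(start, end, unit_tz):
    """Interpret a naive wall-clock window in the unit's own local time."""
    return start.replace(tzinfo=unit_tz), end.replace(tzinfo=unit_tz)

def relative_window(hours, unit_now):
    """Resolve NOW -N HOUR against the unit's current time."""
    return unit_now - timedelta(hours=hours), unit_now

start = datetime(2014, 3, 31, 8, 0)
end = datetime(2014, 3, 31, 11, 0)
ny_start, ny_end = absolute_window(start, end, EST)
ca_start, ca_end = absolute_window(start, end, PST)

# The same wall-clock window covers different instants on each unit.
offset = ca_start - ny_start

# A relative window covers the same instants on every unit whose clock is correct.
now_utc = datetime(2014, 3, 31, 16, 0, tzinfo=timezone.utc)
ny_rel = relative_window(3, now_utc.astimezone(EST))
ca_rel = relative_window(3, now_utc.astimezone(PST))
```

Here `offset` is three hours, so the California unit reports on a window that starts three hours after the New York one, while `ny_rel` and `ca_rel` describe the same three-hour span.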
- Viewing Distributed Report Status
Every distributed report is accompanied by a status report that shows which machines succeeded in returning results and which did not. The link to the status report is highlighted when you navigate to the report in the GUI.
For scheduled reports, clicking a line in the status report lets you invoke the API that reruns the report on the specific unit(s).
If a particular run of a scheduled distributed report returns an error, you can rerun the report from the status report as follows:
- Double-click one of the rows in the status report to bring up the Invoke menu. Click Invoke.
- Select rerun_distributed_report.
- A pop-up screen opens that lets you choose the specific run to rerun. Any row of the report can be opened, but only rows with ERROR status can be rerun.
- GuardAPI for Rerun Distributed Report
The rerun command that is invoked from the status report in the GUI is also available as a GuardAPI command.
Syntax
grdapi rerun_distributed_report
This diagram illustrates the process to run an Immediate Distributed Report.
This diagram illustrates the process to schedule a Distributed Report.
- Distributed Report enhancement - set Target system to any Guardium system
The Distributed Report feature distributes the query request to the specified Guardium systems, gathers the data on the Target system, consolidates the results, and provides views of the consolidated results. The results are available in the Query Builder for defining additional queries.
The Distributed Report feature can now set the Target system to any Guardium system. Previous versions did not allow setting the Target system; results always went to the Central Manager (CM).
Requirement justification
In many cases the CM is overloaded (regardless of the Distributed Report), and the CM is sometimes also used as an Aggregator, which adds to its load.
In those cases it is much more efficient to let the user determine the target system.
Solution
- A Target system can be set for each Distributed Report. A CLI command is available to set the optional Target systems. The list set via the CLI is shown in the Distributed Report builder GUI.
Note: This change affects the Distributed Report Scheduled mode only. The Immediate mode is not included in this change. This means that the ad-hoc distributed report result viewer is accessible via the CM only.
- The Distributed Report definition is still editable via the CM only.
The CLI commands (available via the CM only)
1. Set System as a Target System
grdapi set_distributed_report_target target_host_name=[unit host name]
2. Cancel a system as a Target system
grdapi cancel_distributed_report_target target_host_name=[unit host name]
If distributed reports still use this unit as a target, the command returns an error and the list of those reports.
3. Get list of Target systems
grdapi get_distributed_report_target_info
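When several units need to be registered as targets, the command lines above can be generated rather than typed by hand. The following Python sketch only builds the text of the three commands listed above; it does not connect to or run anything on a Guardium system, and the helper name and host names are hypothetical.

```python
def target_commands(action, hosts):
    """Build one grdapi command line per host for the given target action."""
    api_names = {
        "set": "set_distributed_report_target",
        "cancel": "cancel_distributed_report_target",
    }
    api = api_names[action]  # raises KeyError for an unknown action
    return [f"grdapi {api} target_host_name={host}" for host in hosts]

# Hypothetical host names, used only for illustration.
hosts = ["agg1.example.com", "agg2.example.com"]
set_cmds = target_commands("set", hosts)
cancel_cmds = target_commands("cancel", hosts[:1])
list_cmd = "grdapi get_distributed_report_target_info"

# Print the lines so they can be pasted into the CLI on the CM.
for line in set_cmds + [list_cmd]:
    print(line)
```

Ending the batch with get_distributed_report_target_info is a convenient way to confirm the resulting list of targets.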
Additional CLI commands:
For scheduled distributed reports, store or show the maximum number of rows per unit:
show scheduled_distributed
store scheduled_distributed
The store command has one parameter, maximum_rows_per_unit. If the value of that parameter is greater than 15,000 or equals 0 (no limit), the following warning message displays:
Depending on number of collectors, setting maximum number of rows per unit to a high value might have negative impact on performance.
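The warning rule for maximum_rows_per_unit can be expressed as a short Python check, which may be useful for validating a planned value before changing the setting. This is a sketch of the rule as stated above, not the product's implementation; the function and constant names are hypothetical.

```python
WARNING_THRESHOLD = 15_000  # values above this trigger the warning, per the text
NO_LIMIT = 0                # 0 means "no limit" and also triggers the warning

def triggers_warning(maximum_rows_per_unit):
    """Return True when the stated rule says the performance warning is displayed."""
    return maximum_rows_per_unit == NO_LIMIT or maximum_rows_per_unit > WARNING_THRESHOLD

# Evaluate the rule for the default (10000), the boundary, and the two warning cases.
checks = {value: triggers_warning(value) for value in (0, 10_000, 15_000, 15_001)}
```

Note that the boundary value 15,000 itself does not trigger the warning, because the rule is "greater than".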