Optimizing queries

To understand how to optimize queries, it is imperative to have an understanding of how queries work and how reported data is stored internally on the appliance. Each domain is associated with a particular predefined set of data. For example, Access domain for captured traffic, Exception domain for captured errors from the database server, or Guardium activity domain to monitor activities that are performed by Guardium users.

Most of the domains deal with small amounts of data and are irrelevant for this performance description. The Access domain is one domain that deals with large amounts of data that is used often. This domain contains database activity data that is captured by an appliance. This domain is composed of many tables with many millions of rows of data in each table. Carefully consider your decisions when you are designing queries in this domain. Each entity in the domain corresponds to an underlying table and has a list of attributes. Attributes correspond to fields in the table. When you select fields (attributes) from the Session entity and some other fields from the SQL entity, these two entities are joined to retrieve the data that you request. The more entities that are joined in your query, the more complex the final query is and the longer it takes to produce the results.

Guardium monitors and captures numerous details about database user activity. All of this information can be put into the following major categories.

Who: Describes a connection to a database, who made a connection, and when the connection was made.
What: Contains the SQL statements that ran on the database.

Connection details include attributes (fields), such as Server IP, DB Type, DB User Name. This information is recorded in the Client/Server and Session entities. Two entities are used: even though the login information (for example, IP, user name) of a user stays the same for every connection, every connection has unique information, for example, login time or client port. The relatively static, repeatable login information is stored in the Client/Server entity and the unique, connection-specific information is stored in the Session entity. Splitting information between two entities helps to reduce data redundancy and saves disk space.

In addition to login information, Guardium captures the SQL statements that are issued by the user or an application. The SQL statements are recorded in the SQL entity. To create queries with conditions on specific groups of tables, or sets of commands, Guardium parses captured SQLs to commands, objects, and fields, and places this information in three other entities: Commands, Objects, and Fields.

If you want to create a query that shows only the activity on a particular table, you can create a query with a condition, such as
Where OBJECT.OBJECT_NAME = 'myTable'
Or, if you want to create a query that shows only DML activity, you can create query condition, such as
WHERE COMMAND.VERB in group 'DML commands'
The redundancy here helps to create more efficient queries. When you create a query, you first enter a query name, then you select a main entity. It is important to select a main entity that indicates to the Query-Report builder the focal point for the new query and how to construct a query. Ultimately, it might also affect query performance.

For example, you can create two queries with identical fields, one with the main entity as Session, and the other with Command. The attributes are:

Session start time
Session end time
Client IP
Server IP
DB user name
Source program

The following query is generated by the Query-Report builder with the Session main entity:
select ... from GDM_ACCESS, GDM_SESSION where.... The following query is generated by the Query-Report builder with the Command main entity:
select ... from GDM_ACCESS, GDM_SESSION, GDM_CONSTRUCT_INSTANCE, GDM_SENTENCE where...
Both queries have the same columns. However, the first query joins two tables to produce the results and the second query has four tables that participated. The second query takes longer to complete. Even more important, most likely the second report has more records and some of the rows appear multiple times.

When Command is selected as the main entity, the Query-Report builder defines the report with the focus on “command”. A session usually has many commands. Each command appears on the report in a separate row. Even if a session has no commands, there is a row for the session in the report, with an empty Command column. Main entities are organized hierarchically from high-level details to more granular. The main entity defines the level of details in the report. Selecting a main entity on too high a level in the list might limit your ability to select fields to report. An example is a single SQL statement with multiple fields. If you select SQL as a main entity, your level of detail is an SQL statement and each line in the report is dedicated to one SQL statement. This means that you cannot display fields in the same line because there is no space for multiple fields.

However, you can use the Count function to display total count of the fields in an SQL statement or the Max function to display the highest field value. If you query definition has SQL as the main entity for database activity, you cannot add the SQL Verb field from the Command entity because the Command entity is positioned lower than SQL entity in the entity list. Therefore, you cannot use the field value directly but you can apply the value to one of the math functions, such as Count, Min, or Max.

When you are designing a new query, consider the relationships between entities to avoid data redundancy in reports.

Certain database operations (such as GROUP BY, DISTINCT, ORDER BY, or HAVING clauses) provide flexibility to the Query-Report builder. However, these operations might take more processor time. If you have report performance issues, consider revising your report to limit the usage of these database operations.

In general, the data volume that is stored on the appliance is the major factor that can affect the report performance. When you tune the report performance, consider the following points:

Define the purge process to run nightly.
Configure the data retention period to the minimum that is allowed by your business requirements.
Record Full SQL only when it is necessary (for example, to monitor sensitive objects or privileged users). Full SQL tables can add data volume quickly.
Reduce the period of the report to have a positive effect on the report run time.

In centrally managed environments, reports frequently run on the aggregators instead of the collectors. Data that is exported from the collectors nightly in one-day chunks is transferred to the aggregator. On an aggregator, data from multiple collectors and multiple days is merged for reports. For efficiency reasons, not all of the data that is presented on an aggregator is merged and made available for reports. The merge period defines the date range that you can use in reports. It shows on top of every report you run. The default merge period is 14 days for interactive reports. The merge period is a derived internally. The merge period doesn't impact the date range available for audit processes.

Tip: Keep your purge period at a maximum of 90 days, and the maximum number of collectors that report to an aggregator at 10 to ensure timely data aggregation (and keep the merge period at 14).