- Too much information
- Summary of enhancements in V10.1.2
- How outlier detection can help: Scenarios
- How outlier detection works
- Configuring outlier detection
- Validating outliers is working
- Interpreting the outlier results
- Customizing outlier detection
- Operational considerations
- Downloadable resources
- Related topics
New and enhanced Guardium Outlier Detection
Detect anomalous data and file access behavior
Too much information
The infamous 2013 data breach at a major retailer revealed that security alarms raised by their monitoring software were often ignored or at least deemed not worth further investigation. This should not be a surprise. Security analysts are bombarded with false positives and without any indication of relative risk, there is no way to prioritize the analysis.
Even worse, after attackers are inside the network, they are often able to steal additional credentials and gain unfettered access to the "crown jewels" resident on an organization's database servers. The attackers can then take their time, operating over a long period without detection. Similarly, SQL injection attacks that occur under application privileges might also be able to access sensitive data under the cloak of normalcy.
This article describes a way to extend traditional database monitoring with increased intelligence to help you understand the risks based on relative changes in behavior.
Summary of enhancements in V10.1.2
Since the earlier version of this article was published, many enhancements have been made to outlier detection. In summary, these changes include:
- Learning (formerly called training) is now done every hour.
- New and more granular outlier reason indicators.
- Outlier mining can be run on an aggregator now, not just a collector, giving you a wider view of potential outlier threats.
- Support for file activity monitoring.
- A new outlier mining status screen to help you determine whether outlier detection is configured and running correctly.
- Prior to 10.1.2, “volume” was used to refer to the number of activities (or errors). Now the volume is the number of records affected by the activities or the number of errors that occurred.
- The number of records affected comes from the data Guardium is collecting. (This requires that you enable Log Records Affected in the inspection engine configuration.)
- The outlier detection capability has been revised and has led to changes in the API. If you are on a version of outlier detection prior to V10.1.2, please see the older version of this article.
This article has been updated to reflect those changes.
How outlier detection can help: Scenarios
Outlier detection flags relative changes in behavior. For example, if Joe the DBA is observed accessing a particular table many more times than he has in the past, it could be that he is slowly downloading small amounts of data over time. If an application generates more SQL errors than it has in the past, maybe a SQL injection attack is under way.
Consider a valid bank transfer transaction, which can translate to tens or hundreds of SQL statements that ensure proper validation and authorizations, get the account number, calculate balance details, and so on. An attacker or malicious insider will most likely access a large number of records, whereas normal application activity affects only a small number. Guardium outlier detection looks at the number of records affected to determine higher-than-normal volumes.
A disgruntled DBA decides to extract the entire contact list into a CSV file or other format that they can take with them, or dump all the sensitive data into a database. The outlier detection algorithm can detect that access to this particular source is coming from a DBA who is not supposed to extract from operational data. There might also be an increase in exceptions (errors) as the user bypasses access control mechanisms and tries to learn the structure and privileges needed. The volume that is created by the download within the time window is probably exceptional as well. Any or all of these incidents could trigger an outlier indication.
A DBA attempts to put some buggy code into stored procedures that will "blow up" after they are gone to prove how much they were needed. The algorithm can identify that there is an exceptional volume of errors because DBAs are not supposed to access the stored procedure object. It can also detect if the DBA was temporarily granted elevated privileges to hack the stored procedure.
How outlier detection works
IBM Guardium Data Activity Monitoring and File Activity Monitoring include an advanced Machine Learning algorithm to aid in the early detection of possible attacks during operation. The algorithm automatically models the normal patterns of a user's activity without requiring any supervision. That is, it is based mainly on unsupervised learning techniques.
The outlier detection algorithm uses data that is being collected normally for security and compliance reasons. If data is not being audited already by a Guardium security policy, it is not available for Guardium to analyze.
Look at Figure 1 for a quick review of how Guardium collects audit data during its normal operations.
Figure 1. How Guardium collects and logs database event data
- A database user logs in and enters a database command.
- The Guardium software tap (S-TAP) on the database server captures the activity and sends a copy to the Guardium collector, which is a hardened hardware or software appliance.
- The analysis engine on the collector parses the command and breaks it down into its component parts for reporting or other analysis. That logged activity includes the user, the time they issued the command, where they logged in from (client IP), what source program they used (JDBC, for example), which database server they accessed, and what data was accessed.
Outlier detection operates on a subset of audit data that is transparently extracted from the collected audit data regularly into a separate data mart. This extraction occurs hourly. There are two processes to outlier detection: learning and analysis.
After the initial learning is done, analysis is activated. With analysis, new activities are compared against the existing model that represents the normal behavior. Events that fall outside of the established model norms are assigned an anomaly score and a reason for the score.
The next sections go into more detail on the learning and analysis processes.
With learning, there is both an initial learning period to create the initial models and then ongoing learning, which refines the models based on new information. Learning examines the following aspects of user activity:
- Who: Database user and operating system user. For file activity monitoring, this is the OS User
- What: The object – table, stored procedure, view, or synonym – and the verb, such as select, update, delete, or insert. For files, the object is the file name and the verb is the file command.
- When: Statically determined work hours (9 AM to 5 PM), weekends (Saturday and Sunday), and off hours. (Work hours can be customized.)
- Where: The source program, the database name, and database server. For file activity monitoring, this is the file server.
- How many: How many actions were performed? How many records were affected? (Note that records affected is available only if the Log Records Affected inspection engine configuration is enabled.)
The normal behavior of the activity and the users (roles) are modeled from the perspective of each of the above aspects.
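To make these aspects concrete, here is a minimal Python sketch of an activity record and the time bucketing described above. The names (`Activity`, `time_bucket`) and field layout are illustrative assumptions, not Guardium's internal schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    """One audited activity, broken down into the aspects the model learns."""
    db_user: str            # Who: database user (OS user for file monitoring)
    object_name: str        # What: table, view, procedure, or file name
    verb: str               # What: select, update, delete, ... or a file command
    hour: int               # When: hour of day (0-23)
    weekday: int            # When: 0=Monday ... 6=Sunday
    source_program: str     # Where: client program that issued the command
    server: str             # Where: database or file server
    records_affected: int   # How many: requires Log Records Affected

def time_bucket(a: Activity) -> str:
    """Classify the activity into the default time buckets described above."""
    if a.weekday >= 5:
        return "weekend"
    return "work_hours" if 9 <= a.hour < 17 else "off_hours"

a = Activity("assange", "CONTACTS", "select", 14, 2, "SQLPLUS", "db01", 120)
print(time_bucket(a))  # -> work_hours
```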
Figure 2 below shows a diagram that illustrates the learning process.
Figure 2. Learning process for outliers
The initial learning needs 7 days or so to build the initial model and generate alerts, but it will take 3-4 weeks of learning to create an improved model. Learning runs hourly and updates the model every hour as well. You can reduce the initial learning period by using a Guardium API command, but this is not generally recommended; because the model is not mature, you can expect many more false positives.
Figure 3 below is a highly simplified representation of the output of a weekly outlier model for user 'assange' and his use of temporary tables. It provides an average for his use of temporary tables (18) during working hours and a standard deviation (3).
Figure 3. Weekly training generates enhanced models of normal behavior
Important: The past use of temporary tables is just one of many aspects of a user's behavior that is trained and modeled to enable comparison with new observed behavior. Other aspects include, for example, use of temporary applications, volume of activity, time of day/week of activity, rareness of activity, and others. Each of these aspects is handled similarly to what is shown above to create separate dependent or independent scores. The scores are combined and weighted to produce the final score.
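As a rough illustration of the combine-and-weight step, the sketch below averages per-aspect scores using assumed weights. The aspect names and weights are hypothetical; Guardium's actual weighting scheme is not published:

```python
def combined_score(aspect_scores: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Weighted average of per-aspect anomaly scores, clamped to [0, 1]."""
    total_w = sum(weights.get(k, 0.0) for k in aspect_scores)
    if total_w == 0:
        return 0.0
    s = sum(v * weights.get(k, 0.0) for k, v in aspect_scores.items()) / total_w
    return max(0.0, min(1.0, s))

# Hypothetical per-aspect scores for one user-hour, volume weighted highest.
scores = {"volume": 0.9, "rareness": 0.4, "temp_tables": 0.8}
weights = {"volume": 2.0, "rareness": 1.0, "temp_tables": 1.0}
print(round(combined_score(scores, weights), 2))  # -> 0.75
```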
Relearning: After the model is built, learning is re-executed hourly.
Analysis is the runtime comparison of the learned models of typical historical behavior against new activities that are captured by Guardium. When the models have been adequately trained, analysis occurs on incoming data activity, and outlier data begins appearing in the Guardium interface and reports.
Figure 4 below shows the process flow when both learning and analysis are active.
Figure 4. Outlier detection process flow: Learning and analysis
Figure 5 below shows that during normal work hours, user 'assange' increased his use of temporary tables beyond what was modeled based on previous behavior. The increase was significant enough to generate an outlier signal.
Figure 5. Hourly runtime analysis triggers an outlier event
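The mean-and-standard-deviation comparison in Figures 3 and 5 can be sketched as a simple threshold test. This is a deliberate simplification for illustration; the real algorithm models many aspects and produces a combined anomaly score:

```python
def is_outlier(observed: float, mean: float, stddev: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations above the learned mean."""
    if stddev <= 0:
        return observed > mean
    return (observed - mean) / stddev > k

# Figure 3's model: an average of 18 temp-table accesses, stddev of 3.
print(is_outlier(19, 18, 3))  # within normal variation -> False
print(is_outlier(35, 18, 3))  # (35 - 18) / 3 is about 5.7 sigma -> True
```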
Remember, the data mart is continually being fed new activity data based on your security policies and thus, new outliers can be detected based on the configured detection interval of 1 hour.
You can influence the accuracy of the analysis algorithms by indicating when specific events can be ignored. See Excluding events from outlier detection for more information.
Important: Sensitive data objects (those in your sensitive object group) and admin users (for example, users in your privileged user group who can access sensitive data) get a boost in scoring. These are the groups that get a higher score by default. To add additional groups, see Customizing outlier detection.
Table 1. Default groups that get scoring boosts
|Guardium Group ID|Description|
Configuring outlier detection
Outlier detection was introduced in Version 9.1 (Version 9, GPU 100). However, this article covers functions included up through V10.1.3 (V10 with GPU 230). The following prerequisites and recommendations apply:
- We recommend that you enable outliers only on 64-bit collectors or aggregators with a minimum of 24GB of RAM.
Outlier processing using aggregators
In Version 10.1.2, outlier detection was enabled to run on aggregators. This removes a major impediment for outlier detection by allowing the learning phase to happen across collected activity, such as cases where audit data is load balanced across more than one collector.
In this case, data is extracted from the managed units, and the learning and analysis phases happen on the aggregator, as shown in Figure 6 below.
Figure 6. Outlier learning and analysis happens on aggregator
Data is sent to the aggregator every hour. Note that this extracted data is significantly smaller than the collected data.
API commands to enable and disable outlier detection
You can enable outlier detection on collectors or on aggregators:
- If you enable outlier detection on the aggregator, do not explicitly enable it on the associated collectors.
- If you enable outlier detection on a collector, all data is processed locally and not sent to the aggregator.
The following GuardAPI command enables the Outlier Detection function on a collector. With this command, outlier detection will start extracting into the data mart on the current date every hour.
grdapi enable_outliers_detection schedule_interval=1 schedule_units=HOUR DAM_FAM=DAM
If you wanted to delay outlier detection, you could put a scheduled start date on the command:
grdapi enable_outliers_detection schedule_interval=1 schedule_units=HOUR DAM_FAM=DAM schedule_start="2015-06-10 00:00:00"
DAM_FAM is optional. The default value is DAM.
To enable outlier detection on an aggregator, run the following from the Central Manager:
grdapi enable_outliers_detection_agg schedule_interval=1 schedule_units=HOUR aggregator_host_name=<aggregator host name> DAM_FAM=DAM
If you wanted to delay outlier detection, you could put a scheduled start date on the command:
grdapi enable_outliers_detection_agg schedule_interval=1 schedule_units=HOUR aggregator_host_name=<aggregator host name> DAM_FAM=DAM schedule_start="2015-06-10 00:00:00"
To disable outlier detection (which disables data mart extraction, learning, and analysis), enter the following command:
grdapi disable_outliers_detection
Validating outliers is working
New in Version 10.1.2 is an outlier mining status report. This report indicates whether the unit is operational and whether outlier mining and quick search are enabled. For managed units, it indicates whether the data extraction is scheduled to be sent to the aggregator.
Figure 7. Outlier mining status report
You can find this report under Manage>Maintenance>Outlier Mining Status.
Interpreting the outlier results
After the analysis phase becomes active, outlier data populates the Guardium system with the results of its analysis of real-time events. You can see this information on charts in the investigation dashboard. To get to an Investigation Dashboard, choose Data (for data activity) or File (for file activity) from the Guardium banner, or go to the Investigate menu.
Figure 8. Open the Quick Search UI
Figure 9. Investigate menu
To see the outlier alerts, see the Activity chart at the top of the Basic Preset Investigation Dashboard. You can also add the Activity chart to any existing or new dashboard. To see details, add the Results table chart.
The activity chart in Figure 10 below includes a blue line (with circles) to indicate the volume of activity for the tab selected (activity, errors, or violations).
Figure 10. Volume of activity for selected tab
Outliers show up as red and yellow indicators that reflect the severity or total outliers score for a time interval (usually an hour). Red indicators reflect highly anomalous events that require immediate attention. Yellow indicators represent less extreme anomalies that warrant attention as part of other or related investigations. The outlier score is a calculated aggregate value based on the volume of outliers as compared to the predicted volume of outliers for a given time of day, the severity of individual outliers, and other factors.
If there are multiple sources with outliers during the hour, the score that is shown is the highest of those sources.
Figure 11. How outliers appear on Investigation Dashboards
By hovering on one of the outlier icons, you can see the amount of activity in this time period and link directly to the detailed outliers or activities in the related time period of the result table chart.
Figure 12. Hovering on an outlier alert
The summary tab for the outlier results includes one hourly row per source (database or user) in which an anomaly is found, along with the anomaly score and the outlier reasons. Each reason is given its own column to make it easier to sort the results.
The detailed tab for the outlier results includes a row for each outlier, with additional data such as the source program, the object, and the verb.
Why is this activity an outlier?
The following are reasons why an activity is called out as an outlier. Each reason has its own column in the outlier results.
|High volume|An unusually high incidence of a condition.|
|New|An unusually large number of new or rare behaviors.|
|Error|An unusually high incidence of error conditions.|
|Temp|An unusually high volume of access to temporary tables.|
|Diverse|An unusually high number of different activity types, meaning that a user is performing many different kinds of activities. The user may be doing something different for a legitimate reason, but it could also mean that the user's credentials were stolen and someone else is using them.|
|Ongoing|Indicates that there was a high anomaly score for this database or user in the last few hours. This could indicate that the database or user is changing behavior slowly over time.|
For file activity, the supported reasons are:
- High volume
Outlier reasons are assigned in combinations when needed. For example, an outlier might be flagged as both temp and high volume if, in the same hour, the user performed an unusually high volume of accesses to some table and, in addition, an unusually high number of accesses to temporary tables.
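A minimal sketch of how multiple reason flags could be raised for the same hour. The trigger condition here (an hourly count exceeding the learned baseline by a fixed factor) is an illustrative simplification, not Guardium's actual test:

```python
def outlier_reasons(counts: dict[str, int], baselines: dict[str, int],
                    factor: float = 3.0) -> set[str]:
    """Return the set of reason flags whose hourly count exceeds the
    learned baseline by `factor`. Reason names follow the table above."""
    return {reason for reason, count in counts.items()
            if count > factor * baselines.get(reason, 0)}

# Hypothetical hourly counts vs. learned baselines for one user.
counts = {"High volume": 400, "Temp": 40, "Error": 2}
baselines = {"High volume": 100, "Temp": 10, "Error": 5}
print(sorted(outlier_reasons(counts, baselines)))  # -> ['High volume', 'Temp']
```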
Customizing outlier detection
Although the Guardium outlier detection capability is designed to require minimal intervention to operate, there are some things that you can do to optimize the capability for your environment, such as adding additional groups of privileged users or sensitive objects, or by telling the system to ignore certain events.
In addition, although it is a bit more advanced, you can tweak other things that are related to the algorithm such as anomaly score thresholds.
Boosting scores of users and objects
As stated in the beginning of this article, there are two default groups that get scoring "boosts": Admin Users and Sensitive Objects. In addition, if you have file activity monitoring, there is the Sensitive Files group. However, you might already have additional groups set up as part of your normal operating procedures that could also be useful for outlier detection. For example, you might be maintaining a group of Suspicious Users, or you might have several different groups of sensitive objects that are aligned with different applications.
You can use a grdapi command to add additional groups to the outlier detection algorithm.
Prerequisite: This command requires that you know the Guardium group ID. To get the group ID, you can use the command grdapi list_group_by_desc. For example, if you have a group that is named "BadGuys," you can enter the following command to get its Guardium group ID:
grdapi list_group_by_desc desc="BadGuys"
After you have the ID (let's assume it is 1234), you can include it as privileged user group for a boosted score as follows (note that you must also include the default group 1 if you want to boost scores for that as well):
grdapi set_outliers_detection_parameter parameter_name="privUsersGroupIds" parameter_value="1,1234"
You can do the same thing with sensitive objects:
Excluding events from outlier detection
If you want to exclude events from outlier detection, such as activity from test applications, you can right-click on a particular outlier and select Ignore.
You can ignore the entire event as is, as shown in Figure 13 below.
Figure 13. Ignore this specific event
Or you can widen the scope by deleting specific event parameters. For example, if you want to ignore the source program SQLPLUS when running against ON1PARTR on a particular database server host, you would remove all the other parameters and click OK.
Figure 14. Remove a criterion by clicking the red X
Note: The values for fields other than DB user, Source Program, Server, Database, Object and Verb are ignored and do not affect exclusion criteria.
This feedback is recorded and can be reported on in the Analytic User Feedback report, as shown in Figure 15 below. The first line in the report shows what it looks like if you select an activity with no excluded criteria. The second line shows what it looks like if you select a subset of fields as criteria.
Figure 15. Analytic user feedback report
If your user feedback includes a single criterion without asterisks (user, server IP, database, and more) it automatically populates one of the existing analytic exclusion groups as well:
- Analytic Exclude DB User
- Analytic Exclude OS User
- Analytic Exclude Server IP
- Analytic Exclude Service Name
- Analytic Exclude Source Program
For example, if you delete all criteria except for DB User from the filter window, you could go to the Group Builder, edit the Analytic Exclude DB User group, and see the item that you entered right there, as shown in Figure 16 below.
Figure 16. Guardium Group Builder
Of course, you can also use all the power of the Group Builder to populate the group in bulk, including populating from a query.
You can also use Guardium APIs to populate groups with a single exclusion criterion:
grdapi create_member_to_group_by_desc desc="Analytic Exclude DB User" member="DB_USER_X"
To include previously ignored events, view the Analytic User Feedback report, double-click the previously ignored event, and select Invoke > delete_analytic_user_feedback.
Figure 17. Deleting an event from the ignored events
You have the choice of invoking the deletion now or adding the generated command to a script to run later.
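Conceptually, each recorded exclusion acts as a filter in which the deleted criteria behave as wildcards. A minimal sketch, assuming events and exclusion rules are represented as simple dictionaries:

```python
def apply_exclusions(events: list[dict], exclusions: list[dict]) -> list[dict]:
    """Drop events matching any exclusion rule. A rule is a dict of
    field -> value; fields absent from the rule act as wildcards
    (the deleted criteria in Figure 14)."""
    def matches(event: dict, rule: dict) -> bool:
        return all(event.get(f) == v for f, v in rule.items())
    return [e for e in events if not any(matches(e, r) for r in exclusions)]

events = [
    {"db_user": "test_app", "source_program": "SQLPLUS", "server": "db01"},
    {"db_user": "assange", "source_program": "JDBC", "server": "db01"},
]
# Ignore SQLPLUS activity against db01, regardless of user.
rules = [{"source_program": "SQLPLUS", "server": "db01"}]
print(apply_exclusions(events, rules))  # only the JDBC event remains
```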
Additional configurations and customizations (API)
We've already suggested the use of the Guardium API set_outliers_detection_parameter for a few scenarios, such as adding additional user groups or sensitive objects for outlier detection consideration. Other aspects of outlier detection can be modified with this API, including increasing or decreasing the amount of time before alerts are issued, alert limits, and more.
Recommendation: In general, do not modify the defaults unless you are working with someone knowledgeable in outlier mining.
You can see current settings by entering:
The parameters include:
|This is how many days to retain model data on the collector. The default is 90 days.|
|The Guardium group IDs for objects (tables, views, and more) to receive scoring boosts.|
|The Guardium group IDs for database users to receive scoring boosts.|
|The number of days of activity required before outlier alerts are produced. The default is 7. The value for this parameter must not exceed the budgetTrainingDays parameter, or an error is issued.|
|The number of high-volume scoring message alerts for a summary row of outliers. The default is 20.|
Any outlier summary row has a number of detail rows. These rows are a sample of the anomalies that occurred during the hour. The details for the "high volume" outlier are the X rows that had the highest anomaly score, where X is the value of this parameter.
|This parameter relates to non-high-volume outliers. It is the number of sample anomalies for a summary alert. The default is 5.|
Non-high-volume outliers have Y sample rows with no particular order among them, because the score is not relevant for these outliers (no object is newer than another). Y is the value of this parameter.
|This is the number of outliers (summary level) to be issued per day.|
Use this parameter to keep the stream of alerts manageable for your security analysts. It controls the number of alerts and gives you the ones with the highest scores. The number is based on statistics from the last budgetTrainingDays days (for example, 14 days). The process calculates an additional threshold (always higher than intervalAlertsThreshold) that will produce roughly this number of outliers.
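The daily-budget idea can be sketched as choosing a score cutoff from the recent score distribution. This is an illustration of the concept, not Guardium's exact calculation:

```python
def budget_threshold(historical_scores: list[float],
                     alerts_per_day: int, days: int) -> float:
    """Pick a score cutoff so that, had it been applied to the scores
    observed over the last `days` days, roughly `alerts_per_day * days`
    alerts would have fired."""
    budget = alerts_per_day * days
    if budget >= len(historical_scores):
        return min(historical_scores, default=0.0)
    # The (budget+1)-th highest score becomes the cutoff; only scores
    # strictly above it would have alerted.
    return sorted(historical_scores, reverse=True)[budget]

# Hypothetical anomaly scores from the look-back window.
scores = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94]
print(budget_threshold(scores, alerts_per_day=1, days=2))  # -> 0.97
```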
|Number of days the system looks back for learning. The default is 14.|
|An anomaly score that exceeds this threshold is issued as an outlier. The default is 0.99. If you lower the threshold, the system becomes more sensitive to anomalies: anomalies with lower scores are issued as outliers, and there may be many false positives. If you raise it, the system becomes less sensitive to anomalies.|
Operational considerations
This section describes how you can use other Guardium capabilities to integrate outlier detection with your operational procedures.
Use distributed reporting to view outliers from multiple collectors
To view consolidated outliers data from all collectors or from a group of collectors, you can create a distributed report based on the existing Analytic Outliers List report that is shown in Figure 18 below.
Figure 18 also shows distributed reporting, where each collector sends its data to the Central Manager on a scheduled basis. (There is also an option to create an online version that allows for ad hoc viewing of the centralized report data.)
As input, you need the group of collectors that include outlier data. For details on creating distributed reports, see the product documentation. A direct link is in the Related topics.
Note: If you are using Guardium V10, outliers are already consolidated across collectors using Quick Search for Enterprise.
Distribute report data using workflow automation
As with any report in Guardium, you can set up an automated process for distributing and reviewing outlier report data. This is sometimes known as compliance workflow automation. Use the Audit Process Builder in the Guardium UI to create this process, including appropriate receivers, and add the Outlier report as a task. For more details on creating an audit process, see the link in the Related topics.
Be aware of retention periods
Because outlier alerts (algorithmic output data) are written both to Quick Search indexes and to the Guardium repository, they are impacted by the retention periods for Quick Search index files (default is 3 days) and for the Analytic Outliers information stored in the Guardium database (default is 60 days). Note also that Quick Search is affected by unit utilization thresholds, including disk space; data could be purged more frequently, or quick search could stop indexing altogether, if there are issues with disk space.
Set up correlation (threshold) alerts
Because outlier detection is a separate process from security policy rules and enforcement, you cannot set up real-time alerts on them. However, because outlier data is included in reports, you can create a correlation alert. A correlation alert is triggered by a query that looks back over a specified time period to determine whether the alert threshold has been met.
For example, you can create an alert based on the query that is used in the report entitled Analytic Outliers Summary by Date – enhanced.
Figure 18. Analytic Outliers Summary by Date - enhanced report
Assume that you want alerts written to syslog or sent by email. You can create an alert that runs this report query periodically and fires whenever one or more lines in the report have an Anomaly Score greater than or equal to 99 over the past 4 hours. Instructions for creating correlation alerts are in the product documentation (see Related topics).
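The threshold logic for such a correlation alert can be sketched as follows. The row format (timestamp, score) is an assumption for illustration:

```python
from datetime import datetime, timedelta

def should_alert(rows: list[tuple[datetime, float]], now: datetime,
                 window_hours: int = 4, min_score: float = 99) -> bool:
    """Fire when at least one report row in the look-back window has an
    anomaly score >= min_score -- the threshold described above."""
    cutoff = now - timedelta(hours=window_hours)
    return any(ts >= cutoff and score >= min_score for ts, score in rows)

now = datetime(2017, 6, 1, 12, 0)
rows = [(datetime(2017, 6, 1, 9, 30), 99),   # inside the 4-hour window
        (datetime(2017, 6, 1, 5, 0), 100)]   # outside the window
print(should_alert(rows, now))  # -> True (the 9:30 row is in the window)
```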
We hope that you find this new use of data mining in this version of Guardium as exciting as we do. The goal is to help your information security teams focus their analysis skills on the most important events, including ones that could otherwise slip by unnoticed.
Related topics
- "Target Ignored Data Breach Alarms" explains how Target's security team reviewed—and ignored—urgent warnings from a threat-detection tool.
- Get more information in the Outlier detection topic in the Guardium Knowledge Center.
- Learn how to create distributed reports in the Guardium Knowledge Center.
- Learn how to create audit processes (workflow) in the Guardium Knowledge Center.
- Learn how to create correlation alerts in the Guardium Knowledge Center.
- Learn more about Guardium.
- Join the Guardium Community on developerWorks.