

Use Guardium outlier detection to detect hidden threats

Advanced mining techniques can help you find a needle in a haystack


Editor's note: Please refer to the article, "New and enhanced Guardium Outlier Detection," for V10.1.2 enhancements.

Too much information

The infamous 2013 data breach at a major retailer revealed that security alarms raised by their monitoring software were often ignored or at least deemed not worth further investigation. This should not be a surprise. Security analysts are bombarded with false positives, and without any indication of relative risk, there is no way to prioritize the analysis.

Even worse, after the attackers are inside the network, they are often able to steal additional credentials and gain unfettered access to the "crown jewels" resident in an organization's database server. The attackers can then take their time and extract data over a long period without detection. Similarly, SQL injection attacks that occur under application privileges might also be able to access sensitive data under the cloak of normalcy.

This article describes a way to extend traditional database monitoring with increased intelligence to help you understand the risks based on relative changes in behavior.

How outliers can help: Scenarios

For example, if Joe the DBA is observed accessing a particular table many more times than he has in the past, it could be that he is slowly downloading small amounts of data over time. If an application generates more SQL errors than it has in the past, maybe there is a SQL injection attack under way.

Consider a valid bank transfer transaction, which can translate to tens or hundreds of SQL statements that ensure proper validation and authorizations, get the account number, calculate balance details, and so on. An attacker or malicious insider will most likely bypass the "normal" way to access data. By bypassing these hundreds of SQL statements, that attacker could be flagged as abnormal, an "outlier."

A disgruntled DBA decides to extract the entire contact list into a CSV file or other format that they can take with them, or dump all the sensitive data into a database. The outlier detection algorithm can detect that access to this particular source is coming from a DBA who is not supposed to extract from operational data. There might also be an increase in exceptions (errors) as the user bypasses access control mechanisms and tries to learn the structure and privileges needed. The volume that is created by the download within the time window is probably exceptional as well. Any or all of these incidents could trigger an outlier indication.

A DBA attempts to put some buggy code into stored procedures that will "blow up" after they are gone to prove how much they were needed. The algorithm can identify that there is an exceptional volume of errors because DBAs are not supposed to access the stored procedure object. It can also detect if the DBA was temporarily granted elevated privileges to hack the stored procedure.

How outlier detection works

IBM Guardium Data Activity Monitoring includes an advanced Machine Learning algorithm to aid in the early detection of possible attacks during operation. The algorithm automatically models the normal patterns of a user's activity without requiring any supervision. That is, it is based mainly on unsupervised learning techniques. Clustering users according to behavior captures real effective groups, unlike rule-based security systems that rely on predefined groups.

The process checks not only user activities to see whether they are consistent with that user's previous activities, but also models that user's actions against activity of similar users. For example, a new behavior by a user, when compared with a cluster of similar users, might be consistent and normal for that type of user. This two-pronged approach reduces the number of false positives.
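To illustrate the two-pronged idea, here is a minimal sketch in Python. It is not Guardium's algorithm, which the article does not specify. The function name, the dictionaries, and the simple "seen before" test are all assumptions made for illustration. The sketch flags an action only when it is new for the user and also new for the cluster of similar users.

```python
from typing import Dict, Set


def is_candidate_outlier(
    user: str,
    action: str,
    user_history: Dict[str, Set[str]],
    cluster_of: Dict[str, str],
    cluster_history: Dict[str, Set[str]],
) -> bool:
    """Return True only if the action is new for the user AND for the user's cluster.

    Hypothetical helper: real outlier detection uses scores, not set membership.
    """
    # Prong 1: is the action consistent with this user's own past activity?
    if action in user_history.get(user, set()):
        return False
    # Prong 2: is the action normal for the cluster of similar users?
    cluster = cluster_of.get(user, "")
    seen_by_peers = action in cluster_history.get(cluster, set())
    return not seen_by_peers


# Hypothetical sample data
user_history = {"joe": {"SELECT orders"}}
cluster_of = {"joe": "dba"}
cluster_history = {"dba": {"SELECT orders", "ALTER TABLE orders"}}

# New for Joe, but normal for other DBAs: not flagged
print(is_candidate_outlier("joe", "ALTER TABLE orders", user_history, cluster_of, cluster_history))
# New for Joe and for his peers: flagged
print(is_candidate_outlier("joe", "SELECT customers", user_history, cluster_of, cluster_history))
```

The second check is what suppresses false positives: behavior that is new for one user but routine for that type of user does not raise a flag.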

The outlier detection algorithm uses data that is being collected normally for security and compliance reasons. If data is not being audited already by a Guardium security policy, it is not available for Guardium to analyze.

Look at Figure 1 for a quick review of how Guardium collects audit data during its normal operations.

Figure 1. How Guardium collects and logs database event data
The text below describes the flow in the figure
  1. A database user logs in and enters a database command.
  2. The Guardium software tap (S-TAP) on the database server captures the activity and sends a copy to the Guardium collector, which is a hardened hardware or software appliance.
  3. The analysis engine on the collector parses the command and breaks it down into its component parts for reporting or other analysis. That logged activity includes the user, the time they issued the command, where they logged in from (client IP), what source program they used (JDBC, for example), which database server they accessed, and what data was accessed.

As shown in Figure 2, outlier detection operates on a subset of audit data that is transparently extracted from the collected audit data regularly into a separate data mart. (This extraction occurs hourly, but might be configurable in the future.) There are two phases to outlier detection: training and analysis.

Figure 2. Outlier detection relies on both a training phase and an analysis phase
The text above and below describes the flow in the figure

Assuming the model has been trained and analysis is active, new activities are compared against the existing model that represents the normal behavior. Events that fall outside of the established clustering are assigned an anomaly score and a reason for the score.

The next sections go into more detail on the training and analysis phases.

Training phase

During the training phase, the system learns from the following aspects of user activity:

  • Who: Database user, operating system user, and client host name.
  • What: The object (table, stored procedure, view, or synonym) and the command, such as select, update, delete, or insert.
  • When: Statically determined work hours (9 AM to 5 PM), weekends (Saturday and Sunday), and off hours.
  • Where: The source program, the database name, and database server.
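The "When" aspect can be sketched as a simple classification of each event timestamp. The following Python snippet is an illustration only. The function name and bucket labels are hypothetical, and two details are assumptions because the article does not state them: that weekends take precedence over work hours, and that the 9 AM to 5 PM window includes 9:00 and excludes 17:00.

```python
from datetime import datetime


def time_bucket(ts: datetime) -> str:
    """Classify a timestamp as weekend, work hours, or off hours."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6
    if ts.weekday() >= 5:
        return "weekend"
    # Assumed boundaries: 9:00 inclusive to 17:00 exclusive
    if 9 <= ts.hour < 17:
        return "work_hours"
    return "off_hours"


print(time_bucket(datetime(2015, 6, 10, 10, 0)))  # a Wednesday morning
print(time_bucket(datetime(2015, 6, 10, 20, 0)))  # a Wednesday evening
print(time_bucket(datetime(2015, 6, 13, 10, 0)))  # a Saturday morning
```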

The normal behavior of the database activity and of the database users (roles) is modeled from the perspective of each of the above aspects. The model also profiles users based on their behavior.

The initial training needs 10 days of learning data, and the training runs weekly on Sunday. Thus, depending on when you start outlier detection, it will take 10 days plus the number of days until the next Sunday to begin seeing outliers. However, it takes 3-4 weeks of data to create an improved model based on the dynamic clusters of normal behavior. (You can reduce the initial training period by using a Guardium API command, but expect many more false positives until the models have had time to be fully trained.)
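The timing rule above (10 days of data, then the next Sunday training run) can be worked out with a few lines of Python. This is a sketch for planning purposes, not a Guardium utility. The function name is hypothetical, and it assumes that if the tenth day itself falls on a Sunday, training runs that same day.

```python
from datetime import date, timedelta

INITIAL_TRAINING_DAYS = 10  # default stated in the article


def first_training_run(enabled_on: date, training_days: int = INITIAL_TRAINING_DAYS) -> date:
    """Estimate the date of the first weekly training run that has enough data."""
    earliest = enabled_on + timedelta(days=training_days)
    # date.weekday(): Monday is 0 ... Sunday is 6
    days_until_sunday = (6 - earliest.weekday()) % 7
    return earliest + timedelta(days=days_until_sunday)


# Enabled on Wednesday, June 10, 2015: ten days later is a Saturday,
# so the first qualifying training run is the following Sunday.
print(first_training_run(date(2015, 6, 10)))
```

Passing a smaller training_days value models the effect of shortening the initial training period, with the trade-off in false positives that the article describes.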

Example

Example: Figure 3 is a highly simplified representation of the output of a weekly outlier model for user 'assange' and his use of temporary tables. It provides an average for his use of temporary tables (18) during working hours and a standard deviation (3).

Figure 3. Weekly training generates enhanced models of normal behavior
The text above describes the figure

Important: The past use of temporary tables is just one of many aspects of a user's behavior that is trained and modeled to enable comparison with new observed behavior. Other aspects include, for example, use of temporary applications, volume of activity, time of day/week of activity, rareness of activity, and others. Each of these aspects is handled similarly to what is shown above to create separate dependent or independent scores. The scores are combined and weighted to produce the final score.

Retraining: By default, after the model is built, the training phase is re-executed weekly on Sundays. This frequency was determined to avoid noise in the normal model that could be caused by too frequent training. If you get repeated false alarms, you can use the GUI or the Guardium API to tell the system to ignore certain events.

Analysis phase

The analysis phase is the runtime comparison of the learned models of typical historical behavior against new activities that are captured by Guardium. When the models have been adequately trained, analysis occurs on incoming data activity, and outlier data begins appearing in the Guardium interface and reports.

Example: Figure 4 shows that during normal work hours, user 'assange' increased his use of temporary tables beyond what was modeled during weekly training. The increase was significant enough to generate an outlier signal.
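One simple way to picture this comparison is a z-score against the trained average (18) and standard deviation (3) from the example. The Python sketch below is a stand-in for illustration. The article does not describe how Guardium turns a deviation into an anomaly score, so the cutoff of 3 standard deviations and the observed counts are assumptions.

```python
def z_score(observed: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations by which an observation exceeds the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (observed - mean) / std_dev


def is_unusual(observed: float, mean: float, std_dev: float, max_z: float = 3.0) -> bool:
    """Flag an observation whose z-score exceeds max_z (an assumed cutoff)."""
    return z_score(observed, mean, std_dev) > max_z


MEAN_TEMP_TABLES = 18  # trained average from the example
STD_TEMP_TABLES = 3    # trained standard deviation from the example

# Hypothetical hourly counts of temporary table use during work hours
for observed in (20, 36):
    print(observed, z_score(observed, MEAN_TEMP_TABLES, STD_TEMP_TABLES),
          is_unusual(observed, MEAN_TEMP_TABLES, STD_TEMP_TABLES))
```

A count of 20 is well within normal variation, while a count of 36 is six standard deviations above the trained average, which is the kind of increase that would stand out.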

Figure 4. Hourly runtime analysis triggers an outlier event
The text above describes the figure

Remember, the data mart is continually being fed new activity data based on your security policies and thus, new outliers can be detected based on the configured detection interval of 1 hour.

You can influence the accuracy of the analysis algorithms by indicating when specific events can be ignored. See Excluding events from outlier detection for more information.

Important: Sensitive data objects (those in your sensitive object group) and admin users (for example, users in your privileged user group who can access sensitive data) receive a scoring boost. Table 1 lists the groups that get this boost by default. To add more groups, see Customizing outlier detection.

Table 1. Default groups that get scoring boosts
Guardium Group ID    Description
1                    Admin Users
5                    Sensitive objects

Configuring outlier detection

Prerequisites

Outlier detection was introduced in Version 9.1 (Version 9, GPU 100). However, this article covers functions included up through V9.5 (Version 9 with GPU 530), which is the recommended release for using outliers. The following prerequisites and recommendations apply:

  • Version 9.5 patch 530, available from IBM FixCentral
  • It is strongly recommended that you enable outliers only on 64-bit collectors with a minimum of 24GB of RAM. (Note that outlier detection is not available on Aggregators or Central Managers.)

The presentation of the outlier results as described in this article is included with quick search. Therefore, you must ensure that quick search is enabled. Search is enabled by default on new installations of 64-bit systems, or you can use the command grdapi enable_quick_search. You can also review outliers in the Analytic Outliers List report.

CLI commands to enable and disable outlier detection

Log in to the collector as a user or administrator with the CLI role. Use the following GuardAPI command to enable the Outliers Detection function.

grdapi enable_outliers_detection schedule_interval=1 schedule_units=HOUR

Extraction of outlier data into the data mart starts on the current date and runs every hour.

To delay the start of outlier detection, add a scheduled start date to the command:

grdapi enable_outliers_detection schedule_interval=1 schedule_units=HOUR schedule_start="2015-06-10 00:00:00"

To disable outlier detection (which disables data mart extraction, training, and analysis), enter the following command:

grdapi disable_outliers_detection

Interpreting the outlier results

After the analysis phase becomes active, outlier data populates the Guardium system with the results of its analysis of real-time events. You can see this information on the summary chart at the top of the Quick Search user interface.

To get to Quick Search, click the microscope in the Guardium banner.

Figure 5. Click search to open the Quick Search UI
Picture of search window and microscope

The summary chart (Figure 6) includes a blue line (with circles) to indicate the volume of activity for the particular tab selected (activity, errors, or violations).

Outliers show up as red and yellow indicators that reflect the severity or total outliers score for a time interval. Red indicators reflect highly anomalous events that require immediate attention. Yellow indicators represent less extreme anomalies that warrant attention as part of other or related investigations. The outlier score is a calculated aggregate value based on the volume of outliers as compared to the predicted volume of outliers for a given time of day, the severity of individual outliers, and other factors.

For example, consider a system that typically identifies zero outliers at 1 AM and 5-10 outliers at 1 PM on weekdays. The presence of two additional outliers (two outliers at 1 AM, or 12 outliers at 1 PM) is more significant, and is weighted more heavily, than the hourly total itself.
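The point of this example is that the deviation from the expected volume matters more than the raw count. The short Python sketch below makes that explicit. It is not the Guardium scoring formula, which combines several factors. The function name and the use of the upper end of the typical range as the baseline are assumptions.

```python
def excess_outliers(observed: int, expected_high: int) -> int:
    """Outliers beyond the upper end of what is typical for this hour."""
    return max(0, observed - expected_high)


# 1 AM: typically 0 outliers, 2 observed
print(excess_outliers(observed=2, expected_high=0))
# 1 PM: typically 5-10 outliers, 12 observed
print(excess_outliers(observed=12, expected_high=10))
# 1 PM: 8 observed is within the typical range
print(excess_outliers(observed=8, expected_high=10))
```

Both the 1 AM and the 1 PM cases exceed the expectation by two, even though the hourly totals (2 and 12) are very different.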

Figure 6. How outliers appear on the Quick Search UI
The Outliers tab is selected and includes the details described in this section of the text.

By hovering on one of the outlier icons, you can see the amount of activity in this time period and link directly to the detailed outliers or activities in the related time period.

Figure 7. Hovering on an outlier alert
One of the red outliers is hovered over, showing details and links to show activities and show outliers.

Why is this activity an outlier?

The Outlier Reason column lists the reason or reasons that a particular activity was called out as an outlier:

Reason         Description
Rare           A seldom seen condition
High Volume    An unusually high incidence of a condition
New            A condition that is seen for the first time
Error          An unusually high incidence of error conditions

Outlier reasons are assigned in combinations when needed. For example, an outlier might be flagged as both rare and high volume if a seldom-seen condition suddenly occurs many times.

Customizing outlier detection

Although the Guardium outlier detection capability is designed to require minimal intervention to operate, there are some things that you can do to optimize the capability for your environment, such as adding groups of privileged users or sensitive objects, or telling the system to ignore certain events.

In addition, although it is a bit more advanced, you can tweak other things that are related to the algorithm such as anomaly score thresholds.

Boosting scores of users and objects

As described in Analysis phase, two default groups get scoring "boosts": Admin Users and Sensitive Objects. However, you might already have additional groups set up as part of your normal operating procedures that could also be useful for outlier detection. For example, you might be maintaining a group of Suspicious Users or you might have several different groups of sensitive objects that are aligned with different applications.

You can use a grdapi command to add additional groups to the outlier detection algorithm.

Prerequisite: This command requires that you know the Guardium group ID. To get the group ID, you can use the command grdapi list_group_by_desc. For example, if you have a group that is named BadGuys, you can enter the following command to get its Guardium group ID:

grdapi list_group_by_desc desc="BadGuys"

After you have the ID (let's assume it is 1234), you can add it to the outlier detection as follows:

grdapi set_outliers_detection_parameter parameter_name="privUsersGroupIds" parameter_value=1234

You can do the same thing with sensitive objects:

grdapi set_outliers_detection_parameter parameter_name="sensitiveObjectGroupIds" parameter_value=333,156
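If you script these settings, a small helper can assemble the command string from a list of group IDs. The Python sketch below only builds text in the form shown above and does not call Guardium. The helper name and its validation rules are hypothetical. The article also does not say whether the new value replaces or extends the current list, so check the result with grdapi get_outliers_detection_info.

```python
from typing import Iterable

# The two boost parameters named in this article
BOOST_PARAMETERS = {"privUsersGroupIds", "sensitiveObjectGroupIds"}


def boost_groups_command(parameter_name: str, group_ids: Iterable[int]) -> str:
    """Build a set_outliers_detection_parameter command for one or more group IDs."""
    if parameter_name not in BOOST_PARAMETERS:
        raise ValueError(f"unexpected parameter: {parameter_name}")
    ids = list(group_ids)
    if not ids:
        raise ValueError("at least one group ID is required")
    if any(not isinstance(g, int) or g <= 0 for g in ids):
        raise ValueError("group IDs must be positive integers")
    # Multiple IDs are comma-separated with no spaces, as in the example above
    value = ",".join(str(g) for g in ids)
    return (
        "grdapi set_outliers_detection_parameter "
        f'parameter_name="{parameter_name}" parameter_value={value}'
    )


print(boost_groups_command("privUsersGroupIds", [1234]))
print(boost_groups_command("sensitiveObjectGroupIds", [333, 156]))
```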

Excluding events from outlier detection

If you want to exclude events from outlier detection, such as activity from test applications, you can right-click on a particular outlier and select Ignore.

Figure 8. Right-click on an outlier to invoke actions
Menu includes show related activity, show related errors, show related violations, add as filter, ignore and drill down reports.

You can ignore the entire event as is, as shown in Figure 9:

Figure 9. All event conditions
Criteria include time, date, db user, client ip, server program, source program, database, object verb, and number of instances.

Or you can widen the scope by deleting specific event parameters. For example, if you want to ignore the source program MIPGE001 when running against SYBASE on a particular database server host, you would remove all the other parameters and click OK.

Figure 10. Click the red X to remove a criterion
Conditions now include only source program, server, and database.

Note: The values for date, time, and instances are ignored and do not affect exclusion criteria.

This feedback is recorded and can be reported on in the Analytic User Feedback report, as shown in Figure 11. The first line in the report shows what it looks like if you select an activity with no excluded criteria. The second line shows what it looks like if you select a subset of fields as criteria.

Figure 11. Analytic User Feedback report
report contains event criteria columns and a feedback type column that contains Ignore.

If your user feedback includes a single criterion (user, server IP, database, and so on), it automatically populates one of the existing analytic exclusion groups as well:

  • Analytic Exclude DB User
  • Analytic Exclude OS User
  • Analytic Exclude Server IP
  • Analytic Exclude Service Name
  • Analytic Exclude Source Program

For example, if you delete all criteria except for Application Source Program from the Outlier Response window, you could go to the Group Builder, edit the Analytic Exclude Source Program group, and see the item that you entered right there, as shown in Figure 12.

Figure 12. Guardium Group Builder
group name is analytics exclude source program and contains MIPGE001 as member of the group.

Of course, you can also use all the power of the Group Builder to populate the group in bulk, including populating from a query.

You can also use Guardium APIs to populate groups with a single exclusion criterion:

grdapi create_member_to_group_by_desc desc="Analytic Exclude Source Program" member="MIPGE001"

To include previously ignored events, view the Analytic User Feedback report, double-click the previously ignored event, and select Invoke > delete_analytic_user_feedback.

Figure 13. Deleting an event from the ignored events
described above.

You have the choice of invoking the deletion now or adding the generated command to a script to run later.

Additional configurations and customizations

We've already suggested the use of the Guardium API set_outliers_detection_parameter for a few different scenarios, such as adding additional user groups or sensitive objects for outlier detection consideration. Other aspects of outlier detection can be modified with this API, including increasing or decreasing the amount of time for training, alert thresholds, and more. You can see current settings by entering:

grdapi get_outliers_detection_info

The parameters include:

Parameter                     Description
cleanupKeepDays               The number of days to retain model data on the collector. The default is 90 days. This is not training data, which is gathered weekly and then removed. This is the model data that is created from previous trainings.
sensitiveObjectGroupIds       The Guardium group IDs for objects (tables, views, and so on) to receive scoring boosts.
privUsersGroupIds             The Guardium group IDs for database users to receive scoring boosts.
minimalRequiredTrainPeriod    The training period used to build the initial model. The default is 10 days, but you will get more accurate models and fewer false positives if you use a longer time period, such as 30 days.

Customizing scoring behavior (advanced users only)

The following API parameters are related to scoring. We recommend that you modify the behavior of outlier scoring only under the direction of someone knowledgeable in data mining.

  • intervalThreshold
  • alertThreshold
  • alertRarityThreshold
  • alertVolumeThreshold
  • minNumOfIntervalsForAlerts

Before going into detail on each of these parameters, let's go into a little more detail on how scoring works.

Intervals represent an hour of activity of some user or database. Let's call these user-intervals and database-intervals. Each interval gets an anomaly score according to how anomalous the user or database's behavior was during the interval.

The GUI for outlier detection shows only user- and database-intervals that get scores higher than intervalThreshold. The reason the interval is deemed anomalous is the activity of the user or database during that hour. The specific actions that make up this hourly activity are also scored. When the GUI indicates that an interval is anomalous (this is called an "alert," which is not the same as Guardium real-time or threshold alerts), it also lists the highest scoring activities during the interval. Only activities that received a score higher than alertThreshold are displayed.

Usually, if an interval has a high enough score that it causes an alert, then there would be some high scoring activities in the interval that would also be displayed. But this is not always the case. For example, an interval could be anomalous due to an abnormally high volume of activities, even if each activity itself is normal.

As described in Why is this activity an outlier?, the GUI also shows the non-mutually exclusive reasons why an activity is deemed anomalous: rare, high-volume, new, and error. The final activity score is affected by several scores, including scores that relate to these four causes. The GUI indicates "rare" as a reason if the corresponding rarity score is higher than alertRarityThreshold. The GUI indicates a "high volume" reason if the corresponding volume score is higher than alertVolumeThreshold.

The anomaly detection is based on comparing current activity against historic activity of the user/database. For this detection to have significant statistical value it is necessary to learn from enough historic behavior. minNumOfIntervalsForAlerts defines how much is enough in terms of intervals (that is, hours) of activity of a user/database before corresponding alerts are given.

You can change the default values dynamically using the API. The defaults are:

Parameter                     Description
intervalThreshold             Anomaly score threshold for intervals. Valid value is between 0 and 1. The default is 0.999.
alertThreshold                Anomaly score threshold for activities in anomalous intervals. Valid value is between 0 and 1. The default is 0.99.
alertRarityThreshold          Threshold determining when an anomalous activity is shown as "rare." Valid value is between 0 and 1. The default is 0.98.
alertVolumeThreshold          Threshold determining when an anomalous activity is shown as "high volume." Valid value is between 0 and 1. The default is 0.9998.
minNumOfIntervalsForAlerts    If the number of intervals in training is below this number, no alerts are generated. The default is 100 intervals (hours). Valid value is an integer greater than or equal to 0.
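To see how these thresholds interact, consider the following Python sketch of the display logic described in this section. It uses the default values listed above, but everything else is an assumption for illustration: the Activity class, the function names, and the sample scores are hypothetical, and the "new" and "error" reasons are left out because no thresholds are given for them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Defaults as listed in this article
INTERVAL_THRESHOLD = 0.999
ALERT_THRESHOLD = 0.99
ALERT_RARITY_THRESHOLD = 0.98
ALERT_VOLUME_THRESHOLD = 0.9998
MIN_NUM_OF_INTERVALS_FOR_ALERTS = 100


@dataclass
class Activity:
    name: str
    score: float         # final anomaly score of the activity
    rarity_score: float  # component score behind the "rare" reason
    volume_score: float  # component score behind the "high volume" reason


def reasons(activity: Activity) -> List[str]:
    """Reasons are not mutually exclusive; both can apply to one activity."""
    found = []
    if activity.rarity_score > ALERT_RARITY_THRESHOLD:
        found.append("Rare")
    if activity.volume_score > ALERT_VOLUME_THRESHOLD:
        found.append("High Volume")
    return found


def displayed_activities(
    interval_score: float, trained_intervals: int, activities: List[Activity]
) -> Optional[List[Activity]]:
    """Return None when there is no alert, else the activities shown with it."""
    if trained_intervals < MIN_NUM_OF_INTERVALS_FOR_ALERTS:
        return None  # not enough history for this user or database
    if interval_score <= INTERVAL_THRESHOLD:
        return None  # the interval is not anomalous enough
    shown = [a for a in activities if a.score > ALERT_THRESHOLD]
    return sorted(shown, key=lambda a: a.score, reverse=True)


# Hypothetical activities within one user-interval
rare = Activity("select on PAYROLL", score=0.995, rarity_score=0.99, volume_score=0.5)
bulk = Activity("bulk select", score=0.9999, rarity_score=0.5, volume_score=0.9999)
routine = Activity("routine select", score=0.4, rarity_score=0.1, volume_score=0.1)

for activity in displayed_activities(0.9995, 500, [rare, bulk, routine]) or []:
    print(activity.name, reasons(activity))
```

Calling displayed_activities with an anomalous interval that contains only routine activities returns an empty list, which corresponds to the case described earlier: an interval can raise an alert because of its overall volume even when no single activity scores above alertThreshold.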

Operational considerations

This section describes more about how you can incorporate Guardium capabilities to integrate outlier detection with your operational procedures.

Use distributed reporting to view outliers from multiple collectors

To view consolidated outliers data from all collectors or from a group of collectors, you can create a distributed report based on the existing Analytic Outliers List report that is shown in Figure 14.

Distributed reporting is illustrated in Figure 14. Basically, each collector sends its data to the Central Manager on a scheduled basis. (There is also an option to create an online version that allows for ad hoc viewing of the centralized report data.)

As input, you need the group of collectors that include outlier data. For details on creating distributed reporting, see the product documentation on Central Management. A direct link is included in Related topics.

Note: If you are using Guardium V10, outliers are already consolidated across collectors using Quick Search for Enterprise.

Distribute report data using workflow automation

As with any report in Guardium, you can set up an automated process for distributing and reviewing outlier report data. This is sometimes known as compliance workflow automation. Use the Audit Process Builder in the Guardium UI to create this process, including appropriate receivers, and add the Outlier report as a task. For more details on creating an audit process, see the link in Related topics.

Be aware of retention periods

Because outlier alerts (algorithmic output data) are both indexed by Quick Search and written to the Guardium repository, they are affected by two retention periods: the one for Quick Search index files (default is 3 days) and the one for the Analytics Outliers information stored in the Guardium database (default is 60 days). Note also that Quick Search is affected by unit utilization thresholds, including disk space. If there are issues with disk space, data could be purged more frequently, or Quick Search could stop indexing altogether.

Set up correlation (threshold) alerts

Because outlier detection is a separate process from security policy rules and enforcement, you cannot set up real-time alerts on outliers. However, because outlier data is included in reports, you can create a correlation alert. A correlation alert is triggered by a query that looks back over a specified time period to determine whether the alert threshold has been met. A prerequisite is to ensure that you have enabled Anomaly Detection in the Guardium Administration Console.

For example, you can create an alert based on the query that is used in the Analytic Outliers Summary by Date report.

Figure 14. Analytic Outliers List Report
described above.

Assume that you want alerts that are written to syslog or sent by email. You can create an alert that runs this report query periodically and fires whenever one or more lines in the report have an Anomaly Score greater than or equal to 99 over the past 4 hours. Instructions for creating correlation alerts are in the product documentation (see Related topics).
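The look-back logic of such a correlation alert can be sketched in a few lines of Python. This is only a model of the rule described above, not how Guardium evaluates alerts internally. The OutlierRow type and the should_fire function are hypothetical names, and the sample rows are invented.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class OutlierRow(NamedTuple):
    """One line of the outliers report (hypothetical, simplified shape)."""
    timestamp: datetime
    anomaly_score: float


def should_fire(
    rows: Iterable[OutlierRow],
    now: datetime,
    lookback: timedelta = timedelta(hours=4),
    min_score: float = 99.0,
    min_rows: int = 1,
) -> bool:
    """Fire when at least min_rows rows in the look-back window meet the score."""
    window_start = now - lookback
    hits = [
        row for row in rows
        if window_start <= row.timestamp <= now and row.anomaly_score >= min_score
    ]
    return len(hits) >= min_rows


now = datetime(2015, 6, 10, 12, 0)
rows = [
    OutlierRow(now - timedelta(hours=5), 99.9),  # high score, outside the window
    OutlierRow(now - timedelta(hours=1), 80.0),  # inside the window, low score
    OutlierRow(now - timedelta(hours=2), 99.5),  # inside the window, high score
]
print(should_fire(rows, now))
```

Only the last row satisfies both the time window and the score condition, and one qualifying row is enough to fire the alert.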

Conclusion

We hope that you find this new use of data mining in Guardium as exciting as we do. The product and research teams continue to work closely together to enhance the algorithms as we get more feedback from customers, and enhancements will continue to roll out over time.




Related topics


Published: July 14, 2015