InfoSphere Guardium data security and protection for MongoDB, Part 1: Overview of the solution and data security recommendations

Protect and secure a new generation of databases

This article series describes how to monitor and protect MongoDB data using IBM InfoSphere® Guardium®. Part 1 provides an overview of the solution, its architecture, and the benefits of using InfoSphere Guardium with MongoDB. The value of the fast-growing class of NoSQL databases such as MongoDB is their ability to handle high velocity and volumes of data while enabling greater agility with dynamic schemas. Many organizations are just getting started with MongoDB, and now is the time to build security into the environment to save time, prevent breaches, and avoid compliance violations. The series also covers configuration of the solution, sample monitoring use cases, and additional capabilities such as quick search of audit data and building a compliance workflow using an audit process.

Indrani Ghatare (indrani@us.ibm.com), Software Engineer, InfoSphere Guardium, IBM

Indrani Ghatare has been a software developer at IBM for more than 12 years. Currently, Indrani is a member of the Research and Development team for InfoSphere Guardium Collector. Indrani has worked on the development of the MongoDB parser and the logger component of Guardium Collector.



Matt Kalan (matt.kalan@10gen.com), Senior Solutions Architect, 10gen

Matt Kalan is a senior solutions architect at 10gen, based in New York, with extensive experience helping more than 200 customers in financial services and other industries solve business problems with technology. Before 10gen, Matt grew Progress Software's Apama Algorithmic Trading and Complex Event Processing (CEP) Platform business in North America and later sold broader operational intelligence solutions to financial services firms. He previously worked for Caplin Systems selling solutions to stream real-time market data over the web to FX and FI portals, and for Sapient providing consulting services to Global 2000 clients.



Sundari Voruganti, QA Specialist, InfoSphere Guardium, IBM

Sundari Voruganti is a member of the InfoSphere Guardium QA team at IBM Silicon Valley Lab. Sundari has been with IBM for over a decade and has a diverse background in both engineering and customer enablement roles. As a passionate technologist, she loves the challenge of learning and working with new technologies, as well as helping customers understand and implement Information Management (IM) solutions. Sundari holds two master's degrees in Computer Science, from Bangalore University and the University of Alberta.



Kathryn Zeidenstein, InfoSphere Guardium Evangelist, IBM

Kathy Zeidenstein has worked at IBM for a bazillion years. Currently, she is working as a technology evangelist for InfoSphere Guardium data activity monitoring, based out of the Silicon Valley Lab. Previously, she was an Information Development Manager for InfoSphere Optim data lifecycle tools. She has had roles in technical enablement, product management, and product marketing within the Information Management and ECM organizations at IBM.



30 October 2013 (First published 06 June 2013)

Also available in Chinese and Portuguese.

Note: In October 2013, this article was updated to reflect a new recommended S-TAP configuration.

Introduction

The authors of this article, from IBM and 10gen (the MongoDB company), collaborated on certifying InfoSphere Guardium for MongoDB. This was a great exercise, as we learned a lot about each other's products and use cases. We wanted to write something that would help people who are familiar with relational databases and InfoSphere Guardium understand why organizations are adopting MongoDB for certain types of applications. From an InfoSphere Guardium point of view, the infrastructure has been extended to cover MongoDB's message protocols, but administrators and information security personnel follow the same procedures for enabling MongoDB for auditing, reporting, and so on.

We also hope to give MongoDB users some understanding of the InfoSphere Guardium capabilities that can help their organizations meet audit and compliance objectives, guard against data leaks, and uncover risky activities such as the use of server-side JavaScript.

What's new in InfoSphere Guardium GPU Patch 50?

Although this article is focused on MongoDB support, InfoSphere Guardium 9.0 GPU Patch 50 includes many additional capabilities. This patch is planned to be available by the end of June 2013, and enhancements include:

  • Expanding data activity monitoring to new and emerging Big Data and NoSQL platforms, including MongoDB, Greenplum HD and Greenplum DB, and Hortonworks Data Platform
  • Enhancing reporting capabilities to speed enterprise-wide reporting, and using a new federated query capability to improve the performance of audit processes in federated environments
  • Reducing time to discovery with speedy faceted search on database activities, exceptions, and policy violations
  • Detecting more suspicious activities with more built-in best practice viewers (reports)
  • Supporting Guardium appliances on 64-bit architecture for improved scalability
  • Consolidating all agents (S-TAP, Guardium Installation Manager, and Discovery) into a single installer to simplify deployments and reduce the time to get up and running
  • Simplifying automation and maintenance with 3-click API assignment to reports
  • Taking Guardium with you on your mobile device to manage your to-do lists and keep an eye on overall security health

What's new in 9.1 (GPU 100)

InfoSphere Guardium 9.1 was released on October 25th, 2013. Enhancements include support for SAP HANA, enhanced support for Hadoop, and outlier (anomaly) detection. For more information about enhancements in 9.1, see the detailed Release Notes or the slides and replay from the Guardium Tech Talk on this subject.

Accomplishing these objectives required far more space than a single article could provide. To give broad coverage of some topics and step-by-step instructions for others, we created a series of three articles:

  • Part 1, this article, introduces MongoDB and InfoSphere Guardium, briefly describes security best practices for MongoDB, and explains the benefits of the joint solution. The article also describes the architecture and walks through the steps of how data flows to InfoSphere Guardium and how it is processed in the InfoSphere Guardium appliance.
  • Part 2 tackles the nuts and bolts of configuring the InfoSphere Guardium monitoring agent for MongoDB. It also walks through how to use security policies to accomplish some common use cases, including monitoring privileged user access, alerting, and repeated failed logins.
  • Part 3 describes some of the features of InfoSphere Guardium that make it such a popular data security and compliance solution for the enterprise, including blocking, viewing and reporting on audit data, and creating audit processes to automate a compliance workflow.

Introducing MongoDB

The proliferation of data from endpoint devices, growing user volumes, and new computing models like cloud, social business, and big data have created demands for data access and analytics that can handle these staggering amounts of data. The value of the fast-growing class of NoSQL databases is the ability to handle the much higher velocity and volumes of data these trends demand while also enabling greater agility with dynamic schemas. Dynamic schemas can, for example, enable organizations to react to changing regulations quickly.

MongoDB, in particular, gives you those benefits while also providing rich querying capabilities necessary for a general purpose operational data store. It provides greater developer productivity and agility through a document model; it can be used in situations appropriate for relational or typical NoSQL databases, or new ones altogether.

Figure 1 shows a high-level picture of the MongoDB architecture and introduces some key concepts that are important to understand if you are responsible for configuring InfoSphere Guardium for MongoDB. (This figure represents a sharded environment. MongoDB can also be run stand-alone.) MongoDB uses auto-sharding for horizontal scalability and replica sets for high availability. Clients connect to the MongoDB process (mongos in a sharded environment), optionally authenticate themselves if security is turned on, and perform inserts, queries, and updates of the documents in the database. The mongos routes each operation to the appropriate shards (mongod processes) to fulfill the command.

Figure 1. MongoDB architecture provides a scalable environment (clients connect to a mongos query router, which routes operations to shards configured as replica sets)
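To make the routing role of mongos concrete, here is a minimal sketch that assumes range-based sharding on a single numeric shard key. The chunk table, the shard names, and the `routeQuery` function are hypothetical illustrations, not MongoDB's implementation. The point is that a query containing the shard key can be targeted at one shard, while a query without it must be broadcast to every shard.

```javascript
// Hypothetical chunk table for a collection sharded on the numeric field "_id".
// Each chunk covers the half-open range [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" },
];

// Returns the shards a mongos-like router would contact for a given filter.
function routeQuery(filter, shardKey = "_id") {
  const value = filter[shardKey];
  if (Number.isFinite(value)) {
    // Equality on the shard key: exactly one chunk can hold the document.
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  // No usable shard key in the filter: scatter-gather to every shard.
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(routeQuery({ _id: 1234 })); // [ 'shard0001' ]
console.log(routeQuery({ name: "John Backus" }).length); // 3
```

Because one client request can fan out to several mongod processes, the same logical operation may be visible on more than one server, which is worth keeping in mind when deciding where to monitor.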

Because it uses a document data model, MongoDB is a type of NoSQL database called a document database. Documents are modeled hierarchically using JSON (JavaScript Object Notation), so they are made up of name-value pairs. A value can be an individual typed value, an array, or another document (or a combination of these), so a document can match any object you have in your application. This model can provide fast queries because related data can be stored together in a single document rather than spread across multiple tables that must be joined. Listing 1 shows an example of a JSON document.

Listing 1. Sample JSON document
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
"awards" : [
           {
             "award" : "W.W. McDowell Award",
             "year" : 1967,
             "by" : "IEEE Computer Society"
           },
           { "award" : "Draper Prize",
             "year" : 1993,
             "by" : "National Academy of Engineering"
           }
]
}

This document structure lets you store data in MongoDB in much the same way that you model objects in your application, so it provides great time-to-market and agility benefits (especially combined with a dynamic rather than predefined schema).
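To illustrate the point about avoiding joins, the following sketch takes hypothetical relational rows (a `people` table and an `awards` table linked by a `person_id` foreign key) and assembles them into documents shaped like Listing 1. The table layouts and the `toDocuments` helper are assumptions made for illustration; in MongoDB you would simply store the resulting document.

```javascript
// Hypothetical rows as they might come back from two relational tables.
const people = [{ id: 1, first: "John", last: "Backus" }];
const awards = [
  { person_id: 1, award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
  { person_id: 1, award: "Draper Prize", year: 1993, by: "National Academy of Engineering" },
];

// Performs the "join" once, embedding each person's awards in a single document.
function toDocuments(people, awards) {
  return people.map((p) => ({
    _id: p.id,
    name: { first: p.first, last: p.last },
    awards: awards
      .filter((a) => a.person_id === p.id)
      // Drop the foreign key: inside the document, the relationship is implied.
      .map(({ award, year, by }) => ({ award, year, by })),
  }));
}

console.log(JSON.stringify(toDocuments(people, awards)[0], null, 2));
```

Reading the person back then needs one document lookup instead of a join across two tables.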

Behind the scenes, documents are stored in a binary form of JSON (BSON) for efficiency, but drivers and tools let you work with JSON-style documents, so you rarely have to deal with BSON directly.
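For readers curious about what "binary JSON" looks like, here is a minimal encoder for a small subset of the BSON specification: strings, 32-bit integers, embedded documents, and arrays. It is a teaching sketch, not a substitute for a driver's BSON library; it does not handle doubles, dates, ObjectIds, or other BSON types, and it relies on JavaScript object key order for field order.

```javascript
// Encodes a C-style string: UTF-8 bytes followed by a 0x00 terminator.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// Encodes a little-endian 32-bit signed integer.
function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Encodes one element: type byte, field name (C string), then the typed value.
function element(name, value) {
  if (typeof value === "string") {
    const bytes = cstring(value); // BSON string = int32 byte count (incl. NUL) + bytes
    return Buffer.concat([Buffer.from([0x02]), cstring(name), int32(bytes.length), bytes]);
  }
  if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
    return Buffer.concat([Buffer.from([0x10]), cstring(name), int32(value)]);
  }
  if (Array.isArray(value)) {
    // Arrays are encoded as documents whose keys are "0", "1", "2", ...
    const asDoc = Object.fromEntries(value.map((v, i) => [String(i), v]));
    return Buffer.concat([Buffer.from([0x04]), cstring(name), encode(asDoc)]);
  }
  if (value !== null && typeof value === "object") {
    return Buffer.concat([Buffer.from([0x03]), cstring(name), encode(value)]);
  }
  throw new TypeError(`unsupported value for field "${name}"`);
}

// document ::= int32 total length, element list, 0x00
function encode(doc) {
  const body = Buffer.concat(Object.entries(doc).map(([k, v]) => element(k, v)));
  return Buffer.concat([int32(4 + body.length + 1), body, Buffer.from([0])]);
}

console.log(encode({ hello: "world" }));
// <Buffer 16 00 00 00 02 68 65 6c 6c 6f 00 06 00 00 00 77 6f 72 6c 64 00 00>
```

The 22-byte result (0x16) is the length prefix, one string element, and the trailing terminator; this is the form in which a document like Listing 1 travels between client and server.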

Table 1 compares relational and MongoDB concepts.

Table 1. Mapping relational concepts to MongoDB
Relational concept | MongoDB equivalent | Comment
Database/Schema | Database | Container for everything else
Table | Collection |
Row | Document |
Column | Field | Fields are defined at the document level, not at the collection level. In other words, there is no predefined schema as there is for relational databases.
Index | Index |

MongoDB's other features are intended to let you use this document structure with your favorite programming language without spending a lot of time administering the database. High availability, performance, and auto-sharding (partitioning) are built in so that the application rarely has to handle them, which lets application development teams focus on satisfying business requirements quickly.


Security recommendations for MongoDB

As more enterprises look to MongoDB to fulfill specific application needs, they will likely need to meet the same security and compliance requirements as the more established databases in their organizations.

MongoDB publishes a good set of security best practices on its wiki pages. (You can find links in Resources.) In addition, to address some fundamental security pain points, MongoDB 2.4 delivered the following security enhancements:

  • Kerberos authentication (Enterprise Edition only), for enterprises that need MongoDB to integrate with their standard security systems
  • Role-based access control system for more granular control
  • Enhanced SSL support for client connections

The following sections describe best practices for network configuration, authentication, use of roles, and preventing JavaScript injection attacks.

Network configuration

To reduce security risk, run MongoDB and all its components in a trusted environment, with appropriate firewall controls, tight network bindings (bind only to the interfaces each component needs), and specific IP ports. In a sharded cluster, query traffic is generally routed through a separate process known as mongos.

As shown in Figure 2, privileged users or others can log in directly to the mongod instances in the shards. For this reason, we recommend not only using firewalls to lock down access to the shards as much as possible, but also implementing InfoSphere Guardium monitoring on the shards to catch that activity. As you will learn in Part 2 of this series, you can configure Guardium in a way that avoids double-counting message traffic yet still catches local traffic on the shards and mongos servers.

There are many options for setting up monitoring, but this one is efficient while still providing the monitoring capabilities that administrators need.

Figure 2. One recommended configuration (firewalls prevent client access to the mongod shards so that traffic is routed through mongos; InfoSphere Guardium monitors privileged users who connect to the shards through clients or local connections)

In addition, it is a security best practice to run on non-default ports. For clarity, this series uses the default ports (27017 for mongos and 27018 for shard servers), but InfoSphere Guardium works fine with non-default ports.
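The topology in Figure 2 boils down to a simple rule: application clients may reach mongos on 27017, but only mongos hosts (and, in this sketch, a designated administrative host) may reach the shard servers on 27018. The following sketch models that rule as an allow list so you can reason about which connections a firewall should permit. The host names, the `admin-jump` host, and the `isAllowed` function are hypothetical; real enforcement belongs in your firewall configuration, and any access that remains permitted, such as privileged access to the shards, is what monitoring on the shards is meant to observe.

```javascript
// Hypothetical mapping of host names to their role in the deployment.
const hosts = {
  app01: "app",
  app02: "app",
  mongos01: "mongos",
  "admin-jump": "admin",
};

// Allow list derived from the recommended configuration (default ports assumed).
const rules = [
  { port: 27017, from: ["app", "admin"] }, // clients reach the query router
  { port: 27018, from: ["mongos", "admin"] }, // only mongos (and admins) reach shards
];

// True if a connection from sourceHost to destPort matches some allow rule.
function isAllowed(sourceHost, destPort) {
  const role = hosts[sourceHost]; // undefined for unknown hosts, which never match
  return rules.some((r) => r.port === destPort && r.from.includes(role));
}

console.log(isAllowed("app01", 27017)); // true
console.log(isAllowed("app01", 27018)); // false: applications must go through mongos
console.log(isAllowed("mongos01", 27018)); // true
```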

Authentication

An obvious recommendation is to run MongoDB with authentication enabled (it is not enabled by default). Authentication is also critical from a monitoring and audit perspective, because it enables InfoSphere Guardium to pick up the database user name. If authentication is not enabled, you will see the string "NO_AUTH" in the DB USER name field of reports. (You may also occasionally see NO_AUTH when InfoSphere Guardium starts monitoring in the middle of a session and does not pick up the DB user.)

From a MongoDB perspective, authentication prior to 2.4 was limited to basic authentication using a user name and password managed only within MongoDB, with no integration into enterprise user management systems. After authentication, there were minimal roles: read-only or full access. In 2.4 Enterprise Edition, additional capabilities were added, including support for Kerberos.
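For background, the pre-2.4 user name and password mechanism (MONGODB-CR, a challenge-response scheme) does not send the password itself. As we understand the mechanism, the client computes a digest md5("user:mongo:password"), obtains a nonce from the server with the getnonce command, and then sends an authenticate command containing the user name, the nonce, and key = md5(nonce + user + digest). The sketch below computes those values with Node's built-in crypto module; the user name, password, and nonce are made-up values, and MONGODB-CR was deprecated in later MongoDB releases, so treat this as historical context.

```javascript
const crypto = require("crypto");

// Hex-encoded MD5, as used by the legacy MONGODB-CR mechanism.
function md5Hex(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// Digest derived from the credentials: md5("<user>:mongo:<password>").
function passwordDigest(user, password) {
  return md5Hex(`${user}:mongo:${password}`);
}

// Builds the authenticate command a client sends after calling getnonce.
function buildAuthenticateCommand(user, password, nonce) {
  const key = md5Hex(nonce + user + passwordDigest(user, password));
  return { authenticate: 1, user, nonce, key };
}

// Example values only; a real nonce comes from the server's getnonce reply.
const cmd = buildAuthenticateCommand("Indrani", "s3cret", "2375531c32080ae8");
console.log(cmd.user, cmd.key.length); // Indrani 32
```

Note that the user name travels as a plain field of the authenticate command, which is consistent with the database user being attributable only when authentication is enabled.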

Roles and role-based access

In 2.4, MongoDB supports many new roles, whose scopes can be roughly divided into server-wide roles and database-level roles. In both cases, there are roles that focus on user administration, cluster administration, and application access.

Why does it show SQL headings in the reports?

The default report headings use SQL terminology because InfoSphere Guardium is heterogeneous: the same policies and reports are used across the many different databases in an enterprise. If you have MongoDB-specific reports and do not want SQL in the report headings, you can customize the headings however you like.

Because some of these roles are basically equivalent to super users, it is important to ensure that those roles are carefully distributed and monitored.
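Because the scope of a role determines how carefully a grant should be reviewed, here is a small sketch that classifies the built-in role names documented for MongoDB 2.4. The grouping reflects our reading of the 2.4 documentation (the "AnyDatabase" roles and clusterAdmin are granted in the admin database and apply server-wide, and user-administration roles can be used to grant further privileges). The `reviewGrant` helper and its "super-user-like" flag are hypothetical.

```javascript
// Built-in role names from MongoDB 2.4, grouped by scope.
const DATABASE_ROLES = new Set(["read", "readWrite", "dbAdmin", "userAdmin"]);
const SERVER_ROLES = new Set([
  "clusterAdmin",
  "readAnyDatabase",
  "readWriteAnyDatabase",
  "userAdminAnyDatabase",
  "dbAdminAnyDatabase",
]);

// Hypothetical review helper: classifies each granted role and flags roles that
// let the holder grant themselves more privileges (effectively super users).
function reviewGrant(dbName, roles) {
  return roles.map((role) => {
    let scope = "unknown";
    if (SERVER_ROLES.has(role)) scope = "server";
    else if (DATABASE_ROLES.has(role)) scope = "database";
    const superUserLike =
      role === "userAdminAnyDatabase" || (role === "userAdmin" && dbName === "admin");
    return { role, scope, superUserLike };
  });
}

console.log(reviewGrant("admin", ["userAdmin", "readAnyDatabase"]));
```

A report or alert on grants can then focus first on entries where the flag is set.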

With InfoSphere Guardium, you can monitor and audit changes to the system.users collection for the whole environment or for any logical database, so you can ensure that only proper authorizations are being issued. Figure 3 shows an example report in which the database user Indrani granted a read/write role to both Kathy and Sundari; in MongoDB, these grants translate to inserts into the system.users collection.

Figure 3. Sample audit report showing roles being granted (inserts into system.users; the object names are Sundari and system.users, and the verb is insert)
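Conceptually, a report like Figure 3 is a filter over audit records: any write whose objects include system.users is a change to user definitions or roles. The record shape below (dbUser, verb, objects, database) is a hypothetical simplification for illustration, not the actual InfoSphere Guardium schema.

```javascript
// Hypothetical, simplified audit records (not Guardium's real schema).
const auditRecords = [
  { dbUser: "Indrani", verb: "insert", objects: ["system.users", "Kathy"], database: "sales" },
  { dbUser: "Indrani", verb: "insert", objects: ["system.users", "Sundari"], database: "sales" },
  { dbUser: "app", verb: "find", objects: ["orders"], database: "sales" },
];

const WRITE_VERBS = new Set(["insert", "update", "delete", "remove"]);

// Keeps only writes that touch system.users, which is how role grants surface.
function privilegeChanges(records) {
  return records.filter((r) => WRITE_VERBS.has(r.verb) && r.objects.includes("system.users"));
}

for (const r of privilegeChanges(auditRecords)) {
  console.log(`${r.dbUser} ${r.verb} on ${r.database}.system.users (${r.objects[1]})`);
}
// Indrani insert on sales.system.users (Kathy)
// Indrani insert on sales.system.users (Sundari)
```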

Preventing JavaScript injection attacks

Traditional SQL injection attacks are not a significant issue with MongoDB because queries are expressed as BSON (binary JSON) documents rather than as strings. However, there are still some cases to beware of, including the following operations, which let you run arbitrary JavaScript expressions directly on the server:

  • $where
  • db.eval()
  • mapReduce
  • group

There are ways to mitigate the risk (including turning off all server-side scripting), but first you need to be able to identify where those operations are used. With InfoSphere Guardium, the JavaScript in those operations is logged and reported as objects, so you can set alerts or policy violations to occur whenever these operations are used. This is especially useful in test environments, where you can find these risky uses and complete the necessary code reviews before deploying to production. The sample report in Figure 4 shows that $eval is reported as a JavaScript object. The full command text is shown as well, so you can see its use in context. If you want to monitor this over time, you can create a regularly scheduled report that shows whenever a new use of one of these operations occurs.

Figure 4. JavaScript in certain operations (such as db.eval) is logged in InfoSphere Guardium as an object (the sample report shows a db.eval command logged with a JavaScript object)
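Before you write policies for these operations, it can help to inventory where an application uses them. The sketch below walks a query or command document and reports any of the operations listed above. It assumes the legacy command names ($where as a query operator; $eval or eval, mapReduce, and group as command names, where a command's name is its first key); the `findServerSideJs` function is hypothetical and deliberately simple.

```javascript
// Command names that run JavaScript on the server (legacy MongoDB commands).
const JS_COMMANDS = new Set(["$eval", "eval", "mapReduce", "mapreduce", "group"]);

// Returns paths of server-side JavaScript usages in a query or command document.
function findServerSideJs(message, isCommand) {
  const hits = [];
  if (isCommand) {
    const name = Object.keys(message)[0]; // a command's name is its first key
    if (JS_COMMANDS.has(name)) hits.push(name);
  }
  // $where can appear at any depth, for example inside $or or $and arrays.
  (function walk(value, path) {
    if (value === null || typeof value !== "object") return;
    for (const [key, child] of Object.entries(value)) { // arrays yield "0", "1", ...
      const here = path ? `${path}.${key}` : key;
      if (key === "$where") hits.push(here);
      walk(child, here);
    }
  })(message, "");
  return hits;
}

console.log(findServerSideJs({ $or: [{ age: 30 }, { $where: "this.a > 1" }] }, false));
// [ '$or.1.$where' ]
console.log(findServerSideJs({ $eval: "function () { return 1; }" }, true));
// [ '$eval' ]
```

Checking command names only when `isCommand` is set avoids flagging an ordinary field that happens to be called "group".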

Benefits of using InfoSphere Guardium with MongoDB

InfoSphere Guardium is a vital complement to native MongoDB capabilities for any organization that needs true database auditing, especially one with regulatory requirements to address, such as the Payment Card Industry (PCI) or Sarbanes-Oxley (SOX) requirements. The native security and authentication capabilities in MongoDB, as in other databases, are not adequate on their own to address the multitude of regulations and compliance requirements around the world. Most of these requirements call for more robust accountability: being able to log and verify who did what, and when, for each database transaction.

InfoSphere Guardium is deployed in both large and small organizations around the world to address the issues of data protection and compliance for a wide variety of databases. The solution is flexible, powerful, and effective. Benefits that MongoDB users can expect to see are:

  • InfoSphere Guardium logs very detailed and granular information, as you can see in the sample report shown in Figure 5, which includes details of the client and server IPs, the source program, the database user (if authentication is enabled), and the full command message (as it flows across the wire). In this sample, Kerberos is being used as the authentication mechanism.

    Note that parts of the message are parsed and stored as a Verb (sometimes known as the Command) and Object, which means that these entities are items that can be used in specifying policy rules, as you will see in Part 2 of this series. For example, you can specify a rule that fires whenever a find is issued, or a rule that fires whenever anyone touches an object in the sensitive data objects group.

    Figure 5. Example of an audit report (delete, find, and insert operations on mycollection, with the verb and object shown for each)
  • You can audit database exception conditions, such as failed authentications, which can indicate a brute-force attack. Part 2 of this series will show how to configure this.
  • You can optionally record the number of documents affected by read activity, which you can use to alert on unusually high numbers of downloads. Part 2 of this series will show an example of this.
  • You can block local user access, for example to prevent an administrator from reading the contents of sensitive data. You'll see an example of this in Part 3 of this series.
  • Real-time alerts can be specified for a wide variety of conditions, and you will learn how to configure them in Part 2 of this series. Organizations have a responsibility to do whatever they can to avoid embarrassing or damaging data breaches. Demonstrating compliance gets you part of the way there, but when breaches do occur, being able to detect and react quickly can mean the difference between a hugely damaging loss and a minor inconvenience. Real-time alerts and alerts on threshold breaches help you detect suspicious behavior within seconds or minutes, rather than days, weeks, or even longer.

    Those alerts can be sent by email or through SNMP to another monitoring system. Built-in integration with IBM QRadar and HP ArcSight message types can also be used to automatically forward alert conditions through syslog to those systems.

  • Audit information must be stored for a defined period of time, sometimes years. InfoSphere Guardium is designed with these kinds of requirements in mind and provides secure archiving capabilities for this very reason.
  • Finally, demonstrating compliance can be time-consuming and burdensome, because it often requires some level of regular review and signoff. InfoSphere Guardium not only lets you create the reports you need to satisfy audit requirements, but also provides a robust workflow capability that integrates into your business processes and saves all signoffs and reviews as part of the audit trail.
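Because each logged operation is broken into parts such as the verb and the object, a security policy can be thought of as a list of rules matched against those parts. The sketch below is a hypothetical, much-simplified model of that idea covering three of the cases above: a rule on a sensitive object group, a threshold on the number of documents returned by a find, and a threshold on failed logins from one client within a time window. The event fields, group contents, and thresholds are illustrative assumptions, not InfoSphere Guardium's policy engine; Part 2 shows how real policies are configured.

```javascript
// Hypothetical group of sensitive objects (collections).
const SENSITIVE_OBJECTS = new Set(["CreditCard", "Patients"]);

// Stateless rules evaluated against each parsed command event.
const rules = [
  { name: "sensitive-access", test: (e) => e.type === "command" && SENSITIVE_OBJECTS.has(e.object) },
  { name: "bulk-read", test: (e) => e.type === "command" && e.verb === "find" && e.records > 1000 },
];

// Stateful rule: fires when one client IP has `limit` failed logins within `windowMs`.
function failedLoginDetector(limit, windowMs) {
  const history = new Map(); // clientIp -> timestamps of recent failures
  return (e) => {
    if (e.type !== "login_failed") return false;
    const recent = (history.get(e.clientIp) || []).filter((t) => e.time - t < windowMs);
    recent.push(e.time);
    history.set(e.clientIp, recent);
    return recent.length >= limit;
  };
}

function evaluate(events) {
  const bruteForce = failedLoginDetector(3, 60000);
  const alerts = [];
  for (const e of events) {
    for (const r of rules) if (r.test(e)) alerts.push(`${r.name}: ${e.dbUser}`);
    if (bruteForce(e)) alerts.push(`failed-logins: ${e.clientIp}`);
  }
  return alerts;
}

const events = [
  { type: "command", verb: "insert", object: "CreditCard", dbUser: "Joe" },
  { type: "command", verb: "find", object: "orders", records: 50000, dbUser: "app" },
  { type: "login_failed", clientIp: "10.0.0.7", time: 0 },
  { type: "login_failed", clientIp: "10.0.0.7", time: 10000 },
  { type: "login_failed", clientIp: "10.0.0.7", time: 20000 },
];

for (const alert of evaluate(events)) console.log(alert);
// sensitive-access: Joe
// bulk-read: app
// failed-logins: 10.0.0.7
```

Keeping the failed-login rule stateful per client IP means scattered failures across a long period do not trigger an alert, while a quick burst does.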

Architecture of the solution

With its nonintrusive architecture (see Figure 6), InfoSphere Guardium provides full visibility into data activity while requiring no configuration changes on the MongoDB cluster.

Figure 6. High-level architecture (a mongo client connects to mongos, which connects to the shards; an S-TAP is installed on each server, mongos and shards alike, and forwards traffic to the collector, from which you can generate real-time alerts or reports)

If you are familiar with InfoSphere Guardium, the setup will look familiar: you install S-TAPs, lightweight software agents that run in the operating system, on the MongoDB servers. When the MongoDB server receives a data request from a client, the S-TAP copies the network packet and sends it to a hardened, tamper-proof hardware or software appliance, known as a collector, for parsing, analysis, and logging in the InfoSphere Guardium repository.

The real intelligence of the InfoSphere Guardium system is on the collector. There, the message is broken down into its component parts and logged in an internal repository, and any necessary actions are taken, such as generating a real-time alert, logging the activity, or blocking a particular local user. Offloading the bulk of the work (parsing and logging) to a hardened collector minimizes the performance impact on MongoDB applications (in many cases, no more than 2-4%) and lets you effectively institute separation of duties.
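The division of labor described above can be pictured as a tap that does almost nothing on the database server (it copies the bytes and queues them) and a collector that does the expensive work elsewhere. The sketch below is a hypothetical in-process model of that split, not how S-TAP is implemented; in a real deployment the copy is forwarded over the network to the collector. It assumes only the standard 16-byte MongoDB message header, whose first field is the message length and whose fourth field is the opcode.

```javascript
// On the database server: the tap only copies the raw request bytes.
class Tap {
  constructor() {
    this.outbound = []; // stands in for the connection to the collector
  }
  observe(requestBytes) {
    // Copy, so that later reuse of the server's buffer cannot alter the audit copy.
    this.outbound.push(Buffer.from(requestBytes));
  }
}

// On the collector: parsing and logging happen here, off the database server.
class Collector {
  constructor() {
    this.log = [];
  }
  drain(tap) {
    for (const bytes of tap.outbound.splice(0)) {
      // MongoDB message header: messageLength at offset 0, opCode at offset 12.
      this.log.push({ length: bytes.readInt32LE(0), opCode: bytes.readInt32LE(12) });
    }
  }
}

// Simulated request: a bare 16-byte header for an OP_INSERT (opCode 2002) message.
const request = Buffer.alloc(16);
request.writeInt32LE(16, 0);
request.writeInt32LE(2002, 12);

const tap = new Tap();
const collector = new Collector();
tap.observe(request);
request.fill(0); // the server reuses its buffer; the tap's copy is unaffected
collector.drain(tap);
console.log(collector.log); // [ { length: 16, opCode: 2002 } ]
```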

The InfoSphere Guardium role-based web console provides centralized management of alerts, report definitions, compliance workflow processes, and settings (such as archiving schedules) without the involvement of MongoDB administrators, thus providing the separation of duties required by auditors and streamlining compliance activities.

High-level example of message flow from client to Guardium collector

If you are new to InfoSphere Guardium, a basic understanding of how data flows through the system will help you understand and use other parts of the system, such as policy configuration, reporting, and alerting.

Figure 7. How a MongoDB command flows to the InfoSphere Guardium collector (the example command is a test.CreditCard insert issued by Joe; the steps are described below)

Refer to Figure 7 as we describe the flow:

  1. (Not shown in the graphic.) A new session starts when a user or application logs on to MongoDB. InfoSphere Guardium always logs the beginning and end of a session and, if the policy requires it, the activity that occurs in that session. (In Part 2 of this series, you will learn how to configure a security policy to ignore everything between the beginning and end of a trusted connection.)
  2. A user or application enters a MongoDB command.

    Note: The MongoDB client may transform what is actually entered (syntactic sugar and so on). InfoSphere Guardium collects only what actually flows on the wire. In this example, the user Joe enters the following command:

    Listing 2. MongoDB command shown in this example
    // Switch to the test database, then insert into its CreditCard collection
    use test
    db.CreditCard.insert({
        "Name" : "Sundari",
        "profile" : [
            {"CCN" : "11999002"},
            {"log" : ["new", "customer"]}
        ]
    });
  3. The BSON message flows to the MongoDB server (mongos for a sharded cluster, or mongod for a local connection). Along with the message, additional information is available, such as the name of the database user, the client IP, the server IP, and a timestamp.
  4. The S-TAP on the MongoDB server intercepts and copies the message, which then flows over TCP/IP to the collector, listening on port 16016.
  5. At the collector, the analysis engine (sometimes known as the "sniffer") recognizes that this is a MongoDB message and parses it accordingly. In our sample statement, the information is logged into the following entities:
    • Client/server entity: Joe is logged as the DB User
    • Command entity: INSERT
    • Object entity: CreditCard
    • Field entity: Name
    • Field entity: profile.CCN
    • Field entity: profile.log
    This is a vast simplification of what actually occurs on the collector, but it serves to provide a basic understanding.
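Step 5 can be sketched in code. In the legacy wire protocol used by MongoDB 2.4-era clients, an insert is an OP_INSERT message (opCode 2002): a 16-byte header, an int32 flags field, the full collection name as a NUL-terminated string ("test.CreditCard"), and then the BSON documents. The sketch below reads the header and namespace from raw bytes and then, to stay short, takes the already-decoded document as a JavaScript object and derives entities like those listed above. The entity names and the flattening rule (array positions are dropped, so profile[0].CCN becomes profile.CCN) are our assumptions modeled on the example, not InfoSphere Guardium's parser.

```javascript
// Legacy wire-protocol opcodes for CRUD messages sent by MongoDB 2.4-era clients.
const OPCODES = { 2001: "UPDATE", 2002: "INSERT", 2004: "QUERY", 2006: "DELETE" };

// Reads the 16-byte header, skips the int32 flags, and reads the namespace.
function parseInsertPrefix(buf) {
  const opCode = buf.readInt32LE(12);
  const nsStart = 16 + 4; // header + flags
  const nsEnd = buf.indexOf(0, nsStart); // the namespace is NUL-terminated
  return {
    messageLength: buf.readInt32LE(0),
    command: OPCODES[opCode] || `OPCODE_${opCode}`,
    namespace: buf.toString("utf8", nsStart, nsEnd),
  };
}

// Collects dotted field paths; array positions are skipped (profile.CCN, not profile.0.CCN).
function fieldPaths(value, prefix = "", out = new Set()) {
  if (Array.isArray(value)) {
    value.forEach((item) => fieldPaths(item, prefix, out));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      fieldPaths(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else if (prefix) {
    out.add(prefix);
  }
  return out;
}

// Splits "database.collection" and assembles hypothetical entity values.
function toEntities(prefix, doc, dbUser) {
  const dot = prefix.namespace.indexOf(".");
  return {
    dbUser,
    command: prefix.command,
    database: prefix.namespace.slice(0, dot),
    object: prefix.namespace.slice(dot + 1),
    fields: [...fieldPaths(doc)],
  };
}

// Header + flags + namespace for the example (BSON documents omitted for brevity).
const ns = Buffer.from("test.CreditCard\0", "utf8");
const msg = Buffer.alloc(20 + ns.length);
msg.writeInt32LE(msg.length, 0); // messageLength
msg.writeInt32LE(1, 4); // requestID
msg.writeInt32LE(2002, 12); // opCode: OP_INSERT
ns.copy(msg, 20);

const doc = { Name: "Sundari", profile: [{ CCN: "11999002" }, { log: ["new", "customer"] }] };
const entities = toEntities(parseInsertPrefix(msg), doc, "Joe");
console.log(entities.command, entities.object, entities.fields.join(", "));
// INSERT CreditCard Name, profile.CCN, profile.log
```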

Summary and what's next

In this first part of our three-part series, we covered the groundwork of the joint solution: an overview of both MongoDB and InfoSphere Guardium, and how InfoSphere Guardium works with MongoDB to provide significant value for data protection and compliance. In addition, InfoSphere Guardium provides a robust and flexible infrastructure for automating auditing and compliance tasks to ease the burden on IT staff as much as possible.

In Part 2, we will tackle the nuts and bolts of configuring the S-TAP for MongoDB and walk through how to use security policies to accomplish some common use cases, including monitoring admin access, alerting, and repeated failed logins.

Resources

Learn

Get products and technologies

Discuss
