Linux-UNIX: Hadoop integration with Cloudera Navigator

Learn how to integrate Guardium with Hadoop by using Cloudera Navigator 5.8 through 6.3.x, Cloudera's native data governance solution.

Guardium can subscribe to audit events when Cloudera Navigator is configured to publish audit events to Kafka. Audited activity is sent to a Kafka cluster, where the Guardium S-TAP® consumes the events and sends them to the Guardium collector to be parsed and logged. The data is highly protected in the hardened Guardium system. All standard Guardium functions can be used, such as real-time alerting, SIEM integration, reporting and workflow, and analytics.

Why integrate with Navigator? A key reason is that many organizations now use SSL encryption for client access to Hadoop data. With this integration, events can be monitored even though the wire traffic is encrypted. Kerberos and LDAP authentication are also easier to support with the Navigator integration, because no special keytab configuration is required.

The Cloudera version that is referenced in this content is described at https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cn_iu_introduce_navigator.html

Guardium functions supported with Navigator

  • Audit SSL-encrypted activity
  • Audit Kerberos-authenticated traffic. With Navigator integration, you do not need to propagate keytabs to Guardium.
  • Audit Hive, HBase, HDFS, and Impala
  • Audit Solr
    Note: For Navigator, only commands that are issued with solrctl are collected, not normal user activity.
  • Audit Sentry
  • Audit exceptions

Limitations and restrictions

  • Guardium-based blocking is not supported for any Hadoop component with Cloudera Navigator integration.
  • The supported policy rule actions are:
    • Alert per match
    • Log only
    • Skip logging
    • Alert daily
    • Alert only
    • Alert per time granularity
    • Log full details with replaced values
    • Log masked details
    • No parse
    • Record values separately
    • Quick parse
    • Quick parse no fields
  • Capturing failed logins from Hue requires Cloudera Manager 5.8 or later.
  • Access to “new projects and documents” from Hue is not captured by Navigator and therefore is not captured by Guardium.
  • In HBase, for access through Hue, Navigator reports only HBASE or HUE as the DB User.
  • In HBase, Navigator captures Grant commands but not the user to whom the privilege is granted.
  • When you use Impala statements with Cloudera Navigator, statements that span multiple lines are merged into a single-line statement. As a result, statements that contain single-line comments (that is, comments that begin with two dashes, --) can cause errors, because everything after the two dashes on the merged line is treated as part of the comment. To use comments with Impala statements, use one of the following techniques:
    • Use comments that begin with two dashes (--) only at the very end of a statement.
    • Wrap all comments in slash-asterisk format (/* comment */). For example:
      select * from user_table
       /* comment appearing within the statement */
      where user_age > 70 and user_location in ('TAIWAN', 'JAPAN');
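The following Python sketch illustrates why comment placement matters once lines are merged. It is a simplified model, not Navigator's actual processing: it assumes that lines are joined with single spaces and that comment markers never appear inside string literals. The helper names merge_lines and effective_sql are hypothetical.

```python
# Simplified model for illustration only; not Navigator's implementation.
# Assumptions: lines are joined with single spaces, and comment markers
# never appear inside string literals.

def merge_lines(statement: str) -> str:
    """Join a multi-line statement into one line (hypothetical helper)."""
    return " ".join(line.strip() for line in statement.splitlines() if line.strip())


def effective_sql(single_line: str) -> str:
    """Return the SQL text that remains after comments are removed."""
    kept = []
    i = 0
    while i < len(single_line):
        if single_line.startswith("/*", i):
            # Block comment: skip to its closing delimiter; later text survives.
            end = single_line.find("*/", i + 2)
            i = len(single_line) if end == -1 else end + 2
        elif single_line.startswith("--", i):
            # Line comment: runs to the end of the line, which after merging
            # is the end of the whole statement.
            break
        else:
            kept.append(single_line[i])
            i += 1
    # Collapse the extra whitespace left where comments were removed.
    return " ".join("".join(kept).split())


dash_comment = """select * from user_table
-- keep only older users
where user_age > 70;"""

block_comment = """select * from user_table
/* keep only older users */
where user_age > 70;"""

print(effective_sql(merge_lines(dash_comment)))   # select * from user_table
print(effective_sql(merge_lines(block_comment)))  # select * from user_table where user_age > 70;
```

With the dash-style comment, the WHERE clause is silently swallowed by the comment; with the slash-asterisk comment, the statement keeps its meaning.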

Prerequisites

Guardium integration with Cloudera Navigator requires the following minimum software release levels:
  • IBM Security Guardium and S-TAP at V10.1.2 or later.
  • CDH 5.7 or later, Cloudera Manager 5.8 or later, and the Kafka version that is included with those releases.
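If you script a pre-installation check, compare release levels numerically rather than as text; as strings, "5.16" sorts before "5.7". The following Python sketch is a hypothetical helper (MINIMUMS, meets_minimum, and failing_components are illustrative names). It assumes purely numeric dotted version strings and does not check the Kafka level.

```python
# Hypothetical pre-flight check against the minimum levels listed above.
# Assumes purely numeric dotted versions such as "10.1.2" or "5.16.2".
MINIMUMS = {
    "Guardium": "10.1.2",
    "S-TAP": "10.1.2",
    "CDH": "5.7",
    "Cloudera Manager": "5.8",
}


def parse_version(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that "5.8" compares equal to "5.8.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def failing_components(installed: dict) -> list:
    """Return components that are missing or below the minimum level."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name not in installed or not meets_minimum(installed[name], minimum)
    ]


installed = {"Guardium": "11.3", "S-TAP": "11.3", "CDH": "5.16.2", "Cloudera Manager": "5.7.1"}
print(failing_components(installed))  # ['Cloudera Manager']
```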

Architecture and data flow

Instead of relying on an S-TAP installed on the Hadoop servers, this integration uses the Cloudera Manager agent, which sends audit events from the Hadoop component logs to the Navigator audit server. Navigator then writes the audit events to its audit database. To integrate with Guardium, configure Navigator to also publish the audit events to Kafka. The Guardium S-TAP then gathers the event records from Kafka.

Figure 1. S-TAP reads Navigator audit events off the Kafka cluster
Configuration is flexible. You can install the S-TAP:
  • On a node in the Hadoop cluster.
  • On a separate server outside of the Hadoop cluster, if that server has network connectivity to the Kafka cluster and the Guardium appliance.
If you configure multiple S-TAPs for the same Kafka cluster and specify the same kafka_group_name for each of them, the Kafka cluster balances the topic partitions, and therefore the load, across those S-TAPs.
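To see how a shared kafka_group_name spreads the work, the following Python sketch models Kafka's range assignment strategy: for each topic, partitions are divided as evenly as possible among the sorted group members, and the first members receive any remainder. It is a simplified illustration, not Guardium or Kafka code; your cluster might use a different assignor, and the topic and S-TAP names are hypothetical.

```python
# Simplified model of Kafka's range assignment strategy (illustration only).
# Topic and S-TAP names are hypothetical; your cluster may use another assignor.

def range_assign(partitions_per_topic: dict, members: list) -> dict:
    """Map each group member to the (topic, partition) pairs it consumes."""
    members = sorted(members)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for topic, n_partitions in sorted(partitions_per_topic.items()):
        # Each member gets n // m partitions; the first n % m get one extra.
        per_member, extra = divmod(n_partitions, len(members))
        start = 0
        for index, member in enumerate(members):
            count = per_member + (1 if index < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


# Two S-TAPs sharing one kafka_group_name, five partitions: a 3/2 split.
print(range_assign({"navigator_audit": 5}, ["stap-b", "stap-a"]))
# Three S-TAPs but only two partitions: one S-TAP stays idle.
print(range_assign({"navigator_audit": 2}, ["stap-a", "stap-b", "stap-c"]))
```

The second case shows the practical limit of this approach: adding S-TAPs spreads the load only up to the number of partitions in the topic.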
In this configuration, Navigator produces the audit events for each Hadoop component, and the S-TAP consumes them. In the Guardium user interface, you specify the Kafka topic that Navigator publishes to, so that the Guardium S-TAP knows which events to pick up.
Figure 2. Audit events are written to the Kafka cluster where S-TAP picks them up
Recommendation: Use a secure Kafka cluster to ensure that your audit events are protected.
A few words about Kafka: Apache Kafka is a distributed messaging system that uses the following terms:
  • A server in a Kafka cluster is a Broker.
  • Message writers are called Producers.
  • Message readers are called Consumers.
  • A message category is called a Topic.
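To make these terms concrete, the following Python sketch is a toy in-memory model of a broker, a producer (playing the role of Navigator), and a consumer (playing the role of the S-TAP). It is not the Kafka client API and omits partitions, replication, and consumer groups; the class names, the topic name navigator_audit, and the message strings are hypothetical. Each consumer tracks an offset, so a second poll returns only new messages, and a consumer sees only the topic that it reads, which is why the S-TAP must be configured with the topic that Navigator publishes to.

```python
from collections import defaultdict


class Broker:
    """Toy broker: keeps one append-only message log per topic, in memory."""

    def __init__(self):
        self.topics = defaultdict(list)


class Producer:
    """Writes messages to a topic (the role Navigator plays)."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        log = self.broker.topics[topic]
        log.append(message)
        return len(log) - 1  # offset assigned to the new message


class Consumer:
    """Reads one topic and remembers its position (the role the S-TAP plays)."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # next offset to read

    def poll(self):
        log = self.broker.topics[self.topic]
        batch = log[self.offset:]
        self.offset = len(log)
        return batch


broker = Broker()
navigator = Producer(broker)
stap = Consumer(broker, "navigator_audit")

navigator.send("navigator_audit", "hive-event-1")
navigator.send("other_topic", "hdfs-event-1")
print(stap.poll())  # ['hive-event-1']
print(stap.poll())  # [] because nothing new has arrived
```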