Using VA with Cloudera

Learn how to use Guardium vulnerability assessments with Cloudera distributions of Apache Hadoop.

Cloudera Manager

Datasource setup

The Cloudera Manager datasource uses the Cloudera Manager Java API for a connection. It does not use JDBC.

The Cluster Name must be defined in the datasource GUI. The cluster name is displayed on the left side of the Cloudera Manager GUI.
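
If you want to confirm which cluster name to enter, you can query Cloudera Manager directly. The following is a minimal sketch that lists the clusters through the Cloudera Manager REST API (the management interface that the Java API client wraps); the host name, port, credentials, and API version are placeholder assumptions, not values taken from this documentation.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;
  import java.util.Base64;

  public class ClusterNameCheck {
      public static void main(String[] args) throws Exception {
          // Placeholder host and credentials for a read-only Cloudera Manager user.
          String cmHost = "cm-host.example.com";
          String auth = Base64.getEncoder()
                  .encodeToString("readonly-user:password".getBytes("UTF-8"));

          // /api/v10/clusters lists the clusters that this Cloudera Manager
          // instance manages (7180 is the typical HTTP admin port).
          URL url = new URL("http://" + cmHost + ":7180/api/v10/clusters");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestProperty("Authorization", "Basic " + auth);

          try (BufferedReader in = new BufferedReader(
                  new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
              String line;
              while ((line = in.readLine()) != null) {
                  // Raw JSON; look for the "name" and "displayName" fields.
                  System.out.println(line);
              }
          }
      }
  }

The displayName field in the response corresponds to the cluster name that the Cloudera Manager GUI displays.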

To run Vulnerability Assessment tests for Cloudera Manager, define a datasource user with the Read-Only role; this role is sufficient for most of the Vulnerability Assessment tests. For a few Vulnerability Assessment tests, the datasource user needs the Cluster Administrator role as the minimum privilege to run the tests.

The following Vulnerability Assessment tests require the datasource user to have the Cluster Administrator role:

  1. Authentication Backend Order
  2. HTTP port for Admin Console
  3. HTTPS port for Admin Console
  4. Use TLS Authentication of Agents to server
  5. Use TLS Encryption for Admin Console
  6. Use TLS Encryption for Agents

This information is also available in the Cloudera Manager gdmmonitor script, /log/debug-logs/gdmmonitor_scripts/gdmmonitor-Cloudera-Manager.sql.

If SSL is enabled, select the Use SSL and Import server SSL certificate checkboxes.

CAS Database Instance setting

The Account should be root.

The Directory needs to be defined as the Cloudera Manager installation path. For example, installpath=/opt/cloudera

Hive

Datasource setup

Use the Apache Hive JDBC driver 1.1.1.

Kerberos - The username and password must be a valid Kerberos user ID and password. This user ID and password are also used for CAS.
Important: Test whether your Kerberos username and password can be used to log in to the Hive beeline command line.
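
If you want to verify the same connection path that the datasource uses, a Kerberos-secured HiveServer2 can also be reached directly over JDBC. The following is a minimal sketch, assuming the Apache Hive JDBC driver is on the classpath; the host, port, database, service principal, and realm are placeholders, and with Kerberos the client must already hold a valid ticket (for example, one obtained with the same user ID and password that you configure on the datasource).

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveConnectionCheck {
      public static void main(String[] args) throws Exception {
          // Register the Hive JDBC driver (version 1.1.1, as noted above).
          Class.forName("org.apache.hive.jdbc.HiveDriver");

          // Placeholder host, port, database, service principal, and realm.
          // With Kerberos, the URL names the HiveServer2 service principal and
          // the client authenticates with its current Kerberos ticket, so no
          // user name or password is passed here.
          String url = "jdbc:hive2://hive-host.example.com:10000/default;"
                  + "principal=hive/hive-host.example.com@EXAMPLE.COM";

          try (Connection conn = DriverManager.getConnection(url);
               Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
              while (rs.next()) {
                  System.out.println(rs.getString(1));
              }
          }
      }
  }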

Make sure that you create a Kerberos Configuration that defines your KDC and realm for your appliance. In Guardium, click Setup > Tools and Views > Kerberos Configuration. If no Kerberos Configuration exists, click the Add icon to create a new Kerberos Configuration.
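
For illustration only, the same KDC and realm details typically appear in the krb5.conf file on the cluster hosts; the realm and host names below are placeholders. The Guardium Kerberos Configuration asks for the equivalent information.

  [libdefaults]
      default_realm = EXAMPLE.COM

  [realms]
      EXAMPLE.COM = {
          kdc = kdc.example.com
          admin_server = kdc.example.com
      }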

You can now select the Kerberos Configuration to configure your datasource setup.

If SSL is enabled, select the Use SSL and Import server SSL certificate checkboxes.

Remember: Hive can support either LDAP/SSL or Kerberos, not both.
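
For reference, the two modes also use different JDBC URL forms. A Kerberos URL carries the HiveServer2 service principal (as in the earlier sketch), while an LDAP/SSL configuration typically uses a URL such as the following, with the LDAP user ID and password supplied as the datasource credentials; the host, port, and truststore values are placeholders:

  jdbc:hive2://hive-host.example.com:10000/default;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=changeit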

CAS Database Instance setting

  1. The Directory needs to be defined as the Cloudera Manager installation path. For example, installpath=/opt/cloudera

  2. If HDFS is enabled for Kerberos, the datasource username and password must be a valid Kerberos user ID and password. CAS scripts use them to obtain a Kerberos ticket (see the example after this list).

  3. The Account must be root. For certain parameter tests that require CAS, the CAS user must be a root user to access the real-time configuration under the Cloudera agent process directory, /var/run/cloudera-scm-agent/process/.
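
As a quick check for item 2, you can confirm on the datasource host that the configured credentials can obtain a Kerberos ticket, for example by running kinit <user>@<REALM> and then klist; the user and realm shown here are placeholders for your own values.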

Important: Guardium does not modify or alter your configuration data.

For Hive

For the Privilege tests, the datasource account must be a member of the Sentry Admin group. See the Hive gdmmonitor script for steps to check the Sentry Admin group.

When you set up Hive datasources, you can perform a JDBC test connection only for the datasource that points to your HiveServer2. For all other Hive datasources, clone this datasource and use the host name of the node where the Cloudera service is installed. Make sure that each cloned datasource has a valid username and password, like the HiveServer2 datasource. You cannot perform a datasource test connection for these cloned datasources. However, when Kerberos is enabled, Guardium relies on the accuracy of the datasource username and password to perform a Kerberos connection by using CAS.

Vulnerability Assessment Tests

The Sentry service needs to be installed and configured for the Hive Privilege tests. Without Sentry, data is not secure and anyone can connect to Hive and access data.

The Vulnerability Assessment CAS tests for HDFS parameters read configuration files under the Cloudera agent process directory, /var/run/cloudera-scm-agent/process/. The folder names inside this process directory change every time the Cloudera agent services are started.

Some of the HDFS parameter CAS tests require the datasource system to have a specific node configuration, for example, NameNode or DataNode. Some CAS tests require YARN, MapReduce, or Hive Server to be installed on the datasource system. Select the tests for your assessment carefully, based on your datasource system configuration. If the requirements for a test are not met, an error is displayed with a recommendation to run the test on the correct Cloudera service. The requirements are also mentioned in the test description.

When you create Hive datasources, you need one datasource for each Cloudera service: NameNode, DataNode, HiveServer2, Hive Metastore, YARN NodeManager, and YARN ResourceManager.

Regardless of the number of nodes in your cluster, if you have Guardium Hive datasources that cover all of these services, then you can properly set up your environment to run Vulnerability Assessment.