Using VA with Cloudera
Learn how to use Guardium vulnerability assessments with Cloudera distributions of Apache Hadoop.
- Cloudera Manager
Datasource Setup
The Cloudera Manager datasource connects through the Cloudera Manager Java API; it does not use JDBC.
The Cluster Name must be defined in the datasource GUI. Use the cluster display name shown on the left-hand side of the Cloudera Manager GUI.
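If you are unsure which value to enter for the Cluster Name, the cluster display names can also be listed through the Cloudera Manager API. The following is a minimal, hypothetical sketch that calls the REST endpoint wrapped by the Java API client; the host, port, API version, and credentials are placeholders for your environment.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

// Hypothetical sketch: list cluster display names through the Cloudera Manager
// REST API (the same API the Java client wraps). Host, port, API version, and
// credentials are placeholders; TLS deployments typically use port 7183.
public class ListCmClusters {
    public static void main(String[] args) throws Exception {
        String host = "cm-host.example.com";          // assumption: your CM host
        String auth = Base64.getEncoder()
                .encodeToString("readonly_user:password".getBytes("UTF-8"));

        // "view=summary" keeps the JSON small; the "displayName" fields are the
        // values to use as the Guardium datasource Cluster Name.
        URL url = new URL("http://" + host + ":7180/api/v19/clusters?view=summary");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + auth);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // look for the "displayName" fields
            }
        }
    }
}
```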
To run the Vulnerability Assessment tests for Cloudera Manager, define a datasource user with the Read-Only role; this is sufficient for most of the tests. A small number of tests require the datasource user to have the Cluster Administrator role as the minimum privilege.
The following Vulnerability Assessment tests require the datasource user to have the Cluster Administrator role:
- Authentication Backend Order
- HTTP port for Admin Console
- HTTPS port for Admin Console
- Use TLS Authentication of Agents to server
- Use TLS Encryption for Admin Console
- Use TLS Encryption for Agents
This information is also available in the Cloudera Manager gdmmonitor script (/log/debug-logs/gdmmonitor_scripts/gdmmonitor-Cloudera-Manager.sql).
If SSL is enabled, check Use SSL and check Import server ssl certificate.
CAS Database Instance setting
The Account should be root.
The Directory must be defined as the Cloudera Manager install path. For example: installpath=/opt/cloudera
Example of Cloudera Manager datasource settings.
- Hive
Datasource Setup
Use the Apache Hive JDBC driver 1.1.1.
Kerberos - The User Name and Password must be a valid Kerberos user ID and password; they are also used for CAS. Test that your Kerberos user ID and password can be used to log in to the Hive beeline command line.
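As a quick way to confirm that the Kerberos identity works outside of beeline, the hedged sketch below opens a JDBC connection to HiveServer2 with the Apache Hive JDBC driver. The host name, database, and Kerberos principal are placeholders; a Kerberos ticket must already be in the ticket cache (for example, obtained with kinit), because the driver authenticates with the ticket rather than the user ID and password in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical sketch: verify that a Kerberos identity can reach HiveServer2
// through the Hive JDBC driver. The host, database, and Kerberos principal
// below are placeholders for your environment.
public class HiveKerberosCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Kerberos-secured HiveServer2: the principal in the URL is the server
        // principal. For LDAP over SSL instead (Hive supports LDAP/SSL or
        // Kerberos, not both), a URL such as
        // jdbc:hive2://hive-host.example.com:10000/default;ssl=true
        // with a user name and password would be used.
        String url = "jdbc:hive2://hive-host.example.com:10000/default;"
                   + "principal=hive/hive-host.example.com@EXAMPLE.COM";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```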
Make sure you have already created a Kerberos Configuration that defines your KDC and realm for your appliance. In the Guardium GUI, go to Setup > Tools and Views > Kerberos Configuration. If no Kerberos Configuration exists, click the + icon to create a new one.
After you have created a Kerberos Configuration, you can select it when you configure your datasource.
If SSL is enabled, check Use SSL and check Import server ssl certificate.
Note: Hive can only support either LDAP/SSL or Kerberos, not both.
CAS Database Instance setting
- The Directory must be defined as the Cloudera Manager install path. For example: installpath=/opt/cloudera
- If HDFS is enabled for Kerberos, the Datasource User Name and Password must be a valid Kerberos user ID and password. The CAS scripts use them to obtain a Kerberos ticket.
- The Account must be root. For certain parameter tests that require CAS, the CAS user must be root in order to access the real-time configuration under the Cloudera agent process directory (/var/run/cloudera-scm-agent/process/).
Note: Guardium does not in any way modify or alter your configuration data.
For the Hive Privilege tests, the datasource account must be a member of the Sentry Admin group. See the Hive gdmmonitor script for steps to check the Sentry Admin group.
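The gdmmonitor script is the documented way to verify Sentry Admin group membership. As an informal sanity check, the hypothetical sketch below runs SHOW ROLES over JDBC, which with Sentry enabled normally succeeds only for members of the Sentry admin group; the host and principal are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sanity check (the gdmmonitor script is the documented method):
// with Sentry enabled, SHOW ROLES normally succeeds only for members of the
// Sentry admin group, so a successful run suggests the datasource account has
// the access the Privilege tests need. Host and principal are placeholders.
public class SentryAdminCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hive-host.example.com:10000/default;"
                   + "principal=hive/hive-host.example.com@EXAMPLE.COM";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW ROLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // A permission error here usually means the account is not in the
            // Sentry admin group.
            System.err.println("SHOW ROLES failed: " + e.getMessage());
        }
    }
}
```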
When setting up Hive datasources, you can perform a JDBC test connection only for the datasource that points to your HiveServer2. For all other Hive datasources, clone that datasource and change the host name to the node where the Cloudera service is installed. Make sure the cloned datasource has a valid User Name and Password, just like the HiveServer2 datasource. You cannot perform a test connection for these datasources; however, when Kerberos is enabled, Guardium relies on the accuracy of the User Name and Password in the datasource to perform a Kerberos connection through CAS.
Vulnerability Assessment Tests
The Hive Privilege tests require Sentry Services to be installed and configured. Without Sentry, there is no security. Everyone can connect to Hive and access data.
The Vulnerability Assessment CAS tests for HDFS parameters read configuration files under the Cloudera agent process directory (/var/run/cloudera-scm-agent/process/). The folder names inside this process directory change every time the Cloudera agent services are started.
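Because those folder names change on every agent restart, the current process directory has to be located at scan time. The sketch below is a hypothetical illustration of that lookup: it assumes the usual "<id>-<service>-<ROLE>" naming convention (verify this in your environment), picks the most recently modified directory for a role, and lists the configuration files inside. It must run as root, which is why the CAS Account must be root.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical sketch: locate the newest Cloudera agent process directory for a
// role and list the configuration files found there. Directory names are
// assumed to follow the "<id>-<service>-<ROLE>" pattern, for example
// "5678-hdfs-NAMENODE"; reading the directory requires root.
public class FindProcessDir {
    public static void main(String[] args) {
        File base = new File("/var/run/cloudera-scm-agent/process");
        String roleSuffix = "-hdfs-NAMENODE";   // placeholder role to look for

        File[] candidates = base.listFiles(File::isDirectory);
        if (candidates == null) {
            System.err.println("Cannot read " + base + " (are you root?)");
            return;
        }

        Optional<File> newest = Arrays.stream(candidates)
                .filter(d -> d.getName().endsWith(roleSuffix))
                .max(Comparator.comparingLong(File::lastModified));

        if (newest.isPresent()) {
            File dir = newest.get();
            System.out.println("Current process directory: " + dir);
            File[] files = dir.listFiles();
            if (files != null) {
                for (File f : files) {
                    System.out.println("  " + f.getName());  // e.g. hdfs-site.xml
                }
            }
        } else {
            System.out.println("No process directory found for " + roleSuffix);
        }
    }
}
```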
Some of the HDFS parameter CAS tests require the datasource system to have a specific node configuration (for example, NameNode or DataNode). Some CAS tests require Yarn, MapReduce, or Hive Server to be installed on the datasource system. Select the tests for your assessment carefully, based on your datasource system configuration. If the requirements for a test are not met, the test errors out with a recommendation to run it against the correct Cloudera service. The requirements are also mentioned in the test description.
When creating a Hive datasource, it is recommended to have one datasource for each Cloudera service (NameNode, DataNode, HiveServer2, Hive metastore, Yarn NodeManager and Yarn ResourceManager).
Regardless of the number of nodes in your cluster, if you have Guardium Hive datasources that cover all of these services, then you have properly set up your environment to run Vulnerability Assessment.