Apache Ranger

Learn how to enable the Apache Ranger plug-in for HDFS Transparency.

Before you enable the Apache Ranger plug-in for HDFS Transparency, make sure that the following prerequisites are met:

  • Set up Apache Ranger according to its installation instructions.
  • Install a relational database management system (RDBMS) supported by Apache Ranger, such as MySQL or MariaDB.
  • Verify that Ranger Admin, Ranger Usersync, and Ranger TagSync are installed successfully, without errors.
  • Although Kerberos is not mandatory for installing and using Apache Ranger, enabling it in your Hadoop cluster is strongly recommended. Kerberos ensures that all requests are authenticated, which authorization and auditing depend on. Without Kerberos, users can impersonate other users and work around any authorization policies.
  • Make sure that Apache Solr is working correctly for Apache Ranger. When properly configured, Apache Ranger uses Apache Solr to store audit logs, and Apache Solr makes those logs searchable through the Ranger Admin GUI.
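
The Kerberos recommendation above can be checked before you continue. The following is a minimal sketch, not part of the product. It relies on the standard Hadoop property hadoop.security.authentication, which is set to kerberos when Kerberos is enabled. The helper name kerberos_enabled is hypothetical, the core-site.xml path in the usage comment is an assumption based on the HDFS Transparency configuration directory, and the sketch assumes that each <value> element sits on the line directly after its <name> element.

```shell
#!/bin/sh
# Sketch: succeed only if the given core-site.xml enables Kerberos authentication.
# Assumption: every <name> element is followed by its <value> on the next line.
kerberos_enabled() {
  conf_file="$1"
  # Take the property name line plus the line after it, then keep only the value text.
  auth=$(grep -A1 '<name>hadoop.security.authentication</name>' "$conf_file" 2>/dev/null \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
  [ "$auth" = "kerberos" ]
}

# Usage (path is an assumption; adjust it to your environment):
#   kerberos_enabled /var/mmfs/hadoop/etc/hadoop/core-site.xml && echo "Kerberos is enabled"
```

A missing file or a value of simple makes the helper return a nonzero status, so it can be used directly in an if statement.
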
  1. Stop HDFS Transparency by using the following command:
    # mmhdfs hdfs stop
  2. To enable Apache Ranger for HDFS Transparency, log in to one of the HDFS Transparency nodes and change the configuration as described in this step.

    For hadoop-env.sh, set the following configuration:

    Note: Based on your environment, substitute the correct path to the Apache Ranger ranger-hdfs-plugin library.
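    # Add every file in the Ranger HDFS plug-in lib directory to the Hadoop classpath.
    for f in <ranger_hdfs_plugin_directory>/lib/*; do
      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
    done

    # Add the MySQL JDBC driver to the Hadoop classpath.
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/share/java/mysql-connector-java.jar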

    For core-site.xml, set the following configuration:

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
    RULE:[2:$1@$0](rangeradmin@<REALM_NAME>)s/(.*)@<REALM_NAME>/ranger/
    RULE:[2:$1@$0](rangertagsync@<REALM_NAME>)s/(.*)@<REALM_NAME>/rangertagsync/
    RULE:[2:$1@$0](rangerusersync@<REALM_NAME>)s/(.*)@<REALM_NAME>/rangerusersync/
    ……
    DEFAULT
      </value>
      <final>false</final>
    </property>
    

    For hdfs-site.xml, set the following configuration:

    <property>
      <name>dfs.namenode.inode.attributes.provider.class</name>
      <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
      <final>false</final>
    </property>
    
  3. Copy the following configuration files from the Apache Ranger installation directory to the configuration directory (/var/mmfs/hadoop/etc/hadoop) on an HDFS Transparency node. The enable-hdfs-plugin.sh script generates these files when the Apache Ranger plug-in is enabled.
    • ranger-hdfs-audit.xml
    • ranger-hdfs-security.xml
    • ranger-policymgr-ssl.xml
  4. To synchronize the configuration across all HDFS Transparency nodes, issue the following command:
    # mmhdfs config upload
  5. Create the ranger, rangertagsync, and rangerusersync users and groups by using the gpfs_create_hadoop_users_dirs.py script.

    Log in to a CES HDFS NameNode and run the following commands:

    # /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-custom-hadoop-user-group ranger
    
    # /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-custom-hadoop-user-group rangertagsync
    
    # /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-custom-hadoop-user-group rangerusersync
    
  6. To apply the changes, start HDFS Transparency by using the following command:
    # mmhdfs hdfs start
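
Before you restart HDFS Transparency, the edits from steps 2 and 3 can be sanity-checked. The following is a minimal sketch under stated assumptions, not a product utility: check_ranger_conf is a hypothetical helper name, and it assumes that hdfs-site.xml is in the same configuration directory as the copied Ranger files. It only confirms that the three Ranger files exist and that the RangerHdfsAuthorizer class name appears in hdfs-site.xml; it does not validate the file contents.

```shell
#!/bin/sh
# Sketch: check that the Ranger plug-in configuration from steps 2 and 3 is in place.
# The default directory is the HDFS Transparency configuration directory from step 3.
check_ranger_conf() {
  conf_dir="${1:-/var/mmfs/hadoop/etc/hadoop}"
  status=0
  # Files generated by enable-hdfs-plugin.sh and copied in step 3.
  for f in ranger-hdfs-audit.xml ranger-hdfs-security.xml ranger-policymgr-ssl.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f"
      status=1
    fi
  done
  # The authorizer class set in hdfs-site.xml in step 2.
  if ! grep -q 'org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer' \
      "$conf_dir/hdfs-site.xml" 2>/dev/null; then
    echo "RangerHdfsAuthorizer is not configured in $conf_dir/hdfs-site.xml"
    status=1
  fi
  return "$status"
}

# Usage: check_ranger_conf              (checks /var/mmfs/hadoop/etc/hadoop)
#        check_ranger_conf /some/dir    (checks another directory)
```

The helper prints one line per problem and returns a nonzero status if anything is missing, so it can gate the restart in a script.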