Configuring Apache Knox

This section describes how to configure Apache Knox on CDP Private Cloud Base clusters integrated with CES HDFS.

The Apache Knox Gateway (or simply, Apache Knox) extends perimeter security for Hadoop. Because Apache Knox encapsulates Kerberos, clients do not need Kerberos client software or Kerberos client configuration, which simplifies the access model. For more information, see Apache Knox Gateway in Cloudera documentation.

To use Apache Knox with HDFS Transparency, additional configuration is required, as described in the following steps.

  1. Configure HDFS Transparency:

    Set the hadoop.proxyuser.knox.groups parameter to * so that the knox user can impersonate users from any group:

    /usr/lpp/mmfs/hadoop/sbin/mmhdfs config set core-site.xml -k 'hadoop.proxyuser.knox.groups=*'

    To upload the configuration, issue the following command:

    /usr/lpp/mmfs/hadoop/sbin/mmhdfs config upload

    Then restart the HDFS Transparency services.

  2. Go to Cloudera Manager > Knox service, and set the WEBHDFS:url parameter to https://<myceshost>:50470/webhdfs.

    where <myceshost> is the hostname that corresponds to the CES IP address configured for HDFS Transparency, and 50470 is the default HTTPS port of the NameNode.

  3. Apache Knox uses Linux PAM authentication. For PAM to authenticate local users to Apache Knox, the Apache Knox user must have read permission on the /etc/shadow file. For more information, see An introduction to Pluggable Authentication Modules (PAM) in Linux in Red Hat documentation.
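Step 1 above can be scripted so that the same change is applied consistently. The following is a minimal sketch, assuming that mmhdfs is installed at the path used in step 1 and that you run it as root on an HDFS Transparency node. The MMHDFS and DRY_RUN variables and the run and configure_knox_proxyuser function names are hypothetical, not part of HDFS Transparency. The sketch does not restart services; restart HDFS Transparency with your usual procedure afterwards.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): apply the Knox proxy-user setting to
# HDFS Transparency and upload it. Restarting services is left to the operator.

# Assumption: default mmhdfs location; override by exporting MMHDFS.
MMHDFS="${MMHDFS:-/usr/lpp/mmfs/hadoop/sbin/mmhdfs}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

configure_knox_proxyuser() {
  # Allow the knox user to impersonate members of any group.
  # The value is quoted so the shell never expands the * as a file glob.
  run "$MMHDFS" config set core-site.xml -k 'hadoop.proxyuser.knox.groups=*' || return 1
  # Upload the changed configuration for the HDFS Transparency nodes.
  run "$MMHDFS" config upload || return 1
  echo "Now restart the HDFS Transparency services." >&2
}
```

For example, after sourcing the file in bash, `DRY_RUN=1 configure_knox_proxyuser` prints the two mmhdfs commands so you can review them before running `configure_knox_proxyuser` for real.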
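The WEBHDFS:url value in step 2 always follows the same pattern. The helper below is a hypothetical convenience function, not part of Cloudera Manager or Apache Knox; it only composes the string to paste into the Knox service configuration, with the port defaulting to 50470. The host names used with it are placeholders.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): compose the Knox WEBHDFS:url value.
# $1 is the hostname for the CES IP used by HDFS Transparency;
# $2 is an optional HTTPS port (default 50470, the NameNode HTTPS default).
knox_webhdfs_url() {
  local ces_host="$1" port="${2:-50470}"
  if [ -z "$ces_host" ]; then
    echo "usage: knox_webhdfs_url <ces-hostname> [https-port]" >&2
    return 1
  fi
  printf 'https://%s:%s/webhdfs\n' "$ces_host" "$port"
}
```

For example, `knox_webhdfs_url ces.example.com` prints `https://ces.example.com:50470/webhdfs`.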

To configure the Apache Knox user, perform the following steps.

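  1. As root, run the following commands:
    # Create the shadow group if it does not already exist
    getent group shadow >/dev/null || groupadd shadow
    # Assign /etc/shadow to the shadow group and add the knox user to that group
    chgrp shadow /etc/shadow
    usermod -a -G shadow knox
    # Keep /etc/shadow restricted, then grant the knox user read access through an ACL
    chmod 600 /etc/shadow
    setfacl -m "u:knox:r--" /etc/shadow
    # Remove the nologin flag file so that PAM (pam_nologin) does not reject non-root logins
    rm -f /var/run/nologin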
  2. Validate that users can access HDFS Transparency through Apache Knox by listing files:
    curl -ikv -u <user> "https://<knox-host>:<knox-port>/gateway/cdp-proxy-api/webhdfs/v1/?op=LISTSTATUS"
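To confirm that the ACL set on /etc/shadow in step 1 is in place, you can inspect the output of getfacl. The sketch below assumes the output format of the Linux acl tools, where a named-user entry looks like user:knox:r-- and may be followed by an #effective: annotation when the ACL mask restricts it. The acl_grants_read function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read getfacl output on stdin and succeed
# if the named user has an ACL entry whose effective permissions include read.
acl_grants_read() {
  local user="$1" line perms
  while IFS= read -r line; do
    case "$line" in
      "user:${user}:"*)
        # Permissions follow "user:<name>:", for example "r--".
        perms="${line#"user:${user}:"}"
        # If the mask limits the entry, getfacl appends "#effective:<perms>"; use that instead.
        case "$perms" in
          *"#effective:"*) perms="${perms##*"#effective:"}" ;;
        esac
        case "$perms" in
          r*) return 0 ;;
        esac
        ;;
    esac
  done
  return 1
}
```

For example, `getfacl -p /etc/shadow | acl_grants_read knox && echo "knox can read /etc/shadow"` reports success only when the knox entry grants read access.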
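The validation in step 2 can also be wrapped for reuse, for example to list a directory other than the root. This is a sketch under the assumption that the Knox topology is named cdp-proxy-api, as in the documented URL; the function names are hypothetical. As in the documented command, -k skips TLS certificate verification, which you can drop once the client trusts the Knox certificate.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): list an HDFS path through Apache Knox
# with the WebHDFS LISTSTATUS operation.

# Print the LISTSTATUS URL for a Knox host, port, and absolute HDFS path (default "/").
# Note: the path is not URL-encoded, so avoid spaces and special characters.
knox_liststatus_url() {
  local knox_host="$1" knox_port="$2" path="${3:-/}"
  case "$path" in
    /*) ;;
    *) echo "path must be absolute: $path" >&2; return 1 ;;
  esac
  printf 'https://%s:%s/gateway/cdp-proxy-api/webhdfs/v1%s?op=LISTSTATUS\n' \
    "$knox_host" "$knox_port" "$path"
}

# Call the URL as <user>; curl prompts for the password because only a user name is given.
# --fail makes curl exit with a non-zero status on HTTP errors such as 401 or 403.
knox_liststatus() {
  local user="$1" url
  url="$(knox_liststatus_url "$2" "$3" "${4:-/}")" || return 1
  curl --fail --silent --show-error -k -u "$user" "$url"
}
```

For example, `knox_liststatus <user> <knox-host> <knox-port> /tmp` lists /tmp. A successful call returns a WebHDFS JSON document containing a FileStatuses object, and the non-zero exit status on HTTP errors makes the check easy to use in scripts.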