Setting up Kerberos for HDFS Transparency nodes
This topic lists the steps to set up the Kerberos clients on the HDFS Transparency nodes. These instructions work for both Cloudera Private Cloud Base and Open Source Apache Hadoop distributions.
- Before you enable Kerberos, configure fully qualified domain names (FQDNs) for all hostname entries in your environment.
- For all the hostname entries that are being replaced in this section, ensure that you use the hostname -f output from your environment. This also includes the workers file hostnames for HDFS Transparency.
- Hostnames should not be changed after you enable Kerberos. If you change the hostname after enabling Kerberos, you need to recreate the principals and keytab files.
- If you need to set up more than one HDFS Transparency cluster using a common KDC server, see the Note under Kerberos.
- Install the Kerberos client packages on all the HDFS Transparency nodes.
yum install -y krb5-libs krb5-workstation
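To apply the install across every node in one pass, a loop such as the following can be used. This is a minimal sketch: the node list and passwordless root ssh are assumptions, and it only emits the commands for review rather than executing them.

```shell
#!/bin/sh
# Sketch: emit the install command for each HDFS Transparency node.
# NODES is an example list -- substitute the hosts from your workers file.
NODES="nn01.gpfs.net nn02.gpfs.net dn01.gpfs.net dn02.gpfs.net"

install_cmds() {
    for h in $NODES; do
        # Assumes passwordless ssh as root to each node.
        echo "ssh root@$h 'yum install -y krb5-libs krb5-workstation'"
    done
}
install_cmds
```

Review the output, then pipe it to `sh` to run the installs.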
- Copy the /etc/krb5.conf file to the Kerberos client hosts on the HDFS Transparency nodes.
- Create the keytab directory and set the appropriate permissions on each HDFS Transparency node.
mkdir -p /etc/security/keytabs/
chown root:root /etc/security/keytabs
chmod 755 /etc/security/keytabs
- Create KDC principals for the components, corresponding to the hosts where they are running, and export the keytab files as follows:

| Service | User:Group | Daemons | Principal | Keytab File Name |
|---------|------------|---------|-----------|------------------|
| HDFS | root:root | NameNode | nn/<NN_Host_FQDN>@<REALM-NAME> | nn.service.keytab |
| HDFS | root:root | NameNode HTTP | HTTP/<NN_Host_FQDN>@<REALM-NAME> | spnego.service.keytab |
| HDFS | root:root | NameNode HTTP | HTTP/<CES_HDFS_Host_FQDN>@<REALM-NAME> | spnego.service.keytab |
| HDFS | root:root | DataNode | dn/<DN_Host_FQDN>@<REALM-NAME> | dn.service.keytab |

Replace <NN_Host_FQDN> with the HDFS Transparency NameNode hostname and <DN_Host_FQDN> with the HDFS Transparency DataNode hostname. Replace <CES_HDFS_Host_FQDN> with the CES hostname configured for your CES HDFS cluster.
You need to create one principal for each HDFS Transparency NameNode and DataNode in the cluster.
Note: If you are using CDP Private Cloud Base, Cloudera Manager creates the principals and keytabs for all the services except the IBM Spectrum® Scale service. Therefore, you can skip the create service principals section below and go directly to step a.
If you are using Open Source Apache Hadoop, you need to create service principals for the YARN and MapReduce services as shown in the following table:

| Service | User:Group | Daemons | Principal | Keytab File Name |
|---------|------------|---------|-----------|------------------|
| YARN | yarn:hadoop | ResourceManager | rm/<Resource_Manager_FQDN>@<REALM-NAME> | rm.service.keytab |
| YARN | yarn:hadoop | NodeManager | nm/<Node_Manager_FQDN>@<REALM-NAME> | nm.service.keytab |
| MapReduce | mapred:hadoop | MapReduce Job History Server | jhs/<Job_History_Server_FQDN>@<REALM-NAME> | jhs.service.keytab |

Replace <Resource_Manager_FQDN> with the ResourceManager hostname, <Node_Manager_FQDN> with the NodeManager hostname, and <Job_History_Server_FQDN> with the Job History Server hostname.
- Create service principals for each service. Refer to the sample tables above.
kadmin.local -q "addprinc -randkey {Principal}"
For example:
kadmin.local -q "addprinc -randkey nn/nn01.gpfs.net@IBM.COM"
- Create host principals for each HDFS Transparency host.
kadmin.local -q "addprinc -randkey host/{HOST_NAME}@<Realm Name>"
For example:
kadmin.local -q "addprinc -randkey host/nn01.gpfs.net@IBM.COM"
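For a larger cluster, the service and host principal creation can be scripted. The following is a minimal sketch under stated assumptions: the realm and hostnames are examples, the host lists must match your environment, and the script emits the `kadmin.local` commands for review rather than running them.

```shell
#!/bin/sh
# Sketch: generate "addprinc" commands for the service and host principals
# listed in the tables above. REALM and the host lists are example values.
REALM="IBM.COM"
NN_HOSTS="nn01.gpfs.net nn02.gpfs.net"
DN_HOSTS="dn01.gpfs.net dn02.gpfs.net"

gen_cmds() {
    # Service principals: nn and HTTP per NameNode, dn per DataNode.
    for h in $NN_HOSTS; do
        echo "kadmin.local -q \"addprinc -randkey nn/$h@$REALM\""
        echo "kadmin.local -q \"addprinc -randkey HTTP/$h@$REALM\""
    done
    for h in $DN_HOSTS; do
        echo "kadmin.local -q \"addprinc -randkey dn/$h@$REALM\""
    done
    # Host principals: one per HDFS Transparency host.
    for h in $NN_HOSTS $DN_HOSTS; do
        echo "kadmin.local -q \"addprinc -randkey host/$h@$REALM\""
    done
}
gen_cmds
```

Review the output, then pipe it to `sh` on the KDC server to create the principals.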
- For each service on each Transparency host, create a keytab file by exporting its service principal and host principal into it:
kadmin.local -q "ktadd -k /etc/security/keytabs/{SERVICE_NAME}.service.keytab {Principal}"
kadmin.local -q "ktadd -k /etc/security/keytabs/{SERVICE_NAME}.service.keytab host/{HOST_NAME}@<Realm Name>"
For example:
NameNode:
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/nn01.gpfs.net@IBM.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab host/nn01.gpfs.net@IBM.COM"
NameNode HTTP:
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/nn01.gpfs.net@IBM.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/myceshdfs.gpfs.net@IBM.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab host/nn01.gpfs.net@IBM.COM"
Note: The spnego.service.keytab file contains two HTTP principals. myceshdfs.gpfs.net is an example of the CES hostname configured for your CES HDFS service.
DataNode:
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/dn01.gpfs.net@IBM.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab host/dn01.gpfs.net@IBM.COM"
Note:
- The filename for a service is common across hosts (for example, dn.service.keytab), but the contents differ because each keytab contains a different host principal.
- After a keytab is generated, move it to the appropriate host immediately, or move it to a different location, to avoid the keytab being overwritten.
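To avoid the overwrite problem described in the note, the per-host exports can be generated with a host-specific rename after each export. This is a sketch, not the documented procedure: the realm and DataNode hostnames are example assumptions, and the commands are emitted for review rather than executed.

```shell
#!/bin/sh
# Sketch: export one DataNode keytab per host, renaming each keytab with
# the host suffix so the next host's export does not overwrite it.
REALM="IBM.COM"
KT_DIR="/etc/security/keytabs"
DN_HOSTS="dn01.gpfs.net dn02.gpfs.net"   # example host list

export_cmds() {
    for h in $DN_HOSTS; do
        echo "kadmin.local -q \"ktadd -k $KT_DIR/dn.service.keytab dn/$h@$REALM\""
        echo "kadmin.local -q \"ktadd -k $KT_DIR/dn.service.keytab host/$h@$REALM\""
        # Stash under a host-specific name; copy to $h later as dn.service.keytab.
        echo "mv $KT_DIR/dn.service.keytab $KT_DIR/dn.service.keytab.$h"
    done
}
export_cmds
```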
- For CES HDFS HA failover, an HDFS client user ID must be created and set up for the CES NameNodes. This user is used by the CES framework to initiate NameNode failover. After the HDFS client user is created, create the Kerberos user principal for it.
In this example, the HDFS client user is hdfs. Create an hdfs user that belongs to the Hadoop super group, such as supergroup. Refer to step 8 for configuring this user in hadoop-env.sh.
kadmin.local -q "addprinc hdfs@<REALM_NAME>"
kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM_NAME>"
Copy the /etc/security/keytabs/hdfs.headless.keytab file to all the NameNodes and change the ownership and permissions of the file to root:
chown root:root /etc/security/keytabs/hdfs.headless.keytab
chmod 400 /etc/security/keytabs/hdfs.headless.keytab
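The copy and lock-down of the headless keytab on each NameNode can be sketched as follows. The NameNode hostnames and passwordless root ssh are assumptions; the script emits the commands for review rather than executing them.

```shell
#!/bin/sh
# Sketch: distribute hdfs.headless.keytab to every NameNode and set
# root-only permissions there. NN_HOSTS is an example list.
KT="/etc/security/keytabs/hdfs.headless.keytab"
NN_HOSTS="nn01.gpfs.net nn02.gpfs.net"

dist_cmds() {
    for h in $NN_HOSTS; do
        echo "scp -p $KT root@$h:$KT"
        echo "ssh root@$h 'chown root:root $KT && chmod 400 $KT'"
    done
}
dist_cmds
```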
- Copy the appropriate keytab file to each host. If a host runs more than one component (for example, both NameNode and DataNode), copy the keytabs for both these components.
- Set the appropriate permissions for the keytab files.
On the HDFS Transparency NameNode hosts:
chown root:root /etc/security/keytabs/nn.service.keytab
chmod 400 /etc/security/keytabs/nn.service.keytab
chown root:root /etc/security/keytabs/spnego.service.keytab
chmod 440 /etc/security/keytabs/spnego.service.keytab
On the HDFS Transparency DataNode hosts:
chown root:root /etc/security/keytabs/dn.service.keytab
chmod 400 /etc/security/keytabs/dn.service.keytab
On the YARN ResourceManager hosts:
chown yarn:hadoop /etc/security/keytabs/rm.service.keytab
chmod 400 /etc/security/keytabs/rm.service.keytab
On the YARN NodeManager hosts:
chown yarn:hadoop /etc/security/keytabs/nm.service.keytab
chmod 400 /etc/security/keytabs/nm.service.keytab
On the MapReduce Job History Server hosts:
chown mapred:hadoop /etc/security/keytabs/jhs.service.keytab
chmod 400 /etc/security/keytabs/jhs.service.keytab
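The owner/mode pairs above can also be applied from a single table, which is easier to keep consistent across hosts. This sketch emits the chown/chmod commands for whichever keytabs you list; the table mirrors the documented settings.

```shell
#!/bin/sh
# Sketch: emit the documented chown/chmod commands from one table.
# Pipe the output to sh on each host to apply; keytabs not present on a
# given host will simply make those commands fail harmlessly.
KT_DIR="/etc/security/keytabs"

apply_perms() {
    while read -r file owner mode; do
        echo "chown $owner $KT_DIR/$file"
        echo "chmod $mode $KT_DIR/$file"
    done <<'EOF'
nn.service.keytab root:root 400
spnego.service.keytab root:root 440
dn.service.keytab root:root 400
rm.service.keytab yarn:hadoop 400
nm.service.keytab yarn:hadoop 400
jhs.service.keytab mapred:hadoop 400
EOF
}
apply_perms
```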
- Update the HDFS Transparency configuration files and upload the changes.
Get the configuration files:
mkdir /tmp/hdfsconf
mmhdfs config export /tmp/hdfsconf core-site.xml,hdfs-site.xml,hadoop-env.sh
Update the config files with the following changes based on your environment.
File: core-site.xml
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
</property>
If you are using a Cloudera Private Cloud Base cluster, create the following rules:
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1/$2@$0](nn/.*@.*IBM.COM)s/.*/hdfs/
    RULE:[2:$1/$2@$0](dn/.*@.*IBM.COM)s/.*/hdfs/
    RULE:[1:$1@$0](hdfs@IBM.COM)s/@.*//
    RULE:[1:$1@$0](.*@IBM.COM)s/@.*//
    DEFAULT
  </value>
</property>
Otherwise, if you are using Open Source Apache Hadoop, create the following rules:
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1/$2@$0](nn/.*@.*IBM.COM)s/.*/hdfs/
    RULE:[2:$1/$2@$0](jn/.*@.*IBM.COM)s/.*/hdfs/
    RULE:[2:$1/$2@$0](dn/.*@.*IBM.COM)s/.*/hdfs/
    RULE:[2:$1/$2@$0](nm/.*@.*IBM.COM)s/.*/yarn/
    RULE:[2:$1/$2@$0](rm/.*@.*IBM.COM)s/.*/yarn/
    RULE:[2:$1/$2@$0](jhs/.*@.*IBM.COM)s/.*/mapred/
    DEFAULT
  </value>
</property>
Replace IBM.COM with your realm name in the above example rules.
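To sanity-check what the rules map a principal to, the authoritative command is `hadoop org.apache.hadoop.security.HadoopKerberosName <principal>` run against your deployed configuration. As an illustration only, the following sketch mirrors the Open Source Apache Hadoop rule set above in a shell function (realm IBM.COM assumed); it is not how Hadoop evaluates the rules, just the expected results.

```shell
#!/bin/sh
# Sketch: expected auth_to_local results for the Open Source rule set.
map_principal() {
    case "$1" in
        nn/*@IBM.COM|jn/*@IBM.COM|dn/*@IBM.COM) echo hdfs ;;   # -> hdfs
        nm/*@IBM.COM|rm/*@IBM.COM)              echo yarn ;;   # -> yarn
        jhs/*@IBM.COM)                          echo mapred ;; # -> mapred
        *@IBM.COM)                              echo "${1%@IBM.COM}" ;; # DEFAULT: strip realm
    esac
}
map_principal nn/nn01.gpfs.net@IBM.COM   # -> hdfs
map_principal rm/rm01.gpfs.net@IBM.COM   # -> yarn
map_principal alice@IBM.COM              # -> alice
```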
File: hdfs-site.xml
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@IBM.COM</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>false</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@IBM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@IBM.COM</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>*</value>
</property>
File: hadoop-env.sh
KINIT_KEYTAB=/etc/security/keytabs/hdfs.headless.keytab
KINIT_PRINCIPAL=hdfs@IBM.COM
- Stop the HDFS Transparency services for the cluster.
- Stop the DataNodes.
On any HDFS Transparency node, run the following command:
mmhdfs hdfs-dn stop
- Stop the NameNodes.
On any CES HDFS NameNode, run the following command:
mmces service stop HDFS -N <NN1>,<NN2>
- Import the files.
mmhdfs config import /tmp/hdfsconf core-site.xml,hdfs-site.xml,hadoop-env.sh
- Upload the changes.
mmhdfs config upload
- Start the HDFS Transparency services for the cluster.
- Start the DataNodes.
On any HDFS Transparency node, run the following command:
mmhdfs hdfs-dn start
- Start the NameNodes.
On any CES HDFS NameNode, run the following command:
mmces service start HDFS -N <NN1>,<NN2>
- Verify that the services have started.
On any CES HDFS NameNode, run the following command:
mmhdfs hdfs status