Implementing security within PureData System for Hadoop

Learn about the types of security features integrated with the IBM PureData™ for Hadoop appliance. Configure and use different authentication methods for users and groups, the credentials store, and session tokens. Also learn to implement reverse proxy and Big SQL security.

Darliene Hopes, Technical Enablement Specialist, IBM

Darliene Hopes is an IBM PureData System technical enablement specialist. She has worked with DB2 for Linux, UNIX, and Windows since the start of her career. She now works with the IBM big data platform, IBM PureData System for Hadoop, and IBM PureData System for Transactions.



04 June 2014

Ensuring that your data and systems are secure is a top priority with any enterprise. Although many security products exist, some lack the integrated security needed to keep data from being compromised. With IBM PureData System for Hadoop, security is integrated in an appliance. This article describes the appliance and explains how to implement security using it.

Working with users and groups

To gain access to the InfoSphere® BigInsights™ console within the appliance, you must be authenticated. As shown in Table 1, the level of access depends on role membership. Users and groups are assigned to InfoSphere BigInsights roles, which determine what resources they have access to. To use security features, a user must be assigned to one of the following InfoSphere BigInsights roles, even if the user is using external LDAP.

Table 1. InfoSphere BigInsights roles and privileges
  • BigInsightsSystemAdministrator (groups bi_supergroup and bi_sys_admins): Monitors cluster health; adds, removes, starts, and stops nodes.
  • BigInsightsDataAdministrator (group bi_data_admins): Creates directories and runs Hadoop file system commands; uploads, deletes, downloads, and views files.
  • BigInsightsApplicationAdministrator (group bi_app_admins): Publishes, unpublishes, deploys, and undeploys applications on a cluster; assigns application permissions to groups.
  • BigInsightsUser (group bi_users): Runs applications that the user has permission to run; views application run results, data, and cluster health.

Authentication methods

The InfoSphere BigInsights console supports two authentication methods:

  • Local authentication: Uses the appliance command-line interface user command to manage users. This interface is accessed by logging in over SSH as the ihadmin user, which grants access to the administrative shell.
  • External Lightweight Directory Access Protocol (LDAP): Enables you to configure the InfoSphere BigInsights console to communicate with LDAP credential stores for users, groups, and user-to-group mappings. The ldap command configures and manages the appliance's use of LDAP servers.

Local authentication method

To add users and groups using local authentication:

  1. Issue the following command to connect to the appliance as user ihadmin. <host> refers to the master node IP or host name of PureData System for Hadoop.
    ssh ihadmin@<host>
  2. Create a user with user add <name> <password> [<role> ...]. Note: If a role is not specified, the default is bi_users, as in these examples:
    • user add user1 password: Creates a user user1 with password password, assigned to the bi_users group.
    • user add user1 password bi_data_admins: Creates a user user1 with password password, assigned to the bi_data_admins group.

To delete users and groups using local authentication:

  1. Issue the following command to connect to the appliance as user ihadmin. <host> refers to the master node IP or host name of PureData System for Hadoop.
    ssh ihadmin@<host>
  2. Delete a user with the following command:
    user delete [-f] <name>

    For example, user delete user1 deletes the user user1.

Using external LDAP authentication

Before you configure external LDAP, complete these prerequisites:

  • Make a note of the IP address of your LDAP server.
  • Make a note of the base distinguished name (DN), which indicates where your user and group information is located in your directory.
  • Enable anonymous search and authentication to the LDAP directory.

To configure external LDAP:

  1. Issue the following command to connect to the appliance as user ihadmin. <host> refers to the master node IP or host name of PureData System for Hadoop.
    ssh ihadmin@<host>
  2. To check the status of LDAP (on/off), issue the following command:
    ldap status
  3. To enable LDAP, issue the following command:
    ldap on <customer-ldap-ip-address> <customer-base-dn>
  4. To turn off LDAP service, issue the following command:
    ldap off

Adding and deleting users and groups must be done within your LDAP configuration.


Loading information into the credentials store

The InfoSphere BigInsights credentials store is a folder on the distributed file system (DFS) designated for storing sensitive information, such as passwords and tokens. To load information into the credentials store, run the credstore.sh utility in the $BIGINSIGHTS_HOME/bin directory. The utility supports load, store, and update options.

To load credentials from an already existing private credential store, issue the following command:

credstore.sh load [-pub] dfs_file [-o output_file]
  • dfs_file: Name of your credential file. This can be an absolute path to a file in your DFS, a relative path to a file, or a file name.
  • output_file: File on local file system where you want to save the <key> = <value> pairs.

To store credentials as <key> = <value> pairs, supplied either in an input file or directly on the command line, issue the following command:

credstore.sh store [-pub] dfs_file ((-i input_file) | <key> = <value> ... <key> = <value>)
  • dfs_file: Name of your credential file.
  • input_file: File on the local file system containing the <key> = <value> pairs you want to upload.

You can also enter the <key> = <value> pairs directly from the command line if you choose.
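The input-file form of the store option can be sketched as follows. This is illustrative only: the file name and keys are hypothetical, and the credstore.sh calls are shown as comments because they must run on the appliance.

```shell
# Prepare a local input file of key = value pairs (names are hypothetical).
cat > /tmp/mycreds.txt <<'EOF'
db.user = appuser
db.password = s3cret
EOF

# On the appliance, from $BIGINSIGHTS_HOME/bin, the pairs could then be
# stored in (and later loaded from) a private credential file on the DFS:
#   credstore.sh store mycreds -i /tmp/mycreds.txt
#   credstore.sh load mycreds -o /tmp/mycreds.out
cat /tmp/mycreds.txt
```

Each non-comment, non-empty line of the input file carries exactly one key and its value.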


Using session tokens for REST APIs

REST API commands can be used to manage servers in your cluster, such as starting and stopping services or checking the status of operations. To use session tokens for REST APIs to authorize REST calls:

  1. Issue the following command to connect to the appliance as user ihadmin. <host> refers to the master node IP or host name of PureData System for Hadoop.
    ssh ihadmin@<host>
  2. Use curl commands to obtain a session token for the REST APIs. The session cookies are stored in a cookie jar file and can be used for all subsequent curl HTTP requests. InfoSphere BigInsights provides form-based authentication with the j_security_check action. A user ID and password must be specified using the j_username and j_password options. Any file name can be used for the file that stores the token cookies. Run the following command to create your session token:
    curl -i -c /cookie_jar -d 'j_username=user' -d 'j_password=password' 'https://<host>:<port>/j_security_check'
    • cookie_jar: Location on the local file system where the cookie file is saved.
    • user: The user ID you are authenticating to the InfoSphere BigInsights console.
    • password: The password of the user you are authenticating to the InfoSphere BigInsights console.

If the authentication is successful, the web server returns a 302 redirect response code, and the location field is set to the URL of the web console.

Figure 1. Sample 302 redirect response code

If authentication fails, the web server also returns a 302 redirect response, but no cookies are returned.

Figure 2. Sample failed 302 redirect response code
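After a successful login, the saved cookie jar can be presented on later REST calls with curl's -b option. A minimal sketch, assuming a hypothetical host, port, cookie-jar path, and endpoint path; the live curl call is commented out because it needs a running console:

```shell
HOST=bi-master.example.com   # hypothetical master-node host name
PORT=8443                    # hypothetical console port

# 1. Authenticate once and save the session cookies:
#    curl -i -c /tmp/cookie_jar -d 'j_username=user1' -d 'j_password=password' \
#         "https://$HOST:$PORT/j_security_check"

# 2. Reuse the stored cookies on subsequent REST calls with -b
#    (the endpoint placeholder below is not a documented API path):
REST_CMD="curl -b /tmp/cookie_jar https://$HOST:$PORT/<rest-endpoint>"
echo "$REST_CMD"
```

The same cookie jar serves every REST call until the session expires, at which point the authentication step must be repeated.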

Configuring single sign-on

To share the same credentials between two clusters, you must configure single sign-on. With single sign-on, the private key for the Lightweight Third-Party Authentication (LTPA) token can be shared. Configure single sign-on by following these steps:

  1. Change to the directory $BIGINSIGHTS_HOME/console/wlp/usr/servers/waslp-server/resources/security on the first cluster.
  2. Copy the ltpa.keys file to the same directory on the second cluster that is to share credentials with the first cluster.
  3. On the second cluster, change to the $BIGINSIGHTS_HOME/console/wlp/usr/servers/waslp-server directory. Update the keysFileName parameter in the ltpa section of the server.xml file. The token password must match on both clusters.
  4. Save and close the server.xml file.
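The copy in step 2 is an ordinary file transfer; between real clusters it would typically be done with scp. The sketch below simulates it with local directories standing in for the two clusters (the /tmp paths and the cluster2-master host name are hypothetical):

```shell
SEC=console/wlp/usr/servers/waslp-server/resources/security
C1=/tmp/cluster1/$SEC   # stand-in for $BIGINSIGHTS_HOME/$SEC on cluster 1
C2=/tmp/cluster2/$SEC   # stand-in for the same path on cluster 2
mkdir -p "$C1" "$C2"
echo "dummy LTPA key material" > "$C1/ltpa.keys"

# Between real clusters this would be something like:
#   scp $BIGINSIGHTS_HOME/$SEC/ltpa.keys \
#       biadmin@cluster2-master:$BIGINSIGHTS_HOME/$SEC/
cp "$C1/ltpa.keys" "$C2/ltpa.keys"
ls "$C2"
```

After the copy, the keysFileName update and matching token password described in step 3 make the shared key usable on both sides.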

Configuring reverse proxy

A reverse proxy is a server that acts as an intermediary, retrieving resources from one or more back-end servers on behalf of a client and returning them to the client as though they originated from the proxy itself. Reverse proxies are used to hide the existence of the actual servers.

To configure a reverse proxy, close the cluster node ports to the public network, except for the InfoSphere BigInsights console port (8080). Then click Access secure cluster servers under the Quick Links section of the InfoSphere BigInsights console, as shown in Figure 3. This link opens the reverse proxy page, which lists the URLs and proxy links for all HTTP services started by InfoSphere BigInsights that can be accessed through the reverse proxy.

Figure 3. Access secure cluster servers link

Note: Only users with the roles BigInsightsSystemAdministrator and BigInsightsDataAdministrator can access this link if InfoSphere BigInsights was installed with security.

The following table shows the InfoSphere BigInsights components that support the reverse proxy feature.

Table 2. InfoSphere BigInsights components that support reverse proxy
InfoSphere BigInsights component        Default port
NameNode (HDFS only)                    50070
Secondary NameNode (HDFS only)          50090
DataNode (HDFS only)                    50075
JobTracker (Apache MapReduce only)      50030
TaskTracker (Apache MapReduce only)     50060
Hive Web Interface                      9999
HBase master                            60010
HBase region server                     60030

Configuring Big SQL security

Big SQL, the IBM SQL interface to the Hadoop-based platform, InfoSphere BigInsights, is designed to provide SQL developers the ability to query data managed by Hadoop. You can enable Big SQL with its own authentication, authorization, and encryption. These security features are unique to Big SQL and are built directly into Big SQL.

The following sections describe how to configure authentication, authorization, and encryption by updating the Big SQL configuration file bigsql-conf.xml in the $BIGSQL_HOME/conf/ directory. For the changes to take effect, the Big SQL server must be restarted.

Enable authentication

Update the bigsql-conf.xml configuration file in the $BIGSQL_HOME/conf/ directory. To use the InfoSphere BigInsights console as the authenticator for Big SQL, set the bigsql.security.authenticator value to WebConsole, which is also the default value, as in the following property:

<bigsql.security.authenticator value="WebConsole"/>

Note: When authentication is configured correctly, Big SQL sends the user credentials to the InfoSphere BigInsights console for verification. If the user is validated, InfoSphere BigInsights returns the user's roles to Big SQL.

Enable authorization

As the biadmin user, create personal directories in HDFS for all users. In these directories, they can create private, shared, or public schemas and tables. To enable authorization, issue the following commands against the Hadoop Distributed File System:

$HADOOP_HOME/bin/hadoop fs -mkdir /user/user_id
$HADOOP_HOME/bin/hadoop fs -chown user_id:group_id /user/user_id
$HADOOP_HOME/bin/hadoop fs -chmod 755 /user/user_id
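When several users need personal directories, the same three commands can be scripted in a loop. In this sketch the user:group pairs are hypothetical, and each hadoop fs call is prefixed with echo so it prints instead of running; drop the echo on a live cluster:

```shell
# Hypothetical user:group pairs to provision.
for entry in alice:bi_users bob:bi_data_admins; do
  u=${entry%%:*}
  g=${entry##*:}
  echo "$HADOOP_HOME"/bin/hadoop fs -mkdir "/user/$u"
  echo "$HADOOP_HOME"/bin/hadoop fs -chown "$u:$g" "/user/$u"
  echo "$HADOOP_HOME"/bin/hadoop fs -chmod 755 "/user/$u"
done
```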

Update the bigsql-conf.xml configuration file in the $BIGSQL_HOME/conf/ directory. Set the authorizer to the wanted authorization mode with the following property:

<bigsql.security.authorizer value="authorization_mode"/>

The authorization modes use the HDFS permissions to control data access. Choose one of the following values for authorization_mode:

  • none: Use this mode when security is not important. All operations inside the Big SQL server are done using a super-user or admin user authority, regardless of who is connected to the server.
  • dataSource: If the currently connected user is an admin user, the server uses the same user ID as the Big SQL server process owner (usually biadmin) to perform tasks such as reading HDFS files, writing to HDFS files, and spawning a MapReduce job. If the currently connected user is not an admin user, those tasks are performed using that user's user ID.

Enable SSL encryption

To enable SSL encryption, you must configure both the server and the client.

To configure the Big SQL server:

  1. Update the bigsql-conf.xml configuration file in the $BIGSQL_HOME/conf/ directory with the following values:

    <bigsql.security.ssl value="true"/>
    <!-- Location of the keystore file. The keystore file contains the private key and can be created using $JAVA_HOME/bin/keytool. -->
    <bigsql.security.ssl.keyStore value="/path/to/keystore_file"/>
    <!-- Keystore file password -->
    <bigsql.security.ssl.keyStore.passWord value="password_for_keystore_file"/>
    <!-- Key manager -->
    <bigsql.security.ssl.keyStore.keyManager value="keymanager_value"/>
  2. Restart the Big SQL server.

To configure the Big SQL (JDBC) client for SSL:

  1. Import the certificate that the Big SQL server presents into a client trust store file by using the following utility:
    $JAVA_HOME/bin/keytool
  2. In the JDBC application, use a connection URL of the following form:
    jdbc:bigsql://<host>:<port>/<database>:<parameter>=<value>;
    <parameter2>=<value2>...
  3. Set these parameter and value pairs:
    user=<username>
    password=<password>
    SSL=true
    truststore=/path/to/truststore_file
    truststore.password=truststore_password
    keymanager=IbmX509

Summary

On its own, the Hadoop architecture lacks the type of security needed for an enterprise. This article describes how to secure your big data platform by using the IBM PureData System for Hadoop appliance, an all-in-one system with built-in enterprise security. Implement authentication, authorization, and encryption to help ensure that your enterprise data is secure.


