Configuring an Apache HDFS external storage manager

Content Manager OnDemand supports data storage in an Apache Hadoop Distributed File System (HDFS).

The Apache® Hadoop® project develops a variety of open-source software for reliable, scalable, distributed computing. The project includes Apache HDFS, which is a distributed file system that provides high-throughput access to application data. More information on Apache HDFS can be found at: https://hadoop.apache.org/

Updating the ARS.CFG file on AIX, Linux, or Linux on System z servers

Perform these steps to configure Apache HDFS on an AIX, Linux, or Linux on System z server.

  1. Add two entries to the ARS.CFG file.

    For AIX servers:
    ARS_HDFS_CONFIG_FILE=/opt/IBM/ondemand/V10.1/config/ars.hdfs
    ARS_HDFS_CONFIG_DIR=/opt/IBM/ondemand/V10.1/config
    
    For Linux and Linux on System z servers:
    ARS_HDFS_CONFIG_FILE=/opt/ibm/ondemand/V10.1/config/ars.hdfs
    ARS_HDFS_CONFIG_DIR=/opt/ibm/ondemand/V10.1/config
    

    The ARS_HDFS_CONFIG_FILE entry specifies an existing Apache HDFS configuration file which the server uses by default.

    The ARS_HDFS_CONFIG_DIR entry specifies the directory in which any alternate configuration files are kept. This directory is used if additional Apache HDFS configuration files are defined. The names of these additional configuration files can be specified when defining storage nodes in Content Manager OnDemand. If no configuration file is specified in the storage node, the default configuration file is used.

    The configuration file name and directory path shown in the examples are the recommended values for these entries.

  2. The ARS_STORAGE_MANAGER entry in the ARS.CFG file might also need to be changed. Specifying ARS_STORAGE_MANAGER=CACHE_ONLY disables all storage managers supported by Content Manager OnDemand.

    To configure the Content Manager OnDemand server to use Apache HDFS as a storage manager, set the value to one of the following:
    ARS_STORAGE_MANAGER=TSM
    This setting enables all external storage managers supported by Content Manager OnDemand. The Content Manager OnDemand server requires additional software to use Tivoli® Storage Manager (TSM) as a storage manager. If that additional software is not installed, the server does not start when the ARS_STORAGE_MANAGER value is set to TSM.
    ARS_STORAGE_MANAGER=NO_TSM
    This setting enables all external storage managers supported by Content Manager OnDemand except Tivoli Storage Manager. Use this setting when the additional software to support Tivoli Storage Manager is not installed and Tivoli Storage Manager is not required as an external storage manager.
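
The two steps above can be checked with a short script before the server is restarted. The following Python sketch is illustrative only and is not part of Content Manager OnDemand: the function names are hypothetical, the parsing assumes simple KEY=VALUE lines, and treating a missing ARS_STORAGE_MANAGER entry as a problem is an assumption, because the default for that entry is not stated here.

```python
def parse_cfg(text):
    """Parse KEY=VALUE lines into a dict; lines without '=' are skipped.

    Assumption: ARS.CFG entries are plain KEY=VALUE lines.
    """
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def check_hdfs_settings(entries):
    """Return a list of problems that would prevent using Apache HDFS."""
    problems = []
    # Step 1: both HDFS entries must be present.
    for key in ("ARS_HDFS_CONFIG_FILE", "ARS_HDFS_CONFIG_DIR"):
        if not entries.get(key):
            problems.append(f"{key} is missing")
    # Step 2: only TSM and NO_TSM enable external storage managers.
    # Assumption: a missing entry is reported as well.
    manager = entries.get("ARS_STORAGE_MANAGER")
    if manager not in ("TSM", "NO_TSM"):
        problems.append(
            f"ARS_STORAGE_MANAGER={manager} does not enable Apache HDFS"
        )
    return problems


# Sample values taken from the Linux example in step 1.
sample = """\
ARS_HDFS_CONFIG_FILE=/opt/ibm/ondemand/V10.1/config/ars.hdfs
ARS_HDFS_CONFIG_DIR=/opt/ibm/ondemand/V10.1/config
ARS_STORAGE_MANAGER=NO_TSM
"""
print(check_hdfs_settings(parse_cfg(sample)))
```

An empty list means that both entries are present and that the storage manager setting is one of the two values that enable Apache HDFS.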

Updating an instance configuration on Windows servers

Perform these steps to configure Apache HDFS on a Windows server. Both steps use the OnDemand Configurator to create or update the configuration information.
  1. Select Apache HDFS as an external storage manager.
  2. Set the configuration entries:
    Configuration Directory
    The Configuration Directory specifies the directory in which any alternate configuration files are kept. This directory is used if additional Apache HDFS configuration files are defined. The names of these additional configuration files can be specified when defining storage nodes in Content Manager OnDemand. If no configuration file is specified in the storage node, the default configuration file is used. For example: C:\Program Files\IBM\OnDemand\V10.1\config
    Default Configuration File
    The Default Configuration File specifies an existing Apache HDFS configuration file which the server uses by default. For example: C:\Program Files\IBM\OnDemand\V10.1\config\ars.hdfs
    A sample configuration file is included as part of the installation of Content Manager OnDemand.

Creating an Apache HDFS configuration file

An Apache HDFS configuration file for Content Manager OnDemand contains entries specific to your Apache HDFS implementation. You specify the location and name of the default configuration file in the ARS_HDFS_CONFIG_FILE entry of the ARS.CFG file or in the OnDemand Configurator. Required entries must be specified. Optional entries can be omitted unless their default values need to be changed.

The following list describes the entries that can be specified in an Apache HDFS configuration file.

ARS_HDFS_SERVER
Specifies the Apache HDFS server name. Do not include http:// or https:// in the name. This entry is required.
ARS_HDFS_PORT
Specifies the Apache HDFS server port number. This entry is optional if the server uses a standard port: Content Manager OnDemand assumes port 80 for HTTP and port 443 for HTTPS communications.
ARS_HDFS_TLD
Specifies the Apache HDFS top-level directory name. This is any additional path information after the server name and port in the URL. This entry is optional.
ARS_HDFS_USE_SSL
Indicates whether or not to use SSL in server communications. The possible values are:
  • 0 - SSL will not be used
  • 1 - SSL will be used
The default value is 0. This entry is optional.
ARS_HDFS_AUTH_TYPE
Specifies the user authentication type. The possible values are:
  • NONE - Open system
  • KNOX - Access and authenticate through Apache Knox
The default value is NONE. This entry is optional.
ARS_HDFS_CONNECT_TIMEOUT
Specifies the maximum number of seconds that Content Manager OnDemand waits for a response from the storage manager. The default is 60. This entry is optional. Warning: Setting this value too low might cause connection failures.
ARS_HDFS_FILE_PERMS
Specifies the permissions for new files. The default is 440. This entry is optional.
ARS_HDFS_HLD
Specifies the high-level directory name. This entry can be used to group sets of Content Manager OnDemand data together, which might be needed when external storage is shared among multiple Content Manager OnDemand servers. Warning: Once this value is set, it must not be changed. If it is changed, any data that is already stored will not be retrievable. There is no default value. This entry is optional.
As an example, for a URL such as http://hdfs.example.com/webhdfs/v1, the Apache HDFS configuration file contains:
ARS_HDFS_SERVER=hdfs.example.com
ARS_HDFS_TLD=/webhdfs/v1
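
To make the relationship between the entries and the URL concrete, the following Python sketch reads a configuration file, applies the documented defaults, and rebuilds the base URL. It is a minimal illustration under stated assumptions, not the product's implementation: the function names are hypothetical, the file is assumed to contain plain KEY=VALUE lines, and the way Content Manager OnDemand composes the URL internally is inferred from the entry descriptions. ARS_HDFS_HLD is left out because its place in the URL is not described here.

```python
# Defaults for optional entries, as documented in the list above.
DEFAULTS = {
    "ARS_HDFS_USE_SSL": "0",
    "ARS_HDFS_AUTH_TYPE": "NONE",
    "ARS_HDFS_CONNECT_TIMEOUT": "60",
    "ARS_HDFS_FILE_PERMS": "440",
}


def load_hdfs_config(text):
    """Parse KEY=VALUE lines and fill in defaults for optional entries."""
    entries = dict(DEFAULTS)
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    # ARS_HDFS_SERVER is the only required entry.
    if not entries.get("ARS_HDFS_SERVER"):
        raise ValueError("ARS_HDFS_SERVER is required")
    return entries


def base_url(entries):
    """Rebuild the base URL (assumed layout: scheme://server[:port]tld)."""
    scheme = "https" if entries["ARS_HDFS_USE_SSL"] == "1" else "http"
    url = f"{scheme}://{entries['ARS_HDFS_SERVER']}"
    # When no port is given, the standard port for the scheme applies,
    # so nothing is appended.
    port = entries.get("ARS_HDFS_PORT")
    if port:
        url += f":{port}"
    return url + entries.get("ARS_HDFS_TLD", "")


config = load_hdfs_config(
    "ARS_HDFS_SERVER=hdfs.example.com\nARS_HDFS_TLD=/webhdfs/v1"
)
print(base_url(config))  # http://hdfs.example.com/webhdfs/v1
```

With the two entries from the example, the sketch reproduces the original URL. Setting ARS_HDFS_USE_SSL=1 switches the scheme to https.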

Defining an Apache HDFS storage node with the Administrator client

You can define the settings for using the Apache HDFS access method on the Add a Primary Node dialog of the OnDemand Administrator client.

The Storage Node field is not used for communication with the Apache HDFS server and can be set to any name you choose.

The Logon field is the user name from the Apache HDFS system which Content Manager OnDemand uses to store and retrieve data. A password might not be required for open Apache HDFS systems, so this field is optional.

The Access Method radio button is set to Apache HDFS. For Content Manager OnDemand servers running on all platforms except Windows, the Configuration File Name defaults to the value specified by the ARS_HDFS_CONFIG_FILE parameter in the ARS.CFG file if no value is entered. If a value is entered, Content Manager OnDemand looks for that configuration file in the directory defined by the ARS_HDFS_CONFIG_DIR parameter in the ARS.CFG file. For Content Manager OnDemand servers running on Windows, the server uses the Default Configuration File and the Configuration Directory that are specified in the OnDemand Configurator instead of the ARS.CFG file parameters.
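
The lookup rule for servers other than Windows can be sketched in a few lines of Python. This is an illustration of the rule as described above, not product code: the function name is hypothetical, and the file name archive.hdfs is an invented example of an alternate configuration file.

```python
import posixpath


def resolve_config_file(node_file_name, ars_cfg):
    """Pick the configuration file for a storage node (non-Windows rule).

    node_file_name: value of the Configuration File Name field, or "".
    ars_cfg: dict holding the ARS.CFG entries.
    """
    if not node_file_name:
        # No name entered: fall back to the default configuration file.
        return ars_cfg["ARS_HDFS_CONFIG_FILE"]
    # A name was entered: look for it in the configuration directory.
    return posixpath.join(ars_cfg["ARS_HDFS_CONFIG_DIR"], node_file_name)


ars_cfg = {
    "ARS_HDFS_CONFIG_FILE": "/opt/ibm/ondemand/V10.1/config/ars.hdfs",
    "ARS_HDFS_CONFIG_DIR": "/opt/ibm/ondemand/V10.1/config",
}
print(resolve_config_file("", ars_cfg))
print(resolve_config_file("archive.hdfs", ars_cfg))  # hypothetical file name
```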