To run jobs on Hadoop, an administrator must create and
set the APT_YARN_CONFIG environment variable
for each project.
Before you begin
Verify that the Linux computer that you run the jobs on has Java™ Development
Kit (JDK) 1.7 installed. To check, log on to the Linux computer as
the InfoSphere® DataStage® administrator and display the Java version.
For example, run su - dsadm, and then run java -version.
Verify
that the version of Java is
1.7. If it is not, install Java Development
Kit 1.7.
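The version check can also be scripted. The following Python sketch runs java -version, which prints its version banner to standard error, and accepts any 1.7.x version string. It is an illustration only: the function names are hypothetical, and the sketch checks whichever java executable is first on the PATH of the user who runs it (for example, dsadm), which might not be the JDK that your installation uses.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(output: str) -> Optional[str]:
    """Return the quoted version from `java -version` output, or None.

    Example banner line: java version "1.7.0_80"
    """
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_java_1_7(version: str) -> bool:
    """True for 1.7 version strings such as "1.7.0" or "1.7.0_80"."""
    return version == "1.7" or version.startswith("1.7.")


def installed_java_is_1_7() -> bool:
    """Run `java -version` and check the reported version (hypothetical helper)."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False  # no java executable on the PATH
    # The version banner is written to standard error, not standard output.
    version = parse_java_version(result.stderr)
    return version is not None and is_java_1_7(version)


if __name__ == "__main__":
    print("Java 1.7 found" if installed_java_is_1_7() else "Install JDK 1.7")
```

Parsing and comparison are kept in separate functions so that they can be checked against sample output without a Java installation.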
About this task
The
APT_YARN_CONFIG environment variable
provides a path for
InfoSphere
DataStage to
read the
yarnconfig.cfg file, which specifies
all the environment variables that you need to run
InfoSphere Information Server on
Hadoop. To take advantage of the resource management functionality
of Hadoop when running jobs, you must do the following:
- Set APT_YARN_CONFIG.
- Ensure that APT_YARN_CONFIG points to a yarnconfig.cfg file
where APT_YARN_MODE is set to the default value
of true or 1.
Otherwise, the jobs do not run on Hadoop; they run in the
standard manner, without YARN resource management.
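To confirm both conditions before you run a job, you might use a check like the following Python sketch. It assumes that yarnconfig.cfg consists of simple NAME=value lines, optionally prefixed with export, plus # comments; confirm the actual layout of your file before relying on it. The helper names are hypothetical, and the sketch conservatively treats a missing APT_YARN_MODE entry as not enabled.

```python
import os
from typing import Dict, Mapping


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse NAME=value lines into a dict.

    ASSUMPTION: the file uses shell-style NAME=value lines, optionally
    prefixed with "export", with "#" comments and blank lines.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and non-assignments
        if line.startswith("export "):
            line = line[len("export "):].strip()
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("\"'")
    return values


def yarn_mode_enabled(values: Mapping[str, str]) -> bool:
    """True only if APT_YARN_MODE is explicitly set to true or 1."""
    return values.get("APT_YARN_MODE") in ("true", "1")


def check_yarn_config(environ: Mapping[str, str] = os.environ) -> bool:
    """Check that APT_YARN_CONFIG names an existing file that enables YARN mode."""
    path = environ.get("APT_YARN_CONFIG")
    if not path or not os.path.isfile(path):
        return False  # variable unset, or the file does not exist
    with open(path, encoding="utf-8") as handle:
        return yarn_mode_enabled(parse_cfg(handle.read()))
```

A project-level variable that you define in the Administrator client is not visible in an ordinary shell, so you can pass the path explicitly, for example check_yarn_config({"APT_YARN_CONFIG": "/opt/IBM/InformationServer/Server/PXEngine/etc/yarn_conf/yarnconfig.cfg"}) for a default installation.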
Procedure
- Open the InfoSphere DataStage and QualityStage Administrator
client.
- In the Administrator window, click the Project tab.
- Select the project that contains the jobs that you want to run on Hadoop.
The default project is dstage1.
For InfoSphere Information Analyzer, select the InfoSphere
DataStage project that the InfoSphere Information Analyzer global
or project properties specify for use by InfoSphere Information Analyzer.
The default project is ANALYZERPROJECT.
- Click Properties.
- On the General tab, click Environment,
and then click User Defined.
- Enter the following information to define an environment
variable:
  - For the name, enter APT_YARN_CONFIG.
  - For the type, enter string.
  - In the Prompt field, enter DataStage
    Hadoop Configuration file.
  - In the Value field, enter IS_install/Server/PXEngine/etc/yarn_conf/yarnconfig.cfg,
    where IS_install is the InfoSphere Information Server installation
    directory. The default directory is /opt/IBM/InformationServer/.
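The Value field is the installation directory followed by a fixed relative path. This hypothetical Python helper shows how the two combine. PurePosixPath drops the trailing slash from /opt/IBM/InformationServer/, so the result contains no doubled separator.

```python
from pathlib import PurePosixPath

# Default installation directory, as given in the procedure.
DEFAULT_IS_INSTALL = "/opt/IBM/InformationServer/"

# Fixed location of yarnconfig.cfg relative to the installation directory.
YARN_CONFIG_RELATIVE = "Server/PXEngine/etc/yarn_conf/yarnconfig.cfg"


def yarn_config_value(is_install: str = DEFAULT_IS_INSTALL) -> str:
    """Build the APT_YARN_CONFIG value for an installation directory."""
    return str(PurePosixPath(is_install) / YARN_CONFIG_RELATIVE)


print(yarn_config_value())
# /opt/IBM/InformationServer/Server/PXEngine/etc/yarn_conf/yarnconfig.cfg
```

For a nondefault installation, pass its directory, for example yarn_config_value("/u01/IS").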
What to do next
Run a sample job with the new environment variable set, and
verify that the job runs successfully. In the Operations Console,
verify that the job logs contain messages indicating that the job
successfully connected to the YARN Application Master. If Hadoop is
set up successfully, (Hadoop) appears after the name of the project
that contains the job.