IBM Support

Tasktracker process fails to start on a data node due to org.apache.hadoop.util.Shell$ExitCodeException

Troubleshooting


Problem

When Hadoop starts, the TaskTracker process may fail to start on one or more data nodes with an org.apache.hadoop.util.Shell$ExitCodeException error.

Symptom

The failure is logged with the following stack trace:


2013-11-25 10:55:02,443 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
    at org.apache.hadoop.util.Shell.run(Shell.java:182)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
    at org.apache.hadoop.mapred.LinuxTaskController.deleteAsUser(LinuxTaskController.java:281)
    at org.apache.hadoop.mapred.TaskTracker.deleteUserDirectories(TaskTracker.java:774)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:811)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1565)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3907)

Cause

BigInsights is unable to clean up the contents of mapred.local.dir on the data node(s).

Diagnosing The Problem

The QA team has seen this after the authentication method is changed, specifically when switching between an open data node appliance and a closed data node appliance. Example directory listing:

[biadmin@rack1-data15 ~]$ ls -l /hadoop/data1/mapred-local/taskTracker/
total 16
drwxr-s--- 4 biadmin biadmin 4096 Nov 13 11:41 biadmin
drwxrwxr-x 10 biadmin biadmin 4096 Nov 14 15:40 distcache
drwxr-s--- 4 188744797 biadmin 4096 Nov 13 11:56 rcessna
drwxr-s--- 4 smithmg biadmin 4096 Nov 13 11:10 smithmg
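
In the listing above, the rcessna directory is shown with a numeric owner (188744797), which is how ls displays a UID that no longer maps to a user name. As a rough diagnostic sketch, and not part of the original procedure, entries like this could be listed with find. The function name list_unresolved_owners is hypothetical, and the /hadoop/data*/mapred-local layout is assumed from the example paths in this document.

```shell
#!/bin/sh
# Hypothetical helper: list taskTracker entries whose owner UID has no
# matching user name on this node.
# Assumes the mapred.local.dir locations match <base>/data*/mapred-local.
list_unresolved_owners() {
    base="${1:-/hadoop}"
    for d in "$base"/data*/mapred-local/taskTracker; do
        # Skip the unexpanded pattern or anything that is not a directory
        [ -d "$d" ] || continue
        # -nouser matches files whose UID is not known to the system;
        # -mindepth/-maxdepth (GNU find) restrict the search to direct children
        find "$d" -mindepth 1 -maxdepth 1 -nouser -exec ls -ld {} +
    done
}
```

Calling list_unresolved_owners with no argument checks under /hadoop. Empty output means only that no unresolved owners were found; it does not rule out other causes of the cleanup failure.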

Resolving The Problem

Log in as root, manually clean up the taskTracker subdirectories under each mapred.local.dir location, and then restart Hadoop or BigInsights. For PDH (PureData System for Hadoop), the commands are:

rm -rf /hadoop/data1/mapred-local/taskTracker/*
rm -rf /hadoop/data2/mapred-local/taskTracker/*
rm -rf /hadoop/data3/mapred-local/taskTracker/*
rm -rf /hadoop/data4/mapred-local/taskTracker/*
rm -rf /hadoop/data5/mapred-local/taskTracker/*
rm -rf /hadoop/data6/mapred-local/taskTracker/*
rm -rf /hadoop/data7/mapred-local/taskTracker/*
rm -rf /hadoop/data8/mapred-local/taskTracker/*
rm -rf /hadoop/data9/mapred-local/taskTracker/*
rm -rf /hadoop/data10/mapred-local/taskTracker/*
rm -rf /hadoop/data11/mapred-local/taskTracker/*
rm -rf /hadoop/data12/mapred-local/taskTracker/*
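
The twelve commands above could also be written as a loop. This is a minimal sketch, not an IBM-supplied script: the function name clean_tasktracker_dirs is hypothetical, and it assumes that every mapred.local.dir location matches /hadoop/data*/mapred-local, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: empty every taskTracker directory under <base>.
# Assumes the mapred.local.dir locations match <base>/data*/mapred-local.
clean_tasktracker_dirs() {
    base="${1:-/hadoop}"
    for d in "$base"/data*/mapred-local/taskTracker; do
        # Skip the unexpanded pattern or anything that is not a directory
        [ -d "$d" ] || continue
        # ${d:?} aborts if d is ever empty, so this can never expand to "rm -rf /*".
        # The taskTracker directory itself is kept; only its contents are removed.
        rm -rf "${d:?}"/*
        echo "cleaned: $d"
    done
}
```

Run as root on each affected data node, for example clean_tasktracker_dirs /hadoop, and then restart Hadoop or BigInsights as described above.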

Document Information

More support for:
PureData System for Hadoop

Software version:
1.0.0.0, 1.0.0.1, 1.0.0.2

Operating system(s):
Linux

Document number:
503317

Modified date:
16 June 2018

UID

swg21657255