IBM Support

MapReduce tasks fail at the reduce stage. How do I fix this?

Troubleshooting


Problem

Large-scale MapReduce tasks fail at the reduce stage.

Symptom

The task logs contain the following error messages:
18/01/12 15:59:20 GMT ERROR shuffler.RemoteMultiFileCopier: ShuffleLibImpl.cpp:633: Failed to receive ack to <xxxxxx:7879> for </xxxxxx/interdata/app_name/22418/1522_32294/1522_java.pkvf>, errno <-62>
18/01/12 15:59:20 GMT ERROR shuffler.ShuffleFetcher: Failed to fetch interdata file, error message <ShuffleLibImpl.cpp:633: Failed to receive ack to <xxxxxx:7879> for </xxxxxx/interdata/app_name/22418/1522_32294/1522_java.pkvf>, errno <-62>>, error code <17>
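
When many reduce tasks fail, it can help to extract the peer address, the intermediate data file, and the errno from each of these lines so that failures can be grouped by host. The following is a minimal sketch, assuming the log lines look exactly like the two samples above and that the timestamp is in yy/MM/dd order. The function name `parse_shuffle_error` and the returned field names are hypothetical and are not part of IBM Spectrum Symphony.

```python
import re
from datetime import datetime

# Pattern derived only from the two sample lines in this document;
# other log layouts are not handled.
ACK_FAILURE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \S+ ERROR "
    r"(?P<logger>\S+): .*?"
    r"Failed to receive ack to <(?P<peer>[^>]+)> "
    r"for <(?P<path>[^>]+)>, errno <(?P<errno>-?\d+)>"
)


def parse_shuffle_error(line):
    """Return a dict describing an ack failure, or None if the line does not match."""
    m = ACK_FAILURE.match(line)
    if m is None:
        return None
    return {
        # Assumption: yy/MM/dd ordering; the time zone token (GMT) is ignored.
        "time": datetime.strptime(m.group("ts"), "%y/%m/%d %H:%M:%S"),
        "logger": m.group("logger"),
        "peer": m.group("peer"),
        "path": m.group("path"),
        "errno": int(m.group("errno")),
    }
```

Calling `parse_shuffle_error` on each line of a task log and counting the `peer` values of the non-`None` results shows whether the failures concentrate on one host.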

The psmr log shows that the JVM crashed at about the same time as the errors above.

/lib64/power8/libpthread.so.0(+0x8728)[0x3fff9d878728]
/lib64/power8/libc.so.6(clone+0x98)[0x3fff9d707ae0]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00003ffead1d3dc0, pid=62708, tid=70361502380496
#
# JRE version: OpenJDK Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: OpenJDK 64-Bit Server VM (25.65-b01 mixed mode linux-ppc64 compressed oops)
# Problematic frame:
# C
[error occurred during error reporting (printing problematic frame), id 0xb]
# Core dump written. Default location: /xxxxxx/mr_work/app_name/22/core or core.62708
#
# An error report file with more information is saved as:
# /xxxxxx/mr_work/app_name/22/hs_err_pid62708.log
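
To locate the crash report for each failed task, the signal, process ID, and `hs_err` file path can be pulled out of the JVM fatal error banner. The following is a minimal sketch, assuming the banner has the same layout as the excerpt above. The function name `summarize_jvm_crash` and the dictionary keys are hypothetical, chosen for illustration only.

```python
import re

# Both patterns are based only on the banner lines shown in this document.
SIGNAL_LINE = re.compile(
    r"^#\s+(?P<signal>SIG[A-Z]+) \(0x[0-9a-f]+\) at pc=0x[0-9a-f]+, pid=(?P<pid>\d+),"
)
REPORT_LINE = re.compile(r"^#\s+(?P<path>\S*hs_err_pid\d+\.log)\s*$")


def summarize_jvm_crash(log_text):
    """Collect the signal name, pid, and hs_err report path from a crash banner."""
    summary = {}
    for line in log_text.splitlines():
        m = SIGNAL_LINE.match(line)
        if m:
            summary["signal"] = m.group("signal")
            summary["pid"] = int(m.group("pid"))
            continue
        m = REPORT_LINE.match(line)
        if m:
            summary["report"] = m.group("path")
    return summary
```

An empty dictionary means that no crash banner was found in the text. The timestamp of the surrounding psmr log entries can then be compared with the time of the shuffle errors.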

Product: IBM Spectrum Symphony 7.1.2 (Platform Independent)

Document Information

Modified date:
17 June 2018

UID

isg3T1026963