Troubleshooting
Problem
Large-scale MapReduce tasks fail at the reduce stage.
Symptom
The task logs contain the following error messages:
18/01/12 15:59:20 GMT ERROR shuffler.RemoteMultiFileCopier: ShuffleLibImpl.cpp:633: Failed to receive ack to <xxxxxx:7879> for </xxxxxx/interdata/app_name/22418/1522_32294/1522_java.pkvf>, errno <-62>
18/01/12 15:59:20 GMT ERROR shuffler.ShuffleFetcher: Failed to fetch interdata file, error message <ShuffleLibImpl.cpp:633: Failed to receive ack to <xxxxxx:7879> for </xxxxxx/interdata/app_name/22418/1522_32294/1522_java.pkvf>, errno <-62>>, error code <17>
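When many reduce tasks fail, it can help to collect every ack failure from the task logs and see which peer and intermediate data file each one refers to. The following is a minimal sketch, not a product tool. The function name find_ack_failures and the regular expression are assumptions modeled only on the two sample lines shown above; other releases may format these messages differently.

```python
import re

# Pattern modeled on the sample RemoteMultiFileCopier line in this document.
# This is an assumption about the log format, not a documented specification.
ACK_FAILURE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \S+ ERROR "
    r"shuffler\.RemoteMultiFileCopier: .*?Failed to receive ack to "
    r"<(?P<peer>[^>]+)> for <(?P<path>[^>]+)>, errno <(?P<errno>-?\d+)>"
)


def find_ack_failures(lines):
    """Return one dict per RemoteMultiFileCopier ack-failure line.

    The matching ShuffleFetcher line repeats the same message, so it is
    deliberately not matched to avoid counting each failure twice.
    """
    failures = []
    for line in lines:
        match = ACK_FAILURE.search(line)
        if match:
            record = match.groupdict()
            record["errno"] = int(record["errno"])
            failures.append(record)
    return failures
```

Passing the lines of a task log to find_ack_failures yields the timestamp, peer address, file path, and errno of each failure, which can then be compared against the time of the JVM crash.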
The psmr log shows that the JVM crashed at about the same time as the errors above:
/lib64/power8/libpthread.so.0(+0x8728)[0x3fff9d878728]
/lib64/power8/libc.so.6(clone+0x98)[0x3fff9d707ae0]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00003ffead1d3dc0, pid=62708, tid=70361502380496
#
# JRE version: OpenJDK Runtime Environment (8.0_65-b17) (build 1.8.0_65-b17)
# Java VM: OpenJDK 64-Bit Server VM (25.65-b01 mixed mode linux-ppc64 compressed oops)
# Problematic frame:
# C
[error occurred during error reporting (printing problematic frame), id 0xb]
# Core dump written. Default location: /xxxxxx/mr_work/app_name/22/core or core.62708
#
# An error report file with more information is saved as:
# /xxxxxx/mr_work/app_name/22/hs_err_pid62708.log
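To confirm the crash quickly across many psmr logs, the fatal-error header can be reduced to a few fields. The sketch below is illustrative only: summarize_jvm_crash is a hypothetical helper, and its patterns are assumptions based on the header lines shown in this excerpt.

```python
import re

# Both patterns are modeled on the crash header shown in this document.
FATAL_SIGNAL = re.compile(
    r"^#\s+(?P<signal>SIG[A-Z0-9]+) \((?P<code>0x[0-9a-fA-F]+)\) at "
    r"pc=(?P<pc>0x[0-9a-fA-F]+), pid=(?P<pid>\d+), tid=(?P<tid>\d+)"
)
REPORT_PATH = re.compile(r"^#\s+(?P<path>/\S*hs_err_pid\d+\.log)\s*$")


def summarize_jvm_crash(lines):
    """Collect the signal, pid, tid, and error report path from a crash header."""
    summary = {}
    for line in lines:
        signal = FATAL_SIGNAL.match(line)
        if signal:
            summary.update(signal.groupdict())
            summary["pid"] = int(summary["pid"])
            summary["tid"] = int(summary["tid"])
            continue
        report = REPORT_PATH.match(line)
        if report:
            summary["report_file"] = report.group("path")
    return summary
```

An empty result means no fatal-error header was found in the lines supplied. A non-empty result gives the pid and the location of the hs_err report to inspect next.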
Document Information
Modified date: 17 June 2018
UID: isg3T1026963