Troubleshooting
Problem
When you create an HDFS directory from a Spark session by calling the Python function hi_core_utils.run_command from the hi_core_utils library, the call might appear to fail with an error.
%%spark -s $session_name
# Declare imports needed for all of the cells that will run remotely.
import getpass, time, os, shutil
# Load IBM Hadoop Integration utilities to facilitate remote functionality.
# This line assumes that HI version >= X.Y has been installed on the registered
# Hadoop Integration system.
hi_utils_lib = os.getenv("HI_UTILS_PATH", "")
sc.addPyFile("hdfs://{}".format(hi_utils_lib))
import hi_core_utils
# Declare a target HDFS directory path that will be used for our data.
hdfs_dataset_dir = "/user/{}/datasets".format(getpass.getuser())
input_ds = "{}/{}".format(hdfs_dataset_dir, "cars.csv")
# Create the target HDFS directory if it does not already exist.
hi_core_utils.run_command("hdfs dfs -mkdir -p {}".format(hdfs_dataset_dir))
Symptom
The following error is displayed in the Jupyter notebook:
log4j:ERROR Could not read configuration file from URL [file:/run/cloudera-scm-agent/process/7147-yarn-NODEMANAGER/log4j.properties].
java.io.FileNotFoundException: /run/cloudera-scm-agent/process/7147-yarn-NODEMANAGER/log4j.properties (Permission denied)
Cause
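This is a non-fatal error that comes from the CDH configuration. log4j cannot read the YARN NodeManager's log4j.properties file (Permission denied) and reports the failure, but the HDFS command itself can still succeed.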
Environment
Hadoop Execution Engine (2.1.0) and Watson Studio Local (1.2.3.1 Patch 10)
CDH 6.2
Notebook: Jupyter with Python 3.5 and Spark 2.2.1
Resolving The Problem
Because the error comes from the CDH configuration and is non-fatal, first check whether the directory was actually created.
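One way to confirm that the directory exists despite the log4j message is to test for it from the same environment. The following is a minimal sketch, assuming the hdfs command-line client is on the PATH where the code runs; the helper name hdfs_dir_exists and its runner parameter are hypothetical. It relies on hdfs dfs -test -d, which exits with status 0 when the path exists and is a directory.

```python
import subprocess


def hdfs_dir_exists(path, runner=subprocess.run):
    """Return True if `path` exists in HDFS and is a directory.

    Hypothetical helper: it shells out to `hdfs dfs -test -d`, which
    exits with status 0 only when the path is an existing directory.
    `runner` is injectable so the check can be exercised without a cluster.
    """
    result = runner(["hdfs", "dfs", "-test", "-d", path])
    return result.returncode == 0
```

For example, hdfs_dir_exists(hdfs_dataset_dir) should return True once the mkdir call has taken effect, in which case the log4j message can be ignored.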
Although this non-fatal message is annoying to have in the logs, you can avoid it by setting the HADOOP_CONF_DIR environment variable in one of two ways:
- As part of the command being run, prepend the command with an export of the variable (note the semicolon):
  export HADOOP_CONF_DIR=/etc/hadoop/conf;
- Within your notebook, run the following once:
  os.environ['HADOOP_CONF_DIR']=os.environ['HADOOP_CLIENT_CONF_DIR']
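Both workarounds can be wrapped in small helpers. This is a sketch only: with_hadoop_conf_dir and set_hadoop_conf_dir_from_client are hypothetical names, and the first assumes that hi_core_utils.run_command passes the command string to a shell, which this note does not confirm. Unlike the one-liner above, the second helper leaves the environment unchanged when HADOOP_CLIENT_CONF_DIR is not set instead of raising a KeyError.

```python
import os
import shlex


def with_hadoop_conf_dir(command, conf_dir="/etc/hadoop/conf"):
    """Prefix a shell command with an export of HADOOP_CONF_DIR.

    Hypothetical helper; only useful if the command is run by a shell.
    """
    return "export HADOOP_CONF_DIR={}; {}".format(shlex.quote(conf_dir), command)


def set_hadoop_conf_dir_from_client(env=os.environ):
    """Copy HADOOP_CLIENT_CONF_DIR into HADOOP_CONF_DIR when it is set.

    Hypothetical helper; returns the copied value, or None if the
    client variable is missing (in which case `env` is not modified).
    """
    client_dir = env.get("HADOOP_CLIENT_CONF_DIR")
    if client_dir:
        env["HADOOP_CONF_DIR"] = client_dir
    return client_dir
```

Under that assumption, the mkdir call from the Problem section would become hi_core_utils.run_command(with_hadoop_conf_dir("hdfs dfs -mkdir -p {}".format(hdfs_dataset_dir))).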
Document Location
Worldwide
Document Information
Modified date:
27 May 2020
UID
ibm16217320