Configuring Spark log level information

Use the logs that the watsonx.data Spark application generates to review the applications that run and to identify issues. The standard logging levels available are ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, and OFF. By default, the watsonx.data Spark application logs at the Spark INFO level. You can configure the logging level to display only relevant, less verbose messages.

Applies to:

Spark engine

Apache Gluten accelerated Spark engine

Configuring options

Configure the following watsonx.data logs to set up the Spark application log level:

Spark driver logs (by using `ae.spark.driver.log.level`)
Spark executor logs (by using `ae.spark.executor.log.level`)

Specify the option in the Spark configurations section at the time of provisioning a Spark engine or submitting a Spark application. You can specify the following standard log level values:

ALL
TRACE
DEBUG
INFO
WARN
ERROR
FATAL
OFF

The default value for both driver and executor log level is `INFO`.
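The following minimal sketch shows one way to assemble the two options and reject values that are not in the list above before they reach a provisioning or submission payload. The helper name `build_log_level_conf` is hypothetical and is not part of any watsonx.data API. Converting the value to uppercase is a choice made for this sketch only.

```python
import json

# Standard log level values listed in this topic.
VALID_LEVELS = ("ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF")


def build_log_level_conf(driver_level="INFO", executor_level="INFO"):
    """Return a Spark configuration dict for the two log level options.

    Hypothetical helper for illustration; both arguments default to INFO,
    which matches the documented default.
    """
    conf = {}
    for key, level in (
        ("ae.spark.driver.log.level", driver_level),
        ("ae.spark.executor.log.level", executor_level),
    ):
        normalized = level.strip().upper()
        if normalized not in VALID_LEVELS:
            raise ValueError(f"Unsupported log level {level!r} for {key}")
        conf[key] = normalized
    return conf


if __name__ == "__main__":
    # Print a configuration fragment with different driver and executor levels.
    print(json.dumps(build_log_level_conf("WARN", "ERROR"), indent=4))
```

The returned dict can be placed under `default_config` or `conf`, depending on where the log level is set.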

Configuring Spark log level information at the engine level

At the time of provisioning a watsonx.data instance, specify the log level configurations under the `default_config` attribute.

Example:


"default_config": {
    "ae.spark.driver.log.level": "WARN",
    "ae.spark.executor.log.level": "ERROR"
}

Configuring Spark log level information at the application level

At the time of submitting a job, specify the options in the payload under `conf`.

Example:


{
    "conf": {
        "ae.spark.driver.log.level": "WARN",
        "ae.spark.executor.log.level": "WARN"
    }
}

Sample use case

Setting the log-level Spark configuration at the engine level: In this scenario, you provision a watsonx.data Spark engine and configure the log level so that all the applications in the engine log at ERROR.

Set the following configurations as default Spark configurations:

- ae.spark.driver.log.level = ERROR
- ae.spark.executor.log.level = ERROR

After you set the default Spark configuration, all applications that are submitted to the Spark engine log at ERROR, unless the application payload specifies its own log level configuration during submission.
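The precedence that this topic describes (the INFO default, then the engine default configuration, then the application payload) can be modeled as a simple dictionary merge. This is a sketch of the documented behavior only, not of how the service resolves the values internally, and the function name `effective_log_levels` is hypothetical.

```python
# Documented default for both driver and executor log levels.
DEFAULT_LEVEL = "INFO"
LOG_LEVEL_KEYS = ("ae.spark.driver.log.level", "ae.spark.executor.log.level")


def effective_log_levels(engine_default_config, application_conf=None):
    """Return the log levels an application would run with in this model.

    Later sources win: built-in default, then engine default_config,
    then the conf that is supplied in the application payload.
    """
    merged = {key: DEFAULT_LEVEL for key in LOG_LEVEL_KEYS}
    for source in (engine_default_config, application_conf or {}):
        for key in LOG_LEVEL_KEYS:
            if key in source:
                merged[key] = source[key]
    return merged


if __name__ == "__main__":
    engine = {
        "ae.spark.driver.log.level": "ERROR",
        "ae.spark.executor.log.level": "ERROR",
    }
    # No application conf: the engine-level ERROR setting applies.
    print(effective_log_levels(engine))
    # The application overrides only the driver level.
    print(effective_log_levels(engine, {"ae.spark.driver.log.level": "WARN"}))
```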

Setting the log-level Spark configuration at the application level: In this scenario, you have an application that you want to log at the DEBUG level. You can specify the Spark configuration in the payload. Consider the sample payload:

curl -k -X POST --url https://<cpd_host_name>/lakehouse/api/<api_version>/spark_engines/<spark_engine_id>/applications \
  -H "Authorization: ZenApiKey ${TOKEN}" \
  -d '{
    "application_details": {
        "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
        "arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
        "conf": {
            "ae.spark.driver.log.level": "DEBUG",
            "ae.spark.executor.log.level": "DEBUG"
        }
    }
}'

In the sample use case, the log level that the Spark application specifies in its payload (DEBUG) overrides the log level that is set at the engine level (ERROR).
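For readers who submit applications from a script, the following sketch builds the same POST request with the Python standard library. It only constructs the request object and does not send it. The function name `build_submit_request` and its parameters are hypothetical; the host name, API version, engine ID, and token remain placeholders that you must supply.

```python
import json
import urllib.request


def build_submit_request(host, api_version, engine_id, token,
                         driver_level, executor_level):
    """Build (but do not send) the application submission request.

    Hypothetical helper that mirrors the curl sample: same URL pattern,
    same Authorization header, same payload structure.
    """
    url = (
        f"https://{host}/lakehouse/api/{api_version}"
        f"/spark_engines/{engine_id}/applications"
    )
    payload = {
        "application_details": {
            "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
            "arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
            "conf": {
                "ae.spark.driver.log.level": driver_level,
                "ae.spark.executor.log.level": executor_level,
            },
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"ZenApiKey {token}"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with the values for your environment.
    request = build_submit_request(
        "<cpd_host_name>", "<api_version>", "<spark_engine_id>", "<token>",
        "DEBUG", "DEBUG",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

To send the request, pass the object to `urllib.request.urlopen`. The curl sample uses `-k` to skip certificate verification, which this sketch does not reproduce.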

Enabling automated cleaning of history logs

You can configure Spark history to automatically clean up Spark history logs. The following default configuration enables automatic cleaning of history logs that are older than 30 days. The cleaner checks the logs every day and deletes the history logs that are older than 30 days.

Example:

"default_config": {
    "spark.history.fs.cleaner.enabled": "true",
    "spark.history.fs.cleaner.interval": "1d",
    "spark.history.fs.cleaner.maxAge": "30d"
}
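To make the effect of `spark.history.fs.cleaner.maxAge` concrete, the following sketch selects the history logs that a cleaner run would remove for a given maximum age. It illustrates the meaning of the setting only and is not the Spark history cleaner implementation. The function names and the log names in the usage example are hypothetical, and the parser handles only a subset of duration suffixes (s, m, h, d).

```python
import re
from datetime import datetime, timedelta

# Subset of duration suffixes handled by this sketch.
_UNIT_TO_KWARG = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(value):
    """Convert a string such as '30d' or '1d' to a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip().lower())
    if match is None:
        raise ValueError(f"Unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})


def logs_to_delete(log_mtimes, max_age, now):
    """Return the names of logs last modified before (now - max_age)."""
    cutoff = now - parse_duration(max_age)
    return sorted(name for name, mtime in log_mtimes.items() if mtime < cutoff)


if __name__ == "__main__":
    now = datetime(2024, 3, 31)
    logs = {
        "app-old": datetime(2024, 2, 1),   # older than 30 days
        "app-new": datetime(2024, 3, 20),  # within 30 days
    }
    print(logs_to_delete(logs, "30d", now))
```

With `spark.history.fs.cleaner.interval` set to `1d`, a check of this kind runs once a day.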