Configuring Spark log level information

Review the applications that run and identify issues by using the logs that a Spark application on Analytics Engine powered by Apache Spark generates. The standard logging levels available are ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, and OFF. By default, a Spark application on Analytics Engine powered by Apache Spark logs at the INFO level. You can configure the logging level to display relevant and less verbose messages.

The Analytics Engine powered by Apache Spark logging configuration sets the log level of the Spark framework. It does not affect logs written by user code through commands such as 'logger.info()', 'logger.warn()', 'print()', or 'show()' in the Spark application.
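
For example, messages that your application code emits through Python logging or print() appear in the logs regardless of the framework log level. The following is a minimal PySpark sketch (illustrative only; the application name, logger name, and messages are arbitrary):

import logging

from pyspark.sql import SparkSession

# The framework log level is controlled by ae.spark.driver.log.level and
# ae.spark.executor.log.level; it does not filter the user-code output below.
spark = SparkSession.builder.appName("log-level-demo").getOrCreate()

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_app")

logger.info("User-code log message, not filtered by ae.spark.driver.log.level")
print("print() output, also not filtered")

spark.range(5).show()  # show() writes its table to stdout

spark.stop()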

This feature applies only to Spark applications. It is not supported for Spark interactive applications or Spark kernels.

Configuring options

Configure the following Analytics Engine powered by Apache Spark logs to set up the Spark application log level:

  • Spark driver logs (by using ae.spark.driver.log.level)
  • Spark executor logs (by using ae.spark.executor.log.level)

Specify these options in the Spark configurations section when you provision an Analytics Engine powered by Apache Spark instance or when you submit a Spark application. You can specify the following standard log level values:

  • ALL
  • TRACE
  • DEBUG
  • INFO
  • WARN
  • ERROR
  • FATAL
  • OFF

The default value for both driver and executor log level is INFO.

You can apply the configuration in the following two ways:

  • Instance level configuration
  • Application level configuration

Configuring Spark log level information at the instance level

At the time of provisioning an Analytics Engine powered by Apache Spark instance, specify the log level configurations under the default_config attribute. For more information, see Default Spark configuration.

Example:


"default_config": {
    "ae.spark.driver.log.level": "WARN",
    "ae.spark.executor.log.level": "ERROR"
}

Configuring Spark log level information at the application level

At the time of submitting a job, specify the options in the payload under conf. For more information, see Spark application.

Example:


{
    "conf": {
        "ae.spark.driver.log.level": "WARN",
        "ae.spark.executor.log.level": "WARN"
    }
}

Sample use case

Setting the log-level Spark configuration at the instance level: This sample use case considers a scenario where you provision an Analytics Engine powered by Apache Spark instance and configure the log level so that all applications in the instance log at the ERROR level.

  1. Set the following configurations as default Spark configurations:

    • ae.spark.driver.log.level = ERROR
    • ae.spark.executor.log.level = ERROR

    After setting the default Spark configuration, the log level for all applications that are submitted to the instance is set to ERROR (provided the application payload does not specify the Spark configuration during submission).
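
    For example, you can pass the following default_config snippet at provisioning time (the same format as the earlier instance-level example):

    "default_config": {
        "ae.spark.driver.log.level": "ERROR",
        "ae.spark.executor.log.level": "ERROR"
    }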

Setting the log-level Spark configuration at the job level: This sample use case considers a scenario where you have an application and want its logs to be logged at the DEBUG level. You can specify the Spark configuration in the payload. Consider the sample payload:

curl -k -X POST <V4_JOBS_API_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}" -d '{
    "application_details": {
        "application": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
        "arguments": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
        "conf": {
            "ae.spark.driver.log.level": "DEBUG",
            "ae.spark.executor.log.level": "DEBUG"
        }
    }
}'

In this sample use case, the Spark application overrides the log-level Spark configuration that is set at the instance level, that is, from ERROR to DEBUG.

Enabling automated cleaning of history logs

You can configure the Spark history server to automatically clean up Spark history logs. The following default configuration enables the cleaner, which checks the logs every day and deletes history logs that are older than 30 days.

Example:


"default_config": {
    "spark.history.fs.cleaner.enabled": "true"
    "spark.history.fs.cleaner.interval": "1d"
    "spark.history.fs.cleaner.maxAge": "30d"
}

Parent topic: Getting started with Spark applications