Checking the results of a Spark application or cluster

The results of an Apache Spark application are contained in its standard output, standard error, and status log files. Information about a Spark cluster is contained in its standard output and standard error files.

All Spark log files are located in your $HOME/spark/log directory, which has the following directory structure:
spark/
  log/
    cluster_cluster_id/
         master.err
         master.out
         worker_hostname.err
         worker_hostname.out
    submission_submission_id/
         app-application_id
         submission.err
         submission.out
         submission.info
    latest -> submission_submission_id/
The $HOME/spark/log directory contains the following three subdirectories:
cluster_cluster_id
This directory contains the standard output and standard error files for the Spark master and Spark worker processes. The cluster ID is generated when a Spark cluster is created for a user.
submission_submission_id
This directory contains the following files for the Spark application:
app-application_id
A JSON object file containing information about the Spark application. The application ID contained in the file name uniquely identifies the Spark application.
submission.out
The standard output file.
submission.err
The standard error file.
submission.info
A file containing information about the submitted application:
  • A return code that indicates the application's final status. Possible values are described in the description of the submit endpoint of the IBM® Db2® Warehouse Analytics API.
  • The full stack trace.
  • Information about application errors and exceptions, if any.
latest
This pseudo-directory points to the most recently created submission directory.
Note: Spark also writes executor log files to the worker nodes that are accessible only through the Spark monitoring UI. The executor log files are cleaned up automatically after some time.

Checking results using the spark-submit.sh script

Use the spark-submit.sh script to download or display log files, as shown in the example commands after this list:
  • Download all log files of a particular type:
    • To download the standard error and standard output logs of the master and worker processes of your Spark cluster, issue the spark-submit.sh command with the --download-cluster-logs option.
    • To download the log files for an application, issue the spark-submit.sh command with the --download-app-logs option.
  • Display the contents of a single log file:
    • To display the contents of a single cluster log file, issue the spark-submit.sh command with the --display-cluster-log option.
    • To display the contents of a single application log file, issue the spark-submit.sh command with the --display-app-log option.
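For example, the following commands sketch each option in use. The option names come from the list above; the file-name arguments passed to the two display options are assumptions, so check the script's help output for the exact syntax:
# Download the standard error and standard output logs of the
# master and worker processes of your Spark cluster:
spark-submit.sh --download-cluster-logs

# Download the log files of an application:
spark-submit.sh --download-app-logs

# Display a single cluster log file (file-name argument assumed):
spark-submit.sh --display-cluster-log master.err

# Display a single application log file (file-name argument assumed):
spark-submit.sh --display-app-log submission.err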

Checking results using a REST API call

To retrieve a list of the contents of your $HOME/spark/log directory, use the IBM Db2 Warehouse API to submit an HTTP GET request that calls the /dashdb-api/home endpoint for your $HOME/spark/log directory. For example, issue the following cURL command (replace the user ID, password, and host name):
curl --user "userid:password" \
  -X GET "https://hostname:8443/dashdb-api/home/spark/log"
A typical result looks similar to this:
{"message":"NONE","result":".\/cluster_20160701084701013000\n.\
/cluster_20160701084701013000\/master.out\n.\
/cluster_20160701084701013000\/master.err\n.\
/cluster_20160701084701013000\/worker_9.152.63.161.out\n.\
/cluster_20160701084701013000\/worker_9.152.63.161.err\n.\
/submission_20160701105300359000\n.\
/submission_20160701105300359000\/submission.out\n.\
/submission_20160701105300359000\/submission.err\n.\
/submission_20160701105300359000\/submission.info\n.\
/submission_20160701105300359000\/app-20160701105305-0005\n.\
/submission_20160701105318773000\n.\
/submission_20160701105318773000\/submission.out\n.\
/submission_20160701105318773000\/submission.err\n.\
/submission_20160701105318773000\/submission.info\n.\
/submission_20160701105318773000\/app-20160701105324-0006\n.\
/latest\n.\
/latest\/submission.out\n.\
/latest\/submission.err\n.\
/latest\/submission.info\n.\
/latest\/app-20160701105324-0006\n","errorMessageCode":"NONE","resultCode":"SUCCESS"}
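The result field packs the whole directory listing into one JSON string, with \/ escapes and \n separators. One way to print it as a plain list, assuming the jq utility is available on the client:
curl -s --user "userid:password" \
  -X GET "https://hostname:8443/dashdb-api/home/spark/log" \
  | jq -r .result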
To retrieve a particular log file, use the IBM Db2 Warehouse API to submit an HTTP GET request that calls the /dashdb-api/home endpoint for the file. For example, issue the following cURL command (replace the user ID, password, and host name):
curl --user "userid:password" \
  -X GET "https://hostname:8443/dashdb-api/home/spark/log/submission_20160701105300359000/submission.err"