Checking the results of a Spark application or cluster
The results of an Apache Spark application are contained in its standard output, standard error, and status log files. Information about a Spark cluster is contained in its standard output and standard error files.
All Spark log files are located in your $HOME/spark/log directory, which has the following directory structure:

spark/
  log/
    cluster_cluster_id/
      master.err
      master.out
      worker_hostname.err
      worker_hostname.out
    submission_submission_id/
      app-application_id
      submission.err
      submission.out
      submission.info
    latest -> submission_submission_id/

The
$HOME/spark/log directory contains the following three subdirectories:
- cluster_cluster_id
  This directory contains the standard output and standard error files for the Spark master and Spark worker processes. The cluster ID is generated when a Spark cluster is created for a user.
- submission_submission_id
  This directory contains the following files for the Spark application:
  - app-application_id
    A JSON object file containing information about the Spark application. The application ID contained in the file name uniquely identifies the Spark application.
  - submission.out
    The standard output file.
  - submission.err
    The standard error file.
  - submission.info
    A file containing information about the submitted application:
    - A return code that indicates the application's final status. Possible values are described in the description of the submit endpoint of the IBM® Db2® Warehouse Analytics API.
    - The full stack trace.
    - Information about application errors and exceptions, if any.
- latest
  This pseudo-directory points to the most recently created application directory (see the sketch after this list).
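Because the latest pseudo-directory always points at the newest submission directory, you can inspect the outcome of your most recent application from a shell on the host without looking up its submission ID. A minimal sketch, assuming shell access and the $HOME/spark/log layout shown above:

# Final status, return code, and stack trace (if any) of the newest application
cat $HOME/spark/log/latest/submission.info
# Last lines of its standard error output
tail -n 50 $HOME/spark/log/latest/submission.err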
Note: Spark also writes executor log files to the worker nodes; these files are accessible only through the Spark monitoring UI and are cleaned up automatically after some time.
Checking results using the spark-submit.sh script
Use the spark-submit.sh script to download or display log files (a sketch of example invocations follows the list):
- Download all log files of a particular type:
  - To download the standard error and standard output logs of the master and worker processes of your Spark cluster, issue the spark-submit.sh command with the --download-cluster-logs option.
  - To download the log files for an application, issue the spark-submit.sh command with the --download-app-logs option.
- Display the contents of a single log file:
  - To display the contents of a single cluster log file, issue the spark-submit.sh command with the --display-cluster-log option.
  - To display the contents of a single application log file, issue the spark-submit.sh command with the --display-app-log option.
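For example, the invocations might look as follows. This is a sketch only: the option names come from the list above, but any arguments (such as a file name for the display options) are assumptions; check the script's usage output for the exact syntax your version expects.

# Download all cluster logs (master and worker .out/.err files)
./spark-submit.sh --download-cluster-logs
# Download all log files for an application
./spark-submit.sh --download-app-logs
# Display one cluster log file (the file name argument is an assumption)
./spark-submit.sh --display-cluster-log master.err
# Display one application log file (the file name argument is an assumption)
./spark-submit.sh --display-app-log submission.err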
Checking results using a REST API call
To retrieve a list of the contents of your $HOME/spark/log directory, use the IBM Db2 Warehouse API to submit an HTTP GET request that calls the /dashdb-api/home endpoint for that directory. For example, issue the following cURL command (replace the user ID, password, and host name):

curl --user "userid:password" -X GET "https://hostname:8443/dashdb-api/home/spark/log"

A typical result looks similar to this:
{"message":"NONE","result":".\/cluster_20160701084701013000\n.\
/cluster_20160701084701013000\/master.out\n.\
/cluster_20160701084701013000\/master.err\n.\
/cluster_20160701084701013000\/worker_9.152.63.161.out\n.\
/cluster_20160701084701013000\/worker_9.152.63.161.err\n.\
/submission_20160701105300359000\n.\
/submission_20160701105300359000\/submission.out\n.\
/submission_20160701105300359000\/submission.err\n.\
/submission_20160701105300359000\/submission.info\n.\
/submission_20160701105300359000\/app-20160701105305-0005\n.\
/submission_20160701105318773000\n.\
/submission_20160701105318773000\/submission.out\n.\
/submission_20160701105318773000\/submission.err\n.\
/submission_20160701105318773000\/submission.info\n.\
/submission_20160701105318773000\/app-20160701105324-0006\n.\
/latest\n.\
/latest\/submission.out\n.\
/latest\/submission.err\n.\
/latest\/submission.info\n.\
/latest\/app-20160701105324-0006\n","errorMessageCode":"NONE","resultCode":"SUCCESS"}

To retrieve a particular log file, use the IBM Db2 Warehouse API to submit an HTTP GET
request that calls the
/dashdb-api/home endpoint for the file. For example, issue
the following cURL command (replace the user ID, password, and host
name):

curl --user "userid:password" -X GET "https://hostname:8443/dashdb-api/home/spark/log/submission_20160701105300359000/submission.err"