Accessing Spark runtime driver and executor logs
Applies to:
Spark engine
Apache Gluten accelerated Spark engine
Before you begin
- Required permissions
- To debug the Spark runtime, you must be a User of the engine or a user of the storage volume associated with the engine.
Procedure
The method you use depends on the watsonx.data Spark configuration:
If the Spark advanced features are not enabled for your service instance, you can only view the Spark runtime driver logs by downloading them from storage.
To download the Spark runtime driver logs for debugging when the advanced features are not enabled:
- Get the engine ID from the engine details page. See Getting connection information.
- Get or generate the access token. See Generating an API authorization token. Export the token in a variable:
export TOKEN=<token_generated>
- Set the instance ID and volume name:
export cluster_url=<platform_instance_route>
export VOLUME_NAME=<replace with name of the engine home volume>
export ENGINE=<replace with Engine ID>
export APPLICATION_ID=<replace with application_id>
export TOKEN=<replace with bearer token>
- Start the file server:
curl -k -X POST "https://${cluster_url}/zen-data/v1/volumes/volume_services/${VOLUME_NAME}" -H "Authorization: ZenApiKey ${TOKEN}" -H 'Content-Type: application/json' -d '{}'
If you receive a 409 error, the file server has already started. You can ignore this error and proceed to the next step.
- Download the log file. The log file is stored in the path <Instance_id>/<Application_id>/logs/spark-driver-<Application_id>-stdout.
curl -k -X GET "https://${cluster_url}/zen-volumes/${VOLUME_NAME}/v1/volumes/files/spark%2F${ENGINE}%2F${APPLICATION_ID}%2Flogs%2Fspark-driver-${APPLICATION_ID}-stdout" -H "Authorization: Bearer ${TOKEN}"
- Stop the file server:
curl -k -X DELETE "https://${cluster_url}/zen-data/v1/volumes/volume_services/${VOLUME_NAME}" -H "Authorization: ZenApiKey ${TOKEN}" -H 'Content-Type: application/json' -d '{}'
Refer to Managing persistent volume instances with the Volumes API for more information on working with stored files.
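The start, download, and stop steps above can be chained in one script. The following is a minimal sketch, not an official tool: it assumes the variables exported in the earlier steps (cluster_url, VOLUME_NAME, ENGINE, APPLICATION_ID, TOKEN) are set, and the function names is_start_ok and fetch_driver_log and the local output file name are hypothetical. It treats a 409 response from the start call as success, as described above, and reuses the same Authorization headers as the individual commands.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a "start file server" status is acceptable.
# Any 2xx status means the server started; 409 means it was already running.
is_start_ok() {
  case "$1" in
    2??|409) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical wrapper around the three curl calls from the procedure.
# Nothing runs until you call fetch_driver_log yourself.
fetch_driver_log() {
  svc_url="https://${cluster_url}/zen-data/v1/volumes/volume_services/${VOLUME_NAME}"
  log_url="https://${cluster_url}/zen-volumes/${VOLUME_NAME}/v1/volumes/files/spark%2F${ENGINE}%2F${APPLICATION_ID}%2Flogs%2Fspark-driver-${APPLICATION_ID}-stdout"

  # Start the file server and capture only the HTTP status code.
  status=$(curl -k -s -o /dev/null -w '%{http_code}' -X POST "$svc_url" \
    -H "Authorization: ZenApiKey ${TOKEN}" -H 'Content-Type: application/json' -d '{}')
  is_start_ok "$status" || { echo "start failed: HTTP $status" >&2; return 1; }

  # Download the driver log to a local file (file name is an assumption).
  curl -k -s -X GET "$log_url" -H "Authorization: Bearer ${TOKEN}" \
    -o "spark-driver-${APPLICATION_ID}-stdout.log"
  rc=$?

  # Stop the file server even if the download failed.
  curl -k -s -o /dev/null -X DELETE "$svc_url" \
    -H "Authorization: ZenApiKey ${TOKEN}" -H 'Content-Type: application/json' -d '{}'
  return $rc
}
```

After sourcing the script, run fetch_driver_log. Stopping the file server after a failed download keeps the volume service from being left running.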
Downloading logs with Spark advanced features enabled
If the Spark advanced features are enabled for your instance, you can view or download the Spark runtime driver logs for debugging purposes. For details on enabling Spark advanced features, see Using advanced features.
You can download the logs in two ways:
- Through the IBM Cloud Pak for Data web client
- From the navigation menu, click Services > Instances, find the instance and click it to view the instance details.
- Click on the right of the instance details page and select Deployment Space to open the deployment space on the Runtimes tab where you can view the Spark runtimes.
- Click the runtime to see the runtime runs. You can view performance metrics, partitions, and execution plans of completed runtimes on the Spark history server. See Accessing the Spark history server.
- Click a runtime run to view the run details and log tail. You can download the complete log for the run by clicking Download log.
- Through the REST API
- Get the name of the deployment space.
- From the navigation menu in Cloud Pak for Data, click Services > Instances, find the instance and click it to view the instance details.
- Make a note of the deployment space name.
- Export the following values. The runtime ID is included in your runtime POST response.
export CLOUDPAKFORDATA_URL=<CloudPakforData_URL>
export SPACE_NAME=<space_name>
export APPLICATIONID=<application_id>
- Get the access token for the service instance. See Generating an API authorization token.
- Export the access token in a variable:
export TOKEN=<ACCESS_TOKEN>
- Run the following commands. To use the v2 API, set <api_version> to v2; to use the v3 API, set it to v3.
SPACE_ID=$(curl -k -X GET "https://$CLOUDPAKFORDATA_URL/<api_version>/spaces?name=$SPACE_NAME" -H 'content-type:application/json' -H "Authorization: ZenApiKey ${TOKEN}" | python3 -c "import json, sys; print(json.load(sys.stdin)['resources'][0]['metadata']['id'])")
curl -ivk "https://$CLOUDPAKFORDATA_URL/<api_version>/asset_files/runtimes%2Fspark%2F$APPLICATIONID%2Flogs%2Fspark-driver-$APPLICATIONID-stdout?space_id=$SPACE_ID" -H "accept: application/json" -H "Authorization: ZenApiKey ${TOKEN}"
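Because the API version appears in the request path, it can help to build the log URL in one place and reject any value other than v2 or v3. The following is a minimal sketch; the function name asset_log_url is hypothetical, and the URL layout is copied from the command above rather than from an API reference.

```shell
#!/bin/sh
# Hypothetical helper: build the asset_files URL for a driver log.
# Arguments: host, API version (v2 or v3), application ID, space ID.
asset_log_url() {
  case "$2" in
    v2|v3) ;;
    *) echo "api_version must be v2 or v3" >&2; return 1 ;;
  esac
  # %%2F prints a literal %2F (an encoded slash) in the output.
  printf 'https://%s/%s/asset_files/runtimes%%2Fspark%%2F%s%%2Flogs%%2Fspark-driver-%s-stdout?space_id=%s\n' \
    "$1" "$2" "$3" "$3" "$4"
}

# Example use with the variables exported earlier (not run automatically):
# curl -ivk "$(asset_log_url "$CLOUDPAKFORDATA_URL" v2 "$APPLICATIONID" "$SPACE_ID")" \
#   -H "accept: application/json" -H "Authorization: ZenApiKey ${TOKEN}"
```

Passing the version as an argument avoids editing the <api_version> placeholder by hand in each command.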
View Spark application logs
export cluster_url=<platform_instance_route>
export VOLUME_NAME=<replace with name of the engine home volume>
export ENGINE=<replace with Engine ID>
export APPLICATION_ID=<replace with application_id>
export TOKEN=<replace with bearer token>
curl -k -X GET "https://${cluster_url}/zen-volumes/${VOLUME_NAME}/v1/volumes/files/spark%2F${ENGINE}%2F${APPLICATION_ID}%2Flogs%2Fspark-driver-${APPLICATION_ID}-stdout" -H "Authorization: Bearer ${TOKEN}"
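The file path in the request above is passed as a single URL segment, so each slash in it is written as %2F. The following is a minimal sketch of building that segment from the engine ID and application ID; the function names driver_log_path and encode_volume_path are hypothetical, and the path layout is taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: relative path of the driver stdout log inside the volume.
# Arguments: engine ID, application ID.
driver_log_path() {
  printf 'spark/%s/%s/logs/spark-driver-%s-stdout' "$1" "$2" "$2"
}

# Hypothetical helper: percent-encode the slashes in a volume file path.
# Only "/" is handled; IDs with other reserved characters would need more work.
encode_volume_path() {
  printf '%s' "$1" | sed 's|/|%2F|g'
}

# Example use with the variables exported above (not run automatically):
# encoded=$(encode_volume_path "$(driver_log_path "$ENGINE" "$APPLICATION_ID")")
# curl -k -X GET "https://${cluster_url}/zen-volumes/${VOLUME_NAME}/v1/volumes/files/${encoded}" \
#   -H "Authorization: Bearer ${TOKEN}"
```

Keeping the readable path and the encoding step separate makes it easier to spot a wrong engine or application ID before sending the request.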
View executor logs
"env": {
"SPARK_WORKER_DIR": "/home/spark/shared/logs/executors"
}