Accessing Spark job driver logs

Depending on how Analytics Engine powered by Apache Spark was configured, you can access and view the Spark job driver logs in one of the following ways:

Downloading the driver logs persisted in storage

If the Spark advanced features are not enabled for your service instance, you can view the Spark job driver logs only by downloading them from storage.

To download the Spark job driver logs for debugging purposes if you do not have the advanced features enabled (a consolidated sketch of these steps follows the procedure):

  1. Get the Spark instance ID from the Spark jobs v3 endpoint. The format of the endpoint is as follows: https://<CloudPakforData_URL>/v3/instances/<INSTANCE_ID>/spark/applications. See Managing Analytics Engine powered by Apache Spark instances for details of how to find the endpoint.
  2. Get or generate the access token. See Generate an access token. Export the token in a variable:
     export TOKEN=<token_generated>
    
  3. Set the instance ID, application ID, volume name, and Cloud Pak for Data URL:
     export Instance_id=<instance_id_from_step_1>
     export Application_id=<application_id>
     export Volume_name=<name_of_volume_associated_with_instance>
     export CloudPakforData_URL=<zen_route>
    
  4. Start the file server:
     curl -k -X POST https://${CloudPakforData_URL}/zen-data/v1/volumes/volume_services/${Volume_name} -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{}'
    
  5. Download the log file. The log file is stored in the path <Instance_id>/<Application_id>/logs/spark-driver-<Application_id>-stdout; the slashes must be URL-encoded as %2F when the path is passed to the files API:
     export File_path=$Instance_id%2F$Application_id%2Flogs%2Fspark-driver-$Application_id-stdout
    
     curl -ivk -X GET https://${CloudPakforData_URL}/zen-volumes/${Volume_name}/v1/volumes/files/${File_path} -H "Authorization: Bearer $TOKEN" -H 'cache-control: no-cache'
    
  6. Stop the file server:
     curl -k -X DELETE https://${CloudPakforData_URL}/zen-data/v1/volumes/volume_services/${Volume_name} -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{}'
    

Refer to Managing persistent volume instances with the Volumes API for more information on working with stored files.
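
The individual steps above can be combined into a single script. The following is a minimal sketch, assuming the standard Cloud Pak for Data /icp4d-api/v1/authorize endpoint for token generation (substitute your own method if your cluster authenticates differently); the user name, password, and local output file name are illustrative placeholders:

     #!/bin/bash
     # Minimal sketch of steps 1-6: generate a token, start the file
     # server, download the driver log, and stop the file server.
     # Replace the angle-bracket placeholders before running.
     export CloudPakforData_URL=<zen_route>
     export Instance_id=<instance_id_from_step_1>
     export Application_id=<application_id>
     export Volume_name=<name_of_volume_associated_with_instance>

     # Generate an access token (assumption: the standard Cloud Pak for
     # Data authorization endpoint; see Generate an access token).
     TOKEN=$(curl -k -s -X POST "https://${CloudPakforData_URL}/icp4d-api/v1/authorize" \
       -H 'Content-Type: application/json' \
       -d '{"username": "<user>", "password": "<password>"}' \
       | python3 -c "import json, sys; print(json.load(sys.stdin)['token'])")

     # Start the file server for the volume.
     curl -k -X POST "https://${CloudPakforData_URL}/zen-data/v1/volumes/volume_services/${Volume_name}" \
       -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{}'

     # Download the driver log; %2F is the URL-encoded path separator.
     File_path="${Instance_id}%2F${Application_id}%2Flogs%2Fspark-driver-${Application_id}-stdout"
     curl -k -o "spark-driver-${Application_id}-stdout.log" \
       "https://${CloudPakforData_URL}/zen-volumes/${Volume_name}/v1/volumes/files/${File_path}" \
       -H "Authorization: Bearer $TOKEN" -H 'cache-control: no-cache'

     # Stop the file server when the download completes.
     curl -k -X DELETE "https://${CloudPakforData_URL}/zen-data/v1/volumes/volume_services/${Volume_name}" \
       -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' -d '{}'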

Downloading logs with Spark advanced features enabled

If the Spark advanced features are enabled for your instance, you can view or download the Spark job driver logs for debugging purposes. For details on enabling Spark advanced features, see Using advanced features.

You can download the logs in two ways:

  • Through the IBM Cloud Pak for Data user interface

    1. From the Navigation menu on the Cloud Pak for Data web user interface, click Services > Instances, find the instance and click it to view the instance details.
    2. Click the open and close list of options icon on the right of the instance details page and select Deployment Space. The deployment space opens on the Jobs tab, where you can view the Spark jobs.
    3. Click the job to see the job runs. You can view performance metrics, partitions, and execution plans of completed jobs on the Spark history server. See Accessing the Spark history server.
    4. Click a job run to view the run details and log tail. You can download the complete log for the run by clicking Download log.
  • Through the REST API

    1. Get the name of the deployment space.

      1. From the Navigation menu on the Cloud Pak for Data web user interface, click Services > Instances, find the instance and click it to view the instance details.
      2. Make a note of the deployment space name.
    2. Export the following values. The job ID is returned in the response to your job submission POST request.
       export CLOUDPAKFORDATA_URL=<CloudPakforData_URL>
       export SPACE_NAME=<space_name>
       export JOBID=<job_id>
      
    3. Get the access token for the service instance. See Generating an access token.
    4. Export the access token in a variable:
       export TOKEN=<ACCESS_TOKEN>
      
    5. Run the following commands to look up the space ID and then download the log (a consolidated sketch that also saves the log to a local file follows these steps):
       SPACE_ID=$(curl -k -X GET https://$CLOUDPAKFORDATA_URL/v2/spaces?name=$SPACE_NAME -H 'content-type:application/json' -H "Authorization: Bearer $TOKEN" | python3 -c "import json, sys; print(json.load(sys.stdin)['resources'][0]['metadata']['id'])")
      
       curl -ivk https://$CLOUDPAKFORDATA_URL/v2/asset_files/runtimes%2Fspark%2F$JOBID%2Flogs%2Fspark-driver-$JOBID-stdout?space_id=$SPACE_ID -H "accept: application/json" -H "Authorization: Bearer $TOKEN"
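
Putting the REST API steps together, the following is a minimal sketch built from the commands above; the -o output file name is an illustrative addition that saves the driver log locally:

       export CLOUDPAKFORDATA_URL=<CloudPakforData_URL>
       export SPACE_NAME=<space_name>
       export JOBID=<job_id>
       export TOKEN=<ACCESS_TOKEN>

       # Resolve the deployment space name to its space ID.
       SPACE_ID=$(curl -k -s -X GET "https://$CLOUDPAKFORDATA_URL/v2/spaces?name=$SPACE_NAME" \
         -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN" \
         | python3 -c "import json, sys; print(json.load(sys.stdin)['resources'][0]['metadata']['id'])")

       # Download the driver log; the %2F sequences are URL-encoded slashes
       # in the asset path runtimes/spark/<job_id>/logs/spark-driver-<job_id>-stdout.
       curl -k -o "spark-driver-$JOBID-stdout.log" \
         "https://$CLOUDPAKFORDATA_URL/v2/asset_files/runtimes%2Fspark%2F$JOBID%2Flogs%2Fspark-driver-$JOBID-stdout?space_id=$SPACE_ID" \
         -H "accept: application/json" -H "Authorization: Bearer $TOKEN"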