Accessing and customizing the Spark history server

The Spark history server provides the status of running and completed Spark jobs on a provisioned instance of Analytics Engine powered by Apache Spark. If you want to analyze how different stages of your Spark job performed, you can view the details in the Spark history server.

You can access and customize the Spark history server in two ways: from the Cloud Pak for Data web client or by using the REST API.

Notes

  • If running jobs are canceled or stopped, the Spark applications appear under the Incomplete Applications tab.
  • When you open the Spark history server, only the Spark applications listed on the landing page show timestamps in your time zone. As you drill down for more information, all other timestamp values are in UTC. This is the default open source Spark behavior.

Accessing the Spark history server from the Cloud Pak for Data web client

To access and customize the Spark history server from the web client:

  1. Log in to Cloud Pak for Data.
  2. From the navigation menu, select Services > Instances, then select the Analytics Engine powered by Apache Spark instance.
  3. Click the Spark history tab to view the details of processed applications. From this tab, you can start or stop the Spark history server and open the Spark history server UI page:
    1. Click Start history server. The Start Spark history server window opens.
    2. Increase or decrease the Cores and Memory (GB) values as needed.
    3. Click Start. A status message is displayed.
    4. Click Stop history server to stop the server that is running.
    5. Click View Spark history to view the complete history of the processed applications.

Accessing the Spark history server by using the REST API

The history server is started for an instance of Analytics Engine powered by Apache Spark only when you call the start API of the history server. The history server is stopped when you call the stop API of the history server or when the Analytics Engine powered by Apache Spark instance is deleted.

You use cURL commands to start and stop the history server. To access the history server, you need the Spark history server endpoint and the access token for the service instance. For details about how to get this information, see Managing Analytics Engine powered by Apache Spark instances.
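The following is a minimal sketch of how you might set these values in a shell before you run the commands in this topic. The variable names are illustrative, and how you generate the token depends on your environment; with ZenApiKey authorization, the token is typically the base64 encoding of <username>:<API key>.

export HISTORY_SERVER_ENDPOINT="https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/spark_history_server"
# Assumption: the ZenApiKey token is the base64 encoding of <username>:<API key>;
# see Managing Analytics Engine powered by Apache Spark instances for details.
export TOKEN=$(echo -n "<username>:<api_key>" | base64)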

Starting the history server

To start the Spark history server, enter the following cURL command:

curl -ik -X POST <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"

Example of the response:

{
    "state": "started",
    "cores": "1",
    "memory": "4G",
    "start_time": "2022-06-08T11:28:16.521Z"
}

You will see one of the following return codes:

  • 200 (OK): History server started successfully.
  • 401 (Unauthorized): Invalid authorization token.
  • 500 (Internal server error): Invalid instance ID or other internal server errors.
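
If you start the history server from a script, you can capture the HTTP return code and branch on it. The following is a minimal sketch that uses standard cURL options (-o and -w) together with the endpoint and token variables described above; it is not part of the documented API itself.

# Start the history server and capture only the HTTP return code.
status=$(curl -sk -o /dev/null -w "%{http_code}" -X POST "$HISTORY_SERVER_ENDPOINT" -H "Authorization: ZenApiKey ${TOKEN}")
if [ "$status" = "200" ]; then
    echo "History server started"
else
    echo "Start failed with HTTP return code $status"
fi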

Viewing the history server status

To view the status of the Spark history server, enter the following cURL command:

curl -ik -X GET <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"

Example of the response:

{
    "state": "started",
    "cores": "1",
    "memory": "4G",
    "start_time": "2022-06-08T11:28:16.521Z"
}

You will see one of the following return codes:

  • 200 (OK): History server details retrieved successfully.
  • 401 (Unauthorized): Invalid authorization token.
  • 500 (Internal server error): Invalid instance ID or other internal server errors.
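
In automation, you might want to wait until the history server reports the started state before you open the Web UI. The following is a minimal sketch that polls the status endpoint and parses the JSON response shown above; it assumes the jq utility is available.

# Poll until the "state" field in the response equals "started" (checks every 5 seconds).
until curl -sk -X GET "$HISTORY_SERVER_ENDPOINT" -H "Authorization: ZenApiKey ${TOKEN}" | jq -e '.state == "started"' > /dev/null; do
    sleep 5
done
echo "History server is started"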

Stopping the history server

To stop the history server, enter the following cURL command:

curl -ik -X DELETE <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"

You will see one of the following return codes:

  • 204 (No content): History server stopped successfully.
  • 401 (Unauthorized): Invalid authorization token.
  • 500 (Internal server error): Invalid instance ID or other internal server errors.

Opening the history server Web UI

To access the link to the Spark history server for your provisioned instance:

  1. From the navigation menu in Cloud Pak for Data, click Services > Instances, find the instance, and click it to view the instance details.
  2. Copy the View history server endpoint.
  3. Paste the endpoint into a new tab in the same Cloud Pak for Data browser window to view the history server UI.

Notes

  • Ensure that the Spark history server is running before you open the Web UI.
  • Log links under the Stages and Executors tabs of the Spark history server UI do not work because logs are not preserved with the Spark events.
  • The stdout and stderr logs are not supported in the Spark history server UI.

Customizing the Spark history server

By default, the Spark history server consumes 1 CPU core and 4 GiB of memory while it is running. If you want to allocate more resources to the Spark history server, you can set the following properties to the values you want by using the REST API:

  • ae.spark.history-server.cores for the number of CPU cores
  • ae.spark.history-server.memory for the amount of memory

Updating the CPU cores and memory settings

Get the instance ID from the History server endpoint. For details about how to get this information, see Managing Analytics Engine powered by Apache Spark instances.

The format of the endpoint is: https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/spark_history_server.

Update the CPU cores and memory settings using the REST API as follows:

curl --location --request PATCH "https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/default_configs" --header "Authorization: ZenApiKey ${TOKEN}" --header 'Content-Type: application/json' --data-raw '{
        "ae.spark.history-server.cores": "2",
        "ae.spark.history-server.memory": "8G"
}'
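
If the history server is already running when you change these settings, the new values typically apply the next time the server is started. As a quick check, you can restart the history server and inspect the cores and memory fields in the start response. The following sketch reuses the endpoint and token variables from earlier in this topic and the optional jq utility.

# Stop the history server, start it again, and print the reported resources.
curl -sk -X DELETE "$HISTORY_SERVER_ENDPOINT" -H "Authorization: ZenApiKey ${TOKEN}"
curl -sk -X POST "$HISTORY_SERVER_ENDPOINT" -H "Authorization: ZenApiKey ${TOKEN}" | jq '{cores, memory}'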

Additional customizations

You can customize the Spark history server further by adding properties to the default Spark configuration of your Analytics Engine powered by Apache Spark instance. See the standard Spark history configuration options.

Best practices

Always stop the Spark history server when you no longer need it. The Spark history server consumes CPU and memory resources continuously while its state is Started.

Parent topic: Apache Spark