Accessing and customizing the Spark history server
The Spark history server provides the status of running and completed Spark jobs on a provisioned instance of Analytics Engine powered by Apache Spark. If you want to analyze how different stages of your Spark job performed, you can view the details in the Spark history server.
You can access and customize the Spark history server in two ways: from the Cloud Pak for Data web client, or by using the REST API.
Notes
- If a running job is cancelled or stopped, the Spark application appears under the Incomplete Applications tab.
- When you open the Spark history server, only the Spark applications listed on the landing page have timestamps in your time zone. As you drill down for more information, all other timestamp values are in UTC. This is default open source Spark behavior.
Accessing the Spark history server from the Cloud Pak for Data web client
To access and customize the Spark history server from the web client:
- Log in to Cloud Pak for Data.
- From the navigation menu, select Services > Instances, and then select the Analytics Engine powered by Apache Spark instance.
- Click the Spark history tab to view the details of processed applications. From this tab, you can start or stop the Spark history server. To open the Spark history server UI page:
  - Click Start history server. The Start Spark history server window opens.
  - Increase or decrease the Cores and Memory (GB) as needed.
  - Click Start. A status message is displayed.
- Click Stop history server to stop a running server.
- Click View Spark history to view the complete history of the processed applications.
Accessing the Spark history server by using the REST API
The history server is started for an instance of Analytics Engine powered by Apache Spark only when you call the start API of the history server. The history server is stopped when you call the stop API of the history server or when the Analytics Engine powered by Apache Spark instance is deleted.
You use cURL commands to start and stop the history server. To access the history server, you need the Spark history server endpoint and the access token for the service instance. For details about how to get this information, see Managing Analytics Engine powered by Apache Spark instances.
Starting the history server
To start the Spark history server, enter the following cURL command:
curl -ik -X POST <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"
Example of the response:
{
  "state": "started",
  "cores": "1",
  "memory": "4G",
  "start_time": "2022-06-08T11:28:16.521Z"
}
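If you script against this API, you can extract the `state` field from the response to gate later steps. The following is a minimal sketch that parses the sample response shown above with `sed`; in practice a JSON tool such as `jq` (if installed) is more robust:

```shell
# Sample response body from the start API (copied from the example above).
response='{"state": "started", "cores": "1", "memory": "4G", "start_time": "2022-06-08T11:28:16.521Z"}'

# Pull out the "state" value with sed. With jq installed you could instead use: jq -r .state
state=$(printf '%s' "$response" | sed -n 's/.*"state": *"\([^"]*\)".*/\1/p')
echo "$state"
```

Against a live instance, `$response` would be the body returned by the `curl` call rather than a literal string.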
You will see one of the following return codes:
| Return code | Meaning of the return code | Description |
|---|---|---|
| 200 | OK | History server started successfully |
| 401 | Unauthorized | Invalid authorization token |
| 500 | Internal server error | Invalid instance ID or other internal server errors |
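When scripting these calls, `curl --write-out "%{http_code}"` lets you branch on the return code. The helper below is a hypothetical sketch of that branching, using the codes from the table above; it is shown with literal codes so it runs without a live instance:

```shell
# Hypothetical helper: map an HTTP return code from the history server API
# to a short status word (codes taken from the return code tables).
shs_status() {
  case "$1" in
    200) echo "ok" ;;                # request succeeded
    204) echo "stopped" ;;           # stop succeeded (no content)
    401) echo "unauthorized" ;;      # invalid authorization token
    500) echo "server-error" ;;      # invalid instance ID or internal error
    *)   echo "unexpected" ;;
  esac
}

# Against a live instance, you would capture the code with something like:
#   code=$(curl -sk -o /dev/null -w "%{http_code}" -X POST "$HISTORY_SERVER_ENDPOINT" \
#          -H "Authorization: ZenApiKey ${TOKEN}")
shs_status 200
shs_status 401
```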
Viewing the history server status
To view the status of the Spark history server, enter the following cURL command:
curl -ik -X GET <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"
Example of the response:
{
  "state": "started",
  "cores": "1",
  "memory": "4G",
  "start_time": "2022-06-08T11:28:16.521Z"
}
You will see one of the following return codes:
| Return code | Meaning of the return code | Description |
|---|---|---|
| 200 | OK | History server details retrieved successfully |
| 401 | Unauthorized | Invalid authorization token |
| 500 | Internal server error | Invalid instance ID or other internal server errors |
Stopping the history server
To stop the history server, enter the following cURL command:
curl -ik -X DELETE <HISTORY_SERVER_ENDPOINT> -H "Authorization: ZenApiKey ${TOKEN}"
You will see one of the following return codes:
| Return code | Meaning of the return code | Description |
|---|---|---|
| 204 | No content | History server stopped successfully |
| 401 | Unauthorized | Invalid authorization token |
| 500 | Internal server error | Invalid instance ID or other internal server errors |
Opening the history server Web UI
To access the link to the Spark history server for your provisioned instance:
- From the navigation menu in Cloud Pak for Data, click Services > Instances, find the instance, and click it to view the instance details.
- Copy the view history server endpoint.
- Paste the view history server endpoint in a new tab in the same Cloud Pak for Data browser window to view the history server UI.
Notes
- Ensure that the Spark history server is running before you open the Web UI.
- Log links under the Stages and Executors tabs of the Spark history server UI do not work because logs are not preserved with the Spark events.
- The stdout and stderr logs are not supported in the Spark history server UI.
Customizing the Spark history server
By default, the Spark history server consumes 1 CPU core and 4 GiB of memory while it is running. If you want to allocate more resources to the Spark history server, you can set the following properties to the values you want by using the REST API:
- `ae.spark.history-server.cores` for the number of CPU cores
- `ae.spark.history-server.memory` for the amount of memory
Updating the CPU cores and memory settings
From the History server endpoint, get the instance ID. For details about how to get this information, see Managing Analytics Engine powered by Apache Spark instances.
The format of the endpoint is: `https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/spark_history_server`.
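Putting the pieces together, the endpoint can be assembled from the Cloud Pak for Data URL and the instance ID. The values below are placeholders, not a real host or instance ID:

```shell
# Placeholder values; substitute your own cluster URL and instance ID.
CPD_URL="https://cpd.example.com"
INSTANCE_ID="1680713643310607"

# Assemble the history server endpoint in the documented format.
HISTORY_SERVER_ENDPOINT="${CPD_URL}/v4/analytics_engines/${INSTANCE_ID}/spark_history_server"
echo "$HISTORY_SERVER_ENDPOINT"
```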
Update the CPU cores and memory settings using the REST API as follows:
curl --location --request PATCH "https://<CloudPakforData_URL>/v4/analytics_engines/<INSTANCE_ID>/default_configs" --header "Authorization: ZenApiKey ${TOKEN}" --header 'Content-Type: application/json' --data-raw '{
"ae.spark.history-server.cores": "2",
"ae.spark.history-server.memory": "8G"
}'
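Because the request body is JSON, malformed quoting is a common cause of server errors. A quick local check of the payload before sending the PATCH (this sketch assumes `python3` is on the PATH):

```shell
# The same payload as in the PATCH call above, validated locally before use.
payload='{
  "ae.spark.history-server.cores": "2",
  "ae.spark.history-server.memory": "8G"
}'

# json.tool exits non-zero on invalid JSON, so this only prints on success.
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"
```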
Additional customizations
You can customize the Spark history server further by adding properties to the default Spark configuration of your Analytics Engine powered by Apache Spark instance. See the standard Spark history server configuration options.
Best practices
Always stop the Spark history server when you no longer need it. The Spark history server consumes CPU and memory resources continuously while it is in the Started state.
Parent topic: Apache Spark