Debugging Spark applications

If an application is in the Error state, you can debug the application from the Spark Instance Groups page in the cluster management console.

About this task

When a Spark application enters the Error state, you can debug the application by downloading the driver and executor logs and viewing the application's activity that is related to the resource orchestrator.

The following table lists issues that cause an application to either fail or become stuck in the waiting or submitted state. For each issue, the table also shows the log file or resource data where you can find it.
driver log
  Issues that cause an application to fail:
    • Missing executor .jar file
    • Executor doesn't have permission to write to the log
    • Dependencies fail (for example, HDFS is down)
    • Driver fails because too many tasks retry
    • Error inside the application (for example, an exception)
  Issues that cause an application to become stuck in the waiting or submitted state:
    • Application does not have executors

executor log
  Issues that cause an application to fail:
    • Application's access control list (ACL) cannot get the binary
    • Application's ACL cannot get data
    • Dependencies fail (for example, HDFS is down)
    • Several task failures and retries
    • Issues with third-party libraries
    • Error inside the application (for example, an exception)

Spark master log
  Issues that cause an application to fail:
    • Spark master fails because it is out of memory
  Issues that cause an application to become stuck in the waiting or submitted state:
    • The application schedule cannot get the required slots

resource metrics
  Issues that cause an application to fail:
    • Host memory is overloaded by the executor
    • Node goes down
  Issues that cause an application to become stuck in the waiting or submitted state:
    • Node goes down

resource activity
  Issues that cause an application to fail:
    • The application does not have permissions to complete driver start-up (for example, unable to complete the driver directory)
    • Driver gets reclaimed
    • The application cannot find the driver .jar file
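
If you script your triage, you can also express the table as data. The following Python sketch is a hypothetical helper: `TRIAGE` and `sources_to_check` are illustrative names, not part of the product. It assumes the placement of each issue in the "fail" or "stuck" column as read from the table above, so verify that placement against your own environment before relying on it.

```python
from __future__ import annotations

# Hypothetical encoding of the table above. Keys are the log or resource
# data sources; "fail" and "stuck" hold the issues listed for each state.
TRIAGE = {
    "driver log": {
        "fail": [
            "Missing executor .jar file",
            "Executor doesn't have permission to write to the log",
            "Dependencies fail (for example, HDFS is down)",
            "Driver fails because too many tasks retry",
            "Error inside the application (for example, an exception)",
        ],
        "stuck": ["Application does not have executors"],
    },
    "executor log": {
        "fail": [
            "Application's ACL cannot get the binary",
            "Application's ACL cannot get data",
            "Dependencies fail (for example, HDFS is down)",
            "Several task failures and retries",
            "Issues with third-party libraries",
            "Error inside the application (for example, an exception)",
        ],
    },
    "Spark master log": {
        "fail": ["Spark master fails because it is out of memory"],
        "stuck": ["The application schedule cannot get the required slots"],
    },
    "resource metrics": {
        "fail": ["Host memory is overloaded by the executor", "Node goes down"],
        "stuck": ["Node goes down"],
    },
    "resource activity": {
        "fail": [
            "The application does not have permissions to complete driver start-up",
            "Driver gets reclaimed",
            "The application cannot find the driver .jar file",
        ],
    },
}


def sources_to_check(state: str) -> list[str]:
    """Return, in table order, the sources that list at least one issue
    for `state`, which must be "fail" or "stuck"."""
    if state not in ("fail", "stuck"):
        raise ValueError(f"state must be 'fail' or 'stuck', got {state!r}")
    # A missing key (no issues listed for that state) counts as empty.
    return [source for source, issues in TRIAGE.items() if issues.get(state)]
```

For example, `sources_to_check("stuck")` returns the driver log, the Spark master log, and the resource metrics, which are the sources to review first when an application stays in the waiting or submitted state.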

Procedure

  1. From the cluster management console, select Spark Instance Groups.
  2. Select a Spark instance group.
  3. Click the Applications tab, then click the application that is in the Error state.

    The page that shows the application's driver logs and activity opens.

  4. Download the driver and executor logs by clicking the download icon in the Logs tab. Then, in the Activity tab, check the activity that is related to the resource orchestrator for any potential errors.
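
After you download the logs, a quick local scan can help you find the errors that the table points to. The following Python sketch makes several assumptions that do not come from the product documentation: the downloaded driver and executor logs are extracted as plain-text files into one local directory, and the script name and pattern set are illustrative. The patterns are common JVM and Hadoop error strings (for example, `OutOfMemoryError` and `ClassNotFoundException`), not an exhaustive list, and the script does not call any cluster management console API.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Illustrative pattern set (an assumption): each label maps to a regular
# expression that flags a likely error line in a Spark driver or executor log.
PATTERNS = {
    "log_level_error": re.compile(r"\bERROR\b"),
    "exception": re.compile(r"Exception\b"),
    "out_of_memory": re.compile(r"OutOfMemoryError"),
    "permission": re.compile(r"Permission denied|AccessControlException"),
    "missing_class_or_file": re.compile(
        r"ClassNotFoundException|NoClassDefFoundError|FileNotFoundException"
    ),
}


def scan_log(lines) -> dict[str, list[tuple[int, str]]]:
    """Return {label: [(line_number, line_text), ...]} for matching lines.

    A line can appear under several labels if it matches several patterns.
    """
    hits: dict[str, list[tuple[int, str]]] = {label: [] for label in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append((number, line.rstrip("\n")))
    return hits


def scan_directory(directory: str) -> None:
    """Print every match in every file under `directory`, recursively."""
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has stray bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits = scan_log(handle)
        for label, matches in hits.items():
            for number, text in matches:
                print(f"{path}:{number}: [{label}] {text}")


if __name__ == "__main__" and len(sys.argv) > 1:
    scan_directory(sys.argv[1])
```

For example, if you save the sketch as `scan_spark_logs.py` (a hypothetical name) and extract the logs into `./app-logs`, run `python scan_spark_logs.py ./app-logs`. Each match is printed with its file, line number, and label, so you can compare it with the issues listed for the driver and executor logs in the table.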