Monitoring Azure Databricks

Instana provides end-to-end visibility of your environment and supports monitoring of Azure Databricks. After you install the Instana host agent, the Azure Databricks sensor is automatically installed and enabled. You can view infrastructure metrics that are related to Azure Databricks in the Instana UI.

For more information about other supported Azure services, see Monitoring and Instrumenting Microsoft® Azure with Azure agent.

Supported information

Instana supports metrics and configuration data for all supported Azure Databricks Runtime releases.

Configuring the Azure Databricks sensor

To monitor Azure Databricks, first enable the Azure sensor in the configuration.yaml agent configuration file as shown in the following example:

com.instana.plugin.azure:
  enabled: true
  subscription: "[Your-Subscription-Id]"
  tenant: "[Your-Tenant-Id]"
  principals:
    - id: "[Your-Service-Principal-Account-Id]"
      secret: "[Your-Service-Principal-Secret]"

For more information, see Installation of the Azure agent.

To configure the Azure Databricks sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:

com.instana.plugin.azure.databricks:
  enabled: true # Enabled (true) by default. Valid values: true, false
  unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for all workspaces. The default is 60 minutes.
  workspaces:
    workspaceName1: '[Your-Azure-Databricks-Workspace-Name]'
      databricks_workspace_access_token: '[Your-Databricks-Workspace-Access-Token]' # Required field.
      log_analytics_workspace_id: '[Your-Log-Analytics-Workspace-Id]' # Optional field.
      unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for this workspace. The default is 60 minutes.
    workspaceName2: '[Your-Azure-Databricks-Workspace-Name]'
      databricks_workspace_access_token: '[Your-Databricks-Workspace-Access-Token]' # Required field.
      log_analytics_workspace_id: '[Your-Log-Analytics-Workspace-Id]' # Optional field.
      unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for this workspace. The default is 60 minutes.
Notes:
  • You can generate an access token for your Azure Databricks workspace. For more information about how to generate the access token, see Databricks personal access token authentication.
  • The Log Analytics workspace ID (log_analytics_workspace_id) is an optional field. If you set it, Instana can retrieve more metrics.
  • To collect more metrics, configure the Azure Databricks cluster to use the Log Analytics service. With this configuration, the monitoring library streams Apache Spark level events and Spark Structured Streaming metrics from your jobs to Azure Monitor. For more information, see Send Azure Databricks application logs to Azure Monitor.
  • To view metrics that are related to Unity Catalog, enable your workspace for Unity Catalog. For more information, see Enable a workspace for Unity Catalog. Some of the Unity Catalog metrics are collected from diagnostic logs. To view these metrics, turn on diagnostic logs for the Databricks Unity Catalog category and configure Send to Log Analytics in the Azure portal for the Azure Databricks workspace. Then, set log_analytics_workspace_id for your workspace in the agent configuration.yaml file. For more information, see Configure diagnostic log delivery.
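
The required and optional fields that are described in the example and notes can be checked before you restart the agent. The following Python sketch is illustrative only. It assumes that the com.instana.plugin.azure.databricks section is already loaded into a dictionary that maps each workspace name to its settings, and the function name check_databricks_config is hypothetical, not part of Instana. It also assumes that a per-workspace unity_catalog_pollRate takes precedence over the top-level value, which the comments in the example suggest but do not state.

```python
DEFAULT_POLL_RATE_MINUTES = 60  # default stated in the configuration comments


def check_databricks_config(plugin_config):
    """Return (problems, poll_rates) for a Databricks sensor section.

    plugin_config is a plain dict. The dict layout is an assumption made
    for this sketch, not the agent's internal representation.
    """
    problems = []
    poll_rates = {}
    global_rate = plugin_config.get("unity_catalog_pollRate", DEFAULT_POLL_RATE_MINUTES)

    for name, settings in plugin_config.get("workspaces", {}).items():
        # The access token is the only field that is marked as required.
        if not settings.get("databricks_workspace_access_token"):
            problems.append(f"{name}: databricks_workspace_access_token is required")
        # Without a Log Analytics workspace ID, fewer metrics are available.
        if not settings.get("log_analytics_workspace_id"):
            problems.append(f"{name}: log_analytics_workspace_id not set; fewer metrics")
        # Assumed precedence: workspace value, then top-level value, then default.
        poll_rates[name] = settings.get("unity_catalog_pollRate", global_rate)

    return problems, poll_rates


# Made-up values for illustration only.
example = {
    "unity_catalog_pollRate": 120,
    "workspaces": {
        "workspace-a": {
            "databricks_workspace_access_token": "token-a",
            "log_analytics_workspace_id": "log-a",
            "unity_catalog_pollRate": 30,
        },
        "workspace-b": {"databricks_workspace_access_token": "token-b"},
    },
}
problems, poll_rates = check_databricks_config(example)
print(problems)
print(poll_rates)
```

In this example, workspace-b is reported because it has no Log Analytics workspace ID, and it falls back to the top-level poll rate of 120 minutes.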

Disabling the Azure Databricks sensor

To disable the Azure Databricks sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:

com.instana.plugin.azure.databricks:
  enabled: false

Viewing metrics

To view the metrics, complete the following steps:

  1. From the navigation menu in the Instana UI, select Infrastructure.
  2. Click the Azure Databricks block for your workspace. The blocks are grouped by location.

You can view all the collected metrics on the Azure Databricks dashboard.

Metrics are pulled every minute, which is the resolution that Azure provides for monitoring these services.

Configuration data

| Workspace Details | Description |
|---|---|
| Name | Name of the workspace |
| Resource Group | Resource groups of the workspace |
| Location | Location of the workspace |
| Subscription ID | Subscription ID of the workspace |
| Type | Type of the Databricks workspace |
| Cluster Name | Name of the cluster |
| Cluster Id | ID of the cluster |
| Spark Version | Version of Spark that is embedded in the cluster |
| Cluster Source | Source type of the cluster |
| Cluster Core | CPU core number of the cluster |
| Executor Name | Name of the executor |
| Unity Catalog | |
| Metastore | Name of the metastore that is assigned to the workspace |
| Catalog Name | Name of the catalog |
| Kind | Catalog securable kind |
| Asset Name | Name of the data asset |
| Type | Type of the data asset: Table, Volume, Function, or Model |
| Action Name | Name of the action in the diagnostic log |
| Error Message | Error message in the response of the action |

Performance metrics

| Metric | Unit | Aggregation | Description |
|---|---|---|---|
| Workspace metrics | | | |
| Executors | Count | Average | Total number of executors in the workspace |
| Jobs Running | Count | Average | Total number of jobs that are running in the workspace |
| Memory | Megabytes | Average | Sum of the total memory that is used in the workspace |
| Cluster metrics | | | |
| Execution Count | Count | Average | Total number of executors on the cluster |
| Job Count | Count | Average | Total number of running jobs on the cluster |
| Memory | Megabytes | Average | Memory that is used by the cluster |
| Execution Duration [1] | Second | Average | Streaming metric for process duration of the cluster |
| Throughput Rows Per Second [1] | Second | Average | Streaming throughput metric that indicates input rows per second in the cluster |
| Sum Shuffle Bytes Per Cluster [1] | Byte | Average | Sum of the total shuffle read/write bytes in the cluster |
| Shuffle Bytes Written [1] | Byte | Average | Number of bytes that are written in shuffle operations |
| Executor metrics | | | |
| Deserialize Time (Ratio with Executor Runtime) [1] | Percent | Average | The ratio of elapsed time that is spent to deserialize the task to the elapsed time the executor spent to run the task |
| Serialize Time (Ratio With Executor Runtime) [1] | Percent | Average | The ratio of elapsed time that is spent to serialize the task result to the elapsed time the executor spent to run the task |
| Executor CPU (Ratio With Executor Runtime) [1] | Percent | Average | Ratio of the CPU time the executor spent to run the task to the elapsed time that the executor spent to run the task |
| Shuffle Client Direct Memory [1] | Byte | Average | Direct memory that is used to shuffle data |
| Shuffle Heap Memory [1] | Byte | Average | Heap memory that is used to shuffle data |
| Jvm CPU (Ratio With Executor Runtime) [1] | Percent | Average | The ratio of elapsed time that the JVM spent in garbage collection while executing the task to the elapsed time that the executor spent to run the task |
| Unity Catalog metrics | | | |
| Catalogs | Count | Average | Total number of catalogs |
| Schemas | Count | Average | Total number of schemas |
| Tables | Count | Average | Total number of tables |
| Views | Count | Average | Total number of views |
| Volumes | Count | Average | Total number of volumes |
| Tables created by Type | Count | Average | Number of tables that are created for a specific type of table |
| Volumes created by Type | Count | Average | Number of volumes that are created for a specific type of volume |
| Metrics per catalog | | | |
| Schemas | Count | Average | Total number of schemas for a specific catalog |
| Tables | Count | Average | Total number of tables for a specific catalog |
| Views | Count | Average | Total number of views for a specific catalog |
| Volumes | Count | Average | Total number of volumes for a specific catalog |
| ML Models | Count | Average | Total number of machine learning models for a specific catalog |
| Functions | Count | Average | Total number of functions for a specific catalog |
| Number of access to the asset [1][2] | Count | Average | Number of accesses to a specific asset in the last 24 hours |
| Number of unauthorized access [1][2] | Count | Average | Number of unauthorized accesses to a specific action in the last 24 hours |

  1. These metrics are retrieved from Log Analytics. If Log Analytics is not configured, you cannot view them in the Instana UI.

  2. The metric is available when you turn on diagnostic logs and configure Send to Log Analytics in your Azure Databricks workspace.
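
The executor metrics that are expressed as a ratio with executor runtime divide one elapsed time by the time that the executor spent to run the task, and report the result as a percentage. The following worked example in Python shows the arithmetic with made-up timings. The helper name ratio_percent is hypothetical, and returning 0.0 for a zero runtime is an assumption of this sketch, not documented sensor behavior.

```python
def ratio_percent(part_ms, executor_runtime_ms):
    """Express part_ms as a percentage of executor_runtime_ms."""
    if executor_runtime_ms <= 0:
        # Assumption: avoid dividing by zero when no runtime was recorded.
        return 0.0
    return 100.0 * part_ms / executor_runtime_ms


# Made-up timings for one task, in milliseconds.
executor_runtime_ms = 2000
deserialize_ms = 50
cpu_ms = 1500

print(ratio_percent(deserialize_ms, executor_runtime_ms))  # 2.5
print(ratio_percent(cpu_ms, executor_runtime_ms))          # 75.0
```

With these timings, deserialization accounts for 2.5 percent of the executor runtime and CPU time accounts for 75 percent.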