Monitoring Azure Databricks
Instana provides end-to-end visibility of your environment and supports monitoring of Azure Databricks. After you install the Instana host agent, the Azure Databricks sensor is automatically installed and enabled. You can view infrastructure metrics that are related to Azure Databricks in the Instana UI.
For more information about other supported Azure services, see Monitoring and Instrumenting Microsoft® Azure with Azure agent.
Supported information
Instana supports metrics and configuration data for all supported Azure Databricks Runtime releases.
Configuring the Azure Databricks sensor
To monitor Azure Databricks, first enable the Azure sensor in the agent configuration.yaml file, as shown in the following example:
com.instana.plugin.azure:
  enabled: true
  subscription: "[Your-Subscription-Id]"
  tenant: "[Your-Tenant-Id]"
  principals:
    - id: "[Your-Service-Principal-Account-Id]"
      secret: "[Your-Service-Principal-Secret]"
For more information, see Installation of the Azure agent.
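
Before you restart the agent, it can help to confirm that the service principal values are valid. The following minimal sketch requests a token from the Microsoft identity platform by using the OAuth 2.0 client credentials flow. It assumes that the principal `id` is the application (client) ID and that a management-plane token is a sufficient check. The function names are hypothetical, and the sketch is not part of Instana.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Microsoft identity platform v2.0 token endpoint (client credentials flow).
TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_token_request(tenant: str, client_id: str, secret: str) -> urllib.request.Request:
    """Build, but do not send, a client-credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # assumption: the `id` under `principals`
        "client_secret": secret,         # the `secret` under `principals`
        "scope": "https://management.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL.format(tenant=tenant), data=body, method="POST")


def principal_can_authenticate(tenant: str, client_id: str, secret: str) -> bool:
    """Return True if Azure issues an access token for the service principal."""
    request = build_token_request(tenant, client_id, secret)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return "access_token" in json.load(response)
    except urllib.error.URLError:
        # Covers HTTP errors (for example, invalid secret) and network failures.
        return False
```

For example, calling `principal_can_authenticate("[Your-Tenant-Id]", "[Your-Service-Principal-Account-Id]", "[Your-Service-Principal-Secret]")` with real values returns `False` if Azure rejects the credentials.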
To configure the Azure Databricks sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:
com.instana.plugin.azure.databricks:
  enabled: true # Enabled (true) by default. Valid values: true, false
  unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for all workspaces. The default poll rate is 60 minutes.
  workspaces:
    workspaceName1: '[Your-Azure-Databricks-Workspace-Name]'
      databricks_workspace_access_token: '[Your-Databricks-Workspace-Access-Token]' # Required field.
      log_analytics_workspace_id: '[Your-Log-Analytics-Workspace-Id]' # Optional field.
      unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for this workspace. The default poll rate is 60 minutes.
    workspaceName2: '[Your-Azure-Databricks-Workspace-Name]'
      databricks_workspace_access_token: '[Your-Databricks-Workspace-Access-Token]' # Required field.
      log_analytics_workspace_id: '[Your-Log-Analytics-Workspace-Id]' # Optional field.
      unity_catalog_pollRate: 60 # Optional field. Poll rate, in minutes, of Unity Catalog related data for this workspace. The default poll rate is 60 minutes.
- Generate an access token for each Azure Databricks workspace that you want to monitor. For more information about how to generate the access token, see Databricks personal access token authentication.
- The Log Analytics workspace ID (log_analytics_workspace_id) is an optional field. If you specify it, Instana can retrieve more metrics.
- You need to configure the Azure Databricks cluster to use the Log Analytics service to collect more metrics. By configuring the Azure Databricks cluster, you can use the monitoring library to stream Apache Spark level events and Spark Structured Streaming metrics from your jobs to Azure Monitor. For more information, see Send Azure Databricks application logs to Azure Monitor.
- You can view metrics that are related to Unity Catalog if you enable your workspace for Unity Catalog. For more information, see Enable a workspace for Unity Catalog.
  Some of the Unity Catalog metrics are collected from diagnostic logs. To view these metrics, turn on diagnostic logs for the Databricks Unity Catalog category, configure Send to Log Analytics in the Azure portal for the Azure Databricks workspace, and then configure the log_analytics_workspace_id for your workspace in the agent configuration.yaml file. For more information, see Configure diagnostic log delivery.
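
Before you add a workspace to the agent configuration, you might want to confirm that the access token works and that Unity Catalog is reachable. The following minimal sketch calls two public Databricks REST API endpoints with the token as a bearer credential. The workspace URL parameter and the function names are hypothetical. Whether the Instana sensor uses these same endpoints is an assumption, not something that this documentation states.

```python
import urllib.error
import urllib.request

# Public Databricks REST API paths used here only as reachability checks.
CHECKS = {
    "clusters": "/api/2.0/clusters/list",
    "unity_catalog": "/api/2.1/unity-catalog/catalogs",
}


def build_databricks_request(workspace_url: str, token: str, path: str) -> urllib.request.Request:
    """Build, but do not send, an authenticated GET request for a workspace."""
    url = workspace_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_workspace(workspace_url: str, token: str) -> dict:
    """Return a mapping of check name to True (HTTP 200) or False."""
    results = {}
    for name, path in CHECKS.items():
        request = build_databricks_request(workspace_url, token, path)
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                results[name] = response.status == 200
        except urllib.error.URLError:
            # Covers HTTP errors (for example, 401 or 403) and network failures.
            results[name] = False
    return results
```

A `False` value for `unity_catalog` with a `True` value for `clusters` suggests that the token is valid but that the workspace is not enabled for Unity Catalog or that the token owner lacks access to it.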
Disabling the Azure Databricks sensor
To disable the Azure Databricks sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:
com.instana.plugin.azure.databricks:
  enabled: false
Viewing metrics
To view the metrics, complete the following steps:
- From the navigation menu in the Instana UI, select Infrastructure.
- Click the specific Azure Databricks block, which is grouped by the Location, in the Azure Databricks workspace.
You can view all the collected metrics on the Azure Databricks dashboard.
Metrics are pulled every minute, which is the resolution that Azure provides for monitoring these services.
Configuration data
Workspace Details | Description |
---|---|
Name | Name of the workspace |
Resource Group | Resource group of the workspace |
Location | Location of the workspace |
Subscription ID | Subscription ID of the workspace |
Type | Type of the Databricks workspace |
Cluster Name | Name of the cluster |
Cluster Id | ID of the Cluster |
Spark Version | Version of Spark that is embedded in the cluster |
Cluster Source | Source type of the cluster |
Cluster Core | CPU core number of the cluster |
Executor Name | Name of the executor |
Unity Catalog | |
Metastore | Name of the metastore that is assigned to the workspace |
Catalog Name | Name of the catalog |
Kind | Catalog securable kind |
Asset Name | Name of the data asset |
Type | Type of the data asset: Table , Volume , Function , or Model |
Action Name | Name of the action in the diagnostic log |
Error Message | Error message in the response of the action |
Performance metrics
Metric | Unit | Aggregation | Description |
---|---|---|---|
Workspace metrics | |||
Executors | Count | Average | Total number of executors in the workspace |
Jobs Running | Count | Average | Total number of jobs that are running in the workspace |
Memory | Megabytes | Average | Sum of the total memory that is used in the workspace |
Cluster metrics | |||
Execution Count | Count | Average | Total number of executors on the cluster |
Job Count | Count | Average | Total number of running jobs on the cluster |
Memory | Megabytes | Average | Memory that is used by the cluster |
Execution Duration[1] | Second | Average | Streaming metric for process duration of the cluster |
Throughput Rows Per Second[1] | Rows per second | Average | Streaming throughput metric that indicates input rows per second in the cluster |
Sum Shuffle Bytes Per Cluster[1] | Byte | Average | Sum of the total shuffle read/write bytes in the cluster |
Shuffle Bytes Written[1] | Byte | Average | Number of bytes that are written in shuffle operations |
Executor metrics | |||
Deserialize Time (Ratio with Executor Runtime)[1] | Percent | Average | The ratio of elapsed time that is spent to deserialize the task to the elapsed time the executor spent to run the task |
Serialize Time (Ratio With Executor Runtime)[1] | Percent | Average | The ratio of elapsed time that is spent to serialize the task result to the elapsed time the executor spent to run the task |
Executor CPU (Ratio With Executor Runtime)[1] | Percent | Average | Ratio of the CPU time the executor spent to run the task to the elapsed time that the executor spent to run the task |
Shuffle Client Direct Memory[1] | Byte | Average | Direct memory that is used to shuffle data |
Shuffle Heap Memory[1] | Byte | Average | Heap memory that is used to shuffle data |
Jvm CPU (Ratio With Executor Runtime)[1] | Percent | Average | The ratio of elapsed time that the JVM spent in garbage collection while executing the task to the elapsed time that the executor spent to run the task |
Unity Catalog metrics | |||
Catalogs | Count | Average | Total number of catalogs |
Schemas | Count | Average | Total number of schemas |
Tables | Count | Average | Total number of tables |
Views | Count | Average | Total number of views |
Volumes | Count | Average | Total number of volumes |
Tables created by Type | Count | Average | Number of tables that are created for a specific type of table |
Volumes created by Type | Count | Average | Number of volumes that are created for a specific type of volume |
Metrics per catalog | |||
Schemas | Count | Average | Total number of schemas for a specific catalog |
Tables | Count | Average | Total number of tables for a specific catalog |
Views | Count | Average | Total number of views for a specific catalog |
Volumes | Count | Average | Total number of volumes for a specific catalog |
ML Models | Count | Average | Total number of machine learning models for a specific catalog |
Functions | Count | Average | Total number of functions for a specific catalog |
Number of access to the asset[1][2] | Count | Average | Number of accesses to a specific asset in the last 24 hours |
Number of unauthorized access[1][2] | Count | Average | Number of unauthorized accesses to a specific action in the last 24 hours |
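
The executor metrics that are described as a ratio with executor runtime can be read as a share of the task run time, expressed as a percentage. The following minimal sketch illustrates that arithmetic with made-up task timings. The field names and sample values are hypothetical, and the exact calculation that Instana performs is not stated in this documentation.

```python
def ratio_percent(part_ms: float, executor_run_time_ms: float) -> float:
    """Return part_ms as a percentage of the executor run time."""
    if executor_run_time_ms <= 0:
        # Avoid division by zero for tasks that report no run time.
        return 0.0
    return 100.0 * part_ms / executor_run_time_ms


# Hypothetical timings for a single task, in milliseconds.
task = {
    "executor_run_time_ms": 2000,
    "executor_cpu_time_ms": 1500,
    "deserialize_time_ms": 100,
    "serialize_time_ms": 20,
    "jvm_gc_time_ms": 200,
}

run_time = task["executor_run_time_ms"]
ratios = {
    "executor_cpu": ratio_percent(task["executor_cpu_time_ms"], run_time),
    "deserialize": ratio_percent(task["deserialize_time_ms"], run_time),
    "serialize": ratio_percent(task["serialize_time_ms"], run_time),
    "jvm_gc": ratio_percent(task["jvm_gc_time_ms"], run_time),
}
```

With these sample timings, the executor spends 75 percent of the run time on CPU work and 10 percent in garbage collection. A high garbage collection share relative to the CPU share usually points to memory pressure on the executor.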