Monitoring Azure Data Factory
Instana offers comprehensive monitoring of your Azure Data Factory, providing end-to-end visibility into your environment. After you install the Instana host agent, the Azure Data Factory sensor is automatically deployed and installed. You can view infrastructure metrics that are related to the Azure Data Factory in the Instana UI.
For more information about other supported Azure services, see the Azure documentation.
Supported versions
Instana supports Azure Data Factory version 2.
Configuring the Azure Data Factory sensor
To monitor your Azure Data Factory, you need to first enable the Azure sensor in the agent <agentinstall_dir>/etc/instana/configuration.yaml file as follows. For more information, see Installation.
com.instana.plugin.azure:
  enabled: true
  subscription: "[Your-Subscription-Id]"
  tenant: "[Your-Tenant-Id]"
  principals:
    - id: "[Your-Service-Principal-Account-Id]"
      secret: "[Your-Service-Principal-Secret]"
To configure the Azure Data Factory sensor through the agent configuration file <agentinstall_dir>/etc/instana/configuration.yaml, use the following configuration:
com.instana.plugin.azure.datafactory:
  enabled: false # Valid values: true, false. Enabled (true) by default
  include_tags: # Comma separated list of tags in key:value format (e.g. env:prod,env:staging)
  exclude_tags: # Comma separated list of tags in key:value format (e.g. env:dev,env:test)
  include_resource_groups: # Comma separated list of resource groups (e.g. rg_prod,rg_staging)
  exclude_resource_groups: # Comma separated list of resource groups (e.g. rg_dev,rg_test)
You can disable the Azure Data Factory sensor, or restrict which factories it monitors by filtering on tags and resource groups.
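As an illustration of the filter options, the following sketch keeps the sensor enabled and limits monitoring by tag and by resource group. The tag and resource group names are the placeholder examples from the comments in the preceding configuration, not names from a real environment, and quoting the values is an assumption made here to keep them plain strings.

```yaml
com.instana.plugin.azure.datafactory:
  enabled: true
  # Hypothetical values: replace with tags and resource groups from your own subscription
  include_tags: "env:prod,env:staging"
  exclude_resource_groups: "rg_dev,rg_test"
```

How the agent treats a factory that matches both an include filter and an exclude filter is not described in this section, so the sketch avoids overlapping filters.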
Disabling the sensor
To disable monitoring of the Azure Data Factory services, use the following configuration:
com.instana.plugin.azure.datafactory:
  enabled: false
Viewing metrics
To view the metrics, complete the following steps:
- In the sidebar of the Instana UI, select Infrastructure.
- Click a specific monitored host.
You can see a host dashboard with all the collected metrics and monitored processes.
Metrics are pulled every minute, which is the resolution that Azure provides for the monitoring of these services.
Configuration data
Factory details | Description |
---|---|
Name | Factory name |
Resource Group | Resource group of the factory |
Location | Factory location |
Subscription Id | Subscription ID of the factory |
Type | Type of the resource |
State | State of the factory |
Version | Version of the factory |
List of pipelines | List of all factory pipelines |
Performance metrics
Metric | Name | Unit | Aggregation | Description |
---|---|---|---|---|
Pipelines Succeeded | ||||
Count | pipelineSucceededRuns | Count | Total | The total number of pipeline runs that succeeded within a minute window. |
Percentage | pipelineSucceededRunsPercentage | Percent | Average | The percentage of succeeded pipeline runs within a minute window. |
Pipelines Failed | ||||
Count | pipelineFailedRuns | Count | Total | The total number of pipeline runs that failed within a minute window. |
Percentage | pipelineFailedRunsPercentage | Percent | Average | The percentage of failed pipeline runs within a minute window. |
Pipelines Canceled | ||||
Count | pipelineCancelledRuns | Count | Total | The total number of pipeline runs that were canceled within a minute window. |
Percentage | pipelineCancelledRunsPercentage | Percent | Average | The percentage of canceled pipeline runs within a minute window. |
Pipelines Total | ||||
Count | pipelineTotalRuns | Count | Total | The total number of pipeline runs calculated as a sum of succeeded, failed, and canceled runs within a minute window. |
Pipelines | ||||
Elapsed Time | pipelineElapsedTimeRuns | Count | Total | The number of times, within a minute window, that a pipeline runs longer than the user-defined expected duration. |
Activities Succeeded | ||||
Count | activitySucceededRuns | Count | Total | The total number of activity runs that succeeded within a minute window. |
Percentage | activitySucceededRunsPercentage | Percent | Average | The percentage of succeeded activity runs within a minute window. |
Activities Failed | ||||
Count | activityFailedRuns | Count | Total | The total number of activity runs that failed within a minute window. |
Percentage | activityFailedRunsPercentage | Percent | Average | The percentage of failed activity runs within a minute window. |
Activities Canceled | ||||
Count | activityCancelledRuns | Count | Total | The total number of activity runs that were canceled within a minute window. |
Percentage | activityCancelledRunsPercentage | Percent | Average | The percentage of canceled activity runs within a minute window. |
Activities Total | ||||
Count | activityTotalRuns | Count | Total | The total number of activity runs calculated as a sum of succeeded, failed, and canceled runs within a minute window. |
Triggers Succeeded | ||||
Count | triggerSucceededRuns | Count | Total | The total number of trigger runs that succeeded within a minute window. |
Percentage | triggerSucceededRunsPercentage | Percent | Average | The percentage of succeeded trigger runs within a minute window. |
Triggers Failed | ||||
Count | triggerFailedRuns | Count | Total | The total number of trigger runs that failed within a minute window. |
Percentage | triggerFailedRunsPercentage | Percent | Average | The percentage of failed trigger runs within a minute window. |
Triggers Canceled | ||||
Count | triggerCancelledRuns | Count | Total | The total number of trigger runs that were canceled within a minute window. |
Percentage | triggerCancelledRunsPercentage | Percent | Average | The percentage of canceled trigger runs within a minute window. |
Triggers Total | ||||
Count | triggerTotalRuns | Count | Total | The total number of trigger runs calculated as a sum of succeeded, failed, and canceled runs within a minute window. |
Runtime Memory | ||||
Available | integrationRuntimeAvailableMemory | Bytes | Total | The total number of bytes of memory available for the self-hosted integration runtime within a minute window. |
Runtime CPU | ||||
Percentage | integrationRuntimeCpuPercentage | Percent | Total | The percentage of CPU usage for the self-hosted integration runtime within a minute window. |
Runtime Queue | ||||
Length | integrationRuntimeQueueLength | Count | Total | The total queue length for the self-hosted integration runtime within a minute window. |
Airflow CPU | ||||
Percentage | airflowIntegrationRuntimeCpuPercentage | Percent | Average | The percentage of CPU usage for the Airflow integration runtime within a minute window. |
Airflow Memory | ||||
Percentage | airflowIntegrationRuntimeMemoryPercentage | Percent | Average | The percentage of memory available for the Airflow integration runtime within a minute window. |
Airflow Tasks | ||||
Running | airflowIntegrationRuntimeSchedulerTasksRunning | Count | Total | The total number of running scheduled tasks in the Airflow integration runtime executor within a minute window. |
Queued | airflowIntegrationRuntimeExecutorQueuedTasks | Count | Total | The total number of queued scheduled tasks in the Airflow integration runtime executor within a minute window. |
Airflow Operators | ||||
Succeeded | airflowIntegrationRuntimeOperatorSuccesses | Count | Total | The total number of Airflow integration runtime operator successes within a minute window. |
Failed | airflowIntegrationRuntimeOperatorFailures | Count | Total | The total number of Airflow integration runtime operator failures within a minute window. |
Airflow Triggers | ||||
Succeeded | airflowIntegrationRuntimeTriggersSucceeded | Count | Total | The total number of Airflow integration runtime triggers that succeeded within a minute window. |
Running | airflowIntegrationRuntimeTriggersRunning | Count | Total | The total number of Airflow integration runtime triggers that are running within a minute window. |
Failed | airflowIntegrationRuntimeTriggersFailed | Count | Total | The total number of Airflow integration runtime triggers that failed within a minute window. |
Airflow Jobs | ||||
Succeeded | airflowIntegrationRuntimeJobStart | Count | Total | The total number of Airflow integration runtime jobs that succeeded within a minute window. |
Running | airflowIntegrationRuntimeJobEnd | Count | Total | The total number of Airflow integration runtime jobs that are running within a minute window. |
Failed | airflowIntegrationRuntimeJobHeartbeatFailure | Count | Total | The total number of Airflow integration runtime job heartbeat failures within a minute window. |
Airflow DAG Processing | ||||
Last Duration | airflowIntegrationRuntimeDAGProcessingLastDuration | Milliseconds | Average | The average duration of the last DAG processing in the Airflow integration runtime within a minute window. |
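The Total and Percentage rows in the preceding table can be related with a short worked calculation. This is a sketch that assumes each percentage is taken relative to the total run count for the same one-minute window. The table does not state the denominator explicitly, and the run counts are made-up illustration values.

```latex
% Total runs, as described for pipelineTotalRuns, activityTotalRuns, and triggerTotalRuns
\[
  \text{total} = \text{succeeded} + \text{failed} + \text{canceled}
\]
% Assumption: the percentage is computed against the total for the same window
\[
  \text{succeeded (\%)} = 100 \times \frac{\text{succeeded}}{\text{total}}
\]
% Illustration values: 8 succeeded, 1 failed, 1 canceled in one minute
\[
  \text{total} = 8 + 1 + 1 = 10, \qquad
  \text{succeeded (\%)} = 100 \times \frac{8}{10} = 80
\]
```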