Monitoring Azure Machine Learning
Azure Machine Learning is a cloud service for building, deploying, and managing machine learning models. Instana monitors the service through the Azure Machine Learning sensor, which provides end-to-end visibility into your environment.
After you install the Instana host agent, the Azure Machine Learning sensor is automatically installed and enabled. You can view infrastructure metrics that are related to Azure Machine Learning in the Instana UI. For more information about other supported Azure services, see Azure documentation.
Configuring the Azure Machine Learning sensor
To configure the Azure Machine Learning sensor, complete the following steps:
- Enable the Azure subscription on Instana. Update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:
  com.instana.plugin.azure:
    enabled: true
    subscription: "[Your-Subscription-Id]"
    tenant: "[Your-Tenant-Id]"
    principals:
      - id: "[Your-Service-Principal-Account-Id]"
        secret: "[Your-Service-Principal-Secret]"
For more information about installing the Azure agent, see Installation.
- Check whether the Azure Machine Learning sensor is enabled in the agent configuration file. You can also configure tags and resource groups as described in the Filtering services by defining tags and resource groups section.
  com.instana.plugin.azure.machinelearning:
    enabled: true # Valid values: true, false. Enabled (true) by default
    include_tags: # Comma separated list of tags in key:value format (e.g. env:prod,env:staging)
    exclude_tags: # Comma separated list of tags in key:value format (e.g. env:dev,env:test)
    include_resource_groups: # Comma separated list of resource groups (e.g. rg_prod,rg_staging)
    exclude_resource_groups: # Comma separated list of resource groups (e.g. rg_dev,rg_test)
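As an illustration of the filtering options, the following sketch sets one include filter and one exclude filter. The tag and resource group names (env:prod, rg_dev, rg_test) are placeholder values borrowed from the comments in the previous example, not names from a real environment. How include and exclude filters are matched and combined is covered in the filtering section, so treat this only as an example of the expected value format.

```yaml
com.instana.plugin.azure.machinelearning:
  enabled: true
  # Hypothetical tag in key:value format; replace with a tag used in your subscription
  include_tags: env:prod
  # Hypothetical resource group names, comma separated
  exclude_resource_groups: rg_dev,rg_test
```

Filters that you do not need can be omitted from the configuration.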
Disabling the Azure Machine Learning sensor
To disable the Azure Machine Learning sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:
com.instana.plugin.azure.machinelearning:
  enabled: false
Viewing metrics
To view the metrics, complete the following steps:
- From the navigation menu in the Instana UI, click Infrastructure.
- Click a monitored host.
You can see a host dashboard with all the collected metrics and monitored processes.
Metrics are pulled every minute, which is the resolution that Azure provides for monitoring these services.
Configuration data machine learning workspace
Namespace details | Description |
---|---|
Name | Name of the Azure Machine Learning workspace |
Resource Group | Name of the resource group in which the workspace is located |
Location | The location of the resource |
Subscription Id | Azure subscription identifier |
createdAt | The timestamp of resource creation (UTC) |
Performance metrics machine learning workspace
Metric | Name | Unit | Aggregation | Description |
---|---|---|---|---|
Processor Utilization | ||||
Count | CpuUtilizationPercentage | Count | Average | The utilization percentage of a CPU node averaged in a minute |
Count | GpuUtilizationPercentage | Count | Average | The utilization percentage of a GPU node averaged in a minute |
Nodes | ||||
Count | Active Nodes | Count | Average | The number of nodes that are actively running a job, averaged over a minute |
Count | Total Nodes | Count | Average | The sum of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes, averaged over a one-minute interval |
Cores | ||||
Count | Active Cores | Count | Average | Number of active cores averaged in a minute |
Count | Total Cores | Count | Average | Number of total cores averaged in a minute |
Runs | ||||
Count | Started Runs | Count | Total | The total number of runs running for this workspace in a minute |
Count | Completed Runs | Count | Total | The total number of runs completed successfully for this workspace in a minute |
Count | Cancelled Runs | Count | Total | The total number of runs cancelled for this workspace in a minute |
Count | Errors | Count | Total | The total number of run errors in this workspace within a minute |
Count | Failed Runs | Count | Total | The total number of runs failed for this workspace in a minute |
Disk | ||||
Count | DiskUsedMegabytes | Count | Average | The average disk space utilization in megabytes in a minute |
Count | DiskReadMegabytes | Count | Average | The average of data read from disk in megabytes in a minute |
Count | DiskWriteMegabytes | Count | Average | The average of data written into disk in megabytes in a minute |