Monitoring Azure Machine Learning

Azure Machine Learning is a cloud service for building, deploying, and managing machine learning models. Instana monitors the Azure Machine Learning service through the Azure Machine Learning sensor, which provides end-to-end visibility into your environment.

After you install the Instana host agent, the Azure Machine Learning sensor is automatically installed and enabled. You can view infrastructure metrics that are related to Azure Machine Learning in the Instana UI. For more information about other supported Azure services, see Azure documentation.

Configuring the Azure Machine Learning sensor

To configure the Azure Machine Learning sensor, complete the following steps:

  1. Enable the Azure subscription in Instana. Update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:

    com.instana.plugin.azure:
      enabled: true
      subscription: "[Your-Subscription-Id]"
      tenant: "[Your-Tenant-Id]"
      principals:
        - id: "[Your-Service-Principal-Account-Id]"
          secret: "[Your-Service-Principal-Secret]"
    

    For more information about installing the Azure agent, see Installation.

  2. Verify that the Azure Machine Learning sensor is enabled in the agent configuration file. You can also filter the monitored services by tags and resource groups, as described in the "Filtering services by defining tags and resource groups" section.

    com.instana.plugin.azure.machinelearning:
      enabled: true # Valid values: true, false. Enabled (true) by default
      include_tags: # Comma separated list of tags in key:value format (e.g. env:prod,env:staging)
      exclude_tags: # Comma separated list of tags in key:value format (e.g. env:dev,env:test)
      include_resource_groups: # Comma separated list of resource groups (e.g. rg_prod,rg_staging)
      exclude_resource_groups: # Comma separated list of resource groups (e.g. rg_dev,rg_test)
    
    

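Taken together, the two steps produce a configuration.yaml file along the following lines. This is a sketch: the bracketed values are the same placeholders as in step 1, and the filter keys are simply left out, which, per the filtering section, means that no filtering is applied.

```yaml
# <agentinstall_dir>/etc/instana/configuration.yaml
# Both sections are top-level keys in the same file.

# Step 1: connect the agent to the Azure subscription (placeholder values).
com.instana.plugin.azure:
  enabled: true
  subscription: "[Your-Subscription-Id]"
  tenant: "[Your-Tenant-Id]"
  principals:
    - id: "[Your-Service-Principal-Account-Id]"
      secret: "[Your-Service-Principal-Secret]"

# Step 2: the Azure Machine Learning sensor is enabled. No filter keys are
# defined, so the services are monitored without tag or resource group filtering.
com.instana.plugin.azure.machinelearning:
  enabled: true
```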
Disabling the Azure Machine Learning sensor

To disable the Azure Machine Learning sensor, update the <agentinstall_dir>/etc/instana/configuration.yaml agent configuration file as shown in the following example:

com.instana.plugin.azure.machinelearning:
  enabled: false

Filtering services by defining tags and resource groups

You can filter the monitored services by defining tags and resource groups in the configuration.yaml file. Define each tag as a key-value pair separated by a colon (:), and separate multiple tags or resource groups with commas. If a tag or resource group appears in both the include list and the exclude list, the exclude list takes precedence. To monitor all services without filtering, don't define any filter settings.

  • To set tags for the include list, update the configuration.yaml file as shown in the following example:

    com.instana.plugin.azure.machinelearning:
      include_tags: # Comma separated list of tags in key:value format (e.g. env:prod,env:staging)
    
  • To set tags for the exclude list, update the configuration.yaml file as shown in the following example:

    com.instana.plugin.azure.machinelearning:
      exclude_tags: # Comma separated list of tags in key:value format (e.g. env:dev,env:test)
    
  • To set resource groups for the include list, update the configuration.yaml file as shown in the following example:

    com.instana.plugin.azure.machinelearning:
      include_resource_groups: # Comma separated list of resource groups (e.g. rg_prod,rg_staging)
    
  • To set resource groups for the exclude list, update the configuration.yaml file as shown in the following example:

    com.instana.plugin.azure.machinelearning:
      exclude_resource_groups: # Comma separated list of resource groups (e.g. rg_dev,rg_test)
    

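The precedence rule can be illustrated with a filled-in fragment that reuses the example tag values from the comments. It is a sketch only: quoting the comma-separated lists as YAML strings is a choice made here for safety, and the example does not assume whether a workspace must carry one or all of the listed tags.

```yaml
com.instana.plugin.azure.machinelearning:
  # Include list: two tags, each written as key:value, separated by a comma.
  include_tags: "env:prod,env:staging"
  # env:staging appears in both lists. Because the exclude list has the
  # higher priority, env:staging is treated as excluded and only env:prod
  # remains effective in the include list.
  exclude_tags: "env:staging"
```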
When you set filters for the Azure Machine Learning service, these filters take precedence over the common filter that applies to all Azure services. For more information, see Configuration.
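As a sketch of how a service-specific filter relates to the common filter, consider both levels restricting resource groups. The key name under com.instana.plugin.azure is an assumption modeled on the service-level keys, and rg_ml is a hypothetical resource group name; check Configuration for the settings that the common Azure filter actually supports.

```yaml
# Common filter for all Azure services. Assumption: the same key name is
# accepted here; in a real file, merge it with the step 1 subscription settings.
com.instana.plugin.azure:
  include_resource_groups: "rg_prod"

# Service-specific filter. For Azure Machine Learning, this value takes
# precedence over the common filter; rg_ml is a hypothetical name.
com.instana.plugin.azure.machinelearning:
  include_resource_groups: "rg_prod,rg_ml"
```

Under the stated precedence rule, Azure Machine Learning follows its own include_resource_groups setting, while other Azure services continue to use the common one.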

Viewing metrics

To view the metrics, complete the following steps:

  1. From the navigation menu in the Instana UI, click Infrastructure.
  2. Click a monitored host.

The host dashboard shows all the collected metrics and monitored processes.

Metrics are pulled every minute, which is the resolution that Azure provides for monitoring these services.

Configuration data for the Machine Learning workspace

Configuration data | Description
------------------ | -----------
Name               | Name of the Azure Machine Learning workspace
Resource Group     | Name of the resource group in which the workspace is located
Location           | Azure region (location) of the workspace resource
Subscription Id    | Azure subscription identifier
createdAt          | Timestamp of the workspace creation (UTC)

Performance metrics for the Machine Learning workspace

Category              | Metric name              | Unit  | Aggregation | Description
--------------------- | ------------------------ | ----- | ----------- | -----------
Processor utilization | CpuUtilizationPercentage | Count | Average     | Utilization percentage of a CPU node, averaged over one minute
Processor utilization | GpuUtilizationPercentage | Count | Average     | Utilization percentage of a GPU node, averaged over one minute
Nodes                 | Active Nodes             | Count | Average     | Number of nodes that are actively running a job, averaged over one minute
Nodes                 | Total Nodes              | Count | Average     | Sum of Active Nodes, Idle Nodes, Unusable Nodes, Preempted Nodes, and Leaving Nodes, averaged over one minute
Cores                 | Active Cores             | Count | Average     | Number of active cores, averaged over one minute
Cores                 | Total Cores              | Count | Average     | Number of total cores, averaged over one minute
Runs                  | Started Runs             | Count | Total       | Total number of runs running for this workspace in one minute
Runs                  | Completed Runs           | Count | Total       | Total number of runs that completed successfully for this workspace in one minute
Runs                  | Cancelled Runs           | Count | Total       | Total number of runs cancelled for this workspace in one minute
Runs                  | Errors                   | Count | Total       | Total number of run errors in this workspace in one minute
Runs                  | Failed Runs              | Count | Total       | Total number of runs that failed for this workspace in one minute
Disk                  | DiskUsedMegabytes        | Count | Average     | Average disk space used, in megabytes, over one minute
Disk                  | DiskReadMegabytes        | Count | Average     | Average amount of data read from disk, in megabytes, over one minute
Disk                  | DiskWriteMegabytes       | Count | Average     | Average amount of data written to disk, in megabytes, over one minute