Orchestration metrics

You can use the orchestration metrics that are produced by intent operations to troubleshoot or monitor the performance of your system.

The following types of metrics are produced by the IBM® Cloud Pak for Network Automation microservices:
Counter
A counter is a cumulative metric that can increase in value or be reset to zero. For example, you can use counters to represent the number of requests or the number of errors in an application. The counter type is not used for metrics that can decrease in value.
Gauge
A gauge is a metric that can increase or decrease in value. For example, you can use gauges to represent a measured value like temperature, current memory usage, or number of concurrent requests.
Timer
A timer is a more complex metric that represents a count, a maximum, and a summary measurement over a specific time duration. For example, you can use timers to measure latencies or frequency of events. Timers are not produced by Site Planner.
The base names for the metric timers are listed in the following tables. However, you can see variations of these metric names in the Prometheus output. For example, for the assembly_fetch timer, the following metrics are shown in the output:
  • assembly_fetch_seconds_max
  • assembly_fetch_seconds_count
  • assembly_fetch_seconds_sum
Histogram (Site Planner only)
A histogram is a metric that can be used to track the distribution of a set of recorded values such as request latency. A histogram consists of a count, a sum, and a set of buckets that are configured with high and low boundaries.
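
As a rough illustration of how a histogram can be read, the following sketch estimates a quantile from bucket counts. It assumes that the buckets are exposed in the usual Prometheus form, where each bucket carries an upper boundary and a cumulative count. The function name and the sample bucket values are hypothetical and are not taken from Site Planner output.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q (0..1) from (upper_bound, cumulative_count) buckets.

    Values are assumed to be spread evenly inside each bucket, so the
    result is an approximation, not an exact measurement.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")

    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # nothing was recorded

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cumulative in ordered:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # No finite upper edge (or an empty bucket) to interpolate toward
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Hypothetical request latency buckets, in seconds
example_buckets = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
median = estimate_quantile(0.5, example_buckets)
```

With these made-up buckets, half of the 40 recorded requests fall at or below the midpoint of the 0.1 to 0.5 second bucket, so the estimated median is about 0.3 seconds.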

The orchestration metrics in the following tables are available in IBM Cloud Pak for Network Automation.

Note: A Yes value in the Supports tracing tags? column indicates that the metric can contain tracing tags, which can be used to categorize the metric.

View the metrics that are produced by the following microservices:

Ishtar metrics

Metric | Type | Supports tracing tags? | Description
assemblies_topn_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of N assemblies most recently undergoing a transition.
assembly_delete | Timer | Yes | The time that is taken for requests to delete the topology of an assembly.
assembly_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of an assembly.
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice.
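
For timers such as assembly_fetch, the average duration per request can be derived from the sum and count series. The following sketch builds the series names from a base name, using the suffixes shown in the assembly_fetch example, and computes a mean. The function names and the sample numbers are hypothetical.

```python
from typing import List, Optional

# Suffixes as shown for the assembly_fetch timer in the Prometheus output
TIMER_SUFFIXES = ("seconds_max", "seconds_count", "seconds_sum")


def timer_series_names(base_name: str) -> List[str]:
    """Return the series names that are expected for a timer base name."""
    return [f"{base_name}_{suffix}" for suffix in TIMER_SUFFIXES]


def mean_duration_seconds(total_seconds: float, count: float) -> Optional[float]:
    """Average duration per recorded event, or None if nothing was recorded."""
    if count <= 0:
        return None
    return total_seconds / count


names = timer_series_names("assembly_fetch")
# Hypothetical readings: 12 seconds in total across 48 fetch requests
average = mean_duration_seconds(12.0, 48)
```

With these made-up readings, the average works out to 0.25 seconds per request. Because the sum and count are cumulative, an average over a recent window would instead use the difference between two readings of each series.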

Daytona metrics

Metric | Type | Supports tracing tags? | Description
process_generation | Timer | Yes | The duration of time taken for the generation of a process for an assembly transition.
resource_transition_duration | Timer | Yes | The duration of time for the Brent resource manager to complete a resource transition that is part of a process.
tasks_completion_time | Timer | Yes | For tasks that are not related to resource transitions, the duration of time that is taken to complete a task in a process.
transitions_completion_time | Timer | Yes | The duration of assembly processes.
intents | Counter | Yes | The number of intents to reach each stage of a process, including the retry accepted, rollback accepted, and rollback successful stages. Includes tags that show the state of the intent and the name of the descriptor.
intents_failed | Counter | Yes | The number of intent requests that are not accepted because of failure or rejection, including Cancel, Retry, and Rollback requests. An amendment tag identifies when the metric is related to a Cancel, Retry, or Rollback request. This metric does not include intents that fail during processing. See the intents metric for that failure data.
tasks | Counter | Yes | The number of process tasks at each stage of processing. Tagged with the state and the type of the task.
transitions | Counter | Yes | The number of transitions to reach an unrecoverable error state.
activeCalls | Gauge | No | The number of requests that are made from the intent engine to other internal services such as Brent and Galileo. This metric does not apply to external systems.
activeProcesses | Gauge | No | The number of processes that are currently being handled by the intent engine.
activeTaskResponses | Gauge | No | The number of task responses that are currently being handled by the intent engine.
activeTasks | Gauge | No | The number of process tasks that are currently being handled by the intent engine.
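
Because counters such as intents_failed are cumulative and can be reset to zero, the useful figure is often the increase between samples rather than the raw value. The following is a minimal sketch of that calculation. The function name and the sample values are hypothetical, and it assumes that any drop in value means the counter was reset.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across time-ordered counter samples, tolerating resets."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The value dropped, so assume the counter restarted from zero
            increase += current
    return increase


# Hypothetical intents_failed readings; the drop from 14 to 2 is a reset
readings = [10, 14, 2, 5]
failed_since_first_sample = counter_increase(readings)
```

For these readings the increase is 9: four failures before the reset, two counted after it, and three more after that.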

Talladega metrics

Metric | Type | Supports tracing tags? | Description
process_create | Timer | Yes | The time that is taken for requests to create a process.
process_delete | Timer | Yes | The time that is taken for requests to delete a process.
process_fetch | Timer | Yes | The time that is taken for requests to fetch the details of a process by a unique identifier.
process_query | Timer | Yes | The time that is taken for requests to search for one or more processes.
process_summaries_fetch | Timer | Yes | The time that is taken for requests to fetch a summary of all processes for one or more assemblies.
process_update | Timer | Yes | The time that is taken for requests to update a process.
process_update_shallow | Timer | Yes | The time that is taken for requests to update the metadata of a process.
task_operations_fetch | Timer | Yes | The time that is taken for requests to fetch details for all operations for a specific resource.
tasks_fetch | Timer | Yes | The time that is taken for requests to fetch the tasks of a process.
tasks_update | Timer | Yes | The time that is taken for requests to update the tasks of a process.
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice.

Galileo metrics

Metric | Type | Supports tracing tags? | Description
assemblies_topn_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of N assemblies most recently undergoing a transition.
assembly_component_fetch | Timer | Yes | The time that is taken for requests to fetch the details of a component within an assembly.
assembly_component_update | Timer | Yes | The time that is taken for requests to update a component within an assembly.
assembly_create | Timer | Yes | The time that is taken for requests to persist the topology of an assembly.
assembly_delete | Timer | Yes | The time that is taken for requests to delete the topology of an assembly.
assembly_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of an assembly.
assembly_referencing_fetch | Timer | Yes | The time that is taken for requests to fetch details of assemblies that reference a specific assembly.
assembly_update | Timer | Yes | The time that is taken for requests to persist updates to the topology of an assembly.
meter_counter_galileo_createDeploymentLocation | Counter | No | The number of successful requests to persist details of a deployment location.
meter_counter_galileo_createResourceManager | Counter | No | The number of successful requests to persist details of a resource manager.
meter_counter_galileo_deleteDeploymentLocation | Counter | No | The number of successful requests to delete a deployment location.
meter_counter_galileo_deleteResourceManager | Counter | No | The number of successful requests to delete a resource manager.
meter_counter_galileo_getDeploymentLocation | Counter | No | The number of successful requests to fetch the details of a deployment location.
meter_counter_galileo_getDeploymentLocations | Counter | No | The number of successful requests to fetch the details of all deployment locations.
meter_counter_galileo_getResourceManager | Counter | No | The number of successful requests to fetch details of a resource manager.
meter_counter_galileo_getResourceManagers | Counter | No | The number of successful requests to fetch details of all resource managers.
meter_counter_galileo_updateDeploymentLocation | Counter | No | The number of successful requests to persist updates to the details of a deployment location.
meter_counter_galileo_updateResourceManager | Counter | No | The number of successful requests to persist updates to the details of a resource manager.
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice.

Brent metrics

Metric | Type | Supports tracing tags? | Description
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice.

PostgreSQL metrics

Metric | Type | Supports tracing tags? | Description
cnp_assembly_instance_toplevel_count | Gauge | No | The number of higher-level assemblies in your orchestration instance.
cnp_assembly_instance_count | Gauge | No | The number of assembly instances.
cnp_resource_instance_count | Gauge | No | The number of resource instances.

For more information, including how to define custom PostgreSQL metrics, see PostgreSQL metrics.

Site Planner metrics

The following table lists the Site Planner metrics that support tracing tags. These metrics can contain tracing tags only if the metrics are produced by build and tear down requests.

Metric | Type | Supports tracing tags? | Description
django_http_responses_total_by_status_view_method_total | Counter | Yes | The number of responses that are returned, labeled by user interface or REST API endpoint, status, and HTTP method.
django_http_requests_total_by_view_transport_method_total | Counter | Yes | The number of requests that are made, labeled by user interface or REST API endpoint and HTTP method.
django_http_requests_latency_seconds_by_view_method | Histogram | Yes | A histogram of request processing time, labeled by user interface or REST API endpoint.

The following table lists other useful Site Planner metrics. You can view a full list of the metrics at the /metrics endpoint URL for Site Planner. For example, the URL might be similar to this: https://lifecycle-manager.apps.cp4na.cp.fyre.ibm.com/site-planner/ui/metrics.

Metric | Type | Supports tracing tags? | Description
django_model_inserts_total | Counter | No | The number of insert operations by model.
django_model_updates_total | Counter | No | The number of update operations by model.
django_model_deletes_total | Counter | No | The number of delete operations by model.
django_db_query_duration_seconds | Histogram | No | A histogram of database query duration.
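
To work with the text that the /metrics endpoint returns, you can parse it into metric names, labels, and values. The following is a minimal sketch that handles only simple sample lines in the Prometheus text format. It assumes that the text was already retrieved from the endpoint. The sample output, including the model label and its values, is made up for illustration.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Matches lines such as: metric_name{label="value"} 1.0
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> List[Sample]:
    """Parse metric text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical excerpt of /metrics output; label names and values are assumed
SAMPLE_OUTPUT = """\
# TYPE django_model_inserts_total counter
django_model_inserts_total{model="site"} 3.0
django_model_inserts_total{model="device"} 7.0
django_model_updates_total{model="site"} 1.0
"""

parsed = parse_samples(SAMPLE_OUTPUT)
total_inserts = sum(
    value for name, _, value in parsed if name == "django_model_inserts_total"
)
```

The same label dictionary can be used to filter metrics by tracing tags where a metric supports them. For anything beyond a quick check, a maintained Prometheus client library is likely a better choice than a hand-written parser.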

For more information about the Prometheus metric types that are used by Site Planner, see Prometheus Metric Types.