Orchestration metrics
You can use the orchestration metrics that are produced by intent operations to troubleshoot or monitor the performance of your system.
The following types of metrics are produced by the IBM® Cloud Pak for Network Automation microservices:
- Counter
  - A counter is a cumulative metric that can increase in value or be reset to zero. For example, you can use counters to represent the number of requests or the number of errors in an application. The counter type is not used for metrics that can decrease in value.
- Gauge
  - A gauge is a metric that can increase or decrease in value. For example, you can use gauges to represent a measured value like temperature, current memory usage, or number of concurrent requests.
- Timer
  - A timer is a more complex metric that represents a count, a maximum, and a summary measurement over a specific time duration. For example, you can use timers to measure latencies or frequency of events. Timers are not produced by Site Planner. The base names for the metric timers are listed in the following tables. However, you can see variations of these metric names in the Prometheus output. For example, for the assembly_fetch timer, the following metrics are shown in the output:
    - assembly_fetch_seconds_max
    - assembly_fetch_seconds_count
    - assembly_fetch_seconds_sum
- Histogram (Site Planner only)
  - A histogram is a metric that can be used to track the distribution of a set of recorded values such as request latency. A histogram consists of a count, a sum, and a set of buckets that are configured with high and low boundaries.
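The counter and timer behavior described above can be illustrated with a short Python sketch. It is not part of the product; it assumes that the metrics are read as Prometheus text exposition output, and the sample values and the helper names (`parse_samples`, `timer_mean`, `counter_increase`) are hypothetical. The sketch derives a mean duration from the `_seconds_sum` and `_seconds_count` variations of a timer, and computes the increase of a counter between two readings while allowing for a reset to zero.

```python
import re

# One sample line of the Prometheus text format: name, optional labels, value.
# Simplified: label values that contain "}" are not handled.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

# Hypothetical scrape output for the assembly_fetch timer.
EXAMPLE = """\
assembly_fetch_seconds_count 4.0
assembly_fetch_seconds_sum 2.0
assembly_fetch_seconds_max 0.9
"""

def parse_samples(text):
    """Return {metric_name: value}, adding up values across label sets.

    Adding is meaningful for _count and _sum samples only, which is all
    that timer_mean uses; it is not meaningful for _max samples.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if match:
            name, value = match.group(1), float(match.group(3))
            totals[name] = totals.get(name, 0.0) + value
    return totals

def timer_mean(samples, base_name):
    """Mean duration in seconds for a timer, or None if nothing was recorded."""
    count = samples.get(base_name + "_seconds_count", 0.0)
    total = samples.get(base_name + "_seconds_sum", 0.0)
    return total / count if count else None

def counter_increase(previous, current):
    """Increase between two counter readings.

    A counter never decreases, so a lower reading is treated as a reset
    to zero followed by an increase to the current value.
    """
    return current if current < previous else current - previous

if __name__ == "__main__":
    print(timer_mean(parse_samples(EXAMPLE), "assembly_fetch"))
    print(counter_increase(10.0, 15.0))
    print(counter_increase(15.0, 3.0))
```

The mean is an average over the whole lifetime of the timer. To look at a recent window instead, the same division can be applied to the differences between two scrapes of `_seconds_sum` and `_seconds_count`.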
The orchestration metrics in the following tables are available in IBM Cloud Pak for Network Automation.
Note: A Yes value in the Supports tracing tags? column indicates that the metric can contain tracing tags, which can be used to categorize the metric.
View the metrics that are produced by the following microservices:
Ishtar metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
assemblies_topn_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of N assemblies most recently undergoing a transition. |
assembly_delete | Timer | Yes | The time that is taken for requests to delete the topology of an assembly. |
assembly_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of an assembly. |
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice. |
Daytona metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
process_generation | Timer | Yes | The duration of time taken for the generation of a process for an assembly transition. |
resource_transition_duration | Timer | Yes | The duration of time for the Brent resource manager to complete a resource transition that is part of a process. |
tasks_completion_time | Timer | Yes | For tasks that are not related to resource transitions, the duration of time that is taken to complete a task in a process. |
transitions_completion_time | Timer | Yes | The duration of assembly processes. |
intents | Counter | Yes | The number of intents to reach each stage of a process, including the retry accepted, rollback accepted, and rollback successful stages. Includes tags that show the state of the intent and the name of the descriptor. |
intents_failed | Counter | Yes | The number of intent requests that are not accepted because of failure or rejection, including Cancel, Retry, and Rollback requests. An amendment tag identifies when the metric is related to a Cancel, Retry, or Rollback request. This metric does not include intents that fail during processing. See the |
tasks | Counter | Yes | The number of process tasks at each stage of processing. Tagged with the state and the type of the task. |
transitions | Counter | Yes | The number of transitions to reach an unrecoverable error state. |
activeCalls | Gauge | No | The number of requests that are made from the intent engine to other internal services such as Brent and Galileo. This metric does not apply to external systems. |
activeProcesses | Gauge | No | The number of processes that are currently being handled by the intent engine. |
activeTaskResponses | Gauge | No | The number of task responses that are currently being handled by the intent engine. |
activeTasks | Gauge | No | The number of process tasks that are currently being handled by the intent engine. |
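Tagged counters such as intents and tasks are typically read by grouping their samples on one tag. The following Python sketch shows one way to do that against Prometheus text output. It is an illustration only: the exported metric name `intents_total`, the label names `state` and `descriptor`, and the sample values are all assumptions, so check the actual names in the output of your own system.

```python
import re
from collections import defaultdict

# A labeled sample line: name{label="value",...} value
LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical scrape output; metric and label names are assumptions.
EXAMPLE = """\
intents_total{state="accepted",descriptor="assembly::a::1.0"} 3.0
intents_total{state="accepted",descriptor="assembly::b::1.0"} 2.0
intents_total{state="failed",descriptor="assembly::a::1.0"} 1.0
"""

def sum_by_label(text, metric, label):
    """Add up the samples of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group(1) != metric:
            continue  # not a labeled sample of the requested metric
        labels = dict(LABEL.findall(match.group(2)))
        # Samples that lack the label are grouped under the empty string.
        totals[labels.get(label, "")] += float(match.group(3))
    return dict(totals)

if __name__ == "__main__":
    print(sum_by_label(EXAMPLE, "intents_total", "state"))
```

Grouping on a different tag, for example the descriptor name, only requires a different `label` argument.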
Talladega metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
process_create | Timer | Yes | The time that is taken for requests to create a process. |
process_delete | Timer | Yes | The time that is taken for requests to delete a process. |
process_fetch | Timer | Yes | The time that is taken for requests to fetch the details of a process by a unique identifier. |
process_query | Timer | Yes | The time that is taken for requests to search for one or more processes. |
process_summaries_fetch | Timer | Yes | The time that is taken for requests to fetch a summary of all processes for one or more assemblies. |
process_update | Timer | Yes | The time that is taken for requests to update a process. |
process_update_shallow | Timer | Yes | The time that is taken for requests to update the metadata of a process. |
task_operations_fetch | Timer | Yes | The time that is taken for requests to fetch details for all operations for a specific resource. |
tasks_fetch | Timer | Yes | The time that is taken for requests to fetch the tasks of a process. |
tasks_update | Timer | Yes | The time that is taken for requests to update the tasks of a process. |
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice. |
Galileo metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
assemblies_topn_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of N assemblies most recently undergoing a transition. |
assembly_component_fetch | Timer | Yes | The time that is taken for requests to fetch the details of a component within an assembly. |
assembly_component_update | Timer | Yes | The time that is taken for requests to update a component within an assembly. |
assembly_create | Timer | Yes | The time that is taken for requests to persist the topology of an assembly. |
assembly_delete | Timer | Yes | The time that is taken for requests to delete the topology of an assembly. |
assembly_fetch | Timer | Yes | The time that is taken for requests to fetch the topology of an assembly. |
assembly_referencing_fetch | Timer | Yes | The time that is taken for requests to fetch details of assemblies that reference a specific assembly. |
assembly_update | Timer | Yes | The time that is taken for requests to persist updates to the topology of an assembly. |
meter_counter_galileo_createDeploymentLocation | Counter | No | The number of successful requests to persist details of a deployment location. |
meter_counter_galileo_createResourceManager | Counter | No | The number of successful requests to persist details of a resource manager. |
meter_counter_galileo_deleteDeploymentLocation | Counter | No | The number of successful requests to delete a deployment location. |
meter_counter_galileo_deleteResourceManager | Counter | No | The number of successful requests to delete a resource manager. |
meter_counter_galileo_getDeploymentLocation | Counter | No | The number of successful requests to fetch the details of a deployment location. |
meter_counter_galileo_getDeploymentLocations | Counter | No | The number of successful requests to fetch the details of all deployment locations. |
meter_counter_galileo_getResourceManager | Counter | No | The number of successful requests to fetch details of a resource manager. |
meter_counter_galileo_getResourceManagers | Counter | No | The number of successful requests to fetch details of all resource managers. |
meter_counter_galileo_updateDeploymentLocation | Counter | No | The number of successful requests to persist updates to the details of a deployment location. |
meter_counter_galileo_updateResourceManager | Counter | No | The number of successful requests to persist updates to the details of a resource manager. |
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice. |
Brent metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
hibernate_query_execution_total_seconds | Timer | No | The latency of database queries for the microservice. |
PostgreSQL metrics
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
cnp_assembly_instance_toplevel_count | Gauge | No | The number of higher-level assemblies in your orchestration instance. |
cnp_assembly_instance_count | Gauge | No | The number of assembly instances. |
cnp_resource_instance_count | Gauge | No | The number of resource instances. |
For more information, including how to define custom PostgreSQL metrics, see PostgreSQL metrics.
Site Planner metrics
The following table lists the Site Planner metrics that support tracing tags. These metrics can contain tracing tags only if the metrics are produced by build and tear down requests.
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
django_http_responses_total_by_status_view_method_total | Counter | Yes | The number of responses that are returned, labeled by user interface or REST API endpoint, status, and HTTP method. |
django_http_requests_total_by_view_transport_method_total | Counter | Yes | The number of requests that are made, labeled by user interface or REST API endpoint and HTTP method. |
django_http_requests_latency_seconds_by_view_method | Histogram | Yes | A histogram of request processing time, labeled by user interface or REST API endpoint. |
The following table lists other useful Site Planner metrics. You can view a full list of the metrics at the /metrics endpoint URL for Site Planner. For example, the URL might be similar to this: https://lifecycle-manager.apps.cp4na.cp.fyre.ibm.com/site-planner/ui/metrics.
Metric | Type | Supports tracing tags? | Description |
---|---|---|---|
django_model_inserts_total | Counter | No | The number of insert operations by model. |
django_model_updates_total | Counter | No | The number of update operations by model. |
django_model_deletes_total | Counter | No | The number of delete operations by model. |
django_db_query_duration_seconds | Histogram | No | A histogram of database query duration. |
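To get an overview of what a metrics endpoint exposes, you can list the metric names and types from its output. The following Python sketch assumes that the endpoint returns the standard Prometheus text exposition format, in which each metric family is announced by a `# TYPE <name> <type>` line. The sample text, including the HELP wording and the `model` label, is illustrative and not copied from a real system.

```python
# Illustrative output only; real output is much longer.
EXAMPLE = """\
# HELP django_model_inserts_total Number of insert operations by model.
# TYPE django_model_inserts_total counter
django_model_inserts_total{model="site"} 7.0
# TYPE django_db_query_duration_seconds histogram
"""

def metric_types(text):
    """Return {metric_name: metric_type} from the '# TYPE' lines of a scrape."""
    types = {}
    for line in text.splitlines():
        parts = line.split()
        # A TYPE line has exactly four tokens: '#', 'TYPE', name, type.
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types

if __name__ == "__main__":
    for name, kind in sorted(metric_types(EXAMPLE).items()):
        print(f"{name}: {kind}")
```

The text can come from any source, such as a saved copy of the endpoint response, so the function itself makes no network calls.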
For more information about the Prometheus metric types that are used by Site Planner, see Prometheus Metric Types.
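Histogram metrics such as django_http_requests_latency_seconds_by_view_method are usually summarized as quantiles. In the Prometheus text format, each bucket sample carries an `le` (less than or equal) upper bound and a cumulative count, and the last bucket has the bound `+Inf`. The following Python sketch estimates a quantile from such buckets by linear interpolation, similar in spirit to the PromQL `histogram_quantile` function. The bucket boundaries and counts are made-up values, and the function is a simplified illustration rather than a reimplementation of Prometheus.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs that includes
    the +Inf bucket. Returns None when the histogram holds no observations.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # No upper limit to interpolate toward; report the highest
                # finite boundary instead.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound

if __name__ == "__main__":
    # Hypothetical buckets: (upper bound in seconds, cumulative count).
    example = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
    print(histogram_quantile(0.95, example))
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket boundaries surround the latencies of interest.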