Performance considerations for observers
When you configure observer integrations in IBM Cloud Pak® for AIOps, keep the following considerations in mind:
- Use observer filtering capabilities where possible to help fine-tune what the integration provides to Cloud Pak for AIOps. This approach ensures that Cloud Pak for AIOps focuses on managing only what is necessary to deliver value, while maximizing efficiency in both the data source and data processing.
- Observer data is sent to a Kafka topic for processing by the topology-service microservice. The data can be produced at a significantly higher rate than it is processed, so it accumulates in the Kafka topic. If large observer jobs are run in parallel, processing might exceed Kafka's default 8-hour message retention time, leading to data loss, discarded control plane messages, and stuck jobs. To mitigate these risks, run large observer jobs sequentially and moderate the number of jobs that run in parallel.
- Merge processing, group processing, and populating the inventory database can continue after a given observer job is marked as finished. The finished state indicates only that the topology microservice has consumed and processed the observer data.
- If you are using the File observer, constrain files to 1 million resources to keep the quantity of tracked metadata and processing times manageable.
- Observer load jobs automate the process of determining the differences between subsequent loads of the same data. This simplifies the authoring of a data source by automatically handling changes between loads, such as resources A, B, and C being provided at time 1 and resources A, C, and D being provided at time 2. At job start-up, the observer downloads the set of identifiers from the previous run through the topology service REST APIs and compares the IDs from time 1 and time 2. Any IDs that are not found at time 2 are deleted automatically, which eliminates the need for a linger concept. If resource B reappears at time 3, the system resurrects it. A minimal sketch of this delta logic is shown after this list.
- If you are using the File or REST observers (or writing a new one), consider the resource properties that are required to meet business goals. It is not useful to provide a large number of properties if a better experience and more up-to-date data can be obtained through federation to the source by using a right-click contextual tool launch. Otherwise, every property needs to be stored and managed, which can slow down processing. In extreme cases, Kafka might discard the message from the observer because the maximum message size is 1.5 MB.
- If you are using the File or REST observers (or writing a new one), consider the number of edges (relationships) that each resource contains in order to avoid supernodes. A supernode in a topology graph is a node with a disproportionately high number of relationships. Exercise caution when more than 400 relationships are incident to a given resource, because the user experience can suffer. In extreme cases, such as thousands of relationships, Kafka might discard the message from the observer because its maximum message size is 1.5 MB. Relationships in Topology Manager can accumulate across individual contributions to a resource through merge processing. A sketch that checks relationship counts and payload sizes before sending is shown after this list.
- If you are using the File or REST observers (or writing a new one), consider the order in which data is provided to Topology Manager, for example, the order of lines in a file or the sequence in which the applyVertex method is run in a COTS observer. Topology Manager contains the logic to handle out-of-sequence data and reassemble it, but doing so can consume memory and reduce performance. Consider the case where resource A is related to resources B, C, and D. If A is sent first with its relationships to B, C, and D as part of it, then placeholders need to be tracked for B, C, and D until the full records arrive. In extreme cases, tens or hundreds of thousands of records might be generated after A is first seen. To minimize placeholder tracking, send B, C, and D before A if possible. Alternatively, relate B, C, and D to A by having each of those resources carry its own relationship to A. To increase the likelihood that resources referenced without relationships have already been sent to the topology service, you can generate a sorted list of resources in Python by using the following code:

      for resource in sorted(topology, key=lambda k: (len(topology[k]['_references']), topology[k]['uniqueId'])):

  This code sorts the resources by the number of relationships and then by unique ID, ensuring that resources with fewer relationships are sent first. A fuller, runnable version of this snippet is shown after this list.
- When you provide data to the topology service, you choose the operation to perform, such as insert-replace, insert-update, or delete, and the topology service performs the chosen operation. Insert-replace is the simplest and most reliable method because it does not require advanced user knowledge to use correctly. Incorrect use of insert-update might result in a topology whose accuracy decreases over time. A sketch that contrasts insert-replace and insert-update semantics is shown after this list.
- Observers can provide groups to Topology Manager automatically. Consider what a group means to the business, the number of groups that are created, and the number of resources within each group. Groups with more than 2,000 resources can result in a poor user experience. Defining groups by using templates is recommended because it provides greater administrative control over the creation of groups, including how, when, and why they are created. If a group must be created by using an observer directly, then provide an option to suppress or filter the groups that it creates.
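
The following sketch illustrates the delta logic that observer load jobs apply automatically between subsequent loads, as described in the list above. It is a minimal conceptual example; the function name and data shapes are assumptions for illustration, not part of the observer framework.

```python
# Conceptual sketch only: observer load jobs perform this comparison automatically.
def compute_delta(previous_ids, current_ids):
    """Compare the resource IDs from the previous load with the current load."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "deleted": previous - current,   # present at time 1, absent at time 2
        "created": current - previous,   # new (or resurrected) at time 2
        "kept": previous & current,      # carried forward unchanged
    }

# Time 1 provides A, B, C; time 2 provides A, C, D.
delta = compute_delta(["A", "B", "C"], ["A", "C", "D"])
print(delta["deleted"])  # {'B'} is deleted automatically, so no linger concept is needed
print(delta["created"])  # {'D'}
# If B reappears at time 3, it shows up in "created" again and is resurrected.
```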
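
As a guard against supernodes and oversized Kafka messages, an observer author might add a check like the following before sending each resource. This is a hypothetical pre-flight check; the payload shape, beyond the _references and uniqueId fields used elsewhere in this section, is an assumption for illustration.

```python
import json

MAX_RELATIONSHIPS = 400        # guideline from this section
MAX_MESSAGE_BYTES = 1_500_000  # approximately the 1.5 MB Kafka message limit

def check_resource(resource):
    """Return warnings for a resource that risks becoming a supernode or an oversized message."""
    warnings = []
    references = resource.get("_references", [])
    if len(references) > MAX_RELATIONSHIPS:
        warnings.append(f"{resource['uniqueId']}: {len(references)} relationships (possible supernode)")
    payload_bytes = len(json.dumps(resource).encode("utf-8"))
    if payload_bytes > MAX_MESSAGE_BYTES:
        warnings.append(f"{resource['uniqueId']}: payload is {payload_bytes} bytes (might be discarded)")
    return warnings

# Hypothetical resource with 450 relationships.
resource = {
    "uniqueId": "host-001",
    "_references": [{"_toUniqueId": f"interface-{i}"} for i in range(450)],
}
for warning in check_resource(resource):
    print(warning)
```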
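
The sorting snippet in the ordering consideration can be expanded into a small runnable example as follows. The topology dictionary shape, keyed by uniqueId with a _references list per resource, is inferred from the original snippet and is an assumption for illustration.

```python
# Illustrative topology: B, C, and D have no references; A references all three.
topology = {
    "A": {"uniqueId": "A", "_references": [{"_toUniqueId": "B"},
                                           {"_toUniqueId": "C"},
                                           {"_toUniqueId": "D"}]},
    "B": {"uniqueId": "B", "_references": []},
    "C": {"uniqueId": "C", "_references": []},
    "D": {"uniqueId": "D", "_references": []},
}

# Send resources with the fewest relationships first so that B, C, and D already
# exist in the topology service when A, which references them, arrives. This
# minimizes the placeholder tracking described above.
for resource in sorted(topology, key=lambda k: (len(topology[k]['_references']),
                                                topology[k]['uniqueId'])):
    print("send", topology[resource]["uniqueId"])  # replace print with the actual send call
```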
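
The following sketch models, at the level of a single resource's property set, why insert-replace is more forgiving than insert-update. It is a conceptual illustration of the two operations' semantics as described above, not the topology service implementation.

```python
def insert_replace(stored, incoming):
    """Replace the stored properties entirely with the incoming properties."""
    return dict(incoming)

def insert_update(stored, incoming):
    """Merge the incoming properties over the stored ones; properties that the
    source no longer provides are left behind."""
    merged = dict(stored)
    merged.update(incoming)
    return merged

stored = {"uniqueId": "host-001", "os": "RHEL 8", "owner": "team-a"}
incoming = {"uniqueId": "host-001", "os": "RHEL 9"}  # 'owner' is no longer provided

print(insert_replace(stored, incoming))  # {'uniqueId': 'host-001', 'os': 'RHEL 9'}
print(insert_update(stored, incoming))   # stale 'owner' survives; accuracy degrades over time
```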