Creating custom integrations
IBM Cloud Pak for AIOps provides many connectors, probes, and observers by default for setting up integrations with common IBM and third-party systems and services. If a default integration is not available for connecting to a system or service that you want to use with IBM Cloud Pak for AIOps, you can set up a custom integration to connect to that system or service.
You can use a custom integration to ingest user-defined log data, which is used to establish a baseline of normal behavior and then identify anomalies. These anomalies can be correlated with other alerts and events and published to your ChatOps interface, helping you determine the cause and resolution of a problem.
Because each user-defined data source has its own format and size, not all log sources can be ingested directly with this custom integration. For more information, see Data source requirements.
Important: Custom integrations cannot use historical data for initial AI training.
Note: As an alternative to using the Custom integration that is provided by default, you can create your own integration connector to interact with IBM Cloud Pak for AIOps. For instance, you can create a custom connector for sending events, metrics, and topology data to IBM Cloud Pak for AIOps from an external source when a default integration is not available. For more information, see Creating custom integrations using the integration SDK.
For more information about working with custom integrations, see the following sections:
- Data source requirements
- Creating custom integrations
- Enabling custom integrations
- Editing custom integrations
- Deleting custom integrations
For more information about HTTP headers for the various credential types, see HTTP headers for credential types.
Data source requirements
Before creating the integration, you should be aware of the following information.
- Format of source data: The data that is ingested with the log integration must be in newline-delimited JSON format. Each individual log entry must be on a separate line and each log entry must be less than 1 MB; otherwise, the custom integration is not able to ingest the data. The following code snippet shows an example of the newline-delimited JSON format:
  {"message": "exampleMessage", "time": 500}
  {"message": "exampleMessage", "time": 500}
Custom integrations support the ingestion of flat JSON source data but do not support nested key-value pairs. The following examples illustrate what is supported and what is not.
Supported JSON source data: the following code snippet shows an example of flat JSON source data.
{ "message": "exampleMessage", "time": "..." }
Unsupported JSON source data: the following code snippet shows an example of nested key-value pairs.
{ "log":["message": "exampleMessage","timezone": "exampleTimezone"], "time": "..." }
Important: If your data does not conform to the preceding format, the custom integration is not able to ingest the data. Review the requirements before you create your custom integration. (An illustrative validation sketch appears at the end of this section.)
- Load: To prevent this integration from placing an inordinate load on your data source and potentially impacting your logging operations, the integration connects to only one API, with a default data frequency of 60 seconds. This frequency is controlled by the Sampling rate setting in the Procedure section.
- Access: Custom data sources are cloud-based REST APIs. Access is configured by using the authentication methods that are specified in the Authentication type setting in the Procedure section.
- Data volume: Data volume depends on the application, and is not a set value. Therefore, it does not appear in the settings.
Note: If you want to collect log data from multiple systems and then route that information through a forwarding agent, such as an Apache Kafka topic (for example, Sysdig or FluentBit), you must use the Kafka integration type. For more information about Kafka integrations, see Creating Kafka integrations.
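The following sketch is one way to pre-check a newline-delimited JSON file against these requirements before you configure the integration. It is illustrative only and is not part of IBM Cloud Pak for AIOps; the script name and the default file name logs.ndjson are assumptions.

```python
# check_ndjson.py - illustrative pre-flight check for newline-delimited JSON log data.
# Assumes a local file (default "logs.ndjson"); adjust the path for your own data.
import json
import sys

MAX_BYTES = 1_000_000  # each log entry must be less than 1 MB

def validate(path: str) -> bool:
    ok = True
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            if len(raw) >= MAX_BYTES:
                print(f"line {lineno}: entry is {len(raw)} bytes, which exceeds the 1 MB limit")
                ok = False
                continue
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError as err:
                print(f"line {lineno}: not valid JSON ({err})")
                ok = False
                continue
            # Flat JSON only: nested objects or arrays are not supported.
            if not isinstance(entry, dict) or any(
                isinstance(value, (dict, list)) for value in entry.values()
            ):
                print(f"line {lineno}: nested key-value pairs are not supported")
                ok = False
    return ok

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "logs.ndjson"
    sys.exit(0 if validate(path) else 1)
```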
Creating custom integrations
The custom integration type collects newline-delimited JSON from a single specific data source.
Procedure
To create a custom integration from a specific source, step through the following sections:
- Adding a Custom integration
- Specifying integration parameters
- Specifying field mapping
- Specifying how log data is collected for AI training
Adding a Custom integration
- Log in to IBM Cloud Pak for AIOps console.
- Expand the navigation menu (four horizontal bars), then click Define > Integrations.
- On the Integrations page, click Add integration.
- From the list of available integrations, find and click the Custom tile.
  Note: If you do not immediately see the integration that you want to create, you can filter the tiles by type of integration. Click the type of integration that you want in the Category section.
- On the side panel, review the instructions and, when you are ready to continue, click Get started.
Specifying integration parameters
- On the Add integration page, enter the following integration information:
  - Name: The display name of your integration.
  - Description: An optional description for the integration.
  - Extract data from an array of log lines embedded in JSON: When enabled, logs are extracted from an array that is embedded within a JSON object that is returned by the REST service. This option works only for an array of flat JSON objects that is embedded within a JSON object.
  - Embedded array path: Type the path to the array from which you want to extract log data. The following examples show how to configure the array path:
    {"results": [{"message": "myMessage"}]} would be results[*]
    {"results": {"events": [{"message": "myMessage"}]}} would be results.events[*]
    {"results": [{"events": [{"message": "myMessage"}]}]} would be results[0].events[*]
  - REST method: The method by which your data gets transferred, either GET or POST. If you select POST, a POST JSON (Optional) field also appears.
  - Authentication type: Select one of the following values. (For an illustrative sketch of the resulting headers, see the example after this procedure.)
    - User ID/password: The user ID and password that are required for accessing the REST service. You must enter both in the integration configuration. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: Basic MyCredential'. 'MyCredential' is base64 encoded and in the format 'username:password'.
    - API key: The REST service is authenticated with an API key. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: MyAPIKey'.
    - Token: The REST service is authenticated with a temporary token. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: Bearer MyToken'.
    - Query Key: The REST service is authenticated with a query key. The HTTPS authentication header is sent to the Service URL in the format 'X-Query-Key: MyQueryKey'.
    - Custom: Your custom REST service has specific authentication requirements that are not covered by the other authentication methods. You must enter both a Header and a Value to configure your authentication. Note that both the Header and Value are plain text properties that are case-sensitive, and spaces are not removed. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: MyHeader MyValue'.
    - None: Your custom REST service requires no authentication. No Authorization header is sent to the target system.
  - Certificate (optional): Certificate used to verify the SSL/TLS connection to the REST service.
  - UI URL: The hostname or IP address of the UI of your custom source.
  - Service URL: The URL for the service that you want to integrate. For more information about customizing your Service URL, see Customizing GET and POST integrations.
  - Base parallelism: A value higher than 1 is recommended so that data can be processed in parallel. In a small environment, the number of available Flink slots is 16; in a large environment, the maximum number of available slots is 32.
  - Sampling rate: The rate at which data is pulled from the live source, in seconds. The default value is 60.
  - JSON processing option: Select a JSON processing option.
    - None: The default option. The JSON is not processed or modified.
    - Flatten: This option flattens the JSON object by removing the opening and closing braces.
    - Filter: This option extracts the JSON object and replaces it with an empty string.
    For more information about the options, see Managing embedded JSON.
    Note: To improve data throughput, you can increase the base parallelism value incrementally. For more information about maximum base parallelism for starter and production deployments, see Improving data streaming performance for log anomaly detection.
- Click Test connection.
- Click Next to move to the next page.
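Before you continue, the following sketch illustrates how the headers that are described under Authentication type are typically constructed. The credential values are placeholders only; substitute the values that your REST service expects.

```python
# Illustrative construction of the HTTPS authentication headers described above.
# All credential values are placeholders.
import base64

# User ID/password: 'Authorization: Basic MyCredential', where MyCredential is
# 'username:password' base64 encoded.
credential = base64.b64encode(b"myUser:myPassword").decode("ascii")
basic_header = {"Authorization": f"Basic {credential}"}

# API key: 'Authorization: MyAPIKey'
api_key_header = {"Authorization": "MyAPIKey"}

# Token: 'Authorization: Bearer MyToken'
token_header = {"Authorization": "Bearer MyToken"}

# Query Key: 'X-Query-Key: MyQueryKey'
query_key_header = {"X-Query-Key": "MyQueryKey"}

print(basic_header)  # {'Authorization': 'Basic bXlVc2VyOm15UGFzc3dvcmQ='}
```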
Specifying field mapping
- On the Field mapping page, you can improve search performance by mapping fields from your implementation to standard fields within IBM Cloud Pak for AIOps.
- For more information about how field mappings are defined, see Mapping data from incoming sources.
- For more information about using mappings to clean your data for use in IBM Cloud Pak for AIOps, see Cleaning mapped data using regular expressions.
The following code snippet displays an example of field mapping that uses the supported format. When you are coding your mapping, use this example to help you.
{ "codec": "custom", "message_field": "message", "log_entity_types": "kubernetes.container_name", "instance_id_field": "_app", "timestamp_settings":{ "timestamp_field":"_ts", <MANDATORY> "multiplier": 1000, <OPTIONAL> "pattern": "yyyy-MM-dd hh:mm:ss z" <OPTIONAL, SPECIFIC TO RAW DATA OUTPUT> } }
If you are configuring an integration to WebSphere, then you must use the following mapping. In this mapping, all fields must be specified exactly as shown, except for the log_entity_types parameter. This parameter can be empty, as shown in the following example:
{
"codec": "custom",
"rolling_time": 10,
"instance_id_field": "ibm_serverName",
"log_entity_types": "",
"message_field": "message",
"timestamp_settings": {
"timestamp_field": "ibm_datetime",
"pattern": "yyyy-MM-dd'T'HH:mm:ss.SSSZZZZZ"
}
}
For more information about specific custom mappings, see Custom mappings.
Alternatively, if you want to include specific fields from your WebSphere record within the log alert resource field that is generated by the log anomaly detection algorithms, specify those WebSphere fields in the log_entity_types parameter, as shown in the following example:
{
"codec": "custom",
"rolling_time": 10,
"instance_id_field": "ibm_serverName",
"log_entity_types": "host,ibm_userDir,ibm_serverName,module",
"message_field": "message",
"timestamp_settings": {
"timestamp_field": "ibm_datetime",
"pattern": "yyyy-MM-dd'T'HH:mm:ss.SSSZZZZZ"
}
}
For more information about the log alerts generated by the log anomaly detection algorithms, see the following topics:
- Click Next to move to the next page.
Specifying how log data is collected for AI training
- On the AI training and log data page, select how you want to manage collecting data for use in AI training and anomaly detection. Click the Data collection toggle to turn on data collection, then select how you want to collect data:
  - Live data for continuous AI training and anomaly detection: A continuous collection of data from your integration is used to both train AI models and analyze your data for anomalous behavior.
    Note: After an initial installation, there is no data in the system. If you select this option, the two log anomaly detection algorithms behave in the following ways:
    - Natural language log anomaly detection does not initially detect anomalies because no model has been trained yet. If Live data for continuous AI training and anomaly detection is set to on, the system gathers training data live, and after a few days there is enough data to train a model. When this model is deployed, it detects anomalies as normal.
    - Statistical baseline log anomaly detection does not detect anomalies for the first 30 minutes of data collection because it does not have a baseline yet. After 30 minutes of live data collection, the baseline is automatically created. After that, it detects anomalies on an ongoing basis, while continuing to gather data and improve its model every 30 minutes.
  - Live data for initial AI training: A single set of training data that is used to define your AI model. Data collection takes place over a specified time period that starts when you create your integration.
    Note: Selecting this option causes the system to continue to collect data while the option is enabled; however, the data is collected for training only, not for log anomaly detection. For more information about AI model training, including minimum and ideal data quantities, see Configuring AI training.
Important: Keep in mind the following considerations when you select your data collection type:
- Anomaly detection for your integration occurs only if you select Live data for continuous AI training and anomaly detection.
- Custom integrations cannot use historical data for AI training.
- Different types of AI models have different requirements to properly train a model. Make sure that your settings satisfy minimum data requirements. For more information about how much data you need to train different AI models, see Configuring AI training.
- Click Next.
- On the Resource requirements page, you can review the slot usage for your log integrations to see if there are enough slots to fully support the integration for multizone high availability.
  If you set the Data collection toggle to On, you will see the resource management overview.
  - If your current usage and other usage are less than the provisioned slots, but the HA slots exceed the provisioned slots, you will be able to create the integration, but will see a warning that you do not have enough slots. The integration will not have multizone high availability.
  - If your projected usage exceeds the provisioned slots, you will not be able to create the integration because you do not have enough slots on your system for log data integrations.
  - If your total slots, including HA slots, are within the provisioned slots, the integration will have multizone high availability.
Note: HA operation assumes high availability for three zones.
If you set the Data collection toggle to Off, you will see a message stating that you need to enable logs data collection to see the resource management overview. When data collection is off, no slots are used by that integration.
- Click Done.
You have created a custom integration in your instance. After you create your integration, you must enable data collection to connect your integration with the AI of IBM Cloud Pak for AIOps. For more information about enabling your integration, see Enabling custom integrations.
To create more integrations (such as a ChatOps integration), see Configuring Integrations.
For more information about working with the insights provided by your integrations, see ChatOps insight management.
Customizing GET and POST integrations
GET integrations
If you customize the Service URL of your integration, you must include the start and end time in the URL with properly formatted timestamps (SimpleDateFormat). For epoch timestamps, milliseconds and seconds are supported.
The following timestamp identifiers are supported in the Service URL field:
- <start-epoch-ms>, <end-epoch-ms>. For example, http://exampleurl.com/v1/export?to=<end-epoch-ms>&from=<start-epoch-ms>
- <start-epoch-s>, <end-epoch-s>. For example, http://exampleurl.com/v1/export?to=<end-epoch-s>&from=<start-epoch-s>
- <start-SimpleDateFormat>, <end-SimpleDateFormat>. For example, http://exampleurl.com/v1/export?to=<end-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>&from=<start-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>
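As an illustration, the following sketch substitutes a one-hour collection window into the epoch-millisecond example URL. The substitution logic is an assumption for explanation only, not the product's internal implementation; the same placeholders are used in POST bodies, as described in the next section.

```python
# Illustrative substitution of <start-epoch-ms> and <end-epoch-ms> into a Service URL template.
import time

template = "http://exampleurl.com/v1/export?to=<end-epoch-ms>&from=<start-epoch-ms>"

end_ms = int(time.time() * 1000)        # now, as a 13-digit epoch timestamp in milliseconds
start_ms = end_ms - 60 * 60 * 1000      # one hour earlier

url = template.replace("<start-epoch-ms>", str(start_ms)).replace("<end-epoch-ms>", str(end_ms))
print(url)
# For example: http://exampleurl.com/v1/export?to=1700003600000&from=1700000000000
```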
POST integrations
For POST integrations, you do not need to customize the Service URL value. For these integrations, the payload is in the body of the JSON and you can customize the format, but you must include the start and end times with properly formatted timestamps (SimpleDateFormat). For epoch timestamps, milliseconds and seconds are supported. The following timestamp identifiers are supported in the POST body:
- <start-epoch-ms>, <end-epoch-ms>. For example, {"queryString":"exampleQuery","start":"<start-epoch-ms>","end":"<end-epoch-ms>","isLive":false}
- <start-epoch-s>, <end-epoch-s>. For example, {"queryString":"exampleQuery","start":"<start-epoch-s>","end":"<end-epoch-s>","isLive":false}
- <start-SimpleDateFormat>, <end-SimpleDateFormat>. For example, {"query": {"range": {"@timestamp": {"gte": "<start-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>","lt": "<end-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>"}}},"size": 10000}
Custom mappings
Custom mappings allow more flexibility regarding timestamp settings. You must provide the mapping information about the timestamp field and how to parse the timestamp. Custom mappings support unnested JSON inputs.
The basic building blocks are similar to other integrations. The rolling_time, instance_id_field, log_entity_types, and message_field mappings are defined in the same way. For custom integrations, an extra timestamp_settings mapping must be specified. It contains the required timestamp_field, which identifies the field that contains the timestamp, and two optional fields, pattern and multiplier.
The pattern field must be used when the timestamp must be parsed from a String representation, and the multiplier field must be used when the timestamp is represented as the Double type.
When timestamps are formatted correctly, as 13-digit UNIX® timestamps, you do not need to specify the pattern or multiplier fields. You need to define only the timestamp_field field, as in the following example:
{
"rolling_time": 10,
"instance_id_field": "_app",
"log_entity_types": "kubernetes.container_name",
"message_field": "message",
"timestamp_settings":{
"timestamp_field":"_ts"
}
}
Custom mapping with String type timestamps
When timestamps are represented as Strings, they must be parsed and converted to 13-digit UNIX timestamps. In this case, the pattern field must contain the parsing pattern. For example, if the timestamp is 2020-12-31 23:59:59 PST, the integration mapping can be defined as follows:
{
"rolling_time": 10,
"instance_id_field": "_account",
"log_entity_types": "kubernetes.container_name",
"message_field": "_line",
"timestamp_settings":{
"timestamp_field":"_ts",
"pattern": "yyyy-MM-dd hh:mm:ss z"
}
}
Custom mapping with Double type timestamps
It might occur that timestamps are not represented in milliseconds. In this case, you must multiply or divide the input timestamps to obtain milliseconds. The following example illustrates how you can use the multiplier field to obtain milliseconds.
Consider the following source timestamp, which is expressed in seconds rather than milliseconds:
{
"_ts": 1569873609.539,
"_account": "...",
"_cluster": "...",
"_host": "...",
"_logtype": "...",
"_line": "...",
"_id": "...",
"__key": "...",
}
You can set the multiplier field in the mapping to show the timestamps in milliseconds:
{
"rolling_time": 10,
"instance_id_field": "_account",
"log_entity_types": "kubernetes.container_name",
"message_field": "_line",
"timestamp_settings":{
"timestamp_field":"_ts",
"multiplier": 1000
}
}
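To illustrate the arithmetic behind the multiplier setting, the following minimal sketch converts the second-based timestamp from the preceding record into a 13-digit millisecond timestamp.

```python
# Illustrative arithmetic for timestamp_settings.multiplier.
raw_ts = 1569873609.539   # source timestamp in seconds (from the example record)
multiplier = 1000         # as set in the mapping

timestamp_ms = round(raw_ts * multiplier)
print(timestamp_ms)       # 1569873609539 -> a 13-digit UNIX timestamp in milliseconds
```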
Custom mapping with static instance ID support
You can input a static value for the instance_id_field in the IBM Cloud Pak for AIOps pipeline. See the following example:
{
"rolling_time":10,
"instance_id_field":"%aStaticInstanceId%",
"log_entity_types":"kubernetes.container_name",
"message_field":"_line",
"timestamp_settings":{
"timestamp_field":"_ts",
"pattern":"yyyy-MM-dd hh:mm:ss z"
}
}
The custom mapping places the value that is contained between %...% in the instance_id_field field throughout the IBM Cloud Pak for AIOps data collection. The following example illustrates the resulting data from the preceding custom mapping:
{
"message":"exampleLogMessage",
"instance_id":"aStaticInstanceId",
"entities":{
"pod":"examplePod",
"_cluster":"exampleCluster",
"container":"exampleContainer"
},
...,
"timestamp":1605195377154
}
Supported custom-mapping fields
| Field Name | Description | Default Value |
|---|---|---|
| rolling_time | The recurring time interval to capture data from, in seconds. | 10 |
| instance_id_field | The field name in the incoming data that contains the application name. | kubernetes.namespace_name, kubernetes.host, kubernetes.container_name |
| log_entity_types | The field name in the incoming data that contains the entity to extract and analyze. | kubernetes.container_name |
| message_field | The field name in the incoming data that contains the log message. | @rawstring |
| standard_entity_types | Standardized fields. If declared, the performance for event grouping can be improved. | None |
| standard_entity_types.pod_name | Standardized field. If declared, the performance for event grouping can be improved. | pod |
| standard_entity_types.node_name | Standardized field. If declared, the performance for event grouping can be improved. | node |
| timestamp_settings | Customizable settings on timestamps from the source's raw log output. | None |
| timestamp_settings.timestamp_field | The field name in the incoming data that contains the data's timestamp. | _ts |
| timestamp_settings.pattern (Optional) | Must be used when the timestamps must be parsed from a string representation. Custom integrations support SimpleDateFormat for timestamps in incoming data. | yyyy-MM-dd hh:mm:ss z |
| timestamp_settings.multiplier (Optional) | Used when epoch timestamps are represented as a double. | None |
Enabling and disabling custom integrations
If you didn't enable your data collection during creation, you can enable your integration afterward. You can also disable a previously enabled integration the same way. If you selected Live data for initial AI training when you created your integration, you must disable the integration before AI model training. To enable or disable a created integration, complete the following steps:
- Log in to IBM Cloud Pak for AIOps console.
- Expand the navigation menu (four horizontal bars), then click Define > Integrations.
- On the Manage integrations tab of the Integrations page, click the Custom integration type.
- Click the integration that you want to enable or disable.
- Go to the AI training and log data section. Set Data collection to On or Off to enable or disable data collection. Disabling data collection for an integration does not delete the integration.
Your integration is now enabled or disabled. For more information about deleting an integration, see Deleting custom integrations.
Editing custom integrations
After you create your integration, you can edit the integration. To edit an integration, complete the following steps:
- Log in to IBM Cloud Pak for AIOps console.
- Expand the navigation menu (four horizontal bars), then click Define > Integrations.
- Click the Custom integration type on the Manage integrations tab of the Integrations page.
- On the Custom integrations page, click the name of the integration that you want to edit. Alternatively, you can click the options menu (three vertical dots) for the integration and click Edit. The integration configuration opens.
- Edit your integration as required. Click Save when you are done editing.
Your integration is now edited. If you have not yet enabled or disabled your integration, you can do so directly from the interface. For more information about enabling and disabling your integration, see Enabling and disabling custom integrations. For more information about deleting an integration, see Deleting custom integrations.
Deleting custom integrations
If you no longer need your custom integration and want to delete it entirely rather than just disable it, you can delete the integration from the console.
Note: You must disable data collection before deleting your integration. For more information about disabling data collection, see Enabling and disabling custom integrations.
To delete an integration, complete the following steps:
- Log in to IBM Cloud Pak for AIOps console.
- Expand the navigation menu (four horizontal bars), then click Define > Integrations.
- Click the Custom integration type on the Manage integrations tab of the Integrations page.
- On the Custom integrations page, click the options menu (three vertical dots) for the integration that you want to delete and click Delete.
- Enter the name of the integration to confirm that you want to delete your integration. Then, click Delete.
Your integration is deleted.