Creating custom integrations

IBM Cloud Pak for AIOps provides many connectors, probes, and observers by default for setting up integrations with common IBM and third-party systems and services. If a default integration is not available for connecting to a system or service that you want to use with IBM Cloud Pak for AIOps, you can set up a custom integration to connect to that system or service.

You can use a custom integration to ingest user-defined log data, which is used to establish a baseline of normal behavior and then identify anomalies. These anomalies can be correlated with other alerts and events and published to your ChatOps interface, which helps you determine the cause and resolution of a problem.

Because each user-defined data source has its own format and size, not all log sources can be ingested directly with this custom integration. For more information, see Data source requirements.

Important: Custom integrations cannot use historical data for initial AI training.

Note: As an alternative to using the Custom integration that is provided by default, you can create your own integration connector to interact with IBM Cloud Pak for AIOps. For instance, you can create a custom connector for sending events, metrics, and topology data to IBM Cloud Pak for AIOps from an external source when a default integration is not available. For more information, see Creating custom integrations using the integration SDK.

For more information about working with custom integrations, see the following sections:

For more information about HTTP headers for the various credential types, see HTTP headers for credential types.

Data source requirements

Before creating the integration, you should be aware of the following information.

  • Format of source data: The data that is ingested with the log integration must be in newline-delimited JSON format. Each individual log entry must be on a separate line and each log entry must be less than 1 MB, otherwise the custom integration is not able to ingest the data. The following code snippet shows an example of the newline-delimited JSON format:

    {"message": "exampleMessage", "time": 500}

    {"message": "exampleMessage", "time": 500}
    

    Custom integrations support the ingestion of flat JSON source data but do not support nested key-value pairs. The following examples illustrate what is supported and what is not.

    Supported JSON source data: the following code snippet shows an example of flat JSON source data.

    {
       "message": "exampleMessage",
       "time": "..."
    }
    

    Unsupported JSON source data: the following code snippet shows an example of nested key-value pairs.

    {
       "log":["message": "exampleMessage","timezone": "exampleTimezone"],
       "time": "..."
    }
    

    Important: If your data does not conform with the preceding format, the custom integration is not able to ingest the data. Review the requirements before creating your custom integration. A sample validation script is shown after this list.

  • Load: To prevent this integration from placing an inordinate load on your data source and potentially impacting your logging operations, the integration connects to only one API, with a default data frequency of 60 seconds. This frequency is controlled by the Sampling rate setting in the Procedure section.

  • Access: Custom data sources are cloud-based REST APIs. Access is configured by using the authentication methods that are specified in the Authentication type setting in the Procedure section.

  • Data volume: Data volume depends on the application, and is not a set value. Therefore, it does not appear in the settings.
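
If you want to check a sample of your source data against these requirements before you create the integration, the following Python sketch is an illustration only: it verifies that each line is valid flat JSON and is smaller than 1 MB. The file name sample.ndjson is a placeholder.

import json

MAX_LINE_BYTES = 1024 * 1024  # each log entry must be smaller than 1 MB

def check_ndjson(path):
    """Report lines that the custom integration would not be able to ingest."""
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            if len(line.encode("utf-8")) >= MAX_LINE_BYTES:
                print(f"line {lineno}: entry is 1 MB or larger")
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as err:
                print(f"line {lineno}: not valid JSON ({err})")
                continue
            if not isinstance(entry, dict):
                print(f"line {lineno}: entry is not a JSON object")
            elif any(isinstance(value, (dict, list)) for value in entry.values()):
                print(f"line {lineno}: nested key-value pairs are not supported")

check_ndjson("sample.ndjson")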

Note: If you want to collect log data from multiple systems and route that information through a forwarding agent (such as Sysdig or FluentBit) to an Apache Kafka topic, you must use the Kafka integration type. For more information about Kafka integrations, see Creating Kafka integrations.

Creating custom integrations

The custom integration type collects newline-delimited JSON from a single specific data source.

Procedure

To create a custom integration from a specific source, step through the following sections:

Adding a Custom integration

  1. Log in to the IBM Cloud Pak for AIOps console.

  2. Expand the navigation menu (four horizontal bars), then click Define > Integrations.

  3. On the Integrations page, click Add integration.

  4. From the list of available integrations, find and click the Custom tile.

    Note: If you do not immediately see the integration that you want to create, you can filter the tiles by type of integration. Click the type of integration that you want in the Category section.

  5. On the side panel, review the instructions and, when you are ready to continue, click Get started.

Specifying integration parameters

  1. On the Add integration page enter the following integration information:

    • Name: The display name of your integration.

    • Description: An optional description for the integration.

    • Extract data from an array of log lines embedded in JSON: When enabled, logs are extracted from an array that is embedded within a JSON object that is returned by the REST service. This option works only for an array of flat JSON objects that is embedded within a JSON object.

    • Embedded array path: Type the path to the array from which you want to extract log data. The following examples show how to configure the array path:

      {"results": [{"message": "myMessage"}]} would be results[*]
      {"results": {"events": [{"message": "myMessage"}]}} would be results.events[*]
      {"results": [{"events": [{"message": "myMessage"}]}]} would be results[0].events[*]
      
    • REST method: The method by which your data gets transferred, either GET or POST. If you select POST, a POST JSON (Optional) field also appears.

    • Authentication type: Select one of the following values (a sketch of the resulting headers is shown after this procedure):

      • User ID/password: The user ID and password that are required for accessing the REST service. You must enter both in the integration configuration. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: Basic MyCredential', where 'MyCredential' is the base64-encoded form of 'username:password'.

      • API key: The REST service is authenticated with an API key. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: MyAPIKey'.

      • Token: The REST service is authenticated with a temporary token. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: Bearer MyToken'.

      • Query Key: The REST service is authenticated with a query key. The HTTPS authentication header is sent to the Service URL in the format X-Query-Key: MyQueryKey.

      • Custom: Your custom REST service has specific authentication requirements that are not covered by the other authentication methods. You must enter both a Header and Value to configure your authentication. Note that both the Header and Value are plain text properties; they are case-sensitive, and spaces are not removed. The HTTPS authentication header is sent to the Service URL in the format 'Authorization: MyHeader MyValue'.

      • None: Your custom REST service requires no authentication. No Authorization header is sent to the target system.

    • Certificate (optional): Certificate used to verify the SSL/TLS connection to the REST service.

    • UI URL: The hostname or IP address of the UI of your custom source.

    • Service URL: The URL for the service that you want to integrate. For more information about customizing your Service URL, see Customizing GET and POST integrations.

    • Base parallelism: A value higher than 1 is recommended so that data can be processed in parallel. In a small environment, 16 Flink slots are available; in a large environment, the maximum number of available slots is 32.

    • Sampling rate: The rate, in seconds, at which data is pulled from the live source. The default value is 60.

    • JSON processing option: Select a JSON processing option.

      • None: The default option. The JSON is not processed or modified.

      • Flatten: This option flattens the JSON object by removing the opening and closing braces.

      • Filter: This option extracts the JSON object and replaces it with an empty string.

      For more information about these options, see Managing embedded JSON.

      Note: To improve data throughput, you can increase the base parallelism value incrementally. For more information about maximum base parallelism for starter and production deployments, see Improving data streaming performance for log anomaly detection.

  2. Click Test connection.

  3. Click Next to move to the next page.
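
The following Python sketch is an illustration only of how the Authorization header that is described for each authentication type could be built before it is sent to the Service URL. The credential values are placeholders and the function name is hypothetical.

import base64

def build_auth_header(auth_type, **cred):
    """Return the (header name, header value) pair for each authentication type."""
    if auth_type == "userid_password":
        # Basic authentication: base64 encoding of "username:password"
        encoded = base64.b64encode(
            f"{cred['username']}:{cred['password']}".encode("utf-8")
        ).decode("ascii")
        return "Authorization", f"Basic {encoded}"
    if auth_type == "api_key":
        return "Authorization", cred["api_key"]
    if auth_type == "token":
        return "Authorization", f"Bearer {cred['token']}"
    if auth_type == "query_key":
        return "X-Query-Key", cred["query_key"]
    if auth_type == "custom":
        return "Authorization", f"{cred['header']} {cred['value']}"
    return None  # "None": no Authorization header is sent to the target system

# Example: Basic authentication with placeholder credentials
print(build_auth_header("userid_password", username="myUser", password="myPassword"))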

Specifying field mapping

  1. On the Field mapping page, you can improve search performance by mapping fields from your implementation to standard fields within IBM Cloud Pak for AIOps.

    The following code snippet displays an example of field mapping that uses the supported format. When you are coding your mapping, use this example to help you.

    {
      "codec": "custom",
      "message_field": "message",
      "log_entity_types": "kubernetes.container_name",
      "instance_id_field": "_app",
      "timestamp_settings":{
         "timestamp_field":"_ts", <MANDATORY>
         "multiplier": 1000, <OPTIONAL>
         "pattern": "yyyy-MM-dd hh:mm:ss z" <OPTIONAL, SPECIFIC TO RAW DATA OUTPUT>
      }
    }
    

    If you are configuring an integration to WebSphere, then you must use the following mapping. In this mapping, all fields must be specified exactly as shown, except for the log_entity_types parameter, which can be empty, as shown in the following example.

    {
      "codec": "custom",
      "rolling_time": 10,
      "instance_id_field": "ibm_serverName",
      "log_entity_types": "",
      "message_field": "message",
      "timestamp_settings": {
         "timestamp_field": "ibm_datetime",
         "pattern": "yyyy-MM-dd'T'HH:mm:ss.SSSZZZZZ"
      }
    }
    

    For more information about specific custom mappings, see Custom mappings.

    Alternatively, if you want to include specific fields from your WebSphere record within the log alert resource field to be generated by the log anomaly detection algorithms, then specify those WebSphere fields in the log_entity_types parameter, as shown in the example below.

    {
      "codec": "custom",
      "rolling_time": 10,
      "instance_id_field": "ibm_serverName",
      "log_entity_types": "host,ibm_userDir,ibm_serverName,module",
      "message_field": "message",
      "timestamp_settings": {
         "timestamp_field": "ibm_datetime",
         "pattern": "yyyy-MM-dd'T'HH:mm:ss.SSSZZZZZ"
      }
    }
    

    For more information about the log alerts generated by the log anomaly detection algorithms, see the following topics:

  2. Click Next to move to the next page.

Specifying how log data is collected for AI training

  1. On the AI training and log data page, select how you want to manage collecting data for use in AI training and anomaly detection. Click the Data collection toggle to turn on data collection, and then select how you want to collect data:

    • Live data for continuous AI training and anomaly detection: A continuous collection of data from your integration is used to both train AI models and analyze your data for anomalous behavior.

      Note: After an initial installation, there is no data at all in the system. If you select this option, then the two different log anomaly detection algorithms behave in the following ways:

      • Natural language log anomaly detection does not initially detect anomalies because no model has been trained yet. If Live data for continuous AI training and anomaly detection is turned on, the system gathers training data live, and after a few days there is enough data to train a model. When this model is deployed, it detects anomalies as normal.

      • Statistical baseline log anomaly detection does not detect anomalies for the first 30 minutes of data collection because it does not yet have a baseline. After 30 minutes of live data collection, the baseline is automatically created. After that, it detects anomalies on an ongoing basis, while continuing to gather data and improve its model every 30 minutes.

    • Live data for initial AI training: A single set of training data used to define your AI model. Data collection takes place over a specified time period that starts when you create your integration.

      Note: Selecting this option causes the system to continue to collect data while the option is enabled; however, the data is collected for training only, and not for log anomaly detection. For more information about AI model training, including minimum and ideal data quantities, see Configuring AI training.

  2. Important: Keep in mind the following considerations when you select your data collection type:

    • Anomaly detection for your integration occurs if you select Live data for continuous AI training and anomaly detection.
    • Custom integrations cannot use historical data for AI training.
    • Different types of AI models have different requirements to properly train a model. Make sure that your settings satisfy minimum data requirements. For more information about how much data you need to train different AI models, see Configuring AI training.
  3. Click Next.

  4. On the Resource requirements page, you can review the slot usage for your log integrations to see if there are enough slots to fully support the integration for multizone high availability.

    If you set the Data collection toggle to On, you will see the resource management overview.

    • If your current usage and other usage are less than the provisioned slots, but the HA slots exceed the provisioned slots, you will be able to create the integration, but will see a warning that you do not have enough slots. The integration will not have multizone high availability.

    • If your projected usage exceeds the provisioned slots, you will not be able to create the integration because you do not have enough slots on your system for log data integrations.

    • If your total slots, including HA slots, are within the provisioned slots, the integration will have multizone high availability.

      Note: HA operation assumes high availability for three zones.

    If you set the Data collection toggle to Off, you will see a message stating that you need to enable logs data collection to see the resource management overview. When data collection is off, no slots are used by that integration.

  5. Click Done.

You have created a custom integration in your instance. After you create your integration, you must enable data collection to connect your integration with the AI capabilities of IBM Cloud Pak for AIOps. For more information about enabling your integration, see Enabling custom integrations.

To create more integrations (such as a ChatOps integration), see Configuring Integrations.

For more information about working with the insights provided by your integrations, see ChatOps insight management.

Customizing GET and POST integrations

GET integrations

If you customize the Service URL of your integration, you must include the start and end time in the URL with properly formatted timestamps (SimpleDateFormat). For epoch timestamps, milliseconds and seconds are supported. The following timestamp identifiers are supported in the Service URL field:

  • <start-epoch-ms>, <end-epoch-ms>. For example, http://exampleurl.com/v1/export?to=<end-epoch-ms>&from=<start-epoch-ms>.
  • <start-epoch-s>, <end-epoch-s>. For example, http://exampleurl.com/v1/export?to=<end-epoch-s>&from=<start-epoch-s>.
  • <start-SimpleDateFormat>, <end-SimpleDateFormat>. For example, http://exampleurl.com/v1/export?to=<end-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>&from=<start-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>.
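
As an illustration of how these identifiers are used, the following Python sketch fills in the epoch placeholders for one 60-second sampling interval. The URL is the example above, and the substitution logic is an assumption about how the placeholders are resolved.

import time

def fill_timestamps(url_template, start_s, end_s):
    """Replace the supported epoch placeholders with concrete timestamps."""
    return (url_template
            .replace("<start-epoch-ms>", str(int(start_s * 1000)))
            .replace("<end-epoch-ms>", str(int(end_s * 1000)))
            .replace("<start-epoch-s>", str(int(start_s)))
            .replace("<end-epoch-s>", str(int(end_s))))

end = time.time()
start = end - 60  # default sampling rate of 60 seconds
print(fill_timestamps(
    "http://exampleurl.com/v1/export?to=<end-epoch-ms>&from=<start-epoch-ms>",
    start, end))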

POST integrations

For POST integrations, you do not need to customize the Service URL value. For these integrations, the payload is the JSON body of the request; you can customize its format, but you must include the start and end times with properly formatted timestamps (SimpleDateFormat). For epoch timestamps, milliseconds and seconds are supported. The following timestamp identifiers are supported in the POST JSON field:

  • <start-epoch-ms>, <end-epoch-ms>. For example, {"queryString":"exampleQuery","start":"<start-epoch-ms>", "end":"<end-epoch-ms>","isLive":false}
  • <start-epoch-s>, <end-epoch-s>. For example, {"queryString":"exampleQuery","start":"<start-epoch-s>", "end":"<end-epoch-s>","isLive":false}
  • <start-SimpleDateFormat>, <end-SimpleDateFormat>. For example, {"query": {"range": {"@timestamp": {"gte": "<start-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>","lt": "<end-yyyy-MM-dd'T'HH:mm:ss.SSSXXX>"}}},"size": 10000}
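
For a POST integration, the same substitution applies to the JSON body. The following Python sketch uses the first example body; the query string is a placeholder, and the substitution logic is, again, an assumption.

import json
import time

end_ms = int(time.time() * 1000)
start_ms = end_ms - 60 * 1000  # default sampling rate of 60 seconds

body_template = ('{"queryString":"exampleQuery","start":"<start-epoch-ms>", '
                 '"end":"<end-epoch-ms>","isLive":false}')
body = (body_template
        .replace("<start-epoch-ms>", str(start_ms))
        .replace("<end-epoch-ms>", str(end_ms)))
print(json.loads(body))  # confirm that the filled-in body is still valid JSON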

Custom mappings

Custom mappings allow more flexibility regarding timestamp settings. You must provide the mapping information about the timestamp field and how to parse the timestamp. Custom mappings support unnested JSON inputs.

The basic building blocks are similar to other integrations. The rolling_time, instance_id_field, log_entity_types, and message_field mappings are defined in the same way. For custom integrations, an extra timestamp_settings mapping must be specified. It contains the required timestamp_field, which identifies the field that contains the timestamp, and two optional fields, pattern and multiplier. The pattern field must be used when the timestamps must be parsed from a String representation, and the multiplier field must be used when the timestamp is represented as the Double type.

When timestamps are formatted correctly, as 13-digit UNIX® timestamps, you do not need to specify the pattern or multiplier fields. You need to define the timestamp_field field as in the following example:

{
    "rolling_time": 10,
    "instance_id_field": "_app",
    "log_entity_types": "kubernetes.container_name",
    "message_field": "message",
    "timestamp_settings":{
      "timestamp_field":"_ts"
    }
}

Custom mapping with String type timestamps

When timestamps are represented as Strings, they must be parsed and converted to 13-digit UNIX timestamps. In this case, the pattern field must contain the parsing pattern. For example, if the timestamp is 2020-12-31 23:59:59 PST, the integration mapping can be defined as follows:

{
    "rolling_time": 10,
    "instance_id_field": "_account",
    "log_entity_types": "kubernetes.container_name",
    "message_field": "_line",
    "timestamp_settings":{
      "timestamp_field":"_ts",
      "pattern": "yyyy-MM-dd hh:mm:ss z"
    }
}

Custom mapping with Double type timestamps

Timestamps might not be represented in milliseconds. In this case, you must multiply or divide the input timestamps to obtain milliseconds. The following example illustrates how you can use the multiplier field to obtain milliseconds.

Consider the following source timestamp, which is expressed in seconds rather than milliseconds:

{
   "_ts": 1569873609.539,
   "_account": "...",
   "_cluster": "...",
   "_host": "...",
   "_logtype": "...",
   "_line": "...",
   "_id": "...",
   "__key": "...",
}

You can set the multiplier field in the mapping to convert the timestamps to milliseconds:

{
    "rolling_time": 10,
    "instance_id_field": "_account",
    "log_entity_types": "kubernetes.container_name",
    "message_field": "_line",
    "timestamp_settings":{
      "timestamp_field":"_ts",
      "multiplier": 1000
    }
}
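
To summarize how the two optional settings relate, the following Python sketch approximates the conversions that the pattern and multiplier fields describe. The strptime format is only a rough Python equivalent of the SimpleDateFormat pattern, and UTC is assumed for the String example.

from datetime import datetime, timezone

def string_timestamp_to_ms(value, fmt):
    """Parse a String timestamp into a 13-digit epoch timestamp in milliseconds."""
    # UTC is assumed here for simplicity; the integration itself honors the
    # zone designator in the SimpleDateFormat pattern.
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

def double_timestamp_to_ms(value, multiplier):
    """Scale a Double timestamp (for example, seconds) into milliseconds."""
    return int(value * multiplier)

print(string_timestamp_to_ms("2020-12-31 23:59:59", "%Y-%m-%d %H:%M:%S"))  # 1609459199000
print(double_timestamp_to_ms(1569873609.539, 1000))                        # 1569873609539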

Custom mapping with static instance ID support

You can input a static value for the instance_id_field in the IBM Cloud Pak for AIOps pipeline. See the following example:

{
   "rolling_time":10,
   "instance_id_field":"%aStaticInstanceId%",
   "log_entity_types":"kubernetes.container_name",
   "message_field":"_line",
   "timestamp_settings":{
      "timestamp_field":"_ts",
      "pattern":"yyyy-MM-dd hh:mm:ss z"
   }
}

The custom mapping places the value that is contained between %...% in the instance_id_field field throughout the IBM Cloud Pak for AIOps data collection. The following example illustrates the resulting data from the preceding custom mapping:

{
   "message":"exampleLogMessage",
   "instance_id":"aStaticInstanceId",
   "entities":{
      "pod":"examplePod",
      "_cluster":"exampleCluster",
      "container":"exampleContainer"
   },
   ...,
   "timestamp":1605195377154
}
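
The following Python sketch is a minimal illustration of how the %...% convention could be interpreted when each record is processed; the function is hypothetical and the field names are taken from the earlier examples.

def resolve_instance_id(instance_id_field, record):
    """Use a static ID if the mapping value is wrapped in %...%; otherwise look up the field."""
    if instance_id_field.startswith("%") and instance_id_field.endswith("%"):
        return instance_id_field[1:-1]      # static value, used as-is
    return record.get(instance_id_field)    # otherwise treat it as a field name

print(resolve_instance_id("%aStaticInstanceId%", {"_line": "exampleLogMessage"}))  # aStaticInstanceId
print(resolve_instance_id("_account", {"_account": "exampleAccount"}))             # exampleAccount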

Supported custom-mapping fields

The following fields are supported in custom mappings. The default value for each field is shown in parentheses.

  • rolling_time: The recurring time interval, in seconds, to capture data from. (Default: 10)
  • instance_id_field: The field name in the incoming data that contains the application name. (Default: kubernetes.namespace_name, kubernetes.host, kubernetes.container_name)
  • log_entity_types: The field name in the incoming data that contains the entity to extract and analyze. (Default: kubernetes.container_name)
  • message_field: The field name in the incoming data that contains the log message. (Default: @rawstring)
  • standard_entity_types: Standardized fields. If declared, the performance for event grouping can be improved. (Default: none)
  • standard_entity_types.pod_name: Standardized field. If declared, the performance for event grouping can be improved. (Default: pod)
  • standard_entity_types.node_name: Standardized field. If declared, the performance for event grouping can be improved. (Default: node)
  • timestamp_settings: Customizable settings for timestamps from the source's raw log output. (Default: none)
  • timestamp_settings.timestamp_field: The field name in the incoming data that contains the data's timestamp. (Default: _ts)
  • timestamp_settings.pattern: (Optional) Must be used when the timestamps must be parsed from a String representation. Custom integrations support SimpleDateFormat for timestamps in incoming data. (Default: yyyy-MM-dd hh:mm:ss z)
  • timestamp_settings.multiplier: (Optional) Used when epoch timestamps are represented as a Double. (Default: none)

Enabling and disabling custom integrations

If you did not enable data collection when you created your integration, you can enable it afterward. You can also disable a previously enabled integration in the same way. If you selected Live data for initial AI training when you created your integration, you must disable the integration before AI model training. To enable or disable a created integration, complete the following steps:

  1. Log in to the IBM Cloud Pak for AIOps console.

  2. Expand the navigation menu (four horizontal bars), then click Define > Integrations.

  3. On the Manage integrations tab of the Integrations page, click the Custom integration type.

  4. Click the integration that you want to enable or disable.

  5. Go to the AI training and log data section. Set Data collection to On or Off to enable or disable data collection. Disabling data collection for an integration does not delete the integration.

Your integration is now enabled or disabled. For more information about deleting an integration, see Deleting custom integrations.

Editing custom integrations

After you create your integration, you can edit the integration. To edit an integration, complete the following steps:

  1. Log in to the IBM Cloud Pak for AIOps console.

  2. Expand the navigation menu (four horizontal bars), then click Define > Integrations.

  3. Click the Custom integration type on the Manage integrations tab of the Integrations page.

  4. On the Custom integrations page, click the name of the integration that you want to edit. Alternatively, you can click the options menu (three vertical dots) for the integration and click Edit. The integration configuration opens.

  5. Edit your integration as required. Click Save when you are done editing.

Your integration is now edited. If you have not enabled or disabled your integration, you can enable or disable it directly from the interface. For more information about enabling and disabling your integration, see Enabling and disabling custom integrations. For more information about deleting an integration, see Deleting custom integrations.

Deleting custom integrations

If you no longer need your custom integration and want to not only disable it, but delete it entirely, you can delete the integration from the console.

Note: You must disable data collection before deleting your integration. For more information about disabling data collection, see Enabling and disabling custom integrations.

To delete an integration, complete the following steps:

  1. Log in to the IBM Cloud Pak for AIOps console.

  2. Expand the navigation menu (four horizontal bars), then click Define > Integrations.

  3. Click the Custom integration type on the Manage integrations tab of the Integrations page.

  4. On the Custom integrations page, click the options menu (three vertical dots) for the integration that you want to delete and click Delete.

  5. Enter the name of the integration to confirm that you want to delete your integration. Then, click Delete.

Your integration is deleted.