Creating Kafka connections
A Kafka connection can provide either event or log data. Log data can be used to establish a baseline of normal behavior and then identify anomalies. Event data enables the analysis and processing of different types of alerts and events. Anomalies that are identified can be correlated with alerts and events and published to your ChatOps interface to help you determine the cause and resolution of a problem.
Custom connections can route only a single source of data at a time. Unlike the custom connection type, you can use the Kafka connection type to collect log data and event data from different systems, and then route that information through a forwarding agent, such as Sysdig or Fluent Bit, to an Apache Kafka topic.
You can also use the Kafka connection to enable training with an offline, historical data set. For more information about using Kafka topics for offline training, see Importing offline log data.
Note: The Kafka replication factor is set to one replica by default. If you are implementing a production deployment of IBM Cloud Pak for Watson AIOps, you might lose data if your Kafka pods fail or restart. If the data collection is enabled in your Kafka connection when the Kafka pods go down, you might experience a gap in the data that your connection generated during that down period.
For more information about working with Kafka connections, see the following sections:
- Creating Kafka connections
- Importing offline log data
- Enabling and disabling Kafka connections
- Editing Kafka connections
- Deleting Kafka connections
Creating Kafka connections
Unlike other connection types, you can use Kafka connections to push properly formatted historical training data to IBM Cloud Pak for Watson AIOps. To create a Kafka connection, complete the following steps:
- Log in to IBM Cloud Pak Automation console.
- Expand the navigation menu (four horizontal bars), then click Define > Data and tool connections.
- On the Data and tool connections page, click Add connection.
- From the list of available connections, find and click the Kafka tile.
Note: If you do not immediately see the connection that you want to create, you can filter the tiles by type of connection. Click the type of connection that you want in the Category section.
- On the side panel, review the instructions and when you are ready to continue, click Connect.
- On the Add connection page, define the general connection details:
  - Name: The display name of your Kafka connection.
  - Description: An optional description for the Kafka connection.
  - Kafka partitions: The number of Kafka partitions. The default value of 1 might be suitable for importing a few log records. However, if you intend to import many records, increase this number. The range is 1 to 500. Ideally, this number matches the Base parallelism field. For example, for a proof of concept (PoC) deployment where you need to import a large data set, you can use 48 Kafka partitions and 48 Base parallelism Flink tasks.
  - JSON processing option: Select a JSON processing option. An illustration of the three options follows these steps.
    - None: The default option. The JSON is not processed or modified.
    - Flatten: This option flattens the JSON object by removing the opening and closing braces.
    - Filter: This option extracts the JSON object and replaces it with an empty string.
    For more information about the options, see Managing embedded JSON.
- Click Next.
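As an illustration of the JSON processing options, consider a hypothetical log message that carries an embedded JSON object. The message is invented for this example; the result that is shown for each option follows the option descriptions in the previous step:
Original message:  User login failed {"user": "alice", "code": 401}
None:              User login failed {"user": "alice", "code": 401}
Flatten:           User login failed "user": "alice", "code": 401
Filter:            User login failed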
Entering Field mapping information
Unlike other connection types, you can specify what kind of information you want to run through your Kafka connection. For example, if you want to collect data from multiple custom sources, you can set them up as separate Kafka connections.
- In the Field mapping section, set the Data source to specify the type of data incoming from the Kafka connection. You can select Events or Logs.
- Specify the Mapping type, which creates the field mapping for your specified log type.
  - Mapping types for Events include None, PagerDuty, and NOI.
  - Mapping types for Logs are None, ELK, Falcon LogScale, LogDNA, Custom, and Splunk.
  - If you have log data that is already normalized, choose None.
  - If your log data is in one of the supported log formats, such as ELK or Falcon LogScale, choose the corresponding log format.
  - For field mapping for Custom types, see Creating custom connections.
- The Topic name for a Kafka connection is predefined. Note this topic and send your data to it.
- For log connections, enter the Maximum number of logs per second, rounded to the nearest thousand. This value controls how many logs are sent from Kafka to IBM Cloud Pak for Watson AIOps. It is collected in increments of 1,000, up to a maximum of 18,000, and is rounded up to the nearest thousand; for example, a value of 2,500 is treated as 3,000.
- Set the Field mapping mode to Live data for initial AI training or to Live data for continuous AI training and anomaly detection.
  - You can improve search performance by mapping the fields from your implementation to standard IBM Cloud Pak for Watson AIOps fields. For more information about how field mappings are defined, see Mapping data from incoming sources.
  - For more information about using mappings to clean your data for use in IBM Cloud Pak for Watson AIOps, see Cleaning mapped data using regular expressions.
- In the Mapping section, if you use a Mapping type other than None, verify whether the JSON document that is automatically assigned to this field is valid. The validity depends on the structure of your log records. For example, for a PoC deployment, a default Field mapping for ELK can resemble the following mapping:
{
  "codec": "elk",
  "message_field": "@message",
  "log_entity_types": "@hostname, @bundleName, @context.Environment",
  "instance_id_field": "@properties.processtype",
  "rolling_time": 10,
  "timestamp_field": "@timestamp"
}
If you chose None for the Mapping type, each log record in your log files must conform to the following format. Set both the application_id and application_group_id values to "1000", and make sure that a 13-digit timestamp value (milliseconds since the epoch) is assigned to the timestamp field. A scripted sketch for producing records in this format follows these steps.
{
  "timestamp": 1581549409000,
  "utc_timestamp": "2020-02-12 23:16:49.000",
  "instance_id": "calico-node-jn2d2",
  "application_group_id": "1000",
  "application_id": "1000",
  "features": [],
  "meta_features": [],
  "level": 1,
  "message": "[64] ipsets.go 254: Resyncing ipsets with dataplane. family=\"inet\"",
  "entities": {
    "pod": "calico-node-jn2d2",
    "cluster": null,
    "container": "calico-node",
    "node": "kube-bmgcm5td0stjujtqv8m0-ai4itsimula-wp16cpu-00002c34"
  },
  "type": "StandardLog"
}
Important: The Events data that is collected must follow the Kafka Connection Normalized Event schema. For more information, see Normalized mapping rules.
- Click Next.
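If you need to produce records in the None format yourself, the following is a minimal sketch of one way to wrap a raw log line in that structure before you publish it to the connection's Kafka topic. It assumes that bash, jq, and date are available on your workstation; the message and instance values are placeholders, not values from your environment.
# Placeholder values; replace with fields from your own log source.
RAW_MESSAGE='[64] ipsets.go 254: Resyncing ipsets with dataplane. family="inet"'
INSTANCE_ID='calico-node-jn2d2'

# The timestamp field must be 13 digits: milliseconds since the epoch.
TS_MS=$(($(date +%s) * 1000))
UTC_TS=$(date -u +"%Y-%m-%d %H:%M:%S.000")

# Emit one normalized record per line; application_id and
# application_group_id are both set to "1000" as required.
jq -c -n \
  --argjson timestamp "$TS_MS" \
  --arg utc_timestamp "$UTC_TS" \
  --arg instance_id "$INSTANCE_ID" \
  --arg message "$RAW_MESSAGE" \
  '{timestamp: $timestamp, utc_timestamp: $utc_timestamp, instance_id: $instance_id,
    application_group_id: "1000", application_id: "1000",
    features: [], meta_features: [], level: 1, message: $message,
    entities: {pod: $instance_id, cluster: null, container: null, node: null},
    type: "StandardLog"}' >> normalized-logs.json
Each line in the resulting file can then be published to the predefined topic by using one of the methods that are described in Importing offline log data.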
Entering AI training and log data information
- If you want to enable the connection or if you want to enable data collection for AI training and log data, switch the Data collection toggle to On.
- Click Save to save your connection.
You created a Kafka connection in your instance, and you can now use the Kafka connection as the basis for your offline AI model training. After you create your connection, enable data collection so that the connection feeds data to the AI of IBM Cloud Pak for Watson AIOps.
- For more information about enabling your connection, see Enabling and disabling Kafka connections.
- To create more connections, such as a ChatOps connection, see Configuring data and tool connections.
- For more information about working with the insights provided by your connections, see ChatOps insight management.
- For more information about AI model training, see Configuring AI models.
Importing offline log data
Before you proceed with the following steps, make sure that data collection is enabled for the Kafka connection that you previously defined. You can import offline log data into IBM Cloud Pak for Watson AIOps in two ways. Both mechanisms publish the offline log data to the Kafka topic that is associated with the Kafka connection.
Using a Kafka utility to publish log data
- Install a Kafka utility, such as the kcat utility.
- Log in to the Red Hat® OpenShift® cluster where IBM Cloud Pak for Watson AIOps is installed.
- Run the following commands from the namespace where IBM Cloud Pak for Watson AIOps is installed:
oc extract secret/iaf-system-cluster-ca-cert --keys=ca.crt --to=- > ca.crt
export sasl_password=$(oc get secret cp4waiops-cartridge-kafka-auth-0 --template={{.data.password}} | base64 --decode)
export BROKER=$(oc get routes iaf-system-kafka-bootstrap -o=jsonpath='{.status.ingress[0].host}{"\n"}'):443
- Publish the log data to the Kafka topic. This example uses kcat. If you also use kcat, update the variables and run the following commands:
export KAFKA_TOPIC=<kafka topic>
export LOG_FILE=<log file>
kcat -X security.protocol=SASL_SSL -X ssl.ca.location=ca.crt -X sasl.mechanisms=SCRAM-SHA-512 -X sasl.username=cp4waiops-cartridge-kafka-auth-0 -X sasl.password=$sasl_password -X enable.ssl.certificate.verification=false -b $BROKER -P -t $KAFKA_TOPIC -l $LOG_FILE
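To confirm that the records reached the topic, you can optionally run kcat again in consumer mode with the same connection settings. This is a minimal sketch that assumes the variables from the previous steps are still set; it prints up to five records from the beginning of the topic and then exits:
# -C consumes instead of producing; -o beginning starts at the earliest offset;
# -c 5 stops after five messages and -e exits at the end of the topic.
kcat -X security.protocol=SASL_SSL -X ssl.ca.location=ca.crt \
  -X sasl.mechanisms=SCRAM-SHA-512 \
  -X sasl.username=cp4waiops-cartridge-kafka-auth-0 -X sasl.password=$sasl_password \
  -X enable.ssl.certificate.verification=false \
  -b $BROKER -C -t $KAFKA_TOPIC -o beginning -c 5 -e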
Copying log data into the Kafka cluster
Use this option only if you can't use a Kafka utility such as kcat for your environment. Complete the following steps to publish the log data to the corresponding Kafka topic by copying the log data into the Kafka cluster:
- Run the following commands from the namespace where IBM Cloud Pak for Watson AIOps is installed to obtain the Kafka password and Kafka broker. Record both values, which you need in a later step.
export kafka_password=$(oc get secret cp4waiops-cartridge-kafka-auth-0 --template={{.data.password}} | base64 --decode)
export kafka_broker=$(oc get routes iaf-system-kafka-bootstrap -o=jsonpath='{.status.ingress[0].host}{"\n"}'):443
echo $kafka_password
echo $kafka_broker
- Copy the log file to the Kafka cluster:
oc cp <log file> iaf-system-kafka-0:/tmp
- Check that the file is copied to the Kafka cluster from the namespace where IBM Cloud Pak for Watson AIOps is installed:
oc exec -it iaf-system-kafka-0 -- bash
ls /tmp/<log file>
exit
- Get the ca.crt certificate file:
oc extract secret/iaf-system-cluster-ca-cert --keys=ca.crt --to=- > ca.crt
- Convert ca.crt to Java keystore format (JKS):
keytool -import -trustcacerts -alias root -file ca.crt -keystore truststore.jks -storepass password -noprompt
- Copy the truststore file to the Kafka cluster:
oc cp truststore.jks iaf-system-kafka-0:/tmp
- Open a bash session in the Kafka pod again:
oc exec -it iaf-system-kafka-0 -- bash
- Create a file within the Kafka pod that is called producer.properties. The contents of the producer.properties file can resemble the following properties. Update the variable placeholders before you use this content. The ssl.truststore.location value points to the JKS file that you created earlier.
security.protocol=SASL_SSL
ssl.truststore.location=/tmp/truststore.jks
ssl.truststore.password=password
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="cp4waiops-cartridge-kafka-auth-0" \
  password="<kafka password>"
- From the pod, run the Kafka producer script to publish the log data to the corresponding Kafka topic. Update the variable placeholders:
/opt/kafka/bin/kafka-console-producer.sh --broker-list <kafka broker> --producer.config producer.properties --topic <kafka topic> < <log file>
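If you want to verify the publish from inside the same pod, one option is to run the Kafka console consumer and reuse the producer.properties file as the consumer configuration, because the same security settings apply to consumers. This is a minimal sketch; update the variable placeholders before you run it:
# Read up to five records from the beginning of the topic, then exit.
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <kafka broker> \
  --consumer.config producer.properties --topic <kafka topic> \
  --from-beginning --max-messages 5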
Enabling and disabling Kafka connections
If you didn't enable data collection when you created your connection, you can enable it afterward. You can also disable a previously enabled connection in the same way. If you selected Live data for initial AI training when you created your connection, disable the connection before AI model training. To enable or disable a created connection, complete the following steps:
- Log in to IBM Cloud Pak Automation console.
- Expand the navigation menu (four horizontal bars), then click Define > Data and tool connections.
- On the Manage connections tab of the Data and tool connections page, click the Kafka connection type.
- Click the connection that you want to enable or disable.
- Go to the AI training and log data section. Set Data collection to On or Off to enable or disable data collection. Disabling data collection for a connection does not delete the connection.
You enabled or disabled your connection. For more information about deleting a connection, see Deleting Kafka connections.
Editing Kafka connections
After you create your connection, you can edit the connection. For example, if you want to refine the field mapping for a Kafka connection from a Falcon LogScale source, you can edit it. To edit a connection, complete the following steps:
- Log in to IBM Cloud Pak Automation console.
- Expand the navigation menu (four horizontal bars), then click Define > Data and tool connections.
- Click the Kafka connection type on the Manage connections tab of the Data and tool connections page.
- On the Kafka connections page, click the name of the connection that you want to edit. Alternatively, you can click the options menu (three vertical dots) for the connection and click Edit. The connection configuration opens.
- Edit your connection as required. Click Save when you are done editing.
Your connection is now edited. If you did not previously enable or disable your connection, you can enable or disable it directly from the interface. For more information about enabling and disabling your connection, see Enabling and disabling Kafka connections. For more information about deleting a connection, see Deleting Kafka connections.
Deleting Kafka connections
If you no longer need your Kafka connection and want to not only disable it, but delete it entirely, you can delete the connection from the console.
Note: You must disable data collection before you delete your connection. For more information about disabling data collection, see Enabling and disabling Kafka connections.
To delete a connection, complete the following steps:
- Log in to IBM Cloud Pak Automation console.
- Expand the navigation menu (four horizontal bars), then click Define > Data and tool connections.
- Click the Kafka connection type on the Manage connections tab of the Data and tool connections page.
- On the Kafka connections page, click the options menu (three vertical dots) for the connection that you want to delete and click Delete.
- Enter the name of the connection to confirm that you want to delete your connection. Then, click Delete.
Your connection is deleted.