Creating Elasticsearch, Fluentd, and Kibana (EFK) integrations
You can connect to and use an Elasticsearch, Fluentd, and Kibana (EFK) stack as a log aggregation tool for IBM Cloud Pak for AIOps.
As with an Elasticsearch, Logstash, and Kibana (ELK) integration, this type of integration collects log data to establish a baseline of normal behavior and then identify anomalies. These anomalies can be correlated with other alerts to help you determine the cause and resolution of a problem.
You can connect to an EFK stack as an alternative to connecting to an ELK stack.
There are multiple ways of hosting an EFK cluster, one of which is the Red Hat OpenShift Logging library. The following instructions reference this library; however, the same configuration also works with other hosted EFK stacks.
For more information about working with EFK integrations, see the following sections:
- About this task
- Prerequisites
- Creating EFK integrations
- Enabling and disabling EFK integrations
- Editing EFK integrations
- Deleting EFK integrations
- Troubleshooting
About this task
Before creating the integration, you should be aware of the following information.
-
Load: To prevent this integration from placing an inordinate load on your data source and potentially impacting your logging operations, the integration connects to only one API, with a default data frequency of 60 seconds. This frequency is controlled by the Sampling rate setting.
-
Access: The integration connects to a cloud-based REST API. Access is configured by using the authentication method that is specified in the Authentication type setting.
-
Data volume: Data volume depends on the application, and is not a set value. Therefore, it does not appear in the settings.
Prerequisites
As a prerequisite, you need to make sure that you have an Elasticsearch, Fluentd, and Kibana (EFK) stack set up on a cluster. There are many ways to set up the EFK stack. One option is to use the Red Hat OpenShift Logging library.
Note: The Red Hat OpenShift Logging library is being deprecated in OpenShift Container Platform 4.14.
Configure EFK with the Red Hat OpenShift Logging library:
-
Install the Red Hat OpenShift Logging library. For more information, see the Red Hat OpenShift documentation Installing the logging subsystem for Red Hat OpenShift.
When you are deploying the Red Hat OpenShift Logging (ClusterLogging) instance, use the following YAML definition as the base for your own definition.
Change the values for the storageClassName, maxAge, and size parameters depending on your own system. For example, if you are installing on Red Hat OpenShift on IBM Cloud (ROKS), use ibmc-file-gold-gid for the storageClassName. If you want to keep application logs for more than 15 days, update the maxAge value.

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 15d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "rook-cephfs"
        size: 80G
      resources:
        limits:
          memory: "16Gi"
        requests:
          memory: "16Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
-
Verify that all pods under the openshift-logging namespace are running by running the following command:

oc get pods -n openshift-logging
If your pods are not running, it can take several minutes for all pods to reach the Running state. For instance, the collector pods can go through several restarts before they are all running. You can have several pods for ClusterLogging, Elasticsearch, Fluentd (collector), and Kibana, similar to the following example output:

NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-84854b544c-tc758       1/1     Running     0          26m
collector-5hndx                                 2/2     Running     0          10m
collector-6sfk9                                 2/2     Running     0          10m
collector-f5w84                                 2/2     Running     0          10m
collector-fgzsr                                 2/2     Running     0          10m
collector-fhsbk                                 2/2     Running     0          10m
collector-fwdhk                                 2/2     Running     0          10m
collector-hj2q9                                 2/2     Running     0          10m
collector-l4v5q                                 2/2     Running     0          10m
elasticsearch-cdm-medm3kgq-1-74fb867496-swgwx   2/2     Running     0          22m
elasticsearch-cdm-medm3kgq-2-6cb7495ff5-h5bgd   2/2     Running     0          22m
elasticsearch-cdm-medm3kgq-3-5fb8f9f4-bgqgp     2/2     Running     0          22m
elasticsearch-im-app-27288900-b8j28             0/1     Completed   0          2m3s
elasticsearch-im-audit-27288900-btw2x           0/1     Completed   0          2m3s
elasticsearch-im-infra-27288900-ndjcb           0/1     Completed   0          2m3s
kibana-5586499d9b-45x7c                         2/2     Running     0          22m
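If you want to watch the pods settle instead of rerunning the command, one option (standard oc behavior, not specific to this procedure) is to stream status updates:

# Watch pod status changes until you interrupt with Ctrl+C.
oc get pods -n openshift-logging -w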
-
Optional: If you intend to limit the scope to a particular namespace in your cluster, you can complete the following steps:
-
Update the ClusterLogging instance managementState field from Managed to Unmanaged. This change can give you more control of the components that are managed by the Red Hat OpenShift Logging Operator. To change the setting, run the following command to edit the instance:

oc edit clusterlogging instance -n openshift-logging
-
Update the path variable in the collector configmap under the container logs section. You can use the following command to update the configmap:
oc edit configmap collector -n openshift-logging
As an alternative, you can also use the Red Hat OpenShift console. To use the console, go to Workloads > ConfigMaps. Then, from the Projects drop-down list, select openshift-logging. Select the collector configmap and edit the YAML. By default, this variable has the following value in the collector configmap:

path "/var/log/pods/*/*/*.log"
Change the value to reference your preferred namespace, for example:
path "/var/log/pods/qotd_qotd-*/*/*.log"
-
If you updated the collector ConfigMap, restart all collector pods (the collector daemonset):

oc get pods -n openshift-logging --no-headers=true | awk '/collector/{print $1}' | xargs oc delete -n openshift-logging pod
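The daemonset re-creates the deleted pods automatically. You can confirm that fresh collector pods reach the Running state with a quick filter:

# List only the collector pods and check their status and age.
oc get pods -n openshift-logging | grep collector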
-
Expose the log store service as a route. For more information, see the Red Hat OpenShift documentation Exposing the log store service as a route.
When you are completing this procedure, obtain the token for the cluster-logging-operator service account instead of your own Red Hat OpenShift Container Platform token. You must be logged in to the cluster where you installed the ClusterLogging instance.

token=$(oc sa get-token cluster-logging-operator -n openshift-logging)
echo $token
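The oc sa get-token command is deprecated in newer oc client releases. If your client no longer supports it, requesting a short-lived token is a reasonable substitute (verify the behavior against your oc version):

# Request a bound, short-lived token for the service account (oc 4.11 and later clients).
token=$(oc create token cluster-logging-operator -n openshift-logging)
echo $token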
-
When the token variable is set, run the following commands on your workstation to validate that you can connect to Elasticsearch:
routeES=`oc get route elasticsearch -o jsonpath={.spec.host} -n openshift-logging`
curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}"
curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/_search?scroll=1m"
If needed, you can run the following command to get the Elasticsearch version:
curl --insecure -H "Authorization: Bearer ${token}" https://${routeES}/
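The root endpoint returns a JSON document that includes the version. If jq is installed on your workstation (an assumption, not a requirement of this procedure), you can extract the version number directly:

# Pull only the Elasticsearch version number out of the root endpoint response.
curl -s --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/" | jq -r '.version.number'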
-
Index the Elasticsearch documents. This indexing is done automatically, but the process can take a few minutes in a new or updated cluster. Wait a few minutes for the changes to display in the Kibana UI.
When all configuration steps take effect, you can see entries in the Kibana UI for only the application that you intend to collect logs from.
Creating EFK integrations
With EFK set up, you can now create the integration in IBM Cloud Pak for AIOps.
For this integration, you select the tile for creating an Elasticsearch, Logstash, and Kibana (ELK) integration, but change some settings to connect to EFK instead of ELK.
To create the integration, complete the following steps:
-
Log in to IBM Cloud Pak for AIOps console.
-
Expand the navigation menu (four horizontal bars), then click Define > Integrations.
-
On the Integrations page, click Add integration.
-
From the list of available integrations, find and click the ELK tile.
Note: If you do not immediately see the integration that you want to create, you can filter the tiles by type of integration. Click the type of integration that you want in the Category section.
-
On the side panel, review the instructions and, when you are ready to continue, click Connect.
-
On the Add integration page, define the general integration details:
-
Name: The display name of your integration.
-
Description: An optional description for the integration.
-
ELK service URL: The Elasticsearch host and public API port. The URL must include the target index that IBM Cloud Pak for AIOps uses to search for the data from your applications. If you want to use all data, specify * as your index. For example, your ELK service URL can be https://myURL.com:8080/*.
If you are using the Red Hat OpenShift Logging library, then run the following command to find the hostname for the URL:
oc get routes -n openshift-logging | grep elastic
Example output:
elasticsearch elasticsearch-openshift-logging.apps.demo-apps.cp.mysite.com elasticsearch <all> reencrypt None
When you have your Elasticsearch hostname, you can build your URL. For example, https://elasticsearch-openshift-logging.apps.demo-apps.cp.mysite.com/app*. Ensure that you also replace /app* in the URL with the value for your EFK index pattern.
-
Kibana URL: Enter a URL for the service instance.
If you are using the Red Hat OpenShift Logging library, then run the following command to find the hostname for the URL:
oc get routes -n openshift-logging | grep kibana
-
Authentication type: Set this value to Token to indicate that the Elasticsearch instance is authenticated with a temporary token.
If you are using the Red Hat OpenShift Logging library, then run the following command to obtain the token for the cluster-logging-operator service account:

token=$(oc sa get-token cluster-logging-operator -n openshift-logging)
echo $token
-
Certificate (optional): Certificate used to verify the SSL/TLS connection to the REST service.
-
Filters (optional): A custom Boolean Query to filter the Elasticsearch request for your specific application, terms, keywords, or other filters. An example filter follows this list.
-
Time zone (optional): The time zone in which your data is situated. Times are converted from the system time relative to UTC. The default value is UTC.
-
Kibana port: The port of the Kibana instance that is on the same host as the Elasticsearch instance. Use 443 for this field.
-
Base parallelism: Select a value to specify the number of Flink jobs that can run in parallel. These jobs run to process and normalize the collected data. The default value is 1. However, it is recommended to use a value higher than 1 so that you can process data in parallel. This value cannot exceed the total available free Flink slots. In a small environment, 16 Flink slots are available, while in a large environment, the maximum is 32 slots. If you are collecting historical data with this integration, you can set this value to be equal to the source parallelism.
-
Sampling rate: The rate at which data is pulled from the live source, in seconds. The default value is 60.
-
JSON processing option: Select a JSON processing option.
- None: The default option. The JSON is not processed or modified.
- Flatten: This option flattens the JSON object by removing the opening and closing braces.
- Filter: This option extracts the JSON object and replaces it with an empty string.
- For more information about the options, see Managing embedded JSON.
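As a rough illustration of the Flatten and Filter descriptions, consider a log message with an embedded JSON object. The exact output formatting is determined by IBM Cloud Pak for AIOps, so treat this only as a sketch:

Original: disk report {"usage": "82%", "mount": "/var"}
Flatten:  disk report "usage": "82%", "mount": "/var"
Filter:   disk report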
Note: To improve data throughput, you can increase the base parallelism value incrementally. For more information about maximum base parallelism for starter and production deployments, see Improving data streaming performance for log anomaly detection.
Note: If you use the Filter option, do not use a timestamp in the filter query. Such a setting can cause a parsing error in the backend. For example, avoid a range clause like the following:

"range": { "@timestamp": { "gte": "now-2m", "lt": "now" } }

Other than that limitation, the filter can use any clauses, as long as the fields and values that are specified in the filter are relevant to the target endpoint data set.
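For example, the following filter scopes the request to one namespace and contains no timestamp clause. The field name follows the Kubernetes metadata fields that are used elsewhere on this page; adjust it to match your own data set:

{ "bool": { "must": [ { "match": { "kubernetes.namespace_name": "qotd" } } ] } }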
-
You can test your integration by clicking Test connection.
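If you also want to sanity-check the service URL from a terminal, a minimal sketch, assuming the token and routeES variables from the prerequisite steps and the app* index pattern from the example above:

# Ask Elasticsearch for a single document that matches the index pattern.
curl -s --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/app*/_search?size=1"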
-
Click Next.
-
Enter Field Mapping information (Optional):
You can improve search performance by mapping the fields from your implementation to the IBM Cloud Pak for AIOps standard fields.
Use the following mapping instead of the default mapping that is provided for an ELK integration:
{ "codec": "elk", "message_field": "message", "log_entity_types": "kubernetes.container_image_id, kubernetes.host, kubernetes.pod_name, kubernetes.namespace_name", "instance_id_field": "kubernetes.container_name", "rolling_time": 10, "timestamp_field": "@timestamp" }
-
Click Next.
-
Enter AI training and log data (Optional):
Select how you want to manage collecting data for use in AI training and anomaly detection. Click the Data collection toggle to turn on data collection, then select how you want to collect data:
-
Live data for continuous AI training and anomaly detection: A continuous collection of data from your integration is used to both train AI models and analyze your data for anomalous behavior.
Note: After an initial installation, there is no data at all in the system. If you select this option, then the three different log anomaly detection algorithms behave in the following ways:
-
Natural language log anomaly detection does not initially detect anomalies because no model has been trained. You can retrieve historical data (select Historical data for initial AI training) to speed up the retrieval of data to train on, or you can leave the Live data for continuous AI training and anomaly detection setting on. In the latter case, the system gathers training data live, and after a few days there is enough data to train a model. When this model is deployed, it detects anomalies as normal.
-
Statistical baseline log anomaly detection does not detect anomalies for the first 30 minutes of data collection because it does not yet have a baseline. After 30 minutes of live data collection, the baseline is automatically created. After that, it detects anomalies on an ongoing basis, while continuing to gather data and improve its model every 30 minutes.
-
Log Anomaly Golden Signals does not initially detect anomalies as no model has been trained. The system gathers training data live, and after a few days, there is enough data to train a model. When this model is deployed, it can detect anomalies that deviate from the normal.
-
Live data for initial AI training: A single set of training data used to define your AI model. Data collection takes place over a specified time period that starts when you create your integration.
Note: Selecting this option causes the system to continue to collect data while the option is enabled; however, the data is collected for training only, and not for log anomaly detection. For more information about AI model training, including minimum and ideal data quantities, see Configuring AI training.
-
Historical data for initial AI training: A single set of training data used to define your AI model. You need to give Start and End dates, and specify the parallelism of your source data. Historical data is harvested from existing logs in your integration over a specified time period in the past.
-
Start date: Select a start date from the calendar and enter the time in hh:mm (hours and minutes) format.
Note: The start date must not exceed 31 days from the present as the maximum time period for historical data collection is 31 days. The recommended time period is two weeks.
-
Time zone: Select your time zone from the dropdown list.
-
End date and time: Click Add end date and select an end date from the calendar and enter the time in hh:mm format.
Note: If you do not specify the end date, then live data collection follows the historical data collection. If you do not want to set an end date, click Remove end date.
-
Source parallelism (1-50): Select a value to specify the number of requests that can run in parallel to collect data from the source. Generally, you can set the value to equal the number of days of data that you want to collect. When you are setting this value, consider the number of requests that the source allows in a minute. For example, if only 1-2 requests are allowed, set the value to be low.
-
Important: Keep in mind the following considerations when you select your data collection type:
- Anomaly detection for your integration occurs if you select Live data for continuous AI training and anomaly detection.
- Different types of AI models have different requirements to properly train a model. Make sure that your settings satisfy minimum data requirements. For more information about how much data you need to train different AI models, see Configuring AI training.
-
Click Done.
You created the integration in your instance. After you create your integration, you must enable the data collection to connect your integration with the AI of IBM Cloud Pak for AIOps.
Enabling and disabling EFK integrations
If you did not enable your data collection during creation, you can enable your integration afterward. You can also disable a previously enabled integration in the same way. If you selected Live data for initial AI training when you created your integration, you must disable the integration before AI model training. To enable or disable a created integration, complete the following steps:
-
Log in to IBM Cloud Pak for AIOps console.
-
Expand the navigation menu (four horizontal bars), then click Define > Integrations.
-
On the Manage integrations tab of the Integrations page, click the ELK integration type.
-
Click the integration that you want to enable or disable.
-
Go to the AI training and log data section. Set Data collection to On or Off to enable or disable data collection. Disabling data collection for an integration does not delete the integration.
You enabled or disabled your integration. For more information about deleting an integration, see Deleting EFK integrations.
Editing EFK integrations
After you create your integration, you can edit it. For example, if you specified Historical data for initial AI training but now want your integration to pull in live data for continuous monitoring, you can edit the integration. To edit an integration, complete the following steps:
-
Log in to IBM Cloud Pak for AIOps console.
-
Expand the navigation menu (four horizontal bars), then click Define > Integrations.
-
Click the ELK integration type on the Manage integrations tab of the Integrations page.
-
On the ELK integrations page, click the name of the integration that you want to edit. Alternatively, you can click the options menu (three vertical dots) for the integration and click Edit. The integration configuration opens.
-
Edit your integration as required. Click Save when you are done editing.
Your integration is now edited. If your integration was not previously enabled or disabled, you can enable or disable it directly from the interface. For more information about enabling and disabling your integration, see Enabling and disabling EFK integrations. For more information about deleting an integration, see Deleting EFK integrations.
Deleting EFK integrations
If you no longer need your EFK integration and want to delete it entirely rather than only disable it, you can delete the integration from the console.
Note: You must disable data collection before you delete your integration. For more information about disabling data collection, see Enabling and disabling EFK integrations.
To delete an integration, complete the following steps:
-
Log in to IBM Cloud Pak for AIOps console.
-
Expand the navigation menu (four horizontal bars), then click Define > Integrations.
-
Click the ELK integration type on the Manage integrations tab of the Integrations page.
-
On the ELK integrations page, click the options menu (three vertical dots) for the integration that you want to delete and click Delete.
-
Enter the name of the integration to confirm that you want to delete your integration. Then, click Delete.
Your integration is deleted.