Alerting based on monitoring logs

A logs-based alerting component, ElastAlert, is part of the IBM FCI logging stack. With ElastAlert, you can add rules that monitor the logs and send alerts when a rule matches. IBM FCI does not ship a set of ready-made rules, with one exception: a rule that traps the errors and exceptions in the message log of the Case Manager fci-solution.

Procedure

To add a new rule to ElastAlert:

  1. Create a custom_rule_name.yaml file based on the ElastAlert Rule Types and Configuration Options documentation.
  2. To get the Elasticsearch query or filter right for the IBM FCI logs, test the Query DSL in Kibana under the Filebeat index pattern.
    Note: You can configure the alert section to perform useful tasks on rule violation. See #task_eyk_sc3_nhb for details.
  3. When the rule file is complete, follow these steps to add the rule to ElastAlert:
    1. Scale down the ElastAlert deployment:
      kubectl scale deploy fci-logging-elastalert --replicas 0
    2. Put the rule file in the Persistent Volume mount of ElastAlert on the nfsserver (fci-elastalert-data).
    3. Start the ElastAlert pod:
      kubectl scale deploy fci-logging-elastalert --replicas 1
      If the ElastAlert pod fails to start, check the logs using the following command:
      kubectl logs $(kubectl get pods -l app=logging-elastalert | grep elastalert | awk '{ print $1 }')
      The log message is generic and states only that the rule file is invalid. Verify the rule file again and repeat this process.
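
The steps above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not an IBM-supplied tool: `RULES_DIR` is a hypothetical placeholder for the directory where the fci-elastalert-data volume is mounted, the script assumes that both kubectl and that directory are reachable from the same host, and the `DRY_RUN` switch and the function names are introduced here only for illustration. The kubectl scale commands are the ones listed in the procedure.

```shell
#!/bin/sh
# Sketch: add a rule file to ElastAlert by following the procedure steps.
# ASSUMPTION: RULES_DIR is a hypothetical placeholder; set it to the directory
# where the fci-elastalert-data volume is mounted in your environment.
RULES_DIR="${RULES_DIR:-/path/to/fci-elastalert-data}"
DEPLOYMENT="fci-logging-elastalert"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Scale ElastAlert down, copy the rule file into the volume, scale back up.
add_rule() {
  rule_file="$1"
  run kubectl scale deploy "$DEPLOYMENT" --replicas 0 || return 1
  run cp "$rule_file" "$RULES_DIR/" || return 1
  run kubectl scale deploy "$DEPLOYMENT" --replicas 1 || return 1
}

# Show recent ElastAlert pod logs, for example when the pod fails to start.
show_logs() {
  run kubectl logs -l app=logging-elastalert --tail=100
}
```

With the default `DRY_RUN=1`, calling `add_rule custom_rule_name.yaml` only prints the three commands so that you can review them. Setting `DRY_RUN=0` executes them, and `show_logs` can then be used if the pod does not start.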

Results

After the pod is restarted, you can see the alerts as described in #task_eyk_sc3_nhb. Here is an example of a frequency-based rule:
```
# Rule name
name: Case-Manager-Exceptions
 
# Type of alert.
# The frequency rule type alerts when num_events events occur within the timeframe
type: frequency
 
# (Required)
# Index to search, wildcard supported
index: filebeat*
 
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 1
 
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  hours: 1
 
# realert prevents the same rule from alerting again within this amount of time (0 minutes sends every alert).
realert:
  minutes: 0
 
# The aggregation feature takes every alert that has occurred over a period of time and sends them together in
# one alert.
aggregation:
  hours: 2
 
# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- query:
    bool:
      must:
      - match:
          kubernetes.labels.app: "case-manager-fci-solution"
      - match:
          kubernetes.container.name: "message-log"
      - bool:
          should:
          - match_phrase:
              message: "exception"
          - match_phrase:
              message: "error"

# (Required)
# The alert is used when a match is found
alert:
  - command
command: ["echo", "case-manager error"]
```
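
The filter in the rule above can also be checked outside Kibana by sending the same bool query to the Elasticsearch `_search` endpoint. The following is a minimal sketch, assuming curl is available and Elasticsearch is reachable at `ES_URL`. That URL is a hypothetical placeholder, the function names are illustrative, and any authentication or TLS options that your cluster requires are not shown.

```shell
#!/bin/sh
# Sketch: run the rule's filter as a plain Query DSL search.
# ASSUMPTION: ES_URL is hypothetical; replace it with your Elasticsearch endpoint.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a Query DSL body equivalent to the filter section of the rule.
build_query() {
  cat <<'EOF'
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        { "match": { "kubernetes.labels.app": "case-manager-fci-solution" } },
        { "match": { "kubernetes.container.name": "message-log" } },
        { "bool": { "should": [
          { "match_phrase": { "message": "exception" } },
          { "match_phrase": { "message": "error" } }
        ] } }
      ]
    }
  }
}
EOF
}

# Send the query to the filebeat* indices and print the raw JSON response.
test_query() {
  build_query | curl -s -H 'Content-Type: application/json' \
    "$ES_URL/filebeat*/_search" -d @-
}
```

Calling `test_query` returns up to five matching log documents. If the response contains no hits for log lines that you expect to match, adjust the filter before you add the rule to ElastAlert.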

For detailed information about configuration parameters, refer to the ElastAlert documentation.