Configuring Insight Packs that use the LFA to stream data and Log Analysis to annotate it

To integrate an Insight® Pack that uses the LFA to stream data and Log Analysis to annotate it with the scalable data collection architecture, you must adapt the Insight Pack configuration so that it is compatible with scalable data collection.

Before you begin

Configure the scalable data collection architecture. For more information, see Configuring scalable data collection.

About this task

The configuration that is described in this topic applies to Insight Packs that use the LFA to stream data and Log Analysis to annotate it, such as the WebSphere® Application Server Insight Pack.

In this task, you update the Receiver cluster configuration so that data that is received from the LFA is sent to Apache Kafka. You also update the Sender cluster configuration so that it pulls the data from Apache Kafka and sends it to Log Analysis.
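
At a high level, the data flows through the following path. If you use HAProxy to load balance the Receiver cluster, it sits between the LFA and the Receiver cluster:

LFA -> HAProxy -> Receiver cluster (Logstash) -> Apache Kafka -> Sender cluster (Logstash) -> Log Analysis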

Procedure

  1. Create a custom data source in Log Analysis. Choose an appropriate type, for example WASSystemOut, and complete the other fields.
    For example:
    Table 1. Example data source
    Data source field    Input
    Host name            PUNE_WAS
    File path            SystemOut
    Type                 WASSystemOut
    Name                 PUNE_WAS_SystemOut
  2. Configure the LFA.
    Update the format (.fmt) file to add the metadata fields for processing. For example:
    
    REGEX AllRecords
    (.*)
    hostname LABEL
    -file FILENAME
    RemoteHost DEFAULT
    logpath PRINTF("%s",file)
    type SystemOut
    module WAS
    site PUNE
    text $1
    END
    
    Add the host name and port of the HAProxy server or the Receiver cluster to the LFA configuration (.conf) file. For example:
    ServerLocation=<HAProxy_or_receiver_cluster_server>
    ServerPort=<HAProxy_or_receiver_cluster_port>
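
    For illustration, the completed lines might look like the following sketch. The log path, host name, and port are example values only; the LogSources parameter, which defines the log files that the LFA monitors, is shown for context:

    LogSources=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/server1/SystemOut.log
    ServerLocation=haproxy.example.com
    ServerPort=5529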

    For more information, see Configuring the Log File Agent.

  3. Update the Receiver cluster configuration.

    You must specify the required metadata so that topics and partitions can be created in Apache Kafka. For more information, see Configuring the Receiver cluster for single line logs.

    To update the Receiver cluster, complete these steps:
    1. Configure the matching pattern in the <Logstash_install_location>/<patterns_directory> directory where you store your patterns. This pattern matches the message and extracts the metadata fields. For example:
      WASLFAMESSAGE 
      <START>.*type='%{DATA:LFA_TYPE}';text='%{DATA:LFA_ORIG_MSG}'
      ;RemoteHost='%{DATA:LFA_REMOTE_HOST}';site='%{DATA:LFA_SITE}'
      ;hostname='%{DATA:LFA_HOSTNAME}';module='%{DATA:LFA_MODULE}'
      ;logpath='%{DATA:LFA_LOGNAME}';END
      This pattern must be specified on a single line in the patterns file.
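      For reference, an incoming LFA record that this pattern matches looks similar to the following example. The record is a single line, and the field values are illustrative only:
      <START>type='SystemOut';text='[2/27/17 10:15:33:071 IST] 00000001 SystemOut O Server started';RemoteHost='';site='PUNE';hostname='washost01';module='WAS';logpath='/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/server1/SystemOut.log';END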
    2. Update the Receiver cluster configuration to match the message and create the topic and partition information for Apache Kafka. For example:
      
      filter {
        # Parse LFA records and extract the metadata fields
        if [type] == "lfa" {
          grok {
            patterns_dir => ["<patterns_directory>"]
            match => [ "message", "%{WASLFAMESSAGE}" ]
            add_tag => ["grok_lfa"]
          }
          # Restore the original log message and build the topic and partition metadata
          if "grok_lfa" in [tags] {
            mutate {
              replace => [ "message", "%{LFA_ORIG_MSG}" ]
              add_field => [ "datasource", "%{LFA_SITE}_%{LFA_MODULE}_%{LFA_TYPE}" ]
              add_field => [ "resourceID", "%{LFA_HOSTNAME}_%{LFA_LOGNAME}_1" ]
            }
          }
        }
      }
      
    3. Update the output section of the Receiver cluster configuration to send data to the Apache Kafka brokers. For example:
      
      output {
              if ("grok_lfa" in [tags]) and !("_grokparsefailure" in [tags]) {
                  kafka {
                      bootstrap_servers => "<Kafka_broker_server1>:<Kafka_broker_port1>,.."
                      topic_id => "%{datasource}"
                      message_key => "%{resourceID}"
                  }
              }
      }
      
    In this example, the datasource field resolves to PUNE_WAS_SystemOut. The resourceID field is composed of the host name and absolute file path, which are unique for a specific log file. The datasource field is mapped to a topic in Apache Kafka, and the resourceID field is used as the message key, which determines the partition.
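    To verify that the topic is created and that data is arriving, you can use the Kafka console tools. The following commands are a sketch that assumes a ZooKeeper-based Kafka deployment, which matches the zk_connect setting that is used in the Sender cluster configuration. <Kafka_install_location> is the directory where Apache Kafka is installed:

    <Kafka_install_location>/bin/kafka-topics.sh --list --zookeeper <Zookeeper_host>:<Zookeeper_port>
    <Kafka_install_location>/bin/kafka-console-consumer.sh --zookeeper <Zookeeper_host>:<Zookeeper_port> --topic PUNE_WAS_SystemOut --from-beginning

    The first command lists the topics so that you can confirm that the PUNE_WAS_SystemOut topic exists. The second command reads records from the topic so that you can confirm that data is flowing from the Receiver cluster.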
  4. Update the Sender cluster configuration.
    1. Update the input section of the Sender cluster configuration so that it can receive data that is sent from the topic in Apache Kafka. For example:
      
      input {
              kafka {
                      zk_connect => "<Zookeeper_host>:<Zookeeper_port>"
                      group_id => "<Kafka_group_id>"
                      topic_id => "<Kafka_topic_id>"
                      consumer_threads => 5
                      consumer_restart_on_error => true
                      consumer_restart_sleep_ms => 100
              }
      }
      
      The group_id and the topic_id must match the values that are specified in the metadata.
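      For example, for the data source that is created in step 1, the input section might look similar to the following sketch. The ZooKeeper host, port, and group ID are illustrative values, and the topic_id matches the datasource value that the Receiver cluster creates:

      input {
              kafka {
                      zk_connect => "kafkazk01.example.com:2181"
                      group_id => "PUNE_WAS_SystemOut"
                      topic_id => "PUNE_WAS_SystemOut"
                      consumer_threads => 5
                      consumer_restart_on_error => true
                      consumer_restart_sleep_ms => 100
              }
      }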
    2. Update the filter section of the Sender configuration. Add the host and path fields to the message so that the message is mapped to the data source that is specified in Log Analysis. For example:
      
      filter {
              # Add the NO_OP tag to all events
              mutate {
                      add_tag => ["NO_OP"]
              }
              # Set the host and path fields for LFA events so that they map to the Log Analysis data source
              if "grok_lfa" in [tags] {
                      mutate {
                              replace => { "host" => "%{LFA_SITE}_%{LFA_MODULE}" }
                              add_field => { "path" => "%{LFA_TYPE}" }
                      }
              }
      }
      
      The host and path fields must match the Host name and File path values that you specified when you created the custom data source in step 1.
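      For the example data source in Table 1, these filters resolve to the following values:

      host = PUNE_WAS     (from %{LFA_SITE}_%{LFA_MODULE})
      path = SystemOut    (from %{LFA_TYPE})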
    3. Update the output section of the Sender cluster configuration with the Log Analysis plug-in information so that it can communicate with the Log Analysis server. For more information, see Streaming data with Logstash.
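      For example, the output section might look similar to the following sketch. It assumes the scala output plug-in that is provided by the Log Analysis Logstash integration; confirm the plug-in name, parameters, and URL against Streaming data with Logstash for your version:

      output {
              scala {
                      scala_url => "https://<Log_Analysis_server>:<port>/Unity/DataCollector"
                      scala_user => "<Log_Analysis_user>"
                      scala_password => "<Log_Analysis_password>"
                      scala_keystore_path => "<keystore_path>"
              }
      }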

    For more information, see Configuring the Sender cluster for single line logs.