IBM® Cloud Private logging

By default, IBM Cloud Private uses an ELK stack for system logs. You can also deploy more ELK stacks from the catalog to collect application logs.

ELK is an abbreviation for three products, Elasticsearch, Logstash, and Kibana, that are built by Elastic and together comprise a stack of tools that you can use to stream, store, search, and monitor logs. The ELK stack that is provided with IBM Cloud Private uses the official ELK stack images that are published by Elastic. By using the official images, you can use a standard software stack and be assured that upgrades and fixes are supported by the publisher.

While an ELK stack is provided by default, you can use another logging solution.

Standard architecture

Both the default ELK stack and ELK stacks that you deploy from the catalog use the standard Logstash and Elasticsearch architecture, which is shown in the following image:

Image of the standard Logstash and Elasticsearch architecture

Default system logging

By default, the IBM Cloud Private installer deploys an ELK stack to collect system logs for the IBM Cloud Private managed services, including Kubernetes and Docker. Managed services are deployed under the kube-system namespace. If you accepted the default installation values, then the default ELK stack and Filebeat daemonsets that collect container logs are deployed into that namespace. These services are managed as traditional Kubernetes deployments, so you can modify or uninstall these default services if necessary.

The hardware that you allocate to the ELK stack containers determines the stack's availability, scalability, and durability as it handles directed log traffic.

User-deployed ELK stack

You can deploy more ELK stacks from the catalog to any namespace. You can associate these ELK stacks with either a broad or narrow range of applications, depending on your business unit standards, privacy concerns, hardware (including storage) resource constraints, network traffic, or geographic location.

IBM Cloud Private does not limit the number of ELK stacks, or other log management tools, that you can deploy into the platform.

Planning resources for an ELK stack

By default, Elasticsearch stores its log data in the /opt/ibm/cfc/logging/elasticsearch directory of the management node where the ELK stack runs. You can change the default directory during installation by adding the following parameter to the config.yaml file: elasticsearch_path: <your_path>.
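
For example, a minimal excerpt of config.yaml with this override might look like the following sketch; the directory path is only an illustration, not a required value:

    # Excerpt from config.yaml; the path that is shown is an example value
    elasticsearch_path: /data/elk/elasticsearch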

Consider several factors when you plan the hardware capacity for an ELK stack, including:

  1. The number of applications that send logs to it
  2. The worst-case volume of logs that each application generates
  3. Any mandatory log retention periods
  4. Any recommended log retention requirements for specific applications, such as support requirements
  5. Any external log availability requirements

Because IBM Cloud Private deploys a standard ELK stack, review Elastic's hardware recommendations at https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html during the planning phase.

As Elastic's guide mentions, memory is one of the most heavily consumed resources. Given these requirements, consider allocating dedicated nodes to host ELK stacks in production environments.

Disk storage is probably the most critical factor during capacity planning. When the disk is full, no amount of computing power can process incoming logs. Even in small environments with only two or three worker nodes and a few deployments, it is reasonable to expect several GB of logs per hour.

The disk space that Elasticsearch requires correlates closely with the volume of logs that your applications generate. Allocate extra space as a buffer, and, when possible, allocate considerably more disk space than your estimate requires. You might encounter situations in which the expected log retention period must be extended for unusual circumstances, and the extra space prevents issues.
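
For example, assuming a hypothetical environment that generates 3 GB of logs per hour with a 7-day retention requirement, the raw log volume alone is roughly 3 GB × 24 × 7, or about 500 GB, before you account for Elasticsearch indexing overhead, replica shards, or an unplanned extension of the retention period.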

Configuring logging for applications

Container-collected logs

Many Docker-enabled applications write their logs to stdout and stderr. When the application is started as the container's main process, Docker automatically collects the two streams and stores them in a discrete log file for that container (typically under /var/lib/docker/containers/<container_id>).

Kubernetes offers Docker containers visibility to these log files via hostPath-mounted volumes. In short, a hostPath volume is a way of making a part of the host VM's disk accessible to the container. IBM Cloud Private deploys a pod to every worker node as a daemonset, mounts the path that stores the Docker container logs, and then streams the log files out to Logstash. Both the ELK stack that is deployed during installation and the stacks that you deploy from the catalog use Filebeat in the daemonsets to stream container logs from applicable nodes.
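
The following minimal sketch illustrates the hostPath pattern; the names, labels, and image tag are illustrative only and do not reproduce the exact daemonset that IBM Cloud Private deploys:

    # Illustrative daemonset that mounts the Docker log directory from each node
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: filebeat-ds-example
    spec:
      template:
        metadata:
          labels:
            app: filebeat-ds-example
        spec:
          containers:
          - name: filebeat
            image: docker.elastic.co/beats/filebeat:5.5.1
            volumeMounts:
            # Gives the pod read access to the container log files on the host
            - name: docker-containers
              mountPath: /var/lib/docker/containers
              readOnly: true
          volumes:
          - name: docker-containers
            hostPath:
              path: /var/lib/docker/containers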

To take advantage of this feature, learn how to start your application directly from the Dockerfile by using the CMD or ENTRYPOINT instructions.

Sidecar-collected logs

Most other applications store their logs locally in discrete files, which on Linux systems are typically stored somewhere under the /var/log/ directory. These files are not directly accessible to other containers, so you must configure another method to stream the logs out of the container. The most effective solution is to add another container to your pod (often called a sidecar) that has visibility to those logs and runs a streaming tool, such as Filebeat. The sidecar approach is effective because the main application container shares the folders in which logs are stored without the application itself being affected in any way.

Build a sidecar

Adding a sidecar to an existing deployment requires several steps, including configuring the Filebeat instance that runs in the sidecar container itself.

  1. Create a Filebeat ConfigMap and save it in a file named filebeat.yml. Using a ConfigMap makes it easier to update the Filebeat configuration for all applications that reference it. Most of the settings in the sample ConfigMap take environment variables as values, with the default value shown after the colon. Because of these defaults, you can tune parameter values per deployment: you set the values in that deployment's environment variables and reuse the same ConfigMap for many deployments. For more details and settings, see the Filebeat documentation.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        app: filebeat-daemonset
        component: filebeat
        release: myapp
      name: filebeat-sidecar-config
    data:
      filebeat.yml: |-
        filebeat.prospectors:
        - input_type: log
          encoding: '${ENCODING:utf-8}'
          paths: '${LOG_DIRS}'

          exclude_lines: '${EXCLUDE_LINES:[]}'
          include_lines: '${INCLUDE_LINES:[]}'

          ignore_older: '${IGNORE_OLDER:0}'
          scan_frequency: '${SCAN_FREQUENCY:10s}'
          symlinks: '${SYMLINKS:true}'
          max_bytes: '${MAX_BYTES:10485760}'
          harvester_buffer_size: '${HARVESTER_BUFFER_SIZE:16384}'

          multiline.pattern: '${MULTILINE_PATTERN:^\s}'
          multiline.match: '${MULTILINE_MATCH:after}'
          multiline.negate: '${MULTILINE_NEGATE:false}'

          fields_under_root: '${FIELDS_UNDER_ROOT:true}'
          fields:
            type: '${FIELDS_TYPE:kube-logs}'
            node_hostname: '${NODE_HOSTNAME}'
            pod_ip: '${POD_IP}'
          tags: '${TAGS:sidecar}'

        filebeat.config.modules:
          # Set to true to enable configuration reloading
          reload.enabled: true

        output.logstash:
          # Sends logs to the IBM Cloud Private managed Logstash by default
          hosts: '${LOGSTASH:logstash.kube-system:5044}'
          timeout: 15

        logging.level: '${LOG_LEVEL:info}'
    
  2. Open your existing Kubernetes deployment .yaml file.

  3. Locate the containers section declaration, and add a peer volumes section. See volumes in the Kubernetes documentation.
  4. Add volume clauses for the application container. In most cases, add an emptyDir volume for each log folder. For applications with more complex storage requirements, you can instead declare a persistent volume separately. Add a separate name and emptyDir declaration for each folder. The volume declaration resembles the following code:

     volumes:
     - name: <logs-volume>
       emptyDir: {}
    
  5. Add volume mount points to your application's container declaration. Note that the value of the name attribute must be the same as the value specified in the volumes declaration.

     containers:
     - name: my-app
       ...
       volumeMounts:
       - name: <logs-volume>
         mountPath: /var/log
    
  6. Add the sidecar container to the pod. Add the following declaration to the containers section:

    containers:
    - name: myapp-sidecar-filebeat
      image: docker.elastic.co/beats/filebeat:5.5.1
    
  7. Within the sidecar container section, define volume mount points for each of the shared volumes.

     volumeMounts:
     - name: logs-volume
       mountPath: /var/log/applogs
    
  8. Configure the sidecar to find the logs. The Filebeat ConfigMap defines an environment variable LOG_DIRS. You specify log storage locations in this variable's value each time you use the ConfigMap. You can provide a single directory path or a comma-separated list of directories.

     env:
       - name: LOG_DIRS
         value: /var/log/applogs/app.log
    
  9. Attach the Filebeat ConfigMap that you created by referencing it as a volume.

    containers:
    - name: myapp-sidecar-filebeat
      ...
      volumeMounts:
      ...
      - name: filebeat-config
        mountPath: /usr/share/filebeat/filebeat.yml
        subPath: filebeat.yml
    ...
    volumes:
    ...
    - name: filebeat-config
      configMap:
        name: filebeat-sidecar-config
        items:
          - key: filebeat.yml
            path: filebeat.yml
    
  10. Run Filebeat as the root user to prevent permission issues while reading logs.

     spec:
       securityContext:
         runAsUser: 0
       containers:
    

The completed application manifest might resemble the following example:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  securityContext:
    runAsUser: 0
  containers:
  - name: myapp-sidecar-filebeat
    image: docker.elastic.co/beats/filebeat:5.5.1
    env:
      - name: LOG_DIRS
        value: /var/log/applogs/app.log
      - name: NODE_HOSTNAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
    volumeMounts:
    - name: logs-volume
      mountPath: /var/log/applogs
    - name: filebeat-config
      mountPath: /usr/share/filebeat/filebeat.yml
      subPath: filebeat.yml
  - name: myapp-container
    [ your container definition ]
    volumeMounts:
    - name: logs-volume
      mountPath: /var/log
  volumes:
  - name: logs-volume
    emptyDir: {}
  - name: filebeat-config
    configMap:
      name: filebeat-sidecar-config
      items:
        - key: filebeat.yml
          path: filebeat.yml

Finalizing configuration

Before you deploy an application, make sure that the Filebeat configuration targets the correct Logstash instance. In the previous example, the default value of the output.logstash.hosts field is logstash.kube-system:5044. Customize this value to point to an existing Logstash service that runs in your Kubernetes environment. To obtain the Logstash service name and port number, review either the IBM Cloud Private user interface or the output of the kubectl get services command (see the CLI documentation).
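
For example, assuming a hypothetical Logstash service named my-logstash that listens on port 5044 in the elk namespace, you could override the default by setting the LOGSTASH environment variable on the sidecar container:

    containers:
    - name: myapp-sidecar-filebeat
      image: docker.elastic.co/beats/filebeat:5.5.1
      env:
        # Overrides the logstash.kube-system:5044 default from the ConfigMap
        - name: LOGSTASH
          value: my-logstash.elk:5044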

Testing the sidecar

Using the CLI, create the ConfigMap and deploy the application. For example:

  1. kubectl create -f filebeat.yml
  2. kubectl create -f myapp.yml
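
You can also run kubectl get pods to confirm that the pod started and that both the application container and the Filebeat sidecar are ready, for example 2/2 in the READY column.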

After you deploy the pod, locate the Kibana instance that is associated with your target Elasticsearch instance. If your environment contains multiple Kibana instances and you do not know which instance to use, run kubectl get services to find the one that you need.

In the Kibana dashboard, open the Discover tab and search for the logs by container name with a query of the form kubernetes.container_name:myapp-container, where myapp-container is your container name. Your application's log entries are displayed.

If the logs do not display after a short period, an issue might be preventing Filebeat from streaming the logs to Logstash. Review the output of the kubectl describe pod and kubectl logs commands to determine why the logs are not streaming. For example, if you followed the previous sidecar steps, you can run the kubectl logs myapp -c myapp-sidecar-filebeat command to retrieve the Filebeat stdout and stderr streams. They might contain error messages that help you troubleshoot the connection to Logstash.

Elasticsearch curator

When you plan for capacity, storage is one of the most critical considerations. When disk space runs out, no amount of processing power elsewhere in the stack can compensate. One way to keep log storage requirements within certain constraints is to use a curator. The ELK stack that is installed by default in IBM Cloud Private includes a curator, and you are free to modify it, or even remove it and separately deploy your own curator.

About curators

A curator removes data that is older than a specified age. Elasticsearch splits stored data into chunks, called indices, and the curator deletes indices that are older than the age that you specify. For more information about the Elasticsearch curator that is deployed by default in IBM Cloud Private, see the Curator Reference in the Elastic documentation.

Curator usage in IBM Cloud Private

The IBM Cloud Private installer deploys the same curator that is documented on the Elastic site. It is deployed as a separate pod and, by default, runs every night at midnight to remove indices that are older than one day.


Customizing the curator

You must modify three files to configure the curator:

  1. The cron file contains configuration information about when to run the curator process.
  2. The action file specifies which indices to clean and how old they can be before they are removed.
  3. The config file contains basic configuration for the curator itself, including the Elasticsearch endpoint and other general settings.

These files are stored in the Kubernetes ConfigMap es-curator in the kube-system namespace. When the curator is deployed, it maps the files that are specified in that ConfigMap to files on the curator container's file system. You can modify the ConfigMap, but your changes must retain the file's original structure.

To retrieve and update the ConfigMap:

  1. Configure the kubectl client. See Accessing your IBM Cloud Private cluster by using the kubectl CLI.
  2. Run kubectl get configmap es-curator -o yaml --namespace kube-system to output the contents of the ConfigMap, which consists of three major sections, each representing a separate file.
  3. Copy the contents of the output into a text file.
  4. Make your modifications. A sample action file follows this procedure.
  5. Run kubectl apply -f <path_to_updated_file>.
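
The following sketch shows what the action file portion of the ConfigMap might look like if, for example, you extend the retention period to seven days. It uses the standard Curator action file format, but the exact contents and the index prefix in your es-curator ConfigMap might differ, so always start from the output of the kubectl get configmap command:

    actions:
      1:
        action: delete_indices
        description: Delete log indices that are older than the retention period
        options:
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: logstash-        # assumes the default Logstash index prefix
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 7           # retention period in days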

Kibana

Kibana is a user interface that enables users to easily access and visualize the data that is stored in Elasticsearch. Extensive documentation is available on Elastic's website.

Kibana in IBM Cloud Private

You can deploy Kibana as a managed service during the IBM Cloud Private installation. When you deploy this service, the Kibana instance uses the official Docker image that Elastic publishes. In this image, X-Pack and related features are disabled, but all other functions are accessible. You can access Kibana by modifying the IBM Cloud Private console address: in that URL, replace /console with /kibana.
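
For example, if you reach the console at https://<master_ip>:8443/console, Kibana is typically available at https://<master_ip>:8443/kibana; the port depends on how your cluster was installed.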

If you deploy Kibana after you install IBM Cloud Private, refer to that deployment's documentation to determine the target URL. The features that are enabled in that deployment vary depending on the provider and the available options that you configure.

Initial configuration

Note: When Kibana initially starts, it requires several minutes to optimize its plug-ins. You cannot access Kibana during this process. See Updating & Removing Plugins in the Elastic documentation.

After Kibana launches, open it in your browser and configure the indices that you use in Elasticsearch queries. See Creating an Index Pattern to Connect to Elasticsearch in the Elastic documentation.
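
If your indices come from a default Logstash output, which writes to daily indices that have the logstash- prefix, an index pattern such as logstash-* is a reasonable starting point; verify the actual index names in your environment before you create the pattern.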

Standard fields

After you initially configure Kibana, users can open the Discover tab to search and analyze log data. The fields that Elasticsearch discovered to be part of the index or index pattern are displayed. Some of those fields, such as kubernetes.container_name, node_hostname, and pod_ip, are generated by Filebeat and Logstash as the logs are processed through the ELK stack.

With these fields, you can use Kibana to monitor or analyze logs at both small and large scopes. The scope can be as narrow as the container itself or as broad as every pod in the namespace.