Filtering OpenTelemetry logs
The OpenTelemetry Collector can filter the log data that it collects in many different ways. This document provides examples of how to filter logs in common scenarios. These features are included only in the OpenTelemetry Collector Contrib distribution, so the examples require that distribution. For more information, see the OpenTelemetry Collector's contrib documentation.
Filtering logs by their contents
The filter processor accepts regular expressions that are applied to the contents of log messages. Any log message that matches the given regular expression is dropped and is never forwarded to the server.
Examples
All these examples follow a similar pattern. A
filter section must be added to the
opentelemetry-collector's configuration file, and that filter must
be included in the logs/processors pipeline in the
same file. If you are using Helm to install the
collector, then the configuration goes in the config:
section of the values.yaml file on installation.
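The pattern described above can be sketched as follows for a Helm installation. This is a minimal illustration, not a complete values.yaml file: the filter name filter/example and its condition are hypothetical placeholders, and the receivers and exporters that your installation already defines are omitted.

```yaml
# values.yaml (sketch): the collector configuration is nested under the config: key.
config:
  processors:
    # Hypothetical filter; replace the name and the condition with your own.
    filter/example:
      error_mode: ignore
      logs:
        log_record:
          - 'IsMatch(body, ".*example.*")'
  service:
    pipelines:
      logs:
        processors:
          # A filter has no effect until it is listed in the pipeline.
          - filter/example
```

Helm replaces list values rather than merging them, so if you set the processors list this way, also list any other processors that the pipeline relies on.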
Excluding log messages that contain a particular substring
Consider a log file that contains a timestamp, a log level, and a message, such as this example:
2024-02-09 13:00:51 ERROR This is a test error message
2024-02-09 13:02:49 ERROR This is a test error message containing a secret
These logs can be matched more generally by using a filelog receiver configuration like this:
receivers:
  filelog/simple:
    include: [ /tmp/foo.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev
The given regular expression assigns the timestamp and the
severity to the named capture groups. The timestamp
layout is defined so that the collector understands
the format, which allows the correct timestamp from the log message
to be used as the timestamp of the record the collector sends to
the server. The severity is sent through unchanged.
This receiver must be added to the logs/receivers
pipeline:
pipelines:
  logs:
    receivers:
      - filelog/simple
This configuration reports all correctly formatted messages from the log back to the server.
However, one of the log messages contains a secret. To exclude
any messages that contain secrets, add a filter to the
processors section of the config. The following example uses
the OTTL IsMatch(..) function with a regular expression that
matches any string that includes the word secret.
processors:
  filter/remove_secret:
    error_mode: ignore
    logs:
      log_record:
        - 'IsMatch(body, ".*secret.*")'
Add this filter to the processors pipeline:
pipelines:
  logs:
    receivers:
      - filelog/simple
    processors:
      - filter/remove_secret
With this configuration in place, only the first of the example log lines is reported. The second line, which contains the word "secret", is dropped.
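For local experimentation, the fragments from this example can be assembled into one standalone configuration, sketched below. The exporter and the start_at setting are assumptions added for testing and are not part of the original example: the debug exporter prints the surviving records to the collector's own output, and start_at: beginning makes the filelog receiver read lines that are already in the file.

```yaml
receivers:
  filelog/simple:
    include: [ /tmp/foo.log ]
    # Assumption for testing: read existing lines instead of only new ones.
    start_at: beginning
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev

processors:
  filter/remove_secret:
    error_mode: ignore
    logs:
      log_record:
        - 'IsMatch(body, ".*secret.*")'

exporters:
  # Assumption for testing: print records locally instead of sending them to a server.
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers:
        - filelog/simple
      processors:
        - filter/remove_secret
      exporters:
        - debug
```

In a real deployment, replace the debug exporter with the exporter that sends data to your server.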
Excluding syslog messages from a particular service
Consider a syslog on a Linux® system. If someone uses the Gnome desktop on this system, it can create noisy logs:
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 31 with keysym 31 (keycode a).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 32 with keysym 32 (keycode b).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 33 with keysym 33 (keycode c).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 34 with keysym 34 (keycode d).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 35 with keysym 35 (keycode e).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 36 with keysym 36 (keycode f).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 37 with keysym 37 (keycode 10).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 38 with keysym 38 (keycode 11).
Feb 9 09:10:00 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b org.gnome.Shell.desktop[4771]: Window manager warning: Overwriting existing binding of keysym 39 with keysym 39 (keycode 12).
Feb 9 09:10:26 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b systemd[1]: fprintd.service: Succeeded.
As with the first example, the filelog receiver can
monitor this log and map the timestamp by using a regular
expression:
receivers:
  filelog/syslog:
    include: [ /var/log/syslog ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>[A-Za-z]{3}[ ]+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%b %e %H:%M:%S'
The timestamp looks different from the first example, but with
the correct regular expression and layout definition, it can be
understood. This log contains no severity. The receiver, as always,
must be included in the logs/receivers pipeline:
pipelines:
  logs:
    receivers:
      - filelog/simple
      - filelog/syslog
On a server system, one might want to monitor the system processes but exclude logs that the desktop environment generates. A regex filter can block all messages from a particular service:
processors:
  filter/remove_gnomeshell:
    error_mode: ignore
    logs:
      log_record:
        # message body will have the timestamp stripped off by the regex_parser, so it looks like:
        #   "hostname service[pid]: message"
        # Regex matches this as:
        #   [^ ]+                    <hostname> One or more non-whitespace characters, followed by
        #                            A space, followed by
        #   org.gnome.Shell.desktop  <service name> followed by
        #   [                        followed by
        #   [0-9]+                   <pid> one or more numeric digits, followed by
        #   ]:                       followed by
        #   .*                       <message> the rest of the log message
        - 'IsMatch(body, "[^ ]+ org.gnome.Shell.desktop\\[[0-9]+\\]:.*")'
Add the filter to the logs/processors pipeline:
logs:
  receivers:
    - filelog/simple
    - filelog/syslog
  processors:
    - filter/remove_secret
    - filter/remove_gnomeshell
Now, any log records from the
org.gnome.Shell.desktop service are dropped.
Filtering logs by infrastructure data
The filter processor is able to filter based on information
supplied in the resource attributes section of the
payload. This processor makes it possible to exclude all messages
from, for example, a particular Kubernetes pod, or an entire
Kubernetes namespace, or a particular host. For more information
about supported infrastructure data, see
infrastructure data.
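As a minimal sketch of this kind of filter, the following block drops every log record that originates from one Kubernetes namespace. The filter name and the namespace noisy-namespace are hypothetical, and the sketch assumes that a processor such as k8sattributes has already populated the k8s.namespace.name resource attribute earlier in the pipeline.

```yaml
processors:
  filter/remove_namespace:
    error_mode: ignore
    logs:
      log_record:
        # "noisy-namespace" is a hypothetical namespace name.
        - resource.attributes["k8s.namespace.name"] == "noisy-namespace"
```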
Filtering logs by log severity
There might be scenarios where containers are configured to generate logs with various severity levels. In such cases, you might want to filter out all messages except for ERROR and FATAL messages to avoid flooding the collector with unnecessary data.
Consider the following example logs:
[15:52:30 DEBUG] Some debug message.
[15:52:30 INFO] Some info message.
[15:52:30 ERROR] Some error message.
[15:52:30 FATAL] Some fatal message.
To filter the DEBUG and INFO logs by severity, you must first
set the severity_text field for each log record. This
can be accomplished by using the following example
transform processor:
transform/set_log_severity:
  log_statements:
    - context: log
      statements:
        - set(severity_text, "Debug") where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} DEBUG\\]")
        - set(severity_text, "Info") where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} INFO\\]")
        - set(severity_text, "Error") where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} ERROR\\]")
        - set(severity_text, "Fatal") where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} FATAL\\]")
If a log message matches none of these patterns, severity_text
is left as an empty string, which Instana interprets as a
severity level of None.
With severity_text set in the log record, you
can use the following sample filter processor to
drop logs with a severity level lower than ERROR and logs without
a severity level.
filter/remove_unnecessary_logs:
  logs:
    log_record:
      - IsMatch(severity_text, "^(|Debug|Info)$")
This example uses the
OTTL IsMatch(..) function to exclude logs whose
severity_text is an empty string, Debug, or Info.
Alternatively, you can specify the logs that you want to keep,
rather than the logs to exclude, by using the
not keyword. The following filter drops any log whose
severity is not Error or Fatal.
filter/remove_unnecessary_logs:
  logs:
    log_record:
      - not IsMatch(severity_text, "^(Error|Fatal)$")
In the service/pipelines/logs/processors section,
include the new filter:
processors:
  - resourcedetection
  - transform/set_log_severity
  - filter/remove_unnecessary_logs
  - batch
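A numeric comparison is another possible approach, sketched below as an assumption rather than as part of the original example. It sets the severity_number field by using the OTTL severity enums and then drops every record below ERROR. Records that match none of the patterns keep an unspecified severity number, which is also below ERROR, so they are dropped as well. The processor names are hypothetical, and how the server displays severity_number compared to severity_text is not covered here.

```yaml
processors:
  transform/set_log_severity_number:
    log_statements:
      - context: log
        statements:
          - set(severity_number, SEVERITY_NUMBER_DEBUG) where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} DEBUG\\]")
          - set(severity_number, SEVERITY_NUMBER_INFO) where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} INFO\\]")
          - set(severity_number, SEVERITY_NUMBER_ERROR) where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} ERROR\\]")
          - set(severity_number, SEVERITY_NUMBER_FATAL) where IsMatch(body.string, "\\[[0-9]{2}:[0-9]{2}:[0-9]{2} FATAL\\]")
  filter/remove_below_error:
    error_mode: ignore
    logs:
      log_record:
        # Drops DEBUG, INFO, and records with no recognized severity.
        - severity_number < SEVERITY_NUMBER_ERROR
```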
Filtering logs by resource attributes
The following is a non-exhaustive list of example resource attributes that can be used for log filtering:
Sample attributes collected by the k8sattributes processor and the resourcedetection processor:
- resource.attributes["k8s.pod.name"]: The name of the Kubernetes pod that generated the log message
- resource.attributes["k8s.container.name"]: The name of the underlying container that generated the log message
- resource.attributes["k8s.namespace.name"]: The name of the namespace that contains the pod that generated the log message
- resource.attributes["k8s.deployment.name"]: The name of the Kubernetes deployment object that controls the pod that generated the log message
- resource.attributes["k8s.node.name"]: The name of the Kubernetes node where the pod that generated the log message runs
- resource.attributes["k8s.pod.hostname"]: The name of the host where the process that generated the log message is running. If the collector is running inside a container, the hostname is typically the container name, and not the name of the actual underlying host
- resource.attributes["os.type"]: The operating system type of the host where the process that generated the log message is running
By using the
k8sattributes processor, the
resource processor, and the
transform processor, you can dynamically set user-defined
attributes in the resource.attributes mapping and
use them for customized log filtering.
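As a minimal sketch of a user-defined attribute, the following transform processor tags resources whose pod name starts with a given prefix, and a filter processor then drops the tagged records. The attribute name logs.discard, the pod name prefix load-test-, and both processor names are hypothetical. The sketch assumes that k8s.pod.name has already been populated and that the transform is listed before the filter in the pipeline.

```yaml
processors:
  transform/tag_noisy_pods:
    error_mode: ignore
    log_statements:
      # In the resource context, attributes[...] refers to the resource attributes.
      - context: resource
        statements:
          - set(attributes["logs.discard"], "true") where IsMatch(attributes["k8s.pod.name"], "^load-test-.*")
  filter/remove_tagged:
    error_mode: ignore
    logs:
      log_record:
        - resource.attributes["logs.discard"] == "true"
```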
Examples
All these examples follow a similar pattern. A
filter section must be added to the
opentelemetry-collector's configuration file, and that filter must
be included in the logs/processors pipeline in the
same file. If you are using Helm to install the
collector, then the configuration goes in the config:
section of the values.yaml file on installation.
Excluding log messages from a particular Kubernetes container
Suppose you want to filter out all log messages from a
Kubernetes container called calico-node. In the
processors section of the config, add a block like
this:
filter/remove_calico:
  error_mode: ignore
  logs:
    log_record:
      - resource.attributes["k8s.container.name"] == "calico-node"
Then in the service/pipelines/logs/processors
section, include the new filter:
processors:
  - resourcedetection
  - transform/set_log_severity
  - filter/remove_calico
  - batch
The new filter is listed just before the batch processor. The other processors that are shown are just for context, and are not needed for the filter processor to work.
Excluding log messages from all Linux® systems
If you want to block all log messages from any Linux®-based
system, then in the processors section of the config
you can add:
filter/remove_linux:
  error_mode: ignore
  logs:
    log_record:
      - resource.attributes["os.type"] == "linux"
Likewise, in the service/pipelines/logs/processors
section, building on the previous example, include the new
filter:
processors:
  - resourcedetection
  - transform/set_log_severity
  - filter/remove_calico
  - filter/remove_linux
  - batch
Excluding log messages from a particular Kubernetes pod or deployment with specific labels or annotations
Suppose you have the following Kubernetes deployment, and you want to filter out all of its log messages based on its labels and annotations:
apiVersion: apps/v1
kind: Deployment
metadata:
  [...]
spec:
  [...]
  template:
    metadata:
      labels:
        some-keyword-label: "ABCD-label-substring-ABCD"
      annotations:
        some-keyword-annotation: "ABCD-annotation-substring-ABCD"
If you want to filter by some-keyword-label or
some-keyword-annotation, you can extend the
k8sattributes processor as shown in the following example
to collect all the pod labels and annotations:
processors:
  k8sattributes:
    [...]
    extract:
      metadata:
        [...]
      ## Note: The '$$1' is a placeholder for the label or annotation name and will be used in the 'resource.attributes' mapping.
      labels:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
      annotations:
        - tag_name: $$1
          key_regex: (.*)
          from: pod
Alternatively, use the following example to extract only the
some-keyword-label label and the
some-keyword-annotation annotation:
processors:
  k8sattributes:
    [...]
    extract:
      metadata:
        [...]
      ## Note: In this example, the tag names are the same as the label and annotation names being extracted.
      ## The 'tag_name' corresponds to the key in the 'resource.attributes' mapping.
      ## The 'key' corresponds to the label or annotation you want to extract.
      labels:
        - tag_name: some-keyword-label
          key: some-keyword-label
          from: pod
      annotations:
        - tag_name: some-keyword-annotation
          key: some-keyword-annotation
          from: pod
Once you have configured the k8sattributes
processor to extract the desired labels and annotations, you can
use the filter processor to exclude log messages from pods whose
some-keyword-label label or
some-keyword-annotation annotation contains a given substring:
filter/filter_by_keyword:
  logs:
    log_record:
      ## Note: You can add as many filters as you would like if there are multiple labels/annotations.
      - IsMatch(resource.attributes["some-keyword-label"], ".*(label-substring).*")
      - IsMatch(resource.attributes["some-keyword-annotation"], ".*(annotation-substring).*")
In the service/pipelines/logs/processors section,
building on the previous example, include the new filter:
processors:
  - resourcedetection
  - filter/filter_by_keyword
  - batch
The new filter is listed just before the batch processor. The other processors that are shown are just for context, and are not needed for the filter processor to work.
Redacting sensitive information from logs
Log messages sometimes contain Personally Identifiable
Information (PII) or other sensitive data that must be kept
private and must not be sent to the server or saved. This information
might include passwords, credit card numbers, or other
secrets. The transform processor can
detect such information by using a regular expression and replace
it with something else.
Examples
These examples follow a similar pattern. A
transform section must be added to the
opentelemetry-collector's configuration file, and that transform
must be included in the logs/processors pipeline in
the same file. If you are using Helm to install the
collector, then the configuration goes in the config:
section of the values.yaml file on installation.
Removing passwords from log messages
Suppose that an application logs the supplied password whenever an authentication failure occurs. The log record might look like this:
2024-02-14 19:40:31 WARNING failed login for user bob, password=bobo
It is important to know that a failed login attempt occurred, but it is inappropriate to log the password that was used.
The application logs the password with a known format. The
pattern is always password=xyz. A regular expression
can detect that pattern and replace it with something else:
transform/redact_password:
  log_statements:
    - context: log
      statements:
        # Any log messages containing "password=xxx" or "passwd=yyy" will be matched.
        # Regex matches these as:
        #   passw      The literal string "passw", followed by
        #   (?:or)??   The literal string "or" 0 or 1 times (this allows either password or passwd), followed by
        #   d=         The literal string "d=", followed by
        #   [^\s]*     Zero or more non-whitespace characters (the password), followed by
        #   (\s?)      An optional whitespace character, marking the end of the password.
        - replace_pattern(body, "passw(?:or)??d\\=[^\\s]*(\\s?)", "password=REDACTED")
        - replace_pattern(attributes["msg"], "passw(?:or)??d\\=[^\\s]*(\\s?)", "password=REDACTED")
The replace_pattern directive occurs once for the
message body and once for the msg
attribute. The OpenTelemetry Collector puts the contents of the
message in both places, so both need to be updated.
Then add the transform to the logs/processors
pipeline:
logs:
  receivers:
    - otlp
    - filelog/simple
  processors:
    - transform/redact_password
The transform turns the initial log message into:
2024-02-14 19:40:31 WARNING failed login for user bob, password=REDACTED
Removing hostnames from log messages
Suppose that another application writes hostnames into the syslog:
Feb 15 15:52:36 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b service[45568]: This is a test error message from where.ever.ibm.com
If the hostnames are considered confidential, it is possible to
remove them from the log records before they are sent. If all the hosts are in the
same domain, in this example .ibm.com, then a regular
expression can recognize the pattern and obscure the names:
transform/remove_hostnames:
  log_statements:
    - context: log
      statements:
        # Any log message containing a hostname ending in ".ibm.com" will have the hostname removed.
        # Regex matches as:
        #   ([a-zA-Z0-9-_\.]+)   One or more letters, digits, dashes, underscores, or dots, followed by
        #   \.ibm\.com           The literal string ".ibm.com"
        - replace_pattern(body, "([a-zA-Z0-9-_\\.]+)\\.ibm\\.com", "<hidden hostname>")
        - replace_pattern(attributes["msg"], "([a-zA-Z0-9-_\\.]+)\\.ibm\\.com", "<hidden hostname>")
Again, both the body and the msg
attribute are rewritten since the log message text occurs in both
places.
And again, add the transform to the pipeline:
logs:
  receivers:
    - filelog/syslog
  processors:
    - transform/redact_password
    - transform/remove_hostnames
The message is then rewritten as follows:
Feb 15 15:52:36 li-8dc514cc-2e0d-11b2-a85c-f1d7ce42b83b service[45568]: This is a test error message from <hidden hostname>
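The same technique can be sketched for other kinds of sensitive data, such as the credit card numbers mentioned at the start of this section. The following block is an illustration only: the processor name and the replacement text are hypothetical, and the pattern is a simplistic assumption that matches 16 digits in groups of four, optionally separated by spaces or dashes. It does not validate card numbers and might also match unrelated 16-digit values.

```yaml
processors:
  transform/redact_card_numbers:
    log_statements:
      - context: log
        statements:
          # Simplistic pattern: four groups of four digits, optionally separated by a space or a dash.
          - replace_pattern(body, "\\b[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}\\b", "<redacted card number>")
          - replace_pattern(attributes["msg"], "\\b[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}\\b", "<redacted card number>")
```

As with the other transforms, the processor takes effect only after it is added to the logs/processors pipeline.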