IBM Operations Analytics - Log Analysis, Version 1.3.2

Duplication of log records on the SCALA server

Occasionally, when the logstash agent is restarted while the monitored log file is still being updated (for example, by a streaming log), logstash ingests the entire file again rather than resuming from the position where it stopped monitoring.

Symptoms

The problem results in duplicate log records on the SCALA server.

Causes

Several problems have been reported on the logstash forum (https://logstash.jira.com/secure/Dashboard.jspa) indicating that the sincedb pointer (which tracks the last monitored position in the log file) is sometimes not updated correctly. In addition, using control-C to terminate the logstash agent does not always kill the process, leaving behind a "phantom" logstash agent that continues to monitor log files. This can also result in duplicate log records.
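
To check whether a stale sincedb pointer is the cause, you can inspect the sincedb files directly. In logstash releases of this vintage, each line of a sincedb file typically records the inode, the major and minor device numbers, and the last-read byte offset of a monitored file. A minimal inspection sketch, assuming the default $HOME location:

    ls -la $HOME/.sincedb_*    # list the sincedb databases logstash has created
    cat $HOME/.sincedb_*       # each line: inode, major device, minor device, byte offset

If the recorded byte offset does not match the position that was actually shipped to the SCALA server, re-ingestion (and therefore duplication) can occur on restart.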

Resolving the problem

  1. A workaround to avoid duplicate log records after restarting logstash is to set the sincedb_path parameter in the file plugin to /dev/null. This tells logstash not to track the last-monitored file position and instead to always start monitoring from the end of the file. Note, however, that logstash will then ignore any updates made to the log file while the logstash agent is down. For example, in logstash-scala.conf, update:

    input {
        file {
            type => "apache"
            path => ["/tmp/logs/myapache.log"]
            sincedb_path => "/dev/null"
        }
    }

    Before restarting logstash after making these configuration changes, you may also want to clean up any sincedb databases that were already created, as shown in the sketch below. By default, sincedb databases are stored in the $HOME directory, with file names that start with ".sincedb_".
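
    A minimal cleanup sketch, assuming the default $HOME location and that the logstash agent has already been stopped:

       ls -la $HOME/.sincedb_*    # review the sincedb databases before removing them
       rm -f $HOME/.sincedb_*     # delete them so no stale positions are carried over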

  2. When terminating the logstash agent using control-C, verify that the logstash java process was actually terminated. You can use the following command to see if logstash is still running:
       ps -ef | grep logstash
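
     If a leftover process is still listed, the sketch below shows one way to clean it up; the <PID> value is a placeholder for the process ID reported by ps above:

       kill <PID>                 # request a normal shutdown of the phantom agent
       ps -ef | grep logstash     # confirm that the process has exited
       kill -9 <PID>              # force termination only if it refuses to exit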

