Duplication of log records on the SCALA server
Symptoms
On occasion, when the logstash agent is restarted while the log file being monitored is updated (for example, by a streaming log), logstash ingests the entire file again rather than resuming from where it stopped monitoring, producing duplicate log records on the SCALA server.
Causes
The logstash file input plugin records the last-read position of each monitored file in a sincedb database. After a restart, logstash may fail to match the updated file against its recorded position and re-read the file from the beginning.
Resolving the problem
A workaround to avoid duplicate log records after restarting logstash is to set the sincedb_path parameter of the file input plugin to /dev/null. This tells logstash not to persist the last-monitored file position, so it always starts monitoring from the end of the file. Note, however, that logstash will then ignore any updates made to the log file while the logstash agent is down. For example, in logstash-scala.conf, update:
input {
  file {
    type => "apache"
    path => ["/tmp/logs/myapache.log"]
    sincedb_path => "/dev/null"
  }
}
Before restarting logstash after making these configuration changes, you may also want to clean up any sincedb databases that were already created. By default, the sincedb database files are stored in the $HOME directory and have filenames starting with ".sincedb_".
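Assuming the default $HOME location described above, the cleanup can be done with commands like the following sketch (verify the listed files are in fact logstash sincedb databases before removing them):

```shell
# List any sincedb databases left over from previous logstash runs.
# By default they live in $HOME with names starting with ".sincedb_".
ls -a "$HOME"/.sincedb_* 2>/dev/null

# Remove them so the restarted agent starts with a clean slate.
rm -f "$HOME"/.sincedb_*
```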
- When terminating the logstash agent using control-C, verify that the logstash java process was actually terminated. You can use the following command to see if logstash is still running:

  ps -ef | grep logstash
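If the ps output shows that the java process survived control-C, it can be terminated explicitly. A minimal sketch (the `grep -v grep` filter is added here to keep the grep command itself out of the output; `pkill -f` matches against the full command line):

```shell
# Check whether any logstash process is still running.
ps -ef | grep logstash | grep -v grep

# If the java process is still alive, terminate it explicitly.
pkill -f logstash
```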