IBM Support

Pure Data Analytics, PTS Replication Error

Troubleshooting


Problem

PTS replication fails in certain high-volume environments with the error "ptsMonitor.cpp Caught exception Failed to watch directory".

Symptom

In high transaction volume environments, replication stops and the PTS monitor reports "ptsMonitor.cpp Caught exception Failed to watch directory".

Cause

The issue is caused by a known defect, tracked as RTC # 94760, and by the limit of 100,000 monitored directories (the value of the sysctl parameter fs.inotify.max_user_watches).

Environment

PDA replicated environment with the PTS server running NPS code earlier than 7.2.0.4, for example 7.2.0.3P2

Diagnosing The Problem

Check monitor.log on the PTS server. The following error is listed:


2015-03-05 08:24:29 27516 ptsMonitor.cpp: 93 ptsMonitor.cpp:317 Caught exception Failed to watch directory. The PTS monitor is currently watching 100000 directories. The value of sysctl parameter fs.inotify.max_user_watches is 100000.
2015-03-05 08:24:29 28259 ptsMonitor.cpp: 78 Creating PTS monitor.
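
The check above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function names are hypothetical, and the location of monitor.log must be supplied by the caller because this document does not state it. The first function counts occurrences of the error in a log file; the second prints the inotify watch limit currently in effect in the running kernel.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing the "Failed to watch directory" error.

# count_watch_failures LOGFILE
# Print how many lines of LOGFILE contain the PTS monitor error text.
count_watch_failures() {
    grep -c 'Failed to watch directory' "$1"
}

# current_watch_limit
# Print the inotify watch limit that the running kernel is using.
current_watch_limit() {
    cat /proc/sys/fs/inotify/max_user_watches
}
```

For example, calling count_watch_failures with the path to monitor.log prints a count greater than zero if the error has occurred, and current_watch_limit prints 100000 on a server that still has the value shown in the log entry above.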

Resolving The Problem

This issue has been validated and is resolved in later NPS releases. When this error is seen, you can raise the following parameter in the Linux /etc/sysctl.conf file so that replication can catch up.

Be sure to implement this on each PTS server.



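Set fs.inotify.max_user_watches to 500,000 in /etc/sysctl.conf, then reload the file as root:

<code> sysctl -p </code>

To change the running value immediately without editing the file, run the following as root:

<code> sysctl -w fs.inotify.max_user_watches=500000 </code>

Verify the setting by running <code> sysctl fs.inotify.max_user_watches </code> and by confirming that /etc/sysctl.conf contains the new value.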
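
As an illustration of making the change persistent, here is a minimal sketch that updates the entry in a sysctl configuration file. The function name set_max_user_watches is hypothetical, and the sketch assumes a POSIX shell with grep and sed available. It replaces an existing fs.inotify.max_user_watches line if one is present and appends a new line otherwise. It does not load the value into the kernel.

```shell
#!/bin/sh
# Hypothetical helper, not part of NPS or PTS.
# set_max_user_watches VALUE [FILE]
# Ensure FILE (default /etc/sysctl.conf) contains exactly the requested
# fs.inotify.max_user_watches setting.
set_max_user_watches() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    key='fs.inotify.max_user_watches'
    # Same key with the dots escaped, for use in regular expressions.
    key_re='fs\.inotify\.max_user_watches'

    if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$conf"; then
        # Entry exists: rewrite it in place, keeping a .bak copy of the file.
        sed -i.bak "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        # Entry missing: append it.
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}
```

Run as root on each PTS server, a call such as set_max_user_watches 500000 followed by sysctl -p would write the value and then load it. Passing a second argument lets you try the function on a copy of the file first.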


Document Information

Modified date:
17 October 2019

UID

swg21750424