IBM Streams 4.3.0

Clients disconnect from the ZooKeeper ensemble

Clients, such as IBM® Streams interfaces and services, might disconnect from ZooKeeper for various reasons. For example, this problem can occur if the network is down, the ZooKeeper server runs out of memory, or embedded ZooKeeper restarts.

Symptoms

If IBM Streams interfaces or services are disconnecting from ZooKeeper, you might see one or more of the following symptoms:
  • ConnectionLoss error in the ZooKeeper logs.
  • ConnectionLoss error at the command prompt after a streamtool command hangs and then fails.
  • ConnectionLoss error in the IBM Streams trace file after a service fails to start.
  • Automatic stop and restart of a service.
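
To confirm the log-related symptoms, it can help to count how often ConnectionLoss appears in a log or trace file. The following is a minimal sketch. The function name count_connection_loss and the path in the usage comment are hypothetical, so substitute the actual location of your ZooKeeper log or IBM Streams trace file.

```shell
# Hypothetical helper: print the number of lines in a file that mention
# ConnectionLoss.
count_connection_loss() {
    # grep -c prints the count of matching lines. It exits non-zero when the
    # count is 0, so "|| true" keeps callers that use "set -e" from aborting.
    grep -c 'ConnectionLoss' "$1" || true
}

# Usage (the path is a placeholder, not a documented location):
#   count_connection_loss /path/to/zookeeper.log
```

A count that grows between two runs suggests that clients are still being disconnected.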

Causes

  • The network might be down.
  • Disk input/output contention or swapping is occurring.
  • ZooKeeper is running on the same host as other services that are input/output intensive or CPU intensive.
  • JVM garbage collection is running too long.
  • The session timeout value is too small or is not configured correctly.
  • Out-of-memory errors occurred. If IBM Streams stores more data in ZooKeeper than the Java heap can hold, the server runs out of memory and all of the clients are disconnected.
  • The embedded ZooKeeper monitor automatically restarted the embedded ZooKeeper server.
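
For the swapping cause in the list above, one quick check on a Linux host is to see how much swap space is in use. The following sketch reads the SwapTotal and SwapFree fields from /proc/meminfo. The function name swap_used_mb is hypothetical, and the optional file argument exists only so that the function can be exercised against a sample file.

```shell
# Hypothetical helper: print the amount of swap in use, in MB.
swap_used_mb() {
    meminfo=${1:-/proc/meminfo}
    # Both fields are reported in kB; convert the difference to whole MB.
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { unused = $2 }
         END { print int((total - unused) / 1024) }' "$meminfo"
}

# Usage on the ZooKeeper host:
#   swap_used_mb
```

A nonzero result shows only that swap has been used at some point, not that the host is swapping right now. The si and so columns of the vmstat command show current swap activity.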

Resolving the problem

  • Check for network failures.

  • Run the external ZooKeeper server on a dedicated machine.

  • Use a dedicated disk for the transaction log, especially when the ZooKeeper log file contains warnings about high fsync timings.

    • External ZooKeeper: The ZooKeeper Administrator’s Guide recommends placing the dataLogDir directory on a dedicated disk, separate from the dataDir directory. Set the dataLogDir parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.

    • Embedded ZooKeeper: Set the streams.zookeeper.property.dataLogDir bootstrap property by using the streamtool setbootproperty command.

  • The default Java™ heap size for ZooKeeper is the JVM default for the system. When OutOfMemoryError messages indicate that the maximum heap size is not sufficient for the ZooKeeper runtime system and data, you can increase the heap size.

    • External ZooKeeper: Set the JVMFLAGS environment variable. The following example shows how to set a maximum heap size of 1 GB:
      1. In the ZooKeeper-installation-directory/conf directory, create a java.env file.
      2. Add export JVMFLAGS="-Xmx1024m" to the file.
      3. Start the external ZooKeeper server.

    • Embedded ZooKeeper: Set the streams.zookeeper.jvmFlags property. The following example shows how to set a maximum heap size of 1 GB:
      1. Enter the following command:
          streamtool setbootproperty streams.zookeeper.jvmFlags=-Xmx1024m
      2. Start embedded ZooKeeper.

  • To avoid swapping, ensure that the Java heap size is smaller than the amount of unused physical memory on the host.

  • Tune the garbage collection flags to minimize GC pauses. For more information, see your JVM vendor documentation.

  • If the ZooKeeper log file contains session timeout messages that are similar to the following example, first check other factors such as network latency and resource contention. If necessary, consider increasing the session timeout value.
    INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session session_id, timeout of 40000ms exceeded

    • External ZooKeeper: To increase the session timeout value, update the value of the maxSessionTimeout configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.
      Note: The session timeout that the client requests is configured by using the STREAMS_ZK_SESSION_TIMEOUT_SEC environment variable. The default value is 60 seconds. The actual, negotiated session timeout is limited by the server minSessionTimeout and maxSessionTimeout configuration parameters.

    • Embedded ZooKeeper: To increase the session timeout value, update the value of the streams.zookeeper.property.maxSessionTimeout bootstrap property. By default, the value of this property is set to 180000 ms. To check the value of this property, use the streamtool getbootproperty -a command. To update the property value, use the streamtool setbootproperty command.

  • External ZooKeeper: If quorum members are falling out of the quorum, first check other factors such as network latency and resource contention. If necessary, increase the value of the syncLimit configuration parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.

  • If embedded ZooKeeper was restarted automatically, the embedded ZooKeeper monitor detected that the server was no longer serving requests and restarted it. First, check other factors such as network failure and resource contention. If necessary, increase the interval of the monitor checks by updating the value of the streams.zkmonitor.checkIntervalMs bootstrap property. By default, the check occurs every 50000 ms. To check the value of this property, use the streamtool getbootproperty -a command. To update the property value, use the streamtool setbootproperty command.
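
The session timeout items above depend on how ZooKeeper negotiates the timeout: the server limits the value that the client requests to the range between minSessionTimeout and maxSessionTimeout. The following sketch reproduces that rule so that you can estimate the effective timeout before you change a setting. The function name negotiated_timeout_ms and the sample limits (4000 ms and 40000 ms) are illustrative assumptions, not values read from your configuration.

```shell
# Hypothetical helper: print the session timeout that results when the server
# limits a requested value to its configured range. All values are in ms.
negotiated_timeout_ms() {
    requested=$1
    min=$2    # server minSessionTimeout
    max=$3    # server maxSessionTimeout
    if [ "$requested" -lt "$min" ]; then
        echo "$min"
    elif [ "$requested" -gt "$max" ]; then
        echo "$max"
    else
        echo "$requested"
    fi
}

# Usage with assumed sample limits:
#   negotiated_timeout_ms 60000 4000 40000
```

With these sample limits, a client request of 60000 ms (the 60-second default) results in 40000 ms. In that situation, raising only the client-side value has no effect; the server maxSessionTimeout value must also increase.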