Inference service stops processing
The inference service stops processing and does not restart in your IBM® Netcool® Operations Insight® on Red Hat® OpenShift® deployment. Restart the pods to work around the issue.
Problem
The inference service throws a Redis stack trace and stops processing, but the pods remain running and are not restarted by the inference service's self-monitoring.
Example stack trace:
> INFO [2022-07-07 13:06:54,636] org.apache.kafka.clients.consumer.internals.AbstractCoordinator: [Consumer clientId=consumer-inferenceService-10, groupId=inferenceService] Member consumer-inferenceService-10-9e5bf334-0510-4741-9d87-908d4cc2d853 sending LeaveGroup request to coordinator evtmanager-kafka-4.evtmanager-kafka.noi.svc.cluster.local:9092 (id: 2147483643 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
> ERROR [2022-07-11 02:36:29,154] redis.clients.jedis.JedisSentinelPool: Lost connection to Sentinel at evtmanager-ibm-redis:26379. Sleeping 5000ms and retrying.
> ! java.net.SocketException: Connection timed out (Read failed)
> ! at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
> ! at java.base/java.net.SocketInputStream.read(Unknown Source)
> ! at java.base/java.net.SocketInputStream.read(Unknown Source)
> ! at java.base/java.net.SocketInputStream.read(Unknown Source)
> ! at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:195)
> ! ... 9 common frames omitted
> ! Causing: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Connection timed out (Read failed)
> ! at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:201)
> ! at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
> ! at redis.clients.jedis.Protocol.process(Protocol.java:141)
> ! at redis.clients.jedis.Protocol.read(Protocol.java:205)
> ! at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:297)
> ! at redis.clients.jedis.Connection.getRawObjectMultiBulkReply(Connection.java:242)
> ! at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:108)
> ! at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:102)
> ! at redis.clients.jedis.Jedis.subscribe(Jedis.java:2628)
> ! at redis.clients.jedis.JedisSentinelPool$MasterListener.run(JedisSentinelPool.java:290)
Resolution
To work around this issue, delete the inference service pods.
- Run the following command to delete the inference service pods (a combined example follows these steps):
  oc delete po <inferenceservice>
- Confirm that the pods have restarted and are in a Running state by running the following command:
  oc get po | grep inference
  Example output:
  evtmanager-ibm-hdm-analytics-dev-inferenceservice-f897cf7bhf2p9   1/1   Running   0   3m
  evtmanager-ibm-hdm-analytics-dev-inferenceservice-f897cf7bl2pn5   1/1   Running   0   29d
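For reference, the following is a minimal sketch of the complete workaround sequence. It assumes that you are logged in to the correct project with the oc CLI, that the inference service pod names contain the string "inference" (as in the example output), and that <new-inferenceservice-pod> is a placeholder for one of the recreated pod names in your deployment.

  # List the current inference service pods.
  oc get po | grep inference

  # Delete the inference service pods; the deployment recreates them automatically.
  # The pod names below are taken from the example output and differ in your deployment.
  oc delete po evtmanager-ibm-hdm-analytics-dev-inferenceservice-f897cf7bhf2p9 \
      evtmanager-ibm-hdm-analytics-dev-inferenceservice-f897cf7bl2pn5

  # Watch until the replacement pods reach the Running state.
  oc get po -w | grep inference

  # Optionally, tail the logs of a recreated pod to confirm that processing has resumed.
  oc logs -f <new-inferenceservice-pod>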