No restart for dedup-aggregationservice due to OOMKilled
No new actions or new groupings are displayed for events in the Alert Viewer because the de-duplicator goes into CrashLoopBackOff or fails to restart.
Problem
When you run the following command:
kubectl describe po -l app.kubernetes.io/component=dedup-aggregationservice
you might see the following output:
Reason: OOMKilled
Exit Code: 0
A message similar to the following can be seen in the kubectl logs for the de-duplicator:
{"name":"clients.kafka","hostname":"pvt-ibm-hdm-analytics-dev-dedup-aggregationservice-b4df85bmsd4c","pid":17,"level":30,"brokerStates":{"0":"UP","1":"UP","2":"UP","-1":"UP"},"partitionStates":{"ea-actions.0":"UP","ea-actions.1":"UP","ea-actions.2":"UP","ea-actions.3":"UP","ea-actions.4":"UP","ea-actions.5":"UP"},"msg":"lib-rdkafka status","time":"2020-02-28T09:55:11.258Z","v":0}
{"name":"clients.kafka","hostname":"pvt-ibm-hdm-analytics-dev-dedup-aggregationservice-b4df85bmsd4c","pid":17,"level":50,"err":{"message":"connect ETIMEDOUT","name":"Error","stack":"Error: connect ETIMEDOUT\n at Socket.<anonymous> (/app/node_modules/ioredis/built/redis/index.js:275:31)\n at Object.onceWrapper (events.js:286:20)\n at Socket.emit (events.js:198:13)\n at Socket._onTimeout (net.js:442:8)\n at ontimeout (timers.js:436:11)\n at tryOnTimeout (timers.js:300:5)\n at listOnTimeout (timers.js:263:5)\n at Timer.processTimers (timers.js:223:10)","code":"ETIMEDOUT"},"msg":"Error from redis client","time":"2020-02-28T09:55:11.955Z","v":0}
Resolution
As a workaround, run the following command:
kubectl -L redis-role get pod | grep redis
You should have exactly one controller node. If there is no controller node, or there is more than one controller node, you must scale the redis server statefulset down to 0 and then scale it back up to 3.
Note: Scaling of stateful sets in the product is not supported.
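The check and the scale-down/scale-up step can be sketched as two small shell helpers. This is a rough illustration under stated assumptions, not a supported procedure: the function names are hypothetical, the statefulset name must be supplied by you (it is not given here), and the exact value of the redis-role label that marks the controller pod in your deployment is assumed to be passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: counts pods whose last column equals the given role.
# With "kubectl get pod -L redis-role", the label value is printed as the
# last column, so the input is expected to be that command's output.
# The role value to look for is an assumption; check your own pod labels.
count_redis_role() {
  role="$1"
  awk -v role="$role" '$NF == role { n++ } END { print n + 0 }'
}

# Hypothetical helper: scales a statefulset to 0 and then back to 3.
# The statefulset name is a placeholder argument; find the real name
# with "kubectl get statefulset". Nothing is executed until you call it.
rescale_redis() {
  sts="$1"
  kubectl scale statefulset "$sts" --replicas=0 &&
  kubectl scale statefulset "$sts" --replicas=3
}

# Example usage against a live cluster (not run here):
#   n=$(kubectl -L redis-role get pod | grep redis | count_redis_role controller)
#   [ "$n" -eq 1 ] || rescale_redis <redis-server-statefulset-name>
```

Keeping the count separate from the rescale makes it possible to inspect the result before changing anything, which matters because scaling stateful sets is otherwise unsupported.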