IBM Support

Auth IDP won't become Ready (infinite restarts)

Troubleshooting


Problem

The authentication identity provider (Auth IDP) in IBM Cloud Pak for Integration does not become ready because the "platform-auth-service" container repeatedly receives a SIGTERM signal during initialization, which results in an infinite loop of restarts. The issue can appear even though the rest of the environment is functioning properly.
The container is restarted because the readiness and liveness probes are triggered too early, before initialization has finished.
This technote proposes resolving the late startup issue by reviewing and increasing the current CPU and memory requests and limits through the OperandConfig.

Manually editing the Auth IDP deployment and increasing the initial delay of the readiness and liveness probes to 5 minutes or more allows "platform-auth-service" to finish initialization and become ready.

To do so, change the initialDelaySeconds parameters at the Deployment level:

name: platform-auth-service
readinessProbe:
  ...
  initialDelaySeconds: 300    # default is 60
  ...
livenessProbe:
  ...
  initialDelaySeconds: 500    # default is 180
However, this is only a temporary workaround, not a solution; a persistent setting at the custom resource (CRD) level is needed.

Symptom

The late startup problem persists: Auth IDP does not become Ready. The "platform-auth-service" container receives SIGTERM while initializing and is restarted in an infinite loop.

Cause

The readiness and liveness probes run too early, before the "platform-auth-service" container has finished initializing.
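
To illustrate the timing, the following is a sketch of a liveness probe stanza, not the actual Auth IDP definition. Only the 180-second default for initialDelaySeconds comes from this technote; periodSeconds and failureThreshold are shown with the Kubernetes defaults (10 and 3) as an assumption, and the real Deployment may set them differently.

```yaml
# Illustrative sketch only; not copied from the Auth IDP Deployment.
livenessProbe:
  initialDelaySeconds: 180   # default reported in this technote
  periodSeconds: 10          # assumption: Kubernetes default
  failureThreshold: 3        # assumption: Kubernetes default
```

With values like these, the first check runs about 180 seconds after the container starts, and three consecutive failures roughly 10 seconds apart cause the container to be restarted. If the Liberty server needs longer than that to start, the container is terminated before initialization completes and the cycle repeats.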

Environment

  • Product Version: Cloud Pak for Integration
  • OpenShift Container Platform: 4.10
  • Cloud Platform: vSphere, VMware

Diagnosing The Problem

As the system expands, it becomes apparent that simply adjusting the readiness and liveness probe timings may not provide a long-lasting solution. It is crucial to examine whether the existing CPU and memory requests and limits are adequate for the expanded system. Insufficient CPU and memory resources can result in delayed startup of the Liberty server, leading to failed probes.

Resolving The Problem

To address the persistent late startup problem, evaluate the CPU and memory resources allocated for the growing system. Follow the steps below to check the current CPU and memory requests and limits and adjust them through the OperandConfig:
1. Assess CPU and Memory Resource Requirements: Evaluate the demands of the system, considering factors such as increased workload and user traffic. Determine whether the current configuration meets the system's requirements or whether additional resources are necessary.
2. Review Current CPU and Memory Requests and Limits: Inspect the current CPU and memory requests and limits set for the system. Requests define the minimum resources reserved for a container, and limits define the maximum it is allowed to use.
3. Adjust through OperandConfig: To increase the CPU and memory resources, modify the OperandConfig settings. The OperandConfig allows granular control over resource allocation in the system.
4. Monitor and Test: After modifying the settings, closely monitor the system's performance and test thoroughly. Confirm that the changes resolve the late startup issue without adversely affecting other aspects of the system.
Example:
limits:
  cpu: '2'
  memory: 2Gi
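
As a rough sketch of where such limits might sit, the fragment below shows one possible OperandConfig layout. The resource name common-service, the namespace ibm-common-services, the service entry ibm-iam-operator, and the authentication.authService path are assumptions based on a typical foundational services installation and are not stated in this technote; check them against the OperandConfig in your own cluster before changing anything.

```yaml
# Sketch only: names and field paths below are assumptions, verify them
# against the OperandConfig in your own cluster before applying anything.
apiVersion: operator.ibm.com/v1alpha1
kind: OperandConfig
metadata:
  name: common-service            # assumed default name
  namespace: ibm-common-services  # assumed default namespace
spec:
  services:
    - name: ibm-iam-operator      # assumed service entry that manages Auth IDP
      spec:
        authentication:
          authService:            # assumed key for platform-auth-service
            resources:
              limits:
                cpu: '2'          # value from the example above
                memory: 2Gi       # value from the example above
```

Setting the values here rather than on the Deployment is intended to keep the change at the custom resource level, which is what the workaround in the Problem section lacks.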
Implementing these steps should address the late startup problem by providing the necessary CPU and memory resources for the growing system. Note that these recommendations are specific to the late startup issue; other performance-related concerns may require different solutions.

By investigating and adjusting the current CPU and memory requests and limits through operandconfig, the late startup issue in the growing system can be mitigated. It is important to regularly assess resource demands and make necessary revisions to ensure optimal performance and stability.

Document Location

Worldwide

Product Synonym

Cloud Pak for Integration, platform-auth-service, IBM Cloud Pak foundational services

Document Information

Modified date:
06 July 2023

UID

ibm17009999