Troubleshooting issues in containers

Errors related to installation, Persistent Volumes, SSL, and external services connectivity can occur during SIPEnvironment setup. Use this information to identify the cause and take corrective action.

Installation errors

  • Error message
    Unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .containers[0].runAsUser: Invalid value: 1000: must be in the ranges
    Cause
    This error occurs when a container attempts to run with a runAsUser value (1000 in this case) that is not allowed by the Security Context Constraints (SCC) that are assigned in OpenShift.
    Action
    The user UID range for the restricted or custom SCC must be configured at the namespace level to ensure that containers run with an appropriate UID within the permitted range.

    Verify that the UID range that is assigned to the namespace falls within the specified range, 1000/100. This ensures compliance with security constraints. Check the status of the probe job for errors.
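
As an illustration of the range check, OpenShift records the UID block of a project in the openshift.io/sa.scc.uid-range namespace annotation in the form start/size. The following Python sketch parses such a value and tests whether a requested runAsUser falls inside it. The helper names are hypothetical, and reading 1000/100 as start UID 1000 with a block size of 100 is an assumption about the value quoted above.

```python
def parse_uid_range(annotation_value: str) -> range:
    """Parse a 'start/size' UID block (assumed format) into a Python range."""
    start_text, size_text = annotation_value.strip().split("/")
    start, size = int(start_text), int(size_text)
    if start < 0 or size <= 0:
        raise ValueError(f"invalid UID range: {annotation_value!r}")
    return range(start, start + size)


def uid_allowed(run_as_user: int, annotation_value: str) -> bool:
    """Return True if run_as_user lies inside the namespace UID block."""
    return run_as_user in parse_uid_range(annotation_value)


if __name__ == "__main__":
    # UID 1000 is the first UID of the block 1000/100.
    print(uid_allowed(1000, "1000/100"))
    # UID 1000 lies outside a block that starts at 1000620000.
    print(uid_allowed(1000, "1000620000/10000"))
```

If the check fails for the UID named in the error message, the namespace range and the container runAsUser do not agree, which is the condition that the SCC validation rejects.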

  • Error message
    Failed to pull image “<image_repo>/image_name” manifest unknown
    Cause
    The specified image or tag does not exist in the registry. Check the event details of the failing pods for detailed errors.
    Action
    Ensure that all the required images that are listed in the image parameter of the SIPEnvironment custom resource are pushed to the specified repository. For more information, see image parameter.
  • Error message
    Failed to pull image “<image_repo>/image_name” Requesting bearer token: invalid status code from registry 400 (Bad Request)
    Cause
    The specified registry requires authentication, and the credentials might be missing or incorrect. Check the event details of the failing pods for detailed errors.
    Action
    Ensure that the correct authentication credentials are configured in the imagePullSecrets specified in the image parameter of the SIPEnvironment custom resource.
  • Error message
    Truststore job failure : Failed to prepare subPath for volumeMount "shared-volume" of container "sip-truststore-job"
    Cause
    The Persistent Volume (PV) lacks the necessary permissions for the container to access or modify the mounted path. Check the event details of the truststore job for detailed errors.
    Action
    Ensure that the PV and its corresponding Persistent Volume Claim (PVC) have the correct ownership and writable permissions.
  • Error message
    Truststore job failure : keytool error: java.io.IOException: keystore password was incorrect
    Cause
    The password that is provided for the truststore is incorrect, which prevents the import process from accessing the keystore. Check the logs of the truststore pod for detailed errors.
    Action
    Verify the truststore password and correct it in the Sterling Intelligent Promising secret that is specified in the SIPEnvironment custom resource.
  • Error message
    Exception in ES co.elastic.clients.elasticsearch._types.ElasticsearchException: [es/get] failed: [index_not_found_exception]
    Cause
    This issue occurs due to a state mismatch between Cassandra and Elasticsearch during installation. Specifically:
    • Cassandra already contains data from a previous installation or partial setup.
    • Elasticsearch is in a fresh state with no existing indexes.
    • When the system attempts to retrieve data from Elasticsearch that must correspond to Cassandra records, the required index is missing, leading to the index_not_found_exception.
    Action
    Ensure that the data in both services is consistent. You can delete the existing setup, including the Cassandra and Elasticsearch data, and restart the installation process to ensure a consistent state.
  • Error message
    Cassandra tables and keyspaces are deleted after a restart
    Cause
    This issue occurs because Cassandra storage is not correctly configured in the SIPEnvironment. As a result, Cassandra runs with ephemeral storage, and when the pod restarts, all stored data including keyspaces and tables is lost.
    Action
    Define persistent storage in the SIPEnvironment for middleware services such as Cassandra, Elasticsearch, and Kafka. For more information, see externalServices parameter.
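
For the registry authentication error described earlier in this section, it can help to know what an image pull secret contains. A Kubernetes secret of type kubernetes.io/dockerconfigjson stores a JSON document that is keyed by registry host. The following Python sketch builds that document so that it can be compared with the decoded content of the secret that imagePullSecrets references. The function name, registry host, and credentials are placeholders, not values from the product.

```python
import base64
import json


def build_docker_config(registry: str, username: str, password: str) -> str:
    """Return the JSON text stored under .dockerconfigjson for one registry."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {
                "username": username,
                "password": password,
                "auth": token,  # base64 of "username:password"
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder values only; substitute the registry used in the image parameter.
    print(build_docker_config("registry.example.com", "myuser", "mypassword"))
```

A registry key that does not match the host in the image reference, or an auth value that decodes to the wrong credentials, is a common reason for a pull to fail with an authentication error.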

Persistent Volume (PV) and Persistent Volume Claim (PVC) errors

  • Error message
    Waiting for PVC to be Bound
    Cause
    This issue occurs when a Persistent Volume Claim (PVC) is stuck in the pending state because it is not successfully bound to a Persistent Volume (PV). Check the status of the SIPEnvironment for detailed errors.
    Action
    Ensure that the following checks are complete.
    • Check the status of the PVC.
    • Verify that a Persistent Volume (PV) exists with the requested size, access mode, and storage class, and that it is in the Available state.
    • Ensure that the correct storage class is used.
    • Check for insufficient storage.
    • Verify namespace resource quotas.
    • Delete and re-create the PVC, if needed.
  • Error message
    Waiting for PVC to be Writable
    Cause
    This issue occurs when a Persistent Volume Claim (PVC) is bound to a Persistent Volume (PV) but the volume is not yet available for write operations. Check the status of the SIPEnvironment for detailed errors.
    Action
    Ensure that the mount path in the Persistent Volume (PV) has the correct permissions and is writable by the application.
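
The binding checks that are listed for the Waiting for PVC to be Bound error can be sketched as a small comparison. The following Python example uses simplified, hypothetical data classes rather than the Kubernetes API, and handles only the binary quantity suffixes Ki, Mi, Gi, and Ti. It treats a PV as a candidate when it is Available, has the same storage class, offers every requested access mode, and is at least as large as the request.

```python
from dataclasses import dataclass

_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '10Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


@dataclass
class Volume:
    capacity: str
    access_modes: frozenset
    storage_class: str
    phase: str = "Available"


@dataclass
class Claim:
    request: str
    access_modes: frozenset
    storage_class: str


def mismatches(pv: Volume, pvc: Claim) -> list:
    """Return the reasons why pv cannot satisfy pvc; an empty list means it can."""
    problems = []
    if pv.phase != "Available":
        problems.append(f"phase is {pv.phase}, not Available")
    if pv.storage_class != pvc.storage_class:
        problems.append(f"storage class {pv.storage_class} differs from {pvc.storage_class}")
    if not pvc.access_modes <= pv.access_modes:
        problems.append("requested access modes are not all offered")
    if to_bytes(pv.capacity) < to_bytes(pvc.request):
        problems.append(f"capacity {pv.capacity} is smaller than request {pvc.request}")
    return problems


if __name__ == "__main__":
    claim = Claim("10Gi", frozenset({"ReadWriteMany"}), "nfs")
    volume = Volume("5Gi", frozenset({"ReadWriteOnce"}), "nfs", phase="Released")
    for reason in mismatches(volume, claim):
        print(reason)
```

Returning a list of reasons, rather than a single true or false value, mirrors the checklist: each entry points at one item to correct before the claim can bind.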

SSL and external services connectivity errors

  • Error message
    Failed to send request (javax.net.ssl.SSLHandshakeException: Certificate signature validation failed)] Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    Cause
    This error occurs when Sterling Intelligent Promising attempts to establish a secure connection to middleware services such as Cassandra, Elasticsearch, or Kafka, but the SSL or TLS certificate validation fails. The failure occurs for one of the following reasons:
    • Missing or incorrect truststore configuration: Sterling Intelligent Promising is not configured with the correct truststore, or the truststore does not include the required certificates.
    • Expired or invalid certificate: The certificate is expired or does not match the hostname of the middleware service.
    • Intermediate CA certificates missing: The certificate chain is incomplete, leading to validation failure.
    Action
    • Ensure that the middleware service is using a certificate that is signed by a trusted CA.
    • Import the correct CA certificate into the Sterling Intelligent Promising truststore.
    • Verify the certificate validity.
    • Set the apps.sip.ibm.com/validate-external-services-connections annotation to true. The Operator then triggers a pre-deployment validation job that checks the connections to the external middleware instances, for development and production environments, before Sterling Intelligent Promising is deployed. If any connection fails, the process stops and the errors are written to the pod logs for debugging.
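
To illustrate the certificate validity check, the following Python sketch uses the standard library ssl module to compute how many days remain before a certificate expires. It assumes that the expiry timestamp is in the notAfter format that ssl.SSLSocket.getpeercert() returns. The function name and the sample date are hypothetical.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days remaining until the notAfter timestamp; negative if expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


if __name__ == "__main__":
    # Sample timestamp only; use the notAfter value of the middleware certificate.
    remaining = days_until_expiry("Jan  1 00:00:00 2030 GMT")
    print("expired" if remaining < 0 else f"{remaining:.0f} days remaining")
```

This covers only the expiry date. A certificate that is still valid can fail the handshake if its hostname does not match or if the chain is incomplete, so the other items in the list still apply.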