Troubleshooting

If you experience issues with your IBM Confidential Computing Containers for Red Hat OpenShift Container Platform (CCCO) deployment, the following information might help you identify and resolve the issues.

  • Missing seeds for the volume

    An encrypted volume depends on its seeds to access and preserve the data stored in that volume. If the seeds are deleted or lost, the system can no longer decrypt the existing data. As a result, the data in the encrypted persistent block volume cannot be recovered or retained.

    To continue using the volume, you must format the persistent block volume and then use it again as a new volume. Formatting removes the inaccessible data and prepares the volume for reuse.

  • Container creation failed

    The container image pull failed because the workload could not unpack layers due to insufficient storage.

    To resolve the issue, perform the following steps:
    1. Access the CCCO VM logs. For more information, see Enable debug logs and identify Kata/QEMU processes for Bare Metal.
    2. If the logs show No space left on device, increase the memory that is allocated to the pod by adding or updating the following annotation:
      io.katacontainers.config.hypervisor.default_memory

      For more information, see Configuring resources by using annotations.
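The annotation above can be sketched as part of a pod manifest. This is a minimal illustration, not the product's reference manifest: the function name render_pod_with_memory, the pod name, the runtime class, and the image placeholder are all hypothetical, and the value is assumed to be in MiB, as is usual for the Kata default_memory setting.

```shell
# Print a pod manifest that carries the Kata memory annotation.
# $1 = pod name, $2 = runtime class name, $3 = memory in MiB
# (all three are hypothetical inputs chosen for illustration).
render_pod_with_memory() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
  annotations:
    io.katacontainers.config.hypervisor.default_memory: "$3"
spec:
  runtimeClassName: $2
  containers:
    - name: $1
      image: <your-image>
EOF
}
```

For example, render_pod_with_memory busybox <runtime-class> 4096 prints a manifest that you can review and then apply with oc apply -f. Because the annotation is read when the pod sandbox is created, an existing pod most likely needs to be deleted and re-created for a new value to take effect.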

  • The contract is mandatory

    If you create a workload on an IBM Hyper Protect Confidential Container virtual machine (VM) without passing a contract, the VM starts running and then eventually shuts down.

  • Contract format is YAML

    The contract must follow the YAML format. If the format is invalid, the contract validation check fails, and the IBM Hyper Protect Confidential Container VM eventually shuts down. For more information, see Verifying the contract.

  • Contract schema

    If the contract schema is incorrect, the contract validation fails while the IBM Hyper Protect Confidential Container VM is booting, and the VM shuts down. You can check for errors that are associated with the ccco-contract log identifier in the IBM Logging instance.

  • Contract encryption key

    When you use an encrypted contract, make sure that you encrypt the Encrypted Multi-Persona Contract with the correct encryption key. If the key is incorrect, the bootloader fails to decrypt the contract, and you might see decryption failure messages in the serial console of the IBM Confidential Computing Containers VM.

  • Logging configuration failure

    When the IBM Confidential Computing Containers VM boots, monitor the serial console to identify any errors that are logged by the bootloader or from the logging service. If you don’t see any logs reaching the Logging instance, it might be because your logging configuration failed. Failure to configure logging also leads to the VM shutting down.

    • If the logging service is Syslog

      Ensure that the certificate and key are valid, and check that you can connect to the server by using OpenSSL (for example, with openssl s_client -connect).

      If the logging configuration is correct, the output should be as follows:

      depth=1 C = US, O = Logstash Test CA, CN = ca.example.org
      verify return:1
      depth=0 C = US, O = Rsyslog Test Server, CN = 192.168.122.153
      verify return:1
    • If the logging service is IBM Cloud Logs, check whether the logging hostname, port, and ingestion key that is provided in the contract are correct.
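For the Syslog case, the connection check can be sketched as a small wrapper around openssl s_client. This is an illustration under assumptions: the function name check_syslog_tls and the file names in the usage example are hypothetical, and port 6514 is only the conventional port for syslog over TLS, so use the port from your own contract.

```shell
# Open a TLS connection to the syslog server with the client certificate and key.
# $1 = server host, $2 = port, $3 = CA certificate, $4 = client certificate, $5 = client key
# (all file names are supplied by the caller; none are defined by the product).
check_syslog_tls() {
    # </dev/null closes stdin so that s_client exits after the handshake output.
    openssl s_client -connect "$1:$2" -CAfile "$3" -cert "$4" -key "$5" </dev/null
}
```

A call such as check_syslog_tls 192.168.122.153 6514 ca.crt client.crt client-key.pem should print a certificate chain with verify return:1 lines, similar to the output shown for the Syslog case, when the certificate, key, and server address are correct.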
  • Issues with IBM Secure Execution for Linux

    For information about troubleshooting issues with IBM Secure Execution for Linux, see Troubleshooting.

  • Initdata is incorrect

    When the initdata is incorrect, the IBM Hyper Protect Confidential Container fails to start. The logs cannot be sent to the external logging service and an error message is redirected to the console output.

    Check the logs to investigate the issue.

  • Invalid GZIP header

    The provided initdata is not compressed, which causes the container creation to fail. Ensure that the initdata is properly compressed before use. For more information about creating compressed initdata using gzip, see Creating initdata file.

  • Initdata annotation failed

    The initdata file is compressed more than once. This results in NULL bytes in the file, which causes the TOML parser to fail. Ensure that the initdata file is compressed only once. For more information about creating compressed initdata using gzip, see Creating initdata file.
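Both initdata compression problems (not compressed, or compressed more than once) can be detected before deployment by counting how many gzip layers wrap the file. The following is a minimal sketch; the function name count_gzip_layers is hypothetical, and it assumes that the file is the raw gzip output, before any further encoding such as base64.

```shell
# Print the number of gzip layers around a file:
# 0 = not compressed, 1 = compressed once (expected), 2 or more = compressed repeatedly.
count_gzip_layers() {
    current="$(mktemp)"
    next="$(mktemp)"
    layers=0
    cat "$1" > "$current"
    # Each successful decompression peels off one layer; stop at the first failure.
    # The cap of 10 is an arbitrary safety limit for this sketch.
    while [ "$layers" -lt 10 ] && gzip -dc < "$current" > "$next" 2>/dev/null; do
        layers=$((layers + 1))
        cat "$next" > "$current"
    done
    rm -f "$current" "$next"
    echo "$layers"
}
```

A result of 0 corresponds to the Invalid GZIP header case, and a result of 2 or more corresponds to the Initdata annotation failed case.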

  • Connection Refused to API Server

    The oc CLI cannot connect to the OpenShift API server when authentication is missing. As a result, the user on the bastion node cannot connect to the cluster.

    To authenticate, run the following command:

    oc login -u kubeadmin -p "$(cat /root/ansible_workdir/auth/kubeadmin-password)" --insecure-skip-tls-verify
  • Undefined variable 'storage' in Ansible playbook

    This error occurs when the storage variable is not defined in the inventory or in the host_vars/hostname.yaml file.

    • Define the storage variable as shown in the following example, and verify that the specified volume path is accessible to the user:

      storage:
          pool_path: /var/lib/libvirt/images/<user-name>
    • Ensure the path mentioned in host_vars/hostname.yaml is correct.
    • Run the playbook and check if the path is accessible.
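The accessibility check for the pool path can be sketched as follows. The function name check_pool_path is hypothetical, and the sketch assumes that "accessible" means the directory exists and that the current user can read, write, and enter it.

```shell
# Report whether a libvirt pool path exists and is accessible to the current user.
# $1 = the pool_path value from host_vars/hostname.yaml
check_pool_path() {
    if [ ! -d "$1" ]; then
        echo "missing: $1"
        return 1
    fi
    if [ ! -r "$1" ] || [ ! -w "$1" ] || [ ! -x "$1" ]; then
        echo "not accessible: $1"
        return 1
    fi
    echo "ok: $1"
}
```

Run it as the same user that runs the playbook, for example check_pool_path /var/lib/libvirt/images/<user-name>.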
  • KataConfig Resource Not Found

    The error indicates that the required Custom Resource Definitions (CRDs) are not yet available in the cluster. The KataConfig resource cannot be recognized until the CRDs are installed.

    You must apply the OperatorGroup and Subscription before you apply the KataConfig. To apply them, run the following command:

    oc apply -f Operatorgroup.yaml -f subscription.yaml
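After the OperatorGroup and Subscription are applied, the CRDs can take some time to appear. The following sketch polls for the CRD before you apply the KataConfig. The function name wait_for_kataconfig_crd and the POLL_SECONDS variable are hypothetical, and the CRD name kataconfigs.kataconfiguration.openshift.io is an assumption; confirm it with oc get crd on your cluster.

```shell
# Assumed CRD name; confirm with: oc get crd | grep -i kataconfig
KATACONFIG_CRD="kataconfigs.kataconfiguration.openshift.io"

# Poll until the KataConfig CRD exists.
# $1 = maximum number of attempts (default 30); POLL_SECONDS = delay between attempts (default 10).
wait_for_kataconfig_crd() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if oc get crd "$KATACONFIG_CRD" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-10}"
    done
    return 1
}
```

When the function returns successfully, the KataConfig resource kind should be recognized by the cluster.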
  • Image manifest unknown

    This error occurs when the specified image tag is incorrect or missing, or does not exist in the container registry.

    You must verify and correct the image tag according to the following example:

    PODVM_IMAGE_URI: "oci::icr.io/ibm_ccco/ibm-ccco-podvm-container-image:1.2.1::/image/ccco-1.2.1.qcow2"
  • Unsupported image path format

    This error occurs when the image path format is incorrect.

    You must ensure the image path follows the correct format according to the following example:

    PODVM_IMAGE_URI: "oci::icr.io/ibm_ccco/ibm-ccco-podvm-container-image:1.2.1::/image/ccco-1.2.1.qcow2"
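A quick shape check for the PODVM_IMAGE_URI value can be sketched as follows. The pattern is inferred only from the example above (oci::, then an image reference with a tag, then ::, then an absolute path inside the image); it is not an official grammar, and the function name is hypothetical. It catches a missing prefix, tag, or path, but it cannot tell whether the tag exists in the registry.

```shell
# Return 0 when the value looks like oci::<image>:<tag>::/<path-in-image>.
is_valid_podvm_image_uri() {
    case "$1" in
        oci::?*:?*::/?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, the value from the example above passes the check, while a value without the oci:: prefix or without a tag does not.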
  • Unauthorized access to image registry

    This error occurs when the cluster cannot authenticate with the image registry.

    You must perform the following steps to reauthenticate:

    1. Extract the current pull secret by running the following command:
      oc get secrets pull-secret -n openshift-config -o template='{{index .data ".dockerconfigjson"}}' | base64 -d | jq
    2. Save the output in a config.json file.
    3. Update config.json with valid credentials:
      "icr.io": {
        "auth": "<your-base64-auth>",
        "email": "<your-email>"}
    4. Update the pull secret in the cluster by running the following command:
      oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson="config.json"
    5. Restart the deployment to apply changes:
      oc rollout restart deployment controller-manager -n openshift-sandboxed-containers-operator
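The auth value in step 3 is conventionally the base64 encoding of username:password for the registry. The following sketch generates such a value; the function name registry_auth is hypothetical, and the credentials to use depend on your registry account.

```shell
# Print the base64-encoded "username:password" string for a registry auth entry.
# $1 = registry user name, $2 = registry password or API key
registry_auth() {
    # tr removes the line breaks that base64 inserts into long output.
    printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}
```

Paste the printed value into the auth field of the icr.io entry in config.json before you update the pull secret.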
  • Invalid Hypervisor section in Kata Configuration

    This error occurs when the configuration.toml file on the worker node contains an unrecognized hypervisor section.

    You must perform the following steps to resolve the error:

    1. Check the worker node IP:
      oc get nodes -o wide
    2. Log in to the worker node:
      ssh core@<worker-node-IP>
    3. Verify the kata-shim version:
      /usr/bin/containerd-shim-kata-v2-tp --version
    4. Check the Red Hat OpenShift Container Platform version on bastion:
      oc version
    5. Upgrade Red Hat OpenShift Container Platform if necessary:
      oc adm upgrade --allow-explicit-upgrade --force=true --to-image=quay.io/openshift-release-dev/ocp-release:<4.16.13>-s390x
      Warning

      Forced upgrades are not advisable, as they are risky and should be used only as a last resort.

  • Failed to create pod sandbox

    This error occurs when there is a timeout due to various reasons, including Pod VM boot issues.

    Check the logs to investigate the issue.

  • YAML Parsing Error

    This error occurs due to incorrect formatting in the busybox.yaml file.

    You must validate the indentation and the structure of the busybox.yaml file.

  • Invalid PEM certificate format

    This error occurs when the encryption certificate contains extra characters or has formatting issues.

    You must ensure that the encryption certificate contains only valid PEM blocks, and remove any extra characters or whitespace.
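A strict check for stray content in a certificate file can be sketched with awk. The function name is_clean_pem is hypothetical, and the check is deliberately conservative: it rejects any line that is not a BEGIN marker, an END marker, or a base64 line inside a block, including blank lines and lines with trailing whitespace.

```shell
# Return 0 when the file contains only well-formed PEM blocks and nothing else.
is_clean_pem() {
    awk '
        /^-----BEGIN [A-Z0-9 ]+-----$/ { if (inside) bad = 1; inside = 1; blocks++; next }
        /^-----END [A-Z0-9 ]+-----$/   { if (!inside) bad = 1; inside = 0; next }
        inside && /^[A-Za-z0-9+\/=]+$/ { next }
        { bad = 1 }
        END { exit (bad || inside || blocks == 0) }
    ' "$1"
}
```

This check validates only the text layout. To confirm that the content is a parsable certificate, you can additionally run openssl x509 -in <certificate-file> -noout -text.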

  • Pod restarting frequently

    This error occurs when a valid IBM Cloud Logging (ICL) instance is not available.

    You must verify the availability of the logging endpoint ${PUBLIC_INGRESS_ENDPOINT} in ICL.

  • Contract decryption failure

    This error occurs when the encryption certificate used does not match the one used to encrypt the contract.

    You must ensure the correct version of the encryption certificate is used.

  • Unable to load certificate

    This error occurs when the certificate file is either corrupted, invalid, or in an unsupported format.

    Ensure that you use the correct encryption certificate.

  • Logging ingestion failure

    This error occurs when iamApiKey is incorrect or when the logrouter hostname is invalid.

    You must check and correct the information in the env.yaml file.

  • Workload configuration error

    This error occurs when the policy field in the workload.yaml is base64-encoded, but the encoded string is corrupted or invalid. When decoded, it results in unreadable or malformed output.

    Ensure that you encode the correct policy, or use the correct encoded policy value.
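Whether a policy value decodes at all can be checked before it is placed in workload.yaml. This is a minimal sketch that assumes the GNU coreutils base64 command; the function name is_valid_base64 is hypothetical.

```shell
# Return 0 when the argument is non-empty and decodes as base64 without errors.
is_valid_base64() {
    [ -n "$1" ] || return 1
    printf '%s' "$1" | base64 -d >/dev/null 2>&1
}
```

A successful decode does not prove that the policy is the intended one. To inspect the content, decode the value with base64 -d and review the output.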

  • Contract signature verification failure

    This error occurs when the signing key in the env.yaml file has been tampered with or does not match the key used to sign the contract.

    You must check the signingKey field in env.yaml and ensure it matches the private key used to sign the contract.

  • Contract validation error

    This error occurs when the envworkloadsignature has been altered or does not match the actual content of the env.yaml and workload.yaml.

    You must ensure the env.yaml and workload.yaml files are not modified after signing and verify if the envworkloadsignature is correct.

  • Trace called before context set

    This error occurs due to a misconfiguration in configuration.toml. In most cases, the image parameter is missing or incorrectly specified.

    Verify that the configuration.toml file includes a valid image entry and correct any missing or incorrect values.