Troubleshooting Key Management Service

Troubleshoot common Key Management Service issues.

Install the Kubernetes CLI to run the troubleshooting commands. For more information, see Installing the Kubernetes CLI (kubectl).

ContainerCreating status on key-management-hsm-middleware pod

Symptom: ContainerCreating status on key-management-hsm-middleware pod

When you upgrade to IBM Cloud Private 3.2.1, the key-management-hsm-middleware pod shows the ContainerCreating status.

Cause: ContainerCreating status on key-management-hsm-middleware pod

The existing hsm-secret is removed when you upgrade to key-management Helm chart version 3.2.1.

Solution: ContainerCreating status on key-management-hsm-middleware pod

Create the secret after you upgrade the key-management-hsm Helm chart. For more information, see Upgrading KMS Helm charts.
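
As a minimal sketch, you can re-create the secret with kubectl; the file names hsm-client.pem and hsm-client.key are placeholders, and the data keys must follow the <master-node-IP> format that is shown later on this page:

    kubectl create secret generic hsm-secret --namespace kube-system \
      --from-file=<master-node-IP>=hsm-client.pem \
      --from-file=<master-node-IP-key>=hsm-client.key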

Secret hsm-secret not found error

Symptom: Secret hsm-secret not found error

Installing IBM® Cloud Private {{site.data.keyword.version}} does not work. You see the error Secret hsm-secret not found.

Cause: Secret hsm-secret not found error

The secret configuration is removed from the key-management Helm chart. You must now create the secret before you install the Helm chart.

Solution: Secret hsm-secret not found error

Create the secret before you install the key-management-hsm Helm chart. For more information, see Configuring Key Management Service.
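
You can confirm that the secret exists before you install the chart:

    kubectl get secret hsm-secret --namespace kube-system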

UPGRADE FAILED error

Symptom: UPGRADE FAILED error

Upgrading the Helm chart from {{site.data.keyword.previous}} to 3.2.1 does not work. You see the error Error: UPGRADE FAILED.

Cause: UPGRADE FAILED error

You did not specify the overrides.yaml configuration file during the Helm upgrade.

Solution: UPGRADE FAILED error

  1. Create a separate overrides.yaml configuration file and specify the new image path for IBM® Cloud Private 3.2.1 in the file.

Following is a sample overrides.yaml file:

api:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-api-amd64
    tag: <ICP_VERSION, like 4.1.0>

persistence:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-persistence
    tag: <ICP_VERSION, like 4.1.0>

storage:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-onboarding
    tag: <ICP_VERSION, like 4.1.0>

lifecycle:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-lifecycle
    tag: <ICP_VERSION, like 4.1.0>

pep:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-pep
    tag: <ICP_VERSION, like 4.1.0>

crypto:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-crypto
    tag: <ICP_VERSION, like 4.1.0>

auditService:
  image:
    repository: <CLUSTER_NAME>.icp:8500/ibmcom/icp-audit-service
    tag: <ICP_VERSION, like 4.1.0>
  2. Specify the file when you run the helm upgrade command. For example:

     helm upgrade <RELEASE_NAME> <CHART_PATH> -f overrides.yaml
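
After the upgrade completes, you can verify that the Key Management Service pods restart with the new image tags:

    kubectl get pods --namespace kube-system | grep key-management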

Key rotation does not work - shows 501 Not Implemented Error

Symptom: Key rotation does not work - shows 501 Not Implemented Error

After you install the key-management-hsm Helm chart, key rotation does not work with Hardware Security Module (HSM). You see the error 501 Not Implemented Error.

Cause: Key rotation does not work - shows 501 Not Implemented Error

Key rotation is supported starting with version 3.2.1.

Solution: Key rotation does not work - shows 501 Not Implemented Error

Install the key-management-4.1.0.tgz Helm chart or upgrade the release.
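
For example, a minimal Helm v2 install command; the release name key-management is an assumption, and the --tls flag applies when Tiller is secured, as is typical in IBM Cloud Private:

    helm install key-management-4.1.0.tgz --name key-management --namespace kube-system --tls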

Key operations do not work - show 400 Bad Request Error

Symptom: Key operations do not work - show 400 Bad Request Error

After you upgrade the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys do not work with HSM. The log contains the following error: 400 Bad Request Error: "Provided API key could not be found".

Cause: Key operations do not work - show 400 Bad Request Error

The kms-api-key data that is contained in the key-management-secret was overwritten with the invalid value "default_kms_api_key".

Solution: Key operations do not work - show 400 Bad Request Error

  1. Create a new API key by following the instructions in API key management APIs.

  2. Encode the key with base64 encoding.

  3. Overwrite the existing data in the kms-api-key section of the secret by using the {{site.data.keyword.console}}.

  4. Restart the pod by deleting the key-management-pep pod, as shown in the sketch after this list.
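
Following is a minimal command-line sketch of steps 2 through 4. The secret name key-management-secret and the kms-api-key field are taken from this page; the pod name placeholder is hypothetical:

    # Base64-encode the new API key (no trailing newline)
    echo -n "<NEW_API_KEY>" | base64

    # Overwrite the kms-api-key data in the secret with the encoded value
    kubectl patch secret key-management-secret --namespace kube-system \
      --type merge -p '{"data":{"kms-api-key":"<BASE64_ENCODED_API_KEY>"}}'

    # Delete the key-management-pep pod so that it restarts
    kubectl delete pod <key-management-pep-pod-name> --namespace kube-system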

Key operations do not work - show 500 Internal Server Error

Symptom: Key operations do not work - show 500 Internal Server Error

After you install the key-management-hsm Helm chart, you cannot create keys, or wrap or unwrap keys with HSM. You see the error 500 Internal Server Error.

Cause: Key operations do not work - show 500 Internal Server Error

The cleanup job did not complete because of a mismatch in the image repository path.

Solution: Key operations do not work - show 500 Internal Server Error

  1. Remove the key-management-hsm-cleanup batch job. You can use the management console, as in the following steps, or kubectl, as shown after this list.

    1. Log in to the management console.
    2. From the navigation menu, select Workloads > Jobs > Batch Jobs.
    3. Place the cursor on the key-management-hsm-cleanup batch job.
    4. Click ... > Remove to remove the batch job.
  2. Redeploy the key-management-hsm Helm chart.
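
If you prefer the command line to the console, you can remove the batch job with kubectl, assuming that it runs in the kube-system namespace:

    kubectl delete job key-management-hsm-cleanup --namespace kube-system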

Key operations do not work - show 503 Unavailable Experiencing delays error

Symptom: Key operations do not work - show 503 Unavailable Experiencing delays error

After you upgrade the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys do not work with HSM. The log contains the following error: 503 Service Error "Unavailable Experiencing delays. Please try again in few minutes."

Cause: Key operations do not work - show 503 Unavailable Experiencing delays error

The HSM that is connected to key-management-hsm-middleware is unavailable or shut down.

Solution: Key operations do not work - show 503 Unavailable Experiencing delays error

  1. Check the status of the HSM to determine if it is offline or if its configuration has been changed.

  2. Restore the original configuration settings to the HSM, if they were changed.

  3. Restart the HSM.

HSM connection does not work on all management nodes

Symptom: HSM connection does not work on all management nodes

HSM connection works on some but not all management nodes.

Cause: HSM connection does not work on all management nodes

The certificate and key pairs are not found on the management nodes on which HSM does not work.

Solution: HSM connection does not work on all management nodes

  1. Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
  2. Check the HSM secret to confirm whether the certificate and key pairs are listed for all management nodes.

    kubectl get secret hsm-secret -o yaml --namespace kube-system
    

    The information is available in the following format:

    <master-node-IP>: <BASE64_ENCODED_CERTIFICATE>
    <master-node-IP-key>: <BASE64_ENCODED_KEY>
    
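
    To inspect a certificate entry, decode its base64 value and read it with openssl; the encoded value is a placeholder that you copy from the secret:

    echo "<BASE64_ENCODED_CERTIFICATE>" | base64 -d | openssl x509 -noout -subject -dates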

Cannot import root key

You can import root keys only when you use a supported HSM model. SoftHSM is not supported.

For the supported HSM models, see Configuring Key Management Service.

key-management-persistence log reports errors after Key Management Service configuration

Symptom: key-management-persistence log reports errors after Key Management Service configuration

After you configure the Key Management Service, you see errors in the key-management-persistence log.

kubectl logs key-management-persistence-5d6974bf8c-vxxwl --namespace kube-system

Following is a sample output:

2018/11/27 14:31:13 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
{"caller":"config.go:402","component":"config","file":"/opt/keyprotect/config//production","location":"local","msg":"config loaded from local","ts":"2018-11-27T14:31:13.891450032Z"}
{"caller":"root.go:104","commit":"5bbc1228","component":"root","semver":"2.1.0","ts":"2018-11-27T14:31:15.157576488Z"}
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0

Cause: key-management-persistence log reports errors after Key Management Service configuration

Containers on the management node failed to look up other services on the master node. The routing table was not configured properly because of a configuration issue with the kube-controller.

Solution: key-management-persistence log reports errors after Key Management Service configuration

Update the kube-controller configuration.
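
To verify that service lookup works again after you update the configuration, you can run a lookup from one of the management-node pods. This check is a sketch; the pod name is a placeholder, and it assumes that the pod image includes getent:

    kubectl exec --namespace kube-system <key-management-persistence-pod-name> -- getent hosts mongodb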

Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller

Symptom: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller

When calling https://proxy_ip/, a Kubernetes Ingress Controller Fake Certificate is returned.

Cause: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller

The Kubernetes Ingress Controller Fake Certificate is used as the default SSL certificate in the NGINX ingress controller.

Solution: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller

You can configure --default-ssl-certificate in the nginx-ingress-controller daemonset to replace the "Kubernetes Ingress Controller Fake Certificate".

For example:

  1. Create a secret that contains an SSL certificate:
     openssl genrsa -out ing-tls.key 4096
     openssl req -new -key ing-tls.key -out ing-tls.csr -subj "/CN=TTTEEESSSTTT"
     openssl x509 -req -days 36500 -in ing-tls.csr -signkey ing-tls.key -out ing-tls.crt
     kubectl create secret tls ing-tls-secret --cert=ing-tls.crt --key=ing-tls.key -n kube-system
    
  2. Set --default-ssl-certificate in the daemonset nginx-ingress-controller. For example:
     kubectl edit ds -n kube-system nginx-ingress-controller
    
           containers:
           - args:
             - /nginx-ingress-controller
             - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
             - --configmap=$(POD_NAMESPACE)/nginx-ingress-controller
             - --annotations-prefix=ingress.kubernetes.io
             - --enable-ssl-passthrough=true
             - --publish-status-address=172.16.247.161
             - --default-ssl-certificate=$(POD_NAMESPACE)/ing-tls-secret
    
  3. Check the result. For example:

     # ps -ef | grep nginx-ingress-controller | grep default-ssl-certificate
     33       23251 23207  0 22:45 ?        00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
     33       23308 23251  0 22:45 ?        00:00:02 /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
    
     # curl -kv  https://172.16.247.161
     * About to connect() to 172.16.247.161 port 443 (#0)
     *   Trying 172.16.247.161...
     * Connected to 172.16.247.161 (172.16.247.161) port 443 (#0)
     * Initializing NSS with certpath: sql:/etc/pki/nssdb
     * skipping SSL peer certificate verification
     * SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
     * Server certificate:
     *       subject: CN=TTTEEESSSTTT
     *       start date: May 05 05:44:02 2019 GMT
     *       expire date: Apr 11 05:44:02 2119 GMT
     *       common name: TTTEEESSSTTT
     *       issuer: CN=TTTEEESSSTTT
     > GET / HTTP/1.1
     > User-Agent: curl/7.29.0
     > Host: 172.16.247.161
     > Accept: */*
     >
     < HTTP/1.1 404 Not Found
     < Date: Sun, 05 May 2019 05:49:49 GMT
     < Content-Type: text/plain; charset=utf-8
     < Content-Length: 21
     < Connection: keep-alive
     < Strict-Transport-Security: max-age=15724800; includeSubDomains
     <
     * Connection #0 to host 172.16.247.161 left intact
    

key-management-pep pod not running

Symptom: key-management-pep pod not running

The key-management-pep pod is not running, and displays "CreateContainerConfigError".

Cause: key-management-pep pod not running

The kms-api-key data in the key-management-secret is not valid.

Solution: key-management-pep pod not running

  1. Check the status of the secret-watcher pod.

  2. If the pod is running, restart it. A command sketch for these steps follows this list.

  3. If it is not running, see the troubleshooting guide for the secret watcher service.
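
Following is a command sketch for steps 1 and 2; the pod name placeholder is hypothetical:

    # Check the status of the secret-watcher pod
    kubectl get pods --namespace kube-system | grep secret-watcher

    # If it is running, delete it so that it restarts
    kubectl delete pod <secret-watcher-pod-name> --namespace kube-system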

For more information, see Pods are not scheduled.