Troubleshooting Key Management Service
Troubleshoot common Key Management Service issues.
Install the Kubernetes CLI to run the troubleshooting commands. For more information, see Installing the Kubernetes CLI (kubectl).
- ContainerCreating status on key-management-hsm-middleware pod
- Secret hsm-secret not found error
- UPGRADE FAILED error
- Key rotation does not work - shows 501 Not Implemented Error
- Key operations do not work - show 400 Bad Request Error
- Key operations do not work - show 500 Internal Server Error
- Key operations do not work - show 503 Unavailable Experiencing delays error
- HSM connection does not work on all management nodes
- Cannot import root key
- key-management-persistence log reports errors after Key Management Service configuration
- Kubernetes Ingress Controller Fake Certificate is returned by NGINX ingress controller
- key-management-pep pod not running
ContainerCreating status on key-management-hsm-middleware pod
Symptom: ContainerCreating status on key-management-hsm-middleware pod
When you upgrade to IBM Cloud Private 3.2.1, ContainerCreating status shows on the key-management-hsm-middleware pod.
Cause: ContainerCreating status on key-management-hsm-middleware pod
The existing hsm-secret is removed when you upgrade to key-management Helm chart version 3.2.1.
Solution: ContainerCreating status on key-management-hsm-middleware pod
Create the secret after you upgrade the key-management-hsm Helm chart. For more information, see Upgrading KMS Helm charts.
Secret hsm-secret not found error
Symptom: Secret hsm-secret not found error
Installing IBM® Cloud Private does not work. You see the error Secret hsm-secret not found.
Cause: Secret hsm-secret not found error
Secret configuration is removed from the key-management Helm chart. You must now create the secret before you install the Helm chart.
Solution: Secret hsm-secret not found error
Create the secret before you install the key-management-hsm Helm chart. For more information, see Configuring Key Management Service.
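As a concrete illustration of creating the secret, the following sketch builds an hsm-secret manifest locally. The secret name and namespace come from this document; the data key names (hsm.crt, hsm.key) and the file contents are hypothetical placeholders, so substitute the names and certificate material that your HSM configuration requires.

```shell
# Sketch: build an hsm-secret manifest locally, then apply it before you
# install the key-management-hsm Helm chart. The data key names and file
# contents below are placeholders for illustration only.
WORKDIR=$(mktemp -d)

# Stand-in certificate and key content.
printf 'example-cert' > "$WORKDIR/hsm.crt"
printf 'example-key'  > "$WORKDIR/hsm.key"

# Kubernetes secrets carry their data base64-encoded.
CRT_B64=$(base64 < "$WORKDIR/hsm.crt" | tr -d '\n')
KEY_B64=$(base64 < "$WORKDIR/hsm.key" | tr -d '\n')

cat > "$WORKDIR/hsm-secret.yaml" <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: hsm-secret
  namespace: kube-system
type: Opaque
data:
  hsm.crt: ${CRT_B64}
  hsm.key: ${KEY_B64}
EOF

cat "$WORKDIR/hsm-secret.yaml"
```

You would then create the secret with kubectl apply -f hsm-secret.yaml before installing the chart.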
UPGRADE FAILED error
Symptom: UPGRADE FAILED error
Upgrading the Helm chart from a previous version to 3.2.1 does not work. You see the error Error: UPGRADE FAILED.
Cause: UPGRADE FAILED error
You did not specify the overrides.yaml configuration file during Helm upgrade.
Solution: UPGRADE FAILED error
- Create a separate overrides.yaml configuration file and specify the new image path for IBM® Cloud Private 3.2.1 in the file. Following is a sample overrides.yaml file:
  api:
    image:
      repository: mycluster.icp:8500/ibmcom/kms-api-amd64
      tag: <ICP_VERSION, like 4.1.0>
  persistence:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-persistence
      tag: <ICP_VERSION, like 4.1.0>
  storage:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-onboarding
      tag: <ICP_VERSION, like 4.1.0>
  lifecycle:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-lifecycle
      tag: <ICP_VERSION, like 4.1.0>
  pep:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-pep
      tag: <ICP_VERSION, like 4.1.0>
  crypto:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/kms-crypto
      tag: <ICP_VERSION, like 4.1.0>
  auditService:
    image:
      repository: <CLUSTER_NAME>.icp:8500/ibmcom/icp-audit-service
      tag: <ICP_VERSION, like 4.1.0>
- Specify the file when you run the helm upgrade command. For example:

  helm upgrade <RELEASE_NAME> <CHART> -f overrides.yaml
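A frequent cause of this failure is leaving the angle-bracket placeholders (such as <CLUSTER_NAME> or <ICP_VERSION>) unfilled in the overrides file. The following sketch shows a quick local check; the sample file written here is a hypothetical stand-in for your real overrides.yaml.

```shell
# Sketch: before running "helm upgrade -f overrides.yaml", confirm that no
# "<...>" placeholders remain in the file. The sample file written here is
# illustrative only; point the check at your real overrides.yaml.
OVERRIDES=$(mktemp)
cat > "$OVERRIDES" <<'EOF'
api:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-api-amd64
    tag: 3.2.1
EOF

# Any remaining "<" almost certainly marks an unfilled placeholder.
if grep -q '<' "$OVERRIDES"; then
  echo "placeholders remain: fill them in before upgrading"
else
  echo "overrides file looks filled in"
fi
```

Run the check against the overrides file you pass to helm upgrade; only run the upgrade once no placeholders remain.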
Key rotation does not work - shows 501 Not Implemented Error
Symptom: Key rotation does not work - shows 501 Not Implemented Error
After you install the key-management-hsm Helm chart, key rotation does not work with Hardware Security Module (HSM). You see the error 501 Not Implemented Error.
Cause: Key rotation does not work - shows 501 Not Implemented Error
Key rotation is supported starting with version 3.2.1.
Solution: Key rotation does not work - shows 501 Not Implemented Error
Install the key-management-4.1.0.tgz Helm chart or upgrade the release.
Key operations do not work - show 400 Bad Request Error
Symptom: Key operations do not work - show 400 Bad Request Error
After upgrading the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys are not working with HSM. The log contains the following error: 400 Bad Request Error: "Provided API key could not be found".
Cause: Key operations do not work - show 400 Bad Request Error
The kms-api-key data that is contained in the key-management-secret was overwritten to an invalid value of "default_kms_api_key".
Solution: Key operations do not work - show 400 Bad Request Error
- Create a new api-key by following the instructions in API key management APIs.
- Encode the key with base64 encoding.
- Overwrite the existing data in the kms-api-key section of the secret by using the management console.
- Restart the pod by deleting the key-management-pep pod.
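The encoding step can be done with the base64 command. In this sketch, "example-api-key-value" is a stand-in for the api-key that you created:

```shell
# Sketch: base64-encode a new API key value for the kms-api-key field of
# the secret. Replace the stand-in value with your real api-key.
# printf avoids the trailing newline that echo would add to the encoding.
API_KEY='example-api-key-value'
ENCODED=$(printf '%s' "$API_KEY" | base64 | tr -d '\n')
echo "$ENCODED"   # ZXhhbXBsZS1hcGkta2V5LXZhbHVl
```

Paste the encoded value into the kms-api-key field of the secret, then delete the key-management-pep pod so that it restarts with the new value.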
Key operations do not work - show 500 Internal Server Error
Symptom: Key operations do not work - show 500 Internal Server Error
After you install the key-management-hsm Helm chart, you cannot create keys, or wrap or unwrap keys with HSM. You see the error 500 Internal Server Error.
Cause: Key operations do not work - show 500 Internal Server Error
The cleanup job did not complete because of a mismatch in the image repository path.
Solution: Key operations do not work - show 500 Internal Server Error
- Remove the key-management-hsm-cleanup batch job:
  - Log in to the management console.
  - From the navigation menu, select Workloads > Jobs > Batch Jobs.
  - Place the cursor on the key-management-hsm-cleanup batch job.
  - Click ... > Remove to remove the batch job.
- Redeploy the key-management-hsm Helm chart.
Key operations do not work - show 503 Unavailable Experiencing delays error
Symptom: Key operations do not work - show 503 Unavailable Experiencing delays error
After upgrading the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys are not working with HSM. The log contains the following error: 503 Service Error "Unavailable Experiencing delays. Please try again in few minutes."
Cause: Key operations do not work - show 503 Unavailable Experiencing delays error
The HSM that is connected to key-management-hsm-middleware is unavailable or shut down.
Solution: Key operations do not work - show 503 Unavailable Experiencing delays error
- Check the status of the HSM to determine if it is offline or if its configuration has been changed.
- Restore the original configuration settings to the HSM, if they were changed.
- Restart the HSM.
HSM connection does not work on all management nodes
Symptom: HSM connection does not work on all management nodes
HSM connection works on some but not all management nodes.
Cause: HSM connection does not work on all management nodes
The certificate and key pairs are not found on the management nodes on which HSM does not work.
Solution: HSM connection does not work on all management nodes
- Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
- Check the HSM secret to confirm whether the certificate and key pairs are listed for all management nodes:

  kubectl get secret hsm-secret -o yaml --namespace kube-system

  The information is available in the following format:

  <master-node-IP>: <BASE64_ENCODED_CERTIFICATE>
  <master-node-IP-key>: <BASE64_ENCODED_KEY>
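To inspect one of those entries, you can decode its base64 value locally. The value below is a hypothetical stand-in for a real <BASE64_ENCODED_CERTIFICATE> field; a valid certificate entry decodes to a PEM block:

```shell
# Sketch: decode one base64 entry copied from the "kubectl get secret"
# output. This stand-in value decodes to the first line of a PEM
# certificate; a real entry decodes to the full certificate.
CERT_B64='LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t'
DECODED=$(printf '%s' "$CERT_B64" | base64 -d)
echo "$DECODED"   # -----BEGIN CERTIFICATE-----
```

If an entry is missing for a management node, or does not decode to certificate material, that node's certificate and key pair must be added to the secret.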
Cannot import root key
You can import root keys only when you use a supported HSM model. SoftHSM is not supported.
For the supported HSM models, see Configuring Key Management Service.
key-management-persistence log reports errors after Key Management Service configuration
Symptom: key-management-persistence log reports errors after Key Management Service configuration
After you configure the Key Management Service, you see errors in the key-management-persistence log.
kubectl logs key-management-persistence-5d6974bf8c-vxxwl --namespace kube-system
Following is a sample output:
2018/11/27 14:31:13 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
{"caller":"config.go:402","component":"config","file":"/opt/keyprotect/config//production","location":"local","msg":"config loaded from local","ts":"2018-11-27T14:31:13.891450032Z"}
{"caller":"root.go:104","commit":"5bbc1228","component":"root","semver":"2.1.0","ts":"2018-11-27T14:31:15.157576488Z"}
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Cause: key-management-persistence log reports errors after Key Management Service configuration
Containers on the management node failed to look up other services on the master node. The routing table was not configured properly because of a configuration issue with the kube-controller.
Solution: key-management-persistence log reports errors after Key Management Service configuration
Update the kube-controller configuration.
Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
Symptom: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
When calling https://proxy_ip/, a Kubernetes Ingress Controller Fake Certificate is returned.
Cause: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
Kubernetes Ingress Controller Fake Certificate is used as the default SSL certificate in NGINX ingress controller.
Solution: Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
You can configure --default-ssl-certificate in the daemonset nginx-ingress-controller to replace the "Kubernetes Ingress Controller Fake Certificate".
For example:
- Create a secret that contains an SSL certificate:

  openssl genrsa -out ing-tls.key 4096
  openssl req -new -key ing-tls.key -out ing-tls.csr -subj "/CN=TTTEEESSSTTT"
  openssl x509 -req -days 36500 -in ing-tls.csr -signkey ing-tls.key -out ing-tls.crt
  kubectl create secret tls ing-tls-secret --cert=ing-tls.crt --key=ing-tls.key -n kube-system

- Set --default-ssl-certificate in the daemonset nginx-ingress-controller. For example:

  kubectl edit ds -n kube-system nginx-ingress-controller

  containers:
  - args:
    - /nginx-ingress-controller
    - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
    - --configmap=$(POD_NAMESPACE)/nginx-ingress-controller
    - --annotations-prefix=ingress.kubernetes.io
    - --enable-ssl-passthrough=true
    - --publish-status-address=172.16.247.161
    - --default-ssl-certificate=$(POD_NAMESPACE)/ing-tls-secret

- Check the result. For example:
  # ps -ef | grep nginx-ingress-controller | grep default-ssl-certificate
  33 23251 23207 0 22:45 ? 00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
  33 23308 23251 0 22:45 ? 00:00:02 /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret

  # curl -kv https://172.16.247.161
  * About to connect() to 172.16.247.161 port 443 (#0)
  * Trying 172.16.247.161...
  * Connected to 172.16.247.161 (172.16.247.161) port 443 (#0)
  * Initializing NSS with certpath: sql:/etc/pki/nssdb
  * skipping SSL peer certificate verification
  * SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  * Server certificate:
  *   subject: CN=TTTEEESSSTTT
  *   start date: May 05 05:44:02 2019 GMT
  *   expire date: Apr 11 05:44:02 2119 GMT
  *   common name: TTTEEESSSTTT
  *   issuer: CN=TTTEEESSSTTT
  > GET / HTTP/1.1
  > User-Agent: curl/7.29.0
  > Host: 172.16.247.161
  > Accept: */*
  >
  < HTTP/1.1 404 Not Found
  < Date: Sun, 05 May 2019 05:49:49 GMT
  < Content-Type: text/plain; charset=utf-8
  < Content-Length: 21
  < Connection: keep-alive
  < Strict-Transport-Security: max-age=15724800; includeSubDomains
  <
  * Connection #0 to host 172.16.247.161 left intact
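The certificate-generation step can be verified locally before the secret is created. This sketch generates a throwaway self-signed certificate with the same test CN used in the example and confirms its subject with openssl; substitute your own CN in practice.

```shell
# Sketch: generate a throwaway self-signed certificate and confirm its
# subject before creating the TLS secret for the ingress controller.
# A short key and validity are used here purely to keep the demo fast.
CERTDIR=$(mktemp -d)
openssl genrsa -out "$CERTDIR/ing-tls.key" 2048 2>/dev/null
openssl req -new -key "$CERTDIR/ing-tls.key" -out "$CERTDIR/ing-tls.csr" -subj "/CN=TTTEEESSSTTT"
openssl x509 -req -days 365 -in "$CERTDIR/ing-tls.csr" -signkey "$CERTDIR/ing-tls.key" \
  -out "$CERTDIR/ing-tls.crt" 2>/dev/null

# Print the subject; it should contain the CN you set above.
openssl x509 -in "$CERTDIR/ing-tls.crt" -noout -subject
```

Once the subject looks right, create the secret from these files with kubectl create secret tls as shown in the steps above.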
key-management-pep pod not running
Symptom: key-management-pep pod not running
The key-management-pep pod is not running, and displays "CreateContainerConfigError".
Cause: key-management-pep pod not running
The kms-api-key data in the key-management-secret is not valid.
Solution: key-management-pep pod not running
- Check the status of the secret-watcher pod.
- If the pod is running, restart it.
- If it is not running, see the troubleshooting guide for the secret watcher service.
For more information, see Pods are not scheduled.