Troubleshooting deployment issues
Some common issues, limitations, and logs related to deployment.
Common issues and solutions in IBM Cloud Kubernetes Service
- The deployment of an IBM Cloud Kubernetes Service template sometimes fails with a "404 page not found" error. The error can occur when the cluster endpoints become temporarily unavailable in IBM Cloud during the deployment. To resolve the error, run a plan and apply again.
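Because the 404 error is transient, re-running the operation is usually enough. If you drive Terraform from a command line rather than from the user interface, a small retry wrapper is one way to automate that. The following is a minimal sketch; the retry function and its arguments are hypothetical helpers, not part of the product.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example use (assumes the terraform CLI is installed and initialized):
#   retry 3 30 terraform plan -out=tfplan && terraform apply tfplan
```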
Common issues and solutions in vSphere
-
The following error might occur because of self-signed certificates:
provider.vsphere: Error setting up client: Post https://192.168.64.140/sdk: x509: cannot validate certificate for 192.168.64.140 because it doesn't contain any IP SANs
As a resolution, set the parameter "allow_selfsigned_cert" to "true".
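Before you relax certificate validation, it can help to confirm that the certificate really has no IP subject alternative names (SANs). The following sketch checks the text output of openssl for an IP SAN entry. The has_ip_san function is a hypothetical helper, and the address in the usage comment is the one from the error message above.

```shell
# has_ip_san: reads "openssl x509 -noout -text" output on stdin and
# succeeds if the certificate lists at least one IP address SAN.
has_ip_san() {
  grep -q 'IP Address:'
}

# Example use (assumes openssl is installed and the host is reachable):
#   openssl s_client -connect 192.168.64.140:443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -text \
#     | has_ip_san || echo "certificate has no IP SANs"
```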
-
The following error occurs whenever the vSphere virtual machine is in the powered-on state:
vsphere_virtual_machine.vm_1: The attempted operation cannot be performed in the current state (Powered on).
A virtual machine with the same hostname or IP address is already present.
- To create a folder, use create_vm_folder=1. Use integer 1 to create a virtual machine only if it is not already available.
- IPv4 prefix: ipv4_prefix_length="24". The value is a numeric string that specifies the prefix length.
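Because ipv4_prefix_length is passed as a numeric string, a quick check before deployment can catch values that are not valid prefix lengths. The following sketch assumes the standard IPv4 range of 0 to 32; the function name is hypothetical.

```shell
# is_valid_ipv4_prefix_length VALUE
# Succeeds only if VALUE is a string of one or two digits between 0 and 32.
is_valid_ipv4_prefix_length() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "${#1}" -le 2 ] && [ "$1" -le 32 ]
}

# Example use:
#   is_valid_ipv4_prefix_length "24" && echo 'ipv4_prefix_length="24" looks valid'
```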
Common issues in Amazon Web Services (AWS)
-
The following error message might occur whenever the virtual private cloud (VPC) is not found:
VPC not found
As a resolution, provide a "Name" for the VPC and refer to that name in the template instead of the VPC ID.
-
Errors occur whenever you do not specify unique public keys.
Common issues in IBM Cloud
-
The following error message might occur because of SSH keys:
ibm_compute_ssh_key.orpheus_public_key: Error deleting SSH key: SoftLayer_Exception_Public: SSH key cannot be deleted because it is currently being used in an active transaction. (HTTP 500)
IBM Cloud does not allow you to upload two identical public keys, that is, keys that have the same fingerprint. However, the IBM Cloud Terraform provider does not throw an error; instead, it reuses the existing key. Issues occur whenever you try to destroy the key resource that is now associated with another running virtual machine.
The solution is to use a different key for each deployment, reference an existing key, or retry the destroy or delete operation to get past the error.
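To check whether two public key files would collide, you can compare their key material, because the fingerprint is derived from it and the trailing comment is ignored. The following is a minimal sketch that assumes OpenSSH-format public key files (type, Base64 key, optional comment); the function name is hypothetical.

```shell
# same_public_key FILE_A FILE_B
# Succeeds if both OpenSSH public key files carry the same key material
# (second field), which means that they have the same fingerprint.
same_public_key() {
  key_a=$(awk '{print $2; exit}' "$1")
  key_b=$(awk '{print $2; exit}' "$2")
  [ -n "$key_a" ] && [ "$key_a" = "$key_b" ]
}

# Example use (the file names are placeholders):
#   same_public_key deploy_a.pub deploy_b.pub && echo "keys are identical; use a different key"
```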
General issues
-
Set a custom maximum number of Terraform jobs to process in parallel per cam-provider-terraform pod.
-
Run the following command to edit the deployment for cam-provider-terraform:
- If isolateRuntime = false, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-api
- If isolateRuntime = true, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-runtime
-
Update the value of MAX_LOCAL_TERRAFORM_JOBS.
    containers:
    - env:
      - name: MAX_LOCAL_TERRAFORM_JOBS
        value: "10"
In this example, it is set to 10.
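As an alternative to editing the deployment interactively, the same variable can probably be set with kubectl set env. The following sketch selects the deployment name from the isolateRuntime setting as described in the steps above. The function names are hypothetical, and whether a non-interactive change is appropriate for your installation is an assumption to verify.

```shell
NAMESPACE=management-infrastructure-management

# terraform_deployment ISOLATE_RUNTIME
# Prints the deployment that the steps above tell you to edit.
terraform_deployment() {
  if [ "$1" = "true" ]; then
    echo cam-provider-terraform-runtime
  else
    echo cam-provider-terraform-api
  fi
}

# set_max_jobs ISOLATE_RUNTIME JOBS
# Sets MAX_LOCAL_TERRAFORM_JOBS on the selected deployment (requires kubectl access).
set_max_jobs() {
  kubectl -n "$NAMESPACE" set env "deployment/$(terraform_deployment "$1")" \
    "MAX_LOCAL_TERRAFORM_JOBS=$2"
}

# Example use:
#   set_max_jobs false 10
```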
-
-
The IP address might change after you restart a virtual machine, but the change might not be reflected in the Managed services user interface. To refresh the resource state, including the IP address, either run a plan and apply or use the refresh API to refresh the resource.
-
Failures might occur in the deployment of a Terraform template that runs actions on a Windows image by using a WinRM connection type. As a resolution, check whether the "AllowUnencrypted" parameter of the WinRM configuration is set to true in the Windows image that is used in the Terraform template. For more information, see the Terraform documentation.
- You might observe a failure whenever you run back-to-back interdependent VMware operations. For example, a "Start" followed by "Shutdown OS" causes a failure because the second action requires VMware Tools, which has not started yet.
-
Managed services on IBM Cloud Pak® for Multicloud Management has a predefined deployment window of 6 hours. To change it, do the following steps to add the TERRAFORM_JOB_TIMEOUT_MS environment variable to the deployment:
-
Run the following command to open the deployment in edit mode.
- If isolateRuntime = false, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-api
- If isolateRuntime = true, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-runtime
-
Add the TERRAFORM_JOB_TIMEOUT_MS variable with a value. For example, the following entry sets TERRAFORM_JOB_TIMEOUT_MS to 10 hours, expressed in milliseconds:
    - name: TERRAFORM_JOB_TIMEOUT_MS
      value: "36000000"
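The timeout value is easy to get wrong by a factor of 1000, so it can help to compute it instead of typing it. The following sketch converts whole hours to milliseconds; the function name is hypothetical.

```shell
# hours_to_ms HOURS
# Prints HOURS expressed in milliseconds (hours * 60 minutes * 60 seconds * 1000).
hours_to_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

# Example use:
#   hours_to_ms 10    # value for a 10-hour deployment window
```

For 10 hours, the function prints 36000000, which matches the value in the example entry.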
-
-
During template deployment, you must specify user credentials based on the operating system.
- For IBM Cloud with the Red Hat Enterprise Linux operating system, specify "root" as the user.
- For AWS with the Red Hat Enterprise Linux operating system, specify "ec2-user" as the user.
- For AWS with the Ubuntu operating system, specify "ubuntu" as the user.
If the appropriate user is not provided for the corresponding operating system, the deployment fails with an error message.
For example, if you do not provide "root" as the user while deploying a VMware template and passwordless sudo is not enabled, the following error message is displayed:
Error: Response from pattern manager: StatusCode:500 Message: { “message”: “Bootstrap command failed. See the pattern manager logs for more details.“, “rc”: 1, “request_id”: “1c844be316b74601853d2ddde595d94d”, “stderr”: “Creating new client for cam-apache-1\nCreating new node for cam-apache-1\nConnecting to 0.0.0.0\nstty: ‘standard input’: Inappropriate ioctl for device\nstty: ‘standard input’: Inappropriate ioctl for device\nstty: ‘standard input’: Inappropriate ioctl for device”, “stdout”: “0.0.0.0 knife sudo password: \nEnter your password: \r\n0.0.0.0 \r\n0.0.0.0 Sorry, try again.\r \n0.0.0.0 knife sudo password: \n0.0.0.0 \r\n0.0.0.0 Sorry, try again.\r\n0.0.0.0 knife sudo password: \n0.0.0.0 \r\n0.0.0.0 sudo: 3 incorrect password attempts” }
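The user mapping that is listed above can be captured in a small lookup so that deployment scripts do not hard-code the wrong user. The following is a sketch only: the function name and the short labels for the cloud and the operating system (ibmcloud, aws, rhel, ubuntu) are hypothetical, and only the three combinations from the list are covered.

```shell
# default_user CLOUD OS
# Prints the user to specify for the given cloud and operating system.
# The labels are illustrative; only the documented combinations are mapped.
default_user() {
  case "$1:$2" in
    ibmcloud:rhel) echo root ;;
    aws:rhel)      echo ec2-user ;;
    aws:ubuntu)    echo ubuntu ;;
    *) echo "no default user known for $1/$2" >&2; return 1 ;;
  esac
}

# Example use:
#   user=$(default_user aws ubuntu) && echo "deploying as $user"
```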
-
You can import templates from GitHub or GitLab and deploy them. However, if you remove or change the access token in GitHub or GitLab and then try to destroy the instance, the operation fails with the following error message:
GithubClientError: vmware/terraform: failed to retrieve content list from github.; caused by {"message":"Not Found","documentation_url":"https://developer.github.com/v3"}
Ensure that Managed services has a working connection to GitHub or GitLab when you destroy the instance.
-
The CAM deployment stays in progress and the log file repeats "null_resource.clone_git: Still creating..."
The log file can record the template deployment status as in progress for a very long time. To identify the reason for the issue, do the following analysis on the deployed virtual machine where the problem occurred:
-
Run the following command to check whether your user has passwordless sudo:
sudo cat /etc/sudoers
If the command prompts for a password, passwordless sudo is not enabled. Enable passwordless sudo for the specific user.
-
If the user does not have sudo privileges, add that user to /etc/sudoers.
-
Run the following command to check whether a name server is set up on your machine:
ping github.com
-
Check whether the curl and Git commands are installed on the virtual machine. If they are missing, the cause might be an issue with the apt mirror.
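The checks above can be combined into one script that you run on the deployed virtual machine. The following is a minimal sketch; the function names are hypothetical. It uses sudo -n, which fails instead of prompting, and it assumes that getent is available for the name resolution check as a non-interactive substitute for ping.

```shell
# missing_tools TOOL...
# Prints each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# check_vm: run on the deployed virtual machine.
check_vm() {
  # sudo -n never prompts; it fails if a password would be required.
  sudo -n true 2>/dev/null || echo "passwordless sudo is not enabled for $(id -un)"
  # getent queries the system resolver (assumption: getent is installed).
  getent hosts github.com >/dev/null || echo "github.com does not resolve"
  missing=$(missing_tools curl git | tr '\n' ' ')
  [ -z "$missing" ] || echo "missing tools: $missing"
}

# Example use:
#   check_vm
```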
-
-
An SSL certificate error occurs while deploying a template.
To set up the Git client locally:
-
Create a .gitconfig file in the same Git repository as the template. The file should point to the correct CA certificates.
    [http]
    sslCAInfo = /home/terraform/certs/ca-bundle.crt
-
When the provider Terraform deploys the template, it downloads the template and all its files from the Git repository, which includes the .gitconfig file. When Terraform init is executed, the Git client looks in the current folder (the user's home folder for the current process) and uses the appropriate CA certificates to download the modules from Git.
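The .gitconfig file can be generated rather than written by hand. The following sketch writes the [http] section into a directory that you choose, typically the root of the template repository. The function name is hypothetical, and the CA bundle path in the usage comment is the one from the step above.

```shell
# write_gitconfig REPO_DIR CA_BUNDLE_PATH
# Writes a .gitconfig in REPO_DIR that points Git at the given CA bundle.
# Any existing .gitconfig in REPO_DIR is overwritten.
write_gitconfig() {
  printf '[http]\n\tsslCAInfo = %s\n' "$2" > "$1/.gitconfig"
}

# Example use:
#   write_gitconfig . /home/terraform/certs/ca-bundle.crt
#   git config --file .gitconfig --get http.sslCAInfo   # optional check, needs git
```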
-
-
To debug Terraform template deployment, set the TF_LOG environment variable in the cam-provider-terraform pod. You can set TF_LOG to one of these log levels to change the verbosity of the logs: TRACE, DEBUG, INFO, WARN, or ERROR. TRACE is the most verbose log level. The detailed logs appear on stderr.
-
Run the following command to open the deployment in edit mode:
-
If isolateRuntime = false, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-api
-
If isolateRuntime = true, then run the following command:
kubectl -n management-infrastructure-management edit deployment cam-provider-terraform-runtime
-
-
Add the TF_LOG variable with a value.
    - name: TF_LOG
      value: "TRACE"
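A mistyped log level is easy to miss, so a short check against the levels that are listed above can be useful before you change the deployment. The following is a sketch; the function name is hypothetical, and the kubectl set env line in the comment is a non-interactive alternative to the edit step that you should verify for your installation.

```shell
# is_valid_tf_log LEVEL
# Succeeds only for the log levels named in this section.
is_valid_tf_log() {
  case "$1" in
    TRACE|DEBUG|INFO|WARN|ERROR) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (deployment name shown for isolateRuntime = false):
#   level=TRACE
#   is_valid_tf_log "$level" && kubectl -n management-infrastructure-management \
#     set env deployment/cam-provider-terraform-api "TF_LOG=$level"
```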
-
Logs
-
The logs that you can examine are as follows:
- Check the template instance Terraform logs on the UI for Terraform or template errors.
- Check the logs in the containers for internal Managed services errors. The CAM_logs directory is in the cam-logs-pv path:
/CAM_logs/cam-iaas
/CAM_logs/provider-terraform-local
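Once you have access to the log directories, a recursive search is a quick way to find the files that are worth reading first. The following sketch lists the files that contain the word "error" in any letter case. The function name is hypothetical, and it assumes that the directories above are readable from the shell where you run it.

```shell
# files_with_errors LOG_DIR
# Prints the files under LOG_DIR that contain "error" (case-insensitive).
# Returns a nonzero status if no file matches.
files_with_errors() {
  grep -rli 'error' "$1"
}

# Example use:
#   files_with_errors /CAM_logs/provider-terraform-local
```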