IBM Cloud Schematics is used to provision IBM Cloud resources by describing their configurations in a Terraform template file.
If you have provisioned IBM Cloud resources through the user interface or command line, you will have noticed that some of them, such as IBM Event Streams, take a very long time. For such cases, HashiCorp recommends the use of timeout blocks. Either way, you need to know how long IBM Cloud typically takes to provision or configure each resource in order to avoid failures in your automation.
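As a rough illustration, a timeout block on an IBM Event Streams instance might look like the sketch below. It assumes the IBM Cloud Terraform provider is already configured. The instance name, plan, region, and durations are placeholders chosen for illustration, not recommended values.

```hcl
# Illustrative sketch only: the name, plan, region, and durations are assumptions.
resource "ibm_resource_instance" "event_streams" {
  name     = "example-event-streams" # hypothetical instance name
  service  = "messagehub"            # catalog service name for IBM Event Streams
  plan     = "standard"              # assumed plan
  location = "us-south"              # assumed region

  # Terraform stops waiting and reports a timeout error once an
  # operation runs longer than the duration given here.
  timeouts {
    create = "3h"
    update = "1h"
    delete = "1h"
  }
}
```

The durations cap only how long Terraform waits. As the rest of this article describes, they do not stop the provisioning that IBM Cloud has already started.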
Now, the question is, what if IBM Cloud takes more time to provision or configure than the predefined timeout values in the timeout block? How do Terraform and IBM Cloud Schematics respond to such a situation?
In our experience with the terraform apply command, IBM Cloud Schematics exits the automation with a timeout error message. That sounds harmless, but the real problem is that the IBM Cloud service broker does not stop the provisioning operation. In other words, even though IBM Cloud Schematics has reported the timeout error, the IBM Cloud resource (such as IBM Event Streams) continues to be provisioned in your account, and you will be charged for that IBM Event Streams instance.
As a cloud engineer, once you see the timeout error message in the logs, your natural instinct is to retry provisioning the same Terraform template and the resources in it. On the second run, IBM Cloud Schematics behaves in an unexpected manner. It uses the Terraform state file previously saved in the workspace to re-run the provisioning operation. Unfortunately, because of the timeout error, Terraform will have marked the IBM Event Streams instance as tainted in that state file.
By default, the Terraform engine destroys a tainted resource and reprovisions it. In other words, the IBM Event Streams instance that the IBM Cloud service broker eventually provisioned, even after IBM Cloud Schematics reported the timeout error, is destroyed, and IBM Cloud Schematics tries to provision another IBM Event Streams instance. You end up paying for the provisioning twice.
How can you stop IBM Cloud Schematics from reprovisioning IBM Cloud resources because of the “tainted” flag in the Terraform state file?
Handling timeout errors
You should use the following procedure to handle such timeout errors in the IBM Cloud Schematics logs.
- Once you see the timeout error in the IBM Cloud Schematics workspace logs, open the Resources page of the IBM Cloud Schematics workspace in the user interface and verify whether the corresponding Terraform resource is marked as tainted.
- Wait a few hours, then view the respective IBM Cloud resource pages in the user interface to confirm whether the IBM Cloud resource was actually provisioned in the account.
- If the IBM Cloud resource was provisioned in the account after a delay, use the IBM Cloud Schematics untaint CLI command to remove the resource’s taint flag from the state file.
- Finally, run the IBM Cloud Schematics plan and apply commands from the user interface or command line to complete the provisioning of the template.
Note: In the last step, IBM Cloud Schematics idempotently provisions the remaining resources in that template.
Based on this experience, you should fine-tune the timeout values in the Terraform template so that the same error does not recur in another environment. In particular, watch the following IBM Cloud resources for timeout errors in IBM Cloud Schematics:
- IBM Event Streams
- IBM Cloud Databases
- IBM Cloud Kubernetes Service
- IBM Cloud Satellite
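One way to make that tuning easier is to expose the timeout as a variable, so each environment can set its own value without changing the template. The sketch below assumes a configured IBM Cloud provider and a Terraform version that accepts expressions inside a timeout block. The variable name, resource names, plan, region, and default duration are all hypothetical.

```hcl
# Illustrative sketch only: the variable name, resource values, and the
# default duration are assumptions, not measured provisioning times.
variable "provision_timeout" {
  description = "Create timeout for slow-provisioning services; tune per environment."
  type        = string
  default     = "2h"
}

resource "ibm_database" "example" {
  name     = "example-postgresql"       # hypothetical instance name
  service  = "databases-for-postgresql" # one of the IBM Cloud Databases services
  plan     = "standard"                 # assumed plan
  location = "us-south"                 # assumed region

  timeouts {
    create = var.provision_timeout
  }
}
```

A workspace for a slower environment can then override provision_timeout with a longer duration once you have observed how long provisioning really takes there.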