Troubleshooting
Failures are categorized into five stages (Stage 0 through Stage 4), based on when they occur in workload migration. Use the following guidelines to handle failures in each stage.
- Stage 0, Stage 1, Stage 2, and Stage 3 are applicable to both single-cloud and multi-cloud workload migrations.
- Stage 4 is only supported for single-cloud workload migration, as Windows migration is not supported on multi-cloud systems on IBM® Cloud Pak System Version 2.3.4.
Stage 0
- This stage can be turned off by setting the premigration_validate flag to false.
- By default, the validation is in the On state.
- Workload mobility sometimes cannot move virtual machines with certain numbers of vCPUs. When this happens during the migration process, you might see the following error in the migrate_deployment.log file:
pooljvm.1638958284531.4764 [12-08-21 11:13:46] 0108 com.ibm.purescale.utils.RESTUtils | postImpl return from post on https://localhost:5011/resources/applicationPatterns/a-0d7eb0bc-e8a3-44d2-a48d-91e78e4e81a2/virtualApplications/ is { "errorMessage": "Invalid application model.<br>Invalid HWAttributes.numvcpus property value in component WIN2016.", "rootCause": "com.ibm.maestro.util.wrapper.exception.ValidationException: Invalid application model.<br>Invalid HWAttributes.numvcpus property value in component WIN2016.", "errorStatusCode": 412, "message": "Invalid application model.<br>Invalid HWAttributes.numvcpus property value in component WIN2016." }
To address this issue, resize the source system to a valid HWAttributes.numvcpus value. Check and compare the target and source system values first, and then trigger the migration process.
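As a minimal sketch, the following Python snippet scans log text for the error shown above and reports which components were rejected. The regular expression is derived only from that sample message, and the function name and file path are illustrative assumptions rather than part of the product.

```python
import re

# Pattern derived from the sample error above. Assumption: component names
# contain only letters, digits, and underscores, as in "WIN2016".
NUMVCPUS_ERROR = re.compile(
    r"Invalid HWAttributes\.numvcpus property value in component (\w+)"
)


def find_invalid_vcpu_components(log_text):
    """Return the sorted, de-duplicated component names flagged for an invalid vCPU count."""
    return sorted(set(NUMVCPUS_ERROR.findall(log_text)))


# Usage (hypothetical path to a copy of the log file):
# with open("migrate_deployment.log", encoding="utf-8", errors="replace") as f:
#     print(find_invalid_vcpu_components(f.read()))
```

Each reported component is a candidate for resizing before the migration is triggered again.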
Stage 1: Creating request metadata during the instance deployment flow on the target system
- Create a dummy request to deploy the virtual machine.
- Insert the data and allocate resources.
- Add an IP address.
- Use data stores.
- Go to .
- Select your instance and click the delete icon on its details page.
Stage 2
- After the initial stages of workload migration are completed successfully, the Workload create job is triggered. If errors occur in the API, contact IBM support. VMware® vCenter usually shows the error and status, and the vCenter user interface also shows the VMware® jobs. If the triggered migration fails midway or is only partially complete, the VMware® support team can confirm whether and how it can be recovered.
- If an instance fails at this stage, the Instance section of the user interface remains in the Registering state.
- If the instance UUID of the vCenters is the same for both source and target, the following error might appear because of a VMware® migration utility error:
ERROR: Client received SOAP Fault from server: The object 'vim.ResourcePool:resgroup-6740' has already been deleted or has not been completely created Please see the server log to find more detail regarding exact cause of the failure.
To verify, see https://<VCIP address>/mob/?moid=ServiceInstance&doPath=content%2eabout
- If the create job of workload migration fails with the following error, ensure that the compute nodes that are associated with the cloud group of the Virtual System Instance have an IPv4 address assigned, as described in the Prerequisites section:
pooljvm.1622736025648.13239 [06-03-21 16:03:51] 0073 workload_migrations.workload_migrations_create | java.lang.NullPointerException: Cannot get property 'ip' on null object workload_migrations.workload_migrations_rack_helper.getHostNameFromRackSystem(workload_migrations_rack_helper.groovy:1493) workload_migrations.workload_migrations_rack_helper.rack2rack_migration_helper_x(workload_migrations_rack_helper.groovy:1064)
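The instance UUID check in this stage can be sketched in Python as follows. The snippet only builds the verification URL given above for each vCenter and compares two UUID values that you read from those pages yourself; it does not log in to vCenter or parse the page. The function names and the example addresses are hypothetical.

```python
def mob_about_url(vcenter_ip):
    """Build the verification URL from the text above for one vCenter."""
    # %2e is the encoded '.' in 'content.about', kept exactly as documented.
    return f"https://{vcenter_ip}/mob/?moid=ServiceInstance&doPath=content%2eabout"


def same_instance_uuid(source_uuid, target_uuid):
    """Return True if both values match, ignoring case and surrounding whitespace."""
    return source_uuid.strip().lower() == target_uuid.strip().lower()


# Usage (hypothetical addresses; open each URL in a browser and note the UUID):
# print(mob_about_url("192.0.2.10"))
# print(mob_about_url("192.0.2.20"))
```

If `same_instance_uuid` returns True for the source and target values, the SOAP fault described above is a likely outcome.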
Stage 3: Post deployment failure
After stage 2 of the workload migration is complete, the virtual machines are moved to the target vCenter. However, failures can still occur in later IBM Cloud Pak System jobs during post-migration updates.
Resolution: Post-migration steps run on the virtual machine to switch its data from the source to the target, including a metadata update on the target. The process then validates that all the data is correct and finally deletes the source instance entries from the source IBM Cloud Pak System. If a failure occurs in these steps, you must manually investigate the root cause and identify possible solutions on the target IBM Cloud Pak System.
Stage 4
If the instance remains in the Launching state for an hour or more:
- Log in to the Windows virtual machine.
- On the Windows command prompt, run the following command:
C:\IBM\maestro\maestro.deployment.ui\zero stop
- Click .
- Right-click Monitoring Agent for Windows OS and select Stop.
- Right-click Monitoring Agent for Workloads and select Stop.
- Open Task Manager and end the following processes:
- Right-click the two Python processes individually and click End task.
- Right-click the two IBM Java processes individually and click End task. To verify the process file location, right-click the process and click Open file location.
- Take a backup of the C:\IBM\maestro\agent folder.
- Delete the following folders:
- C:\0config\itlm\foundation
- C:\0config\safemode
- C:\IBM\maestro\agent\safemode
- Duplicate the file C:\0config\vm_is_installed and name the copy "update". After you duplicate the file, the C:\0config directory must contain both files: vm_is_installed and update.
- Restart the virtual system instance (VSI) from the IBM Cloud Pak System Software user interface. Navigate to the Virtual System Instance page, locate the specific virtual system instance, and click Stop and then Start in the right panel. Make sure that the status of the virtual system instance displays as "STOPPED" before you start the instance.
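The file-system part of the steps above (backup, folder deletion, and the "update" marker file) can be sketched in Python as shown below. This is an illustration under stated assumptions, not a supported tool: it assumes Python is available on the virtual machine, that the services and processes are already stopped, and that a sibling folder named agent_backup is an acceptable backup location. The function name and the `root` parameter are hypothetical.

```python
import shutil
from pathlib import Path


def reset_agent_state(root=Path("C:/")):
    """Back up the agent folder, delete the listed folders, and create the 'update' file."""
    agent = root / "IBM" / "maestro" / "agent"
    config = root / "0config"

    # Back up the agent folder. The backup name is an assumption; copytree
    # raises FileExistsError if a previous backup is already present.
    backup = agent.with_name("agent_backup")
    shutil.copytree(agent, backup)

    # Delete the three folders listed above; ignore any that are already absent.
    for folder in (
        config / "itlm" / "foundation",
        config / "safemode",
        agent / "safemode",
    ):
        shutil.rmtree(folder, ignore_errors=True)

    # Duplicate vm_is_installed as 'update' so that both files exist in 0config.
    shutil.copy2(config / "vm_is_installed", config / "update")
    return backup
```

The sketch does not stop services, end processes, or restart the virtual system instance; those steps still have to be done as described.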
Troubleshooting multi-cloud workload migration issues
- Ignore the migrate_deployment_create job that you triggered on the peer system during multi-cloud workload migration. If you open the logs for this job, they mention that this flow is only to create the database record for a peer rack...
- To change the state of jobs that are stuck in the Running state on a peer system, do these steps to update the status:
- Make the GET API call and copy all the details as the request body of the next PUT call as follows:
PUT https://9.9.9.9/admin/resources/migrate_deployment/%7Bmigrate_deployment_id_from_GET_CALL_Of_migration_in_RUNNING%7D
Note: Remove the isas_rn and version variables from the input payload and change state='running' to state='failed'. A successful PUT call fixes the issue.
- Make sure that the IP addresses that are assigned to the virtual machine for migration are not part of the IP groups on the target multi-cloud system. Otherwise, the migration fails with the following error message:
com.ibm.rainmaker.cloud.CloudException: CWZCL3434E: Duplicate IP addresses cannot be added: {ipaddress=$IP, subnetid=$SUBNET} @ IBM Cloud Pak System $RACK_ID
- Make sure that no duplicate IP address is present on the target or source system after migration. This duplication can cause a network outage.
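For the stuck Running job described above, the payload change between the GET and PUT calls can be sketched in Python as follows. The sketch assumes that the GET response body is a flat JSON object with top-level isas_rn, version, and state fields; that layout and the function name are assumptions, and sending the PUT request itself is left out.

```python
import json


def build_failed_payload(get_response_body):
    """Turn the GET response body into the PUT request body described above."""
    payload = json.loads(get_response_body)
    # Remove the two variables that must not be sent back in the PUT call.
    payload.pop("isas_rn", None)
    payload.pop("version", None)
    # Change the state from 'running' to 'failed'.
    payload["state"] = "failed"
    return json.dumps(payload)
```

The returned string would then be used as the request body of the PUT call to the migrate_deployment resource shown above.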