Troubleshooting
Troubleshooting Playbook
This troubleshooting playbook describes issues that can occur with networking, credentials, Migration Agent installation, and data replication, and explains how to troubleshoot them.
Communication Issues
Ensure the ports are open for traffic between source machines and the IBM Live Migration Service manager.
-
Verifying communication over Port 443
Verifying Communication over TCP port 443 between Staging Network and the IBM Live Migration Service Manager.
If there is a connection problem between the Staging area and the IBM Live Migration Service manager, use the following methods to check the connection.
Note 1: A lag in the replication process can indicate a communication issue over TCP port 443. However, a lag can also have other causes.
Note 2: If the replication process has started, a communication issue between the Staging area and the IBM Live Migration Service manager shows up as an error in step 4 or step 5 of the replication initiation procedure.
Step 4 error: Failed to resolve IBM Live Migration Service Manager address in the replication server
The subnet selected for the replication servers is configured in a way that prevents DNS lookups from resolving https://ibm.cloudendure.com. Ensure that DNS traffic is not blocked.
Create an instance in the Replication Server's subnet and try resolving https://ibm.cloudendure.com either by accessing the destination in the browser or using this command:
wget https://ibm.cloudendure.com
Step 5 error: Failed to authenticate replication server with the IBM Live Migration Service Manager
https://ibm.cloudendure.com is not reachable via TCP port 443 by the replication server. Check the subnet selected in Setup & Info > Replication Settings and ensure that TCP port 443 is open from the replication server network.
To verify the integrity of the connection from the replication server to the IBM Live Migration Service manager over TCP port 443:
- Launch an Ubuntu machine in the same subnet and under the same VPC as the replication server.
- On the machine, run this command:
wget https://ibm.cloudendure.com
(most platforms)
- If the command fails, there could be a connectivity issue.
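Where wget is not available on the test machine, the same reachability check can be approximated with a few lines of Python. This is a minimal sketch, not part of the product: the function name can_connect and the 5-second timeout are assumptions made for illustration. It only confirms that a TCP connection can be opened; it does not validate the TLS certificate or the HTTP response.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, running can_connect("ibm.cloudendure.com", 443) from a machine in the replication server subnet returns False when the name cannot be resolved or the port is blocked.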
-
Solving communication problems over port 443
Solving communication problems over TCP port 443 between the Staging area and IBM Live Migration Service manager.
To solve communication issues, perform the following AWS tasks as per the target cloud used:
-
AWS
- DHCP - Check the DHCP options set in the VPC of the Staging Area.
Check that the IPv4 CIDR, the DHCP options set, the Route table, and the Network ACL are correct.
- DNS - Check that outbound DNS resolution and connectivity over TCP port 443 are allowed.
- Create an instance in the replication server's subnet
- From this instance, try to:
- Browse https://ibm.cloudendure.com (Windows)
- wget https://ibm.cloudendure.com (Linux)
-
Route Rules - The Route Rules in the Staging area subnet may be set inaccurately. The Route Rules should allow outbound traffic to the Internet.
To check and set Route Rules in the Staging area subnet:
Sign in to AWS console, click on Services and select VPC under Networking & Content Delivery.

-
In the VPC Dashboard toolbar, select Route Tables option.
-
In the Route Tables page, check the box for Route Table in the Staging area.

-
This opens details in the Route Table. Navigate to Routes tab.

-
Within the Target column in the Routes tab, select the route used for outbound communication to the Internet (either igw - Internet Gateway, vgw - VPN or i - EC2 instance). Verify that the address space in the Destination column covers the IBM Live Migration Service IPs and URLs.
Note: IBM Live Migration Service AWS-specific IPs and URLs include: 50.19.114.132, 13.52.54.28, s3.amazonaws.com, s3.us-east-2.amazonaws.com and outbound access to the EC2 endpoint of the AWS region.
-
If the address is not 0.0.0.0/0, change it to 0.0.0.0/0. Click Edit button.

-
Enter 0.0.0.0/0 into the Destination field for the exact Target. Click Save.
Note: If using VPN, enter a specific IP address range in the Destination column.
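When a route uses a specific address range instead of 0.0.0.0/0, it helps to confirm that the range still covers the service IPs listed in the note above. The following is a minimal sketch using Python's standard ipaddress module. The names route_covers and SERVICE_IPS are illustrative assumptions, and the list holds only the two IP addresses quoted in this section, not the URLs.

```python
import ipaddress

# The two service IPs quoted in the AWS note of this section.
SERVICE_IPS = ("50.19.114.132", "13.52.54.28")


def route_covers(destination_cidr, addresses=SERVICE_IPS):
    """Return True if every address falls inside the route's destination range."""
    network = ipaddress.ip_network(destination_cidr, strict=False)
    return all(ipaddress.ip_address(addr) in network for addr in addresses)
```

A destination of 0.0.0.0/0 covers every address, while a private range such as 10.0.0.0/8 covers neither of the listed IPs.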
-
-
Network ACL - The network ACL in the Staging area subnet may block traffic. Verify if ephemeral ports are open.
To solve communication issues, perform the following GCP tasks as per the target cloud used:
-
GCP
- DNS - Ensure you allow for outbound DNS resolution and connectivity over TCP port 443.
- Create an instance in the replication server's subnet
- From this instance, try to:
- Browse https://ibm.cloudendure.com (Windows)
- wget https://ibm.cloudendure.com (Linux)
-
Route Rules - The route rules in the staging area subnet may be set inaccurately. Route rules should allow outbound traffic to the internet.
To check and set route rules in the staging area subnet:
In the Google Cloud Platform console, click Products & services and navigate to Networking --> VPC network --> Routes.

-
In the Routes page, verify that the route with the 0.0.0.0/0 Destination IP Range is set as Default internet gateway under Next hop.

-
If no such route exists, create a route. Click CREATE ROUTE in the Routes menu.

-
Give the new route a Name and an optional Description in the corresponding fields. Choose the correct Network for the route from the dropdown menu. Enter 0.0.0.0/0 as the Destination IP Range. Enter a number for Route Priority and the optional Instance tags in the corresponding fields. Select the Default internet gateway from the dropdown menu under Next hop. Click Create.

-
To solve communication issues, perform the following Azure tasks as per the target cloud used:
-
Azure
-
DNS - Allow the outbound DNS resolution and connectivity over TCP port 443.
- Create an instance in the replication server's subnet
-
From this instance, try to:
- Browse https://ibm.cloudendure.com (Windows)
- wget https://ibm.cloudendure.com (Linux)
-
Network Security Groups - Configure a Network Security Group which allows inbound access on TCP port 1500 and 443 and outbound access on TCP port 443.
-
Subnet - Configure a subnet associated with the created Network Security Group. The subnet should be a virtual network for which the correct DNS servers are configured (either Azure-provided or custom). If using custom servers, make sure that the network security group allows access to these DNS servers and allows outbound access to TCP port 53 to resolve external addresses.
-
-
Ensure TCP port 1500 is open to traffic between Source machines and the Staging Area.
-
Verifying communication over Port 1500
If there is a connection issue from the source machine to replication servers or the Staging area, use these methods to check the connection.
To verify the integrity of the connection from a source machine to the staging area over TCP port 1500:
- Launch a new Linux instance in the staging area subnet.
- In the new Linux instance, run this command to open a listener in the staging area subnet:
nc -l 1500
- In the source machine, run this command to check connectivity:
telnet <new instance ip> 1500
Note: If the Use VPN… box is checked in replication settings, use the private IP of the new Linux machine. If the Use VPN… box is unchecked, use the public IP.
If the command fails, there is a connectivity issue. To fix this, check:
-
AWS
- The Network ACL in the target subnet
- Route rules in the target subnet
- Firewall (both internal and external) in the source
-
Verify that the Use VPN... checkbox is checked or unchecked as appropriate

-
GCP
- VPC Routes in the target subnet
- Firewall (both internal and external) in the source
-
Verify that the Use VPN... checkbox is checked or unchecked as appropriate

-
Azure
- Network security groups in Azure
- Firewall (both internal and external) in the source
-
Verify that the Use VPN... checkbox is checked or unchecked as appropriate

-
Solving communication problems over Port 1500
To solve connectivity issues between source machines and the staging area, check these:
- [For AWS only] The Network ACL in the staging area subnet may deny traffic.
- Route rules in the staging area subnet may be set inaccurately.
- Firewalls, both internal and external, in the source machine/infrastructure may block communication.
- The Use VPN... checkbox in the IBM Live Migration Service user console may not be set properly.
Enabling the Network ACL
[For AWS only] The network ACL in the staging area subnet may block connectivity. By default, the network ACL allows connectivity. However, if the ACL settings were changed to deny traffic, revert those changes.
To check and enable the network ACL in the staging area subnet:
-
Sign in to the AWS console, click Services and select VPC under Networking & Content Delivery.

-
In the Resources, select Network ACL option:

-
In the Network ACL page, select the check box next to the network ACL of the staging area.

-
In the details table of the selected Network ACL, select Inbound Rules tab.

-
In the Inbound Rules tab, verify that the rule that determines the traffic to replication server subnet is set to Allow.
Note: The rule should allow traffic on TCP port 1500 from the address space of the Source environment. The network ACL does not need to be open to all port ranges.
-
If the rule is set to Deny, click Edit.

-
Click the dropdown under Allow/Deny and select Allow. Click Save.

-
Check the Ephemeral Ports in the Outbound Rules tab. In the same Network ACL, navigate to the Outbound Rules tab.

-
Ensure that the correct Ephemeral Port Range is allowed. The ephemeral port range varies with each client's operating system. Click Edit to edit the Ephemeral Port's Port Range.

-
Edit the Port Range and click Save. You may have to create a new rule by clicking Add another rule.
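Because the ephemeral port range depends on the client operating system, an outbound rule can be checked against each range that applies. The sketch below is illustrative only. The ranges in EXAMPLE_EPHEMERAL_RANGES are assumed typical defaults, not values taken from this playbook, and should be confirmed for the operating systems actually in use. The function names are hypothetical.

```python
# Assumed example ranges; confirm the real range for each client OS.
EXAMPLE_EPHEMERAL_RANGES = {
    "linux": (32768, 60999),
    "windows": (49152, 65535),
}


def range_allows(rule_range, needed_range):
    """Return True if the rule's port range fully contains the needed range."""
    rule_low, rule_high = rule_range
    need_low, need_high = needed_range
    return rule_low <= need_low and need_high <= rule_high


def uncovered_clients(rule_range, ranges=EXAMPLE_EPHEMERAL_RANGES):
    """List the client types whose ephemeral range the rule does not cover."""
    return sorted(
        name for name, needed in ranges.items()
        if not range_allows(rule_range, needed)
    )
```

Under these assumed ranges, an outbound rule for ports 1024-65535 leaves no client uncovered, while a rule for 32768-61000 leaves the Windows range uncovered.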

Setting Route Rules on the Staging Area Subnet
-
AWS
To check and set Route Rules in the staging area subnet in AWS:
-
Sign in to AWS console, click Services and select VPC under Networking & Content Delivery.

-
In the VPC Dashboard toolbar, select Route Tables option.

-
In the Route Tables page, check the box of the Route Table of the staging network.

-
This expands the details in the route table. Navigate to Routes tab.

-
Within the Target column of the Routes tab, find the route used for the inbound traffic from the Source on TCP port 1500 (either igw - Internet Gateway, vgw - VPN or i - EC2 instance). Verify the Destination address is 0.0.0.0/0.
Note: The rule may be specific to the address space of the Source machines.
-
If the address is not 0.0.0.0/0, change it to 0.0.0.0/0. Click Edit.

-
Input 0.0.0.0/0 in the Destination field for the actual target. Click Save.
Note: If using VPN, enter a specific IP address range in the Destination column.
Firewall (both internal and external) in the Source machine/infrastructure
Firewall issues can have several causes. If you experience firewall issues, such as Windows firewall connection issues, check the following:

-
-
All platforms
- Ensure the subnet assigned for the replication servers still exists.
- AWS
- Ensure the IAM policy is set properly (see IBM Live Migration Service IAM policy).
- Select a specific subnet in Setup & Info > Replication Settings if you do not have a default VPC or if your AWS account is an EC2-Classic account.
- GCP
- Check if the IAM roles are correct
- compute.instanceAdmin
- compute.securityAdmin
- compute.storageAdmin
- compute.networkAdmin
- Editor/Owner
-
Azure(ARM)
- Check to see if the assigned IAM roles and permissions are correct
- Contributor role
- Application ID and Authentication Key
Setting the Use Private Connection Checkbox
-
Access the Use VPN… checkbox by navigating to Setup & Info > Replication Settings in the project.

-
Select this checkbox if the replicated data has to be transmitted from the source machine to the staging area over a private connection.
Note: Use this option if a VPN connection is available. Selecting this check box does not create a new private connection.
You should use this option if you want to:
- Allocate dedicated bandwidth for replication
- Use another level of encryption
- Add another security layer by transferring replicated data from one private IP address (source) to another private IP address (target).
You can safely switch between a private connection and a public one, by checking or unchecking the Use VPN… check box, even after replication has started. This switch causes a very short pause in replication and there is no long-term effect on replication. Whitelist all the IBM Live Migration Service IPs in the firewall for Port 443.
IPs to whitelist
Ensure that you have whitelisted all of the CloudEndure IPs in your firewall for Port 443.

Credentials
Troubleshooting issue: Credentials provided are either invalid or have insufficient permissions.
AWS
If the AWS credentials entered do not exist or are invalid, or if the IAM policy created and attached to the user does not contain the required permissions, this error message is displayed:

- In this case, attempt these troubleshooting steps:
- Verify the IAM policy created is identical to the IBM Live Migration pre-defined policy. Limiting the permissions of the policy prevents the management and monitoring of AWS resources that the IBM Live Migration Service requires (as illustrated in the Creating a Policy for IBM Live Migration Service section).
- Verify the IAM user has a Programmatic access type (as illustrated in the Creating a New IAM User and Generating AWS Credentials section).
- Verify that the correct policy is attached to the IAM user (as illustrated in the Creating a New IAM User and Generating AWS Credentials section).
- Retry the steps in the Generating the Required AWS Credentials section with the latest credentials from the AWS console.
GCP
If the GCP credentials entered do not exist or are invalid, or if the IAM role of the Private Key is NOT set to Owner, this error message is displayed:

- In this case, attempt these troubleshooting steps:
- Verify the Billing and Compute Engine are enabled in the GCP project (as illustrated in the Preparing the GCP Account section).
- Verify if the correct Project ID is entered and NOT the project name (as illustrated in the Obtaining the GCP Project ID section).
- Check the IAM role of the Private Key. If required, change it to Owner (as illustrated in the Creating a GCP Service Account and Private Key section).
- Create a new private key (as illustrated in the Regenerating the GCP Private Key section).
- Retry the steps in the Generating the Required GCP Credentials section with the latest credentials from the GCP console.
Azure
If the Azure ARM credentials do not exist or are invalid, or the IAM role of the AD application used for IBM Live Migration Service is NOT Contributor, this error message is displayed:

- In this case, attempt these troubleshooting steps:
- Verify that the new AD application is assigned to the Contributor role in your Azure (ARM) subscription (as illustrated in the Assigning the Required Role to the Application section).
- Retry the steps in the Generating Required Azure Credentials section with the latest credentials from the Azure portal.
Agent Installation
Check if the machines meet prerequisites for agent installation.
Troubleshooting - Error: Installation Failed
When installation of an IBM Live Migration Service agent on a source machine fails while the agent installer file is running, this message is displayed:
Installation was not finished successfully. Please contact Support for assistance.

This type of error indicates that the agent was not installed in the source machine, and the machine will not appear in the IBM Live Migration Service user console. After the issue is fixed, run the agent installer file again to install the agent.
Installation Failed - Old Agent
Installation may fail due to an old IBM Live Migration Service agent. Always install the latest version of the IBM Live Migration agent. Learn how to download the Agent here.
-
Linux
Ensure there is enough free disk space
Free Disk
Free disk space in the root directory - verify that there is at least 1GB free disk space in the root directory (/) of the source machine for installation. To check available disk space in the root directory, run the following command:
df -h /
Free disk space in the /tmp directory - for the duration of the installation process only, verify that there is at least 500MB of free disk space in the /tmp directory. To check available disk space in the /tmp directory, run the following command:
df -h /tmp
After you enter these commands to check the available disk space, the results are displayed.
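The two thresholds above can also be checked programmatically, for example before running the installer on many machines. This is a minimal sketch using Python's standard shutil.disk_usage. The constant and function names are assumptions for illustration, and 1GB and 500MB are interpreted here as binary units.

```python
import shutil

# Thresholds from this section, interpreted as binary units (an assumption).
ROOT_MIN_BYTES = 1 * 1024 ** 3   # 1GB for the root directory (/)
TMP_MIN_BYTES = 500 * 1024 ** 2  # 500MB for /tmp during installation


def has_free_space(path, min_bytes):
    """Return True if the filesystem holding path has at least min_bytes free."""
    return shutil.disk_usage(path).free >= min_bytes
```

On a Linux source machine, has_free_space("/", ROOT_MIN_BYTES) and has_free_space("/tmp", TMP_MIN_BYTES) should both return True before installation.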

Check format of the disks list to be replicated
During installation, when entering the disks to be replicated, do NOT use apostrophes, brackets, or disk paths that do not exist. Enter only existing disk paths, and separate them with a comma, as follows:
/dev/xvda1,/dev/xvda2
Check if the correct Kernel headers package is installed
Verify that the exact kernel-devel/linux-headers package is installed and that it is the same version as the kernel you are running. The version number of the kernel headers should be identical to the version number of the kernel. To ensure this, follow these steps:
-
Identify the version of the running kernel
To identify the version of the running kernel, run this command:
uname -r
The uname -r output version should match the version of one of the installed kernel headers packages (kernel-devel-<version number> / linux-headers-<version number>).
-
Identify the version of the kernel-devel/linux-headers
To identify the version of the installed kernel-devel/linux-headers, run the following command:
On RHEL/CENTOS/ORACLE/SUSE:
rpm -qa | grep kernel
Note: This command looks for kernel-devel.
On Debian/Ubuntu:
apt-cache search linux-headers
-
Verify the folder containing the kernel-devel/linux-headers is not a symbolic link.
Sometimes, the content of the kernel-devel/linux-headers that matches the version of the kernel is a symbolic link. In that case, remove the link before installing the required package. To verify the folder containing the kernel-devel/linux-headers is not a symbolic link, run this command:
On RHEL/CENTOS/ORACLE/SUSE:
ls -l /usr/src/kernels
On Debian/Ubuntu:
ls -l /usr/src
In the above example, the results show that linux-headers are not a symbolic link.
-
If a symbolic link exists, delete the symbolic link
If the content of the kernel-devel/linux-headers, matching the version of the kernel, is actually a symbolic link, delete the link. Run this command:
rm /usr/src/<LINK NAME>
For example:
rm /usr/src/linux-headers-4.4.1
-
Install the correct kernel-devel/linux-headers from the repositories.
If none of the already installed kernel-devel/linux-headers packages match the running kernel version, install the matching package.
Note: Several kernel headers versions can be installed simultaneously in the OS. You can therefore safely install new kernel headers packages in addition to the existing kernel headers, without uninstalling the other versions of the package. A new kernel headers package does not impact the kernel and does not overwrite older versions of the kernel headers.
Note: For everything to work, install a kernel headers package with the exact same version number as the running kernel.
To install the correct kernel-devel/linux-headers, run the following command:
On RHEL/CENTOS/Oracle/SUSE:
sudo yum install kernel-devel-`uname -r`
On Debian/Ubuntu:
sudo apt-get install linux-headers-`uname -r`
-
If no matching package was found, download the matching kernel-devel/linux-headers package
If no matching package is found in the repositories configured in the machine, download it manually from the internet and install.
To download the matching kernel-devel/linux-headers package, navigate to these sites:
- Debian package directory
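The version comparison described in the steps above (the uname -r output against the kernel-devel/linux-headers package names) can be sketched as a small helper. This is an illustration under stated assumptions: the function name is hypothetical, and each input line is assumed to start with a package name, as in the output of rpm -qa or apt-cache search.

```python
HEADER_PREFIXES = ("kernel-devel-", "linux-headers-")


def matching_header_packages(kernel_release, package_lines):
    """Return package names that are headers for exactly this kernel release.

    kernel_release is the text printed by `uname -r`; package_lines are
    lines whose first whitespace-separated token is a package name.
    """
    wanted = {prefix + kernel_release for prefix in HEADER_PREFIXES}
    matches = []
    for line in package_lines:
        tokens = line.split()
        if tokens and tokens[0] in wanted:
            matches.append(tokens[0])
    return matches
```

An empty result means that no listed package matches the running kernel exactly, which is the case where the matching package has to be installed or downloaded.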
-
Check that the make, openssl, wget, curl, gcc, and build-essential packages are installed and stored in the current path.
Note: Usually, these packages are not required for agent installation. However, in some cases when installation fails, installing these packages solves the problem.
If installation failed, the make, openssl, wget, curl, gcc, and build-essential packages should be installed and stored in the current path.
To verify the existence and location of required packages, run this command:
which <package>
For example, to locate the make package:
which make
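To check all of the tools in one pass instead of running which once per package, a short Python sketch can use the standard shutil.which function. The names below are illustrative assumptions. build-essential is left out of the list because it is a package group rather than a single executable, so a path lookup is not expected to find it.

```python
import shutil

# Executables named in this section; build-essential is omitted on purpose.
REQUIRED_TOOLS = ("make", "openssl", "wget", "curl", "gcc")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means that every listed executable was found on the PATH.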
Check that the /tmp directory is mounted without the noexec option.
Verify that the /tmp directory is mounted in a way that allows scripts and applications to run from it.
To verify that the /tmp directory is mounted without the noexec option, run this command:
sudo mount | grep '/tmp'
If the result is similar to the example below, the issue exists in the OS:
/dev/xvda1 on /tmp type ext4 (rw,noexec)
To remove the noexec option from the mounted /tmp directory, run this command:
sudo mount -o remount,exec /tmp
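The noexec check can be automated by parsing a line of mount output in the format shown above. This is a minimal sketch, assuming the options appear as a comma-separated list in parentheses at the end of the line. The function names are hypothetical.

```python
import re


def mount_options(mount_line):
    """Extract the option list from a line such as '... type ext4 (rw,noexec)'."""
    match = re.search(r"\(([^)]*)\)\s*$", mount_line)
    if not match:
        return []
    return [opt.strip() for opt in match.group(1).split(",")]


def is_noexec(mount_line):
    """Return True if the mount line lists the noexec option."""
    return "noexec" in mount_options(mount_line)
```

For the example line in this section, is_noexec returns True, which indicates that /tmp has to be remounted with the exec option.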

-
-
Windows
Check that .NET Framework 3.5+ is installed on the machine.
[For replicating machines to AWS cloud only]
Verify that .NET Framework version 3.5 or above is installed on the Windows source machines.
Verify that the machine has at least 1GB of free disk space
Verify that there is 1GB of free disk space in the root directory (C:) of the source machine for installation.
Check the net.exe and/or the sc.exe files are included in the PATH variable.
Verify the net.exe and/or sc.exe files, located by default in the C:\Windows\System32 folder, are included in the PATH Environment Variable.
- Navigate to the Control Panel-->System and Security-->System-->Advanced system settings.
-
In the System Properties dialog box Advanced tab, click Environment Variables.

-
In the System Variables section, of the Environment Variables pane, select Path variable. Click Edit to view contents.

-
In the Edit System Variable pane, review the defined paths in the Variable value field. If the path of the net.exe and/or sc.exe files does not appear here, manually add it to the Variable value field and click OK.

(GCP Only) - Check the unsigned drivers are supported in Windows 2003 SP2 or Windows 2003 R2.
(For replicating the machine to GCP cloud only) - if the source machine is a Windows 2003 SP2 or Windows 2003 R2 (Windows 2003 SP1 is not supported), installation of unsigned drivers has to be supported:
-
Right-click on My Computer
- Select Properties to open System Properties
- In the System Properties dialog box, select the Hardware tab
- Click Driver Signing
- Select Ignore - Install the software anyway and don't ask for my approval
- Select Make this action the system default
-
Click OK
Error: urlopen error [Errno 110] Connection timed out
This error occurs when outbound traffic is not allowed over TCP port 443. Port 443 must be open outbound to the IBM Live Migration Service manager, located at https://ibm.cloudendure.com/ (50.19.144.132).

Multipath / Powerpath support
-
To verify whether the source machine has multipath configured, run:
- multipath -l
- powermt check
- If so, make sure to run the installer with this parameter:
- --force-volumes
-
Specify the high-level logical device only, for example:
- /dev/dm-0, /dev/cciss/c0d0
Error: You need to have root privileges to run this script

Make sure the installer is run either as root or with sudo at the beginning: sudo python installer_linux.py
Troubleshooting - Agent is installed successfully but not running properly
Verify why the Agent is not running properly.
There are cases where installation completes successfully, but the migration agent does not run at all or does not run properly in the source machine. These problems are visible on the Machines page of the IBM Live Migration Service user console.
By default, when installation completes successfully, replication of the source machines starts automatically (unless the agent was installed with a stopped flag), and you can monitor it through the IBM Live Migration Service user console. When replication starts, the first message appearing in the DATA REPLICATION PROGRESS column is Establish communication between IBM Live Migration Service Agent and Replication Server.

Usually, this message appears for a short time, 5 minutes at the most. It is then replaced by other messages, which indicate subsequent steps of the replication progress.
If the Establish communication between the... message appears in the user console for over 5 minutes, it means that the agent failed to run or never established communication with the IBM Live Migration Service manager. This occurs because:
- The agent was installed successfully, but it is not running.
- [Only in environments that use a proxy for the agent-console communication] The agent was installed successfully and is running properly, but it cannot communicate with the IBM Live Migration Service manager.
For all issues that occur in this state, first check if the agent is installed on the source machine.

The Agent is Installed but Not Running
In certain cases, the agent is installed successfully in the source machine, but does not run. To handle this installation issue, follow these steps:
- Verify the agent is running in the source machine.
- If agent is not running, move to the next step.
- If the agent is running, contact IBM Live Migration Service support.
- Run the agent manually.
- If the agent has started running, the issue is solved. Verify that the source machine appears in the user console now.
- If the agent is still not running, move to the next step.
- Verify there is enough RAM in the source machine to run the agent. You need at least 200MB RAM.
- If there is not enough RAM, increase the RAM size.
- If there is enough RAM, contact IBM Live Migration Service support.
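The three-step procedure above is a small decision tree, and it can be written out as one to make the order of checks explicit. This sketch only encodes the steps as listed. The function name, parameter names, and returned strings are illustrative assumptions.

```python
MIN_RAM_MB = 200  # minimum RAM stated in this section


def next_action(agent_running, starts_manually, ram_mb):
    """Return the next troubleshooting action for an installed agent."""
    if agent_running:
        # Step 1: the agent runs but replication does not start.
        return "contact support"
    if starts_manually:
        # Step 2: starting the agent by hand solved the issue.
        return "verify the source machine appears in the user console"
    if ram_mb < MIN_RAM_MB:
        # Step 3: not enough memory to run the agent.
        return "increase RAM"
    return "contact support"
```

For example, an agent that is not running, does not start manually, and sits on a machine with 128MB of RAM leads to the "increase RAM" action.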
Verify the Agent is Running in the Source Machine
If the agent is installed in the source machine, but data replication does not start, check if the agent is running.
Checking if the agent is running:
-
Linux
- In the source machine, enter this command:
ps -ef | grep cloudendure | grep -v grep | grep -v bash | wc -l
The results are as follows:
5 = The agent is fully running
Less than 5 = Some components may not be running -
If the agent is running, the following results are displayed:

-
In this case, contact IBM Live Migration Service support.
If the agent is not running, try running it manually, as described in the next section.
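The counting done by the ps pipeline above can be reproduced in Python, which is convenient when the output of ps -ef has already been collected from several machines. This is a minimal sketch that applies the same filters as the command (lines containing cloudendure, excluding grep and bash). The function names are hypothetical.

```python
EXPECTED_PROCESSES = 5  # "5 = The agent is fully running" in this section


def count_agent_processes(ps_output):
    """Count ps -ef lines that mention cloudendure, ignoring grep and bash lines."""
    count = 0
    for line in ps_output.splitlines():
        if "cloudendure" in line and "grep" not in line and "bash" not in line:
            count += 1
    return count


def components_missing(ps_output):
    """Return True if fewer than the expected number of processes are found."""
    return count_agent_processes(ps_output) < EXPECTED_PROCESSES
```

A True result from components_missing corresponds to the "Less than 5" case, where some components may not be running.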
-
Windows
-
In the source machine, open the Task Manager - Services tab, and locate CloudEndureService. Verify the service is Running in the Status column.

-
If the Agent is running, contact the IBM Live Migration Service support.
- If the Agent is not running, try running it manually, as described below.
-
Running the Agent Manually:
If the Agent is installed in the source machine but is not running, try running it manually.
- Linux
- In the source machine, enter this command:
sudo /var/lib/cloudendure/runAgent.sh
-
Windows
- In the source machine, navigate to the Control Panel --> System and Security --> Administrative Tools --> Services.
-
In the Services pane, locate CloudEndureService. Then, right-click and select Start from the pop-up menu.

-
If the agent fails to run, verify there is enough RAM in the source machine to run the Agent. You need at least 200MB RAM.
Agent is Running but Cannot Communicate with the IBM Live Migration Service Manager
[Only for environments using a proxy for agent-console communication]
If the agent is installed and running, but data replication does not start, there might be a communication problem between the agent and IBM Live Migration Service manager. This problem usually happens when using a proxy to communicate between the agent and the IBM Live Migration Service manager over TCP port 443.
- To solve this issue, follow these steps:
- Check configuration of the connecting proxy, and fix configuration, if necessary.
- Restart the agent.
Configuring the Proxy between the Source Machine and the IBM Live Migration Service Manager
Communication between the source machines and the IBM Live Migration Service manager over TCP port 443 can be established in two ways:
- Direct communication between the source machines and the IBM Live Migration Service manager,
-
Indirect communication using a proxy.
Whitelist console https://ibm.cloudendure.com/ for both SSL Interception and Authentication.
-
To use a proxy, these environment variables should be configured in the source machines:
For an HTTPS proxy, use: https://server-ip:port/
Note: The value must end with '/'.
-
Proxy for Linux
The required environment variables should be configured for all users and remain persistent between reboots. Use the https_proxy environment variable to set the proxy server. Learn more about Environmental Variables.
- Proxy for Windows
The required environment variables should be configured at the system level.
-
To configure environment variables at the Windows system level:
- Navigate to Control Panel --> System and Security --> System --> Advanced system settings.
-
In the Advanced Tab of the System Properties dialog box, click Environment Variables.

-
In the System Variables section of the Environment Variables pane, click New to add the https_proxy environment variable or Edit if the variable already exists.
Note: User Variables should not be modified.
-
Enter https://PROXY_ADDR:PROXY_PORT/ in the Variable value field. Click OK.
Note: Proxy authentication is not supported with environment variables.
Note: If the environment variable was created after the IBM Live Migration Service agent was installed, the agent should be restarted. If the proxy is reconfigured, restart the agent as well.
To restart the agent:
Windows: Restart the service called CloudEndureService
Linux: Run the following commands:
/var/lib/cloudendure/stopAgent.sh
/var/lib/cloudendure/runAgent.sh
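A malformed https_proxy value is easy to miss, so the format rules stated above (https://server-ip:port/, a trailing '/', and no proxy authentication) can be checked with a short script. This is a sketch only: the function name and messages are assumptions, and the strict https scheme check reflects the single form shown in this playbook rather than a documented product requirement.

```python
from urllib.parse import urlsplit


def proxy_value_problems(value):
    """Return a list of problems found in an https_proxy value (empty if none)."""
    problems = []
    parts = urlsplit(value)
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if not parts.hostname:
        problems.append("missing host")
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError for a non-numeric or out-of-range port.
        port = None
    if port is None:
        problems.append("missing or invalid port")
    if not value.endswith("/"):
        problems.append("value does not end with '/'")
    if parts.username or parts.password:
        problems.append("proxy authentication is not supported")
    return problems
```

The value to check can be read with os.environ.get("https_proxy", ""). An empty list means the value matches the documented form.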

Troubleshooting - (Proxy environments only) - agent is running but cannot communicate with the IBM Live Migration Service manager
[Only for environments using a proxy for agent-console communication]
If the agent is installed and running, but data replication does not start, there may be a communication problem between the agent and IBM Live Migration Service manager. This problem usually occurs when using a proxy for communication between the agent and the IBM Live Migration Service manager over TCP port 443.
- To solve this issue, follow these steps:
- Check the configuration of the connecting proxy, and fix configuration if necessary.
- Restart the Agent.
Troubleshooting - Error: [Errno 110] Connection timed out
Contact IBM Live Migration Service Support
Troubleshooting - Error: You need to have root privileges to run this script.
Make sure you have root access or sudo
Troubleshooting - Unable to install Agent on the Oracle Linux.
When installing the IBM Live Migration agent on Oracle Linux, either install wget to run the installer prompt, or alternatively, use curl instead of wget.
For example: curl -o ./installer_linux.py https://username.IBM Live Migration.com/installer_linux.py
Agent - Console Proxy Troubleshooting
Data Replication
Troubleshooting issues - Initial Replication Step errors
-
Firewall rules creation failed

Possible causes for failure:
The credentials may not have enough permissions to manage required resources on the target cloud.
-
To provide sufficient permissions:
- Change your credentials - either generate new credentials with the required permissions, or change the permissions of the existing credentials.
- If new credentials are generated, navigate to the Setup & Info --> CREDENTIALS page, enter the new credentials, and save.
The subnet selected for launching the Replication Servers may no longer exist.
-
To select an appropriate subnet for the Replication Servers/Staging Area:
- Navigate to Setup & Info --> REPLICATION SETTINGS.
-
From the Replication Servers section - Choose the subnet… dropdown list, select an existing subnet for the Replication Servers:

-
Check the IAM policy (AWS) or IAM Roles (GCP - compute.instanceAdmin, compute.securityAdmin, compute.storageAdmin, compute.networkAdmin, or Editor/Owner) are correct.
-
-
Failed to create a replication server

Possible causes for failure
- Insufficient permissions in the provided Cloud Credentials (navigate to Setup & Info > Credentials).
- The subnet configured for Replication Servers does not exist.
- The limit on the number of machines in the target infrastructure, as defined by the cloud provider, has been reached.
- Check that the IAM policy is correct and that you are not over quota for 't2.small' instances (AWS), 'n1-standard-1' (GCP) or 'Standard_F*' (Azure) in the Staging Area region.
-
Failed to boot replication server

Possible causes for failure
- Insufficient permissions in the provided Cloud Credentials (navigate to Setup & Info > Credentials).
- Check that the IAM policy (AWS) or IAM Roles (GCP - compute.instanceAdmin, compute.securityAdmin, compute.storageAdmin, compute.networkAdmin, or Editor/Owner) are correct.
-
Failed to resolve IBM Live Migration Service Manager address in the replication server

Possible causes for failure:
-
The subnet selected for the Replication Servers is configured in a way that prevents DNS lookups from resolving the console address, https://ibm.cloudendure.com/. Make sure that DNS traffic is not blocked. Create an instance in the Replication Server's subnet and try to resolve https://ibm.cloudendure.com/ either by opening the address in a browser or by running the following command:
wget https://ibm.cloudendure.com/
-
-
Failed to authenticate the replication server with the IBM Live Migration Service Manager

Possible causes for failure:
-
https://ibm.cloudendure.com is not reachable via TCP port 443 from the Replication Server. Check the subnet selected in Setup & Info > Replication Settings and ensure that outbound TCP port 443 is open from the Replication Server.
To verify the integrity of the connection from the Replication Server to the IBM Live Migration Service Manager over TCP port 443:
-
Launch an Ubuntu machine in the same subnet that was selected in the Replication Settings screen, under the same VPC.
On that machine, run the following command:
wget https://ibm.cloudendure.com (most platforms)
If the command fails, there is a connectivity issue.
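Where wget is not available on the test instance, the same two checks (name resolution, then a TCP connection on port 443) can be approximated with a short Python script. The following is a minimal sketch using only the standard library; the function name check_endpoint and its return shape are illustrative assumptions and are not part of the IBM Live Migration Service tooling.

```python
import socket

def check_endpoint(host, port=443, timeout=5.0):
    """Return (dns_ok, tcp_ok, detail) for a DNS lookup followed by a TCP connect.

    Hypothetical helper for illustration only.
    """
    try:
        # First check: can the host name be resolved at all?
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, False, f"DNS lookup failed: {exc}"

    # Second check: can a TCP connection be opened on the port?
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return True, True, f"connected to {sockaddr[0]} on port {port}"
        except OSError as exc:
            last_error = exc
    return True, False, f"TCP connect failed: {last_error}"
```

Calling check_endpoint("ibm.cloudendure.com") from an instance in the Replication Server subnet separates the two failure modes: a failed DNS lookup points at blocked DNS traffic, while a successful lookup followed by a failed connect points at TCP port 443 being blocked. The sketch does not validate the TLS certificate or the HTTP response, so wget remains the more complete test.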
-
-
Failed to download the IBM Live Migration Service replication software to the replication server

Possible causes for failure
- Make sure that outgoing traffic from the Replication Server to the download location of the replication software is not blocked.
- Create an instance in the Replication Server's subnet.
- From within that instance, try either of the following:
- Browse to s3.amazonaws.com or s3.us-east-2.amazonaws.com
- Run wget s3.amazonaws.com or wget s3.us-east-2.amazonaws.com
If you cannot access these sites, there is a connectivity issue.
-
Failed to create staging disks

Possible causes for failure
- Insufficient permissions in the provided cloud credentials (navigate to Setup & Info > Credentials).
- You reached the maximum storage limit in the target region.
- (AWS) Your IBM Live Migration Service Account is configured to use encrypted EBS disks, but the IAM user does not have permissions to encrypt using the selected KMS key.
-
Failed to attach the staging disks to the replication server

Possible causes for failure
- (AWS) The IBM Live Migration account is configured to use encrypted EBS disks, but the IAM user does not have permissions to encrypt using the selected KMS key.
-
Failed to pair the IBM Live Migration Service Agent with the replication server
Please contact IBM Live Migration Support if encountering this error.
-
Failed to establish communication between the IBM Live Migration Agent and the replication server

Possible causes for failure:
There are cases when installation is indicated as successful, but progress stops at the last step of the replication initiation sequence - Establish communication between the IBM Live Migration Service Agent and the Replication Server. This means that the Agent could not establish communication with the IBM Live Migration Service Manager.
- A firewall, either in the Source or on the target side, is preventing the IBM Live Migration Service Agent from reaching the Replication Server on TCP port 1500.
- To verify, create a new instance on the Replication Server subnet and run nc -l 1500.
- From the Source machine run: telnet <new instance ip> 1500.
If this test fails, then check:
- The Network ACL on the target subnet.
- Route rules on the target subnet.
- Firewall permissions (internal/external) in the b
- The Use VPN… checkbox was mistakenly selected or deselected in Setup & Info > Replication Settings.
Note: After fixing the issue, it may take up to 30 minutes to re-establish communication.
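On machines where nc or telnet is not installed (for example, a Windows Source machine), the port 1500 test can be approximated in Python. The following is a minimal sketch, assuming Python 3 is available on both machines; listen_once plays the role of nc -l 1500 on the new instance and probe plays the role of telnet on the Source machine. Both function names are hypothetical and are not part of the product.

```python
import socket

def listen_once(port=1500, host="0.0.0.0", timeout=60.0):
    """Accept a single TCP connection, similar to `nc -l 1500`.

    Returns the (address, port) of the peer that connected, or raises
    a timeout error if nobody connects within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        srv.settimeout(timeout)
        conn, peer = srv.accept()
        conn.close()
        return peer

def probe(host, port=1500, timeout=5.0):
    """Try to open a TCP connection, similar to `telnet <ip> 1500`.

    Returns True if the connection succeeds and False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run listen_once() on the new instance in the Replication Server subnet, then call probe("<new instance ip>") on the Source machine. A False result while the listener is waiting suggests that a firewall, Network ACL, or route rule is blocking TCP port 1500.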
Troubleshooting Issue - Finalized Initial Sync Stuck in Data Replication Progress

Possible solutions include:
- A backlog may be preventing the sync from starting. Wait until the backlog is flushed for the sync to initialize.
- Check that the IAM policy is correct (AWS).
- Make sure you are not over quota for Recovery Point creation.
- Cloud vendors' APIs can occasionally respond slowly, so allow some extra time.
Troubleshooting issue - target machine (replica) creation failed

Possible solutions include:
- Look for errors in the Audit Log in the IBM Live Migration Service user console.
- Make sure you are not over quota on disks (of your chosen type), instances (of your chosen type), or snapshots (you need at least enough space for an additional set of disks).
- Check that the IAM policy is correct.
- If the target machine launch takes longer than expected (more than an hour), something has changed in the connectivity between the subnet and the console, https://ibm.cloudendure.com/. Either DNS no longer resolves it, or TCP port 443 does not allow connectivity.
Replication Speed/Lag issues
Potential solutions:
- Ensure the Source machine is up and running.
- Ensure that IBM Live Migration Service services are up and running.
- Ensure that the outbound TCP port 1500 is not blocked from the source to the Replication Server.
- If the MAC address of the Source machine has changed, the Migration Agent must be reinstalled.
- If the Source machine was rebooted recently, or the IBM Live Migration Service services were restarted, the disks are read again, and the Lag grows until the re-read finishes.
- If the Source machine has a spike in write operations, the Lag grows until IBM Live Migration Service manages to flush all written data to the target Replication Server.
Testing replication speed:
The replication speed depends on four key factors:
- The uplink speed from the server to the Replication Server and bandwidth available.
- The overall disk storage.
- The changes in the disk while replicating.
- I/O speed of the storage itself.
To test the uplink speed, use the iperf3 utility as follows:
- Launch a vanilla Linux machine (m4.xlarge on AWS, Basic_A2 on Azure or n1-standard-1 on GCP) in the same subnet as the IBM Live Migration Service Replication Servers.
- On that machine, install iperf3 utility using: sudo apt-get install iperf3
- Then run: iperf3 -s -p 1500
- On the source server, install iperf3 as well:
- Windows: Download the right zip file from here and extract it.
- Linux: Install as mentioned above.
-
In the terminal window, run:
iperf3.exe -i 10 -c Vanilla_linux_server_ip -p 1500 -t 60 > iperf60.log
iperf3.exe -i 10 -c Vanilla_linux_server_ip -p 1500 -t 1800 > iperf1800.log
or
iperf3 -i 10 -c Vanilla_linux_server_ip -p 1500 -t 60 > iperf60.log
iperf3 -i 10 -c Vanilla_linux_server_ip -p 1500 -t 1800 > iperf1800.log
Here is a sample output:

In this output, the uplink is 23.4 Mbps, which means that a 100 GB (idle) server should be replicated in about 10 hours. For example, use this calculator.
If the server writes an average of 20 GB/day to disk, take this into account in the calculation as follows: 20 GB/day --> ~2 Mbps, which leaves only about 21.4 Mbps of bandwidth available for the initial 100 GB.
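The arithmetic above can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: decimal units (1 GB = 8,000 megabits), a steady uplink, a steady write rate, and no compression or protocol overhead. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86400

def gb_per_day_to_mbps(gb_per_day):
    """Convert an average daily write volume in GB to a sustained rate in Mbps."""
    megabits = gb_per_day * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / SECONDS_PER_DAY

def initial_sync_hours(disk_gb, uplink_mbps, writes_gb_per_day=0.0):
    """Estimate the hours needed to replicate disk_gb over the bandwidth
    that remains after ongoing writes are subtracted from the uplink."""
    available_mbps = uplink_mbps - gb_per_day_to_mbps(writes_gb_per_day)
    if available_mbps <= 0:
        raise ValueError("ongoing writes consume the entire uplink")
    seconds = disk_gb * 8 * 1000 / available_mbps
    return seconds / 3600

# Idle 100 GB server on a 23.4 Mbps uplink: roughly 9.5 hours.
print(round(initial_sync_hours(100, 23.4), 1))
# Same server writing 20 GB/day (about 1.85 Mbps): roughly 10.3 hours.
print(round(initial_sync_hours(100, 23.4, 20), 1))
```

Under these assumptions, 20 GB/day works out to about 1.85 Mbps, which the text rounds up to 2 Mbps, so the estimate for the busy server is slightly more optimistic than one based on 21.4 Mbps.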
Note: Make sure the Security Group (AWS), Network Security Group (Azure), and Firewall Rules (GCP) are configured to permit inbound connectivity on port 1500.
(AWS only) Replication stuck at 0% - C5 and M5 instances
NVM drivers AWS Issue - C5 and M5 instances only
An AWS bug in the NVM drivers on the C5 and M5 instance types may cause replication issues. Users who encounter this bug will have Data Replication stuck at 0% on the IBM Live Migration Service user console.

To verify that the NVM drivers are working properly on the source machine, run: Get-Disk | Select Number, SerialNumber | Sort-Object Number
Ensure that the output matches the output shown in the AWS documentation.
If there is a driver issue, the output contains only zeros:
Number Serial Number
0 0000_0000_0000_0000.
1 0000_0000_0000_0000.
2 0000_0000_0000_0000.
3 0000_0000_0000_0000.
4 0000_0000_0000_0000.
5 0000_0000_0000_0000.
6 0000_0000_0000_0000.
If this is the case, something is malfunctioning in the machine, as it cannot fetch the correct EBS volume IDs.
Fix this issue by reinstalling the NVM drivers.
Once the issue is fixed and the expected output is returned, Data Replication starts.
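When many disks or machines need to be checked, the output of the command above can be saved to a text file and scanned automatically. The following is a minimal sketch, assuming the output has one disk per line with the disk number followed by the serial number, as in the sample above; the function name zeroed_disks is hypothetical.

```python
import re

def zeroed_disks(get_disk_output):
    """Return the disk numbers whose serial number consists only of zeros."""
    bad = []
    for line in get_disk_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if not match:
            continue  # skip the header, separator, and blank lines
        number, serial = match.groups()
        # Drop underscores, hyphens, and the trailing period before checking.
        digits = re.sub(r"[_.\-]", "", serial)
        if digits and set(digits) == {"0"}:
            bad.append(int(number))
    return bad
```

An empty list suggests that the serial numbers are being reported correctly; any disk number in the list matches the all-zeros symptom described above.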