Troubleshooting

Troubleshooting Playbook

This troubleshooting playbook describes common issues related to networking, credentials, Migration Agent installation, and data replication, and explains how to troubleshoot them.

Important: The information provided in this section is for general troubleshooting guidance only. The information is provided on an "AS IS" basis, with no guarantee of completeness, accuracy or timeliness, and without warranty or representations of any kind, expressed or implied. In no event will IBM and/or its subsidiaries and/or their employees or service providers be liable to you or anyone else for any decision made or action taken in reliance on the information provided above or for any direct, indirect, consequential, special or similar damages (including any kind of loss), even if advised of the possibility of such damages. IBM is not responsible for the update, validation or support of troubleshooting information.

Communication Issues

Ensure the ports are open for traffic between source machines and the IBM Live Migration Service manager.

  1. Establishing communication over Port 443

  2. Configuring communication over Port 443

  3. Verifying communication over Port 443

    Verifying communication over TCP port 443 between the Staging Network and the IBM Live Migration Service Manager.

    If there is a connection problem between the Staging area and the IBM Live Migration Service manager, use the following methods to check the connection.

    Note 1: A lag in the replication process can indicate a communication issue over TCP port 443. However, a lag can also have other causes.
    Note 2: If the replication process has started, a communication issue between the Staging area and the IBM Live Migration Service manager appears as a failure in step 4 or step 5 of the replication initiation procedure, described below.

    Step 4 error: Failed to resolve IBM Live Migration Service Manager address in the replication server

    Failed to resolve CloudEndure

    The subnet selected for the replication servers may be configured in a way that prevents DNS lookups of https://ibm.cloudendure.com from resolving. Ensure that DNS traffic is not blocked.
    Create an instance in the replication server's subnet and try to resolve https://ibm.cloudendure.com, either by opening it in a browser or by running this command:

    wget https://ibm.cloudendure.com

    Step 5 error: Failed to authenticate replication server with the IBM Live Migration Service Manager

    Failed to authenticate replication

    https://ibm.cloudendure.com is not reachable from the replication server over TCP port 443. Check the subnet selected in Setup & Info > Replication Settings and ensure that TCP port 443 is open outbound from the replication server network.

    To verify the integrity of the connection from the replication server to the IBM Live Migration Service manager over TCP port 443:

    • Launch an Ubuntu machine in the same subnet and VPC as the replication server.
    • On the machine, run this command: wget https://ibm.cloudendure.com
    • If the command fails, there could be a connectivity issue.
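The DNS check (step 4) and the port 443 check (step 5) can also be scripted from the test instance. The following is a minimal sketch, not an official IBM tool. It assumes `getent` (present on glibc-based distributions such as Ubuntu), `timeout` (GNU coreutils), and a `bash` binary with `/dev/tcp` support. `resolves`, `tcp_port_open`, and `check_manager` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helpers (not part of IBM Live Migration Service) for checking
# the path from the replication server subnet to the service manager.

# Succeeds if HOST resolves through the system resolver (DNS, /etc/hosts).
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to HOST:PORT opens within TIMEOUT seconds
# (default 5). Uses bash's /dev/tcp pseudo-device, so bash must be installed.
tcp_port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" >/dev/null 2>&1
}

# Runs both checks in order. The return code shows which one failed:
# 1 = name resolution (step 4 symptom), 2 = TCP connection (step 5 symptom).
check_manager() {
    host="${1:-ibm.cloudendure.com}"
    port="${2:-443}"
    if ! resolves "$host"; then
        echo "DNS: cannot resolve $host"
        return 1
    fi
    if ! tcp_port_open "$host" "$port"; then
        echo "TCP: cannot connect to $host:$port"
        return 2
    fi
    echo "OK: $host resolves and TCP port $port is reachable"
}
```

Run without arguments, `check_manager` tests ibm.cloudendure.com on port 443 and reports which of the two checks fails.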
  4. Solving communication problems over port 443

    Solving communication problems over TCP port 443 between the Staging area and IBM Live Migration Service manager.

    If the target cloud is AWS, perform the following tasks to solve communication issues:

    • AWS

      • DHCP - Check the DHCP options set in the VPC of the Staging Area.
        Verify that the IPv4 CIDR, the DHCP options set, the route table, and the network ACL are correct.
      • DNS - Check that outbound DNS resolution and outbound connectivity over TCP port 443 are allowed.
      • Route Rules - The route rules in the Staging area subnet may be set incorrectly. The route rules should allow outbound traffic to the Internet.
        To check and set the route rules in the Staging area subnet:

        • Sign in to the AWS console, click Services, and select VPC under Networking & Content Delivery.

          Services-VPC

        • In the VPC Dashboard toolbar, select Route Tables.

          Route tables

        • In the Route Tables page, select the check box of the route table of the Staging area.

          Route table check box

        • This opens the route table details. Open the Routes tab.

          Detailed Route table

          Routes tab

        • In the Target column of the Routes tab, find the route used for outbound communication to the Internet (igw - Internet Gateway, vgw - VPN gateway, or i - EC2 instance). Verify that the address space in the Destination column covers the IBM Live Migration Service IPs and URLs.

          Note: The IBM Live Migration Service AWS-specific IPs and URLs include 50.19.114.132, 13.52.54.28, s3.amazonaws.com, s3.us-east-2.amazonaws.com, and outbound access to the EC2 endpoint of the AWS region.

          Internet gateway

        • If the address is not 0.0.0.0/0, click Edit to change it.

          Edit tab

        • Enter 0.0.0.0/0 in the Destination field for that target, and click Save.

          Save button

          Note: If you use a VPN, enter a specific IP address range in the Destination column instead.
      • Network ACL - The network ACL in the Staging area subnet may block traffic. Verify that the ephemeral ports are open.

      If the target cloud is GCP, perform the following tasks:

    • GCP

      • DNS - Ensure that outbound DNS resolution and connectivity over TCP port 443 are allowed.
      • Route Rules - The route rules in the staging area subnet may be set incorrectly. Route rules should allow outbound traffic to the internet.
        To check and set route rules in the staging area subnet:

        • In the Google Cloud Platform console, click Products & services and navigate to Networking --> VPC network --> Routes.

          Navigating to routes

        • In the Routes page, verify that the route with the 0.0.0.0/0 Destination IP Range is set as Default internet gateway under Next hop.

          0.0.0.0/0 Destination IP Range

        • If no such route exists, click CREATE ROUTE in the Routes menu to create one.

          Create route

        • Give the new route a Name and, optionally, a Description. Choose the correct Network for the route from the drop-down menu. Enter 0.0.0.0/0 as the Destination IP Range. Enter a number for Route Priority and, optionally, Instance tags. Under Next hop, select Default internet gateway from the drop-down menu. Click Create.

          Default internet gateway

      If the target cloud is Azure, perform the following tasks:

    • Azure

      • DNS - Allow outbound DNS resolution and connectivity over TCP port 443.

      • Network Security Groups - Configure a network security group that allows inbound access on TCP ports 1500 and 443 and outbound access on TCP port 443.

      • Subnet - Configure a subnet associated with the network security group you created. The subnet should belong to a virtual network for which the correct DNS servers are configured (either Azure-provided or custom). If you use custom DNS servers, make sure that the network security group allows access to them, including outbound access on TCP port 53, so that external addresses can be resolved.
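For Azure, the network security group rules described above could be added with the Azure CLI. This sketch assumes the `az network nsg rule create` command. The resource group, NSG name, rule names, and priorities are placeholders, not values required by the service. By default (`DRY_RUN=1`) the script only prints the commands, so you can review them before running them.

```sh
#!/bin/sh
# Hypothetical sketch: prints (DRY_RUN=1, the default) or runs (DRY_RUN=0)
# az CLI commands that add the NSG rules described above.
# RG and NSG are placeholders for your resource group and NSG names.
RG="${RG:-my-resource-group}"
NSG="${NSG:-my-staging-nsg}"

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_staging_nsg_rules() {
    # Inbound TCP 1500 (replication data) and TCP 443.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name allow-in-1500-443 --priority 100 --direction Inbound \
        --access Allow --protocol Tcp --destination-port-ranges 1500 443
    # Outbound TCP 443 to the IBM Live Migration Service manager.
    run az network nsg rule create --resource-group "$RG" --nsg-name "$NSG" \
        --name allow-out-443 --priority 110 --direction Outbound \
        --access Allow --protocol Tcp --destination-port-ranges 443
}
```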

Ensure that TCP port 1500 is open to traffic between the source machines and the Staging Area.

  1. Calculating the bandwidth needed for Port 1500

  2. Verifying communication over Port 1500

    If there is a connection issue from the source machine to the replication servers or the Staging area, use these methods to check the connection.

    To verify the integrity of the connection from a source machine to the staging area over TCP port 1500:

    • Launch a new Linux instance in the staging area subnet.
    • In the new Linux instance, run this command to open a listener in the staging area subnet: nc -l 1500
    • On the source machine, run this command to check connectivity: telnet <new instance ip> 1500

      Note: If the Use VPN… check box is selected in the replication settings, use the private IP address of the new Linux instance. If it is cleared, use its public IP address.

      If the command fails, there is a connectivity issue. To fix it, check the following for your target cloud:

    • AWS

      • The Network ACL in the target subnet
      • Route rules in the target subnet
      • Firewall (both internal and external) in the source
      • Whether the Use VPN... check box is set correctly

        AWS-Use VPN

    • GCP

      • VPC Routes in the target subnet
      • Firewall (both internal and external) in the source
      • Whether the Use VPN... check box is set correctly

        GCP-Use VPN

    • Azure

      • Network security groups in Azure
      • Firewall (both internal and external) in the source
      • Whether the Use VPN... check box is set correctly

        Azure-Use VPN
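If telnet is not installed on the source machine, you can send a short test line to the `nc -l 1500` listener instead. If the listener prints the line, data can flow over TCP port 1500. This sketch assumes that bash (for its `/dev/tcp` pseudo-device) and `timeout` are available on the source machine. `send_probe` is a hypothetical helper. Some netcat variants, such as netcat-traditional, need `nc -l -p 1500` to listen on the port.

```sh
#!/bin/sh
# Hypothetical helper: opens a TCP connection to HOST:PORT and writes one
# line. Succeeds only if the connection and the write complete within
# TIMEOUT seconds (default 5).
send_probe() {
    timeout "${3:-5}" bash -c \
        'printf "probe from %s\n" "$(uname -n)" >"/dev/tcp/$1/$2"' _ "$1" "$2" \
        >/dev/null 2>&1
}

# Usage on the source machine (STAGING_IP is a placeholder for the private
# or public IP of the test instance, depending on the Use VPN setting):
#   send_probe "$STAGING_IP" 1500 && echo "port 1500 reachable"
```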

  3. Solving communication problems over Port 1500

    To solve connectivity issues between source machines and the staging area, check these:
    • [For AWS only] The Network ACL in the staging area subnet may deny traffic.
    • Route rules in the staging area subnet may be set incorrectly.
    • Firewalls, both internal and external, in the source machine/infrastructure may block communication.
    • The Use VPN... checkbox in IBM Live Migration Service user console may not be set properly.

      Enabling the Network ACL

      [For AWS only] The network ACL in the staging area subnet may block connectivity. By default, the network ACL allows connectivity. However, if the ACL settings were changed to deny traffic, reverse those changes.

      To check and enable the network ACL in the staging area subnet:

    • Sign in to the AWS console, click Services and select VPC under Networking & Content Delivery.

      Services-VPC

    • Under Resources, select Network ACL:

      1 network ACL

    • In the Network ACL page, select the check box next to the network ACL of the staging area.

      Network ACL checkbox

    • In the details table of the selected network ACL, select the Inbound Rules tab.

      Inbound rules

    • In the Inbound Rules tab, verify that the rule governing traffic to the replication server subnet is set to Allow.

      Note: The rule only needs to allow traffic on TCP port 1500 from the address space of the source environment. The network ACL does not need to be open to all port ranges, as it is in the following screenshot:

      Open Network ACL to all Port Ranges

    • If the rule is set to Deny, click Edit.

      Edit button

    • Click the dropdown under Allow/Deny and select Allow. Click Save.

      Save button

    • Check the ephemeral ports in the outbound rules: in the same network ACL, open the Outbound Rules tab.

      Outbound Rules tab

    • Ensure that the correct ephemeral port range is allowed. The ephemeral port range depends on each client's operating system. Click Edit to change the port range of the ephemeral port rule.

      Edit tab

    • Edit the Port Range and click Save. You may have to create a new rule by clicking Add another rule.

      Editing port range and adding another rule
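To find the ephemeral range that the outbound rule must allow, read the range the source operating system uses and compare it with the rule. This sketch assumes a Linux source machine, where the range is stored in `/proc/sys/net/ipv4/ip_local_port_range`. On Windows, `netsh int ipv4 show dynamicport tcp` reports the range. `range_covers` and `linux_ephemeral_range` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the rule range RULE_LO..RULE_HI
# fully covers the ephemeral range EPH_LO..EPH_HI.
#   usage: range_covers RULE_LO RULE_HI EPH_LO EPH_HI
range_covers() {
    [ "$1" -le "$3" ] && [ "$2" -ge "$4" ]
}

# Prints the local ephemeral port range of a Linux machine as "LOW HIGH".
linux_ephemeral_range() {
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
    echo "$low $high"
}

# Example on a Linux source, checking a rule that allows 1024-65535:
#   set -- $(linux_ephemeral_range)
#   range_covers 1024 65535 "$1" "$2" && echo "rule covers the ephemeral ports"
```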

      Setting Route Rules on the Staging Area Subnet

    • AWS

      To check and set Route Rules in the staging area subnet in AWS:

      • Sign in to the AWS console, click Services, and select VPC under Networking & Content Delivery.

        Services VPC

      • In the VPC Dashboard toolbar, select Route Tables.

        Route tables

      • In the Route Tables page, select the check box of the route table of the staging network.

        Route tables checkbox

      • This expands the route table details. Open the Routes tab.

        Routes tab

      • In the Target column of the Routes tab, find the route used for inbound traffic from the source on TCP port 1500 (igw - Internet Gateway, vgw - VPN gateway, or i - EC2 instance). Verify that the Destination address is 0.0.0.0/0.

        Internet gateway

        Note: The rule may instead be specific to the address space of the source machines.
      • If the address is not 0.0.0.0/0, click Edit to change it.

        Edit button

      • Enter 0.0.0.0/0 in the Destination field for that target, and click Save.

        Save button

        Note: If you use a VPN, enter a specific IP address range in the Destination column instead.

      Firewall (both internal and external) in the Source machine/infrastructure

      Firewall issues, such as Windows Firewall connection issues, can have several causes. If you experience firewall issues, check the following:

      Windows Firewall

    • All platforms

      • Ensure the subnet assigned for the replication servers still exists.
    • AWS
      • Ensure that the IBM Live Migration Service IAM policy is set correctly.
      • If you do not have a default VPC, or if your AWS account is an EC2-Classic account, select a specific subnet in Setup & Info > Replication Settings.
    • GCP
      • Check that the IAM roles are correct:
        • compute.instanceAdmin
        • compute.securityAdmin
        • compute.storageAdmin
        • compute.networkAdmin
        • Editor/Owner
    • Azure (ARM)

      Setting the Use Private Connection Checkbox

    • To access the Use VPN… check box, navigate to Setup & Info > Replication Settings in the project.

      Setup Info

    • Select this check box if the replicated data must be transmitted from the source machine to the staging area over a private connection.

      Note: Use this option if a VPN connection is available. Selecting this check box does not create a new private connection.
    • You should use this option if you want to:

      • Allocate dedicated bandwidth for replication
      • Use another level of encryption
      • Add another security layer by transferring replicated data from one private IP address (source) to another private IP address (target).

      You can safely switch between a private and a public connection by selecting or clearing the Use VPN… check box, even after replication has started. The switch causes only a very short pause in replication and has no long-term effect on it.

IPs to whitelist

Ensure that you have whitelisted all of the CloudEndure IPs in your firewall for Port 443.

Firewall address range

Credentials

Troubleshooting issue: Credentials provided are either invalid or have insufficient permissions.

AWS

If the AWS credentials entered do not exist or are invalid, or if the IAM policy created and attached to the user does not contain the required permissions, this error message is displayed:

AWS error message

GCP

If the GCP credentials entered do not exist or are invalid, or if the IAM role of the private key is NOT set to Owner, this error message is displayed:

Google cloud platform error message

Azure

If the Azure ARM credentials do not exist or are invalid, or if the IAM role of the AD application used for IBM Live Migration Service is NOT Contributor, this error message is displayed:

Microsoft Azure error message

Agent Installation

Check that the machines meet the prerequisites for agent installation.

Prerequisites

Troubleshooting - Error: Installation Failed

If installation of an IBM Live Migration Service agent on a source machine fails while the agent installer file is running, this message is displayed:

Installation was not finished successfully. Please contact Support for assistance.

IBM Live Migration Service Agent installation error message

This type of error indicates that the agent was not installed on the source machine, and the machine will not appear in the IBM Live Migration Service user console. After the issue is fixed, run the agent installer file again to install the agent.

Installation Failed - Old Agent

Installation may fail because of an old IBM Live Migration Service agent. Always install the latest version of the IBM Live Migration Service agent. Refer to the agent download instructions to learn how to download it.

  1. Linux

    Ensure there is enough free disk space
    Free Disk
    Free disk space in the root directory - verify that there is at least 1 GB of free disk space in the root directory (/) of the source machine for the installation. To check the available disk space in the root directory, run this command: df -h /

    Free disk space in the /tmp directory - for the duration of the installation process, verify that there is at least 500 MB of free disk space in the /tmp directory. To check the available disk space in the /tmp directory, run this command: df -h /tmp

    The output of these commands looks similar to the following:

    Available disk space
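The two free-space checks can be combined into a small script. This sketch assumes a `df` that supports the POSIX `-P` and `-k` options. `free_mb` and `has_free_mb` are hypothetical helpers, and the thresholds are the 1 GB and 500 MB figures given above.

```sh
#!/bin/sh
# Hypothetical helper: prints the free space, in whole MB, of the file
# system that contains PATH (POSIX df output, 1 KB blocks).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds if PATH has at least MIN_MB free; otherwise reports the shortfall.
#   usage: has_free_mb PATH MIN_MB
has_free_mb() {
    avail=$(free_mb "$1")
    if [ "$avail" -ge "$2" ]; then
        return 0
    fi
    echo "$1: ${avail} MB free, $2 MB required" >&2
    return 1
}

# Thresholds from the text: 1 GB on /, 500 MB on /tmp.
#   has_free_mb / 1024 && has_free_mb /tmp 500
```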

    Check format of the disks list to be replicated
    During installation, when entering the disks to be replicated, do NOT use apostrophes, brackets, or disk paths that do not exist. Enter only existing disk paths, separated by commas, as follows: /dev/xvda1,/dev/xvda2
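A disk list can be checked against these rules before you type it at the installer prompt. `valid_disk_list` is a hypothetical helper. Its allowed character set (letters, digits, `/`, `_`, `.`, `:`, and `-`) is an assumption chosen to reject quotes, brackets, and spaces. It is not a rule published for the installer.

```sh
#!/bin/sh
# Hypothetical helper: checks a disk list before it is entered at the
# installer prompt. Format rule: comma-separated absolute paths with no
# quotes, brackets, or spaces. Unless SKIP_EXISTS=1, each path must also
# exist as a block device.
valid_disk_list() {
    if ! printf '%s\n' "$1" | grep -Eq '^/[A-Za-z0-9/_.:-]+(,/[A-Za-z0-9/_.:-]+)*$'; then
        echo "bad format: use comma-separated paths without quotes, brackets or spaces" >&2
        return 1
    fi
    [ "${SKIP_EXISTS:-0}" = 1 ] && return 0
    old_ifs=$IFS
    IFS=,
    for d in $1; do
        if [ ! -b "$d" ]; then
            IFS=$old_ifs
            echo "not a block device: $d" >&2
            return 1
        fi
    done
    IFS=$old_ifs
    return 0
}
```

For example, `valid_disk_list /dev/xvda1,/dev/xvda2` succeeds only if both devices exist on the machine.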

    Check if the correct Kernel headers package is installed
    Verify that the installed kernel-devel/linux-headers package has exactly the same version as the kernel you are running.

    The version number of kernel headers should be identical to the version number of the kernel. To ensure this, follow these steps:

    • Identify the version of the running kernel
      To identify the version of the running kernel, run this command: uname -r

      Version of running kernel

      The uname -r output should match the version of one of the installed kernel headers packages (kernel-devel-<version number> / linux-headers-<version number>).

    • Identify the version of the installed kernel-devel/linux-headers packages
      To list the installed kernel-devel/linux-headers packages, run the following command:

      On RHEL/CENTOS/ORACLE/SUSE: rpm -qa | grep kernel

      version of the kernel-devel/linux-headers

      Note: This command lists all kernel packages. Look for the kernel-devel entries in the output.

      On Debian/Ubuntu: dpkg -l | grep linux-headers (apt-cache search linux-headers lists the packages available in the repositories rather than the installed ones)

    • Verify the folder containing the kernel-devel/linux-headers is not a symbolic link.
      Sometimes the kernel-devel/linux-headers content matching the kernel version is a symbolic link. In that case, remove the link before installing the required package.

      To verify the folder containing the kernel-devel/linux-headers is not a symbolic link, run this command:

      On RHEL/CENTOS/ORACLE/SUSE: ls -l /usr/src/kernels
      On Debian/Ubuntu: ls -l /usr/src

      kernel-devel/linux-headers is not a symbolic

      In the above example, the results show that linux-headers are not a symbolic link.

    • If a symbolic link exists, delete it
      If the content of the kernel-devel/linux-headers, matching the version of the kernel, is actually a symbolic link, delete the link. Run this command:
      rm /usr/src/<LINK NAME>
      For example: rm /usr/src/linux-headers-4.4.1

    • Install the correct kernel-devel/linux-headers from the repositories.
      If none of the already installed kernel-devel/linux-headers packages match the running kernel version, install the matching package.

      Note: Several kernel headers versions can be installed side by side. You can therefore safely install a new kernel headers package in addition to the existing ones, without uninstalling other versions of the package. A new kernel headers package does not affect the running kernel and does not overwrite older versions of the kernel headers.
      Note: For the installation to work, the kernel headers package must have exactly the same version number as the running kernel.

      To install the correct kernel-devel/linux-headers, run the following command:
      On RHEL/CENTOS/Oracle/SUSE: sudo yum install kernel-devel-$(uname -r)

      On Debian/Ubuntu: sudo apt-get install linux-headers-$(uname -r)

    • If no matching package was found, download the matching kernel-devel/linux-headers package

      If no matching package is found in the repositories configured on the machine, download it manually from the internet and install it.

      To download the matching kernel-devel/linux-headers package, navigate to these sites:

    • RHEL, CENTOS, Oracle, and SUSE package directory

    • Debian package directory
    • Ubuntu package directory
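The version comparison and the symbolic-link check above can be scripted as follows. This is a sketch. `headers_match` and `headers_dir_is_symlink` are hypothetical helpers. The directories they test (`/usr/src/linux-headers-<version>` on Debian/Ubuntu and `/usr/src/kernels/<version>` on RHEL-family systems) are common defaults and may differ on your distribution.

```sh
#!/bin/sh
# Hypothetical helper: reads installed package names on stdin (one per line)
# and succeeds if one of them is kernel-devel-RELEASE or
# linux-headers-RELEASE for the given kernel release (uname -r output).
headers_match() {
    grep -qxF -e "kernel-devel-$1" -e "linux-headers-$1"
}

# Hypothetical helper: succeeds if the headers directory for RELEASE under
# SRC_DIR (default /usr/src) is a symbolic link. Checks the usual
# Debian/Ubuntu and RHEL-family locations.
headers_dir_is_symlink() {
    src="${2:-/usr/src}"
    [ -L "$src/linux-headers-$1" ] || [ -L "$src/kernels/$1" ]
}

# Usage:
#   rpm -qa | headers_match "$(uname -r)"                                 # RHEL family
#   dpkg -l | awk '$1 == "ii" { print $2 }' | headers_match "$(uname -r)"  # Debian/Ubuntu
#   headers_dir_is_symlink "$(uname -r)" && echo "remove the link first"
```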

      Check that the make, openssl, wget, curl, gcc, and build-essential packages are installed and available in the current PATH.

      Note: Usually, these packages are not required for agent installation. However, in some cases when installation fails, installing these packages solves the problem.

      If installation failed, install the make, openssl, wget, curl, gcc, and build-essential packages and make sure they are available in the current PATH.

      To verify the existence and location of a required package, run this command: which <package>

      For example, to locate the make package: which make

      which make
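You can run the `which` check for all of these tools at once. In the following sketch, `missing_tools` is a hypothetical helper that uses the POSIX `command -v` instead of `which`. Because build-essential is a Debian/Ubuntu meta-package rather than a command, it is checked with `dpkg -s` instead.

```sh
#!/bin/sh
# Hypothetical helper: prints each named command that is not found in PATH
# and returns non-zero if any of them is missing.
missing_tools() {
    missing=0
    for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
            echo "$t"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the source machine:
#   missing_tools make openssl wget curl gcc
#   dpkg -s build-essential >/dev/null 2>&1 || echo build-essential   # Debian/Ubuntu
```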

      Check that the /tmp directory is mounted without the noexec option.

      Verify that the /tmp directory is mounted in a way that allows running scripts and applications from it.

      To verify that the /tmp directory is mounted without the noexec option, run this command: sudo mount | grep '/tmp'

      If the output is similar to the following example, the issue exists in the OS: /dev/xvda1 on /tmp type ext4 (rw,noexec)

      To remove the noexec option from the mounted /tmp directory, run this command: sudo mount -o remount,exec /tmp

      This example illustrates this troubleshooting procedure:

      /tmp
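The `grep '/tmp'` filter also matches other mount points, such as /var/tmp. The hypothetical `tmp_is_noexec` helper below checks only the entry mounted exactly on /tmp. It assumes the usual Linux `mount` output format (`DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)`).

```sh
#!/bin/sh
# Hypothetical helper: reads `mount` output on stdin and succeeds if the
# entry mounted exactly on /tmp lists noexec among its options.
tmp_is_noexec() {
    awk '$2 == "on" && $3 == "/tmp" {
             if ($0 ~ /[(,]noexec[,)]/) found = 1
         }
         END { exit (found ? 0 : 1) }'
}

# Usage:
#   mount | tmp_is_noexec && sudo mount -o remount,exec /tmp
```

A remount only changes the running system. If /etc/fstab specifies noexec for /tmp, the option returns after a reboot.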

  2. Windows

    Check that .NET Framework 3.5 or later is installed on the machine.

    [For replicating machines to AWS cloud only]

    Verify that .NET Framework version 3.5 or later is installed on the Windows source machines.

    Verify that the machine has at least 1 GB of free disk space

    Verify that there is at least 1 GB of free disk space in the root directory (C:\) of the source machine for the installation.

    Check that the net.exe and/or sc.exe files are included in the PATH variable.

    Verify that the net.exe and/or sc.exe files, located by default in the C:\Windows\System32 folder, are included in the PATH environment variable.

    • Navigate to the Control Panel-->System and Security-->System-->Advanced system settings.
    • In the System Properties dialog box Advanced tab, click Environment Variables.

      System properties

    • In the System variables section of the Environment Variables dialog box, select the Path variable and click Edit to view its contents.

      Environment variables

    • In the Edit System Variable pane, review the paths defined in the Variable value field. If the path of the net.exe and/or sc.exe files does not appear there, add it manually to the Variable value field and click OK.

      Edit system variables

      (GCP only) Check that unsigned drivers are allowed on Windows 2003 SP2 or Windows 2003 R2.

      (For replicating machines to GCP cloud only) If the source machine runs Windows 2003 SP2 or Windows 2003 R2 (Windows 2003 SP1 is not supported), installation of unsigned drivers must be allowed:

    • Right-click My Computer

    • Select Properties to open System Properties
    • In the System Properties dialog box, select the Hardware tab
    • Click Driver Signing
    • Select Ignore - Install the software anyway and don't ask for my approval
    • Select Make this action the system default
    • Click OK

      Error: urlopen error [Errno 110] Connection timed out

      This error occurs when outbound traffic over TCP port 443 is not allowed. Port 443 must be open outbound to the IBM Live Migration Service manager console at https://ibm.cloudendure.com/ (50.19.144.132).

      urlopen error

      Multipath / Powerpath support

    • To check whether the source machine has multipath configured, run:

      • multipath -l
      • powermt check
    • If so, make sure to run the installer with this parameter:
      • --force-volumes
    • Specify only the high-level logical device, for example:

      • /dev/dm-0, /dev/cciss/c0d0

      Error: You need to have root privileges to run this script

      root privileges

      Make sure that the installer is run as root, or add sudo at the beginning of the command: sudo python installer_linux.py

Troubleshooting - Agent is installed successfully but not running properly

Use the following checks to find out why the agent is not running properly.

In some cases, installation completes successfully, but the migration agent does not run at all, or does not run properly, on the source machine. These problems are visible on the Machines page of the IBM Live Migration Service user console.

By default, when installation completes successfully, replication of the source machine starts automatically (unless the agent was installed with a stopped flag), and you can monitor it in the IBM Live Migration Service user console. When replication starts, the first message appearing in the DATA REPLICATION PROGRESS column is Establish communication between IBM Live Migration Service Agent and Replication Server.

Data replication progress column

Usually, this message appears only briefly, for 5 minutes at most. It is then replaced by other messages that indicate the subsequent steps of the replication progress.

If the Establish communication between the... message appears in the user console for more than 5 minutes, the agent either failed to run or never established communication with the IBM Live Migration Service manager. This occurs because:

For all issues in this state, first check whether the agent is installed on the source machine.

Note: Sometimes the Establish communication between the… message appears in red in the user console, and a red icon appears in the STATUS column. This usually indicates a connectivity issue. In this case, verify that all network requirements are met.

Establishing communication

The Agent is Installed but Not Running

In certain cases, the agent is installed successfully in the source machine, but does not run. To handle this installation issue, follow these steps:

Verify the Agent is Running in the Source Machine

If the agent is installed in the source machine, but data replication does not start, check if the agent is running.

Checking if the agent is running:

Running the Agent Manually:

If the Agent is installed in the source machine but is not running, try running it manually.

Agent is Running but Cannot Communicate with the IBM Live Migration Service Manager

[Only for environments using a proxy for agent-console communication]

If the agent is installed and running, but data replication does not start, there might be a communication problem between the agent and IBM Live Migration Service manager. This problem usually happens when using a proxy to communicate between the agent and the IBM Live Migration Service manager over TCP port 443.

Configuring the Proxy between the Source Machine and the IBM Live Migration Service Manager

Important: Make sure that the corporate firewall allows connections over TCP port 443.

Establish communication between the source machines and the IBM Live Migration Service manager over TCP port 443 in two ways:

To configure environment variables at the Windows system level:


Troubleshooting - Error: [Errno 110] Connection timed out

Contact IBM Live Migration Service Support

Troubleshooting - Error: You need to have root privileges to run this script.

Make sure you have root access or can use sudo.

Troubleshooting - Unable to install Agent on Oracle Linux.

When installing the IBM Live Migration Service agent on Oracle Linux, either install wget before running the installer download command, or use curl instead of wget.

For example: curl -o ./installer_linux.py <installer URL>
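The choice between wget and curl can be automated. In this sketch, `fetch_installer` is a hypothetical helper, and the installer URL is a placeholder for your project's download link.

```sh
#!/bin/sh
# Hypothetical helper: downloads URL to FILE with wget if it is installed,
# otherwise with curl. Returns 127 if neither tool is available.
#   usage: fetch_installer URL FILE
fetch_installer() {
    if command -v wget >/dev/null 2>&1; then
        wget -O "$2" "$1"
    elif command -v curl >/dev/null 2>&1; then
        curl -fL -o "$2" "$1"
    else
        echo "install wget or curl first" >&2
        return 127
    fi
}

# Usage (INSTALLER_URL is a placeholder for your installer link):
#   fetch_installer "$INSTALLER_URL" ./installer_linux.py
```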

Agent - Console Proxy Troubleshooting

Data Replication

Troubleshooting issues - Initial Replication Step errors

  1. Firewall rules creation failed

    Firewall rules creation failed

    Possible causes for failure:
    The credentials may not have enough permissions to manage required resources on the target cloud.

    • To provide sufficient permissions:

      • Change your credentials - either generate new credentials with the required permissions, or change permissions of the existing credentials.
      • If you generate new credentials, navigate to the Setup & Info --> CREDENTIALS page, enter the new credentials, and save.

    The subnet selected for launching the Replication Servers may no longer exist.

    • To select an appropriate subnet for the Replication Servers/Staging Area:

      • Navigate to Setup & Info --> REPLICATION SETTINGS.
      • In the Replication Servers section, select an existing subnet for the Replication Servers from the Choose the subnet… drop-down list:

        Replication servers

      • Check that the IAM policy (AWS) or IAM roles (GCP: compute.instanceAdmin, compute.securityAdmin, compute.storageAdmin, compute.networkAdmin, or Editor/Owner) are correct.
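For GCP, you can list the roles granted to the service account with `gcloud projects get-iam-policy` and compare them with the roles listed above. This sketch reads the list as requiring either Owner/Editor or all four compute roles. GCP also offers roles/compute.instanceAdmin.v1, which the sketch does not treat as a substitute. `has_required_gcp_roles` is a hypothetical helper, and PROJECT_ID and SA_EMAIL are placeholders.

```sh
#!/bin/sh
# Hypothetical helper: reads granted role names (one per line) on stdin and
# succeeds if they include roles/owner or roles/editor, or all four compute
# roles listed above; otherwise prints the first missing compute role.
has_required_gcp_roles() {
    roles=$(cat)
    for r in roles/owner roles/editor; do
        if printf '%s\n' "$roles" | grep -qxF "$r"; then
            return 0
        fi
    done
    for r in roles/compute.instanceAdmin roles/compute.securityAdmin \
             roles/compute.storageAdmin roles/compute.networkAdmin; do
        if ! printf '%s\n' "$roles" | grep -qxF "$r"; then
            echo "missing: $r"
            return 1
        fi
    done
    return 0
}

# Input source (requires gcloud; PROJECT_ID and SA_EMAIL are placeholders):
#   gcloud projects get-iam-policy PROJECT_ID \
#     --flatten='bindings[].members' \
#     --filter='bindings.members:serviceAccount:SA_EMAIL' \
#     --format='value(bindings.role)' | has_required_gcp_roles
```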

  2. Failed to create a replication server

    Failed to create a replication server

    Possible causes for failure

    • Insufficient permissions in the provided Cloud Credentials (navigate to Setup & Info > Credentials).
    • The subnet configured for Replication Servers does not exist.
    • The target infrastructure has reached its machine limit, as defined by the cloud provider.
    • Check that the IAM policy is correct and that you are not over the quota for 't2.small' (AWS), 'n1-standard-1' (GCP), or 'Standard_F*' (Azure) instances in the Staging Area region.
  3. Failed to boot replication server

    Failed to boot replication server

    Possible causes for failure

    • Insufficient permissions in the provided Cloud Credentials (navigate to Setup & Info > Credentials).
    • Check that the IAM policy (AWS) or IAM roles (GCP: compute.instanceAdmin, compute.securityAdmin, compute.storageAdmin, compute.networkAdmin, or Editor/Owner) are correct.
  4. Failed to resolve IBM Live Migration Service Manager address in the replication server

    Failed to resolve CloudEndure

    Possible causes for failure:

    • The subnet selected for the Replication Servers may be configured in a way that prevents DNS lookups of the console, https://ibm.cloudendure.com/, from resolving. Make sure that DNS traffic is not blocked. Create an instance in the Replication Server's subnet and try to resolve https://ibm.cloudendure.com/, either by opening it in a browser or by running the following command:

      wget https://ibm.cloudendure.com/

  5. Failed to authenticate the replication server with the IBM Live Migration Service Manager

    Failed to authenticate the replication

    Possible causes for failure:

    • https://ibm.cloudendure.com is not reachable from the Replication Server over TCP port 443. Check the subnet selected in Setup & Info > Replication Settings and ensure that outbound TCP port 443 is open from the Replication Server.

      To verify the integrity of the connection from the Replication Server to the IBM Live Migration Service Manager over TCP port 443:

    • Launch an Ubuntu machine in the same VPC and in the subnet selected on the Replication Settings screen.

      In the machine, run the following command:
      wget https://ibm.cloudendure.com (most platforms)

      If the command fails, there is a connectivity issue.
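
    The wget test can also be scripted. The following Python 3 sketch is a hypothetical helper (check_https.py and url_reachable() are illustrative names), assuming Python 3 is available on the Ubuntu test machine. Unlike wget, which exits with an error on an HTTP error status, it treats any HTTP response, including 403 or 404, as proof that DNS, TCP port 443 and the TLS handshake work, because only the network path matters for this check.

```python
import http.client
import sys
import urllib.error
import urllib.request


def url_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 403 or 404):
        # DNS, the TCP connection and the TLS handshake all worked.
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # URLError (an OSError): DNS failure, refused/timed-out connection or TLS failure.
        # ValueError: malformed URL. HTTPException: invalid response from the server.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    if url_reachable(target):
        print(f"{target}: connection succeeded")
    else:
        print(f"{target}: connection FAILED; there is a connectivity issue")
```

    For example: python3 check_https.py https://ibm.cloudendure.com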

  6. Failed to download the IBM Live Migration Service replication software to the replication server

    Possible causes for failure:

    • Outgoing traffic from the Replication Server to the replication software download location is blocked. To check this:
    • Create an instance in the Replication Server's subnet.
    • From within that instance, try either:
      • Browsing to s3.amazonaws.com or s3.us-east-2.amazonaws.com
      • Running wget s3.amazonaws.com or wget s3.us-east-2.amazonaws.com
        If you cannot access these sites, there is a connectivity issue.
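
    When one of these checks fails, it helps to know whether name resolution or the TCP connection is the problem. The following minimal Python 3 sketch assumes Python 3 on the test instance (check_endpoints.py and diagnose() are hypothetical names). It resolves each endpoint, then opens a TCP connection on port 443 and reports dns-failed, tcp-failed or ok. It checks the network path only and does not download the replication software.

```python
import socket
import sys


def diagnose(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify the path to host:port as 'ok', 'dns-failed' or 'tcp-failed'."""
    try:
        # Step 1: name resolution, as in the browser/wget test above.
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):
        return "dns-failed"
    try:
        # Step 2: open a TCP connection; create_connection() tries every resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        # Refused, timed out or unreachable: look at firewalls, Network ACLs and routes.
        return "tcp-failed"


if __name__ == "__main__" and len(sys.argv) > 1:
    for endpoint in sys.argv[1:]:
        print(f"{endpoint}: {diagnose(endpoint)}")
```

    For example: python3 check_endpoints.py s3.amazonaws.com s3.us-east-2.amazonaws.com
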
  7. Failed to create staging disks

    Possible causes for failure:

    • Insufficient permissions in the provided cloud credentials (navigate to Setup & Info > Credentials).
    • The maximum storage limit in the target region has been reached.
    • (AWS) Your IBM Live Migration Service account is configured to use encrypted EBS disks, but the IAM user does not have permission to encrypt using the selected KMS key.
  8. Failed to attach the staging disks to the replication server

    Possible causes for failure

  9. Failed to pair the IBM Live Migration Service Agent with the replication server

    If you encounter this error, contact IBM Live Migration Support.

  10. Failed to establish communication between the IBM Live Migration Agent and the replication server

    Possible causes for failure:

    In some cases, the installation is reported as successful, but progress stops at the last step of the replication initiation sequence, "Establish communication between the IBM Live Migration Service Agent and the Replication Server". This means that the Agent could not establish communication with the Replication Server.

    • A firewall, on either the source or the target side, is preventing the IBM Live Migration Service Agent from reaching the Replication Server on TCP port 1500.
      • To verify, create a new instance in the Replication Server subnet and run nc -l 1500.
      • From the source machine, run: telnet <new instance ip> 1500.
        If this test fails, check:
        • The Network ACL on the target subnet.
        • Route rules on the target subnet.
        • Firewall permissions (internal/external) in the b
    • The Use VPN… checkbox was mistakenly selected or deselected in Setup & Info > Replication Settings.
    Note: After fixing the issue, it may take up to 30 minutes to re-establish communication.
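
The nc/telnet test for TCP port 1500 can also be run with one script on both ends, which is convenient when nc or telnet is not installed. The following is a minimal Python 3 sketch, assuming Python 3 is available on both the test instance and the source machine; the script name check_port_1500.py and its functions are hypothetical. Like nc -l 1500, listen mode waits for one inbound connection; like telnet, connect mode only reports whether the TCP connection opens.

```python
import socket
import sys

AGENT_PORT = 1500  # TCP port the Agent uses to reach the Replication Server


def listen_once(port: int = AGENT_PORT, timeout: float = 300.0) -> str:
    """Like 'nc -l 1500': wait for one inbound TCP connection and return the peer IP."""
    with socket.create_server(("", port)) as server:
        server.settimeout(timeout)  # accept() raises socket.timeout if nobody connects
        conn, peer = server.accept()
        conn.close()
        return peer[0]


def can_connect(host: str, port: int = AGENT_PORT, timeout: float = 5.0) -> bool:
    """Like 'telnet <new instance ip> 1500': return True if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "listen":
        # Run on the test instance in the Replication Server subnet.
        try:
            print(f"Connection received from {listen_once()}")
        except socket.timeout:
            print("No connection received; port 1500 is likely blocked")
    else:
        # Run on the source machine with the test instance's IP as the argument.
        ok = can_connect(sys.argv[1])
        print("Port 1500 is reachable" if ok else "Port 1500 is blocked; check NACLs, routes and firewalls")
```

On the test instance, run python3 check_port_1500.py listen. Then, on the source machine, run python3 check_port_1500.py <new instance ip>. If the listener never reports a connection, check the Network ACL, route rules and firewall permissions listed above.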

Troubleshooting Issue - Finalized Initial Sync Stuck in Data Replication Progress

Possible solutions include:
• A backlog may be preventing the sync from starting. Wait until the backlog is flushed for the sync to initialize.

• Check that the IAM policy is correct (AWS).
• Make sure that you are not over quota for Recovery Point creation.
• At times, the cloud vendors' APIs have latency overheads, so the sync may take some time to complete.

Troubleshooting Issue - Target Machine (Replica) Creation Failed

Possible solutions include:

Replication Speed/Lag issues

Potential solutions:

Testing replication speed:
The replication speed depends on four key factors:

To test the uplink speed, use the iperf3 utility as follows:

Note: If you are using Red Hat/CentOS, use yum install instead. The epel-release package is required.
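
Once iperf3 has reported the uplink throughput, it can be turned into a rough lower bound on the initial sync time. The following Python sketch is a hypothetical helper, not part of the product. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bit/s) and assumes that replication can use the full measured bandwidth. It ignores the other factors that affect replication speed, so actual sync times are longer.

```python
def estimate_sync_hours(data_gb: float, uplink_mbps: float) -> float:
    """Lower-bound estimate of initial sync time in hours.

    data_gb: amount of source data to replicate, in gigabytes (10**9 bytes).
    uplink_mbps: uplink throughput measured with iperf3, in megabits per second.
    Assumes the whole measured bandwidth is available to replication.
    """
    if data_gb < 0 or uplink_mbps <= 0:
        raise ValueError("data_gb must be >= 0 and uplink_mbps must be > 0")
    bits = data_gb * 8 * 10**9              # bytes -> bits
    seconds = bits / (uplink_mbps * 10**6)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 500 GB of data over a 100 Mbps uplink.
    print(f"{estimate_sync_hours(500, 100):.1f} hours at minimum")
```

With these assumptions, 500 GB over a 100 Mbps uplink takes at least 500 * 8 * 10^9 / (100 * 10^6) = 40,000 seconds, so the script prints 11.1 hours at minimum.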

(AWS only) Replication stuck at 0% - C5 and M5 instances

An AWS bug in the NVMe drivers on the C5 and M5 instance types may cause replication issues. If you encounter this bug, Data Replication is stuck at 0% in the IBM Live Migration Service user console.

To verify whether the NVMe drivers are working properly on the source machine, run the following PowerShell command:

    Get-Disk | Select Number, SerialNumber | Sort-Object Number

Make sure that the output matches the output shown in the AWS documentation, where the serial number of each EBS disk contains its volume ID.
If there is a driver issue, the output will contain only zeros.

    Number SerialNumber
    0      0000_0000_0000_0000.
    1      0000_0000_0000_0000.
    2      0000_0000_0000_0000.
    3      0000_0000_0000_0000.
    4      0000_0000_0000_0000.
    5      0000_0000_0000_0000.
    6      0000_0000_0000_0000.

If this is the case, the NVMe drivers on the machine are malfunctioning, because they do not report the correct EBS volume IDs.

Fix this issue by reinstalling the NVMe drivers.
Once the issue is fixed and the expected output is returned, Data Replication starts.
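
An all-zeros pattern is easy to miss in a long disk list, so the check can be scripted. The following is a minimal Python 3 sketch, assuming the Get-Disk output has been saved as a text file and that Python 3 is available; the script name check_nvme_serials.py and the function serials_all_zero() are hypothetical. It reports whether every disk row has a serial number made up only of zeros, which is the symptom described above.

```python
import re
import sys

# A disk row: optional leading spaces, the disk number, whitespace, then the serial number.
ROW = re.compile(r"\s*(\d+)\s+(\S+)")


def serials_all_zero(get_disk_output: str) -> bool:
    """Return True if every disk row in the Get-Disk output has an all-zero serial number."""
    # Header and blank lines do not start with a disk number, so they are skipped.
    serials = [m.group(2) for m in map(ROW.match, get_disk_output.splitlines()) if m]
    # Drop the '_' separators and the trailing '.', then check that only zeros remain.
    return bool(serials) and all(
        set(serial.replace("_", "").rstrip(".")) <= {"0"} for serial in serials
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8-sig") as handle:
        if serials_all_zero(handle.read()):
            print("All serial numbers are zero: reinstall the NVMe drivers")
        else:
            print("Serial numbers are not all zero")
```

For example, in PowerShell run Get-Disk | Select Number, SerialNumber | Sort-Object Number | Out-File -Encoding ascii disks.txt, then run python check_nvme_serials.py disks.txt.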