Migrating data between Cloud Pak for Data installations

Important: IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.

Use the data export and import utility (cpd-cli export-import) to export data, including metadata, from one IBM Cloud Pak for Data installation and import it into another Cloud Pak for Data installation.

The cpd-cli export-import command-line interface exports and imports service-specific data only for the Cloud Pak for Data services that support the command. For more information, see Services that support cpd-cli export-import.

Who needs to complete this task?

Cluster administrator: A cluster administrator must run the command that initializes the export-import utility.

Instance administrator: An instance administrator can run all other cpd-cli export-import commands.

Before you begin

Complete the following tasks before you run the cpd-cli export-import commands:

  1. Set up a client workstation to install Cloud Pak for Data.
  2. Create a profile to use the management commands.
  3. Prepare to use the export and import utility.

Best practice: You can run many of the commands in this task exactly as written if you set up environment variables for your installation. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
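If you want a quick way to confirm that the environment variables are sourced, a shell function like the following can help. It is a minimal sketch, not part of cpd-cli: the function name check_cpd_env is hypothetical, and it only checks that each named variable is set and non-empty, not that its value is correct.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpd-cli): check_cpd_env VAR [VAR ...]
# Returns 1 and prints the name of each variable that is unset or empty.
check_cpd_env() {
  local missing=0 var value
  for var in "$@"; do
    # Indirect lookup that also works in POSIX sh: read the variable named by $var.
    eval "value=\${${var}:-}"
    if [ -z "$value" ]; then
      echo "missing: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after you set CPU_ARCH and CPD_PROFILE_NAME during initialization, run check_cpd_env PROJECT_CPD_INST_OPERANDS CPD_PROFILE_NAME CPU_ARCH before the other commands in this task. Add PRIVATE_REGISTRY_LOCATION to the list if your cluster pulls images from a private container registry.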

Initializing the export-import utility

You must initialize the export-import utility before you run any other cpd-cli export-import commands.

  1. Set the CPU_ARCH environment variable based on the hardware on your Red Hat OpenShift Container Platform cluster:
    • For x86-64 hardware, run:
      export CPU_ARCH=x86_64
    • For Power® hardware, run:
      export CPU_ARCH=ppc64le
  2. Set the CPD_PROFILE_NAME environment variable to the name of the profile that you created in Creating a profile to use the management commands.
    export CPD_PROFILE_NAME=<my-profile-name>
  3. Run the appropriate command for your environment:
    The cluster pulls images from a private container registry
    Restriction: This option is available only if an administrator completed Installing the Cloud Pak for Data command-line interface (cpd-cli).
    cpd-cli export-import init \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --arch=${CPU_ARCH} \
    --pvc-name=export-import-pvc \
    --profile=${CPD_PROFILE_NAME} \
    --image-prefix=${PRIVATE_REGISTRY_LOCATION}

    The cluster pulls images from the IBM Entitled Registry
    Restriction: This option is available only if the cluster can connect to the internet.
    cpd-cli export-import init \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --arch=${CPU_ARCH} \
    --pvc-name=export-import-pvc \
    --profile=${CPD_PROFILE_NAME} \
    --image-prefix=icr.io/cpopen/cpd
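Instead of choosing these values by hand, you can derive them in a script. The following is a sketch under stated assumptions, not an IBM-provided script: both function names are hypothetical, it assumes that every node in the cluster has the same architecture, and it relies on Kubernetes reporting node architectures as amd64 or ppc64le, which correspond to the x86_64 and ppc64le values that this task uses. The second function picks the --image-prefix value from step 3 based on whether PRIVATE_REGISTRY_LOCATION is set.

```shell
#!/bin/sh
# Hypothetical helper: map the architecture that Kubernetes reports for a node
# (status.nodeInfo.architecture) to the CPU_ARCH values used in this task.
cpu_arch_from_node_arch() {
  case "$1" in
    amd64)   echo "x86_64" ;;
    ppc64le) echo "ppc64le" ;;
    *)       echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: choose the --image-prefix value. Uses the private
# registry when PRIVATE_REGISTRY_LOCATION is set; otherwise falls back to the
# icr.io/cpopen/cpd prefix shown for clusters that can reach the internet.
export_import_image_prefix() {
  if [ -n "${PRIVATE_REGISTRY_LOCATION:-}" ]; then
    echo "${PRIVATE_REGISTRY_LOCATION}"
  else
    echo "icr.io/cpopen/cpd"
  fi
}
```

For example, export CPU_ARCH=$(cpu_arch_from_node_arch "$(oc get nodes -o jsonpath='{.items[0].status.nodeInfo.architecture}')") reads the architecture of the first node, and --image-prefix=$(export_import_image_prefix) can replace the literal prefix in the init command.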

Listing the available auxiliary modules

When you install a service that uses the cpd-cli export-import commands, the service installs a service-specific auxiliary module. For details, see Services that support cpd-cli export-import.

Run the following command to determine which auxiliary modules are installed:

cpd-cli export-import list aux-modules \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}
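Every command in the rest of this task repeats the --namespace, --profile, and --arch options. If you run many of them, a small shell wrapper can reduce typing. This is a hypothetical convenience function, not part of cpd-cli: it appends the three options after whatever arguments you pass, and it lets you set CPD_CLI=echo to print a command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of cpd-cli): cpd_ei <subcommand> [args...]
# Runs "cpd-cli export-import" with the options that every example repeats.
# CPD_CLI defaults to cpd-cli; it is left unquoted on purpose so that a value
# such as "echo" works for a dry run.
cpd_ei() {
  ${CPD_CLI:-cpd-cli} export-import "$@" \
    --namespace="${PROJECT_CPD_INST_OPERANDS}" \
    --profile="${CPD_PROFILE_NAME}" \
    --arch="${CPU_ARCH}"
}
```

With the wrapper, cpd_ei list aux-modules runs the same command as the example above, and cpd_ei export status myexport1 checks the status of an export.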

Exporting data

The following examples show how to export data from a Cloud Pak for Data instance.

Export data from Cloud Pak for Data to an export named myexport1:
cpd-cli export-import export create myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

Check whether the myexport1 job succeeded, failed, or is still in progress:
cpd-cli export-import export status myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

Retrieve the logs for the myexport1 export:
cpd-cli export-import export logs myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

Create a scheduled export job named myexport2 that runs at minute 0 of every 12th hour (that is, at 00:00 and 12:00):
cpd-cli export-import schedule-export create myexport2 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--schedule="0 */12 * * *" \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

Check whether the scheduled myexport2 job succeeded, failed, or is still in progress:
cpd-cli export-import schedule-export status myexport2 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}
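Export jobs can take a while. To wait for one in a script, you can poll the status command. The sketch below is a hypothetical helper: this task does not document the exact text that the status command prints, so the pattern that indicates a finished job is an assumption that you must supply based on the output you see in your environment. The helper runs the command repeatedly until its output matches the pattern or the timeout expires.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpd-cli):
#   wait_for_match <pattern> <timeout-seconds> <interval-seconds> <command> [args...]
# Returns 0 as soon as the command's output (stdout and stderr) matches the
# extended regular expression <pattern>; returns 1 when the timeout expires.
wait_for_match() {
  local pattern="$1" timeout="$2" interval="$3" elapsed=0
  shift 3
  if [ "$interval" -le 0 ]; then
    echo "interval must be greater than 0" >&2
    return 2
  fi
  while :; do
    # grep -E: extended regex; -q: report only through the exit status.
    if "$@" 2>&1 | grep -Eq -- "$pattern"; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      break
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s waiting for /${pattern}/" >&2
  return 1
}
```

For example: wait_for_match "$DONE_PATTERN" 3600 30 cpd-cli export-import export status myexport1 --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME} --arch=${CPU_ARCH}, where DONE_PATTERN is a hypothetical variable that holds a regular expression matching both the completed and the failed states. A match alone does not tell you whether the job succeeded, so check the status or the logs afterward.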

Downloading exported data

To migrate exported data to another cluster, first download it. The exported data is saved to a compressed TAR file.

Download the myexport1 data to a TAR file in the current working directory:
cpd-cli export-import export download myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}
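If you download the same export more than once, the working directory can contain several archives. The following hypothetical helper picks the most recent one. It assumes the file naming pattern shown in the upload example in this task, cpd-exports-<export-name>-<timestamp>-data.tar with a 14-digit timestamp such as 20200301101735, so that sorting the names also sorts them by time. Check the name that your version of cpd-cli actually produces before you rely on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpd-cli): latest_export_tar <export-name> [dir]
# Prints the path of the newest cpd-exports-<export-name>-<timestamp>-data.tar.
# Requiring a digit after the export name reduces, but does not eliminate,
# matches against other exports whose names start with the same text.
latest_export_tar() {
  local name="$1" dir="${2:-.}" latest="" f
  # Pathname expansion returns matches in sorted order, so the last existing
  # match has the newest fixed-width timestamp.
  for f in "$dir"/cpd-exports-"$name"-[0-9]*-data.tar; do
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  if [ -z "$latest" ]; then
    echo "no archive found for export ${name} in ${dir}" >&2
    return 1
  fi
  printf '%s\n' "$latest"
}
```

For example, tar -tf "$(latest_export_tar myexport1)" > /dev/null lists the newest archive's contents and fails if the file is not a readable TAR archive.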

Uploading exported data

The following command uploads the compressed export file to a different cluster (the target cluster). Run it against the target cluster, with PROJECT_CPD_INST_OPERANDS and CPD_PROFILE_NAME set for that cluster, and replace the file name in the --file option with the name of the file that you downloaded.

Important: The cpd-cli export-import utility must be installed and initialized on the target cluster before you upload the exported data.

Upload data from a compressed TAR file:
cpd-cli export-import export upload \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH} \
--file=cpd-exports-myexport1-20200301101735-data.tar
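Before you upload the file on the target cluster, it can be worth confirming that it was not truncated or corrupted while you copied it between workstations. The following sketch records a SHA-256 checksum next to the archive on the source side and verifies it on the target side. The function names are hypothetical; sha256sum comes from GNU coreutils, and the fallback shasum -a 256 is the equivalent tool on macOS.

```shell
#!/bin/sh
# Hypothetical helpers (not part of cpd-cli) for checking a copied archive.

# Print only the hex SHA-256 digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Source side: write <file>.sha256 containing the digest.
record_checksum() {
  sha256_of "$1" > "$1.sha256"
}

# Target side: the archive must be listable by tar and match the recorded digest.
verify_checksum() {
  local expected actual
  tar -tf "$1" >/dev/null 2>&1 || { echo "not a readable tar archive: $1" >&2; return 1; }
  expected="$(cat "$1.sha256")" || return 1
  actual="$(sha256_of "$1")"
  if [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $1" >&2
    return 1
  fi
}
```

On the source workstation, run record_checksum on the downloaded TAR file and copy the .sha256 file along with it. On the target side, run verify_checksum on the TAR file before you run the upload command.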

Importing data

The export must complete successfully before you can run an import. Only one import job can exist at a time, so you must delete a completed import job before you start a new one.

To import the data from the myexport1 export:

cpd-cli export-import import create myimport1 \
--from-export=myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

To import the data from the scheduled myexport2 export (delete the existing myimport1 job first):

cpd-cli export-import import create myimport1 \
--from-schedule=myexport2 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

To check whether the myimport1 job succeeded, failed, or is still in progress:

cpd-cli export-import import status myimport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}
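Because only one import job can exist at a time, rerunning an import means deleting the previous job first. The following sketch wraps the two documented commands in one hypothetical function. It ignores a failure from the delete step so that it also works when no previous job exists, and it assumes that the deletion has taken effect by the time the create command runs; if the create command fails, check the status of the previous job and try again. Set CPD_CLI=echo to print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpd-cli): reimport_from_export <import-job> <export-name>
reimport_from_export() {
  local job="$1" export_name="$2" cli="${CPD_CLI:-cpd-cli}"
  # Remove the previous import job; ignore the error if it does not exist.
  $cli export-import import delete "$job" \
    --namespace="${PROJECT_CPD_INST_OPERANDS}" \
    --profile="${CPD_PROFILE_NAME}" \
    --arch="${CPU_ARCH}" || true
  # Start a new import from the completed export.
  $cli export-import import create "$job" \
    --from-export="$export_name" \
    --namespace="${PROJECT_CPD_INST_OPERANDS}" \
    --profile="${CPD_PROFILE_NAME}" \
    --arch="${CPU_ARCH}"
}
```

For example, reimport_from_export myimport1 myexport1 deletes any existing myimport1 job and then imports the myexport1 data again.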

Deleting export-import jobs and resetting the utility

To delete the myexport1 job in the ${PROJECT_CPD_INST_OPERANDS} project but keep the exported data that is stored in the volume:

cpd-cli export-import export delete myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

To delete the myexport1 job and purge the exported data that is stored in the volume in the ${PROJECT_CPD_INST_OPERANDS} project:

cpd-cli export-import export delete myexport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH} \
--purge

To delete the scheduled myexport2 job and purge the exported data that is stored in the volume in the ${PROJECT_CPD_INST_OPERANDS} project:

cpd-cli export-import schedule-export delete myexport2 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH} \
--purge
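The --purge option also removes the exported data that is stored in the volume. In interactive scripts, you might want to ask for confirmation first. The following is a hypothetical helper, not part of cpd-cli; it reads the answer from standard input and treats anything other than y or yes as a refusal.

```shell
#!/bin/sh
# Hypothetical helper (not part of cpd-cli): confirm_purge <job-name>
# Returns 0 only if the user answers y or yes.
confirm_purge() {
  local answer=""
  # Write the prompt to stderr so that it stays visible if stdout is redirected.
  printf 'Delete %s and purge its exported data? [y/N] ' "$1" >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes|Yes|YES) return 0 ;;
    *) echo "cancelled" >&2; return 1 ;;
  esac
}
```

For example: confirm_purge myexport1 && cpd-cli export-import export delete myexport1 --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME} --arch=${CPU_ARCH} --purge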

To delete the myimport1 job:

cpd-cli export-import import delete myimport1 \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}

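To forcibly clean up the Kubernetes resources that cpd-cli export-import previously created, and then reinitialize the utility with a different PVC (pvc2 in this example), run the following commands. This example uses the private container registry option: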

cpd-cli export-import reset \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH} \
--force

cpd-cli export-import init \
--image-prefix=${PRIVATE_REGISTRY_LOCATION} \
--namespace=${PROJECT_CPD_INST_OPERANDS} \
--pvc-name=pvc2 \
--profile=${CPD_PROFILE_NAME} \
--arch=${CPU_ARCH}