Migrating data lineage
Use cpd-cli commands to export and import data lineage within a Cloud Pak for Data cluster or between different clusters.
Prerequisites
Required roles
To complete this task, you must have one of the following roles:
- Cluster Administrator
- Instance Administrator
- Lineage Administrator
Limitations
- You can migrate only lineage assets and flows between environments.
- Data source definitions and catalog or project assignments are not migrated.
- The lineage database that you import into must be empty. Any existing data in that database is deleted during the import.
Before you begin
The following prerequisites must be met:
- You must have the Kubernetes command-line tool installed. Check your version by using the kubectl version command.
- You must have the OpenShift CLI installed. Check your version by using the oc version command.
- Log in to the Cloud Pak for Data cluster by using the OpenShift CLI:
  oc login -u <user_name> -p <password> <host>
- Install the cpd-cli interface. For more information, see Installing the Cloud Pak for Data command-line interface (cpd-cli).
- Verify that you created a cpd-cli profile on your workstation. For more information, see Creating a profile to use the cpd-cli management commands.
- Create a persistent volume claim (PVC) deployment file called wkc-pvc.yaml with the following parameters:
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: wkc-pvc
  spec:
    storageClassName: managed-nfs-storage
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
- Run the following command to create the export-import PVC from the deployment file that you created:
  oc apply -n ${PROJECT_CPD_INST_OPERANDS} -f wkc-pvc.yaml
- Initialize the export-import utility on your workstation:
  cpd-cli export-import init --pvc-name=wkc-pvc --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
- Review the list of export-import modules to verify that the lineage-api component is listed. Run the following command to list the modules:
  cpd-cli export-import list aux-module --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
  An example of the output:
  ID             NAME              COMPONENT     KIND  VERSION  ARCHITECTURE  NAMESPACE  VENDOR  SI
  0027-wkc-base  catalog-api-aux   catalog-api   exim  1.0.0    x86_64        wkc        ibm     N
                 glossary-api-aux  glossary-api  exim  1.0.0    x86_64        wkc        ibm     N
  0075-wkc-lite  lineage-api-aux   lineage-api   exim  1.0.0    x86_64        wkc        ibm     N
  0075-wkc-lite  policy-api-aux    policy-api    exim  1.0.0    x86_64        wkc        ibm     N
  wkc-base       wkc-base-aux      wkc-base      exim  1.0.0    x86_64        wkc        ibm     N
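The tool and module checks in the prerequisites can be scripted. The following is a minimal sketch, not part of the product: check_tools and has_lineage_module are hypothetical helper names, and the module check assumes that the listing contains a whitespace-separated field that is exactly lineage-api, as in the example output.

```shell
#!/usr/bin/env bash
# Sketch only: check_tools and has_lineage_module are hypothetical helpers.

# Return 0 only if every named tool is on PATH; report each missing tool on stderr.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Read a module listing on stdin and succeed if any field is exactly "lineage-api".
# Scanning every field (not a fixed column) tolerates rows with an empty ID column.
has_lineage_module() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "lineage-api") found = 1 }
       END { exit found ? 0 : 1 }'
}

# Example usage (requires a logged-in cluster session):
#   check_tools kubectl oc cpd-cli &&
#   cpd-cli export-import list aux-module \
#     --namespace="${PROJECT_CPD_INST_OPERANDS}" \
#     --profile="${CPD_PROFILE_NAME}" | has_lineage_module
```

Both helpers only report a result through their exit status, so they can gate the export or import steps in a larger script.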
Exporting data lineage
To export data lineage from the IBM Manta Data Lineage service, complete the following tasks:
- Create the export job.
  cpd-cli export-import export create <export_name> --component lineage-api --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
  You can configure the following parameters:
  - TOKEN: Set to the current cluster admin token by default.
  - EXPORT_HOST: Set to the current cluster hostname by default.
- Check the status of the export.
  cpd-cli export-import export status <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
  An example of the status message:
  Name:          lineage-export1
  Job Name:      cpd-ex-lineage-export1
  Active:        0
  Succeeded:     1
  Failed:        0
  Start Time:    Thu, 31 Oct 2024 23:46:40 +0100
  Completed At:  Thu, 31 Oct 2024 23:46:58 +0100
  Duration:      18s
- Download the export logs file.
  cpd-cli export-import export logs <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
- Download the exported data.
  cpd-cli export-import export download <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
- Delete the export job.
  cpd-cli export-import export delete <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
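Because the export runs as a job, a script usually needs to wait for it to finish before it downloads the data. The following sketch polls the status command from the steps above. The names job_state, wait_for_export, POLL_SECONDS, and MAX_POLLS are hypothetical, and the parsing assumes that the status message contains the Succeeded and Failed fields shown in the example.

```shell
#!/usr/bin/env bash
# Sketch only: job_state, wait_for_export, POLL_SECONDS and MAX_POLLS are
# hypothetical names; the field names are taken from the example status message.

# Classify a status message read from stdin as "succeeded", "failed", or "running".
job_state() {
  awk '
    /Succeeded:/ { for (i = 1; i < NF; i++) if ($i == "Succeeded:") ok = $(i + 1) }
    /Failed:/    { for (i = 1; i < NF; i++) if ($i == "Failed:") bad = $(i + 1) }
    END {
      if (bad + 0 > 0) print "failed"
      else if (ok + 0 > 0) print "succeeded"
      else print "running"
    }'
}

# Poll the export status until the job succeeds (0), fails (1), or times out (2).
wait_for_export() {
  name=$1
  polls=0
  while [ "$polls" -lt "${MAX_POLLS:-60}" ]; do
    state=$(cpd-cli export-import export status "$name" \
      --namespace="${PROJECT_CPD_INST_OPERANDS}" \
      --profile="${CPD_PROFILE_NAME}" | job_state)
    case $state in
      succeeded) return 0 ;;
      failed)    return 1 ;;
    esac
    polls=$((polls + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timed out waiting for export $name" >&2
  return 2
}
```

A script might call wait_for_export with the export name and run the download command only when it returns 0. The poll limit keeps the loop from running forever if the status command itself fails.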
Importing data lineage
To import data lineage to the IBM Manta Data Lineage service, complete the following tasks:
- Create an import job.
  cpd-cli export-import import create <import_name> -f lineage-import.yaml --from-export=lineage-export1 --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
  You can configure the following parameters:
  - TOKEN: Set to the current cluster admin token by default.
  - IMPORT_HOST: Set to the current cluster hostname by default.
  - WAIT_TIME: The time between checks of the import status. Set to 5 seconds by default.
  - DATA_COLLISION_STRATEGY: The action to take when preexisting data is detected. Set to stop by default, which stops and fails the import job. Can be changed to delete_old to delete the preexisting data.
  - JOB_COLLISION_STRATEGY: The action to take when a running job is detected. Set to stop_myself by default, which stops and fails the import job. Can be changed to remove_lock to force the job to run.
- Check the status of the import.
  cpd-cli export-import import status <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
  An example of the status message:
  Name:          lineage-import1
  Job Name:      cpd-im
  Active:        0
  Succeeded:     1
  Failed:        0
  Start Time:    Thu, 31 Oct 2024 23:54:28 +0100
  Completed At:  Thu, 31 Oct 2024 23:54:48 +0100
  Duration:      20s
- Download the import logs file.
  cpd-cli export-import import logs <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
- Delete the import job.
  cpd-cli export-import import delete <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
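A mistyped collision strategy can cause an import to fail or to delete data unexpectedly, so it can help to check the parameter values before you create the import job. The following sketch validates values held in environment variables. The function name validate_import_params is hypothetical, and the sketch assumes that the values named in the parameter list above are the only accepted ones. It does not show how the parameters are passed to the import job.

```shell
#!/usr/bin/env bash
# Sketch only: validate_import_params is a hypothetical helper. The accepted
# values are assumed to be exactly those named in the parameter descriptions.

# Validate import parameters taken from the environment, applying the documented defaults.
validate_import_params() {
  data_strategy=${DATA_COLLISION_STRATEGY:-stop}
  job_strategy=${JOB_COLLISION_STRATEGY:-stop_myself}
  wait_time=${WAIT_TIME:-5}
  rc=0

  case $data_strategy in
    stop|delete_old) ;;
    *) echo "invalid DATA_COLLISION_STRATEGY: $data_strategy" >&2; rc=1 ;;
  esac

  case $job_strategy in
    stop_myself|remove_lock) ;;
    *) echo "invalid JOB_COLLISION_STRATEGY: $job_strategy" >&2; rc=1 ;;
  esac

  # WAIT_TIME must be a non-empty string of digits (whole seconds).
  case $wait_time in
    ''|*[!0-9]*) echo "WAIT_TIME must be a whole number of seconds: $wait_time" >&2; rc=1 ;;
  esac

  # delete_old is valid but destructive, so warn without failing.
  if [ "$data_strategy" = delete_old ]; then
    echo "warning: delete_old deletes preexisting lineage data" >&2
  fi

  return "$rc"
}
```

Calling validate_import_params before the import create command lets a script stop early on an unrecognized value instead of waiting for the job to fail.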