Migrating data lineage

Use cpd-cli commands to export and import data lineage within a Cloud Pak for Data cluster or between different clusters.

Prerequisites

Required roles

To complete this task, you must have one of the following roles:

  • Cluster Administrator
  • Instance Administrator
  • Lineage Administrator

Limitations

  • You can migrate only lineage assets and flows between environments.
  • Data source definitions and catalog or project assignments are not migrated.
  • The lineage database in the target environment must be empty. Any existing data in the database is deleted during the import.

Before you begin

The following prerequisites must be met:

  1. You must have the Kubernetes command-line tool (kubectl) installed. Check your version by running the kubectl version command.
  2. You must have the OpenShift CLI (oc) installed. Check your version by running the oc version command.
  3. Log in to the cluster where Cloud Pak for Data is installed by using the OpenShift CLI:

    oc login -u <user_name> -p <password> <host>
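
Before you continue, you might want to confirm that both command-line tools are available on your workstation. The following sketch shows one way to do that in a POSIX shell. The check_tools function name is hypothetical and is not part of kubectl, oc, or cpd-cli.

```shell
# check_tools: hypothetical helper that reports which of the named
# command-line tools cannot be found on the PATH.
# Returns 0 when every tool is found and 1 otherwise.
check_tools() {
  missing=""
  for tool in "$@"; do
    # command -v prints nothing and fails when the tool is not found.
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "Missing required tools:$missing" >&2
    return 1
  fi
  return 0
}
```

For example, run check_tools kubectl oc before you log in, and add cpd-cli to the argument list after you install it.
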
  1. Install the cpd-cli interface. For more information, see Installing the Cloud Pak for Data command-line interface (cpd-cli).

  2. Verify that you created a cpd-cli profile on your workstation. For more information, see Creating a profile to use the cpd-cli management commands.

  3. Create a persistent volume claim (PVC) deployment file called wkc-pvc.yaml with the following parameters:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
        name: wkc-pvc
    spec:
        storageClassName: managed-nfs-storage
        accessModes:
            - ReadWriteMany
        resources:
            requests:
                storage: 10Gi
    
  4. Run the following command to create the export-import PVC from the deployment file that you created in the previous step:

    oc apply -n ${PROJECT_CPD_INST_OPERANDS} -f wkc-pvc.yaml
    
  5. Initialize the export-import utility on your workstation:

    cpd-cli export-import init --pvc-name=wkc-pvc --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    
  6. Run the following command to list the export-import modules, and verify that the lineage-api component is in the list:

    cpd-cli export-import list aux-module --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    

    Example output:

    ID           	NAME            	COMPONENT   	KIND	VERSION	ARCHITECTURE	NAMESPACE	VENDOR	SI
    0027-wkc-base	catalog-api-aux 	catalog-api 	exim	1.0.0  	x86_64      	wkc      	ibm   	N
                 	glossary-api-aux	glossary-api	exim	1.0.0  	x86_64      	wkc      	ibm   	N
    0075-wkc-lite	lineage-api-aux 	lineage-api 	exim	1.0.0  	x86_64      	wkc      	ibm   	N
    0075-wkc-lite	policy-api-aux  	policy-api  	exim	1.0.0  	x86_64      	wkc      	ibm   	N
    wkc-base     	wkc-base-aux    	wkc-base    	exim	1.0.0  	x86_64      	wkc      	ibm   	N
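
If you want to script the check in the last step instead of reading the table, the following sketch shows one possible approach. It assumes that the module list is printed as whitespace-separated columns, as in the example output. The has_component function name is hypothetical.

```shell
# has_component: hypothetical helper that reads the module table on stdin
# and succeeds when any column of any row equals the given component name.
# Assumption: columns are separated by spaces or tabs, as in the example.
has_component() {
  awk -v want="$1" '
    { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, pipe the output of the cpd-cli export-import list aux-module command into has_component lineage-api and check the exit status. An exact column match is used so that lineage-api-aux alone does not count as a match.
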
    

Exporting data lineage

To export data lineage from the IBM Manta Data Lineage service, complete the following tasks:

  1. Create the export job.

    cpd-cli export-import export create <export_name> --component lineage-api --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    

    You can configure the following parameters:

    • parameter: TOKEN
      value: The cluster admin token. Defaults to the admin token of the current cluster.
    • parameter: EXPORT_HOST
      value: The cluster hostname. Defaults to the hostname of the current cluster.
  2. Check the status of the export.

    cpd-cli export-import export status <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    

    Example status message:

    Name:        	lineage-export1
    Job Name:    	cpd-ex-lineage-export1
    Active:      	0
    Succeeded:   	1
    Failed:      	0
    Start Time:  	Thu, 31 Oct 2024 23:46:40 +0100
    Completed At:	Thu, 31 Oct 2024 23:46:58 +0100
    Duration:    	18s
    
  3. Download the export logs file.

    cpd-cli export-import export logs <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    
  4. Download the exported data.

    cpd-cli export-import export download <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    
  5. Delete the export job.

    cpd-cli export-import export delete <export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
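
The status check in step 2 can be repeated automatically until the job finishes. The following sketch shows one way to do that in a POSIX shell. The job_state and wait_until_done function names are hypothetical, and the sketch assumes that a Succeeded count above 0 means that the job is complete and a Failed count above 0 means that it failed, as the example status message suggests.

```shell
# job_state: read a status message on stdin and print one of
# succeeded, failed, or running (assumed interpretation of the counters).
job_state() {
  awk -F: '
    $1 ~ /^[ \t]*Succeeded[ \t]*$/ { v = $2; gsub(/[ \t\r]/, "", v); ok = v + 0 }
    $1 ~ /^[ \t]*Failed[ \t]*$/    { v = $2; gsub(/[ \t\r]/, "", v); bad = v + 0 }
    END {
      if (bad > 0) print "failed"
      else if (ok > 0) print "succeeded"
      else print "running"
    }
  '
}

# wait_until_done <interval_seconds> <max_tries> <status command...>
# Runs the status command repeatedly until the job succeeds or fails.
# Returns 0 on success, 1 on failure, and 2 when max_tries is used up.
wait_until_done() {
  interval=$1
  tries=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    state=$("$@" | job_state)
    case $state in
      succeeded) return 0 ;;
      failed)    return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 2
}
```

For example, pass the full status command from step 2 after the interval and the number of tries, such as wait_until_done 10 30 followed by cpd-cli export-import export status and its options. Because the status command is passed as arguments, the same function can be used with the import status command.
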
    

Importing data lineage

To import data lineage to the IBM Manta Data Lineage service, complete the following tasks:

  1. Create an import job.

    cpd-cli export-import import create <import_name> -f lineage-import.yaml --from-export=<export_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    

    You can configure the following parameters:

    • parameter: TOKEN
      value: The cluster admin token. Defaults to the admin token of the current cluster.
    • parameter: IMPORT_HOST
      value: The cluster hostname. Defaults to the hostname of the current cluster.
    • parameter: WAIT_TIME
      value: The time between checks of the import status. Defaults to 5 seconds.
    • parameter: DATA_COLLISION_STRATEGY
      value: The action to take when preexisting data is detected. The default value, stop, stops the import job and marks it as failed. Set the value to delete_old to delete the preexisting data instead.
    • parameter: JOB_COLLISION_STRATEGY
      value: The action to take when a running job is detected. The default value, stop_myself, stops the import job and marks it as failed. Set the value to remove_lock to force the job to run.
  2. Check the status of the import.

    cpd-cli export-import import status <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    

    Example status message:

    Name:        	lineage-import1
    Job Name:    	cpd-im
    Active:      	0
    Succeeded:   	1
    Failed:      	0
    Start Time:  	Thu, 31 Oct 2024 23:54:28 +0100
    Completed At:	Thu, 31 Oct 2024 23:54:48 +0100
    Duration:    	20s
    
  3. Download the import logs file.

    cpd-cli export-import import logs <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
    
  4. Delete the import job.

    cpd-cli export-import import delete <import_name> --namespace=${PROJECT_CPD_INST_OPERANDS} --profile=${CPD_PROFILE_NAME}
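

Because a mistyped collision strategy might not surface until the import job runs, it can help to check the values before you create the job. The following sketch validates the three tunable import parameters against the values that are listed in step 1. The validate_import_params function name is hypothetical, and the sketch assumes that WAIT_TIME is a whole number of seconds. It does not show how the parameters are passed to cpd-cli.

```shell
# validate_import_params <data_strategy> <job_strategy> <wait_time>
# Hypothetical helper: returns 0 when all three values are acceptable
# and 1 otherwise, with a message on stderr.
validate_import_params() {
  data_strategy=$1
  job_strategy=$2
  wait_time=$3

  case $data_strategy in
    stop|delete_old) ;;
    *) echo "DATA_COLLISION_STRATEGY must be stop or delete_old" >&2; return 1 ;;
  esac

  case $job_strategy in
    stop_myself|remove_lock) ;;
    *) echo "JOB_COLLISION_STRATEGY must be stop_myself or remove_lock" >&2; return 1 ;;
  esac

  # Assumption: WAIT_TIME is a non-negative whole number of seconds.
  case $wait_time in
    ''|*[!0-9]*) echo "WAIT_TIME must be a whole number of seconds" >&2; return 1 ;;
  esac

  return 0
}
```

For example, validate_import_params stop stop_myself 5 checks the documented default values.
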