Migrating Data Virtualization data by using the cpd-cli
Migrate your Data
Virtualization connections and database objects by using
cpd-cli commands.
About this task
The export and import functions in Cloud Pak for Data on-premises enables you to migrate Data Virtualization connections, objects, and associated governance artifacts between Cloud Pak for Data instances of the same version or between Data Virtualization instances of the same Cloud Pak for Data instance.
Governance artifacts are migrated separately by using the IBM® Knowledge Catalog export-import module. To migrate your governance assets, see Migrating catalog assets, Migrating governance artifacts, and Migrating data protection rules.
- Authorization statements (GRANTs)
- Connections Note: When you export and import data source connections, grantable privileges might not be fully preserved. Privileges that are grantable on the source system might be imported on the target system without the ability to grant them to other users. If grantable access is required after importing connections, then it must be manually re-granted by the connection owner or an authorized administrator.
- Nicknames
- Schemas (excluding those with personal credentials)
- Service instance groups
- Service instance users
- Statistics
- Tables (excluding those with personal credentials)
- User-defined active caches (i.e. not auto-generated caches)
- Views
- Db2 audit policies
- IBM Knowledge
Catalog objects including:
- Governance artifacts
- Catalogs
- Data protection rules
- Complete the pre-requisite steps.
- Export your IBM Knowledge Catalog assets.
- Export your Data Virtualization assets.
- (Optional) You can manage your connection mappings, or customize your control data for different target environments.
- Import your IBM Knowledge Catalog assets.
- Import your Data Virtualization assets.
Guidelines
- Grantable privileges might not be preserved after migration
- Avoiding import conflicts
-
- Re-running the import does not re-import connections or re-migrate governance artifacts that
were migrated successfully in a previous run. Additionally, re-uploading an archive by using
export uploadon objects that you previously imported, resets the import status (control data) to its initial state, resulting in a full re-import. To determine the migration status of the imported governance artifacts, check thegovernance_statusfield in the Data Virtualization import pod log.Examples of values are:
and:"governance_status": "SUCCESS -- Governance import complete.", "governance_import_completed": "2024-10-11T19:44:13.266958545","governanceAttemptSuccessful":"2026-02-28T02:07:56.443661542", "governance_status":"skipped -- Governance import already completed at 2026-02-28T02:07:56.443661542. Skipping governance.", - You can only migrate assets between Data
Virtualization instances on the same Cloud Pak for Data cluster.Note: You cannot export your current Data Virtualization instance and then import it back into the same Data Virtualization instance.
- Ensure that the source and target instances do not share assets (catalogs, projects, space assets and connections) in the same catalog, project, or space or else those assets might not update properly upon import. Instead, use distinct sets of catalogs, projects, and spaces to avoid import conflicts.
- Re-running the import does not re-import connections or re-migrate governance artifacts that
were migrated successfully in a previous run. Additionally, re-uploading an archive by using
- Input value specifications
-
- Case sensitivity applies to the names of schemas, tables, nicknames, and views that you export and import.
- Ensure connections target a specific connection ID by following the uppercase prefix and number format (for example: DB210000).
- Notice that the difference between
SERVICE_INSTANCE_NAMESPACEandSERVICE_INSTANCE_NAMEis in the usage and scope:SERVICE_INSTANCE_NAMESPACESERVICE_INSTANCE_NAMERefers to the namespace that Data Virtualization is running in which could be the same as the IBM Cloud Pak® for Data namespace (defaulting to “zen”) but can also be different (e.g., using tethered namespaces). This namespace is used in cpd-cli calls with the --namespace argument, particularly for import and export operations. Represents the Instance Name field that you set in the web client during Data Virtualization provisioning or displayed in the list of Data Virtualization instances, defaulting to data-virtualization-<namespace>-<optional_suffix>. This name is used in cpd-cli calls with the --instance-name flag, mainly for import and export creation.
- Import timing
-
- For large-scale imports, wait at least 10 minutes between importing IBM Knowledge Catalog and Data Virtualization assets. The import process relies on Cloud Pak for Data Global Search to identify and update the imported IBM Knowledge Catalog assets before these changes reflect in Global Search.
- Troubleshooting and error reporting
-
- If an import fails, you can safely re-import the assets to update the remaining assets that weren’t successfully imported.
- If you want to re-import your modified control data, ensure that you specify
CONN_IGNORE_DUPLICATES=True and
OBJ_IGNORE_DUPLICATES=True to avoid errors. For more information, see
CONN_IGNORE_DUPLICATESandOBJ_IGNORE_DUPLICATES. - You can retrieve export and import logs by using the respective log commands. See
export logsandimport logs. - Detailed export-import logs and errors are logged as exceptions in the
QueryplexAdmin.log file. To access the file, navigate to
/mnt/logs/diaglogs/NODE0000/QueryplexAdmin.log in the Data
Virtualization head
node (pod
c-db2u-dv-db2u-0).
Before you begin
- Data Virtualization Administrator
- Cluster Administrator
Pre-requisite steps
- Install the
cpd-cliinterface and then create acpd-cliprofile on your workstation. For more information, see Installing the IBM Software Hub command-line interface (cpd-cli) and Creating a profile to use the cpd-cli management commands. - Initialize the cpd-cli export-import utility. Ensure
you create a persistent volume claim (PVC) in each Data
Virtualization namespace where you want to
run the export-import utility. For more information, see Migrating data between IBM Software
Hub installations.
- In the case you need to reset the PVC, see
export-import reset.
- In the case you need to reset the PVC, see
Exporting Data Virtualization data
To export your Data Virtualization data, complete these steps.
- Create a new values.yaml file to specify which Data
Virtualization
objects to export.
- Specify the following required input values in the values.yaml file.
Ensure that identifiers consist only of alphanumeric characters. Note: By default, all of the data sets associated with
users,groups,caches, andauditare exported.- If you specify only
connections, then its schema, and schema and name is exported. - If you specify only
schemas, then its associated database objects and dependent connection (if it's not specified) is exported. - To stop exporting a data set, specify an empty array. For example:
"users": [],
users: Specify a comma-separated list of users that you want to export.groups: Specify a comma-separated list of group names that you want to export.caches: Specify a comma-separated list of cache IDs that you want to export.audit: Specify a comma-separated list of audit policy names that you want to export.objects: Export values that the engine needs before it can run.encryptionKey: The encryption key that is used to generate another encryption key, which encrypts your exported engine data (i.e. schemas, tables, and connections). If the objects exist, then specify an encryption key of your choice.identifier: Add an identifier to the export. This value uses the IBM Software Hub export job ID by default.- (Optional)
includeDepedencies: Export all related objects, including connections that the objects depend on. For example, if a view depends on a nickname and a table, all three are exported if you specifyincludeDepedencies=TRUE. Additionally, the connections that are associated with the nickname and table are also exported. This option is false by default.Note:includeDependenciesmight cause more schemas and connections to be included than in the original specification. - (Optional)
includeStatistics: Run and include statistic statements for exported database objects. These statistics statements are re-created during the import process for its associated object. This option is false by default. exportObjects: Provide details and optional comments about the connections and schemas that you want to export.connection: Specify the Data Virtualization connection ID (for example: DB210000).schemaorschemaandname: Specify the schema name to export the whole schema, or export a specific object within a schema by specifying the schema name and the name of the object name in that schema.
Example values.yaml file:{ "users": ["user1", "user2", "user3"], "groups": ["group1", "group2", "group3"], "caches": ["cacheID1", "cacheID2", "cacheID3"], "audit": ["auditPolicy1", "auditPolicy2", "auditPolicy3"], "objects": { "encryptionKey": "myEncryptionKey", "identifier": "myExport123", "includeDependencies": true, "includeStatistics": true, "exportObjects": [ { "_comment": "comment about the exported connection", "connection": "connectionID" }, { "_comment": "comment about the exported schema", "schema": "schemaName" }, { "_comment": "comment about the exported object", "schema": "schemaName", "name": "mytable1" } ] } } - If you specify only
- Format the input values into a single line. Note: Use
jq -cto compact the JSON objects into a single line.cat <<EOF | jq -c { "objects": { "encryptionKey": "myEncryptionKey", "includeStatistics": false, "exportObjects": [ { "_comment": "CPADMIN Schema", "schema": "CPADMIN" } ] }, "caches": [ "DVxxxx", "DVyyyy" ] } EOFAn example of the JSON objects as a single line:
{"objects":{"encryptionKey":"myEncryptionKey","includeStatistics":false,"exportObjects":[{"_comment":"CPADMIN Schema","schema":"CPADMIN"}]},"caches":["DVxxxx","DVyyyy"]} - Format and save the values.yaml file that is passed to
cpd-clias a dv-aux JSON object wrapped in single quotation marks:dv-aux: values: '{"objects":{"encryptionKey":"myEncryptionKey","includeStatistics":false,"exportObjects":[{"_comment":"CPADMIN Schema","schema":"CPADMIN"}]},"caches":["DVxxxx","DVyyyy"]}'
- Specify the following required input values in the values.yaml file.
Ensure that identifiers consist only of alphanumeric characters.
- Run the following command to pass the values.yaml
values in
cpd-cli.- Replace <cpd_import_name> with the name of the export to create. The name of the export in the following examples is myexport1.
Example:cpd-cli export-import export create <cpd_export_name> \ --profile=cpadmin \ --service-type dv \ --instance-name ${service_instance_name} \ --namespace ${SERVICE_INSTANCE_NAMESPACE} \ --values values.yaml \ --backoff-limit 0cpd-cli export-import export create myexport1 \ --profile=cpadmin \ --service-type dv \ --instance-name ${service_instance_name} \ --namespace ${SERVICE_INSTANCE_NAMESPACE} \ --values values.yaml \ --backoff-limit 0The name of the export data prints.Note: To check the name of the exported data, use theexport listcommand. Seeexport-import export list. - Check whether the export job succeeded, failed, or is still in active progress by running this
command:
cpd-cli export-import export status myexport1 \ --namespace=${PROJECT_CPD_INST_OPERANDS} \ --profile=${CPD_PROFILE_NAME} \ --arch=${CPU_ARCH} - Download the exported data from IBM Software
Hub to a compressed TAR file in the current
working directory:
cpd-cli export-import export download myexport1 \ --namespace=${PROJECT_CPD_INST_OPERANDS} \ --profile=${CPD_PROFILE_NAME} \ --arch=${CPU_ARCH}
Optional tasks
- You can map an existing connection ID to a target instance in both tasks, but you do not need to complete both tasks.
- If you run both tasks and no duplicates exist, then the mappings are combined.
- If you run both tasks and duplicates exist, then the mappings in the control data take priority.
- Manage how your connection IDs are mapped during import operations in Data
Virtualization by
running the
MANAGE_CID_MAPPINGstored procedure. See Managing connection mappings during import. - Customize your exported control data to accommodate differences in your target environment compared to your source environment, such as varying connection owners, gdbnode, and connection actions. To modify your data, see Modifying control data to support differences in environments.
Importing Data Virtualization data
import delete.To successfully import Data Virtualization artifacts, you must first import IBM Knowledge Catalog artifacts, and ensure that both imports occur within the same run context. The Data Virtualization import process only recognizes IBM Knowledge Catalog artifacts that are newly imported in the same run, and does not operate on artifacts processed in previous runs.
After the Data Virtualization import completes, it updates (or "claims") the newly imported IBM Knowledge Catalog artifacts, marking them as processed. In subsequent runs, processed artifacts are ignored during governance validation.
As a result, if you run the Data
Virtualization import again, you must re-import the required
IBM Knowledge
Catalog artifacts beforehand; otherwise, the
import fails with the ENSURE GOVERNANCE ARTIFACTS WERE IMPORTED FIRST error because
Data
Virtualization expects newly imported governance artifacts for each run.
If you want to bypass this validation, you must explicitly use the
SKIP_GOVERNANCE=TRUE option. See SKIP_GOVERNANCE.
To import your Data Virtualization data, complete these steps.
- Conditional: If you want to import the export data into a different namespace or cluster, then repeat the PVC set up steps for each namespace or cluster.
- Upload the export data from a compressed TAR file by running this command.
- Replace <export-import-file-name> with the name of the export file to upload (or use the new TAR file name if you modified the control data).
Example:cpd-cli export-import export upload \ --namespace=${PROJECT_CPD_INST_OPERANDS} \ --profile=${CPD_PROFILE_NAME} \ --arch=${CPU_ARCH} \ --file=<export-import-file-name>cpd-cli export-import export upload \ --namespace=${PROJECT_CPD_INST_OPERANDS} \ --profile=${CPD_PROFILE_NAME} \ --arch=${CPU_ARCH} \ --file=cpd-exports-myexport1-20200301101735-data.tar - In a new values.yaml file, specify the input values. Then format and save
the file that is passed to the
cpd-clias a dv-aux JSON object wrapped in single quotation marks:objectsOptions: Specify a comma-separated list of options for the object export. See the following Import options.
Example values.yaml file that is formatted as a dv-aux JSON object:dv-aux: values: '{"objectsOptions":"OBJ_RETRY_ALL=True,GOV_PROCEED_ON_ERROR=True,CONN_IGNORE_DUPLICATES=True,OBJ_IGNORE_DUPLICATES=True"}'Optional: Specify the following options when you create an import job.Import options Description GOV_PROCEED_ON_ERROR Continue the import process even if unrecoverable errors occur in the process of updating imported IBM Knowledge Catalog catalog assets, projects, connections and deployment space assets. The errors are logged as exceptions in QueryplexAdmin.log, but it does not stop the importing process.Values:- PROCEED_ON_ERROR=True
- PROCEED_ON_ERROR=False (default)
SKIP_GOVERNANCE Data Virtualization automatically updates imported IBM Knowledge Catalog catalog assets, projects, connections and deployment space assets, and then associates them to your Data Virtualization instance, before importing Data Virtualization objects.
Specify SKIP_GOVERNANCE=TRUE if you want to skip the update step, and then proceed to importing Data Virtualization objects.
Specify SKIP_GOVERNANCE=ONLY_GOVERNANCE if you want to only update IBM Knowledge Catalog governance assets, and skip importing Data Virtualization objects.
Values:- SKIP_GOVERNANCE=True
- SKIP_GOVERNANCE=False (default)
- SKIP_GOVERNANCE=ONLY_GOVERNANCE
BACKUP_CONN_OWNER Specifies the backup connection owner as the new connection owner. This option is used in the case that the original owner of the connection on the original instance doesn't exist when connections are being re-created. The backup connection owner must exist and have the appropriate privileges. If the connection owner exists, then the import_statusof the connection changes toSuccess.Values: Specify the backup IBM Software Hub user ID. For example, DB2INST1.
CONN_IGNORE_DUPLICATES and OBJ_IGNORE_DUPLICATES Use CONN_IGNORE_DUPLICATESandOBJECT_IGNORE_DUPLICATESto ignore duplicate connections and objects during the import process, respectively.If you set CONN_IGNORE_DUPLICATES=True or OBJ_IGNORE_DUPLICATES=True, then duplicated items are marked with a
Values:SQLCODE=-601duplication error, instead ofFailure.- CONN_IGNORE_DUPLICATES=True
- CONN_IGNORE_DUPLICATES=False (default)
- OBJ_IGNORE_DUPLICATES=True
- OBJ_IGNORE_DUPLICATES=False (default)
CONN_RETRY_ALL and OBJ_RETRY_ALL Use CONN_RETRY_ALLandOBJ_RETRY_ALLto retry importing all the connections and objects again, respectively.Note: Enabling these options ignores the result of the originalValues:import_status.- CONN_RETRY_ALL=True
- CONN_RETRY_ALL=False (default)
- OBJ_RETRY_ALL=True
- OBJ_RETRY_ALL=False (default)
- Run the following command to pass the values.yaml values in the
cpd-clisimilar to the following example.- Replace <cpd_import_name> with the name of the import to create. The name of the import in the following examples is myimport1.
- Replace <export_data name> with the name of the export data from step 3.
Example command:cpd-cli export-import import create <cpd_import_name> \ --profile=cpadmin \ --from-export <export_data_name> \ --service-type dv \ --instance-name ${service_instance_name} \ --namespace ${SERVICE_INSTANCE_NAMESPACE} \ --values values.yaml \ --backoff-limit 0cpd-cli export-import import create myimport1 \ --profile=cpadmin \ --from-export export-20240201221230 \ --service-type dv \ --instance-name ${service_instance_name} \ --namespace ${SERVICE_INSTANCE_NAMESPACE} \ --values values.yaml \ --backoff-limit 0 - Check whether the import job succeeded, failed, or is still in active progress by running this
command:
cpd-cli export-import import status myimport1 \ --namespace=${PROJECT_CPD_INST_OPERANDS} \ --profile=${CPD_PROFILE_NAME} \ --arch=${CPU_ARCH} \
Learn more
- To troubleshoot cpd-cli specific migration errors, see Troubleshooting data migration errors in Data Virtualization.
- For a list of all the cpd-cli commands for export and import, see
export-import. - For more information on options, see the Option section in
cpd-clicommand syntax conventions. - For more information on migrating data between IBM Software Hub installations, see Migrating data between IBM Software Hub installations.