Migrating Data Virtualization data by using the cpd-cli

Migrate your Data Virtualization connections and database objects by using cpd-cli commands.

About this task

The export and import functions in Cloud Pak for Data on-premises enables you to migrate Data Virtualization connections, objects, and associated governance artifacts between Cloud Pak for Data instances of the same version or between Data Virtualization instances of the same Cloud Pak for Data instance.

Governance artifacts are migrated separately by using the IBM® Knowledge Catalog export-import module. To migrate your governance assets, see Migrating catalog assets, Migrating governance artifacts, and Migrating data protection rules.

You can migrate the following objects:
  • Authorization statements (GRANTs)
  • Connections
    Note: When you export and import data source connections, grantable privileges might not be fully preserved. Privileges that are grantable on the source system might be imported on the target system without the ability to grant them to other users. If grantable access is required after importing connections, then it must be manually re-granted by the connection owner or an authorized administrator.
  • Nicknames
  • Schemas (excluding those with personal credentials)
  • Service instance groups
  • Service instance users
  • Statistics
  • Tables (excluding those with personal credentials)
  • User-defined active caches (i.e. not auto-generated caches)
  • Views
  • Db2 audit policies
  • IBM Knowledge Catalog objects including:
    • Governance artifacts
    • Catalogs
    • Data protection rules
To migrate your data, follow this workflow:
  1. Complete the pre-requisite steps.
  2. Export your IBM Knowledge Catalog assets.
  3. Export your Data Virtualization assets.
  4. (Optional) You can manage your connection mappings, or customize your control data for different target environments.
  5. Import your IBM Knowledge Catalog assets.
  6. Import your Data Virtualization assets.

Guidelines

Consider the following guidelines before you start the migration process.
Grantable privileges might not be preserved after migration
When you export and import data source connections, grantable privileges might not be fully preserved. Privileges that are grantable on the source system might be imported on the target system without the ability to grant them to other users. If grantable access is required after importing connections, then it must be manually re-granted by the connection owner or an authorized administrator.
Avoiding import conflicts
  • Re-running the import does not re-import connections or re-migrate governance artifacts that were migrated successfully in a previous run. Additionally, re-uploading an archive by using export upload on objects that you previously imported, resets the import status (control data) to its initial state, resulting in a full re-import. To determine the migration status of the imported governance artifacts, check the governance_status field in the Data Virtualization import pod log.
    Examples of values are:
      "governance_status": "SUCCESS -- Governance import complete.",
      "governance_import_completed": "2024-10-11T19:44:13.266958545",
    and:
      "governanceAttemptSuccessful":"2026-02-28T02:07:56.443661542",
      "governance_status":"skipped -- Governance import already completed at 2026-02-28T02:07:56.443661542. Skipping governance.",
  • You can only migrate assets between Data Virtualization instances on the same Cloud Pak for Data cluster.
    Note: You cannot export your current Data Virtualization instance and then import it back into the same Data Virtualization instance.
  • Ensure that the source and target instances do not share assets (catalogs, projects, space assets and connections) in the same catalog, project, or space or else those assets might not update properly upon import. Instead, use distinct sets of catalogs, projects, and spaces to avoid import conflicts.
Input value specifications
  • Case sensitivity applies to the names of schemas, tables, nicknames, and views that you export and import.
  • Ensure connections target a specific connection ID by following the uppercase prefix and number format (for example: DB210000).
  • Notice that the difference between SERVICE_INSTANCE_NAMESPACE and SERVICE_INSTANCE_NAME is in the usage and scope:
    SERVICE_INSTANCE_NAMESPACE SERVICE_INSTANCE_NAME
    Refers to the namespace that Data Virtualization is running in which could be the same as the IBM Cloud Pak® for Data namespace (defaulting to “zen”) but can also be different (e.g., using tethered namespaces).

    This namespace is used in cpd-cli calls with the --namespace argument, particularly for import and export operations.
    Represents the Instance Name field that you set in the web client during Data Virtualization provisioning or displayed in the list of Data Virtualization instances, defaulting to data-virtualization-<namespace>-<optional_suffix>.

    This name is used in cpd-cli calls with the --instance-name flag, mainly for import and export creation.
Import timing
  • For large-scale imports, wait at least 10 minutes between importing IBM Knowledge Catalog and Data Virtualization assets. The import process relies on Cloud Pak for Data Global Search to identify and update the imported IBM Knowledge Catalog assets before these changes reflect in Global Search.
Troubleshooting and error reporting
  • If an import fails, you can safely re-import the assets to update the remaining assets that weren’t successfully imported.
  • If you want to re-import your modified control data, ensure that you specify CONN_IGNORE_DUPLICATES=True and OBJ_IGNORE_DUPLICATES=True to avoid errors. For more information, see CONN_IGNORE_DUPLICATES and OBJ_IGNORE_DUPLICATES.
  • You can retrieve export and import logs by using the respective log commands. See export logs and import logs.
  • Detailed export-import logs and errors are logged as exceptions in the QueryplexAdmin.log file. To access the file, navigate to /mnt/logs/diaglogs/NODE0000/QueryplexAdmin.log in the Data Virtualization head node (pod c-db2u-dv-db2u-0).

Before you begin

To complete this task, you must be a:
  • Data Virtualization Administrator
  • Cluster Administrator

Pre-requisite steps

  1. Install the cpd-cli interface and then create a cpd-cli profile on your workstation. For more information, see Installing the IBM Software Hub command-line interface (cpd-cli) and Creating a profile to use the cpd-cli management commands.
  2. Initialize the cpd-cli export-import utility. Ensure you create a persistent volume claim (PVC) in each Data Virtualization namespace where you want to run the export-import utility. For more information, see Migrating data between IBM Software Hub installations.

Exporting Data Virtualization data

To export your Data Virtualization data, complete these steps.

  1. Create a new values.yaml file to specify which Data Virtualization objects to export.
    1. Specify the following required input values in the values.yaml file. Ensure that identifiers consist only of alphanumeric characters.
      Note: By default, all of the data sets associated with users, groups, caches, and audit are exported.
      • If you specify only connections, then its schema, and schema and name is exported.
      • If you specify only schemas, then its associated database objects and dependent connection (if it's not specified) is exported.
      • To stop exporting a data set, specify an empty array. For example: "users": [],
      • users: Specify a comma-separated list of users that you want to export.
      • groups: Specify a comma-separated list of group names that you want to export.
      • caches: Specify a comma-separated list of cache IDs that you want to export.
      • audit: Specify a comma-separated list of audit policy names that you want to export.
      • objects: Export values that the engine needs before it can run.
      • encryptionKey: The encryption key that is used to generate another encryption key, which encrypts your exported engine data (i.e. schemas, tables, and connections). If the objects exist, then specify an encryption key of your choice.
      • identifier: Add an identifier to the export. This value uses the IBM Software Hub export job ID by default.
      • (Optional) includeDepedencies: Export all related objects, including connections that the objects depend on. For example, if a view depends on a nickname and a table, all three are exported if you specify includeDepedencies=TRUE. Additionally, the connections that are associated with the nickname and table are also exported. This option is false by default.
        Note: includeDependencies might cause more schemas and connections to be included than in the original specification.
      • (Optional) includeStatistics: Run and include statistic statements for exported database objects. These statistics statements are re-created during the import process for its associated object. This option is false by default.
      • exportObjects: Provide details and optional comments about the connections and schemas that you want to export.
        • connection: Specify the Data Virtualization connection ID (for example: DB210000).
        • schema or schema and name: Specify the schema name to export the whole schema, or export a specific object within a schema by specifying the schema name and the name of the object name in that schema.
      Example values.yaml file:
      {
        "users": ["user1", "user2", "user3"],
        "groups": ["group1", "group2", "group3"],
        "caches": ["cacheID1", "cacheID2", "cacheID3"],
        "audit": ["auditPolicy1", "auditPolicy2", "auditPolicy3"],
        "objects": {
          "encryptionKey": "myEncryptionKey",
          "identifier": "myExport123",
          "includeDependencies": true,
          "includeStatistics": true,
          "exportObjects": [
            {
              "_comment": "comment about the exported connection",
              "connection": "connectionID"
            },
            {
              "_comment": "comment about the exported schema",
              "schema": "schemaName"
            },
            {
              "_comment": "comment about the exported object",
              "schema": "schemaName",
              "name": "mytable1"
            }
          ]
        }
      }
    2. Format the input values into a single line.
      Note: Use jq -c to compact the JSON objects into a single line.
      cat <<EOF | jq -c
      {
        "objects": {
          "encryptionKey": "myEncryptionKey",
          "includeStatistics": false,
          "exportObjects": [
            {
              "_comment": "CPADMIN Schema",
              "schema": "CPADMIN"
            }
          ]
        },
        "caches": [
          "DVxxxx",
          "DVyyyy"
        ]
      } 
      EOF

      An example of the JSON objects as a single line:

      {"objects":{"encryptionKey":"myEncryptionKey","includeStatistics":false,"exportObjects":[{"_comment":"CPADMIN Schema","schema":"CPADMIN"}]},"caches":["DVxxxx","DVyyyy"]}
    3. Format and save the values.yaml file that is passed to cpd-cli as a dv-aux JSON object wrapped in single quotation marks:
      dv-aux:
        values: '{"objects":{"encryptionKey":"myEncryptionKey","includeStatistics":false,"exportObjects":[{"_comment":"CPADMIN
          Schema","schema":"CPADMIN"}]},"caches":["DVxxxx","DVyyyy"]}'
  2. Run the following command to pass the values.yaml values in cpd-cli.
    • Replace <cpd_import_name> with the name of the export to create. The name of the export in the following examples is myexport1.
    cpd-cli export-import export create <cpd_export_name> \
    --profile=cpadmin \
    --service-type dv \
    --instance-name ${service_instance_name}  \
    --namespace ${SERVICE_INSTANCE_NAMESPACE} \
    --values values.yaml \
    --backoff-limit 0
    Example:
    cpd-cli export-import export create myexport1 \
    --profile=cpadmin \
    --service-type dv \
    --instance-name ${service_instance_name}  \
    --namespace ${SERVICE_INSTANCE_NAMESPACE} \
    --values values.yaml \
    --backoff-limit 0
    The name of the export data prints.
    Note: To check the name of the exported data, use the export list command. See export-import export list.
  3. Check whether the export job succeeded, failed, or is still in active progress by running this command:
    cpd-cli export-import export status myexport1 \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --profile=${CPD_PROFILE_NAME} \
    --arch=${CPU_ARCH}
  4. Download the exported data from IBM Software Hub to a compressed TAR file in the current working directory:
    cpd-cli export-import export download myexport1 \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --profile=${CPD_PROFILE_NAME} \
    --arch=${CPU_ARCH}

Optional tasks

You can complete one of the following (optional) tasks before importing your Data Virtualization data:
Important: Consider these guidelines if you want to map your connections:
  • You can map an existing connection ID to a target instance in both tasks, but you do not need to complete both tasks.
  • If you run both tasks and no duplicates exist, then the mappings are combined.
  • If you run both tasks and duplicates exist, then the mappings in the control data take priority.

Importing Data Virtualization data

Note: The export must be completed successfully before you can run an import. Because only one import job is allowed at a time, you must always delete the completed import job to start a new one. See import delete.
Important:

To successfully import Data Virtualization artifacts, you must first import IBM Knowledge Catalog artifacts, and ensure that both imports occur within the same run context. The Data Virtualization import process only recognizes IBM Knowledge Catalog artifacts that are newly imported in the same run, and does not operate on artifacts processed in previous runs.

After the Data Virtualization import completes, it updates (or "claims") the newly imported IBM Knowledge Catalog artifacts, marking them as processed. In subsequent runs, processed artifacts are ignored during governance validation.

As a result, if you run the Data Virtualization import again, you must re-import the required IBM Knowledge Catalog artifacts beforehand; otherwise, the import fails with the ENSURE GOVERNANCE ARTIFACTS WERE IMPORTED FIRST error because Data Virtualization expects newly imported governance artifacts for each run.

If you want to bypass this validation, you must explicitly use the SKIP_GOVERNANCE=TRUE option. See SKIP_GOVERNANCE.

To import your Data Virtualization data, complete these steps.

  1. Conditional: If you want to import the export data into a different namespace or cluster, then repeat the PVC set up steps for each namespace or cluster.
  2. Upload the export data from a compressed TAR file by running this command.
    • Replace <export-import-file-name> with the name of the export file to upload (or use the new TAR file name if you modified the control data).
    cpd-cli export-import export upload \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --profile=${CPD_PROFILE_NAME} \
    --arch=${CPU_ARCH} \
    --file=<export-import-file-name>
    Example:
    cpd-cli export-import export upload \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --profile=${CPD_PROFILE_NAME} \
    --arch=${CPU_ARCH} \
    --file=cpd-exports-myexport1-20200301101735-data.tar
  3. In a new values.yaml file, specify the input values. Then format and save the file that is passed to the cpd-cli as a dv-aux JSON object wrapped in single quotation marks:
    • objectsOptions: Specify a comma-separated list of options for the object export. See the following Import options.
    Example values.yaml file that is formatted as a dv-aux JSON object:
    dv-aux:
        values: '{"objectsOptions":"OBJ_RETRY_ALL=True,GOV_PROCEED_ON_ERROR=True,CONN_IGNORE_DUPLICATES=True,OBJ_IGNORE_DUPLICATES=True"}'
    Optional: Specify the following options when you create an import job.
    Import options Description
    GOV_PROCEED_ON_ERROR Continue the import process even if unrecoverable errors occur in the process of updating imported IBM Knowledge Catalog catalog assets, projects, connections and deployment space assets. The errors are logged as exceptions in QueryplexAdmin.log, but it does not stop the importing process.

    Values:
    • PROCEED_ON_ERROR=True
    • PROCEED_ON_ERROR=False (default)
    SKIP_GOVERNANCE

    Data Virtualization automatically updates imported IBM Knowledge Catalog catalog assets, projects, connections and deployment space assets, and then associates them to your Data Virtualization instance, before importing Data Virtualization objects.

    Specify SKIP_GOVERNANCE=TRUE if you want to skip the update step, and then proceed to importing Data Virtualization objects.

    Specify SKIP_GOVERNANCE=ONLY_GOVERNANCE if you want to only update IBM Knowledge Catalog governance assets, and skip importing Data Virtualization objects.

    Values:
    • SKIP_GOVERNANCE=True
    • SKIP_GOVERNANCE=False (default)
    • SKIP_GOVERNANCE=ONLY_GOVERNANCE
    BACKUP_CONN_OWNER Specifies the backup connection owner as the new connection owner. This option is used in the case that the original owner of the connection on the original instance doesn't exist when connections are being re-created. The backup connection owner must exist and have the appropriate privileges. If the connection owner exists, then the import_status of the connection changes to Success.

    Values: Specify the backup IBM Software Hub user ID. For example, DB2INST1.

    CONN_IGNORE_DUPLICATES and OBJ_IGNORE_DUPLICATES Use CONN_IGNORE_DUPLICATES and OBJECT_IGNORE_DUPLICATES to ignore duplicate connections and objects during the import process, respectively.

    If you set CONN_IGNORE_DUPLICATES=True or OBJ_IGNORE_DUPLICATES=True, then duplicated items are marked with a SQLCODE=-601 duplication error, instead of Failure.

    Values:
    • CONN_IGNORE_DUPLICATES=True
    • CONN_IGNORE_DUPLICATES=False (default)
    • OBJ_IGNORE_DUPLICATES=True
    • OBJ_IGNORE_DUPLICATES=False (default)
    CONN_RETRY_ALL and OBJ_RETRY_ALL Use CONN_RETRY_ALL and OBJ_RETRY_ALL to retry importing all the connections and objects again, respectively.
    Note: Enabling these options ignores the result of the original import_status.

    Values:
    • CONN_RETRY_ALL=True
    • CONN_RETRY_ALL=False (default)
    • OBJ_RETRY_ALL=True
    • OBJ_RETRY_ALL=False (default)
  4. Run the following command to pass the values.yaml values in the cpd-cli similar to the following example.
    • Replace <cpd_import_name> with the name of the import to create. The name of the import in the following examples is myimport1.
    • Replace <export_data name> with the name of the export data from step 3.
    cpd-cli export-import import create <cpd_import_name> \
    --profile=cpadmin \
    --from-export <export_data_name> \
    --service-type dv \
    --instance-name ${service_instance_name}  \
    --namespace ${SERVICE_INSTANCE_NAMESPACE} \
    --values values.yaml \
    --backoff-limit 0
    Example command:
    cpd-cli export-import import create myimport1 \
    --profile=cpadmin \
    --from-export export-20240201221230 \
    --service-type dv \
    --instance-name ${service_instance_name}  \
    --namespace ${SERVICE_INSTANCE_NAMESPACE} \
    --values values.yaml \
    --backoff-limit 0
  5. Check whether the import job succeeded, failed, or is still in active progress by running this command:
    cpd-cli export-import import status myimport1 \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --profile=${CPD_PROFILE_NAME} \
    --arch=${CPU_ARCH} \