Exporting data from the InfoSphere Information Server system

On your source system, export the data that you want to migrate to Cloud Pak for Data and copy the data to the target system.

Exporting the data

To export your data, complete these steps:

  1. Log in to the InfoSphere Information Server node as root.
    Note: If InfoSphere Information Server is on Windows, log in to the standalone Red Hat® Enterprise Linux® machine as root.
    Then, open a bash shell and change to the wkc user:
    bash
    su wkc
  2. Set environment variables for the following required parameters:
    TOOLKIT_PATH=<path to the directory where the toolkit was extracted (see Preparing for migration in InfoSphere Information Server)>
    OCP_HOST=<ocp host> 
    OCP_USERNAME=<ocp username> 
    CP4D_HOST=<cp4d host> 
    CP4D_USERNAME=<default platform administrator user: cpadmin or admin>
    CP4D_PASSWORD=<cp4d password>
    NAMESPACE=<namespace in the target CP4D>
    IIS_INSTALL_PATH=<IIS installation path>
    IIS_HOST=<IIS host>
    IIS_PORT=<IIS port>
    IIS_USERNAME=<IIS username>
    IIS_PASSWORD=<IIS password>
    EXPORT_INSTANCE_NAME=<name of the export instance>
    EXPORT_DATA_DIR=<path to the export data directory; the wkc user must have write permission to this directory>
    DQ_RULES_CONNECTION_NAME=<name of the connection for the database that stores data quality rules>
    IIS_INSTALL_PATH
    The InfoSphere Information Server installation location.

    If InfoSphere Information Server is installed in the default location, set the IIS_INSTALL_PATH variable to the value /opt/IBM/InformationServer. For InfoSphere Information Server on Windows, the IIS_INSTALL_PATH variable must always be set to /opt/IBM/InformationServer.

    EXPORT_INSTANCE_NAME

    The export instance name can contain only lowercase alphanumeric characters and hyphens (-) and must start and end with an alphanumeric character. Otherwise, the export will fail.

    DQ_RULES_CONNECTION_NAME
    The name of the connection for the database that stores data quality rules. For more information, see Preparing for migration in IBM Cloud Pak for Data. This setting is required only if you have data quality rules to migrate.

    If your system contains rules, define this parameter so that you can view rule output records after running a migrated rule in IBM® Knowledge Catalog. Without a connection, you have access only to statistics, not to the actual rule output records. You can also define the output settings for rules after the migration. However, setting this parameter during the migration saves you from having to edit each rule afterward.

    In addition, set environment variables for the following optional parameters as needed:
    DQ_RULES_SCHEMA_NAME=<data quality rules schema name>
    DQ_RULES_TABLE_NAME=<data quality rules table name>
    DQ_PERF_CONFIG='<DQ perf config json string>'
    IMAM_PERF_CONFIG='<IMAM perf config json string>'
    REPLACEMENT_CHARACTER=<replacement character>


    DQ_RULES_SCHEMA_NAME
    This parameter is the name of the database schema where the data quality rules are stored.
    DQ_RULES_TABLE_NAME
    This parameter is the data quality rules table name.
    DQ_PERF_CONFIG
    This parameter provides performance tuning information for the export of data quality projects. The value of the DQ_PERF_CONFIG parameter can include any of these properties:
    catalog_ds_bulk_size
    The number of data assets used to retrieve data classes in a single HTTP request during export.
    Default value: 100
    Minimum value: 1
    Maximum value: 200
    concurrency
    The number of threads to be used during the import job.
    Default value: 20
    Minimum value: 1
    Maximum value: 50
    ds_bulk_size
    The number of data assets retrieved in a single HTTP request during export.
    Default value: 200
    Minimum value: 1
    Maximum value: 500
    ds_properties_bulk_size
    The number of data assets for which to retrieve properties in a single HTTP request during export.
    Default value: 20
    Minimum value: 1
    Maximum value: 50
    rule_history_count
    The number of rule history entries to be exported.
    Default value: 20
    Minimum value: 1
    Maximum value: 100
    socket_time_out_in_seconds
    The HTTP socket timeout, in seconds, that is used during export.
    Default value: 600
    Minimum value: 30
    Maximum value: 3600
    thread_count
    The number of threads to be used during the export job.
    Default value: 10
    Minimum value: 1
    Maximum value: 20
    migrate_sql_vt_as_sql_asset
    By default, SQL virtual tables are migrated as SQL-based data assets. To migrate SQL virtual tables as SQL-based data quality rules, set the parameter to false.

    Default value: true

    This option is available starting with the migration toolkit released in August 2024.

    migrate_rules_bound_to_classical_vt_assets
    By default, data rules with bindings to columns in virtual tables are not migrated. To migrate such rules to definition-based data quality rules, set the parameter to true. For migrated rules, you must reconfigure the bindings by using the IBM Knowledge Catalog API: Update data quality rule before you can run the rules.

    Default value: false

    This option is available starting with the migration toolkit released in February 2025.

    Sample DQ_PERF_CONFIG settings:
    DQ_PERF_CONFIG='{ "thread_count": 5, "rule_history_count": 10, "concurrency": 25, "socket_time_out_in_seconds": 1200, "ds_properties_bulk_size": 10 }'
    IMAM_PERF_CONFIG
    This parameter provides performance tuning information for the export of metadata repository content. The value of the IMAM_PERF_CONFIG parameter can include any of these properties:
    dataasset_batch_size
    The number of data assets to be processed at once during the export.
    Default value: 100
    Minimum value: 1
    Maximum value: 1000
    filefactory_cache_size_in_num_files
    The number of files to be cached before the files are written to disk.
    Default value: 1000
    Minimum value: 1
    Maximum value: 1000
    zipwriter_cache_size_in_bytes
    The cache size in bytes that is used when .zip files are created.
    Default value: 10485760
    Minimum value: 1048576
    Maximum value: 104857600
    json_file_creation_thread_count
    The number of threads to be used for creating JSON files for the export.
    Default value: 10
    Minimum value: 1
    Maximum value: 100
    profiling_results_thread_count
    The number of threads to be used for exporting profiling results.
    Default value: 5
    Minimum value: 1
    Maximum value: 100
    Sample IMAM_PERF_CONFIG settings:
    IMAM_PERF_CONFIG='{ "dataasset_batch_size": 100, "filefactory_cache_size_in_num_files": 1000, "zipwriter_cache_size_in_bytes": 10485760, "json_file_creation_thread_count": 10, "profiling_results_thread_count": 5 }'
    REPLACEMENT_CHARACTER
    This parameter specifies the character that replaces any nonprintable characters in asset names, descriptions, and tags. You can specify any printable character, including special characters, as the replacement character. The default replacement character is a space.
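    Taken together, this step amounts to exporting a set of shell variables. The following sketch uses placeholder values only; replace each one with the values for your environment. It also checks the EXPORT_INSTANCE_NAME naming rule described above:

```shell
# Placeholder values -- replace each one with values for your environment.
export TOOLKIT_PATH=/opt/migration-toolkit
export OCP_HOST=ocp.example.com
export OCP_USERNAME=ocpadmin
export CP4D_HOST=cpd.example.com
export CP4D_USERNAME=cpadmin
export CP4D_PASSWORD='change-me'
export NAMESPACE=wkc
export IIS_INSTALL_PATH=/opt/IBM/InformationServer
export IIS_HOST=iis.example.com
export IIS_PORT=9443
export IIS_USERNAME=isadmin
export IIS_PASSWORD='change-me'
export EXPORT_INSTANCE_NAME=wkc-export-1
export EXPORT_DATA_DIR=/data/export
export DQ_RULES_CONNECTION_NAME=dq-rules-db

# Verify the export instance name: lowercase alphanumerics and hyphens only,
# starting and ending with an alphanumeric character.
echo "${EXPORT_INSTANCE_NAME}" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$' \
  && echo "EXPORT_INSTANCE_NAME is valid" \
  || echo "EXPORT_INSTANCE_NAME is invalid; the export would fail"
```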
  3. Set the path to the toolkit directory.
    export PATH=${TOOLKIT_PATH}/jdk-17.0.9+9/bin:${TOOLKIT_PATH}:$PATH
  4. Change to the directory where the toolkit content is stored.
    cd ${TOOLKIT_PATH}
  5. Gather the Cloud Pak for Data information that is required as input to the export command.

    If the InfoSphere Information Server engine tier and Cloud Pak for Data system can communicate over port 443, you can retrieve these parameter values by running the following command:

    ${TOOLKIT_PATH}/migration/iis_scripts/get_export_iis_params.sh -url https://${CP4D_HOST} -u ${CP4D_USERNAME} -p ${CP4D_PASSWORD} -rcn ${DQ_RULES_CONNECTION_NAME}
    This script creates a .zip file in the /tmp directory on the InfoSphere Information Server system with the following content:
    • user_mappings.json

      This file maps InfoSphere Information Server users to Cloud Pak for Data users.

    • group_mappings.json

      This file maps InfoSphere Information Server user groups to Cloud Pak for Data user groups.

    • dq_rules_connection.json

      This file provides connection information for the database to which data quality rules will be imported.

    • wkc_glossary_repository_id.txt

      This file provides the repository IDs of the Cloud Pak for Data governance artifacts repository.

    If the InfoSphere Information Server engine tier and Cloud Pak for Data system cannot communicate over port 443, you must manually create and transfer that information. On the Cloud Pak for Data system, complete the following steps as admin user:

    1. Copy the ${TOOLKIT_PATH}/migration/iis_scripts/get_export_iis_params.sh file to the Cloud Pak for Data system.
    2. Change to the directory to which you copied the file.
    3. Run the following command to create a .zip file with the required information in the /tmp directory:
      ./get_export_iis_params.sh -url https://${CP4D_HOST} -u ${CP4D_USERNAME} -p ${CP4D_PASSWORD} -rcn ${DQ_RULES_CONNECTION_NAME}
    4. Transfer the generated .zip file from the Cloud Pak for Data system to the InfoSphere Information Server system.
  6. Set the following environment variable to the path of the .zip file that you generated in the previous step.
    CP4D_DETAILS_FILE_PATH=<path-to-generated-cp4d-details-file>
  7. Generate the CRYPTO_KEY value.
    • On Linux:
      CRYPTO_KEY=`uuidgen`
    • On AIX:
      CRYPTO_KEY=`uuid_get`
    Important: Save the CRYPTO_KEY value securely. The same key must be used during the import.
  8. Run the following script to trigger the export. Before running the script, remove any optional parameters that are not needed.
    ${TOOLKIT_PATH}/migration/iis_scripts/trigger_export_iis.sh -host ${IIS_HOST} -port ${IIS_PORT} -user ${IIS_USERNAME} -password ${IIS_PASSWORD} -export_instance_name ${EXPORT_INSTANCE_NAME} -dir ${EXPORT_DATA_DIR} -crypto_key ${CRYPTO_KEY} -namespace ${NAMESPACE} -params_from_cp4d_zip ${CP4D_DETAILS_FILE_PATH} -inspect false -dq_rules_schema_name ${DQ_RULES_SCHEMA_NAME} -dq_rules_table_name ${DQ_RULES_TABLE_NAME} -dq_perf_config "${DQ_PERF_CONFIG}" -iis_install_path ${IIS_INSTALL_PATH} -replacement_character ${REPLACEMENT_CHARACTER}

    You can optionally generate tracing information by running the command with the -log-level parameter set to info or debug. The tracing information is also written to the ${EXPORT_DATA_DIR}/${EXPORT_INSTANCE_NAME}/2*/legacy-migration/export.log file.

  9. You can periodically check the progress of the export by looking at the status file in the ${EXPORT_DATA_DIR} target directory in the following path:
    cat ${EXPORT_DATA_DIR}/${EXPORT_INSTANCE_NAME}/2*/legacy-migration/export-status.json
    After the export is complete, the export-status.json file contains a status: succeeded message. An example of the status message:
    {"status":"succeeded","startTime":1700040848,"completionTime":1700041206,"message":"export is successfully completed.","percentageCompleted":100}
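    The manual check above can be wrapped in a simple polling loop. This sketch assumes that python3 is available for JSON parsing and that a terminal failure reports the status value failed; both are assumptions:

```shell
# Poll the export status file until the export reaches a terminal state.
STATUS_FILE=$(ls ${EXPORT_DATA_DIR}/${EXPORT_INSTANCE_NAME}/2*/legacy-migration/export-status.json)
while true; do
  STATUS=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["status"])' "${STATUS_FILE}")
  echo "export status: ${STATUS}"
  if [ "${STATUS}" = "succeeded" ] || [ "${STATUS}" = "failed" ]; then
    break
  fi
  sleep 60
done
```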

A .tar.gz file with the export instance name is created in the folder that you specified with the EXPORT_DATA_DIR parameter. The file name has the format cpd-exports-<export_instance_name>-<timestamp>-data.tar.gz.

If the export was successful, you can transfer the exported data to the Cloud Pak for Data system and import it there.

Transferring the exported data

Copy the exported data to the target Cloud Pak for Data system.
  1. Set the following environment variable:
    TAR_GZ_FILE_PATH=${EXPORT_DATA_DIR}/cpd-exports-${EXPORT_INSTANCE_NAME}-*-data.tar.gz
  2. Copy the data by running the following command:
    scp ${TAR_GZ_FILE_PATH} ${OCP_USERNAME}@${OCP_HOST}:/tmp
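    Because the archive can be large, you might want to verify the copy. The following sketch assumes that sha256sum is available on both systems; compare the two digests manually:

```shell
# Compute the checksum of the local archive...
sha256sum ${TAR_GZ_FILE_PATH}
# ...and of the copy on the target system; the two digests must match.
ssh ${OCP_USERNAME}@${OCP_HOST} "sha256sum /tmp/$(basename ${TAR_GZ_FILE_PATH})"
```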

What to do next

Import the data to your Cloud Pak for Data system.