Exporting data from the InfoSphere Information Server system
Exporting the data
To export your data, complete these steps:
- Log in to the InfoSphere Information Server node as root.
  Note: If InfoSphere Information Server is on Windows, log in to the standalone Red Hat® Enterprise Linux® machine as root.
  Then, open a bash shell and change to the wkc user:
  su wkc
- Set environment variables for the following required parameters:
  TOOLKIT_PATH=<path to the directory where the toolkit was extracted (see Preparing for migration in InfoSphere Information Server)>
  OCP_HOST=<ocp host>
  OCP_USERNAME=<ocp username>
  CP4D_HOST=<cp4d host>
  CP4D_USERNAME=<default platform administrator user: cpadmin or admin>
  CP4D_PASSWORD=<cp4d password>
  NAMESPACE=<namespace in the target CP4D>
  IIS_INSTALL_PATH=<IIS installation path>
  IIS_HOST=<IIS host>
  IIS_PORT=<IIS port>
  IIS_USERNAME=<IIS username>
  IIS_PASSWORD=<IIS password>
  EXPORT_INSTANCE_NAME=<name of the export instance>
  EXPORT_DATA_DIR=<path to the export data directory; the wkc user must have write permission to this directory>
  DQ_RULES_CONNECTION_NAME=<name of the connection for the database that stores data quality rules>

  IIS_INSTALL_PATH - The InfoSphere Information Server installation location. If InfoSphere Information Server is installed in the default location, set the IIS_INSTALL_PATH variable to /opt/IBM/InformationServer. For InfoSphere Information Server on Windows, the IIS_INSTALL_PATH variable must always be set to /opt/IBM/InformationServer.

  EXPORT_INSTANCE_NAME - The export instance name can contain only lowercase alphanumeric characters and hyphens (-) and must start and end with an alphanumeric character. Otherwise, the export fails.

  DQ_RULES_CONNECTION_NAME - The name of the connection for the database that stores data quality rules. For more information, see Preparing for migration in IBM Cloud Pak for Data. This setting is required only if you have data quality rules to migrate.
If your system contains rules, you must define this parameter for each migrated rule to be able to view rule output records after running the rule in IBM® Knowledge Catalog. Without a connection, you will have access only to statistics, not to the actual rule output records. You can also define the output settings for rules after the migration. However, by setting this parameter during the migration process, you can avoid editing each rule after migration.
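In a bash shell, these required variables are typically set with export statements. A minimal sketch follows; every value shown is a placeholder for your environment, not a real host, user, or path:

```shell
# Placeholder values -- replace each one with the details of your environment.
export TOOLKIT_PATH=/opt/migration-toolkit
export OCP_HOST=ocp.example.com
export OCP_USERNAME=ocpadmin
export CP4D_HOST=cpd.example.com
export CP4D_USERNAME=cpadmin
export CP4D_PASSWORD='change-me'
export NAMESPACE=cpd-instance
export IIS_INSTALL_PATH=/opt/IBM/InformationServer
export IIS_HOST=iis.example.com
export IIS_PORT=9446
export IIS_USERNAME=isadmin
export IIS_PASSWORD='change-me'
export EXPORT_INSTANCE_NAME=iis-export-1   # lowercase alphanumerics and hyphens only
export EXPORT_DATA_DIR=/data/iis-export    # the wkc user needs write permission here
export DQ_RULES_CONNECTION_NAME='DQ rules connection'
```

Remember that EXPORT_INSTANCE_NAME must follow the naming rule described above, or the export fails.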
In addition, set environment variables for any of the following optional parameters as needed:
  DQ_RULES_SCHEMA_NAME=<data quality rules schema name>
  DQ_RULES_TABLE_NAME=<data quality rules table name>
  DQ_PERF_CONFIG='<DQ perf config json string>'
  IMAM_PERF_CONFIG='<IMAM perf config json string>'
  REPLACEMENT_CHARACTER=<replacement character>
  DQ_RULES_SCHEMA_NAME - The name of the database schema where the data quality rules are stored.

  DQ_RULES_TABLE_NAME - The name of the table where the data quality rules are stored.
  DQ_PERF_CONFIG - This parameter provides performance tuning information for the export of data quality projects. The value of the DQ_PERF_CONFIG parameter can include any of these properties:
  - catalog_ds_bulk_size - The number of data assets for which to retrieve data classes in a single HTTP request during export.
  - concurrency - The number of threads to be used during the import job.
  - ds_bulk_size - The number of data assets retrieved in a single HTTP request during export.
  - ds_properties_bulk_size - The number of data assets for which to retrieve properties in a single HTTP request during export.
  - rule_history_count - The number of rule history entries to be exported.
  - socket_time_out_in_seconds - The HTTP socket timeout value, in seconds, used during export.
  - thread_count - The number of threads to be used during the export job.
  - migrate_sql_vt_as_sql_asset - By default, SQL virtual tables are migrated as SQL-based data assets. To migrate SQL virtual tables as SQL-based data quality rules, set this property to false. Default value: true. This option is available starting with the migration toolkit released in August 2024.
  - migrate_rules_bound_to_classical_vt_assets - By default, data rules with bindings to columns in virtual tables are not migrated. To migrate such rules to definition-based data quality rules, set this property to true. For migrated rules, you must reconfigure the bindings by using the IBM Knowledge Catalog API (Update data quality rule) before you can run the rules. Default value: false. This option is available starting with the migration toolkit released in February 2025.
  Example DQ_PERF_CONFIG settings:
  DQ_PERF_CONFIG='{"thread_count": 5, "rule_history_count": 10, "concurrency": 25, "socket_time_out_in_seconds": 1200, "ds_properties_bulk_size": 10}'

  IMAM_PERF_CONFIG - This parameter provides performance tuning information for the export of metadata repository content. The value of the IMAM_PERF_CONFIG parameter can include any of these properties:
  - dataasset_batch_size - The number of data assets to be processed at once during the export.
  - filefactory_cache_size_in_num_files - The number of files to be cached before the files are written to disk.
  - zipwriter_cache_size_in_bytes - The cache size, in bytes, that is used when .zip files are created.
  - json_file_creation_thread_count - The number of threads to be used for creating JSON files for the export.
  - profiling_results_thread_count - The number of threads to be used for exporting profiling results.

  Example IMAM_PERF_CONFIG settings:
  IMAM_PERF_CONFIG='{"dataasset_batch_size": 100, "filefactory_cache_size_in_num_files": 1000, "zipwriter_cache_size_in_bytes": 10485760, "json_file_creation_thread_count": 10, "profiling_results_thread_count": 5}'

  REPLACEMENT_CHARACTER - This parameter specifies the character that replaces any nonprintable characters in asset names, descriptions, and tags. You can specify any printable character, including special characters, as the replacement character. The default replacement character is a space.
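As a sketch, the optional parameters can be exported in the same way. All values below are illustrative placeholders; in particular, the schema and table names are assumptions for this example:

```shell
# Optional parameters -- illustrative placeholders only; set just the ones you need.
export DQ_RULES_SCHEMA_NAME=DQSCHEMA        # example schema name, not a default
export DQ_RULES_TABLE_NAME=DQ_RULE_OUTPUT   # example table name, not a default
export DQ_PERF_CONFIG='{"thread_count": 5, "rule_history_count": 10}'
export IMAM_PERF_CONFIG='{"dataasset_batch_size": 100}'
export REPLACEMENT_CHARACTER='_'
```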
- Set the path to the toolkit directory.
  export PATH=${TOOLKIT_PATH}/jdk-17.0.9+9/bin:${TOOLKIT_PATH}:$PATH
- Change to the directory where the toolkit content is stored.
  cd ${TOOLKIT_PATH}
- Certain Cloud Pak for Data information is required as input to the export command. If the InfoSphere Information Server engine tier and the Cloud Pak for Data system can communicate over port 443, you can retrieve these parameter values by running the following command:
  ${TOOLKIT_PATH}/migration/iis_scripts/get_export_iis_params.sh -url https://${CP4D_HOST} -u ${CP4D_USERNAME} -p ${CP4D_PASSWORD} -rcn ${DQ_RULES_CONNECTION_NAME}
  This script creates a .zip file in the /tmp directory on the InfoSphere Information Server system with the following content:
  - user_mappings.json
This file maps InfoSphere Information Server users to Cloud Pak for Data users.
- group_mappings.json
This file maps InfoSphere Information Server user groups to Cloud Pak for Data user groups.
- dq_rules_connection.json
This file provides connection information for the database to which data quality rules will be imported.
- wkc_glossary_repository_id.txt
This file provides the repository IDs of the Cloud Pak for Data governance artifacts repository.
  If the InfoSphere Information Server engine tier and the Cloud Pak for Data system cannot communicate over port 443, you must manually create and transfer that information. On the Cloud Pak for Data system, complete the following steps as the admin user:
  - Copy the ${TOOLKIT_PATH}/migration/iis_scripts/get_export_iis_params.sh file to the Cloud Pak for Data system.
- Change to the directory to which you copied the file.
  - Run the following command to create a .zip file with the required information in the /tmp directory:
    ./get_export_iis_params.sh -url https://${CP4D_HOST} -u ${CP4D_USERNAME} -p ${CP4D_PASSWORD} -rcn ${DQ_RULES_CONNECTION_NAME}
  - Transfer the generated .zip file from the Cloud Pak for Data system to the InfoSphere Information Server system.
- Set the following environment variable to the path of the .zip file that you generated in the previous step:
  CP4D_DETAILS_FILE_PATH=<path-to-generated-cp4d-details-file>
- Generate the CRYPTO_KEY value.
  On Linux:
  CRYPTO_KEY=`uuidgen`
  On AIX:
  CRYPTO_KEY=`uuid_get`
  Important: Save the value of the crypto_key parameter because the same key must be used during the import.
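Because the same key is required at import time, it can help to write the generated key to a file immediately. This is a sketch for Linux; the file name and location are examples, not part of the toolkit:

```shell
# Generate the crypto key (Linux) and keep a restricted-permission copy
# for the import step. The file name and path are examples only.
CRYPTO_KEY=$(uuidgen)
umask 077
printf '%s\n' "$CRYPTO_KEY" > "$HOME/iis-export-crypto-key.txt"
```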
- Run the following script to trigger the export. Before running the script, remove any optional parameters that are not needed.
  ${TOOLKIT_PATH}/migration/iis_scripts/trigger_export_iis.sh -host ${IIS_HOST} -port ${IIS_PORT} -user ${IIS_USERNAME} -password ${IIS_PASSWORD} -export_instance_name ${EXPORT_INSTANCE_NAME} -dir ${EXPORT_DATA_DIR} -crypto_key ${CRYPTO_KEY} -namespace ${NAMESPACE} -params_from_cp4d_zip ${CP4D_DETAILS_FILE_PATH} -inspect false -dq_rules_schema_name ${DQ_RULES_SCHEMA_NAME} -dq_rules_table_name ${DQ_RULES_TABLE_NAME} -dq_perf_config "${DQ_PERF_CONFIG}" -iis_install_path ${IIS_INSTALL_PATH} -replacement_character ${REPLACEMENT_CHARACTER}
  You can optionally create tracing information by running the command with the -log-level parameter set to info or debug. This information is also written to the ${EXPORT_DATA_DIR}/${EXPORT_INSTANCE_NAME}/2*/legacy-migration/export.log file.
- You can periodically check the progress of the export by looking at the status file in the ${EXPORT_DATA_DIR} target directory:
  cat ${EXPORT_DATA_DIR}/${EXPORT_INSTANCE_NAME}/2*/legacy-migration/export-status.json
  After the export is complete, the export-status.json file contains a status: succeeded message. An example of the status message:
  {"status":"succeeded","startTime":1700040848,"completionTime":1700041206,"message":"export is successfully completed.","percentageCompleted":100}
A .tar.gz file with the export instance name is created in the folder that
you specified with the EXPORT_DATA_DIR parameter. The file name has the format
cpd-exports-<export_instance_name>-<timestamp>-data.tar.gz.
If the export was successful, you can transfer the exported data to the Cloud Pak for Data system and import it there.
Transferring the exported data
- Set the following environment variable:
  TAR_GZ_FILE_PATH=${EXPORT_DATA_DIR}/cpd-exports-${EXPORT_INSTANCE_NAME}-*-data.tar.gz
- Copy the data by running the following command:
  scp ${TAR_GZ_FILE_PATH} ${OCP_USERNAME}@${OCP_HOST}:/tmp
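To confirm that the archive arrived intact, you can compare checksums on both systems. A minimal sketch, assuming sha256sum is available on both hosts:

```shell
# Print the SHA-256 digest of a file; run this on both systems and
# compare the output to verify the transfer.
file_digest() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Example: file_digest ${TAR_GZ_FILE_PATH}
```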
What to do next
Import the data to your Cloud Pak for Data system.