Importing the data to the Cloud Pak for Data system (IBM Knowledge Catalog)
Prerequisites
Remove assets from the catalog trash bin
- Log in to the OpenShift® Container Platform cluster as an administrator.
- Install the jq tool, which is a command-line JSON processor that can process JSON output more efficiently.
- Copy the following script and run it. Note: Before running the script, you must set the CATALOG_ID variable to the ID of the target default catalog.

```
#!/bin/bash
# set -x
# Need to log in to the cluster first

get_Bearer_Token() {
  ROUTE=$(oc get ZenService lite-cr -o json | jq -r .status.url)
  BEARER_TOKEN=`curl -i -s -k -X POST "https://${ROUTE}/icp4d-api/v1/authorize" \
    -H 'Accept: application/json' -H 'Content-Type: application/json' \
    -d "{\"username\":\"${CP4D_USERNAME}\",\"api_key\":\"${CP4D_APIKEY}\"}" \
    | grep '"token":' | grep -o '"token":\S*"' \
    | awk -F , '{print$1}' | awk -F : '{print$2}' | tr -d '"'`
}

get_nextDeletedAssetsNumber() {
  NUM_SOFTDELETED_ASSET=`curl -sk -X "GET" \
    "https://${ROUTE}:443/v2/trashed_assets?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
    -H "accept: application/json" \
    -H "Authorization: Bearer ${BEARER_TOKEN}" | jq .total_rows`
  echo There are $NUM_SOFTDELETED_ASSET assets in the trash bin
}

purgeDeletedAssetsIDs() {
  TMPFILE=/tmp/trashedAssetIDs.txt
  curl -sk -X "GET" \
    "https://${ROUTE}:443/v2/trashed_assets?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
    -H "accept: application/json" \
    -H "Authorization: Bearer ${BEARER_TOKEN}" > ${TMPFILE}
  TRASHED_ASSETS=$(cat ${TMPFILE} | jq -r .resources[].metadata.asset_id)
  TRASHED_ASSETSIDS=($TRASHED_ASSETS)
  for AID in "${TRASHED_ASSETSIDS[@]}"
  do
    echo delete trashed asset ${AID}
    curl -sk -X "DELETE" \
      "https://${ROUTE}:443/v2/trashed_assets/${AID}?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
      -H "accept: application/json" \
      -H "Authorization: Bearer ${BEARER_TOKEN}"
    echo
  done
}

# Set the catalog ID, admin user, and API key
CATALOG_ID=<Set_the_target_Catalog_ID_here>
CP4D_USERNAME=<Set_the_default_platform_administrator_user_here_cpadmin_or_admin>
CP4D_APIKEY=<Set_the_CP4D_API_key>

get_Bearer_Token
get_nextDeletedAssetsNumber
while [ "$NUM_SOFTDELETED_ASSET" -ne "0" ]
do
  purgeDeletedAssetsIDs
  get_nextDeletedAssetsNumber
done
echo To confirm trash bin is clean:
get_nextDeletedAssetsNumber
echo
```
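If you do not know the ID of the default catalog, one way to look it up is through the catalogs API, reusing the same route and bearer token as in the cleanup script. This is only a sketch: the /v2/catalogs endpoint and the response fields (metadata.guid, entity.name) are assumptions based on the Watson Data API, so verify them against your installation.

```
# Sketch (assumed endpoint and fields): list catalog IDs and names,
# then pick the ID of the default catalog for CATALOG_ID.
# Assumes ROUTE and BEARER_TOKEN are set as in the cleanup script.
curl -sk -X "GET" "https://${ROUTE}/v2/catalogs" \
  -H "accept: application/json" \
  -H "Authorization: Bearer ${BEARER_TOKEN}" \
  | jq -r '.catalogs[] | "\(.metadata.guid)  \(.entity.name)"'
```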
- Restart the catalog-api deployment using the following steps (see the sketch after this list for a possible one-step alternative):
  - Find the current number of replicas by running:

```
CATALOG_API_REPLICAS=`oc get deploy catalog-api -o=jsonpath='{.spec.replicas}'`
```

  - Scale down the catalog-api deployment:

```
oc scale deploy catalog-api --replicas=0
```

  - Scale the catalog-api deployment back up:

```
oc scale deploy catalog-api --replicas=$CATALOG_API_REPLICAS
```

- Rerun the cleanup assets script until all assets in the catalog trash bin have been removed.
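If your oc client supports rollout restarts, a rolling restart is a one-step alternative to scaling the deployment down and back up; this is a suggestion rather than part of the documented procedure:

```
# Alternative: trigger a rolling restart of catalog-api without
# changing the replica count, then wait for it to complete.
oc rollout restart deploy/catalog-api
oc rollout status deploy/catalog-api
```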
Before you import the data, verify that you completed the steps in the section Apply configuration changes for Migration in the support document Migration from IBM InfoSphere Server to IBM Knowledge Catalog: Applying patches and toolkit to a new IBM Knowledge Catalog 4.8.x or 5.0.x installation (Part 2 of 2).
Importing the data
To import the data from your InfoSphere Information Server system, complete these steps:
- Log in to Cloud Pak for Data as a user with the Red Hat® OpenShift® Container Platform admin role.
- Set the following environment variables and change to the ${TOOLKIT_PATH} directory:

```
TOOLKIT_PATH=<path to the directory where migration-related files are stored>
CP4D_HOST=<CP4D host>
CP4D_USERNAME=<default platform administrator user: cpadmin or admin>
CP4D_PASSWORD=<CP4D password>
CP4D_APIKEY=<CP4D API key>
CRYPTO_KEY=<CRYPTO_KEY used during export (see Exporting the data)>
CATALOG_ID=<ID of the catalog to which assets will be imported>
EXPORT_INSTANCE_NAME=<name of the export instance>
IMPORT_INSTANCE_NAME=<name of the import instance>
NAMESPACE=<namespace in the target CP4D>
PROFILE_NAME=<cpd-cli profile name>
cd ${TOOLKIT_PATH}
```

Important: The CRYPTO_KEY value must be the value that you used for the export. Do not create a new one.
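Because a missing or mistyped variable surfaces only as a confusing failure later in the procedure, you might add a quick sanity check at this point. A minimal sketch, not part of the migration toolkit:

```
# Minimal sketch: warn about any required variable that is unset or empty.
for VAR in TOOLKIT_PATH CP4D_HOST CP4D_USERNAME CP4D_PASSWORD CP4D_APIKEY \
           CRYPTO_KEY CATALOG_ID EXPORT_INSTANCE_NAME IMPORT_INSTANCE_NAME \
           NAMESPACE PROFILE_NAME; do
  [ -z "${!VAR}" ] && echo "WARNING: ${VAR} is not set"
done
```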
- Decompress the .tar.gz file that you copied from the InfoSphere Information Server system:

```
TAR_FILE_PATH=/tmp/cpd-exports-${EXPORT_INSTANCE_NAME}-*-data.tar
gunzip ${TAR_FILE_PATH}.gz
```

- Upload the resulting .tar file by using the following command:
```
cpd-cli export-import export upload -f ${TAR_FILE_PATH} --profile=${PROFILE_NAME}
```

The data is uploaded to the mounted PVC that was used when initializing the export-import utility (step Setting up the export-import utility of Preparing for migration in IBM Cloud Pak for Data) under the export instance name that you used in the export command.
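To confirm that the upload landed where the import expects it, you can list the export instance directory on the PVC through the cpd-aux pod. This mirrors the status-file commands shown later in this procedure; the path layout is taken from that step and assumed to match your installation:

```
# Sketch: list the uploaded export on the PVC via the cpd-aux pod.
CPD_AUX_POD=`oc get pods -n ${NAMESPACE} -o custom-columns=POD:.metadata.name | grep cpd-aux`
oc exec -it ${CPD_AUX_POD} -- bash -l -c "ls -l /data/cpd/data/exports/wkc/${EXPORT_INSTANCE_NAME}"
```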
- Generate the WKC_TOKEN value:

```
WKC_TOKEN=`curl -i -s -k -X POST "https://${CP4D_HOST}/icp4d-api/v1/authorize" \
  -H 'Accept: application/json' -H 'Content-Type: application/json' \
  -d "{\"username\":\"${CP4D_USERNAME}\",\"api_key\":\"${CP4D_APIKEY}\"}" \
  | grep '"token":' | grep -o '"token":\S*"' \
  | awk -F , '{print$1}' | awk -F : '{print$2}' | tr -d '"'`
```
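Because jq is already required for the trash-bin cleanup, you can extract the token with jq instead of the grep/awk chain. A sketch, under the assumption (implied by the grep pattern above) that the authorize endpoint returns a top-level token field:

```
# Equivalent token extraction with jq; -i is dropped so the response
# body is pure JSON (the headers would otherwise break jq parsing).
WKC_TOKEN=`curl -s -k -X POST "https://${CP4D_HOST}/icp4d-api/v1/authorize" \
  -H 'Accept: application/json' -H 'Content-Type: application/json' \
  -d "{\"username\":\"${CP4D_USERNAME}\",\"api_key\":\"${CP4D_APIKEY}\"}" | jq -r .token`
```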
- To create an import_params.yaml file with the required information, copy the following code snippet to your command prompt and run it:

```
cat <<EOF > ${TOOLKIT_PATH}/import_params.yaml
legacy-migration-aux:
  WKC_TOKEN: ${WKC_TOKEN}
  CRYPTO_KEY: ${CRYPTO_KEY}
  CATALOG_ID: ${CATALOG_ID}
EOF
```

You can add any of the following optional parameters to the YAML as required (see the sketch after this list for one way to append them):

```
DUPLICATE_ACTION: <action to take if the import causes duplicates>
CONTAINER_SUFFIX: <suffix that the migration adds to the project name>
MEMBERS: <JSON list of the users and user groups to add to the catalog>
JOB_MANAGER_THREADS: <tunable parameter for the import job>
MAX_IN_PROGRESS_JOB_MESSAGES: <tunable parameter for the import job>
CONTAINER_PROCESS_THREADS: <tunable parameter for the import job>
JOB_MANAGER_SUBMIT_EXCHANGE: <tunable parameter for the import job>
JOB_MANAGER_QUEUE_TTL: <tunable parameter for the import job>
JOB_MANAGER_PICK_UP_TIMEOUT: <tunable parameter for the import job>
JOB_MANAGER_STATUS_CHANGE_TIMEOUT: <tunable parameter for the import job>
UPLOAD_FILE_SOCKET_TIMEOUT: <tunable parameter for the import job>
FORCE_REHOME: true
```

Set the parameters as follows:

- DUPLICATE_ACTION - This parameter decides what action to take if the import causes duplicates.
- CONTAINER_SUFFIX - This parameter adds a suffix to the name of the project that the migration creates.
- MEMBERS - This parameter is a JSON array string containing a list of user groups and users to be added to the target catalog.
- Tunable parameters for the import job: JOB_MANAGER_THREADS, MAX_IN_PROGRESS_JOB_MESSAGES, CONTAINER_PROCESS_THREADS, JOB_MANAGER_SUBMIT_EXCHANGE, JOB_MANAGER_QUEUE_TTL, JOB_MANAGER_PICK_UP_TIMEOUT, JOB_MANAGER_STATUS_CHANGE_TIMEOUT, UPLOAD_FILE_SOCKET_TIMEOUT
- FORCE_REHOME: true - This parameter can be used to force rehoming of assets in the default catalog. If you set up OMAG cohorts as described in Configuring synchronization with external repositories, you must rehome assets inside the default catalog before importing the data that you exported from InfoSphere Information Server.
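For example, to add one of the optional parameters after the file has been generated, you can append it under the existing key. A sketch that assumes the optional keys nest under legacy-migration-aux at the same level as the required ones:

```
# Sketch: append an optional parameter to the generated file.
# The two-space indent keeps the key under legacy-migration-aux,
# matching the heredoc above (assumed nesting).
cat <<EOF >> ${TOOLKIT_PATH}/import_params.yaml
  FORCE_REHOME: true
EOF
```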
- Start the import of the InfoSphere Information Server data by running the following command:

```
cpd-cli export-import import create -n ${NAMESPACE} --from-export ${EXPORT_INSTANCE_NAME} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME} -f ${TOOLKIT_PATH}/import_params.yaml --backoff-limit=0
```

Optionally, you can capture additional debug messages for diagnosing failures by adding the --log-level=debug option to the command.

- You can periodically check the progress of the import in one of these ways:
  - By checking the import status by using a cpd-cli command:

```
cpd-cli export-import import status -n ${NAMESPACE} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME}
```

  - By looking at the status file in the PVC that was used when initializing the export-import utility (by using the cpd-cli export-import init command). You can access the import status file by using the following commands:

```
CPD_AUX_POD=`oc get pods -n ${NAMESPACE} -o custom-columns=POD:.metadata.name | grep cpd-aux`
oc exec -it ${CPD_AUX_POD} -- bash -l -c "cat /data/cpd/data/exports/wkc/${EXPORT_INSTANCE_NAME}/20*/legacy-migration/import-status.json"
```
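For long-running imports, you can poll the documented status command on an interval instead of rerunning it by hand; a minimal sketch:

```
# Minimal polling sketch: print the import status every 60 seconds
# until you interrupt it with Ctrl+C.
while true; do
  cpd-cli export-import import status -n ${NAMESPACE} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME}
  sleep 60
done
```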