Importing the data to the Cloud Pak for Data system (IBM Knowledge Catalog)

After your data has been copied to the target system, import it so that you can use it.

Prerequisites

Remove assets from the catalog trash bin

You need to remove all the assets from the catalog trash bin before you can rehome the assets inside the default catalog:
  1. Log in to the OpenShift® Container Platform cluster as an administrator.
  2. Install the jq tool, a command-line JSON processor that the script uses to parse the API responses.
  3. Copy the following script and run it:
    Note: Before you run the script, set the CATALOG_ID variable to the ID of the default catalog on the target system, and set the CP4D_USERNAME and CP4D_APIKEY variables to valid credentials.
    #!/bin/bash
    
    # set -x
    
    # Need to login to the cluster first
    get_Bearer_Token()
    {
        ROUTE=$(oc get ZenService lite-cr -o json |jq -r .status.url)
        BEARER_TOKEN=`curl -i -s -k -X POST "https://${ROUTE}/icp4d-api/v1/authorize" -H 'Accept: application/json' -H 'Content-Type: application/json' -d "{\"username\":\"${CP4D_USERNAME}\",\"api_key\":\"${CP4D_APIKEY}\"}" | grep '"token":' | grep -o '"token":\S*"' | awk -F , '{print$1}' | awk -F : '{print$2}' | tr -d '"'`
    }
    
    get_nextDeletedAssetsNumber()
    {
        NUM_SOFTDELETED_ASSET=`curl -sk -X "GET" \
          "https://${ROUTE}:443/v2/trashed_assets?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
          -H "accept: application/json" \
          -H "Authorization: Bearer ${BEARER_TOKEN}" | jq .total_rows`
        
        echo There are $NUM_SOFTDELETED_ASSET assets in the trashbin
    }
    
    purgeDeletedAssetsIDs()
    {
        TMPFILE=/tmp/trashedAssetIDs.txt
    
        curl -sk -X "GET" \
          "https://${ROUTE}:443/v2/trashed_assets?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
          -H "accept: application/json" \
          -H "Authorization: Bearer ${BEARER_TOKEN}" > ${TMPFILE}
    
        TRASHED_ASSETS=$(cat ${TMPFILE} |jq -r .resources[].metadata.asset_id)
        TRASHED_ASSETSIDS=($TRASHED_ASSETS)
    
        for AID in "${TRASHED_ASSETSIDS[@]}"
        do
          echo delete trashed asset ${AID}
          curl -sk -X "DELETE" \
            "https://${ROUTE}:443/v2/trashed_assets/${AID}?catalog_id=${CATALOG_ID}&hide_deprecated_response_fields=false" \
            -H "accept: application/json" \
            -H "Authorization: Bearer ${BEARER_TOKEN}"
          echo
        done
    }
    
    
    # set Catalog ID, admin user and apikey
    CATALOG_ID=<Set_the_target_Catalog_ID_here>
    CP4D_USERNAME=<Set_the_default_platform_administrator_user_here_cpadmin_or_admin>
    CP4D_APIKEY=<Set_the_CP4D_API_key>
    
    get_Bearer_Token
    
    get_nextDeletedAssetsNumber
    
    while [ "$NUM_SOFTDELETED_ASSET" -ne "0" ]
    do
        purgeDeletedAssetsIDs
        
        get_nextDeletedAssetsNumber
    
    done
    
    echo To confirm trash bin is clean:
    get_nextDeletedAssetsNumber
    echo
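    For example, you might save the script under a hypothetical file name such as clean_trash_bin.sh and run it as follows (after logging in to the cluster with oc):
    # clean_trash_bin.sh is a hypothetical file name for the script above
    oc whoami          # confirm that you are logged in as an administrator
    chmod +x clean_trash_bin.sh
    ./clean_trash_bin.sh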
If some assets remain in the catalog trash bin after you run the script, restart the catalog-api deployment by scaling it down and back up:
  1. Find the current number of replicas by running:
    CATALOG_API_REPLICAS=`oc get deploy catalog-api -o=jsonpath='{.spec.replicas}'`
  2. Scale down catalog-api deployment:
    oc scale deploy catalog-api --replicas=0
  3. Scale the catalog-api deployment back up:
    oc scale deploy catalog-api --replicas=$CATALOG_API_REPLICAS
  4. Rerun the cleanup script until all assets have been removed from the catalog trash bin.
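As a sketch, the restart sequence can also be scripted end to end. The oc wait call that pauses until the old pods terminate, and the app=catalog-api label selector it uses, are assumptions that are not part of the documented steps; verify the label on your cluster before relying on it:
    # Record the current replica count, scale down, wait, then scale back up
    CATALOG_API_REPLICAS=`oc get deploy catalog-api -o=jsonpath='{.spec.replicas}'`
    oc scale deploy catalog-api --replicas=0
    # Assumed label selector; confirm it matches the catalog-api pods in your environment
    oc wait --for=delete pod -l app=catalog-api --timeout=300s
    oc scale deploy catalog-api --replicas=$CATALOG_API_REPLICAS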

Before you import the data, verify that you completed the steps in the section Apply configuration changes for Migration in the support document Migration from IBM InfoSphere Server to IBM Knowledge Catalog: Applying patches and toolkit to a new IBM Knowledge Catalog 4.8.x or 5.0.x installation (Part 2 of 2).

Importing the data

To import the data from your InfoSphere Information Server system, complete these steps:

  1. Log in to Cloud Pak for Data as a user with the Red Hat® OpenShift Container Platform admin role.
  2. Set the following environment variables and change to the ${TOOLKIT_PATH} directory:
    TOOLKIT_PATH=<path to the directory where migration-related files are stored>
    CP4D_HOST=<CP4D host>
    CP4D_USERNAME=<default platform administrator user: cpadmin or admin>
    CP4D_PASSWORD=<CP4D password>
    CP4D_APIKEY=<CP4D API key>
    CRYPTO_KEY=<CRYPTO_KEY used during export (see Exporting the data)>
    CATALOG_ID=<ID of the catalog to which assets will be imported>
    EXPORT_INSTANCE_NAME=<name of the export instance>
    IMPORT_INSTANCE_NAME=<name of the import instance>
    NAMESPACE=<namespace in the target CP4D>
    PROFILE_NAME=<cpd-cli profile name>
    cd ${TOOLKIT_PATH}
    Important: The CRYPTO_KEY value must be the value that you used for the export. Do not create a new one.
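    Before you continue, it can help to confirm that none of the required variables are empty. This is a minimal sanity check, not part of the documented procedure:
    # Warn about any unset or empty variable (uses bash indirect expansion)
    for VAR in TOOLKIT_PATH CP4D_HOST CP4D_USERNAME CP4D_PASSWORD CP4D_APIKEY CRYPTO_KEY CATALOG_ID EXPORT_INSTANCE_NAME IMPORT_INSTANCE_NAME NAMESPACE PROFILE_NAME
    do
      [ -z "${!VAR}" ] && echo "WARNING: ${VAR} is not set"
    done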
  3. Decompress the .tar.gz file that you copied from the InfoSphere Information Server system:
    TAR_FILE_PATH=/tmp/cpd-exports-${EXPORT_INSTANCE_NAME}-*-data.tar
    gunzip ${TAR_FILE_PATH}.gz
  4. Upload the resulting .tar file by using the following command:
    cpd-cli export-import export upload -f ${TAR_FILE_PATH} --profile=${PROFILE_NAME}

    The data is uploaded to the mounted PVC that was used when initializing the export-import utility (step Setting up the export-import utility of Preparing for migration in IBM Cloud Pak for Data) under the export instance name you used in the export command.
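    If you want to confirm that the upload reached the PVC, you can list the export instance directory from the cpd-aux pod, similar to the status check in step 8. The directory path below is based on the one shown in that step and might differ in your environment:
    # List the contents of the export instance directory on the PVC (path is an assumption)
    CPD_AUX_POD=`oc get pods -n ${NAMESPACE} -o custom-columns=POD:.metadata.name | grep cpd-aux`
    oc exec -it ${CPD_AUX_POD} -- bash -l -c "ls -lR /data/cpd/data/exports/wkc/${EXPORT_INSTANCE_NAME}"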

  5. Generate the WKC_TOKEN value:
    WKC_TOKEN=`curl -i -s -k -X POST "https://${CP4D_HOST}/icp4d-api/v1/authorize" -H 'Accept: application/json' -H 'Content-Type: application/json' -d "{\"username\":\"${CP4D_USERNAME}\",\"api_key\":\"${CP4D_APIKEY}\"}" | grep '"token":' | grep -o '"token":\S*"' | awk -F , '{print$1}' | awk -F : '{print$2}' | tr -d '"'`
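    A quick way to verify that authorization succeeded is to check that the token variable is not empty. This check is an illustrative addition, not part of the documented procedure:
    # Fail early if no token was returned
    if [ -z "${WKC_TOKEN}" ]; then
      echo "ERROR: failed to obtain a token; check CP4D_HOST, CP4D_USERNAME, and CP4D_APIKEY"
    fi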
  6. To create an import_params.yaml file with the required information, copy the following code snippet to your command prompt and run it:
    cat <<EOF > ${TOOLKIT_PATH}/import_params.yaml
    legacy-migration-aux:
     WKC_TOKEN: ${WKC_TOKEN}
     CRYPTO_KEY: ${CRYPTO_KEY}
     CATALOG_ID: ${CATALOG_ID}
    EOF
    You can add any of the following optional parameters to the YAML as required:
    DUPLICATE_ACTION: <action to take if the import causes duplicates>                              
    CONTAINER_SUFFIX: <suffix added to the names of projects created by the migration>
    MEMBERS: <JSON array of user groups and users to be added to the catalog>
    JOB_MANAGER_THREADS: <Tunable parameter for import job>
    MAX_IN_PROGRESS_JOB_MESSAGES: <Tunable parameter for import job>
    CONTAINER_PROCESS_THREADS: <Tunable parameter for import job>
    JOB_MANAGER_SUBMIT_EXCHANGE: <Tunable parameter for import job>
    JOB_MANAGER_QUEUE_TTL: <Tunable parameter for import job>
    JOB_MANAGER_PICK_UP_TIMEOUT: <Tunable parameter for import job>
    JOB_MANAGER_STATUS_CHANGE_TIMEOUT: <Tunable parameter for import job>
    UPLOAD_FILE_SOCKET_TIMEOUT: <Tunable parameter for import job>
    FORCE_REHOME: true
    Set the parameters as follows:
    DUPLICATE_ACTION
    This parameter determines what action to take if the import would create duplicate assets.
    You can specify one of the following values:
    • IGNORE: allow duplicates to be created
    • REJECT: reject the imported duplicate and keep the original asset
    • UPDATE: add new values from the imported asset to the original asset
    • REPLACE: overwrite the entire original asset with the imported version
    If you omit the DUPLICATE_ACTION parameter, the target catalog's setting for handling duplicate assets is used.
    CONTAINER_SUFFIX
    This parameter adds a suffix to the names of the projects that the migration creates.
    If you omit the CONTAINER_SUFFIX parameter, the default value -migration is used.
    If you do not want any suffix added to the migrated project names, specify no-suffix as the value for this parameter.
    MEMBERS
    This parameter is a JSON array string containing a list of user groups and users to be added to the target catalog.
    The format of the JSON array string for user groups:
    '[  { "access_group_id": "<group_id>", "roles": [ "<role>"] }, ... ]'
    The format of the JSON array string for users:
    '[  { "user_iam_id": "<user_id>", "roles": [ "<role>" ] }  ]'
    Where <role> can be one of the following values: OWNER, EDITOR, VIEWER.
    The following is an example JSON array string for a user group:
    MEMBERS: '[ { "access_group_id": "10001", "roles": [ "OWNER"] }, { "access_group_id": "10003", "roles": [ "EDITOR" ] }, { "access_group_id": "10002", "roles": [ "VIEWER" ] } ]'
    Tunable parameters for import
    The following parameters are tunable:
    • JOB_MANAGER_THREADS
    • MAX_IN_PROGRESS_JOB_MESSAGES
    • CONTAINER_PROCESS_THREADS
    • JOB_MANAGER_SUBMIT_EXCHANGE
    • JOB_MANAGER_QUEUE_TTL
    • JOB_MANAGER_PICK_UP_TIMEOUT
    • JOB_MANAGER_STATUS_CHANGE_TIMEOUT
    • UPLOAD_FILE_SOCKET_TIMEOUT
    FORCE_REHOME
    Set this parameter to true to force rehoming of assets in the default catalog.

    If you set up OMAG cohorts as described in Configuring synchronization with external repositories, you must rehome assets inside the default catalog before importing the data that you exported from InfoSphere Information Server.
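    For illustration, the following heredoc writes an import_params.yaml that combines the required values with a few of the optional parameters. The values shown are examples only, and it assumes that the optional keys belong in the same legacy-migration-aux section as the required ones:
    cat <<EOF > ${TOOLKIT_PATH}/import_params.yaml
    legacy-migration-aux:
     WKC_TOKEN: ${WKC_TOKEN}
     CRYPTO_KEY: ${CRYPTO_KEY}
     CATALOG_ID: ${CATALOG_ID}
     DUPLICATE_ACTION: UPDATE
     CONTAINER_SUFFIX: no-suffix
     MEMBERS: '[ { "access_group_id": "10001", "roles": [ "OWNER" ] } ]'
    EOF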

  7. Start the import of the InfoSphere Information Server data by running the following command:
    cpd-cli export-import import create -n ${NAMESPACE} --from-export ${EXPORT_INSTANCE_NAME} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME} -f ${TOOLKIT_PATH}/import_params.yaml --backoff-limit=0

    Optionally, you can capture additional debug messages for diagnosing failures by adding the --log-level=debug option to the command.

  8. You can periodically check the progress of the import in one of these ways:
    • By checking the import status using a cpd-cli command:
      cpd-cli export-import import status -n ${NAMESPACE} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME}
    • By looking at the status file in the PVC that was used when initializing the export-import utility (using the cpd-cli export-import init command). You can access the import status file by using the following commands:
      CPD_AUX_POD=`oc get pods -n ${NAMESPACE} -o custom-columns=POD:.metadata.name | grep cpd-aux`
      oc exec -it ${CPD_AUX_POD} -- bash -l -c "cat /data/cpd/data/exports/wkc/${EXPORT_INSTANCE_NAME}/20*/legacy-migration/import-status.json"
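    If you prefer to watch the progress from a shell, a simple polling loop that reuses the status command from the first option might look like the following sketch (the 60-second interval is arbitrary):
      # Poll the import status until you interrupt the loop with Ctrl+C
      while true
      do
        cpd-cli export-import import status -n ${NAMESPACE} --profile=${PROFILE_NAME} ${IMPORT_INSTANCE_NAME}
        sleep 60
      done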