FKEY migration script

The FKEY migration script prevents rare collisions in the FKEY file identifier that can occur when billions of records are ingested.

About this task

The fix is a script that you run in the Db2 terminal of Data Cataloging.
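The script in the procedure rebuilds each FKEY by joining the cluster, data source, and inode values of a file with underscores. The following is a minimal sketch of that composition in shell. The helper name `build_fkey` and the sample values are hypothetical and serve only as an illustration; they are not part of the product.

```shell
# Hypothetical helper, not part of the migration script. It mirrors the SQL
# expression that the script uses to rebuild each FKEY:
#   mo.cluster || '_' || mo.datasource || '_' || mo.inode
build_fkey() {
  # $1 = cluster, $2 = data source, $3 = inode
  printf '%s_%s_%s\n' "$1" "$2" "$3"
}

# Illustrative values only.
build_fkey "cluster1" "datasource1" "12345"
# prints cluster1_datasource1_12345
```

Because the data source name is part of the key, two files with the same inode in different data sources receive different identifiers.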

Procedure

  1. Run the following commands to open a remote shell on the c-isd-db2u-0 pod and switch to the db2inst1 user.
          oc -n ibm-data-cataloging rsh c-isd-db2u-0
          su - db2inst1
  2. Create a file named FKEY_Updater.sh with the FKEY migration script.
    The script iterates over the list of data sources that you pass as arguments and, for each one, fixes the FKEY of all the files that were ingested by previous versions of the Data Cataloging service.
          #!/bin/bash
          # ========================================================
          # How to invoke
          # Pass the data sources to fix as arguments:
          # ./FKEY_Updater.sh datasource1 datasource2 datasource3
          # ========================================================
    
          start_date=$(date +%s.%N)
          datasourcearray=( "$@" )
    
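          echo "Connecting to database..."
          # Stop if the connection fails so that no update runs without a database connection.
          if ! db2 connect to bludb; then
            echo "Could not connect to BLUDB." >&2
            exit 1
          fi
          echo "Connected to BLUDB."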
          printf '\n'
          echo "List of datasources to update:"
          printf ' - %s\n' "${datasourcearray[@]}"
          printf '\n'
    
          for datasource in "${datasourcearray[@]}"
          do
            echo "Starting update procedures for datasource '${datasource}'..."
            printf '\n'
            echo "Updating table ACESMAPLOADBASE..."
            db2 "update bluadmin.acesmaploadbase amlb set amlb.fkey=(mo.cluster || '_' || mo.datasource || '_' || mo.inode) from bluadmin.metaocean mo where amlb.fkey=mo.fkey and mo.datasource='${datasource}';"
            echo "ACESMAPLOADBASE table updated successfully."
            printf '\n'
            echo "Updating table ACOGMAPLOADBASE..."
            db2 "update bluadmin.acogmaploadbase acmlb set acmlb.fkey=(mo.cluster || '_' || mo.datasource || '_' || mo.inode) from bluadmin.metaocean mo where acmlb.fkey=mo.fkey and mo.datasource='${datasource}';"
            echo "ACOGMAPLOADBASE table updated successfully."
            printf '\n'
            echo "Updating table ACESMAP (This action could take several minutes)..."
            db2 "update bluadmin.acesmap am set am.fkey=(mo.cluster || '_' || mo.datasource || '_' || mo.inode) from bluadmin.metaocean mo where am.fkey=mo.fkey and mo.datasource='${datasource}';"
            echo "ACESMAP table updated successfully."
            printf '\n'
            echo "Updating table ACOGMAP (This action could take several minutes)..."
            db2 "update bluadmin.acogmap acm set acm.fkey=(mo.cluster || '_' || mo.datasource || '_' || mo.inode) from bluadmin.metaocean mo where acm.fkey=mo.fkey and mo.datasource='${datasource}';"
            echo "ACOGMAP table updated successfully."
            printf '\n'
            echo "Updating table METAOCEAN (This action could take several minutes)..."
            db2 "update bluadmin.metaocean mo set mo.fkey=(mo.cluster || '_' || mo.datasource || '_' || mo.inode) where not REGEXP_LIKE(mo.fkey, mo.cluster || '_' || mo.datasource || '_' || mo.inode) and mo.datasource='${datasource}';"
            echo "METAOCEAN table updated successfully."
            printf '\n'
            echo "Updates on datasource '${datasource}' done successfully."
            printf '\n'
          done
          printf '\n'
          end_date=$(date +%s.%N)
          runtime=$(echo "$end_date - $start_date" | bc -l)
    
          echo "Execution time was $runtime seconds."
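  3. Make the script executable, and then run it with the names of the data sources to fix as arguments.
          chmod +x FKEY_Updater.sh
          ./FKEY_Updater.sh datasource1 datasource2 datasource3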

    The execution time of the script depends on the number of files that are ingested in the database. The script prints the elapsed time in seconds when it finishes.
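The script places each data source name directly inside a single-quoted SQL literal, so a name that contains a quote character would break the statement. One possible guard is sketched below. It assumes that data source names contain only letters, digits, dots, underscores, and hyphens; adjust the pattern if your names differ. The function names `is_valid_datasource` and `filter_datasources` are hypothetical and are not part of the original script.

```shell
# Hypothetical pre-check, not part of the original script.
# Assumption: valid names contain only letters, digits, '.', '_' and '-'.
is_valid_datasource() {
  case "$1" in
    ''|*[!A-Za-z0-9._-]*) return 1 ;;  # empty, or contains a character outside the set
    *) return 0 ;;
  esac
}

# Print the names that pass the check, one per line, and report the
# rejected names on standard error.
filter_datasources() {
  for name in "$@"; do
    if is_valid_datasource "$name"; then
      printf '%s\n' "$name"
    else
      echo "Skipping invalid datasource name: '$name'" >&2
    fi
  done
}

# Illustrative values only: the second name is rejected.
filter_datasources "datasource1" "bad'name" "datasource-2"
```

A check like this could run on the script arguments before the loop starts, so that only accepted names reach the update statements.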