Prerequisites to migrating data (Watson Knowledge Catalog)
Before you migrate data from Information Server to Cloud Pak for Data, you must complete several prerequisite steps.
- Optional: Stop synchronization of information assets to the default catalog
- Optional: Disable automatic profiling of data assets
- Make sure the default catalog in Cloud Pak for Data does not contain user data
- Delete predefined data classes
- Install CLI for Red Hat OpenShift
- Configure Redis settings
- Increase available resources for services in Cloud Pak for Data
- Increase the size of the Db2 secondary log
- Configure IOPS settings for the NFS server
- Configure the timeout values for importing data
- Create users in the target Cloud Pak for Data system
- Install native connectors
Optional: Stop synchronization of information assets to the default catalog
Stop the synchronization of information assets only when you are importing large volumes of data. In the synchronization process, information assets are synchronized within the Watson™ Knowledge Catalog repository services (Xmeta and CAMS). If you migrate a large amount of data, the synchronization process might take a significant amount of time and slow down the overall migration process. You can optionally stop the synchronization by deleting the default catalog or the catalog that you configured for sharing assets. After the migration is finished, you can resume the synchronization by recreating the catalog.
- In Cloud Pak for Data, go to .
- Open the Catalog Setup tab and check which catalog is configured for sharing assets with Information Governance Catalog. It is usually Default Catalog.
- Go to and find this catalog.
- From the menu, select Delete.
Optional: Disable automatic profiling of data assets
When a data asset is added to a catalog, it is automatically profiled to get additional metadata. During data migration, the volume of data added to the catalog is large. You can temporarily disable automatic profiling to speed up the migration process and later enable it again.
- In Cloud Pak for Data, go to .
- Open the Catalog Setup tab and check which catalog is configured for sharing assets with Information Governance Catalog. It is usually Default Catalog.
- On the Overview tab, find this catalog and open it.
- Go to the Settings tab and clear the option Automatically create profiles for data assets. If the option is already cleared, select it and then clear it again to make sure that it is disabled.
Make sure the default catalog in Cloud Pak for Data does not contain user data
To prevent the creation of duplicates, the target default catalog where the data will be migrated to cannot contain any user-defined data.
- Log in to the wdp-db2 pod:
oc exec -it c-db2oltp-wkc-db2u-0 /bin/bash
- Run the following commands:
su - db2inst1
db2 connect to BGDB
db2 "set schema bg"
db2 "drop table \"flyway_schema_history\""
db2 "update GLOSSARY_STORAGE_VERSION set version = '0.0'"
db2 "delete from SCHEMAVERSION"
- Restart the wkc-glossary-service pod. For example:
oc delete pod wkc-glossary-service-849fdd8cd7-6nq52
Delete predefined data classes
If you have any predefined data classes in your target Cloud Pak for Data environment, remove them. When you import data classes from Information Server, these predefined data classes are imported as well. This step is especially important if you modified predefined data classes in your source environment.
Install CLI for Red Hat OpenShift
If you don’t have the OpenShift Container Platform CLI, you must install it to be able to run various commands needed to complete the migration process. For information about installing the CLI, see the instructions in the OpenShift Container Platform documentation. During migration, you use the following commands:
- oc login
- oc edit
- oc delete
- oc get pods
- oc cp
- oc exec
- oc set
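The `oc get pods`, `oc exec`, and `oc delete` commands above come up repeatedly in the following steps, usually with a pod name copied from `oc get pods` output. A minimal sketch of capturing a matching pod name into a variable first; the sample output line stands in for live cluster output, which is an assumption for illustration only:

```shell
# Extract the first pod name that matches a pattern from `oc get pods`
# style output. The sample line below is illustrative.
sample_output='is-en-conductor-0   1/1   Running   0   1d'
POD=$(printf '%s\n' "$sample_output" | awk '/conductor/ {print $1; exit}')
echo "$POD"

# On a live cluster the same pattern becomes:
#   POD=$(oc get pods | awk '/conductor/ {print $1; exit}')
#   oc exec -it "$POD" -- bash
```

This avoids retyping long generated pod names such as wkc-glossary-service-849fdd8cd7-6nq52 in later delete and exec steps.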
Configure Redis settings
- Edit the value of the maxmemory property in the redis.conf file. Run this command:
oc edit cm redis-ha-configmap
Change the value to "1573741824". It must be enclosed in double quotation marks.
- Increase the Redis memory limit to 2 GB by running this command:
oc set resources sts redis-ha-server -c redis --limits=memory=2Gi
- Update the CAMS OMRS cache TTL setting by running this command:
oc set env deploy catalog-api -c catalog-api omrs_cache_ttl_days=1
Alternatively, you can edit the environment variables of the catalog-api deployment in the console at this URL:
https://<target_host_name>/k8s/ns/wkc/deployments/catalog-api/environment
The omrs_cache_ttl_days property should be set to the value 1.
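To confirm the variable after setting it, `oc set env` can also list a deployment's current environment. A hedged sketch that checks for the expected line; the sample string stands in for live output, and the cluster command is commented because it needs cluster access:

```shell
# Check that omrs_cache_ttl_days=1 appears in the environment listing.
# The sample string mimics `oc set env deploy catalog-api --list` output.
sample_env='omrs_cache_ttl_days=1'
printf '%s\n' "$sample_env" | grep -qx 'omrs_cache_ttl_days=1' && echo "TTL configured"

# Live cluster equivalent:
#   oc set env deploy catalog-api --list | grep omrs_cache_ttl_days
```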
Increase available resources for services in Cloud Pak for Data
Before you start the migration, you must increase the memory limits for the Cassandra, Solr, event consumer, iis-services, and conductor services. The increased limits are required for operations like imports to ensure optimal performance.
- Log in to the Red Hat® OpenShift cluster with this command:
oc login
- Modify the HEAP SETTINGS section of the Cassandra JVM options.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit cm cassandra-jvm-options
- Modify the values. The -Xms and -Xmx options must have the same value. The value of the -Xmn option must be four times smaller than the value of the -Xmx option. The following excerpt shows recommended values. If you have more resources, you can increase the values.
#################
# HEAP SETTINGS #
#################
# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to
# override the automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
#-Xms1024M
#-Xmx1024M
-Xms4096M
-Xmx4096M
# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn256M
-Xmn1024M
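The sizing rule above (-Xms equal to -Xmx, and -Xmn a quarter of -Xmx) can be expressed as a small calculation, so the three options stay consistent if you pick a different heap size. A sketch with the recommended 4096M heap:

```shell
# Derive consistent Cassandra JVM heap options from one target size (MB):
# -Xms must equal -Xmx, and -Xmn must be a quarter of -Xmx.
XMX_MB=4096
XMN_MB=$((XMX_MB / 4))
echo "-Xms${XMX_MB}M -Xmx${XMX_MB}M -Xmn${XMN_MB}M"
# -> -Xms4096M -Xmx4096M -Xmn1024M
```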
- Modify the resource requests and limits for the Cassandra StatefulSet.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit sts cassandra
- Modify the values. The memory request value must be equal to the value of the -Xmx option. The memory limit value must be four times bigger than the request value. The following excerpt shows recommended values.
resources:
  limits:
    cpu: 2
    memory: 16Gi
  requests:
    cpu: 1
    memory: 4Gi
- Restart the Cassandra pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} delete pod cassandra-0
- Modify the HEAP SETTINGS section of the iis-services configuration.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit cm iis-server
- Search for the -Xmx option and change its value. The recommended value is -Xmx16384m.
- Find the name of the iis-services pod. Run this command:
oc get pods | grep iis-services
- Restart the iis-services pod. Use the name that was returned by the command in the previous step. For example:
oc -n ${PROJECT_CPD_INSTANCE} delete pod iis-services
- Modify the resource requests and limits for the Solr StatefulSet.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit sts solr
- Modify the values. The following excerpt shows recommended values.
resources:
  limits:
    cpu: 2
    memory: 4Gi
  requests:
    cpu: 1
    memory: 1Gi
- Restart the Solr pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} delete pod solr-0
- Modify the resource request and limit values for the event consumer StatefulSet.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit sts shop4info-event-consumer
- Modify the values. The following excerpt shows recommended values.
resources:
  limits:
    cpu: 3
    memory: 4Gi
  requests:
    cpu: 200m
    memory: 1Gi
- Restart the event consumer pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} delete pod shop4info-event-consumer-0
- Modify the resource limits for the conductor StatefulSet.
- Run this command:
oc -n ${PROJECT_CPD_INSTANCE} edit sts is-en-conductor
- Modify the values. The following excerpt shows recommended values.
resources:
  limits:
    cpu: 6
    memory: 16Gi
- Restart the conductor pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} delete pod is-en-conductor-0
Increase the size of the Db2 secondary log
- Search for the Db2 pod (wdp-db2-0) name. Use 'db2' as the search string:
oc get pods | grep db2
- Log in to the Db2 pod:
oc exec -it wdp-db2-0 bash
- Switch to the db2inst1 user:
su - db2inst1
- Run the following command:
db2 "update db cfg for ilgdb using logsecond max_num_allowed"
The value you set for max_num_allowed is the maximum number of secondary log files that can be created and is usually calculated as 256 minus the number of primary log files. For more information about the logsecond configuration parameter, see the Db2 documentation.
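The calculation described above can be scripted. The sketch below parses the LOGPRIMARY value from `db2 get db cfg` style output and derives the logsecond value; the sample line and the primary-log count of 13 are illustrative assumptions:

```shell
# Compute max_num_allowed = 256 - LOGPRIMARY from Db2 configuration output.
# The sample line mimics `db2 get db cfg for ilgdb` output; 13 is illustrative.
sample_cfg='Number of primary log files                (LOGPRIMARY) = 13'
logprimary=$(printf '%s\n' "$sample_cfg" | awk -F'= ' '/LOGPRIMARY/ {print $2}')
max_num_allowed=$((256 - logprimary))
echo "$max_num_allowed"
# -> 243

# Inside the Db2 pod, as db2inst1, you would then run:
#   db2 "update db cfg for ilgdb using logsecond $max_num_allowed"
```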
Configure IOPS settings for the NFS server
Configure the NFS server to have at least 10 IOPS. For more information, see the Adjusting IOPS topic in the IBM Cloud documentation.
Configure the timeout values for importing data
- Search for the conductor pod (is-en-conductor-0) name. Use 'conductor' as the search string:
oc get pods | grep conductor
- Log in to the conductor pod.
oc exec -it is-en-conductor-0 bash
- Navigate to /opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.iis.client/iis.client.site.properties. Open the file and add the following property:
com.ibm.iis.http.soTimeout=36000000
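For reference, the soTimeout value is specified in milliseconds; a quick arithmetic check that 36000000 ms equals 10 hours:

```shell
# Convert the timeout from milliseconds to hours: ms / 1000 / 60 / 60.
echo $((36000000 / 1000 / 60 / 60))
# -> 10
```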
- Search for the iis-services pod (iis-services) name. Use 'services' as the search string:
oc get pods | grep services
- Log in to the iis-services pod.
oc exec -it iis-services bash
- Run the following commands:
/opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -set -key com.ibm.iis.gov.vr.setting.maxObjectsInMemory -value 4000000
/opt/IBM/InformationServer/ASBServer/bin/iisAdmin.sh -set -key com.ibm.iis.gov.xFrameOptions -value SAMEORIGIN
- Change the value of the Xmx option in the configMap file.
- Run the following command:
oc -n ${PROJECT_CPD_INSTANCE} edit cm iis-server
- Modify the -Xmx option to have the -Xmx16384m value.
- Find the name of the iis-services pod. Run this command:
oc get pods | grep iis-services
- Restart the iis-services pod. Use the name that was returned by the command in the previous step. For example:
oc -n ${PROJECT_CPD_INSTANCE} delete pod iis-services
- Stop the Information Server server by running the following command:
/opt/IBM/InformationServer/wlp/bin/server stop iis
- Navigate to the /opt/IBM/InformationServer/wlp/usr/servers/iis/server.xml file. Open the file and configure the options to the following values:
<httpSession ... invalidationTimeout="3600" ... />
<ltpa expiration="7600m"/>
<transaction ... clientInactivityTimeout="36000" propogatedOrBMTTranLifetimeTimeout="72000" totalTranLifetimeTimeout="72000" ... />
- Start the Information Server server again by running the following command:
/opt/IBM/InformationServer/wlp/bin/server start iis
Create users in the target Cloud Pak for Data system
- In Cloud Pak for Data, go to .
- Click New user.
- Provide the required information and save the changes.
- All user names in Cloud Pak for Data are always in lower case. As a result, if a user name in the source system contained any capital letters, the associations between that user and assets (properties like steward or created by) are ignored during migration. No workaround is available; you must recreate these associations manually.
- To preserve the associations between stewards and assets, you must add the Data Steward role to re-created users in Cloud Pak for Data. This is valid only for users whose user names in the source system don’t contain capital letters. The re-created users should log in to Cloud Pak for Data at least once before you run the migration. Otherwise, you must manually add those users as stewards in Information Governance Catalog before running the migration. To manually add those users, log in to Information Governance Catalog by entering this URL in your browser:
https://<source-host-name>/ibm/iis/igc/
Then, go to the Administration page.
The following table maps Information Server roles to Cloud Pak for Data roles and permissions.

| Information Server role | Cloud Pak for Data role or permission |
| --- | --- |
| | View information assets |
| | Administrator role |
| Information Governance Catalog User | Access governance artifacts |
| (No equivalent role) | Manage governance categories |
| | Manage asset discovery |
| (No equivalent role) | Manage governance workflows |
| | Manage information assets |
| Common Metadata Administrator | Manage metadata |
| | Manage data quality |
| Information Governance Catalog User | Access governance artifacts |
| | Access data quality |
Install native connectors
- Db2 connector
- Netezza® connector
- Db2 connector
- Complete the following steps:
- Download the installation files install.sh and db2_client.tar.gz from Fix Central.
- Copy the files to the /tmp directory on Cloud Pak for Data.
directory on Cloud Pak for Data. - Get the name of the conductor pod by running this command.
oc get pods -n ${PROJECT_CPD_INSTANCE}| grep conductor
The output looks similar to this example, where the pod name is indicated in bold.is-en-conductor-0 1/1 Running 0 1d
- Copy the files to the conductor pod by running these commands:
oc cp /tmp/install.sh <project-name>/is-en-conductor-0:/tmp
oc cp /tmp/db2_client.tar.gz <project-name>/is-en-conductor-0:/tmp
- Log in to the conductor pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} exec -it is-en-conductor-0 bash
- Check whether the /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/ directory exists by running this command:
[root@is-en-conductor-0 EngineClients]# ls /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
If the directory doesn’t exist, create it and navigate to it by running these commands:
mkdir -p /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
cd /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
- Copy the install.sh and db2_client.tar.gz files to this directory by running this command:
cp /tmp/install.sh /tmp/db2_client.tar.gz /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
- Create a new directory by running this command:
mkdir db2_client
- Extract the db2_client.tar.gz file:
[root@is-en-conductor-0 EngineClients]# tar -xvf db2_client.tar.gz
- Edit the db2client.rsp file to contain a Db2 install path, for example /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients.
- Run the install.sh file:
[root@is-en-conductor-0 EngineClients]# ./install.sh
- Print the system path to the current directory:
[root@is-en-conductor-0]# pwd
/home/dsadm/sqllib
- Set up your environment by running this command:
source db2profile
- Get the IP address of the metadata repository (XMETA) Docker container. Run this command:
[root@is-en-conductor-0]# ifconfig
- Run the CATALOG TCPIP NODE command. Use the IP address that you retrieved in the previous step. For example:
[root@is-en-conductor-0 sqllib]# db2 "catalog tcpip node docker remote 192.0.2.2 server 50000"
- Run the CATALOG DATABASE command:
[root@is-en-conductor-0 sqllib]# db2 "catalog database xmeta at node docker"
- Connect to the metadata repository database:
[root@is-en-conductor-0 sqllib]# db2 connect to xmeta user db2inst1 using isadmin
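The two catalog commands can be parameterized so the node and database entries stay consistent. A sketch with illustrative placeholder values; on the real system, use the IP address from the `ifconfig` step:

```shell
# Build the Db2 catalog commands from environment values. The host and
# port below are illustrative placeholders, not values from a real system.
XMETA_HOST=192.0.2.2
XMETA_PORT=50000
NODE=docker
printf 'catalog tcpip node %s remote %s server %s\n' "$NODE" "$XMETA_HOST" "$XMETA_PORT"
printf 'catalog database xmeta at node %s\n' "$NODE"

# Inside the conductor pod these become:
#   db2 "catalog tcpip node $NODE remote $XMETA_HOST server $XMETA_PORT"
#   db2 "catalog database xmeta at node $NODE"
```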
- Netezza connector
- Complete the following steps:
- Download the installation file nz-linuxclient-v7.0.3-P2.tar.gz from Fix Central.
- Copy the file to the /tmp directory on Cloud Pak for Data.
- Get the name of the conductor pod by running this command:
oc get pods -n ${PROJECT_CPD_INSTANCE} | grep conductor
The output looks similar to this example, where the first column is the pod name:
is-en-conductor-0 1/1 Running 0 1d
- Copy the installation file to the conductor pod by running this command:
oc cp /tmp/nz-linuxclient-v7.0.3-P2.tar.gz <project-name>/is-en-conductor-0:/tmp
- Log in to the conductor pod by running this command:
oc -n ${PROJECT_CPD_INSTANCE} exec -it is-en-conductor-0 bash
- Check whether the /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/ directory exists by running this command:
[root@is-en-conductor-0 EngineClients]# ls /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
If the directory doesn’t exist, create it and navigate to it by running these commands:
mkdir -p /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
cd /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
- Copy the nz-linuxclient-v7.0.3-P2.tar.gz file to this directory by running this command:
cp /tmp/nz-linuxclient-v7.0.3-P2.tar.gz /mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/
- Create a new directory by running this command:
mkdir nz
- Extract the nz-linuxclient-v7.0.3-P2.tar.gz file:
[root@is-en-conductor-0 EngineClients]# tar -xvf nz-linuxclient-v7.0.3-P2.tar.gz
- Go to the extracted directory linux64:
[root@is-en-conductor-0 EngineClients]# cd linux64
- Unpack the NPS® Linux® Client:
[root@is-en-conductor-0 linux64]# unpack
Unpack the client to [/usr/local/nz] /mnt/IIS_zen/Engine/<project-name>/is-en-conductor-0/EngineClients/nz. If the directory doesn’t exist, specify y to create it.
- Go back to the parent directory:
[root@is-en-conductor-0 linux64]# cd ..
- Check the contents of the directory:
[root@is-en-conductor-0 EngineClients]# ls
bin64 datadirect.package.tar.z db2_client lib lib64 licenses linux linux64 nz nz-linuxclient-v7.0.3-P2.tar.gz sys webadmin
- Navigate to the nz directory and list its contents:
[root@is-en-conductor-0 EngineClients]# cd nz
[root@is-en-conductor-0 nz]# ls
bin64 lib lib64 licenses sys
- Edit the odbc.ini file by running this command:
vi $ODBCINI
Add the following data source information to the odbc.ini file. Replace the values of Servername, Port, Username, and Password with the proper values for your system.
[NZDSN]
Driver=/mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/nz/lib64/libnzodbc.so
Description=NetezzaSQL ODBC
Servername=203.0.113.17
Port=5480
Database=netezzadb
Username=user1
Password=password
ReadOnly=false
ShowSystemTables=false
LegacySQLTables=false
LoginTimeout=0
QueryTimeout=0
DateFormat=1
NumericAsChar=false
SQLBitOneZero=false
StripCRLF=false
securityLevel=preferredUnSecured
caCertFile=
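Before testing the connection, you can sanity-check the data source entry. A minimal sketch that writes a fragment to a scratch file and verifies the section header and driver path; /tmp/odbc_check.ini is an illustrative temporary file, not $ODBCINI itself:

```shell
# Write a fragment of the data source definition and verify its shape.
cat > /tmp/odbc_check.ini <<'EOF'
[NZDSN]
Driver=/mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/nz/lib64/libnzodbc.so
Port=5480
EOF
grep -q '^\[NZDSN\]' /tmp/odbc_check.ini && \
  grep -q '^Driver=.*libnzodbc\.so$' /tmp/odbc_check.ini && \
  echo "DSN entry looks well-formed"
```

On the real system, point the grep commands at $ODBCINI instead of the scratch file.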
- Access the dsenv file in the /opt/IBM/InformationServer/Server/DSEngine/ directory and add the following commands to the file:
export PATH=/mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/nz/bin64:$PATH
export LD_LIBRARY_PATH=/mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/nz/lib64:$LD_LIBRARY_PATH
export NZ_ODBC_INI_PATH=/opt/IBM/InformationServer/DSEngine
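The first export above prepends the Netezza client bin64 directory to PATH so it is searched before the system directories. A minimal illustration of the effect; the directory does not need to exist for the path manipulation itself:

```shell
# Prepend the client directory to PATH and confirm it is searched first.
NZ_BIN=/mnt/dedicated_vol/Engine/is-en-conductor-0/EngineClients/nz/bin64
PATH="$NZ_BIN:$PATH"
echo "$PATH" | cut -d: -f1
```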