IBM Data Cataloging service installation and upgrade issues
Use this troubleshooting information to resolve install and upgrade problems that are related to the IBM Data Cataloging service.
Installation stuck because of resource quota
- Problem statement
- The IBM Data Cataloging installation is stuck because of resource quotas set on the IBM Fusion and IBM Data Cataloging namespaces. The installation remains stuck until both resource quotas are removed.
- Resolution
- Set the resource quota values for IBM Data Cataloging as follows:
kind: ResourceQuota
apiVersion: v1
metadata:
  name: dcs-resource-quota
  namespace: ibm-data-cataloging
spec:
  hard:
    limits.cpu: '100'
    limits.memory: 180Gi
    pods: '120'
    requests.cpu: '20'
    requests.memory: 40Gi
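The quota values above can be saved to a file and applied with oc; a minimal sketch, assuming the ibm-data-cataloging namespace already exists:

```shell
# Write the ResourceQuota manifest with the values from the resolution above
cat > dcs-resource-quota.yaml << 'EOF'
kind: ResourceQuota
apiVersion: v1
metadata:
  name: dcs-resource-quota
  namespace: ibm-data-cataloging
spec:
  hard:
    limits.cpu: '100'
    limits.memory: 180Gi
    pods: '120'
    requests.cpu: '20'
    requests.memory: 40Gi
EOF

# Apply it (requires cluster access; skipped when oc is not available)
if command -v oc >/dev/null 2>&1; then
  oc apply -f dcs-resource-quota.yaml
fi
```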
IBM Data Cataloging upgrade is stuck for more than 1 hour
- Problem statement
- The IBM Data Cataloging upgrade is stuck for more than 1 hour and the progress shows 15%.
- Symptoms
- The IBM Data Cataloging upgrade runs for more than 1 hour and gets stuck at 15%. On the OpenShift® Console, the installed operators show only the IBM Data Cataloging operator, not the IBM Db2 and Streams for Apache Kafka operators, under the ibm-data-cataloging namespace. Also, the Subscription for the IBM Data Cataloging operator shows InstallPlanPending with reason "RequiresApproval".
- Resolution
-
- In the OpenShift console, go to the Operators > Installed Operators tab.
- Select IBM Storage Discover operator.
- Go to the Subscriptions tab, and select the InstallPlan.
- Approve the InstallPlan.
There is an alternative method that uses the OpenShift CLI as follows:
- Run the following command to get the InstallPlan that is not
approved.
oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}'
- Run the following command to approve the InstallPlan.
oc patch installplan $(oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}') -n ibm-data-cataloging --type merge --patch '{"spec":{"approved":true}}'
- Wait for the upgrade process to complete.
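One way to watch the upgrade after approving the InstallPlan is to poll the ClusterServiceVersions until all of them report the Succeeded phase; a sketch, assuming the standard ibm-data-cataloging namespace:

```shell
# Returns success only when every CSV in the namespace reports phase Succeeded
all_csvs_succeeded() {
  ! oc -n ibm-data-cataloging get csv \
      -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' \
    | grep -qv '^Succeeded$'
}

# Poll every 30 seconds (requires cluster access; skipped when oc is missing)
if command -v oc >/dev/null 2>&1; then
  until all_csvs_succeeded; do sleep 30; done
fi
```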
Data Cataloging import service pod in CrashLoopBackOff state
- Diagnosis
-
- Check for CrashLoopBackOff on the import service
pod.
oc -n ibm-data-cataloging get pod -l role=import-service
- Confirm that the logs show a permission denied error:
oc -n ibm-data-cataloging logs -l role=import-service
- Resolution
-
- Debug import service
pod.
oc -n ibm-data-cataloging debug deployment/isd-import-service --as-user=0
- Update the directory permissions.
chmod 775 /uploads
mkdir -p /uploads/failed_requests
chmod 775 /uploads/failed_requests
exit
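After fixing the permissions, the pod should leave the CrashLoopBackOff state on its next restart; a small poll loop to confirm, using the same role label as above:

```shell
# Succeeds once the import service pod reports the Running phase
import_service_running() {
  oc -n ibm-data-cataloging get pod -l role=import-service \
     -o jsonpath='{.items[*].status.phase}' | grep -qw 'Running'
}

# Poll every 10 seconds (requires cluster access; skipped when oc is missing)
if command -v oc >/dev/null 2>&1; then
  until import_service_running; do sleep 10; done
fi
```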
IBM Data Cataloging database schema job is not in a completed state during installation or upgrade
Note: Use this procedure to recover Db2 when it goes into an unavailable mode after the service installation, leaving the IBM Data Cataloging service in a degraded state.
- Symptoms
- The isd-db2whrest or isd-db-schema pods report a not ready or error state. Run the following command to view the common logs:
oc -n ibm-data-cataloging logs -l 'role in (db2whrest, db-schema)' --tail=200
Go through the logs to check whether the following error exists:
Waiting on c-isd-db2u-engn-svc port 50001...
db2whconn - ERROR - [FAILED]: [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 Connection refused
- Resolution
-
- Restart Db2:
oc -n ibm-data-cataloging rsh c-isd-db2u-0
sudo wvcli system disable -m "Disable HA before Db2 maintenance"
su - ${DB2INSTANCE}
db2stop
db2start
db2 activate db BLUDB
exit
sudo wvcli system enable -m "Enable HA after Db2 maintenance"
- Confirm that Db2 HA-monitoring is active:
sudo wvcli system status
exit
- Check whether the problem occurred during installation or upgrade or a post installation problem.
- If this occurs during an installation or upgrade, recreate the isd-db-schema job and monitor the pod until it reaches a completed state.
SCHEMA_OLD="isd-db-schema-old.json"
SCHEMA_NEW="isd-db-schema-new.json"
oc -n ibm-data-cataloging get job isd-db-schema -o json > $SCHEMA_OLD
jq 'del(.spec.template.metadata.labels."controller-uid") | del(.spec.selector) | del(.status)' $SCHEMA_OLD > $SCHEMA_NEW
oc -n ibm-data-cataloging delete job isd-db-schema
oc -n ibm-data-cataloging apply -f $SCHEMA_NEW
oc -n ibm-data-cataloging get pod | grep isd-db-schema
- If this is a post-installation problem, restart db2whrest.
oc -n ibm-data-cataloging delete pod -l role=db2whrest
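After restarting Db2 (and recreating the schema job if needed), recovery can be confirmed by waiting for the job to complete and for db2whrest to become ready; a sketch with assumed timeouts:

```shell
# Wait for the schema job to complete and the db2whrest pod to become Ready
verify_db_recovery() {
  oc -n ibm-data-cataloging wait --for=condition=complete \
     job/isd-db-schema --timeout=30m &&
  oc -n ibm-data-cataloging wait --for=condition=Ready \
     pod -l role=db2whrest --timeout=15m
}

# Requires cluster access; skipped when oc is missing
if command -v oc >/dev/null 2>&1; then
  verify_db_recovery
fi
```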
IBM Data Cataloging installation is stuck for more than 1 hour
Note: This procedure must be used only during installation problems, not for upgrades or any
subsequent issues.
- Symptoms
- The IBM Data Cataloging installation runs for more than an hour and is stuck somewhere between 35% and 80% (both inclusive).
- Resolution
-
- Run the following command to scale down the
operator.
oc -n ibm-data-cataloging scale --replicas=0 deployment/spectrum-discover-operator
- Run the following command to scale down
workloads.
oc -n ibm-data-cataloging scale --replicas=0 deployment,statefulset -l component=discover
- Run the following command to remove the DB schema job if
present.
oc -n ibm-data-cataloging delete job isd-db-schema --ignore-not-found
- Run the following commands to delete the Db2 instance and the password
secret.
oc -n ibm-data-cataloging delete db2u isd
oc -n ibm-data-cataloging delete secret c-isd-ldapblueadminpassword --ignore-not-found
- Wait until the Db2 pods and persistent volume claims get
removed.
oc -n ibm-data-cataloging get pod,pvc -o name | grep c-isd
- Run the following command to scale up the
operator.
oc -n ibm-data-cataloging scale --replicas=1 deployment/spectrum-discover-operator
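The wait in step 5 (until all c-isd pods and persistent volume claims are gone) can be scripted instead of checked by hand; a sketch:

```shell
# Succeeds once no pod or PVC name contains c-isd
c_isd_resources_gone() {
  ! oc -n ibm-data-cataloging get pod,pvc -o name | grep -q 'c-isd'
}

# Poll every 15 seconds (requires cluster access; skipped when oc is missing)
if command -v oc >/dev/null 2>&1; then
  until c_isd_resources_gone; do sleep 15; done
fi
```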
IBM Data Cataloging service is not installed successfully on IBM Fusion HCI with GPU nodes
- Problem statement
- The Data Cataloging service remains in the installing state for hours.
- Resolution
- To resolve the problem, do the following steps:
- Patch the FSD with a new affinity so that the isd workload is not scheduled on those nodes. Create the fsd_dcs_patch.yaml file:
cat >> fsd_dcs_patch.yaml << EOF
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceDefinition
metadata:
  name: data-cataloging-service-definition
  namespace: <Fusion_namespace>
spec:
  onboarding:
    parameters:
    - dataType: string
      defaultValue: ibm-data-cataloging
      descriptionCode: BMYSRV00003
      displayNameCode: BMYSRV00004
      name: namespace
      required: true
      userInterface: false
    - dataType: storageClass
      defaultValue: ''
      descriptionCode: BMYDC0300
      displayNameCode: BMYDC0301
      name: rwx_storage_class
      required: true
      userInterface: true
    - dataType: bool
      defaultValue: 'true'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: doInstall
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"accept": true}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: license
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"nvidia.com/gpu","operator":"NotIn","values":["Exists"]}]}]}}}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: affinity
      required: true
      userInterface: false
EOF
Then apply the patch:
oc -n <Fusion_namespace> patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition --patch "$(cat fsd_dcs_patch.yaml)"
If the output shows the error message Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml, then follow these steps to resolve the issue:
- Go to the OpenShift Container Platform web console.
- Go to the Operators > Installed Operators tab and, under <Fusion_namespace>, select the IBM Fusion operator.
- Select the IBM Fusion service instance tab, and select the data-cataloging-service-instance.
- Select the YAML tab, and edit the YAML file for the data-cataloging-service-instance. Under spec.onboarding.parameters, ensure that you add the following lines:
- dataType: json
  defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"isf.ibm.com/nodeType","operator":"NotIn","values":["gpu"]}]}]}}}'
  descriptionCode: descriptionCode
  displayNameCode: displayNameCode
  name: affinity
  required: true
  userInterface: false
- Display the patched FSD:
oc -n <Fusion_namespace> get fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition -o yaml
- Install from the user interface.
IBM Data Cataloging database pods stuck during initialization phase
- Symptoms
- Multiple Db2 pods use the host port 5002, which can cause pods to stay in the init phase.
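To confirm the port conflict, you can list which pods request host port 5002; a hypothetical diagnostic sketch (the jsonpath expression is an assumption, not part of the original procedure):

```shell
# Print namespace, pod name, and host ports; keep only rows that request 5002
pods_using_hostport_5002() {
  oc get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' \
    | grep -w 5002
}

# Requires cluster access; skipped when oc is missing
if command -v oc >/dev/null 2>&1; then
  pods_using_hostport_5002
fi
```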
- Resolution
-
- Uninstall the IBM Data Cataloging service. For procedure, see Uninstalling IBM Data Cataloging.
- Create a file with the IBM Data Cataloging FusionServiceDefinition patch.
cat >> fsd_dcs_patch.yaml << EOF
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceDefinition
metadata:
  name: data-cataloging-service-definition
  namespace: ibm-spectrum-fusion-ns
spec:
  onboarding:
    parameters:
    - dataType: string
      defaultValue: ibm-data-cataloging
      descriptionCode: BMYSRV00003
      displayNameCode: BMYSRV00004
      name: namespace
      required: true
      userInterface: false
    - dataType: storageClass
      defaultValue: ''
      descriptionCode: BMYDC0300
      displayNameCode: BMYDC0301
      name: rwx_storage_class
      required: true
      userInterface: true
    - dataType: bool
      defaultValue: 'true'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: doInstall
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"accept": true}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: license
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"size":1,"mln":2,"storage":{"activelogs":{"requests":"300Gi"},"data":{"requests":"600Gi"},"meta":{"requests":"100Gi"},"tempts":{"requests":"100Gi"}}}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: dbwh
      required: true
      userInterface: false
EOF
- Apply the patch to downsize the Db2
cluster.
oc -n ibm-spectrum-fusion-ns patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition --type=merge --patch-file fsd_dcs_patch.yaml
- Install IBM Data Cataloging service from the IBM Fusion user interface. For procedure, see Installing IBM Data Cataloging.
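After applying the patch in step 3, you can confirm that the FSD carries the downsized dbwh parameter before reinstalling; a sketch:

```shell
# Succeeds when the dbwh parameter is present in the patched FSD
dbwh_parameter_present() {
  oc -n ibm-spectrum-fusion-ns get fusionservicedefinitions.service.isf.ibm.com \
     data-cataloging-service-definition -o yaml | grep -q 'name: dbwh'
}

# Requires cluster access; skipped when oc is missing
if command -v oc >/dev/null 2>&1; then
  dbwh_parameter_present && echo 'dbwh parameter found'
fi
```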