Data Cataloging service issues
Use this troubleshooting information to resolve install and upgrade problems that are related to the Data Cataloging service.
Data Cataloging upgrade is stuck for more than 1 hour
- Problem statement
- Data Cataloging upgrade is stuck for more than 1 hour and the progress shows 15%.
- Symptoms
- The Data Cataloging upgrade runs for more than 1 hour and gets stuck at 15%. On the OpenShift®
console, the installed operators show only the Data Cataloging operator, and not the
IBM Db2 and AMQ Streams operators, under the ibm-data-cataloging
namespace. Also, the Subscription for the Data Cataloging operator shows InstallPlanPending with reason "RequiresApproval".
- Resolution
-
- In the OpenShift console, go to the Operators > Installed Operators tab.
- Select the IBM Storage Discover operator.
- Go to the Subscription tab, and select the InstallPlan.
- Approve the InstallPlan. Alternatively, use the OpenShift CLI as follows:
- Run the following command to get the InstallPlan that is not
approved.
oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}'
- Run the following command to approve the
InstallPlan.
oc patch installplan $(oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}') -n ibm-data-cataloging --type merge --patch '{"spec":{"approved":true}}'
- Wait for the upgrade process to complete.
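The two CLI commands above can be combined into a small helper that approves every pending InstallPlan in the namespace in one pass. This is a sketch, not part of the product: the function name approve_pending_installplans is our own, and it assumes oc is already logged in with sufficient privileges.

```shell
# Sketch: approve every unapproved InstallPlan in a namespace.
# `approve_pending_installplans` is a hypothetical helper name.
approve_pending_installplans() {
  local ns="$1"
  local ip
  # List InstallPlans whose spec.approved is false, then approve each one.
  for ip in $(oc get ip -n "$ns" \
      -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}'); do
    echo "Approving InstallPlan $ip"
    oc patch installplan "$ip" -n "$ns" --type merge \
      --patch '{"spec":{"approved":true}}'
  done
}

# Usage:
# approve_pending_installplans ibm-data-cataloging
```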
Data Cataloging import service pod in CrashLoopBackOff state
- Diagnosis
-
- Check for CrashLoopBackOff on the import service
pod.
oc -n ibm-data-cataloging get pod -l role=import-service
- Confirm that the logs show a permission denied error:
oc -n ibm-data-cataloging logs -l role=import-service
- Resolution
-
- Debug the import service pod.
oc -n ibm-data-cataloging debug deployment/isd-import-service --as-user=0
- Update the directory permissions.
chmod 775 /uploads
mkdir -p /uploads/failed_requests
chmod 775 /uploads/failed_requests
exit
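To see what the fixed layout should look like, the snippet below reproduces the same permissions in a local scratch directory (not on the real pod filesystem, which requires the debug session above). The scratch path is purely illustrative.

```shell
# Illustration only: reproduce the fixed /uploads layout in a scratch
# directory to show the expected 775 (rwxrwxr-x) permissions.
scratch=$(mktemp -d)
mkdir -p "$scratch/uploads/failed_requests"
chmod 775 "$scratch/uploads" "$scratch/uploads/failed_requests"

# Both directories should now show drwxrwxr-x:
# owner and group can write, others can read and traverse.
ls -ld "$scratch/uploads" "$scratch/uploads/failed_requests"
rm -rf "$scratch"
```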
Data Cataloging database schema job is not in a completed state during installation or upgrade
Note: Use this procedure to recover Db2 when it becomes unavailable after the
service installation and leaves the Data Cataloging service in a degraded state.
- Symptoms
- The isd-db2whrest or isd-db-schema pods report a not ready or error state. Run the following command to view the common logs:
oc -n ibm-data-cataloging logs -l 'role in (db2whrest, db-schema)' --tail=200
Go through the logs to check whether the following error exists:
Waiting on c-isd-db2u-engn-svc port 50001... db2whconn - ERROR - [FAILED]: [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 Connection refused
- Resolution
-
- Restart Db2:
oc -n ibm-data-cataloging rsh c-isd-db2u-0
sudo wvcli system disable -m "Disable HA before Db2 maintenance"
su - ${DB2INSTANCE}
db2stop
db2start
db2 activate db BLUDB
exit
sudo wvcli system enable -m "Enable HA after Db2 maintenance"
- Confirm that Db2 HA monitoring is active:
sudo wvcli system status
exit
- Check whether the problem occurred during installation or upgrade, or after installation.
- If this occurs during an upgrade or installation, recreate the isd-db-schema job and monitor the pod until it gets to a completed state.
SCHEMA_OLD="isd-db-schema-old.json"
SCHEMA_NEW="isd-db-schema-new.json"
oc -n ibm-data-cataloging get job isd-db-schema -o json > $SCHEMA_OLD
jq 'del(.spec.template.metadata.labels."controller-uid") | del(.spec.selector) | del(.status)' $SCHEMA_OLD > $SCHEMA_NEW
oc -n ibm-data-cataloging delete job isd-db-schema
oc -n ibm-data-cataloging apply -f $SCHEMA_NEW
oc -n ibm-data-cataloging get pod | grep isd-db-schema
- If this is a post-installation problem, restart db2whrest.
oc -n ibm-data-cataloging delete pod -l role=db2whrest
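To decide whether this procedure applies, the log check from the symptoms can be automated: scan the recent db2whrest/db-schema logs for the SQL1224N error. This is a sketch under our own naming (has_sql1224n is not a product command), and it assumes oc is logged in.

```shell
# Sketch: report whether the SQL1224N connection error appears in the
# recent db2whrest/db-schema logs. `has_sql1224n` is a hypothetical name.
has_sql1224n() {
  local ns="$1"
  if oc -n "$ns" logs -l 'role in (db2whrest, db-schema)' --tail=200 \
      | grep -q 'SQL1224N'; then
    echo "SQL1224N found: follow the Db2 restart and recovery steps"
  else
    echo "SQL1224N not found: review the logs for a different error"
  fi
}

# Usage:
# has_sql1224n ibm-data-cataloging
```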
Data Cataloging installation is stuck for more than 1 hour
Note: This procedure must be used only during installation problems, not for upgrades or any
subsequent issues.
- Symptoms
- The Data Cataloging installation runs for more than an hour and is stuck somewhere between 35% and 80% (both inclusive).
- Resolution
-
- Run the following command to scale down the
operator.
oc -n ibm-data-cataloging scale --replicas=0 deployment/spectrum-discover-operator
- Run the following command to scale down
workloads.
oc -n ibm-data-cataloging scale --replicas=0 deployment,statefulset -l component=discover
- Run the following command to remove the DB schema job if
present.
oc -n ibm-data-cataloging delete job isd-db-schema --ignore-not-found
- Run the following commands to delete the Db2 instance and the password
secret.
oc -n ibm-data-cataloging delete db2u isd
oc -n ibm-data-cataloging delete secret c-isd-ldapblueadminpassword --ignore-not-found
- Wait until the Db2 pods and persistent volume claims get
removed.
oc -n ibm-data-cataloging get pod,pvc -o name | grep c-isd
- Run the following command to scale up the
operator.
oc -n ibm-data-cataloging scale --replicas=1 deployment/spectrum-discover-operator
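Step 5 above can be wrapped in a polling loop that returns once no c-isd pods or persistent volume claims remain. This is a sketch: the function name, the 60-attempt bound, and the 10-second interval are our own illustration values, not product defaults.

```shell
# Sketch: poll until no c-isd pods or PVCs remain in the namespace.
# `wait_for_db2_cleanup`, the attempt bound, and the sleep interval
# are illustrative choices, not product defaults.
wait_for_db2_cleanup() {
  local ns="$1" attempt=0
  while [ "$attempt" -lt 60 ]; do
    if [ -z "$(oc -n "$ns" get pod,pvc -o name | grep c-isd)" ]; then
      echo "Db2 resources removed"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 10
  done
  echo "Timed out waiting for Db2 cleanup" >&2
  return 1
}

# Usage:
# wait_for_db2_cleanup ibm-data-cataloging
```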
Data Cataloging service is not installed successfully on IBM Storage Fusion HCI System with GPU nodes
- Problem statement
- The Data Cataloging service remains in the installing state for hours.
- Resolution
- To resolve the problem, do the following steps:
- Patch the FSD with a new affinity so that the isd workload is not scheduled on GPU nodes:
Create the fsd_dcs_patch.yaml file:
cat >> fsd_dcs_patch.yaml << EOF
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceDefinition
metadata:
  name: data-cataloging-service-definition
  namespace: <Fusion_namespace>
spec:
  onboarding:
    parameters:
    - dataType: string
      defaultValue: ibm-data-cataloging
      descriptionCode: BMYSRV00003
      displayNameCode: BMYSRV00004
      name: namespace
      required: true
      userInterface: false
    - dataType: storageClass
      defaultValue: ''
      descriptionCode: BMYDC0300
      displayNameCode: BMYDC0301
      name: rwx_storage_class
      required: true
      userInterface: true
    - dataType: bool
      defaultValue: 'true'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: doInstall
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"accept": true}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: license
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"nvidia.com/gpu","operator":"NotIn","values":["Exists"]}]}]}}}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: affinity
      required: true
      userInterface: false
EOF
Apply the patch:
oc -n <Fusion_namespace> patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition --patch "$(cat fsd_dcs_patch.yaml)"
If the output shows the error message Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml, then follow these steps to resolve the issue:
- Go to the OpenShift Container Platform web console.
- Go to the Operators > Installed Operators tab, and under the
<Fusion_namespace>
project, select the IBM Storage Fusion operator.
- Select the IBM Storage Fusion service instance tab, and select the data-cataloging-service-instance.
- Select the YAML tab, and edit the YAML file for the
data-cataloging-service-instance
. Under spec.onboarding.parameters
, ensure you add the following lines.
- dataType: json
  defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"isf.ibm.com/nodeType","operator":"NotIn","values":["gpu"]}]}]}}}'
  descriptionCode: descriptionCode
  displayNameCode: displayNameCode
  name: affinity
  required: true
  userInterface: false
- Display the patched FSD:
oc -n <Fusion_namespace> get fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition -o yaml
- Install from the user interface.
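Before reinstalling from the user interface, you can confirm that the affinity parameter actually landed in the FSD. The check below is a sketch of our own (check_affinity_param is not a product command); it greps the FSD YAML shown in the previous step.

```shell
# Sketch: confirm the `affinity` parameter is present in the patched FSD.
# `check_affinity_param` is a hypothetical name; replace the namespace
# argument with your actual Fusion namespace.
check_affinity_param() {
  local ns="$1"
  if oc -n "$ns" get fusionservicedefinitions.service.isf.ibm.com \
      data-cataloging-service-definition -o yaml \
      | grep -q 'name: affinity'; then
    echo "affinity parameter present"
  else
    echo "affinity parameter missing" >&2
    return 1
  fi
}

# Usage:
# check_affinity_param <Fusion_namespace>
```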
Data Cataloging database pods stuck during initialization phase
- Symptoms
- Multiple Db2 pods use the host port 5002, which can cause pods to stay in the init phase.
- Resolution
-
- Uninstall the Data Cataloging service. For the procedure, see Uninstalling Data Cataloging.
- Create a file with the Data Cataloging
FusionServiceDefinition
patch.
cat >> fsd_dcs_patch.yaml << EOF
apiVersion: service.isf.ibm.com/v1
kind: FusionServiceDefinition
metadata:
  name: data-cataloging-service-definition
  namespace: ibm-spectrum-fusion-ns
spec:
  onboarding:
    parameters:
    - dataType: string
      defaultValue: ibm-data-cataloging
      descriptionCode: BMYSRV00003
      displayNameCode: BMYSRV00004
      name: namespace
      required: true
      userInterface: false
    - dataType: storageClass
      defaultValue: ''
      descriptionCode: BMYDC0300
      displayNameCode: BMYDC0301
      name: rwx_storage_class
      required: true
      userInterface: true
    - dataType: bool
      defaultValue: 'true'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: doInstall
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"accept": true}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: license
      required: true
      userInterface: false
    - dataType: json
      defaultValue: '{"size":1,"mln":2,"storage":{"activelogs":{"requests":"300Gi"},"data":{"requests":"600Gi"},"meta":{"requests":"100Gi"},"tempts":{"requests":"100Gi"}}}'
      descriptionCode: descriptionCode
      displayNameCode: displayNameCode
      name: dbwh
      required: true
      userInterface: false
EOF
- Apply the patch to downsize the Db2
cluster.
oc -n ibm-spectrum-fusion-ns patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition --type=merge --patch-file fsd_dcs_patch.yaml
- Install Data Cataloging service from the IBM Storage Fusion user interface. For procedure, see Data Cataloging.