IBM Fusion IBM Data Cataloging known issues
A list of all known issues in the IBM Data Cataloging service, along with their resolutions.
The following known issues exist in the IBM Data Cataloging service, with workarounds included wherever possible. If you come across an issue that cannot be solved by using these instructions, contact IBM Support.
With resource quotas set on the Fusion and DCS namespaces, the DCS installation is stuck until both resource quotas are removed
Live events stopped working after IBM Data Cataloging upgrade
- Problem statement
- Live events stop updating record events after IBM Data Cataloging is upgraded.
- Resolution
- Follow these steps to resolve the issue:
- Delete the watcher on IBM Storage Scale without deleting the connection on the IBM Data Cataloging service, as follows:
- Run the following command to list the watchers:
/usr/lpp/mmfs/bin/mmwatch all list
Example output:
Filesystem testing has 1 watcher(s):
Cluster ID           WatchID/PID   Type Start Time               Path         Description
----------------------------------------------------------------------------------------------------------------------
13748412159870826098 CLW1733515374 FSYS Fri Dec 06 12-02-55 2024 /ibm/testing
- Run the following command to delete the watcher:
sudo /usr/lpp/mmfs/bin/mmwatch testing disable --watch-id CLW1733515374
Example output:
[I] Successfully checked or deleted Clustered Watch policy skip partitions for watch: CLW1733515374
[I] Successfully removed Clustered Watch policy rules for watch: CLW1733515374
[I] Successfully removed Clustered Watch configuration file from the CCR for watch: CLW1733515374
[I] Successfully removed Clustered Watch configuration directory for watch: CLW1733515374
[I] Successfully checked or removed Clustered Watch global catchall and config fileset skip partitions for watch: CLW1733515374
[I] Successfully disabled Clustered Watch: CLW1733515374
- Run the list command again to verify that the watcher was deleted:
/usr/lpp/mmfs/bin/mmwatch all list
Example output:
Cluster ID WatchID/PID Type Start Time Path Description
----------------------------------------------------------------------------------------------------------------------
Filesystem testing has no watchers.
- Run the following commands to export the cluster URL and the IBM Storage Scale working directory as environment variables:
export CLUSTER_URL="yourcluster.ibm.com"
export WD_PATH="/example/path/wd"
- Run the following command to manually add the watcher on IBM Storage Scale:
sudo /usr/lpp/mmfs/bin/mmwatch testing enable --event-handler kafkasink --sink-brokers "isd-kafka-tlsext-bootstrap-ibm-data-cataloging.apps.qa-bm-$CLUSTER_URL:443" --sink-topic scale-le-connector-topic --sink-auth-config $WD_PATH/kafka/auth.file --events IN_ATTRIB,IN_CLOSE_WRITE,IN_MODIFY,IN_CREATE,IN_DELETE,IN_MOVED_FROM,IN_MOVED_TO
Example output:
[I] Beginning enablement of Clustered Watch with newly created watch ID: CLW1733516766
[I] Verified the watch type is FSYS for filesystem testing
[I] Successfully added Clustered Watch configuration file into CCR for watch: CLW1733516766
[I] Successfully added Clustered Watch configuration in the Spectrum Scale file system: testing for watch: CLW1733516766
[I] Successfully added Clustered Watch policy rules for watch: CLW1733516766
[I] Successfully checked or created Clustered Watch global catchall and config fileset skip partitions for watch: CLW1733516766
[I] Successfully checked or created Clustered Watch policy skip partitions for watch: CLW1733516766
[I] Successfully enabled Clustered Watch: CLW1733516766
- Run the list command again to verify that the watcher was created:
/usr/lpp/mmfs/bin/mmwatch all list
Example output:
Filesystem testing has 1 watcher(s):
Cluster ID           WatchID/PID   Type Start Time               Path         Description
----------------------------------------------------------------------------------------------------------------------
13748412159870826098 CLW1733516766 FSYS Fri Dec 06 12-26-07 2024 /ibm/testing
- On IBM Data Cataloging, rescan the IBM Storage Scale connection.
- Any modifications in the scanned IBM Storage Scale folder are now monitored through live events and automatically updated in IBM Data Cataloging.
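The exported variables from the steps above are interpolated into the broker endpoint and auth-file path that are passed to mmwatch. This can be sanity-checked locally before running the enable command; the host and path below are the placeholders from the example, not real endpoints:

```shell
# Sketch: verify the interpolated Kafka bootstrap endpoint and auth-file path
# before passing them to mmwatch. Values are the documentation placeholders.
export CLUSTER_URL="yourcluster.ibm.com"
export WD_PATH="/example/path/wd"
BROKER="isd-kafka-tlsext-bootstrap-ibm-data-cataloging.apps.qa-bm-$CLUSTER_URL:443"
AUTH_FILE="$WD_PATH/kafka/auth.file"
echo "$BROKER"
echo "$AUTH_FILE"
```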
IBM Data Cataloging service in a Metro-DR setup shows a Degraded state
- Diagnosis
- The IBM Data Cataloging service is in a degraded state.
- Run the following command to check whether the isd-db2whrest pod is not ready:
oc -n ibm-data-cataloging get pod -l role=db2whrest
- Run the following command to check whether Db2 retries the network check and fails because of the timeout:
oc -n ibm-data-cataloging logs -l type=engine --tail=100
Example output:
+ timeout 1 tracepath -l 29 c-isd-db2u-1.c-isd-db2u-internal
+ [[ 17 -lt 120 ]]
+ (( n++ ))
+ echo 'Command failed. Attempt 18/120:'
Command failed. Attempt 18/120:
- Resolution
- Increase the timeout, typically from 1 second to 3-5 seconds.
- Modify the timeout from 1 to 3 seconds in c-isd-db2u-0:
oc -n ibm-data-cataloging exec c-isd-db2u-0 -- sudo sed -i 's/timeout 1 tracepath/timeout 3 tracepath/g' /db2u/scripts/include/common_functions.sh
- Wait until the current attempt exceeds the predefined 120 retries. After the check restarts, it picks up the updated value:
oc -n ibm-data-cataloging logs -l type=engine --tail=50
- Monitor the db2whrest pod readiness:
oc -n ibm-data-cataloging get pod -l role=db2whrest -w
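The sed substitution used in the resolution above can be illustrated locally on a stand-in file; the file below is created for the demonstration and is not the real /db2u/scripts/include/common_functions.sh inside the pod:

```shell
# Sketch: the same substitution that the oc exec step performs inside the pod,
# shown on a local stand-in file.
printf 'timeout 1 tracepath -l 29 c-isd-db2u-1.c-isd-db2u-internal\n' > common_functions_sample.sh
# Raise the tracepath timeout from 1 second to 3 seconds.
sed -i 's/timeout 1 tracepath/timeout 3 tracepath/g' common_functions_sample.sh
cat common_functions_sample.sh
```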
COS connection reporting scan aborted due to inactivity
If a COS connection scan fails with the error "Scan aborted because of a long period of inactivity", you can resolve it by editing the settings file connections/cos/scan/scanner-settings.json within the data PV and setting notifier_timeout to a value higher than the default of 120 seconds. The change is picked up on the next scan; no pod restart is required.
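A minimal sketch of the edit, assuming the settings file is plain JSON with a top-level notifier_timeout key; the sample file below stands in for the real connections/cos/scan/scanner-settings.json on the data PV:

```shell
# Sketch: raise notifier_timeout above the 120-second default.
# A local sample file stands in for the real settings file on the data PV.
cat > scanner-settings-sample.json <<'EOF'
{"notifier_timeout": 120}
EOF
# Replace the default with a higher value, for example 300 seconds.
sed -i 's/"notifier_timeout": 120/"notifier_timeout": 300/' scanner-settings-sample.json
cat scanner-settings-sample.json
```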
Database connection issue after reboot
An unexpected cluster update or node reboot can cause database connection issues. For the resolution, see the steps in the IBM Data Cataloging database schema job is not in a completed state during installation or upgrade section.
Image pull error due to authentication failure
- Problem statement
- The OpenShift® Container Platform login token expires occasionally; because this token is the container image registry password, its expiry breaks the service account access to the registry.
- Resolution
- If a pod is failing to pull an image from the registry with an authentication error, re-create the image-registry-pull-secret and relink the service accounts to the new secret:
oc delete secret image-registry-pull-secret
HOST=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
oc create secret docker-registry image-registry-pull-secret \
  --docker-server="${HOST}" \
  --docker-username=kubeadmin \
  --docker-password="$(oc whoami -t)"
for account in spectrum-discover-operator strimzi-cluster-operator spectrum-discover-ssl-zookeeper spectrum-discover-sasl-zookeeper; do
  oc secrets link $account image-registry-pull-secret --for=pull
done
Visual query builder search terms override SQL search when going into individual mode
If a search is started in the query builder and then changed to SQL mode, the initial group search works as expected, but when it is expanded to individual records it uses the query builder terms as the base. As a workaround, clear the visual query before changing to the SQL query.
LDAPS configuration fails if the password contains a dollar sign
Currently, the dollar sign is not supported in passwords for LDAPS configuration. As a workaround, create a password without a dollar sign in it.
Content search policy missing files
If the expected data count is incorrect while running a policy, verify that the connection is active, and rescan to get the latest data ingested into IBM Data Cataloging. After a successful upgrade of IBM Data Cataloging, a rescan of existing connections is recommended.
REST API returns token with unprintable characters
The token returned by the authentication endpoint can include a trailing carriage return, which causes requests that use it to fail:
$ curl -k -H "Authorization: Bearer ${TOKEN}" https://$SDHOST/policyengine/v1/tags
curl: (92) HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)
Instead of extracting the token with:
TOKEN=$(curl -i -k https://$SDHOST/auth/v1/token -u "$SDUSER:$SDPSWD" | grep -i x-auth-token | awk '{print $2}')
strip the carriage return:
TOKEN=$(curl -i -k https://$SDHOST/auth/v1/token -u "$SDUSER:$SDPSWD" | grep -i x-auth-token | awk '{print $2}' | tr -d '\r')
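The effect of the tr filter can be demonstrated locally with a sample value that carries a trailing carriage return, as the raw header extraction produces:

```shell
# Demonstration: a token value with a trailing carriage return (as captured
# from the x-auth-token header) versus the same value after tr -d '\r'.
RAW=$'sampletoken\r'                      # 12 characters, ends in \r
CLEAN=$(printf '%s' "$RAW" | tr -d '\r')  # 11 characters, \r removed
echo "${#RAW} ${#CLEAN}"                  # prints "12 11"
```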
Adding an S3 connection gives a false negative
- Problem statement
- When a connection of type S3 is added through the IBM Data Cataloging user interface, an undefined error message is displayed.
- Resolution
- Refreshing the browser removes the error message, and the connections table shows that the S3 connection was created successfully.
Scale datamover AFM and ILM capabilities not working properly due to a misleading SDK error when deploying an application
- Problem statement
- When an application is deployed, the IBM Data Cataloging service pods scaleafmdatamover and scaleilmdatamover might show an error in their logs. For example:
2023-07-20 02:51:54,311 - ibm_spectrum_discover_application_sdk.ApplicationLib - INFO - Invoking conn manager at http://172.30.255.202:80/connmgr/v1/internal/connections
Traceback (most recent call last):
  File "/application/ScaleAFMDataMover.py", line 1023, in <module>
    APPLICATION = ScaleAFMApplicationBase(REGISTRATION_INFO)
  File "/application/ScaleAFMDataMover.py", line 112, in __init__
    self.conn_details = self.get_connection_details()
  File "/usr/local/lib/python3.9/site-packages/ibm_spectrum_discover_application_sdk/ApplicationLib.py", line 492, in get_connection_details
    raise Exception(err)
UnboundLocalError: local variable 'err' referenced before assignment
2023-07-20 02:51:54,367 INFO exited: scaleafm-datamover (exit status 1; not expected)
2023-07-20 02:51:55,368 INFO gave up: scaleafm-datamover entered FATAL state, too many start retries too quickly
- Cause
- An SDK bug when deploying applications on DCS causes the deployed applications not to behave properly and the pods, in an incorrect state, to show errors.
- Resolution
- After you identify this behavior, follow these steps to resolve the issue:
- Verify that the connmgr API is running and accessible through HTTP (a curl to the connmgr service is enough).
- Delete the application pod so that it is redeployed.
Policies are not finished, resulting in a hanging state
- Problem statement
- Policies do not finish, which results in a hanging state.
- Cause
- Inconsistent behavior in policies results in a missing finish status. The issue is still under investigation.
- Resolution
- Identify the policy engine pod and delete it; OpenShift Container Platform creates another pod automatically, and policies are executed and finished properly after the pod is re-created. Example:
oc -n ibm-data-cataloging delete pod -l role=policyengine
IBM Data Cataloging service goes into a degraded state after IBM Fusion HCI rack restart
- Problem statement
- The IBM Data Cataloging service goes into a degraded state after some of the nodes are restarted or after an IBM Fusion HCI rack restart. Several pods stay pending with errors such as: Unable to attach or mount volumes: unmounted volumes=[spectrum-discover-db2wh], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition and MountVolume.SetUp failed for volume "xxx" : rpc error: code = Internal desc = staging path yyy for volume zzz is not a mountpoint.
- Resolution
- Follow these steps to resolve the issue:
- Run the following commands to mark each compute node as unschedulable and drain it, one node at a time:
oc adm cordon worker4.fusion-test-zlinux.cp.fyre.ibm.com
oc adm drain worker4.fusion-test-zlinux.cp.fyre.ibm.com --ignore-daemonsets --force --delete-emptydir-data
- After the node is drained, ensure that it is schedulable again, and then proceed to the next node with the same process. Draining removes stale directory entries that are detected as mount points on the nodes.
- The issue resolves automatically, and the IBM Data Cataloging service is in a healthy state after all the nodes are back up.
Db2 license does not display correctly on an upgraded setup
- Problem statement
- The Db2 license displays incorrectly on an upgraded IBM Data Cataloging service setup.
- Resolution
- Follow these steps to resolve the issue:
- Run the following command to switch to the IBM Data Cataloging project so that subsequent commands are scoped to it:
oc project ibm-data-cataloging
- For the new license to take effect, delete the Db2 engine pods for the Db2uCluster or Db2uInstance:
oc delete $(oc get po -l type=engine,formation_id=isd -oname)
- After the new Db2 pod is ready, verify the updated Db2 license:
oc exec -it c-isd-db2u-0 -- su - db2inst1 -c "db2licm -l"
For more information about the Db2 Community Edition license certificate key, see Upgrading your Db2 Community Edition license certificate key.
ConstraintsNotSatisfiable for db2u-operator
- Problem statement
- The Db2 version used by IBM Data Cataloging service 2.1.6 was removed from the latest version of the IBM Operator Catalog.
- Resolution
- Follow these steps to resolve the issue:
- Add or update the IBM Catalog Source object ibm-operator-catalog in the openshift-marketplace namespace during the installation or upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog-old
  namespace: openshift-marketplace
spec:
  displayName: IBM Operator Catalog
  publisher: IBM
  sourceType: grpc
  image: icr.io/cpopen/ibm-operator-catalog@sha256:c2538264cb1882b1c98fea5ef162f198ce38ed8c940e82e3b9db458a9a46cb15
  updateStrategy:
    registryPoll:
      interval: 45m
- Update the ibm-operator-catalog source with the latest tag after the installation or upgrade is completed:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog-old
  namespace: openshift-marketplace
spec:
  displayName: IBM Operator Catalog
  publisher: IBM
  sourceType: grpc
  image: icr.io/cpopen/ibm-operator-catalog:latest
  updateStrategy:
    registryPoll:
      interval: 45m
ConstraintsNotSatisfiable for db2u-operator
- Problem statement
- The Db2 version used by IBM Data Cataloging service 2.1.6 was removed from the latest version of the IBM Operator Catalog.
- Resolution
- Follow these steps to resolve the issue:
- Add or update the IBM Catalog Source object ibm-operator-catalog in the openshift-marketplace namespace during the installation or upgrade:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog-old
  namespace: openshift-marketplace
spec:
  displayName: IBM Operator Catalog
  publisher: IBM
  sourceType: grpc
  image: icr.io/cpopen/ibm-operator-catalog@sha256:5d606e4eb2b875e0b975f892e80343105ea5fb0d67f96e1400d77a715f6df72a
  updateStrategy:
    registryPoll:
      interval: 45m
- Update the ibm-operator-catalog source with the latest tag after the installation or upgrade is completed:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog-old
  namespace: openshift-marketplace
spec:
  displayName: IBM Operator Catalog
  publisher: IBM
  sourceType: grpc
  image: icr.io/cpopen/ibm-operator-catalog:latest
  updateStrategy:
    registryPoll:
      interval: 45m