Data Cataloging service issues

Use this troubleshooting information to resolve installation and upgrade problems that are related to the Data Cataloging service.

Data Cataloging upgrade is stuck for more than 1 hour

Problem statement
Data Cataloging upgrade is stuck for more than 1 hour and the progress shows 15%.
Symptoms
The Data Cataloging upgrade runs for more than 1 hour and gets stuck at 15%. On the OpenShift® Console, the Installed Operators page for the ibm-data-cataloging namespace shows only the Data Cataloging operator, and not the IBM Db2 and AMQ Streams operators.

Also, the Subscription for the Data Cataloging operator shows the InstallPlanPending state with reason RequiresApproval.

Resolution
  1. In the OpenShift console, go to Operators > Installed Operators.
  2. Select the IBM Storage Discover operator.
  3. Go to the Subscription tab, and select the InstallPlan.
  4. Approve the InstallPlan.
    Alternatively, use the OpenShift CLI as follows:
    1. Run the following command to get the InstallPlan that is not approved.
      oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}'
    2. Run the following command to approve the InstallPlan.
      oc patch installplan $(oc get ip -n ibm-data-cataloging -o=jsonpath='{.items[?(@.spec.approved==false)].metadata.name}') -n ibm-data-cataloging --type merge --patch '{"spec":{"approved":true}}'
  5. Wait for the upgrade process to complete.
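The jsonpath expression in the CLI steps selects InstallPlans whose spec.approved field is false. As a sketch, the same selection can be rehearsed offline with jq against a mock InstallPlan list (hypothetical plan names; no cluster required):

```shell
# Mock of `oc get ip -o json`: one approved plan, one awaiting approval
cat > /tmp/installplans.json << 'EOF'
{"items":[
  {"metadata":{"name":"install-aaaaa"},"spec":{"approved":true}},
  {"metadata":{"name":"install-bbbbb"},"spec":{"approved":false}}
]}
EOF

# Equivalent of the jsonpath filter: names of unapproved InstallPlans
jq -r '.items[] | select(.spec.approved == false) | .metadata.name' /tmp/installplans.json
# → install-bbbbb
```

The name that this prints is the InstallPlan that the oc patch command then approves.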

Data Cataloging import service pod in CrashLoopBackOff state

Diagnosis
  1. Check for CrashLoopBackOff on the import service pod.
    oc -n ibm-data-cataloging get pod -l role=import-service
    
  2. Confirm that the logs show a permission denied error:
    oc -n ibm-data-cataloging logs -l role=import-service
Resolution
  1. Debug the import service pod as the root user.
    oc -n ibm-data-cataloging debug deployment/isd-import-service --as-user=0
  2. Update the directory permissions.
    chmod 775 /uploads
    mkdir -p /uploads/failed_requests
    chmod 775 /uploads/failed_requests
    exit
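As a sketch of what those chmod and mkdir commands do, the same permission fix can be rehearsed locally, with a temporary directory standing in for the pod's /uploads path (assumption: only the mode bits matter here, not ownership):

```shell
# Temporary directory standing in for /uploads inside the debug pod
UPLOADS=$(mktemp -d)

# Same fix as in the debug pod: group-writable upload directories
chmod 775 "$UPLOADS"
mkdir -p "$UPLOADS/failed_requests"
chmod 775 "$UPLOADS/failed_requests"

# Both directories now report mode 775 (rwxrwxr-x)
stat -c '%a' "$UPLOADS" "$UPLOADS/failed_requests"
```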

Data Cataloging database schema job is not in a completed state during installation or upgrade

Note: Use this procedure to recover Db2 when it becomes unavailable after the service installation, leaving the Data Cataloging service in a degraded state.
Symptoms
The isd-db2whrest or isd-db-schema pods report a not ready or error state.

Run the following command to view the logs of both pods:

oc -n ibm-data-cataloging logs -l 'role in (db2whrest, db-schema)' --tail=200

Go through the logs to check whether errors such as the following exist:

Waiting on c-isd-db2u-engn-svc port 50001...

db2whconn - ERROR - [FAILED]: [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032

Connection refused
Resolution
  1. Restart Db2:
    
    oc -n ibm-data-cataloging rsh c-isd-db2u-0
    sudo wvcli system disable -m "Disable HA before Db2 maintenance"
    su - ${DB2INSTANCE}
    db2stop
    db2start
    db2 activate db BLUDB
    exit
    sudo wvcli system enable -m "Enable HA after Db2 maintenance"
  2. Confirm that Db2 HA-monitoring is active:
    
    sudo wvcli system status
    exit
    
  3. Determine whether the problem occurred during installation or upgrade, or after installation.
  4. If this occurs during an upgrade or installation, recreate the isd-db-schema job and monitor the pod until it gets to a completed state.
    SCHEMA_OLD="isd-db-schema-old.json"
    SCHEMA_NEW="isd-db-schema-new.json"
    oc -n ibm-data-cataloging get job isd-db-schema -o json > $SCHEMA_OLD
    jq 'del(.spec.template.metadata.labels."controller-uid") | del(.spec.selector) | del(.status)' $SCHEMA_OLD > $SCHEMA_NEW
    oc -n ibm-data-cataloging delete job isd-db-schema
    oc -n ibm-data-cataloging apply -f $SCHEMA_NEW
    
    oc -n ibm-data-cataloging get pod | grep isd-db-schema 
    
  5. If the problem occurred after installation, restart db2whrest.
    oc -n ibm-data-cataloging delete pod -l role=db2whrest
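The jq filter in step 4 strips the fields that tie a completed Job to its previous run so that the manifest can be re-applied. A minimal offline demonstration with a mock Job manifest (hypothetical values; the real input comes from oc get job -o json):

```shell
# Mock of `oc get job isd-db-schema -o json`
cat > /tmp/isd-db-schema-old.json << 'EOF'
{"apiVersion":"batch/v1","kind":"Job",
 "metadata":{"name":"isd-db-schema"},
 "spec":{"selector":{"matchLabels":{"controller-uid":"1234"}},
         "template":{"metadata":{"labels":{"controller-uid":"1234","job-name":"isd-db-schema"}}}},
 "status":{"succeeded":1}}
EOF

# Drop the run-specific selector, controller-uid label, and status,
# leaving a manifest that can be deleted and re-applied cleanly
jq 'del(.spec.template.metadata.labels."controller-uid") | del(.spec.selector) | del(.status)' \
  /tmp/isd-db-schema-old.json > /tmp/isd-db-schema-new.json
```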
    

Data Cataloging installation is stuck for more than 1 hour

Note: This procedure must be used only during installation problems, not for upgrades or any subsequent issues.
Symptoms
The Data Cataloging installation runs for more than an hour and is stuck between 35% and 80% (both inclusive).
Resolution
  1. Run the following command to scale down the operator.
    oc -n ibm-data-cataloging scale --replicas=0 deployment/spectrum-discover-operator
    
  2. Run the following command to scale down workloads.
    oc -n ibm-data-cataloging scale --replicas=0 deployment,statefulset -l component=discover
    
  3. Run the following command to remove the DB schema job if present.
    oc -n ibm-data-cataloging delete job isd-db-schema --ignore-not-found
    
  4. Run the following commands to delete the Db2 instance and the password secret.
    oc -n ibm-data-cataloging delete db2u isd
    oc -n ibm-data-cataloging delete secret c-isd-ldapblueadminpassword --ignore-not-found
  5. Wait until the Db2 pods and persistent volume claims are removed. The following command returns no output when the removal is complete.
    oc -n ibm-data-cataloging get pod,pvc -o name | grep c-isd
    
  6. Run the following command to scale up the operator.
    oc -n ibm-data-cataloging scale --replicas=1 deployment/spectrum-discover-operator
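Step 5 is a polling wait: the grep must eventually return nothing. A small helper sketches that wait; it is demonstrated here against a local file rather than a live cluster:

```shell
# Poll until the given command prints no output, or give up after N tries
wait_until_empty() {
  local tries=$1; shift
  local i
  for i in $(seq "$tries"); do
    [ -z "$("$@" 2>/dev/null)" ] && return 0
    sleep 1
  done
  return 1
}

# Local demonstration: a file stands in for the c-isd resource listing;
# a background task "deletes" the resources after 2 seconds
echo "pod/c-isd-db2u-0" > /tmp/c-isd-resources
( sleep 2; : > /tmp/c-isd-resources ) &
wait_until_empty 15 cat /tmp/c-isd-resources && echo "c-isd resources removed"
```

Against a real cluster, the command would be a wrapper such as sh -c 'oc -n ibm-data-cataloging get pod,pvc -o name | grep c-isd', because the helper takes a single command rather than a pipeline.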
    

Data Cataloging service is not installed successfully on IBM Storage Fusion HCI System with GPU nodes

Problem statement
The Data Cataloging service remains in the installing state for hours.
Resolution
To resolve the problem, do the following steps:
  1. Patch the FusionServiceDefinition (FSD) with a node affinity rule so that the Data Cataloging (isd) workloads are not scheduled on the GPU nodes:
    oc -n <Fusion_namespace> patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition  --patch "$(cat fsd_dcs_patch.yaml)" 
    First, create the fsd_dcs_patch.yaml file as follows:
    
    cat > fsd_dcs_patch.yaml << EOF
    
    apiVersion: service.isf.ibm.com/v1
    kind: FusionServiceDefinition
    metadata:
      name: data-cataloging-service-definition
      namespace: <Fusion_namespace>
    spec:
      onboarding:
        parameters:
          - dataType: string
            defaultValue: ibm-data-cataloging
            descriptionCode: BMYSRV00003
            displayNameCode: BMYSRV00004
            name: namespace
            required: true
            userInterface: false
          - dataType: storageClass
            defaultValue: ''
            descriptionCode: BMYDC0300
            displayNameCode: BMYDC0301
            name: rwx_storage_class
            required: true
            userInterface: true
          - dataType: bool
            defaultValue: 'true'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: doInstall
            required: true
            userInterface: false
          - dataType: json
            defaultValue: '{"accept": true}'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: license
            required: true
            userInterface: false
          - dataType: json
            defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"nvidia.com/gpu","operator":"DoesNotExist"}]}]}}}'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: affinity
            required: true
            userInterface: false
        
    EOF
    
    If the output shows the error message Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml, then follow these steps to resolve the issue:
    1. Go to OpenShift Container Platform web console.
    2. Go to Operators > Installed Operators tab, under <Fusion_namespace>, select the IBM Storage Fusion operator.
    3. Select the IBM Storage Fusion service instance tab, and select the data-cataloging-service-instance.
    4. Select the YAML tab, and edit the YAML file for the data-cataloging-service-instance. Under spec.onboarding.parameters, ensure that you add the following lines.
      - dataType: json
        defaultValue: '{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"isf.ibm.com/nodeType","operator":"NotIn","values":["gpu"]}]}]}}}'
        descriptionCode: descriptionCode
        displayNameCode: displayNameCode
        name: affinity
        required: true
        userInterface: false
  2. Display the patched FSD:
    
    oc -n <Fusion_namespace> get fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition -o yaml
  3. Install from the user interface.
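Because the affinity defaultValue is a JSON string embedded in YAML, a quoting mistake silently produces an invalid parameter. As a sketch, a jq check confirms that the string parses and shows the match expression it encodes (standalone check; the value mirrors the GPU-exclusion snippet above):

```shell
AFFINITY='{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"isf.ibm.com/nodeType","operator":"NotIn","values":["gpu"]}]}]}}}'

# jq -e exits non-zero if the string is not valid JSON;
# on success it prints the match expression the rule encodes
echo "$AFFINITY" | jq -e '.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0]'
```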

Data Cataloging database pods stuck during initialization phase

Symptoms
Multiple Db2 pods attempt to use host port 5002, which can cause the pods to remain in the initialization phase.
Resolution
  1. Uninstall the Data Cataloging service. For procedure, see Uninstalling Data Cataloging.
  2. Create a file with the Data Cataloging FusionServiceDefinition patch.
    cat >> fsd_dcs_patch.yaml << EOF
    apiVersion: service.isf.ibm.com/v1
    kind: FusionServiceDefinition
    metadata:
      name: data-cataloging-service-definition
      namespace: ibm-spectrum-fusion-ns
    spec:
      onboarding:
        parameters:
          - dataType: string
            defaultValue: ibm-data-cataloging
            descriptionCode: BMYSRV00003
            displayNameCode: BMYSRV00004
            name: namespace
            required: true
            userInterface: false
          - dataType: storageClass
            defaultValue: ''
            descriptionCode: BMYDC0300
            displayNameCode: BMYDC0301
            name: rwx_storage_class
            required: true
            userInterface: true
          - dataType: bool
            defaultValue: 'true'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: doInstall
            required: true
            userInterface: false
          - dataType: json
            defaultValue: '{"accept": true}'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: license
            required: true
            userInterface: false
          - dataType: json
            defaultValue: '{"size":1,"mln":2,"storage":{"activelogs":{"requests":"300Gi"},"data":{"requests":"600Gi"},"meta":{"requests":"100Gi"},"tempts":{"requests":"100Gi"}}}'
            descriptionCode: descriptionCode
            displayNameCode: displayNameCode
            name: dbwh
            required: true
            userInterface: false
    EOF
    
  3. Apply the patch to downsize the Db2 cluster.
    oc -n ibm-spectrum-fusion-ns patch fusionservicedefinitions.service.isf.ibm.com data-cataloging-service-definition --type=merge --patch-file fsd_dcs_patch.yaml
    
  4. Install Data Cataloging service from the IBM Storage Fusion user interface. For procedure, see Data Cataloging.
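The dbwh defaultValue embeds a JSON string inside YAML, so it is worth validating before the patch is applied. A quick jq check (hypothetical sizing values, one requests entry per storage area) confirms that it parses and lists the storage areas it defines:

```shell
DBWH='{"size":1,"mln":2,"storage":{"activelogs":{"requests":"300Gi"},"data":{"requests":"600Gi"},"meta":{"requests":"100Gi"},"tempts":{"requests":"100Gi"}}}'

# jq -e exits non-zero on invalid JSON; print the storage areas it defines
echo "$DBWH" | jq -e '.storage | keys'
# → ["activelogs","data","meta","tempts"]
```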