Troubleshooting
Problem
Cloud Pak for Data is a microservices-based architecture in which multiple containers contribute to a given feature of the user interface. It is often unclear which pods provide which service, and this document aims to build such a list. The list is by no means exhaustive, because inter-pod and inter-container communication can run 5-10 pods deep to provide a single feature on the UI.
Tools like Istio should, however, be used to get more detailed insight into such inter-pod communication.
Symptom
The following errors indicate that some of the pods are not healthy or are having trouble returning proper responses:
1. 504 Gateway Timeout
2. 502 Bad Gateway
3. 404 Page not found
4. 500 Internal Server Error
5. Any other error on the UI that indicates that a service is not available
Diagnosing The Problem
Refer to the list of pods below that contribute to services on the user interface. The general technique is to check each pod's readiness, health, and logs. This can be done by a system administrator who has access to an authenticated oc client that points to the CPD project/namespace.
1. Get the actual pod name from the table below and review its status
oc get pods | grep <name>
NAME READY STATUS RESTARTS AGE
xxx 1/2 Running 0 5d
yyy 1/1 Running 23 98d
zzz 0/1 CrashLoopBackOff 2 98d
Ensure that the pod is healthy and ready. In the above example:
xxx pod has two containers and one of them is not ready
yyy pod is ready but has restarted many times
zzz pod is crashing
2. Check the pod description to see the list of events, which appears towards the end. This provides useful information about the lifecycle of the pod and any errors during the critical phases of scheduling and starting it.
oc describe pod xxx
3. Check the logs of the pod
oc logs xxx
Handy variations of the above commands:
oc logs xxx > xxx.log (dump the logs into a file, good for sending to support)
oc logs xxx -f (follow the log as it is being generated, handy when you want to troubleshoot and monitor the logs as you run the use case)
oc logs xxx --tail=100 (show the last 100 lines)
oc logs xxx -f --tail=100 (show the last 100 lines and follow the log as it is being generated)
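The three steps above can be scripted. The following is a minimal sketch, not an official tool: the function names, the default restart threshold of 10, and the output directory name are illustrative assumptions. It relies only on the standard oc subcommands get, describe and logs, plus the -c and --previous options of oc logs.

```shell
# Sketch only: helper names and the restart threshold are assumptions.

# Read `oc get pods` output on stdin and print the pods that look unhealthy:
# a status other than Running/Completed, not all containers ready,
# or more restarts than the given threshold (default 10).
flag_unhealthy_pods() {
  local max_restarts="${1:-10}"
  awk -v max="$max_restarts" '
    NR == 1 && $1 == "NAME" { next }     # skip the header row
    {
      split($2, ready, "/")               # READY column, for example 1/2
      reason = ""
      if ($3 != "Running" && $3 != "Completed") reason = "status " $3
      else if ($3 == "Running" && ready[1] != ready[2]) reason = "ready " $2
      else if ($4 + 0 > max) reason = "restarts " $4
      if (reason != "") print $1 ": " reason
    }'
}

# Save the description and the logs of one pod into a directory,
# for example to attach them to a support case.
collect_pod_diagnostics() {
  local pod="$1" out_dir="${2:-pod-diagnostics}" container
  mkdir -p "$out_dir"
  oc describe pod "$pod" > "$out_dir/$pod.describe.txt"
  # A pod may have several containers, so fetch the logs per container.
  for container in $(oc get pod "$pod" -o jsonpath='{.spec.containers[*].name}'); do
    oc logs "$pod" -c "$container" > "$out_dir/$pod.$container.log"
    # --previous returns the logs of the last terminated instance, which helps
    # with restarting or crashing pods; it fails if there is no such instance.
    oc logs "$pod" -c "$container" --previous \
      > "$out_dir/$pod.$container.previous.log" 2>/dev/null || true
  done
}

# Usage (against the current project/namespace):
#   oc get pods | flag_unhealthy_pods
#   collect_pod_diagnostics xxx
```

Run against the sample output in step 1, the filter would report xxx for readiness, yyy for restarts, and zzz for its status. Adjust the threshold by passing a number, for example flag_unhealthy_pods 5.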
List of Pods
Feature Area | Relevant Pods | Additional Comments |
---|---|---|
Data source connection | wdp-connect-connection, wdp-connect-connector | |
Data Profiling in WKC Catalog | wdp-activities, dataconn-engine-spark-cluster, dataconn-engine-service, wdp-profiling-messaging, wdp-profiling, wdp-profiling-ui, wdp-couchdb-[0-2] | |
WKC Catalog | portal-main, redis-ha-server-0, redis-ha-server-1, portal-common-api, catalog-api, wdp-couchdb-[0-2], wdp-activities, dc-main | |
Left Pane | zen-data-sorcerer, zen-core-api | |
Data Discovery (Auto Discovery/Quick Scan) | iis-xmetarepo, is-en-conductor-0, iis-services, kafka-0, zookeeper-0, solr-0 | |
Data Discovery (Quick Scan) | omag, odf-fast-analyzer, iis-services, iis-xmetarepo, is-en-conductor-0 | |
ia-analysis | | |
DataStage | is-en-conductor-0, iis-services | |
WKC Search | elasticsearch-master | |
IGC Search | gov-enterprise-search-ui, gov-catalog-search-service, solr-0 | |
Term assignments in Discovery | | |
IGC to WKC Catalog Sync | | |
WKC Workflow - Draft and Publish | wkc-glossary-service | |
Spawning Jupyter Notebooks Env | spawner-api, notebook pods (oc get pods \| grep jupyter, oc get pods -l app=ws-notebooks-ui) | |
GitHub integration | asset-files-api, portal-main | |
User Management / LDAP | zen-metastoredb, zen-core, usermgmt | |
cpd-install-operator | | |
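To review all pods behind one feature area at once, the name fragments from a table row can be passed to a small helper. This is a hedged sketch: feature_pods is a hypothetical function name, and it assumes that the actual pod names contain the fragments listed in the table (actual names usually carry generated suffixes).

```shell
# Sketch only: feature_pods is a hypothetical helper name.
# List the pods whose names match any of the given fragments, for example
# the "Relevant Pods" entries of one row in the table.
feature_pods() {
  local pattern
  [ "$#" -gt 0 ] || return 2            # require at least one fragment
  # Join the arguments into one extended regular expression: a|b|c
  pattern=$(printf '%s|' "$@")
  pattern=${pattern%|}
  oc get pods --no-headers | grep -E -- "$pattern"
}

# Usage, for the "WKC Catalog" row (quote entries that contain brackets):
#   feature_pods portal-main redis-ha-server portal-common-api \
#     catalog-api 'wdp-couchdb-[0-2]' wdp-activities dc-main
```

An entry such as wdp-couchdb-[0-2] works unchanged because it is a valid regular expression that matches wdp-couchdb-0, wdp-couchdb-1 and wdp-couchdb-2.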
Document Location
Worldwide
Business Unit: Cloud & Data Platform
Product: IBM Cloud Pak for Data
Category: Troubleshooting
Platform: Platform Independent
Version: 2.5.0; 3.0.0
Line of Business: Data and AI
Product Synonym
ICPD;CPD;Cloud Pak Data;
Document Information
Modified date:
21 August 2020
UID
ibm16204935