Kubernetes data collector issues
Use this topic to review possible causes and solutions to Ansible or Kubernetes data collector installation issues, as well as data retrieval issues.
- Installing klusterlet with Helm
- Ansible install
- Kubernetes data collector install
- Dashboard pages show no data
Installing klusterlet with Helm
Problem - Installing klusterlet with Helm
You installed the klusterlet with Helm and accidentally deleted the Helm release. Now you cannot redeploy the klusterlet because a custom resource definition, for example k8sdcs.ibmcloudappmgmt.com, already exists.
When you try to delete the custom resource definition, the deletion hangs.
You get an Internal service error : rpc error: code = Unknown desc = object is being deleted: customresourcedefinitions.apiextensions.k8s.io "k8sdcs.ibmcloudappmgmt.com" already exists message.
Cause - Installing klusterlet with Helm
This error occurs when you attempt the klusterlet install but a custom resource definition from a previous install still exists, and its pending finalizers prevent you from deleting it.
Solution - Installing klusterlet with Helm
Check if the custom resource exists by running the command:
kubectl get K8sDC -n multicluster-endpoint
If it does exist, first patch it using the command:
kubectl patch k8sdcs.ibmcloudappmgmt.com Your_CR_name -n multicluster-endpoint -p '{"metadata":{"finalizers":[]}}' --type=merge
You can then delete the custom resource definition, and reinstall the klusterlet with Helm.
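The check, patch, and delete sequence can be sketched as a short shell script. The custom resource name my-k8sdc is a placeholder; substitute the name reported by the get command:

```shell
#!/bin/sh
# Sketch of the recovery sequence; adjust names for your cluster.
NS=multicluster-endpoint
CR_TYPE=k8sdcs.ibmcloudappmgmt.com
CR_NAME=my-k8sdc   # placeholder; use your actual custom resource name

if command -v kubectl >/dev/null 2>&1; then
  # 1. Check whether the custom resource exists.
  kubectl get K8sDC -n "$NS"
  # 2. Clear the finalizers that make the deletion hang.
  kubectl patch "$CR_TYPE" "$CR_NAME" -n "$NS" \
    --type=merge -p '{"metadata":{"finalizers":[]}}'
  # 3. The custom resource definition can now be deleted.
  kubectl delete crd "$CR_TYPE"
else
  echo "kubectl not found; commands shown for reference only"
fi
echo "Reinstall the klusterlet with Helm once the CRD is gone."
```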
Ansible install
Problem - Ansible install
You get a dc-secret already exists or ibm-agent-https-secret already exists message.
Cause - Ansible install
This error occurs when you attempt to perform the Ansible install in a namespace where another install was already performed.
Solution - Ansible install
If you do not have any running Kubernetes data collector releases in this namespace, delete the existing secret with the following command and run the script again:
kubectl -n myNamespace delete secret dc-secret
or
kubectl -n myNamespace delete secret ibm-agent-https-secret
If a running Kubernetes data collector release already exists in this namespace, you need to either remove that release using the procedure in Uninstalling the Kubernetes data collector and then re-install the data collector, or install this second release into a different namespace.
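A minimal sketch of the cleanup, assuming myNamespace stands in for your namespace and that you first confirm no data collector release is running there:

```shell
#!/bin/sh
# Sketch: remove leftover secrets when no data collector release
# is running in the namespace. "myNamespace" is a placeholder.
NS=myNamespace

if command -v kubectl >/dev/null 2>&1; then
  # List what is running so you can confirm no release is active.
  kubectl get pods -n "$NS"
  # Delete whichever leftover secret the error message named.
  kubectl -n "$NS" delete secret dc-secret
  kubectl -n "$NS" delete secret ibm-agent-https-secret
else
  echo "kubectl not found; commands shown for reference only"
fi
echo "Re-run the install script after the stale secret is removed."
```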
Kubernetes data collector install
Problem - Kubernetes data collector install
Instead of success initialization indicators in the installation logs (see Checking the Kubernetes installation logs), you get warning messages.
Cause - Kubernetes data collector install
Potential issues that might be the cause:
- Proper ingresses aren't configured on the backend server
- HTTPS is not enabled on the backend server
- Invalid authentication provided
- Invalid configuration provided, such as the wrong tenantID or ingress (or ingresses). Ensure that you are directing data to the correct backend
- Backend services not yet ready
- Backend services struggling
- Unsuccessful collection cycle. This is not critical and could be due to unexpected data or backend services struggling. The data collector reinitializes the cache and tries again at the next interval. After 10 minutes of unsuccessful cycles, the pod recycles
Solution - Kubernetes data collector install
Review the logs to identify which of these indicators applies, then adjust the corresponding settings.
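Pulling the logs for review might look like the following sketch; the label selector app=k8sdc is an assumption, so substitute the labels your deployment actually uses:

```shell
#!/bin/sh
# Sketch: fetch recent logs from the data collector pod to look for
# the warning indicators listed above. The label "app=k8sdc" is an
# assumption; use the labels from your own deployment.
NS=myNamespace          # placeholder namespace
SELECTOR="app=k8sdc"    # assumed label selector

if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" get pods -l "$SELECTOR"
  kubectl -n "$NS" logs -l "$SELECTOR" --tail=200
else
  echo "kubectl not found; commands shown for reference only"
fi
echo "Look for repeated authentication or connection warnings."
```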
Dashboard pages show no data
Problem - Dashboard pages show no data
No metrics are displayed in the Kubernetes data collector dashboard pages.
Cause - Dashboard pages show no data
The local domain cannot be resolved. The K8Monitor component is unable to register dashboards when the IBM Cloud Private cluster doesn't resolve the master node on the DNS (Domain Name System) server.
You can check for an unresolved domain by running the nslookup command on the master node (for example, nslookup master-node.cn.ibm.com). A message that the server can't find the master-node.address confirms that the domain is unresolved.
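The DNS check can be scripted as follows; master-node.cn.ibm.com is the example host name from this topic, so replace it with your own master node:

```shell
#!/bin/sh
# Sketch: confirm whether the master node's domain resolves.
# "master-node.cn.ibm.com" is the example host from this topic.
HOST=master-node.cn.ibm.com

if command -v nslookup >/dev/null 2>&1; then
  if nslookup "$HOST"; then
    echo "Domain resolves; DNS is not the cause."
  else
    echo "Server can't find $HOST: domain is unresolved."
  fi
else
  echo "nslookup not found; run the check from the master node"
fi
```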
Solution - Dashboard pages show no data
1. Install and configure the NLnet Labs Unbound DNS resolver utility. For more information, see https://nlnetlabs.nl/documentation/unbound/.
2. Modify /etc/resolv.conf on all cluster VMs: add the Unbound server as the DNS server.
3. Replace /etc/resolv.conf on all monitored machines with the file that you modified in step 2.
4. Restart the cluster's kube-dns pod, which is responsible for DNS resolution (such as service name and domain name) in the container.
5. Restart the agent. For more information, see Using agent commands.
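The kube-dns restart can be sketched like this; the k8s-app=kube-dns label is the conventional kube-dns selector, so verify it against your cluster before deleting anything:

```shell
#!/bin/sh
# Sketch: restart the kube-dns pod so it picks up the new resolver.
# "k8s-app=kube-dns" is the usual kube-dns label; verify it first.
NS=kube-system
SELECTOR="k8s-app=kube-dns"

if command -v kubectl >/dev/null 2>&1; then
  # Deleting the pod causes its controller to recreate it.
  kubectl -n "$NS" delete pod -l "$SELECTOR"
  kubectl -n "$NS" get pods -l "$SELECTOR"
else
  echo "kubectl not found; commands shown for reference only"
fi
echo "After the pod restarts, restart the agent."
```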