Monitoring Kubernetes
- Supported versions
- Installing the Instana agent in Kubernetes
- Accessing Kubernetes information
- Kubernetes dashboards
- Analyze Kubernetes calls
- Analyze Kubernetes logs
- Linking Kubernetes services and logical services
- Sensor data collection
- Health rules
- Service meshes
- Troubleshooting / notes
Supported versions
- Instana supports the current stable version of Kubernetes. For example, if the current stable version of Kubernetes is 1.26, the Kubernetes sensor is pinned to version 1.24. According to the Kubernetes version compatibility guarantee, Instana then supports Kubernetes versions 1.22 to 1.26; however, the lowest two versions in that range are considered soft-deprecated.
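For reference, you can check which Kubernetes release your cluster runs with plain kubectl; the "Server Version" in the output is the value that must fall into the supported range:

# The "Server Version" line reports the cluster's Kubernetes release
kubectl version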
Supported managed Kubernetes services
- Amazon Elastic Container Service for Kubernetes (EKS)
- Azure Kubernetes Service (AKS)
- Google Kubernetes Engine (GKE)
- IBM Cloud Kubernetes Service
- VMware Tanzu Kubernetes Grid (TKG) and VMware Tanzu Kubernetes Grid Integration (TKGI), formerly known as Pivotal Container Service (PKS)
Note: Only Linux workers are supported. Workers that run on Windows are not supported.
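As a quick check, which is plain kubectl rather than an Instana feature, you can list the operating system of each worker node by showing the standard kubernetes.io/os node label:

# Each node should report "linux" in the OS column
kubectl get nodes -L kubernetes.io/os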
Supported service meshes
- Instana supports the last three stable versions of Istio.
Installing the Instana agent in Kubernetes
The Agent Setup for Kubernetes describes how to install the Instana agent into your cluster.
The installation of Instana agents on VMware Tanzu Kubernetes Grid is fully automated by the Instana Microservices Application Monitoring for VMware Tanzu tile.
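For orientation, an installation with the Instana Helm chart typically looks like the following sketch; the agent key, endpoint host, zone, and cluster name are placeholders you replace with your own values, and the Agent Setup for Kubernetes page remains the authoritative reference:

# Sketch of a Helm-based agent installation (replace the placeholder values)
helm install instana-agent \
  --repo https://agents.instana.io/helm \
  --namespace instana-agent \
  --create-namespace \
  --set agent.key=<your-agent-key> \
  --set agent.endpointHost=<your-instana-endpoint-host> \
  --set agent.endpointPort=443 \
  --set cluster.name=<your-cluster-name> \
  --set zone.name=<your-zone-name> \
  instana-agent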
Accessing Kubernetes information
Once the agent has been deployed to your cluster, the Kubernetes sensor will report detailed data about the cluster and the resources deployed into it.
Instana automatically discovers and monitors Kubernetes:
- Clusters
- CronJobs
- Nodes
- Namespaces
- Deployments
- DaemonSets
- StatefulSets
- Services
- Pods
Kubernetes information is easily accessible and deeply integrated in all aspects of your application.
From application perspectives
Kubernetes information is also accessible from within all your application perspectives or services. If a service is running on a Kubernetes cluster, the respective context information is shown in the "Infrastructure" tab:
For containers, the pod and namespace are shown and directly linked; for hosts, the cluster and node.
From infrastructure
In the Infrastructure map you will see Kubernetes information in the sidebar for either the host or the container you have selected.
You can use Dynamic Focus to filter the data, for example to search for a specific deployment in a cluster. Additionally, the keywords entity.kubernetes.cluster.distribution and entity.kubernetes.cluster.managedBy enable searching for a Kubernetes cluster by distribution and management layer. Supported values for entity.kubernetes.cluster.distribution are gke, eks, openshift, and kubernetes. Supported values for entity.kubernetes.cluster.managedBy are rancher and none.
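As an illustration, and assuming the usual key:value Dynamic Focus query syntax, a search that narrows the view to OpenShift clusters managed by Rancher could look like this:

entity.kubernetes.cluster.distribution:openshift AND entity.kubernetes.cluster.managedBy:rancher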
Kubernetes dashboards
Kubernetes dashboards present all information needed for a given Kubernetes entity. The context is always accessible via the context path at the top. The following screenshot shows a namespace named "robot-shop" in a cluster called "will-k8s-cluster".
The different dashboards are always structured in the same way:
- Summary shows the most relevant information for a given entity. This dashboard starts with a status line that shows the current status and related information such as age. The next section shows the CPU, memory, and pod information, which covers the consumed resources including the pods. Sections like Top Deployments and Top Pods in the following screenshot show potential hotspots that you might want to look at. The Logs section shows the distribution chart of relevant logs for that entity, which complements the entity metrics. The chart is interactive and lets you select and highlight a range across all the measured values. You can focus on the selected time period or jump to the Analyze section to continue a troubleshooting journey.
- Details shows detailed information such as "labels", "annotations", and the "spec".
- Events shows all relevant Kubernetes events and links them to the respective dashboards.
- Related Entities such as "Deployments", "K8s Services", and "Pods" are shown in the following tabs. What is shown depends on the entity you have selected.
CPU and memory usage
For Kubernetes pods, deployments, services, namespaces, and nodes, you can see an aggregated view of the current CPU and memory usage compared with the CPU and memory limits and requests set for these resources.
If available, the usage information is calculated from data gathered from the container runtime that runs the containers that make up the given resource.
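The limits and requests that these views compare against come from the standard Kubernetes resource fields on each container; a minimal pod sketch with such settings might look like this (the name, image, and values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: example-pod               # placeholder name
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          cpu: 250m               # contributes to the aggregated "CPU Requests"
          memory: 256Mi           # contributes to the aggregated "Memory Requests"
        limits:
          cpu: 500m               # contributes to the aggregated "CPU Limits"
          memory: 512Mi           # contributes to the aggregated "Memory Limits"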
Analyze Kubernetes calls
Unbounded Analytics gives you powerful tools to slice and dice every single call in your Kubernetes cluster. If you click the Analyze Calls button on a Kubernetes dashboard, the appropriate filter and grouping are already set. In this case, you can see all calls in the "robot-shop" namespace grouped by pods:
Analyze Kubernetes logs
Unbounded Analytics gives you powerful tools to slice and dice every single log in your Kubernetes cluster. If you click the Analyze Logs button from a Kubernetes dashboard, the appropriate filtering is already set. In this case, you can see all logs in the robot-shop namespace as follows:
To provide relevant information without changing context, Instana enriches log messages with infrastructure and Kubernetes metadata, which are displayed in the tag table after the log message is expanded. See the following tag table:
Linking Kubernetes services and logical services
Single Kubernetes service to multiple logical services
Multiple logical services can be related to a single Kubernetes service when the service mapping rules match up and there are calls generated on that Kubernetes service. For example, a Kubernetes service with the label selector "service=my-service" may contain pods that have the additional labels "env=dev" and "env=staging". Combined with a custom service mapping configuration in Instana that uses the tags kubernetes.container.name and kubernetes.pod.label, key: env, this results in multiple logical services linked to that single Kubernetes service, which is displayed on the Kubernetes Service dashboard.
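To make the example above concrete, here is a minimal sketch of Kubernetes objects that would produce this situation; the object names and images are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    service: my-service          # matches both pods below
  ports:
    - port: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: my-service-dev           # placeholder pod name
  labels:
    service: my-service
    env: dev                     # mapped to its own logical service via kubernetes.pod.label (key: env)
spec:
  containers:
    - name: app
      image: example/app:latest
---
apiVersion: v1
kind: Pod
metadata:
  name: my-service-staging       # placeholder pod name
  labels:
    service: my-service
    env: staging                 # mapped to a second logical service
spec:
  containers:
    - name: app
      image: example/app:latest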
Single logical service to multiple Kubernetes services
Multiple Kubernetes services can be related to a single logical service when those Kubernetes services are destroyed and recreated over time. For example, if the Kubernetes service shop-service-a with generated calls is replaced over time with shop-service-b with generated calls, both services are displayed on the logical service dashboard when the selected time range overlaps the periods in which their calls were generated.
Sensor data collection
Instana collects information about the following Kubernetes entities: Cluster, CronJob, DaemonSet, Deployment, Job, K8s Service, Namespace, Node, Pod, and StatefulSet.
Cluster
Metric | Description |
---|---|
Pods Allocation | Ratio of allocated pods to pods capacity |
CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
Memory Requests Allocation | Ratio of memory requests to memory capacity |
Memory Limits Allocation | Ratio of memory limits to memory capacity |
CPU Requests | Aggregated CPU requests of all running containers |
CPU Limits | Aggregated CPU limits of all running containers |
CPU Capacity | Aggregated CPU capacity of all nodes |
Memory Requests | Aggregated memory requests of all running containers |
Memory Limits | Aggregated memory limits of all running containers |
Memory Capacity | Aggregated memory capacity of all nodes |
Running Pods | Count of all running pods in this cluster |
Pending Pods | Count of all pending pods in this cluster |
Allocated Pods | Count of all allocated pods in this cluster |
Pods Capacity | Aggregated pods capacity of all nodes |
Out Of Disk Nodes | Count of out of disk nodes in this cluster |
Memory Pressure Nodes | Count of memory pressure nodes in this cluster |
Disk Pressure Nodes | Count of disk pressure nodes in this cluster |
Kubelet Not Ready Nodes | Count of kubelet not ready nodes in this cluster |
Available Replicas | Available replicas from all deployments |
Desired Replicas | Desired replicas from all deployments |
Nodes Count | Number of nodes in this cluster |
CronJob
Metric | Description |
---|---|
Last Job Duration | Duration of last job run |
Active Jobs | Number of active jobs |
Time To Last Scheduled Job | How long ago a job for this cronjob was scheduled |
DaemonSet
Metric | Description |
---|---|
Available Replicas | Count of available replicas |
Desired Replicas | Count of desired replicas |
Unavailable Replicas | Count of unavailable replicas |
Misscheduled Replicas | Count of misscheduled replicas |
Available to Desired Replica Ratio | Ratio of available to desired replicas |
Deployment
Metric | Description |
---|---|
Available Replicas | Count of available replicas |
Desired Replicas | Count of desired replicas |
Available to Desired Replica Ratio | Ratio of available to desired replicas |
Pending Pods | Count of pending pods |
Unscheduled Pods | Count of unscheduled pods |
Unready Pods | Count of unready pods |
Pending Phase Duration | Duration of pending phase |
Pods Count | Number of pods for this deployment |
Memory Requests | Aggregated memory requests of all running containers for this deployment |
Memory Limits | Aggregated memory limits of all running containers for this deployment |
CPU Requests | Aggregated CPU requests of all running containers for this deployment |
CPU Limits | Aggregated CPU limits of all running containers for this deployment |
Job
Metric | Description |
---|---|
Active Pods | Number of active pods in this job |
Failed Pods | Number of failed pods in this job |
Succeeded Pods | Number of succeeded pods in this job |
Job Duration | Duration of job run |
K8s Service
Metric | Description |
---|---|
CPU Requests | Aggregated CPU requests for this service |
CPU Limits | Aggregated CPU limits for this service |
Memory Requests | Aggregated memory requests for this service |
Memory Limits | Aggregated memory limits for this service |
Namespace
Metric | Description |
---|---|
Memory Requests Capacity | Maximum supported memory for memory requests on this namespace |
Used Memory Requests | Amount of memory allocated to used memory requests |
Memory Limits Capacity | Maximum supported memory for memory limits on this namespace |
Used Memory Limits | Amount of memory allocated to used memory limits |
CPU Requests Capacity | Maximum supported CPU for CPU requests on this namespace |
Used CPU Requests | Amount of CPU allocated to used CPU requests |
CPU Limits Capacity | Maximum supported CPU for CPU limits on this namespace |
Used CPU Limits | Amount of CPU allocated to used CPU Limits |
Used Pods | Number of used pods for this namespace |
Pods Capacity | Number of pods the namespace can take |
Used Pods Allocation | Ratio of used pods to pods capacity |
CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
Memory Requests Allocation | Ratio of memory requests to memory requests capacity |
Memory Limits Allocation | Ratio of memory limits to memory limits capacity |
Pods Allocation | Ratio of allocated pods to pods capacity |
Node
Metric | Description |
---|---|
Allocated Pods | Count of allocated pods on this node |
Pods Capacity | Number of pods the node can take |
Memory Requests | Aggregated memory requests of all running containers on this node |
Memory Limits | Aggregated memory limits of all running containers on this node |
Memory Capacity | Maximum supported memory on this node |
CPU Requests | Aggregated CPU requests of all running containers on this node |
CPU Limits | Aggregated CPU limits of all running containers on this node |
CPU Capacity | Maximum supported CPU on this node |
Pods Allocation | Ratio of allocated pods to pods capacity |
CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
Memory Requests Allocation | Ratio of memory requests to memory capacity |
Memory Limits Allocation | Ratio of memory limits to memory capacity |
Pod
Metric | Description |
---|---|
Containers Count | Number of containers for this pod |
CPU Requests | Aggregated CPU requests on all containers of this pod |
CPU Limits | Aggregated CPU limits on all containers of this pod |
Memory Requests | Aggregated memory requests on all containers of this pod |
Memory Limits | Aggregated memory limits on all containers of this pod |
Restarts Count | Aggregated restarts on all containers of this pod |
StatefulSet
Metric | Description |
---|---|
Available Replicas | Count of available replicas |
Desired Replicas | Count of desired replicas |
Available to Desired Replica Ratio | Percentage of available to desired replicas |
Health rules
Built-in
There are several built-in health rules that trigger an issue for Kubernetes entities:
- Cluster
- Kubernetes reports that a master component (api-server, scheduler, controller manager) is unhealthy. Note that due to a bug in Kubernetes, this health status is not always reported reliably. Instana tries to filter out such unreliable reports so that they do not cause an alert and only show up on the Cluster detail page.
- Node
- Requested CPU is approaching max capacity (requested CPU / CPU capacity ratio is greater than 80%).
- Requested Memory is approaching max capacity (requested memory / memory capacity ratio is greater than 80%).
- Allocated pods are approaching maximum capacity (allocated pods / pods capacity ratio is greater than 80%). For a node, pods in the phases 'Running' and 'Unknown' are counted as allocated. See the Kubernetes docs for details on node capacity.
- The node reports an unhealthy condition for more than one minute. For a node, this applies to all conditions besides the Ready condition. See the Kubernetes docs for details on all node conditions.
- Namespace
- Requested CPU is approaching max capacity (requested CPU / CPU capacity ratio is greater than 80%).
- Requested Memory is approaching max capacity (requested memory / memory capacity ratio is greater than 80%).
- Allocated pods are approaching maximum capacity (allocated pods / pods capacity ratio is greater than 80%). For a namespace, pods in the phases 'Pending', 'Running', and 'Unknown' are counted as allocated. The namespace capacity values are based on ResourceQuotas, which can be set per namespace (see the sketch after this list). See the Kubernetes docs for details.
- Deployment
- Available replicas less than desired replicas.
- Pod
- A pod is not ready for more than one minute, and the reason is not that it has completed (PodCondition=Ready, Status=False, Reason != PodCompleted). See the Kubernetes docs for details on all pod conditions.
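The ResourceQuota mentioned in the namespace rules above is a standard Kubernetes object; a minimal sketch that would give a namespace the capacity values these rules compare against might look like this (the quota name and numbers are placeholders):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota        # placeholder name
  namespace: robot-shop
spec:
  hard:
    pods: "20"               # namespace pods capacity
    requests.cpu: "4"        # namespace CPU requests capacity
    requests.memory: 8Gi     # namespace memory requests capacity
    limits.cpu: "8"          # namespace CPU limits capacity
    limits.memory: 16Gi      # namespace memory limits capacity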
Custom
In addition to the built-in rules, you can also create custom rules on the metrics of a cluster, namespace, deployment, and pod. For example, if the threshold for node capacity warnings is too high, you can disable the built-in rule and create a custom rule with a lower threshold. See the Events & Incidents configuration for details.
Service meshes
OpenShift ServiceMesh
See the OpenShift FAQs for information on the OpenShift ServiceMesh.
Istio
The default installation should work out of the box with Instana. If, however, you deploy Istio with a default deny policy (mode: REGISTRY_ONLY), you need to enable Instana's service mesh by-pass to work effectively with this configuration. The by-pass can be enabled with the following agent configuration:
com.instana.container:
  serviceMesh:
    enableServiceMeshBypass: true
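For context, the restrictive Istio setup referred to above is usually the mesh-wide outbound traffic policy; a minimal IstioOperator sketch of such a configuration (an assumption about how your mesh is installed, not part of the Instana setup) looks like this:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY   # deny outbound traffic to destinations outside the service registry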
The enableServiceMeshBypass setting bypasses blocked network connectivity in two different ways:
- Allows outgoing traffic from the application pod to the agent (on all IPv4 addresses the agent listens on, all ports).
- Allows incoming traffic to the application pod from the agent for JVM applications (from all IPv4 addresses the agent listens on, all ports).
Debugging the mesh by-pass
You can take the following steps to debug the service mesh by-pass:
- Verify that it is enabled.
- Verify that the iptables rules are applied to the container.
Verify enabled
To verify that the service mesh by-pass is enabled, check the Instana agent logs with the following command:
kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent -c instana-agent
If it is enabled, you should find log lines similar to the following, which indicate that an inbound and outbound by-pass entry has been written for the denoted process:
2021-04-26T08:13:57.065+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying inbound service mesh bypass for process '764670'
and
2021-04-26T08:13:57.140+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying outbound service mesh bypass for process '764670'
Verify iptables rules
The easiest way to verify the iptables rules is to open a shell in the Instana agent container and list the target container's iptables rules as follows. Replace ${PID} with the process PID:
kubectl -n instana-agent exec -it ${INSTANA_AGENT_POD} -c instana-agent -- /bin/bash
nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_OUTPUT
If the chains have been applied, the command produces output similar to the following:
Chain INSTANA_OUTPUT (1 references)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 10.128.15.237
ACCEPT tcp -- 0.0.0.0/0 10.64.0.1
ACCEPT tcp -- 0.0.0.0/0 169.254.123.1
To support bi-directional communication between the Instana agent and your JVM processes, also check the following:
nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_INBOUND
with a result similar to this:
Chain INSTANA_INBOUND (1 references)
target prot opt source destination
ACCEPT tcp -- 10.128.15.237 10.64.0.14
ACCEPT tcp -- 10.64.0.1 10.64.0.14
ACCEPT tcp -- 169.254.123.1 10.64.0.14
Depending on when the rules were applied, it can take a few minutes for the process to be instrumented and for data to become visible in Instana's dashboards.
Troubleshooting / notes
Why am I not seeing any Kubernetes clusters or namespaces?
If there are no clusters or namespaces listed on the Kubernetes page, either no cluster is actively being monitored because no agent is installed, or no cluster was monitored during your selected time frame.
Click Live to check for any clusters and namespaces in live mode, and if none are listed, see our Install Kubernetes section.
Missing clusterRole permissions
Monitoring issue type: kubernetes_missing_permissions
The Instana agent requires the appropriate ClusterRole permissions for specific resources to monitor a Kubernetes cluster successfully. If these permissions are missing, the corresponding resources are missing from the Instana Kubernetes dashboards. To resolve this issue, install the latest version of the Instana agent YAML, Helm chart, or operator. See our Kubernetes or OpenShift documentation for more information on the latest version of each installation method.
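As a quick check, and assuming the default instana-agent service account in the instana-agent namespace, you can ask the API server whether the agent is allowed to list a resource it needs, for example deployments:

# Returns "yes" if the agent's service account may list deployments cluster-wide
kubectl auth can-i list deployments --all-namespaces \
  --as=system:serviceaccount:instana-agent:instana-agent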
Deprecations
- We have deprecated support for the extensions/v1beta1 and apps/v1beta2 API versions for DaemonSet, Deployment, and ReplicaSet in our Kubernetes sensor, following the announcement from Kubernetes that these deprecated API versions will be removed in Kubernetes v1.16. Follow the latest installation documentation for Kubernetes or OpenShift and update to our latest Helm chart, agent YAML, or operator.