Monitoring Kubernetes

Supported versions

Instana supports the most recent stable versions of Kubernetes. In line with the Kubernetes version compatibility policy, Instana supports the latest Kubernetes version and the four versions that precede it. However, the earliest two of these versions are considered soft deprecated.

For example, if the latest version is 1.31, then Instana supports versions 1.31, 1.30, 1.29, 1.28, and 1.27, of which versions 1.28 and 1.27 are considered soft deprecated.
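
The support window described above can be sketched as a small calculation. This is only an illustration of the stated policy, not part of Instana; the function name `supported_versions` and its parameters are hypothetical.

```python
def supported_versions(latest: str, window: int = 4, soft_deprecated: int = 2):
    """Split the support window into (fully supported, soft deprecated) versions.

    `latest` is a "major.minor" string such as "1.31". The window covers the
    latest version plus the `window` preceding minor versions; the earliest
    `soft_deprecated` of them are treated as soft deprecated.
    """
    major, minor = (int(part) for part in latest.split("."))
    # Newest first: latest version, then each preceding minor version.
    versions = [
        f"{major}.{minor - offset}"
        for offset in range(window + 1)
        if minor - offset >= 0
    ]
    # The earliest versions sit at the tail of the list.
    return versions[:-soft_deprecated], versions[-soft_deprecated:]


full, soft = supported_versions("1.31")
print(full)  # ['1.31', '1.30', '1.29']
print(soft)  # ['1.28', '1.27']
```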

Setting	Best for	Considerations
1s	Deep troubleshooting, fast transient detection	Produces higher noise and increases infrastructure cost; might surface transient anomalies
10s	Most customer scenarios	Provides a good balance between responsiveness and signal quality; captures important changes without overemphasizing brief spikes
30s	Large-scale, cost-sensitive, stable environments	Slower detection of issues; transient issues might not be captured

Metric	Description
Pods Allocation	Ratio of allocated pods to pods capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory capacity
Memory Limits Allocation	Ratio of memory limits to memory capacity
CPU Requests	Aggregated CPU requests of all running containers
CPU Limits	Aggregated CPU limits of all running containers
CPU Capacity	Aggregated CPU capacity of all nodes
Memory Requests	Aggregated memory requests of all running containers
Memory Limits	Aggregated memory limits of all running containers
Memory Capacity	Aggregated memory capacity of all nodes
Running Pods	Count of all running pods in this cluster
Pending Pods	Count of all pending pods in this cluster
Allocated Pods	Count of all allocated pods in this cluster
Pods Capacity	Aggregated pods capacity of all nodes
Out Of Disk Nodes	Count of nodes that are out of disk in this cluster
Memory Pressure Nodes	Count of nodes under memory pressure in this cluster
Disk Pressure Nodes	Count of nodes under disk pressure in this cluster
Kubelet Ready=False nodes	Count of kubelet nodes with status Ready=False in this cluster
Kubelet Not Ready nodes	Count of kubelet nodes with status Ready=Unknown or Ready=False in this cluster
Available Replicas	Available replicas from all deployments
Desired Replicas	Desired replicas from all deployments
Nodes Count	Number of nodes in this cluster
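
The allocation metrics in this table are ratios of an aggregated quantity to the matching capacity. The following minimal sketch shows that arithmetic with made-up cluster totals. The helper name `allocation_ratio` and the zero-capacity handling are assumptions for illustration; they do not describe how Instana computes these values internally.

```python
def allocation_ratio(allocated: float, capacity: float) -> float:
    """Return allocated / capacity, or 0.0 when no capacity is reported."""
    if capacity <= 0:
        return 0.0
    return allocated / capacity


# Hypothetical cluster totals (pods as counts, CPU in cores).
allocated_pods, pods_capacity = 55, 220
cpu_requests, cpu_limits, cpu_capacity = 6.0, 12.0, 16.0

print(allocation_ratio(allocated_pods, pods_capacity))  # 0.25  (Pods Allocation)
print(allocation_ratio(cpu_requests, cpu_capacity))     # 0.375 (CPU Requests Allocation)
print(allocation_ratio(cpu_limits, cpu_capacity))       # 0.75  (CPU Limits Allocation)
```

Because Kubernetes allows limits to be overcommitted, a limits allocation ratio can exceed 1.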

Metric	Description
Last Job Duration	Duration of last job run
Active Jobs	Number of active jobs
Time To Last Scheduled Job	How long ago a job for this cronjob was scheduled

Metric	Description
Available Replicas	Count of available replicas
Desired Replicas	Count of desired replicas
Unavailable Replicas	Count of unavailable replicas
Misscheduled Replicas	Count of misscheduled replicas
Available to Desired Replica Ratio	Ratio of available to desired replicas

Metric	Description
Active Pods	Number of active pods in this job
Failed Pods	Number of failed pods in this job
Succeeded Pods	Number of succeeded pods in this job
Job Duration	Duration of job run

Metric	Description
CPU Requests	Aggregated CPU requests for this service
CPU Limits	Aggregated CPU limits for this service
Memory Requests	Aggregated memory requests for this service
Memory Limits	Aggregated memory limits for this service

Metric	Description
Memory Requests Capacity	Maximum supported memory for memory requests on this namespace
Used Memory Requests	Amount of memory allocated to used memory requests
Memory Limits Capacity	Maximum supported memory for memory limits on this namespace
Used Memory Limits	Amount of memory allocated to used memory limits
CPU Requests Capacity	Maximum supported CPU for CPU requests on this namespace
Used CPU Requests	Amount of CPU allocated to used CPU requests
CPU Limits Capacity	Maximum supported CPU for CPU limits on this namespace
Used CPU Limits	Amount of CPU allocated to used CPU limits
Used Pods	Number of pods used for this namespace
Pods Capacity	Number of pods the namespace can take
Used Pods Allocation	Ratio of used pods to pods capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory requests capacity
Memory Limits Allocation	Ratio of memory limits to memory limits capacity
Pods Allocation	Ratio of allocated pods to pods capacity

Metric	Description
Allocated Pods	Count of allocated pods on this node
Pods Capacity	Number of pods the node can take
Memory Requests	Aggregated memory requests of all running containers on this node
Memory Limits	Aggregated memory limits of all running containers on this node
Memory Capacity	Maximum supported memory on this node
CPU Requests	Aggregated CPU requests of all running containers on this node
CPU Limits	Aggregated CPU limits of all running containers on this node
CPU Capacity	Maximum supported CPU on this node
Pods Allocation	Ratio of allocated pods to pods capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory capacity
Memory Limits Allocation	Ratio of memory limits to memory capacity

Metric	Description
Containers Count	Number of containers for this pod
CPU Requests	Aggregated CPU requests on all containers of this pod
CPU Limits	Aggregated CPU limits on all containers of this pod
Memory Requests	Aggregated memory requests on all containers of this pod
Memory Limits	Aggregated memory limits on all containers of this pod
Restarts Count	Aggregated restarts on all containers of this pod

Metric	Description
Current Replicas	Count of current replicas
Desired Replicas	Count of desired replicas
Maximum Replicas	The maximum number of replicas to which the autoscaler can scale up
Minimum Replicas	The minimum number of replicas to which the autoscaler can scale down
Current Replicas / Maximum Replicas	Ratio of current replicas to maximum replicas
Current Replicas / Minimum Replicas	Ratio of current replicas to minimum replicas
Observed Generation	The most recent generation of replicas that is observed by the autoscaler
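
The two ratio metrics in this table show how close an autoscaler is to its configured bounds. The sketch below works through one example; the function name `hpa_ratios` and the sample replica counts are hypothetical.

```python
def hpa_ratios(current: int, minimum: int, maximum: int) -> tuple[float, float]:
    """Return (current / maximum, current / minimum) for one autoscaler."""
    to_max = current / maximum if maximum > 0 else 0.0
    to_min = current / minimum if minimum > 0 else 0.0
    return to_max, to_min


to_max, to_min = hpa_ratios(current=6, minimum=2, maximum=10)
print(to_max)  # 0.6
print(to_min)  # 3.0
```

A Current Replicas / Maximum Replicas value of 1 means that the autoscaler has reached its upper bound and cannot scale up further.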

Metric	Description
Storage Class Name	Name of the `StorageClass` object that is used to create this PV
Total Capacity (GiB)	Total capacity of the PV in GiB
Used Capacity (GiB)	Used capacity of the PV in GiB
Utilization	Ratio of the used capacity of the PV to its total capacity, expressed as a percentage
Phase	Current phase of the PV, which can be `Available`, `Bound`, `Released`, or `Failed`
Access Mode	Access mode of the PV
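
As a worked example of the capacity and utilization columns, the following sketch converts byte counts to GiB and derives the utilization percentage. The function name `pv_utilization` and the input values are hypothetical, and the rounding is chosen only for display.

```python
GIB = 2**30  # bytes in one gibibyte


def pv_utilization(used_bytes: int, capacity_bytes: int) -> tuple[float, float, float]:
    """Return (total GiB, used GiB, utilization in percent) for one volume."""
    total_gib = capacity_bytes / GIB
    used_gib = used_bytes / GIB
    # Guard against a volume that reports no capacity.
    percent = 100.0 * used_bytes / capacity_bytes if capacity_bytes > 0 else 0.0
    return round(total_gib, 2), round(used_gib, 2), round(percent, 1)


print(pv_utilization(used_bytes=3 * GIB, capacity_bytes=10 * GIB))  # (10.0, 3.0, 30.0)
```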

Cloud provider	Name	Provisioner	Support
GCP	PersistentDisk	pd.csi.storage.gke.io	✅
GCP	Hyperdisk	pd.csi.storage.gke.io	✅
GCP	Bucket	gcsfuse.csi.storage.gke.io	✅
GCP	Filestore	filestore.csi.storage.gke.io	✅
AWS	Elastic Block Store (EBS)	ebs.csi.aws.com	✅
AWS	Elastic File System (EFS)	efs.csi.aws.com	✅
AWS	Amazon FSx / Amazon File Cache	filecache.csi.aws.com	✅
AWS	S3	s3.csi.aws.com	✅
Azure	Managed CSI	disk.csi.azure.com	✅
Azure	Managed CSI Premium	disk.csi.azure.com	✅
Azure	Azurefile CSI	file.csi.azure.com	✅
Azure	Azurefile CSI Premium	file.csi.azure.com	✅
IBM	Block Storage	vpc.block.csi.ibm.io	✅
IBM	File Storage	vpc.file.csi.ibm.io	✅
IBM	Cloud Object Storage	ibm.io/ibmc-s3fs	✅
OpenShift	Ceph RBD Block Storage	openshift-storage.rbd.csi.ceph.com	✅
OpenShift	CephFS	openshift-storage.cephfs.csi.ceph.com	✅
OpenShift	Ceph RGW	openshift-storage.object.csi.ceph.com	✅
OpenShift	NooBaa	openshift-storage.noobaa.io/obc	✅
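
To check whether a storage class falls under the table above, compare its `provisioner` field (shown in the PROVISIONER column of `kubectl get storageclass`) with the listed provisioners. The following sketch copies the provisioner names from the table; the helper name `is_supported_provisioner` is hypothetical.

```python
# Provisioner names copied from the storage class support table.
SUPPORTED_PROVISIONERS = {
    "pd.csi.storage.gke.io",
    "gcsfuse.csi.storage.gke.io",
    "filestore.csi.storage.gke.io",
    "ebs.csi.aws.com",
    "efs.csi.aws.com",
    "filecache.csi.aws.com",
    "s3.csi.aws.com",
    "disk.csi.azure.com",
    "file.csi.azure.com",
    "vpc.block.csi.ibm.io",
    "vpc.file.csi.ibm.io",
    "ibm.io/ibmc-s3fs",
    "openshift-storage.rbd.csi.ceph.com",
    "openshift-storage.cephfs.csi.ceph.com",
    "openshift-storage.object.csi.ceph.com",
    "openshift-storage.noobaa.io/obc",
}


def is_supported_provisioner(provisioner: str) -> bool:
    """Return True if the provisioner appears in the support table."""
    return provisioner.strip() in SUPPORTED_PROVISIONERS


print(is_supported_provisioner("ebs.csi.aws.com"))     # True
print(is_supported_provisioner("example.com/custom"))  # False
```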

Metric	Description
Total Capacity (GiB)	Total capacity of the PVC in GiB
Used Capacity (GiB)	Used capacity of the PVC in GiB
Utilization	Ratio of the used capacity of the PVC to its total capacity, expressed as a percentage
Phase	Current phase of the PVC, which can be `Pending`, `Bound`, or `Lost`
Access Mode	Access mode of the PVC

Monitoring Kubernetes

Supported versions

Supported managed Kubernetes

Supported service meshes

Installing the Instana agent in Kubernetes

Kubernetes sensors

Legacy Kubernetes sensor

Installing

Checking the status and version of the legacy sensor

Troubleshooting

Next Generation K8sensor

Installing

Optimize data ingestion through polling interval configuration

Checking the status and version of K8sensor

Enabling autoscaling (HPA) for K8sensor (workaround)

Prerequisites

Procedure

Troubleshooting

Accessing Kubernetes information

Kubernetes page

Kubernetes dashboards

CPU and memory usage

Applications page

Infrastructure page

Kubernetes AI assistant

Analyzing Kubernetes calls

Analyzing Kubernetes logs

Linking Kubernetes services and logical services

Single Kubernetes service to multiple logical services

Single logical service to multiple Kubernetes services

Viewing metrics

Cluster

CronJob

DaemonSet

Deployment

Job

Kubernetes service

Namespace

Node

Pod

StatefulSet

Horizontal Pod Autoscalers (HPA)

Persistent Volume (PV)

Monitoring PV

Setting a Smart Alert for PV

Storage class support

Persistent Volume Claim (PVC)

Monitoring PVC

Setting a Smart Alert for PVC

Control plane monitoring

Accessing control plane monitoring in the UI

Debugging information

Health rules

Built-in

Custom

Required Role-Based Access Control (RBAC) for Instana agent installation

Instana agent operator

Instana agent (DaemonSet)

K8sensor (Deployment)

Required Role-Based Access Control (RBAC) for AutoTrace webhook

Mutating webhook pod

Instrumentation init container

Monitoring Java with Istio or OpenShift ServiceMesh

Monitor by using the agent.serviceMesh.enabled flag

Monitor by using service mesh bypass (deprecated)

Debugging the mesh bypass

Verify enabled

Verify iptables rules

Troubleshooting notes

Why am I not seeing any Kubernetes clusters or namespaces?

Missing clusterRole permissions

Monitoring custom resources

Creating a ClusterRole for custom resource monitoring

Creating a ClusterRoleBinding

Collecting logs

Steps for log collection

Enabling debug logging for troubleshooting

Using the Instana agent custom resource

Using Helm chart