Monitoring Kubernetes
Instana can help you access detailed Kubernetes information, analyze Kubernetes calls, link Kubernetes services and logical services, use built-in or custom health rules for Kubernetes entity alerts, and trace workloads that are deployed on the service meshes.
Supported versions
Instana supports the most recent stable versions of Kubernetes. In line with the Kubernetes version compatibility policy, Instana supports the latest Kubernetes version and the four preceding versions. However, the two earliest versions in that range are in soft deprecation.
For example, if the latest version is 1.31, Instana supports versions 1.31, 1.30, 1.29, 1.28, and 1.27, and versions 1.28 and 1.27 are in soft deprecation.
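The support window described above can be sketched as a small calculation. The function name and its parameters are illustrative assumptions, not part of any Instana tooling; the sketch only reproduces the rule of the latest version plus the four preceding versions, with the two oldest flagged.

```python
def supported_versions(latest: str, window: int = 5, soft_deprecated: int = 2):
    """Return (version, status) pairs for the latest minor release and the ones before it.

    Hypothetical helper: `window` and `soft_deprecated` mirror the policy in the text
    (five versions in total, the two oldest in soft deprecation).
    """
    major, minor = (int(part) for part in latest.split("."))
    result = []
    for offset in range(window):
        if minor - offset < 0:
            break  # no minor versions below x.0
        # The oldest `soft_deprecated` entries of the window are flagged.
        status = "soft deprecation" if offset >= window - soft_deprecated else "supported"
        result.append((f"{major}.{minor - offset}", status))
    return result


for version, status in supported_versions("1.31"):
    print(version, status)
```

Called with `"1.31"`, the sketch lists 1.31 through 1.27 and flags 1.28 and 1.27, which matches the example in the text.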
Supported managed Kubernetes
- IBM Cloud Kubernetes Service Monitoring and Performance Management
- Amazon Elastic Container Service for Kubernetes (EKS)
- Azure Kubernetes Service (AKS)
- Google Kubernetes Engine (GKE)
- IBM Cloud Kubernetes Service
- VMware Tanzu Kubernetes Grid (TKG) and VMware Tanzu Kubernetes Grid Integration (TKGI), formerly known as Pivotal Container Service (PKS)
Supported service meshes
Instana supports the last three stable versions of Istio.
Installing the Instana agent in Kubernetes
To monitor Kubernetes with Instana, you need to install the Instana host agent in your Kubernetes cluster.
For more information about the host agent installation steps, see Installing the host agent on Kubernetes.
The installation of Instana agents on VMware Tanzu Kubernetes Grid is fully automated by the Instana Microservices Application Monitoring for VMware Tanzu tile.
Kubernetes sensors
Instana provides the following Kubernetes sensors:
- Legacy Kubernetes sensor
- Next Generation K8sensor
The Legacy Kubernetes sensor is an end-of-life (EOL) product. All new features and fixes are delivered to K8sensor, so update your environments to use it.
The Next Generation K8sensor has the following advantages:
- High availability
- Better control over resource limits in a deployment
- New features, functionality, and fixes that are not backported to the legacy sensor
To enable Horizontal Pod Autoscaling (HPA) for K8sensor, see Enabling autoscaling (HPA) for k8sensor (workaround).
The Legacy Kubernetes sensor is disabled by default. To verify, check the agent configuration.yaml file:
com.instana.plugin.kubernetes:
  enabled: false
To ensure this configuration is in effect, complete the steps in Checking the status and version of the legacy sensor. If the sensor is properly disabled, you see no match for the Legacy Kubernetes sensor in the Instana UI.
The Next Generation K8sensor is automatically installed by default after the agent is installed. To determine if the K8sensor is running on a cluster, run the command in the Checking the status and version of K8sensor section.
Legacy Kubernetes sensor
The Legacy Kubernetes sensor is end-of-life (EOL). The last version is still available for installation.
Installing
If you need to reinstall the Legacy Kubernetes sensor on a system that is not yet migrated to the Next Generation K8sensor, use the final version (1.2.74) of the deprecated 1.X agent Helm chart and pass the --set k8s_sensor.deployment.enabled=false setting during installation.
Checking the status and version of the legacy sensor
To check the status and running version of the Legacy Kubernetes sensor, complete the following steps:
- From the navigation menu, select Infrastructure.
- On the Maps tab, click your Kubernetes node.
- In the node details panel, expand Instana Agent (1) and then click on your agent.
- On the agent panel, click Open Dashboard.
- On the agent page, click Sensors Info in the Info area.
- In the Sensors Info window, search for Instana - Kubernetes - Sensor in the search field.
If the sensor is properly enabled, you see a match in the table. The State column displays the value Active, and the Version column shows the version number, for example, 1.2.143.
Troubleshooting
If the search for Instana - Kubernetes - Sensor and filtering the agent logs for com.instana.sensor-kubernetes yield no results, then check the configuration map of the Instana agent by running the following command:
kubectl -n instana-agent get cm instana-agent -o yaml
Make sure that the following setting is not present:
com.instana.plugin.kubernetes:
  enabled: false
If the State column for the Instana - Kubernetes - Sensor displays Waiting for several minutes, it means that the sensor failed to activate.
Search the agent logs, filter for com.instana.sensor-kubernetes, and look for Activating Kubernetes Sensor. Specifically, check for the following message:
ERROR : bundle com.instana.sensor-kubernetes:1.2.143 (246)[com.instana.agent.kubernetes.sensor.Kubernetes(479)] : The activate method has thrown an exception
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.43.0.1/apis/apps/v1/deployments?labelSelector=app%3Dk8sensor. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps is forbidden: User "system:serviceaccount:instana-agent:instana-agent" cannot list resource "deployments" in API group "apps" at the cluster scope.
This error message indicates that a service account is not usable for accessing the Kubernetes API server.
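When you scan agent logs for this kind of failure, it can help to pull the service account, verb, and resource out of the message. The following sketch assumes the message keeps the wording shown above; the regular expression and helper names are hypothetical. It also prints a `kubectl auth can-i` command that you could run to confirm the missing permission.

```python
import re

# Pattern for the "is forbidden" part of the message; assumes the wording shown above.
FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<account>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(message: str):
    """Return the denied request as a dict, or None if the message does not match."""
    match = FORBIDDEN.search(message)
    return match.groupdict() if match else None


def can_i_command(info: dict) -> str:
    """Build a kubectl auth can-i check for the denied request (cluster scope)."""
    resource = info["resource"] + (f".{info['group']}" if info["group"] else "")
    subject = f"system:serviceaccount:{info['namespace']}:{info['account']}"
    return f"kubectl auth can-i {info['verb']} {resource} --as={subject} --all-namespaces"


sample = (
    'deployments.apps is forbidden: User '
    '"system:serviceaccount:instana-agent:instana-agent" cannot list resource '
    '"deployments" in API group "apps" at the cluster scope.'
)
info = parse_forbidden(sample)
print(info)
print(can_i_command(info))
```

If the printed `kubectl auth can-i` command answers `no`, the service account is missing the cluster-wide permission named in the log message.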
Next Generation K8sensor
Installing
When you install the agent, the Next Generation K8sensor is installed automatically by default by using the icr.io/instana/k8sensor:latest image tag. If you specify the k8s_sensor.deployment.enabled value, make sure that it is set to true (the default value).
Optimize data ingestion through polling interval configuration
You can configure the K8sensor polling rate by setting k8s_sensor.pollrate in one of the following locations:
- Helm chart (for example, --set k8s_sensor.pollrate=15s)
- InstanaAgent custom resource (for example, k8s_sensor.pollrate: "15s")
The valid range is 1s to 30s. Values outside this range are automatically clamped to the nearest threshold.
| Setting | Best for | Considerations |
|---|---|---|
| 1s | Deep troubleshooting, fast transient detection | Produces higher noise and increases infrastructure cost; might surface transient anomalies |
| 10s | Most customer scenarios | Provides a good balance between responsiveness and signal quality; captures important changes without overemphasizing brief spikes |
| 30s | Large-scale, cost-sensitive, stable environments | Slower detection of issues; transient issues might not be captured |
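The clamping behavior can be illustrated with a short sketch. This is not the sensor's own parsing logic; the helper name is hypothetical, and accepting only whole seconds such as `15s` is an assumption made for the example.

```python
import re

MIN_SECONDS = 1   # lower threshold stated in the text
MAX_SECONDS = 30  # upper threshold stated in the text


def clamp_pollrate(value: str) -> str:
    """Clamp a duration such as '15s' to the documented 1s-30s range."""
    match = re.fullmatch(r"(\d+)s", value.strip())
    if match is None:
        raise ValueError(f"expected whole seconds such as '15s', got {value!r}")
    seconds = int(match.group(1))
    # Values below 1s move up to 1s, values above 30s move down to 30s.
    clamped = max(MIN_SECONDS, min(MAX_SECONDS, seconds))
    return f"{clamped}s"


for requested in ("15s", "45s", "0s"):
    print(requested, "->", clamp_pollrate(requested))
```

A helper like this can be useful in deployment scripts to warn when a requested value would be silently adjusted.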
Checking the status and version of K8sensor
To determine if the K8sensor is running on a cluster, run the following command to list deployments with the label app: k8sensor.
kubectl get deployments --all-namespaces -l app=k8sensor
If K8sensor is in the list, then K8sensor is enabled. When K8sensor is enabled, the Legacy Kubernetes sensor is automatically disabled. In the configuration.yaml file, you can see the enabled value is changed to false as follows:
com.instana.plugin.kubernetes:
  enabled: false
The K8sensor version is best identified by the sha256 hash of its image. To check the image hash of the running K8sensor container, run the following command:
kubectl get po -n instana-agent -l app=k8sensor -o jsonpath='{ .items[0].status.containerStatuses[0].imageID }'
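The command prints an image ID string that usually embeds the digest. As a rough sketch, the following helper extracts the sha256 digest from such a string. The example value is a made-up placeholder, and the exact shape of the image ID depends on the container runtime, so treat the format as an assumption.

```python
import re


def image_digest(image_id: str):
    """Return the 'sha256:<64 hex chars>' digest in an image ID, or None if absent."""
    match = re.search(r"sha256:[0-9a-f]{64}", image_id)
    return match.group(0) if match else None


# Placeholder image ID for illustration only; not a real K8sensor digest.
example = "icr.io/instana/k8sensor@sha256:" + "ab" * 32
print(image_digest(example))
```

Comparing the extracted digest across clusters is one way to confirm that they run the same K8sensor build.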
Enabling autoscaling (HPA) for K8sensor (workaround)
Instana does not currently provide a built-in autoscaling (HPA) template for the Next Generation K8sensor. However, you can enable autoscaling by applying a standard Kubernetes HorizontalPodAutoscaler (HPA) to the k8sensor deployment. This workaround requires a cluster with metrics-server installed.
The Instana agent operator manages the initial replica count of the k8sensor deployment from the InstanaAgent custom resource (CR). To avoid conflicts with HPA, set the CR replica count once and keep it equal to the HPA's minReplicas.
Prerequisites
Ensure that the metrics-server is installed and working in your cluster.
Procedure
To enable autoscaling, complete the following steps:
- Set resource requests and stable replicas in the InstanaAgent CR: Set the deployment replicas in the CR to your required minimum and keep it unchanged. For utilization-based HPA, define CPU and memory requests for the k8sensor container as shown in the following example:
  kubectl patch agents.instana.io instana-agent -n instana-agent \
    --type='merge' -p '{ "spec": { "k8s_sensor": { "deployment": { "replicas": 2, "pod": { "requests": { "memory": "512Mi", "cpu": "200m" } } } } } }'
  Adjust replicas, memory, and CPU to match your environment. Ensure that the namespace and CR name (instana-agent) match your installation.
- Apply the HPA manifest:
  - Create a file k8sensor-hpa.yaml as shown in the following example. Update the namespace, if different.
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: instana-agent-k8sensor
      namespace: instana-agent
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: instana-agent-k8sensor
      minReplicas: 2
      maxReplicas: 8
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 100
              periodSeconds: 60
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60
      metrics:
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75
  - Apply the manifest by running the following command:
    kubectl apply -f k8sensor-hpa.yaml
    Note: Set requests for Utilization targets. If you cannot set requests, use AverageValue instead of Utilization with a byte target (for example, averageValue: 800Mi). Memory is typically the primary driver; CPU can be added as a guardrail.
- Verify whether HPA is enabled by running the following commands:
  kubectl get hpa instana-agent-k8sensor -n instana-agent
  kubectl get deploy instana-agent-k8sensor -n instana-agent -o jsonpath='{.spec.replicas}'; echo
  kubectl top pods -n instana-agent | grep k8sensor
  If you later edit the InstanaAgent CR, the Instana agent operator might briefly reset .spec.replicas. However, the HPA reconciles and restores the required replica count. The behavior settings in HPA prevent flapping.
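To reason about how the utilization targets in this procedure translate into replica counts, the following sketch applies the basic replica formula from the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of current to target utilization, rounded up, with no change when the ratio is within a tolerance (0.1 by default in Kubernetes). The function name is illustrative. The sketch deliberately ignores the behavior policies, stabilization windows, and unready-pod handling, so it is only an approximation of what the controller does.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=8, tolerance=0.1):
    """Approximate the HPA replica calculation for a single Utilization metric.

    min_replicas and max_replicas default to the values in the example manifest.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result always stays within minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Memory at 95% against the 80% target with 2 replicas suggests scaling up.
print(desired_replicas(2, 95, 80))
```

When several metrics are configured, as with memory and CPU in the example manifest, the controller evaluates each metric and uses the largest resulting replica count.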
Troubleshooting
If listing the deployments with the label app: k8sensor returns no results, the cluster is not running K8sensor. This issue means that K8sensor was disabled when the agent was deployed. To resolve this issue, redeploy the agent with K8sensor enabled.
Accessing Kubernetes information
After the agent is deployed to your cluster, the Kubernetes sensor reports detailed data about the cluster and the resources that are deployed into it.
Instana automatically detects and monitors all resources that are running in the Kubernetes cluster:
- Clusters
- CronJobs
- Nodes
- Namespaces
- Deployments
- DaemonSets
- StatefulSets
- Services
- Pods
Kubernetes information is easily accessible and deeply integrated in all aspects of your application.
Kubernetes page
To see information about your Kubernetes clusters and namespaces, click Platforms > Kubernetes in the navigation menu of the Instana UI.
Kubernetes dashboards
On the Kubernetes page, click a cluster or a namespace. You can see Kubernetes dashboards that present all the information for a certain Kubernetes entity. The context is always accessible by the context path. In the following screenshot, you can see a namespace that is named "robot-shop" in a cluster called "k8s-demo-cluster".
Kubernetes dashboards are structured as follows:
- Summary shows the most relevant information for a certain entity. This dashboard starts with a status line that shows the status and related information, such as age. In the next section, you can see the CPU, memory, and pod information, which provides the consumed resources, including the pods. Sections like Top Deployments and Top Pods in the following screenshot show potential hotspots, which you might want to have a look at. The Logs section shows you the distribution chart of relevant logs for that entity, which complements the entity metrics. The chart is interactive and allows selection and highlight over all the measured values. You can focus on the selected time period or jump to the Analyze section to continue a troubleshooting journey.
- Details shows detailed information like "labels", "annotations", and the "spec".
- Events shows all relevant Kubernetes events and links them to the respective dashboards.
- Related Entities like "Deployments", "K8s Services", and "Pods" are shown as tabs of Kubernetes dashboards. What is shown depends on the entity that you selected.
CPU and memory usage
For Kubernetes pods, deployments, services, namespaces, and nodes, you can view current CPU and Memory usage as it compares to the CPU and Memory limits and requests set for these resources.
If available, the usage information is calculated from data gathered from the container runtime that is running the containers that make up the resources.
Applications page
Click Applications in the navigation menu of the Instana UI, and then click the Applications or the Services tab. If the service or application is running on a Kubernetes cluster, you can see the respective context information in the Infrastructure tab:
For containers, the pod and namespace are displayed and directly linked; for hosts, the cluster and node are also shown and linked.
Infrastructure page
Click Infrastructure in the navigation menu of the Instana UI. In the Infrastructure map, you can see Kubernetes information in the sidebar for either the host or the container that you select.
You can use Dynamic Focus to filter the data. For example, search for a specific deployment in a cluster. Additionally, the keywords entity.kubernetes.cluster.distribution and entity.kubernetes.cluster.managedBy enable searching for a Kubernetes cluster by distribution and management layer. Supported values for entity.kubernetes.cluster.distribution are gke, eks, openshift, and kubernetes. Supported values for entity.kubernetes.cluster.managedBy are rancher and none.
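As a rough sketch, the following helper builds a filter string for these two keywords and rejects values that are not in the supported lists above. The `keyword:value` form is assumed here for illustration, and the helper name is hypothetical; check the Dynamic Focus documentation for the exact query syntax.

```python
# Supported values as listed in the text above.
SUPPORTED = {
    "entity.kubernetes.cluster.distribution": {"gke", "eks", "openshift", "kubernetes"},
    "entity.kubernetes.cluster.managedBy": {"rancher", "none"},
}


def focus_query(keyword: str, value: str) -> str:
    """Return 'keyword:value' (assumed syntax) after validating the value."""
    allowed = SUPPORTED.get(keyword)
    if allowed is None:
        raise KeyError(f"unknown keyword: {keyword}")
    if value not in allowed:
        raise ValueError(f"{value!r} is not supported for {keyword}; use one of {sorted(allowed)}")
    return f"{keyword}:{value}"


print(focus_query("entity.kubernetes.cluster.distribution", "gke"))
print(focus_query("entity.kubernetes.cluster.managedBy", "rancher"))
```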
Kubernetes AI assistant
You can use the Kubernetes AI assistant for Kubernetes cluster troubleshooting. Ask natural language questions about cluster health, pod issues, namespace resources, and deployment status. It includes a pre-built prompt library and can generate automation scripts or manual actions that can be saved to the Action Catalog for remediating events.
Before you use Kubernetes AI assistant, make sure that the following prerequisites are met:
- SaaS environments: For SaaS environments, Instana provides default LLM gateways. No extra configuration is required to use this capability. You can configure separate gateways for your own runtime if necessary. For more information, see SaaS.
This feature uses the feature flag feature.kubernetes.ai.agent.enabled, which is enabled (set to true) by default.
- Self-hosted environments: For Standard Edition and Custom Edition, you must configure LLM gateways to use this capability. For more information, see Self-hosted.
To use the AI assistant, complete the following steps:
- From the Kubernetes clusters view, click a cluster.
- Click the AI icon to open the chat interface. The chat window opens. You can type your queries in natural language, or you can use the existing prompt library.
Analyzing Kubernetes calls
Unbounded Analytics gives you powerful tools to slice and dice every call in your Kubernetes cluster. If you click Analyze Calls from a Kubernetes dashboard, the appropriate filter and grouping are already set. In this case, you can see all calls in the robot-shop namespace that are grouped by pods:
Analyzing Kubernetes logs
Unbounded Analytics gives you powerful tools to slice and dice every log in your Kubernetes cluster. If you click Analyze Logs from a Kubernetes dashboard, the appropriate filtering is already set. In this case, you can see all logs in the robot-shop namespace as follows:
To provide relevant information without changing context, Instana enriches log messages with infrastructure and Kubernetes metadata, which is displayed in the tag table after the log message is expanded. See the following tag table:
Linking Kubernetes services and logical services
Single Kubernetes service to multiple logical services
Multiple logical services can be related to a single Kubernetes service when the service-mapping rules match up and calls are generated on that Kubernetes service. For example, a Kubernetes service with the label selector "service=my-service" might contain pods that have the additional labels "env=dev" and "env=staging". Combined with a custom service-mapping configuration in Instana that uses the tags kubernetes.container.name and kubernetes.pod.label with the key env, this setup results in multiple logical services that are linked to that single Kubernetes service and displayed on the Kubernetes Service dashboard.
Single logical service to multiple Kubernetes services
Multiple Kubernetes services can be related to a single logical service when those Kubernetes services are destroyed and re-created over time. For example, if the Kubernetes service shop-service-a with generated calls is replaced over time by shop-service-b with generated calls, both services are displayed on the logical service dashboard when the period of time selected overlaps the calls that were generated.
Viewing metrics
Instana collects information about the Kubernetes cluster, CronJob, DaemonSet, Deployment, Job, Kubernetes service, namespace, node, StatefulSet, Horizontal Pod Autoscaler, Persistent Volume, and Persistent Volume Claim.
Cluster
| Metric | Description |
|---|---|
| Pods Allocation | Ratio of allocated pods to pods capacity |
| CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
| CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
| Memory Requests Allocation | Ratio of memory requests to memory capacity |
| Memory Limits Allocation | Ratio of memory limits to memory capacity |
| CPU Requests | Aggregated CPU requests of all running containers |
| CPU Limits | Aggregated CPU limits of all running containers |
| CPU Capacity | Aggregated CPU capacity of all nodes |
| Memory Requests | Aggregated memory requests of all running containers |
| Memory Limits | Aggregated memory limits of all running containers |
| Memory Capacity | Aggregated memory capacity of all nodes |
| Running Pods | Count of all running pods in this cluster |
| Pending Pods | Count of all pending pods in this cluster |
| Allocated Pods | Count of all allocated pods in this cluster |
| Pods Capacity | Aggregated pods capacity of all nodes |
| Out Of Disk Nodes | Count of out of disk nodes in this cluster |
| Memory Pressure Nodes | Count of memory pressure nodes in this cluster |
| Disk Pressure Nodes | Count of disk pressure nodes in this cluster |
| Kubelet Ready=False nodes | Count of kubelet nodes with status Ready=False in this cluster |
| Kubelet Not Ready nodes | Count of kubelet nodes with status Ready=Unknown or Ready=False in this cluster |
| Available Replicas | Available replicas from all deployments |
| Desired Replicas | Desired replicas from all deployments |
| Nodes Count | Number of nodes in this cluster |
CronJob
| Metric | Description |
|---|---|
| Last Job Duration | Duration of last job run |
| Active Jobs | Number of active jobs |
| Time To Last Scheduled Job | How long ago a job for this cronjob was scheduled |
DaemonSet
| Metric | Description |
|---|---|
| Available Replicas | Count of available replicas |
| Desired Replicas | Count of desired replicas |
| Unavailable Replicas | Count of unavailable replicas |
| Misscheduled Replicas | Count of misscheduled replicas |
| Available to Desired Replica Ratio | Ratio of available to desired replicas |
Deployment
| Metric | Description |
|---|---|
| Available Replicas | Count of available replicas |
| Desired Replicas | Count of desired replicas |
| Available to Desired Replica Ratio | Ratio of available to desired replicas |
| Pending Pods | Count of pending pods |
| Unscheduled Pods | Count of unscheduled pods |
| Unready Pods | Count of unready pods |
| Pending Phase Duration | Duration of pending phase |
| Pods Count | Number of pods for this deployment |
| Memory Requests | Aggregated memory requests of all running containers for this deployment |
| Memory Limits | Aggregated memory limits of all running containers for this deployment |
| CPU Requests | Aggregated CPU requests of all running containers for this deployment |
| CPU Limits | Aggregated CPU limits of all running containers for this deployment |
Job
| Metric | Description |
|---|---|
| Active Pods | Number of active pods in this job |
| Failed Pods | Number of failed pods in this job |
| Succeeded Pods | Number of succeeded pods in this job |
| Job Duration | Duration of job run |
Kubernetes service
| Metric | Description |
|---|---|
| CPU Requests | Aggregated CPU requests for this service |
| CPU Limits | Aggregated CPU limits for this service |
| Memory Requests | Aggregated memory requests for this service |
| Memory Limits | Aggregated memory limits for this service |
Namespace
| Metric | Description |
|---|---|
| Memory Requests Capacity | Maximum supported memory for memory requests on this namespace |
| Used Memory Requests | Amount of memory allocated to used memory requests |
| Memory Limits Capacity | Maximum supported memory for memory limits on this namespace |
| Used Memory Limits | Amount of memory allocated to used memory limits |
| CPU Requests Capacity | Maximum supported CPU for CPU requests on this namespace |
| Used CPU Requests | Amount of CPU allocated to used CPU requests |
| CPU Limits Capacity | Maximum supported CPU for CPU limits on this namespace |
| Used CPU Limits | Amount of CPU allocated to used CPU Limits |
| Used Pods | Number of pods used for this namespace |
| Pods Capacity | Number of pods the namespace can take |
| Used Pods Allocation | Ratio of used pods to pods capacity |
| CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
| CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
| Memory Requests Allocation | Ratio of memory requests to memory requests capacity |
| Memory Limits Allocation | Ratio of memory limits to memory limits capacity |
| Pods Allocation | Ratio of allocated pods to pod capacity |
Node
| Metric | Description |
|---|---|
| Allocated Pods | Count of allocated pods on this node |
| Pods Capacity | Number of pods the node can take |
| Memory Requests | Aggregated memory requests of all running containers on this node |
| Memory Limits | Aggregated memory limits of all running containers on this node |
| Memory Capacity | Maximum supported memory on this node |
| CPU Requests | Aggregated CPU requests of all running containers on this node |
| CPU Limits | Aggregated CPU limits of all running containers on this node |
| CPU Capacity | Maximum supported CPU on this node |
| Pods Allocation | Ratio of allocated pods to pod capacity |
| CPU Requests Allocation | Ratio of CPU requests to CPU capacity |
| CPU Limits Allocation | Ratio of CPU limits to CPU capacity |
| Memory Requests Allocation | Ratio of memory requests to memory capacity |
| Memory Limits Allocation | Ratio of memory limits to memory capacity |
Pod
| Metric | Description |
|---|---|
| Containers Count | Number of containers for this pod |
| CPU Requests | Aggregated CPU requests on all containers of this pod |
| CPU Limits | Aggregated CPU limits on all containers of this pod |
| Memory Requests | Aggregated memory requests on all containers of this pod |
| Memory Limits | Aggregated memory limits on all containers of this pod |
| Restarts Count | Aggregated restarts on all containers of this pod |
StatefulSet
| Metric | Description |
|---|---|
| Available Replicas | Count of available replicas |
| Desired Replicas | Count of desired replicas |
| Available to Desired Replica Ratio | Percentage of available to desired replicas |
Horizontal Pod Autoscalers (HPA)
To enable HPA specifically for the Next Generation K8sensor, see Enabling autoscaling (HPA) for k8sensor (workaround).
The following metrics about HPA are available:
| Metrics | Description |
|---|---|
| Current Replicas | Count of current replicas |
| Desired Replicas | Count of desired replicas |
| Maximum Replicas | The maximum number of replicas to which the autoscaler can scale up |
| Minimum Replicas | The minimum number of replicas to which the autoscaler can scale down |
| Current Replicas / Maximum Replicas | Ratio of current replicas to maximum replicas |
| Current Replicas / Minimum Replicas | Ratio of current replicas to minimum replicas |
| Observed Generation | The most recent generation of replicas that is observed by the autoscaler |
You can monitor Horizontal Pod Autoscalers (HPA) by using the Analyze Infrastructure feature. For more information, see Analyze Infrastructure. On the Analyze Infrastructure page, you can search for Horizontal Pod Autoscalers and view the total count.
To see the list of HPAs, select Kubernetes Horizontal Pod Autoscalers from the list of infrastructure types.
You can create Smart Alerts for metrics that are related to HPA. For more information, see Smart Alerts. You can create Smart Alerts with the following HPA metrics: Current Replicas, Desired Replicas, Maximum Replicas, Message count, and Minimum Replicas.
You can set up alerting based on replica utilization on the following metrics:
- Current Replicas / Maximum Replicas
- Current Replicas / Minimum Replicas
For example, if the Current Replicas metric reaches 80% of the Maximum Replicas metric, then the 0.8 value of the Current Replicas / Maximum Replicas metric can trigger an alert. Similarly, if the Current Replicas metric reaches 100% of the Minimum Replicas metric, then the 1.0 value of the Current Replicas / Minimum Replicas metric can trigger an alert.
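The two ratio metrics and the thresholds from this example can be reproduced with a short calculation. The function names and the default threshold are illustrative assumptions; the alert itself is configured in Smart Alerts, not in code.

```python
def replica_ratios(current: int, minimum: int, maximum: int):
    """Return (current / maximum, current / minimum) as in the two ratio metrics."""
    return current / maximum, current / minimum


def near_maximum(current: int, minimum: int, maximum: int, threshold: float = 0.8) -> bool:
    """True when Current Replicas / Maximum Replicas reaches the threshold."""
    to_max, _ = replica_ratios(current, minimum, maximum)
    return to_max >= threshold


# 8 of at most 10 replicas: the ratio to the maximum is 0.8, so an alert would fire.
print(replica_ratios(8, 2, 10), near_maximum(8, 2, 10))
# Running at the minimum: the ratio to the minimum is 1.0.
print(replica_ratios(2, 2, 10))
```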
Persistent Volume (PV)
The following table lists the available metrics that are related to PV:
| Metrics | Description |
|---|---|
| Storage Class Name | Name of the StorageClass object that is used to create this PV |
| Total Capacity (GiB) | Total capacity of the PV in GiB |
| Used Capacity (GiB) | Used capacity of the PV in GiB |
| Utilization | Ratio of the used capacity of the PV to its total capacity, expressed as a percentage |
| Phase | Current phase of the PV, which can be Available, Bound, Released, or Failed |
| Access Mode | Access mode of the PV |
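The Utilization metric relates the two capacity metrics in the table. A minimal sketch of that relationship follows; the function name is hypothetical, and how Instana rounds the value is not covered here.

```python
def utilization_percent(used_gib: float, total_gib: float) -> float:
    """Used capacity as a percentage of total capacity."""
    if total_gib <= 0:
        raise ValueError("total capacity must be greater than zero")
    return used_gib * 100 / total_gib


# A 40 GiB volume with 30 GiB used.
print(utilization_percent(30, 40))
```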
Monitoring PV
To monitor a PV, complete the following steps in the Instana UI:
- From the navigation menu, select Infrastructure.
- On the Infrastructure dashboard, click Analyze Infrastructure.
- On the Analyze Infrastructure dashboard, search for Kubernetes Persistent Volume and view the total count as shown in the following image:
- To view the list of PVs, select Kubernetes Persistent Volume from the list of infrastructure types as shown in the following image:
- To view PV metrics, select a PV from the list. You can view a list of PV metrics with type and current value.
Setting a Smart Alert for PV
To set up a Smart Alert for PV metrics, complete the following steps in the Instana UI:
- From the navigation menu, select Infrastructure > Smart Alerts > Create Smart Alert.
- In the Create Smart Alert window, select any of the available PV metrics, such as Capacity Used %, and filter according to the required entity.
For more information about configuring smart alerts, see Smart Alerts.
Storage class support
Instana supports the most popular storage classes provided by most cloud providers, for both static and dynamic provisioning. The following support matrix indicates different storage classes with verified support.
| Cloud provider | Name | Provisioner | Support |
|---|---|---|---|
| GCP | PersistentDisk | pd.csi.storage.gke.io | ✅ |
| GCP | Hyperdisk | pd.csi.storage.gke.io | ✅ |
| GCP | Bucket | gcsfuse.csi.storage.gke.io | ✅ |
| GCP | Filestore | filestore.csi.storage.gke.io | ✅ |
| AWS | Elastic Block Store (EBS) | ebs.csi.aws.com | ✅ |
| AWS | Elastic File System (EFS) | efs.csi.aws.com | ✅ |
| AWS | Amazon FSx / Amazon File Cache | filecache.csi.aws.com | ✅ |
| AWS | S3 | s3.csi.aws.com | ✅ |
| Azure | Managed CSI | disk.csi.azure.com | ✅ |
| Azure | Managed CSI Premium | disk.csi.azure.com | ✅ |
| Azure | Azurefile CSI | file.csi.azure.com | ✅ |
| Azure | Azurefile CSI Premium | file.csi.azure.com | ✅ |
| IBM | Block Storage | vpc.block.csi.ibm.io | ✅ |
| IBM | File Storage | vpc.file.csi.ibm.io | ✅ |
| IBM | Cloud Object Storage | ibm.io/ibmc-s3fs | ✅ |
| OpenShift | Ceph RBD Block Storage | openshift-storage.rbd.csi.ceph.com | ✅ |
| OpenShift | CephFS | openshift-storage.cephfs.csi.ceph.com | ✅ |
| OpenShift | Ceph RGW | openshift-storage.object.csi.ceph.com | ✅ |
| OpenShift | NooBaa | openshift-storage.noobaa.io/obc | ✅ |
Persistent Volume Claim (PVC)
The following table lists the available metrics that are related to PVC:
| Metrics | Description |
|---|---|
| Total Capacity (GiB) | Total capacity of the PVC in GiB |
| Used Capacity (GiB) | Used capacity of the PVC in GiB |
| Utilization | Ratio of the used capacity of the PVC to its total capacity, expressed as a percentage |
| Phase | Current phase of the PVC, which can be Pending, Bound, or Lost |
| Access Mode | Access mode of the PVC |
As long as the PVC is bound to a PV, most of the metrics displayed are the same as the ones available for its associated PV.
Monitoring PVC
To monitor a PVC, complete the following steps in the Instana UI:
- From the navigation menu, select Infrastructure.
- On the Infrastructure dashboard, click Analyze Infrastructure.
- On the Analyze Infrastructure dashboard, search for Kubernetes Persistent Volume Claim and view the total count as shown in the following image:
- To view the list of PVCs, select Kubernetes Persistent Volume Claim from the list of infrastructure types as shown in the following image:
- To view PVC metrics, select a PVC from the list of PVCs. You can view a list of PVC metrics with type and value.
Setting a Smart Alert for PVC
To set up a Smart Alert for PVC metrics, complete the following steps in the Instana UI:
- From the navigation menu, select Infrastructure > Smart Alerts > Create Smart Alert.
- In the Create Smart Alert window, select any of the available PVC metrics, such as Capacity Used in %, and filter according to the required entity.
For more information about configuring Smart Alerts, see Smart Alerts.
To learn more about PV and PVC monitoring and why it is needed, see the Demystifying PVCs and PVs IBM blog post.
Control plane monitoring
Instana provides comprehensive monitoring capabilities for Kubernetes control plane components, giving you deep visibility into the health and performance of your cluster's core infrastructure. Control plane monitoring helps you identify bottlenecks, troubleshoot issues, and ensure the reliability of your Kubernetes environment. Instana monitors the following control plane components:
- API Server: The front-end for the Kubernetes control plane that exposes the Kubernetes API.
- Scheduler: Assigns pods to nodes based on resource requirements and constraints.
- Etcd: Distributed key-value store that stores all cluster data.
- Controller Manager: Runs controller processes that regulate the state of the cluster.
Accessing control plane monitoring in the UI
- From the navigation menu, select Platforms > Kubernetes.
- Select your cluster from the list.
- Navigate to the Control Plane tab on the cluster dashboard.
The control plane dashboard displays real-time metrics and historical trends for all monitored components.
Debugging information
When viewing a Kubernetes cluster in the Instana UI, the Control Plane tab displays debugging information that provides insights into the cluster's monitoring status and configuration. This information helps you to understand the current state of monitoring and troubleshoot issues.
- Host coverage
Host coverage indicates the percentage of nodes in your Kubernetes cluster that have the Instana agent installed and actively reporting data. This metric is calculated as the ratio of monitored nodes to the total number of nodes in the cluster.
For example, if you see "3 of 6 - 50.0%", it means:
- 3 nodes have the Instana agent installed and are being monitored.
- 6 nodes exist in total in the cluster.
- 50% of your cluster nodes are covered by monitoring.
A low host coverage percentage might indicate:
- Agents are intentionally not installed on some nodes according to the provided configuration. This is expected in particular for control plane nodes, but can also affect other nodes with taints. For more information, see Agent Deployment and Scheduling.
- Some agents are not running or have connectivity issues.
- Nodes were recently added to the cluster and agents are not yet deployed on them.
To improve host coverage, ensure that the Instana agent is properly installed on intended nodes in your cluster. For more information, see Installing the Instana agent on Kubernetes.
- Cluster UUID
The Cluster UUID is a unique identifier assigned to your Kubernetes cluster. This identifier is used internally by Instana to distinguish between different clusters and correlate monitoring data. The UUID is automatically generated and remains consistent throughout the cluster's lifetime.
- Agent monitor
The Agent monitor field displays the name or identifier of the Instana agent that is responsible for monitoring the Kubernetes control plane components. This information is useful for troubleshooting agent-specific issues or verifying which agent instance is collecting control plane metrics.
- K8s sensor version
The K8s sensor version shows the version of the Kubernetes sensor that is currently deployed in your cluster. The Kubernetes sensor is responsible for collecting metadata and metrics from the Kubernetes API server.
The version number helps you verify that you are running the latest supported version. For more information about Kubernetes sensors and how to check their status, see Kubernetes sensors.
- Accessing debugging information
To view the debugging information for the Kubernetes cluster in the Instana UI, complete the following steps:
- From the Instana UI navigation menu, select .
- On the Clusters tab (in either table view or card view), click the name of the cluster to open the dashboard for that cluster.
- On the cluster dashboard, click the Control Plane tab.
- The debugging information details are displayed at the end of the listed control plane details.
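The host coverage value described earlier in this section is a plain ratio of monitored nodes to total nodes. The following minimal Python sketch reproduces the "3 of 6 - 50.0%" example. The function name `host_coverage` is hypothetical and is not part of any Instana API. It only illustrates the arithmetic.

```python
def host_coverage(monitored_nodes, total_nodes):
    """Format host coverage like the example above, e.g. '3 of 6 - 50.0%'."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= monitored_nodes <= total_nodes:
        raise ValueError("monitored_nodes must be between 0 and total_nodes")
    # Ratio of monitored nodes to all nodes in the cluster, as a percentage.
    percent = 100.0 * monitored_nodes / total_nodes
    return f"{monitored_nodes} of {total_nodes} - {percent:.1f}%"


print(host_coverage(3, 6))  # 3 of 6 - 50.0%
```

A value below 100% is not necessarily a problem, because agents might be intentionally excluded from control plane or tainted nodes.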
Health rules
Built-in
Several built-in health rules trigger an issue for Kubernetes entities.
- Cluster
- Kubernetes reports that a Master-Component (apiserver, scheduler, or controller manager) is unhealthy. Instana filters the health status of the Master-Component so that it does not cause an alert. The health status is shown only on the cluster detail page.
- Node
- The requested CPU is approaching max capacity. The ratio of requested CPU to CPU capacity is greater than 80%.
- The requested memory is approaching max capacity. The requested memory to memory capacity ratio is greater than 80%.
- Allocated pods are approaching maximum capacity. The allocated pods to pods capacity ratio is greater than 80%. For a node, pods in the phases 'Running' and 'Unknown' are counted as allocated. For more information about node capacity, see Kubernetes docs.
- The node reports a condition that is not ready for more than one minute, and all conditions for this node are beyond the "Ready" condition. For more information about all node conditions, see Kubernetes docs.
- Namespace
- The requested CPU is approaching max capacity. The ratio of requested CPU to CPU capacity is greater than 80%.
- The requested memory is approaching max capacity. The requested memory to memory capacity ratio is greater than 80%.
- Allocated pods are approaching maximum capacity. The allocated pods to pods capacity ratio is greater than 80%. For a namespace, pods in the phases 'Pending', 'Running', and 'Unknown' are counted as allocated. The namespace capacity values are based on ResourceQuotas that can be set per namespace. For more information, see Kubernetes docs.
- Deployment
- Available replicas less than desired replicas.
- Pod
- A pod is not ready within one minute of being deployed, and the reason is not that it has completed its task (PodCondition=Ready, Status=False, Reason != PodCompleted). For more information about all pod conditions, see Kubernetes docs.
Custom
In addition to the built-in rules, you can also create custom rules on metrics of a cluster, namespace, deployment, and pod. For example, if the threshold for node capacity warnings is too high, you can disable them and create a custom rule with a lower threshold value. For more information, see Events and incidents configuration.
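The capacity rules compare a ratio against a fixed 80% threshold, and a custom rule applies the same comparison with a different threshold. The following Python sketch illustrates that comparison and the pod phases that count as allocated. It is an illustration under stated assumptions, not Instana's internal rule evaluation. The function names and the sample data are hypothetical.

```python
# Pod phases counted as "allocated", as described in the built-in rules above.
NODE_ALLOCATED_PHASES = {"Running", "Unknown"}
NAMESPACE_ALLOCATED_PHASES = {"Pending", "Running", "Unknown"}


def allocated_pods(pod_phases, counted_phases):
    """Count the pods whose phase is treated as allocated."""
    return sum(1 for phase in pod_phases if phase in counted_phases)


def exceeds_threshold(used, capacity, threshold=0.80):
    """Return True when used/capacity is strictly greater than the threshold."""
    if capacity <= 0:
        return False  # no capacity reported, so no ratio can be computed
    return used / capacity > threshold


# Hypothetical sample data: phases of the pods on one node.
phases = ["Running"] * 8 + ["Pending", "Unknown", "Succeeded"]
on_node = allocated_pods(phases, NODE_ALLOCATED_PHASES)            # 9
in_namespace = allocated_pods(phases, NAMESPACE_ALLOCATED_PHASES)  # 10

print(exceeds_threshold(on_node, 12))                  # 9/12 = 0.75 -> False
print(exceeds_threshold(on_node, 12, threshold=0.70))  # True with a lower custom threshold
```

With a pod capacity of 12, the built-in 80% rule does not fire for 9 allocated pods, while a custom rule with a 70% threshold does.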
Required Role-Based Access Control (RBAC) for Instana agent installation
The Instana agent operator installs and manages three main components. Each component requires specific RBAC permissions to monitor your Kubernetes cluster:
Instana agent operator
The operator requires cluster-wide permissions to do the following tasks:
- Manage agent deployments: Create and update DaemonSets, Deployments, ConfigMaps, Secrets, and ServiceAccounts across namespaces.
- Configure RBAC: Create ClusterRoles and ClusterRoleBindings for the agent and k8sensor components.
- Manage Instana custom resources: Create, update, and delete Instana agent custom resources (agents.instana.io and agentsremote.instana.io) with finalizers for proper cleanup.
- Ensure high availability: Manage PodDisruptionBudgets for k8sensor to prevent disruptions during cluster maintenance.
- Discover cluster resources: Read cluster-wide resources such as nodes, namespaces, pods, and services to configure agents correctly.
- Monitor etcd (OpenShift): Copy etcd certificates and configuration from the openshift-etcd namespace to the instana-agent namespace for metrics collection.
- Access kubelet metrics: Read nonresource URLs (/metrics, /stats/summary, and /healthz) to validate cluster health.
- Record events: Create events for operational visibility and troubleshooting.
- Leader election: Manage leases in the operator's namespace for high availability when you run multiple operator replicas.
- Monitor service endpoints: Read EndpointSlices from discovery.k8s.io to track service endpoint changes and ensure that agents can discover backend services dynamically.
Instana agent (DaemonSet)
The agent pods require permissions to do the following tasks:
- Collect node metrics: Access kubelet metrics endpoints on each node, such as /metrics, /stats/summary, and /metrics/cadvisor.
- Monitor pods: Read pod information and metrics across all namespaces.
- Run privileged access: Use privileged security contexts to enable host-level monitoring.
K8sensor (Deployment)
The k8sensor requires read-only permissions to do the following tasks:
- Monitor all Kubernetes resources: Read all resource types such as pods, deployments, services, ConfigMaps, and others across the cluster for comprehensive monitoring.
- Monitor custom resources: Discover and monitor Custom Resource Definitions (CRDs).
- Monitor workloads: Track all workload types including DaemonSets, StatefulSets, Jobs, CronJobs, and HorizontalPodAutoscalers.
- Monitor networking: Read Ingress resources and network policies.
- Discover etcd endpoints: Read services and endpoints in the kube-system namespace to locate etcd for metrics collection.
- Monitor OpenShift resources: Read DeploymentConfigs and other OpenShift-specific resources (OpenShift only).
- Run privileged access (OpenShift): Use privileged Security Context Constraints (SCCs) on OpenShift for comprehensive monitoring.
- Track service topology: Read EndpointSlices from discovery.k8s.io to monitor service endpoint distribution across the cluster for comprehensive network topology mapping.
For more information about the specific ClusterRoles, Roles, and permissions that the installation creates, see the Instana Helm Charts repository.
Required Role-Based Access Control (RBAC) for AutoTrace webhook
The Instana AutoTrace webhook is a separate component from the Instana agent installation. The webhook operates with two components, each with specific security requirements and RBAC permissions:
Mutating webhook pod
The webhook pod runs as a highly restricted, non-privileged service that intercepts pod creation requests. It requires cluster-wide permissions to do the following tasks:
- Read TLS certificates: Access secrets to read TLS certificates for the webhook HTTPS endpoint.
- Manage image pull secrets: Create image pull secrets in target namespaces when you use private registries.
- Read configuration: Access secrets and ConfigMaps that are referenced in container environment variables.
- Manage NGINX configuration: Create and update ConfigMaps for NGINX ingress tracing integration.
- Determine instrumentation scope: Read namespaces and check namespace labels for opt-in or opt-out logic.
- Comply with Pod Security Policies: Use PodSecurityPolicies on Kubernetes versions earlier than 1.25 when the PSP admission controller is enabled.
The webhook pod runs with the following security restrictions:
- Runs as non-root user (UID 1001) without elevated privileges
- Cannot escalate privileges
- All Linux capabilities are dropped
- Uses RuntimeDefault seccomp profile for restricted system call access
- Read-only root filesystem to prevent modifications
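The restrictions listed above correspond to standard fields of a Kubernetes container securityContext. The following Python sketch checks a securityContext, given as a dictionary, against those restrictions. It is not the webhook's actual manifest or validation logic. The function name `restriction_violations` and the sample dictionary are hypothetical. Only the field names (runAsNonRoot, runAsUser, allowPrivilegeEscalation, capabilities.drop, seccompProfile.type, readOnlyRootFilesystem) come from the Kubernetes API.

```python
def restriction_violations(security_context):
    """Return a list of restrictions from the list above that the context does not meet."""
    violations = []
    if security_context.get("runAsNonRoot") is not True:
        violations.append("runAsNonRoot must be true")
    if security_context.get("runAsUser", 0) == 0:
        violations.append("runAsUser must be a non-root UID")
    if security_context.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be false")
    if "ALL" not in security_context.get("capabilities", {}).get("drop", []):
        violations.append("all Linux capabilities must be dropped")
    if security_context.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        violations.append("seccompProfile.type must be RuntimeDefault")
    if security_context.get("readOnlyRootFilesystem") is not True:
        violations.append("readOnlyRootFilesystem must be true")
    return violations


# Hypothetical securityContext that mirrors the restrictions listed above.
webhook_context = {
    "runAsNonRoot": True,
    "runAsUser": 1001,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
    "readOnlyRootFilesystem": True,
}

print(restriction_violations(webhook_context))  # []
```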
Instrumentation init container
The webhook injects the init container into application pods to copy instrumentation files into a shared volume. It requires no special permissions and operates with the following security characteristics:
- Inherits pod security context: Uses the application pod's security context by default.
- No privileged mode: Runs without elevated privileges.
- No Linux capabilities: Does not require any special capabilities.
- No host-level access: Does not require hostPID, hostNetwork, or hostIPC access.
- Writes to emptyDir volumes: Only writes to shared emptyDir volumes, not to the host filesystem.
Both the webhook pod and the instrumentation init container are fully compliant with the Kubernetes Restricted Pod Security Standard, which is the most restrictive security profile.
For more information about installing and configuring the AutoTrace webhook, see Instana AutoTrace webhook.
Monitoring Java with Istio or OpenShift ServiceMesh
Monitor by using the agent.serviceMesh.enabled flag
You can enable the Instana agent Java monitoring with Istio and OpenShift service mesh by using the agent.serviceMesh.enabled flag. This Kubernetes-native approach uses a single dedicated network port for all Java workloads that are monitored on a single host or node. The default value is set to true. For more information about the configuration parameter, see Helm Chart configuration.
If the Istio configuration is set to REGISTRY_ONLY, additional steps are required for the agent socket service to work properly.
You need to deploy the following resource definition for each individual cluster node. Make sure to define a unique metadata.name property for each host or node. Also, set the value for spec.hosts to <node-ip-address>.instana-agent-headless.instana-agent.svc, where <node-ip-address> is the node's IP address.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: instana-agent-worker-<node-unique-counter>
spec:
  hosts:
  - <node-ip-address>.instana-agent-headless.instana-agent.svc
  ports:
  - number: 42699
    name: agent
    protocol: TCP
  resolution: DNS
  location: MESH_EXTERNAL
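Because one ServiceEntry is needed per node, generating the definitions from a list of node IP addresses can be less error-prone than copying them by hand. The following Python sketch fills the template above once per node. The function name `render_service_entries`, the counter-based naming, and the sample IP addresses are assumptions for illustration. The node IP is inserted into the host name exactly as the `<node-ip-address>` placeholder suggests. Node IP addresses can be read, for example, from the INTERNAL-IP column of `kubectl get nodes -o wide`.

```python
SERVICE_ENTRY_TEMPLATE = """\
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: instana-agent-worker-{counter}
spec:
  hosts:
  - {node_ip}.instana-agent-headless.instana-agent.svc
  ports:
  - number: 42699
    name: agent
    protocol: TCP
  resolution: DNS
  location: MESH_EXTERNAL
"""


def render_service_entries(node_ips):
    """Render one ServiceEntry per node IP as a multi-document YAML string."""
    documents = [
        # The running counter keeps metadata.name unique for each node.
        SERVICE_ENTRY_TEMPLATE.format(counter=index, node_ip=ip)
        for index, ip in enumerate(node_ips, start=1)
    ]
    return "---\n".join(documents)


# Hypothetical node IP addresses.
print(render_service_entries(["10.0.0.11", "10.0.0.12"]))
```

The printed output can be saved to a file and applied with `kubectl apply -f`.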
Monitor by using service mesh bypass (deprecated)
The alternative legacy approach is to enable service mesh bypass. The default installation of Istio works out-of-the-box with Instana. If you deploy Istio with a default deny policy (mode: REGISTRY_ONLY), you can enable Instana's service mesh bypass by using the following agent configuration:
com.instana.container:
  serviceMesh:
    enableServiceMeshBypass: true
The setting bypasses blocked network connectivity in two ways:
- Allows outgoing traffic from the application pod to the host agent (on all IPv4 addresses that the agent listens on, and all ports).
- Allows incoming traffic to the application pod from the agent for JVM applications (from all IPv4 addresses that the host agent listens on, and all ports).
Debugging the mesh bypass
To debug the service mesh bypass, complete the following steps:
- Verify that the service mesh bypass is enabled.
- Verify that the iptables rules are applied to the container.
Verify enabled
To verify whether the service mesh bypass is enabled, check the Instana agent logs by running the following command:
kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent -c instana-agent
If the service mesh bypass is enabled, you can find the following log lines, which indicate that an inbound or outbound bypass entry is written for the denoted process:
Inbound bypass:
2021-04-26T08:13:57.065+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying inbound service mesh bypass for process '764670'
Outbound bypass:
2021-04-26T08:13:57.140+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying outbound service mesh bypass for process '764670'
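When the agent log is long, scanning it by eye for these two lines is tedious. The following Python sketch extracts the direction and process ID from log text that follows the format of the two sample lines above. The regular expression is derived only from those samples, so other agent versions might word the message differently. The function name `bypass_entries` is hypothetical.

```python
import re

# Pattern derived from the two sample log lines above.
BYPASS_PATTERN = re.compile(
    r"Applying (inbound|outbound) service mesh bypass for process '(\d+)'"
)


def bypass_entries(log_text):
    """Map each process ID to the set of bypass directions found in the log."""
    entries = {}
    for direction, pid in BYPASS_PATTERN.findall(log_text):
        entries.setdefault(pid, set()).add(direction)
    return entries


SAMPLE_LOG = """\
2021-04-26T08:13:57.065+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying inbound service mesh bypass for process '764670'
2021-04-26T08:13:57.140+0000 | INFO | -client-thread-2 | DefaultServiceMeshSupport | 51 - com.instana.agent - 1.1.597 | Applying outbound service mesh bypass for process '764670'
"""

for pid, directions in bypass_entries(SAMPLE_LOG).items():
    print(pid, sorted(directions))  # 764670 ['inbound', 'outbound']
```

A process that shows only one of the two directions is a candidate for closer inspection of its iptables rules.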
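Verify iptables rules
The easiest way to verify the iptables rules is to open a shell in the Instana agent container and list the iptables rules of the target container as follows. Replace ${INSTANA_AGENT_POD} with the name of the Instana agent pod that runs on the same node as the JVM process, and replace ${PID} with the PID of the JVM process: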
kubectl -n instana-agent exec -it ${INSTANA_AGENT_POD} -c instana-agent -- /bin/bash
nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_OUTPUT
If the chains are applied, you can see an output as follows:
Chain INSTANA_OUTPUT (1 references)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 10.128.15.237
ACCEPT tcp -- 0.0.0.0/0 10.64.0.1
ACCEPT tcp -- 0.0.0.0/0 169.254.123.1
Check whether bidirectional communication between the Instana agent and your JVM processes is supported by running the following command:
nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_INBOUND
The result is similar to the following output:
Chain INSTANA_INBOUND (1 references)
target prot opt source destination
ACCEPT tcp -- 10.128.15.237 10.64.0.14
ACCEPT tcp -- 10.64.0.1 10.64.0.14
ACCEPT tcp -- 169.254.123.1 10.64.0.14
Depending on when the iptables rules were applied, it can take a few minutes for the process to be instrumented and the data to be visible in Instana's dashboards.
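In the sample outputs above, every agent address that appears as a destination in the INSTANA_OUTPUT chain also appears as a source in the INSTANA_INBOUND chain. The following Python sketch parses both listings and checks for that symmetry. Treating symmetry as the expected state is an assumption drawn from the samples, not a documented guarantee, and the function names are hypothetical.

```python
def accept_rules(iptables_output):
    """Return (source, destination) pairs for every ACCEPT rule in the listing."""
    rules = []
    for line in iptables_output.splitlines():
        fields = line.split()
        # Rule lines look like: ACCEPT tcp -- <source> <destination>
        if len(fields) >= 5 and fields[0] == "ACCEPT":
            rules.append((fields[3], fields[4]))
    return rules


def agent_addresses_match(output_chain, inbound_chain):
    """Check that outbound destinations equal inbound sources and are not empty."""
    outbound_destinations = {dst for _, dst in accept_rules(output_chain)}
    inbound_sources = {src for src, _ in accept_rules(inbound_chain)}
    return bool(outbound_destinations) and outbound_destinations == inbound_sources


OUTPUT_CHAIN = """\
Chain INSTANA_OUTPUT (1 references)
target prot opt source destination
ACCEPT tcp -- 0.0.0.0/0 10.128.15.237
ACCEPT tcp -- 0.0.0.0/0 10.64.0.1
ACCEPT tcp -- 0.0.0.0/0 169.254.123.1
"""

INBOUND_CHAIN = """\
Chain INSTANA_INBOUND (1 references)
target prot opt source destination
ACCEPT tcp -- 10.128.15.237 10.64.0.14
ACCEPT tcp -- 10.64.0.1 10.64.0.14
ACCEPT tcp -- 169.254.123.1 10.64.0.14
"""

print(agent_addresses_match(OUTPUT_CHAIN, INBOUND_CHAIN))  # True
```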
Troubleshooting notes
Why am I not seeing any Kubernetes clusters or namespaces?
If no clusters or namespaces are listed on the Kubernetes page, either no cluster is being actively monitored due to an agent not being installed, or no clusters are being monitored during your selected time frame.
Click Live to check for any clusters and namespaces in live mode. If none are listed, you need to install the Instana agent in Kubernetes.
Missing ClusterRole permissions
Monitoring issue type: kubernetes_missing_permissions
The Instana agent requires the appropriate ClusterRole permissions for specific resources to be able to monitor a Kubernetes cluster successfully. If these permissions are missing, corresponding resources are missing on the Instana Kubernetes dashboards. To resolve this issue, install the latest version of the Instana Agent YAML, Helm chart, or Operator. For more information about the latest version of each installation method, see Kubernetes or OpenShift.
Monitoring custom resources
To monitor Kubernetes custom resources (CRs), you need to create a ClusterRole resource with individual rules that grant the necessary permissions to the Instana agent. This configuration allows the k8sensor to access and monitor custom resources in your cluster.
Creating a ClusterRole for custom resource monitoring
Create a ClusterRole that specifies the permissions required to monitor your custom resources. The following example shows a basic ClusterRole configuration:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: instana-crmon
rules:
- apiGroups: ["your.custom.group"]
  resources: ["yourcustomresources"]
  verbs: ["get", "list", "watch"]
You must replace your.custom.group with the API group of your custom resource and yourcustomresources with the plural name of your custom resource type. You can add more rules as needed for each custom resource type that you want to monitor. In test systems, you can also use the ["*"] wildcard for apiGroups and resources to match all custom resources, provided that this poses no security risk in your environment.
Using such overly permissive access is not recommended for production systems, where the principle of least privilege is advised.
Creating a ClusterRoleBinding
After you create the ClusterRole, you need to create a ClusterRoleBinding to bind the ClusterRole to the ServiceAccount of the Instana agent:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: instana-crmon
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: instana-crmon
subjects:
- kind: ServiceAccount
  name: instana-agent-k8sensor
  namespace: instana-agent
Make sure that the namespace field in the subjects section matches the namespace where your Instana agent is deployed. If you installed the agent in a different namespace, update the value accordingly.
After you apply these resources, the Instana k8sensor gains the necessary permissions to monitor your custom resources.
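If you monitor several custom resource types, the rules can be generated instead of written by hand. The following Python sketch builds the same ClusterRole and ClusterRoleBinding as the examples above and prints them as a JSON list, which kubectl accepts in the same way as YAML. The function name `custom_resource_rbac` and the second API group and resource (`example.org`, `widgets`) are hypothetical placeholders.

```python
import json


def custom_resource_rbac(custom_resources, namespace="instana-agent", name="instana-crmon"):
    """Build a ClusterRole with one read-only rule per (apiGroup, plural) pair, plus its binding."""
    rules = [
        {"apiGroups": [group], "resources": [plural], "verbs": ["get", "list", "watch"]}
        for group, plural in custom_resources
    ]
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": rules,
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": name,
        },
        "subjects": [
            # The namespace must match the namespace where the agent is deployed.
            {"kind": "ServiceAccount", "name": "instana-agent-k8sensor", "namespace": namespace}
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [cluster_role, binding]}


# Placeholder API groups and plural resource names.
manifest = custom_resource_rbac(
    [("your.custom.group", "yourcustomresources"), ("example.org", "widgets")]
)
print(json.dumps(manifest, indent=2))
```

Listing each API group and resource explicitly keeps the role aligned with the principle of least privilege, in contrast to the wildcard approach.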
Collecting logs
To collect logs for the k8sensor and the Kubernetes environment, you can collect diagnostic information with the mustgather script as outlined in the following steps:
Steps for log collection
- Clone the serviceability repository.
- Navigate to the agent/k8s directory.
- Read the instructions in the README.md file in that directory, making sure that the prerequisites are fulfilled and that you are logged in to the cluster.
- Run instana-k8s-mustgather.sh.
- After the script finishes, a .tgz file is generated, which contains all the diagnostic information that is required for the customer support team to help service the ticket.
Figure 2. MustGather output
Enabling debug logging for troubleshooting
The k8sensor supports different log levels for troubleshooting purposes. By default, k8sensor logs at the info level. When investigating issues, you can temporarily enable debug level logging to capture more detailed diagnostic information.
You can enable debug logging by using one of the following methods:
Using the Instana agent custom resource
- Edit the Instana agent custom resource:
kubectl edit instanaagent -n instana-agent instana-agent
- Add or modify the agent.env section:
spec:
  agent:
    env:
      K8S_SENSOR_LOG_LEVEL: debug
- Wait for the rollout to complete. Check the rollout status by running the following command:
kubectl rollout status deployment/instana-agent-k8sensor -n instana-agent
- Verify that debug logging is enabled by checking the k8sensor logs:
kubectl logs -n instana-agent deployment/instana-agent-k8sensor --tail=50 | grep -i "level.*debug"
- If applicable, reproduce the issue to generate appropriate logs. Otherwise, wait a few minutes for the k8sensor to collect data during its regular polling intervals (default: every 10 seconds).
- Capture logs. For detailed log collection instructions, see Collecting logs.
- To disable debug logging, edit the Instana agent custom resource again and change the log level back to info:
spec:
  agent:
    env:
      K8S_SENSOR_LOG_LEVEL: info
Using Helm chart
- Update your Helm installation with the debug log level:
helm upgrade instana-agent \
  --repo https://agents.instana.io/helm \
  --namespace instana-agent \
  --set agent.env.K8S_SENSOR_LOG_LEVEL=debug \
  --reuse-values \
  instana-agent
- Wait for the rollout to complete. Check the rollout status by running the following command:
kubectl rollout status deployment/instana-agent-k8sensor -n instana-agent
- Verify that debug logging is enabled by checking the k8sensor logs:
kubectl logs -n instana-agent deployment/instana-agent-k8sensor --tail=50 | grep -i "level.*debug"
- If applicable, reproduce the issue to generate appropriate logs. Otherwise, wait a few minutes for the k8sensor to collect data during its regular polling intervals (default: every 10 seconds).
- Capture logs. For detailed log collection instructions, see Collecting logs.
- To disable debug logging, update the Helm installation again:
helm upgrade instana-agent \
  --repo https://agents.instana.io/helm \
  --namespace instana-agent \
  --set agent.env.K8S_SENSOR_LOG_LEVEL=info \
  --reuse-values \
  instana-agent