Kubecost Metrics
Kubecost Cost Model
The Cost Model both exports and consumes the following metrics.
| Metric | Description |
| --- | --- |
| node_cpu_hourly_cost | Hourly cost per vCPU on this node |
| node_gpu_hourly_cost | Hourly cost per GPU on this node |
| node_ram_hourly_cost | Hourly cost per GB of memory on this node |
| node_total_hourly_cost | Total node cost per hour |
| kubecost_load_balancer_cost | Hourly cost of a load balancer |
| kubecost_cluster_management_cost | Hourly cost paid as a cluster management fee |
| pv_hourly_cost | Hourly cost per GB on a persistent volume |
| node_gpu_count | Number of GPUs available on node |
| container_cpu_allocation | Average number of CPUs requested/used over last 1m |
| container_gpu_allocation | Average number of GPUs requested over last 1m |
| container_memory_allocation_bytes | Average bytes of RAM requested/used over last 1m |
| pod_pvc_allocation | Bytes provisioned for a PVC attached to a pod |
| kubecost_node_is_spot | Cloud provider info about node preemptibility |
| kubecost_network_zone_egress_cost | Total cost per GB egress across zones |
| kubecost_network_region_egress_cost | Total cost per GB egress across regions |
| kubecost_network_internet_egress_cost | Total cost per GB of internet egress |
| service_selector_labels | Service Selector Labels |
| deployment_match_labels | Deployment Match Labels |
| statefulSet_match_labels | StatefulSet Match Labels |
| kubecost_cluster_memory_working_set_bytes | (Created by recording rule) |
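
As a rough illustration of how the hourly rate metrics and the allocation metrics above might be combined, the sketch below prices one container for one hour. It assumes that "requested/used" means the larger of the request and the measured usage, that memory is priced per GiB, and that cost is simply allocation multiplied by the node's hourly rate. These are assumptions made for illustration, as are the function names and the sample figures; this is not a description of how the Cost Model is implemented.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: the RAM rate is priced per GiB


def allocation(requested, used):
    """Assumed reading of "requested/used": take the larger value."""
    return max(requested, used)


def container_hourly_cost(cpu_alloc, ram_alloc_bytes, gpu_alloc,
                          cpu_rate, ram_rate, gpu_rate):
    """Multiply each allocation by its hourly rate and sum the parts.

    cpu_rate, ram_rate and gpu_rate stand in for node_cpu_hourly_cost,
    node_ram_hourly_cost and node_gpu_hourly_cost on the container's node.
    """
    cpu_cost = cpu_alloc * cpu_rate
    ram_cost = (ram_alloc_bytes / BYTES_PER_GB) * ram_rate
    gpu_cost = gpu_alloc * gpu_rate
    return cpu_cost + ram_cost + gpu_cost


# Hypothetical container: requests 0.5 CPU and 2 GiB, uses less than both.
cost = container_hourly_cost(
    cpu_alloc=allocation(0.5, 0.25),
    ram_alloc_bytes=allocation(2 * BYTES_PER_GB, 1 * BYTES_PER_GB),
    gpu_alloc=0,
    cpu_rate=0.03,
    ram_rate=0.004,
    gpu_rate=0.95,
)
```

With these sample rates the container is charged for its requests (0.5 CPU and 2 GiB) because they exceed its usage.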
Kubecost Network Costs
The Kubecost network-costs DaemonSet collects node network data and exports the egress, ingress, and performance statistics.
| Metric | Description |
| --- | --- |
| kubecost_pod_network_egress_bytes_total | Egressed byte counts by pod |
| kubecost_pod_network_ingress_bytes_total | Ingressed byte counts by pod |
| kubecost_network_costs_parsed_entries | Total parsed conntrack entries |
| kubecost_network_costs_parse_time | Total time in milliseconds it took to parse conntrack entries |
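
The byte counters above become a cost only when paired with a per-GB rate such as kubecost_network_internet_egress_cost. The following sketch shows one plausible way to do that for a single pod over a time window. The function name, the decimal GB conversion, and the counter-reset handling are assumptions for illustration, not Kubecost's documented behavior.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: the rate is priced per decimal GB


def egress_cost(bytes_start, bytes_end, cost_per_gb):
    """Estimate egress cost from two samples of a cumulative byte counter.

    bytes_start and bytes_end are readings of
    kubecost_pod_network_egress_bytes_total at the window boundaries.
    """
    delta = bytes_end - bytes_start
    if delta < 0:
        # The counter went backwards, so treat it as a reset and count
        # only the bytes seen since the restart.
        delta = bytes_end
    return delta / BYTES_PER_GB * cost_per_gb
```

For example, `egress_cost(1_000_000_000, 6_000_000_000, 0.12)` prices 5 GB of egress at a hypothetical rate of 0.12 per GB.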
cAdvisor
cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
GitHub: https://github.com/google/cadvisor
| Metric | Description |
| --- | --- |
| container_memory_usage_bytes | Current memory usage, including all memory regardless of when it was accessed |
| container_fs_limit_bytes | Number of bytes that can be consumed by the container on this filesystem |
| container_fs_usage_bytes | Number of bytes that are consumed by the container on this filesystem |
| container_memory_working_set_bytes | Current working set |
| container_network_receive_bytes_total | Cumulative count of bytes received |
| container_network_transmit_bytes_total | Cumulative count of bytes transmitted |
| container_cpu_usage_seconds_total | Cumulative CPU time consumed |
| container_cpu_cfs_periods_total | Number of elapsed enforcement period intervals |
| container_cpu_cfs_throttled_periods_total | Number of throttled period intervals |
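
The two CFS counters in the table are usually read together: the share of enforcement periods in which a container was throttled is the increase in throttled periods divided by the increase in elapsed periods over the same window. The sketch below is a minimal version of that calculation; the function name and sample readings are hypothetical.

```python
def throttle_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of CFS enforcement periods in which the container was throttled.

    The arguments are two readings each of container_cpu_cfs_periods_total
    and container_cpu_cfs_throttled_periods_total, taken at the same times.
    """
    periods = periods_end - periods_start
    if periods <= 0:
        # No enforcement periods elapsed in the window (or the counter
        # reset), so there is nothing to measure.
        return 0.0
    return (throttled_end - throttled_start) / periods
```

A result near 0 suggests the CPU limit is rarely hit, while a result near 1 suggests the container is throttled in almost every period.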
Kube-State-Metrics (KSM)
Although the default Kubecost installation does not include a KSM deployment, Kubecost calculates and emits the metrics below. These metrics and their labels follow the conventions of KSMv1, not KSMv2.
| Metric | Description |
| --- | --- |
| kube_deployment_spec_replicas | Number of pods specified for a Deployment |
| kube_deployment_status_replicas_available | Number of pods currently available for a Deployment |
| kube_job_status_failed | The number of pods which reached Phase Failed and the reason for failure |
| kube_namespace_annotations | Kubernetes annotations converted to Prometheus labels |
| kube_namespace_labels | Kubernetes labels converted to Prometheus labels |
| kube_node_labels | Kubernetes labels converted to Prometheus labels |
| kube_node_status_allocatable | The allocatable for different resources of a node that are available for scheduling |
| kube_node_status_allocatable_cpu_cores | Total allocatable CPU cores of the node (Deprecated in KSM 2.0.0) |
| kube_node_status_allocatable_memory_bytes | Total allocatable memory bytes of the node (Deprecated in KSM 2.0.0) |
| kube_node_status_capacity | The capacity for different resources of a node |
| kube_node_status_capacity_cpu_cores | Total CPU cores available on the node (Deprecated in KSM 2.0.0) |
| kube_node_status_capacity_memory_bytes | Total memory available on the node (bytes) (Deprecated in KSM 2.0.0) |
| kube_node_status_condition | The condition of a cluster node |
| kube_persistentvolume_capacity_bytes | Total capacity of a persistent volume (bytes) |
| kube_persistentvolume_status_phase | Status of a persistent volume (for example, Bound) |
| kube_persistentvolumeclaim_info | Information about persistent volume claim |
| kube_persistentvolumeclaim_resource_requests_storage_bytes | The capacity of storage requested by the persistent volume claim |
| kube_pod_annotations | Kubernetes annotations converted to Prometheus labels |
| kube_pod_container_resource_limits | The resource limit requested by a container |
| kube_pod_container_resource_limits_cpu_cores | Limit on CPU cores that can be used by the container (Deprecated in KSM 2.0.0) |
| kube_pod_container_resource_limits_memory_bytes | Limit on the amount of memory that can be used by the container (Deprecated in KSM 2.0.0) |
| kube_pod_container_resource_requests | The resource request made by a container |
| kube_pod_container_status_restarts_total | The number of container restarts per container |
| kube_pod_container_status_running | Describes whether the container is currently in running state |
| kube_pod_container_status_terminated_reason | Describes the reason the container is currently in terminated state |
| kube_pod_labels | Kubernetes labels converted to Prometheus labels |
| kube_pod_owner | Information about the Pod's owner |
| kube_pod_status_phase | The pod's current phase (for example, Pending) |
| kube_replicaset_owner | Information about the ReplicaSet's owner |
Node exporter
Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
GitHub: https://github.com/prometheus/node_exporter
| Metric | Description |
| --- | --- |
| node_cpu_seconds_total | Seconds the CPUs spent in each mode |
| node_disk_reads_completed | The total number of reads completed successfully |
| node_disk_reads_completed_total | The total number of reads completed successfully |
| node_disk_writes_completed | The total number of writes completed successfully |
| node_disk_writes_completed_total | The total number of writes completed successfully |
| node_filesystem_device_error | Whether an error occurred while getting statistics for the given device |
| node_memory_Buffers_bytes | Memory information field Buffers_bytes |
| node_memory_Cached_bytes | Memory information field Cached_bytes |
| node_memory_MemAvailable_bytes | Memory information field MemAvailable_bytes |
| node_memory_MemFree_bytes | Memory information field MemFree_bytes |
| node_memory_MemTotal_bytes | Memory information field MemTotal_bytes |
| node_network_transmit_bytes_total | Network device statistic transmit_bytes |
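
The memory fields above are most useful in combination. One common way to express node memory utilization is the share of total memory that is not reported as available. The sketch below shows that calculation; the function name is hypothetical, and whether Kubecost derives utilization this way is an assumption.

```python
def memory_utilization(mem_total_bytes, mem_available_bytes):
    """Fraction of node memory in use, from two node exporter gauges.

    mem_total_bytes is a reading of node_memory_MemTotal_bytes and
    mem_available_bytes is a reading of node_memory_MemAvailable_bytes.
    """
    if mem_total_bytes <= 0:
        raise ValueError("mem_total_bytes must be positive")
    return 1.0 - mem_available_bytes / mem_total_bytes
```

MemAvailable is used here rather than MemFree because it also counts memory the kernel can reclaim from caches, so the result is less likely to overstate memory pressure.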
Prometheus
Prometheus emits metrics which are used by Kubecost for diagnostic purposes:
| Metric | Description |
| --- | --- |
| up | Scrape target status |
| prometheus_target_interval_length_seconds | Amount of time between target scrapes |
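
To show how the `up` metric can serve a diagnostic check, the sketch below parses the JSON body returned by a Prometheus instant query for `up` and lists the targets whose last scrape failed (value 0). The function name, the job names, and the addresses in the sample payload are hypothetical; only the response layout follows the Prometheus HTTP API.

```python
import json


def down_targets(response_text):
    """Return (job, instance) pairs for targets whose `up` value is 0."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    down = []
    for sample in payload["data"]["result"]:
        # Each instant-vector sample carries [timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            labels = sample["metric"]
            down.append((labels.get("job", ""), labels.get("instance", "")))
    return down


# Hypothetical response body for the query `up`.
SAMPLE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"__name__": "up", "job": "kubecost",
                        "instance": "10.0.0.1:9003"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node-exporter",
                        "instance": "10.0.0.2:9100"},
             "value": [1700000000.0, "0"]}
          ]}}
"""

failed = down_targets(SAMPLE)
```

In this sample only the second target is reported, because its `up` value is 0.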
NVIDIA K8s Device Plugin (GPU)
NVIDIA GPU monitoring support is explained in more detail on the Kubecost Blog: Monitoring NVIDIA GPU Usage in Kubernetes with Prometheus. The following metrics are consumed:
GitHub: https://github.com/NVIDIA/k8s-device-plugin
| Metric | Description |
| --- | --- |
| DCGM_FI_DEV_GPU_UTIL | GPU utilization |