Kubernetes Metrics – The Complete Guide

What are Kubernetes Metrics?

Production deployments of Kubernetes are notoriously massive in scope, often running thousands or even tens of thousands of containers. Keeping track of this many containers introduces considerable complexity.

Kubernetes metrics help you keep track of your containers by introducing visibility into the process. Kubernetes lets you monitor a wide range of metrics and gain insights into your clusters, nodes, pods, and applications. In this blog post, we’ll provide a short introduction to the world of Kubernetes metrics.

Why Monitor Kubernetes?

Kubernetes monitoring is an essential part of any Kubernetes architecture and can help you gain insight into the state of your workloads. You can monitor performance metrics, resource utilization, and the overall health of your clusters. Insights obtained from monitoring can help you quickly discover and remediate issues.

In addition to troubleshooting, Kubernetes monitoring can help you detect threats and protect your workloads. Here are several Kubernetes threats you can monitor:

  • Data exfiltration — attackers use network tunneling and reverse shell techniques to smuggle out sensitive information
  • Rogue pod connections — attackers with access to a compromised container can connect to other pods. You can detect these attacks through layer 7 network filtering
  • Application vulnerabilities and misconfigurations — attackers target these blind spots to find exploits in networks, systems, files, and process controls

Top Kubernetes Metrics to Monitor

Organizations use metrics to measure specific aspects of their Kubernetes deployments. Different production deployments, storage strategies, and networking architectures often require different metrics, but to be efficient, a metrics system should be uniform across the entire operation. The metrics below are recommended for the majority of containerized workloads.

Kubernetes Cluster Metrics

Your monitoring solution should provide you with relevant insights into the performance of Kubernetes clusters. To gain this level of visibility, you need to keep track of:

  • The number of running containers, pods, and nodes
  • Central processing unit (CPU), memory, and disk usage
  • Network input/output (I/O) pressure

This information can help you learn more about capacity and adjust cluster resource utilization accordingly.
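To make the cluster-level view above concrete, here is a minimal sketch of how per-node usage samples roll up into cluster-wide utilization. The node data is hypothetical; in practice, these numbers would come from a metrics pipeline such as Prometheus with node-exporter.

```python
# Aggregate hypothetical per-node usage samples into cluster-level utilization.
def cluster_utilization(nodes):
    """Return cluster-wide CPU and memory utilization as fractions of capacity."""
    cpu_used = sum(n["cpu_used_cores"] for n in nodes)
    cpu_capacity = sum(n["cpu_capacity_cores"] for n in nodes)
    mem_used = sum(n["mem_used_gib"] for n in nodes)
    mem_capacity = sum(n["mem_capacity_gib"] for n in nodes)
    return {"cpu": cpu_used / cpu_capacity, "memory": mem_used / mem_capacity}

# Hypothetical sample: two 4-core, 16 GiB nodes.
nodes = [
    {"cpu_used_cores": 3.2, "cpu_capacity_cores": 4, "mem_used_gib": 12, "mem_capacity_gib": 16},
    {"cpu_used_cores": 1.8, "cpu_capacity_cores": 4, "mem_used_gib": 6, "mem_capacity_gib": 16},
]
print(cluster_utilization(nodes))  # {'cpu': 0.625, 'memory': 0.5625}
```

A utilization figure like this is what you would alert on or feed into capacity planning.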

Kubernetes Control Plane Metrics

In Kubernetes, the component responsible for managing clusters is called the control plane. The control plane keeps track of various metrics and makes scheduling decisions that ensure the cluster runs optimally. It runs on the cluster’s master nodes.

To ensure the control plane works properly and efficiently, you need to collect relevant metrics from its components. For example, you can monitor the Scheduler, the application programming interface (API) server, etcd, and the Controller Manager.

Once you set up these metrics, you should visualize and centralize the data. You can do that with Grafana dashboards, which you can build on top of Prometheus once it is set up. These insights can help you troubleshoot cluster performance issues.
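A common control plane check is a latency percentile over API server request durations. The sketch below uses the simple nearest-rank method on hypothetical sample values; a real setup would read durations from Prometheus (for example, the API server's request-duration histogram).

```python
# Compute a latency percentile (nearest-rank method) from request-duration samples.
def percentile(samples, pct):
    """Return the pct-th percentile of the samples using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

# Hypothetical API server request latencies in milliseconds.
latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 17, 19]
print(percentile(latencies_ms, 99))  # 240
print(percentile(latencies_ms, 50))  # 15
```

A p99 far above the median, as in this sample, is exactly the kind of signal a control plane dashboard should surface.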

Kubernetes Node Metrics

A Kubernetes node has a predetermined CPU and memory capacity, which is shared by the pods running on it. Node-level resource usage greatly impacts cluster operations and should be monitored continuously. Additional metrics every Kubernetes monitoring solution should collect and monitor include:

  • Disk-space usage
  • Node-network traffic

Node conditions describe the status of running nodes and can be of great use; examples include Ready, MemoryPressure, DiskPressure, OutOfDisk, and NetworkUnavailable.
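The node conditions above can be turned into a simple health check. The sketch below flags problem conditions from a simplified, hypothetical version of the condition list that `kubectl get node -o json` returns under the node's status.

```python
# Flag unhealthy node conditions, mirroring the condition names kubelet reports.
PRESSURE_CONDITIONS = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}

def node_problems(conditions):
    """Return condition types indicating a problem.

    A node is healthy when Ready is True and the pressure conditions are False.
    """
    problems = []
    for cond in conditions:
        if cond["type"] == "Ready" and cond["status"] != "True":
            problems.append("NotReady")
        elif cond["type"] in PRESSURE_CONDITIONS and cond["status"] == "True":
            problems.append(cond["type"])
    return problems

# Hypothetical node status: ready, but under memory pressure.
status = [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "True"},
    {"type": "DiskPressure", "status": "False"},
]
print(node_problems(status))  # ['MemoryPressure']
```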

Kubernetes Pod/Container Metrics

Resource allocation is critical to ensuring that pods and containers run optimally without disrupting application performance. To keep track of this, you need to ensure that pods are not under- or over-provisioned. To discover and troubleshoot these issues, you can set up metrics that track container restart activity and monitor throttled containers.
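Here is a minimal sketch of the two checks just mentioned: restart activity and CPU throttling. The thresholds and sample values are hypothetical; real values would typically come from kube-state-metrics (restart counts) and cAdvisor (CFS throttling counters).

```python
# Two simple pod-level checks: restart looping and CPU throttling.
def is_restart_looping(restart_counts, threshold=3):
    """Flag a container whose restart count grew by more than `threshold`
    between the first and last scrape (a rough crash-loop signal)."""
    return restart_counts[-1] - restart_counts[0] > threshold

def throttle_ratio(throttled_periods, total_periods):
    """Fraction of CFS scheduling periods in which the container was throttled."""
    return throttled_periods / total_periods if total_periods else 0.0

print(is_restart_looping([2, 8]))  # True: 6 restarts between scrapes
print(throttle_ratio(450, 1000))   # 0.45: throttled in 45% of periods
```

A sustained throttle ratio like 0.45 usually means the container's CPU limit is set too low for its workload.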

Kubernetes Application Metrics

Request Rate, Error Rate, and Duration (RED) metrics can help you ensure Kubernetes services are running as expected and create dashboards that visualize monitoring data in real time. In addition to RED metrics, you should also collect application runtime metrics such as memory, heap, thread, and Java Virtual Machine (JVM) metrics.
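To illustrate the RED model, the sketch below derives rate, error rate, and mean duration from a batch of hypothetical request records. In practice, an application would export these figures through an instrumentation library rather than computing them by hand.

```python
# Derive RED (Rate, Errors, Duration) figures from a batch of request records.
def red_metrics(requests, window_seconds):
    """Return request rate (req/s), error rate (fraction), and mean duration (ms)."""
    total = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    mean_ms = sum(r["duration_ms"] for r in requests) / total
    return {
        "rate": total / window_seconds,
        "error_rate": errors / total,
        "mean_duration_ms": mean_ms,
    }

# Hypothetical requests observed in a 2-second window.
requests = [
    {"status": 200, "duration_ms": 30},
    {"status": 200, "duration_ms": 50},
    {"status": 503, "duration_ms": 120},
    {"status": 200, "duration_ms": 40},
]
print(red_metrics(requests, window_seconds=2))
# {'rate': 2.0, 'error_rate': 0.25, 'mean_duration_ms': 60.0}
```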

Key Kubernetes Performance Metrics

Here are several metrics you should track to gain visibility into the performance of your Kubernetes deployment:

  • Memory utilization — if a cluster does not utilize memory properly, workload performance might decrease. Additionally, Kubernetes terminates pods that exceed their memory limits. When nodes are not provisioned with sufficient memory, the kubelet detects memory pressure, starts reclaiming resources, and might evict pods from the node.
  • CPU utilization — mismatches between the CPU requests and limits configured on pods and the capacity of their nodes might degrade cluster performance and cause CPU throttling. You should set up metrics to ensure pods and nodes request the CPU resources needed for optimal performance.
  • Pod deployments — monitoring can help ensure a deployment rollout runs the needed number of pods. During a rollout, Kubernetes first ascertains how many pods are needed to run the application and then deploys them accordingly. In some cases, the deployment may need all new pods available at once; in other cases, you might need to enforce a waiting period.
  • Desired vs. current pods — manually launching individual pods is usually not efficient when running Kubernetes in production. These scenarios typically require controllers, which automatically create pods according to predefined specs. Metrics such as kube_deployment_spec_replicas and kube_deployment_status_replicas report the desired and current pod counts; when setting this up, make sure the two numbers match.
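The desired-vs-current check in the last bullet can be sketched as a simple comparison per deployment. The deployment names and counts below are hypothetical; in a real setup, they would come from the kube_deployment_spec_replicas and kube_deployment_status_replicas metrics.

```python
# Compare desired vs. current replica counts per deployment.
def replica_mismatches(spec, status):
    """Return deployments whose current replica count differs from the desired one."""
    return {
        name: {"desired": desired, "current": status.get(name, 0)}
        for name, desired in spec.items()
        if status.get(name, 0) != desired
    }

# Hypothetical desired (spec) and current (status) replica counts.
spec = {"web": 3, "worker": 5, "cache": 2}
status = {"web": 3, "worker": 4, "cache": 2}
print(replica_mismatches(spec, status))  # {'worker': {'desired': 5, 'current': 4}}
```

A persistent mismatch like the one for "worker" is a signal that a pod cannot be scheduled or keeps failing, and is worth alerting on.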

Monitoring Kubernetes in the Cloud

Kubernetes Monitoring on AWS

Amazon Web Services (AWS) offers several monitoring solutions you can use when deploying Kubernetes in the cloud, including a dedicated monitoring service called Amazon CloudWatch Container Insights. 

Amazon CloudWatch Container Insights is a fully-managed service that lets you isolate, monitor, and diagnose containerized workloads and microservices environments. The service provides automation capabilities, visualized dashboards, and actionable insights you can use to troubleshoot your environment.

You can use Amazon CloudWatch Container Insights to automate dashboards and visualize information flowing from a range of services, including Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS). Once you collect information from your Kubernetes clusters, you can sort it by node, pod, task, namespace, service, and container.

Kubernetes Monitoring on Azure

Microsoft Azure offers a feature called Container insights that lets you monitor the performance of containerized workloads deployed in the Azure cloud or on-premises. 

Container insights enables you to consume logs from several orchestration engines, including Kubernetes, Docker Swarm, Red Hat OpenShift, and DC/OS. You can collect log and metric information from containers running within a cluster and also from cluster hosts, and correlate log information from the two source types.

Kubernetes Monitoring on Google Cloud

When you create a new cluster, Cloud Operations for Google Kubernetes Engine (GKE) is enabled by default to provide monitoring capabilities for Kubernetes. GKE integrates natively with Cloud Logging and Cloud Monitoring, and both services are managed through Cloud Operations for GKE.

Cloud Operations for GKE provides a monitoring dashboard designed especially for Kubernetes. The dashboard lets you view important cluster metrics, including memory usage, CPU utilization, and the total number of open incidents. You can view clusters by workloads, services, or infrastructure, and inspect nodes, namespaces, services, and containers.

Kubernetes Metrics with Kubermatic Kubernetes Platform

Kubermatic Kubernetes Platform (KKP) manages up to thousands of clusters across multiple private and public clouds. We designed KKP to automate Kubernetes in large-scale and complex enterprise setups. Of course, reliable metrics on health, performance, and resource consumption are an integral part of that. KKP uses best-in-class open source technologies to provide users with centralized monitoring, logging, and alerting for their clusters and services across numerous clouds: Prometheus and its Alertmanager handle monitoring and alerting, and Grafana handles dashboards.

Kristin Wittig

Marketing & Communications