To quote Michael Dell, “the cloud isn’t a place, it’s a way of doing IT.” As IT becomes more and more central to what every company does, understanding cloud native best practices is key not only for developers but for every part of a business.
This post is the second of a seven-part series examining how cloud native best practices can help businesses deliver value to their customers better, faster, and cheaper. This part explains the Pets vs Cattle analogy and how it can help businesses provide a better service with faster time to recovery, which reduces lost revenue.
History: Pets vs Cattle
The analogy of Pets vs Cattle has become one of the core concepts of a DevOps service model. The idea was first introduced by Bill Baker, a Distinguished Engineer at Microsoft, in his presentation “Scaling SQL Server 2012” and later popularized by Gavin McCance talking about the OpenStack cloud at CERN. The difference between Cattle and Pets is how vital each individual is to you. To pull from Randy Bias’ post on the history of this analogy:
In the old way of doing things, we treat our servers like Pets, for example, Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like Cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.
Pets are each vitally important, but Cattle are interchangeable. Moving production software infrastructure from Pets towards Cattle is key to creating a highly available (HA) system with reduced failures, a smaller blast radius, and faster disaster recovery. Taken together, moving towards a Cattle service model helps businesses deliver a better service with reduced downtime.
Cloud Native Best Practices: Pets vs Cattle
Reliably running and managing scalable IT infrastructure necessitates treating servers as Cattle, and Kubernetes clusters are no different. Joaquin Menchaca’s blog nicely traces the evolution from a Pet to a Cattle management model up the stack, from bare metal servers to cloud native containers and container orchestrators. At each stage, a higher layer of the stack becomes replaceable, creating immutable production where only exact copies are deployed and they can be replaced at any time. If anything needs to be changed, a new deployment is made and the old ones are decommissioned. Immutable production eliminates variations, leading to fewer deployment failures, consistency across environments, easy horizontal scaling, and a simple rollback and recovery process.
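As an illustration, immutable production in Kubernetes might look like the following minimal Deployment sketch (the app name, image, and replica count are hypothetical): pods are interchangeable Cattle, and a change means shipping a new image tag so the Deployment replaces the old pods, never patching a running one.

```yaml
# Minimal sketch of an immutable deployment (hypothetical names).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shop-frontend            # hypothetical app name
spec:
  replicas: 3                    # any replica can be killed and recreated
  selector:
    matchLabels:
      app: shop-frontend
  template:
    metadata:
      labels:
        app: shop-frontend
    spec:
      containers:
      - name: frontend
        # Pin an exact, immutable tag; to change the app, roll out a new tag.
        image: registry.example.com/shop-frontend:v1.4.2
```

Because every replica is an exact copy, rollback is equally simple: `kubectl rollout undo deployment/shop-frontend` swaps the whole set back to the previous revision.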
To take advantage of these benefits and bring immutable containers into production, Kubernetes has become the de facto standard because it simplifies the process of deploying and orchestrating containers. However, installing, managing, and updating Kubernetes itself is no simple undertaking. To avoid these challenges, clusters have become the new Pets, with companies running a few large, vital clusters rather than many smaller, replaceable ones. This is in direct contradiction to the cloud native principles Kubernetes was founded on. To derive the full value from a cloud native approach, it is imperative that companies treat their Kubernetes clusters like Cattle too: spinning them up and down on demand and treating them as disposable, replaceable assets rather than prized Pets.
Beyond easing management and saving operations time and money, running Kubernetes clusters as Cattle can have a real impact on the bottom line too. For example, an e-commerce company earning 5 billion euros per year is completely dependent on its shop being online: every minute of production outage costs nearly 10,000 euros in missed sales. In a case study with the CNCF, when VSCO standardized their provisioning and outage recovery with Kubernetes, they reduced their outage time by 88%. The Cloud Native Computing Foundation is already seeing businesses transition towards using Kubernetes clusters like Cattle: in the past six months, organizations running 1-5 production clusters decreased 37%, while organizations running 11-50 clusters increased 154%. Treating infrastructure as Cattle allows companies to simply replace their infrastructure and come back online sooner, saving sales.
At Kubermatic, we strongly believe in treating Kubernetes clusters as Cattle. We use multiple tools to make it as easy as possible for our developers and operators to create and use Kubernetes clusters, including kind, KubeOne, and Kubermatic Kubernetes Platform. We use kind to quickly spawn Kubernetes clusters to run integration and E2E tests in our CI/CD pipeline. We recently open sourced KubeOne because it gives developers the ability to quickly install, manage, and upgrade highly available Kubernetes clusters on any cloud provider, on-premises, or on bare metal. Furthermore, we are actively contributing to cluster-api in upstream Kubernetes to simplify the process of cluster creation, configuration, and management. Finally, when we designed Kubermatic Kubernetes Platform for large-scale installations, we followed this best practice of turning clusters into Cattle, too. This requires that creating, updating, and deleting a Kubernetes cluster is only one click away for anyone. Kubermatic Kubernetes Platform does this by spinning up the user cluster Kubernetes control components as containers in the seed Kubernetes cluster (architecture details here). If anything happens to a control component of the worker cluster, it will be automatically restarted and replaced by the seed Kubernetes cluster, turning the control components and the cluster itself into Cattle.
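To make the "clusters as Cattle" idea concrete, the lifecycle of a throwaway kind cluster in a CI pipeline might look like this sketch (the cluster name and manifest path are hypothetical; kind and kubectl must be installed, and kind needs a local Docker daemon):

```shell
# Spin up a disposable cluster just for this CI run.
kind create cluster --name ci-run-42

# kind registers a kubeconfig context named kind-<cluster-name>.
kubectl --context kind-ci-run-42 apply -f ./e2e-manifests/

# ... run the integration / E2E test suite against the cluster ...

# The cluster is Cattle: once the tests finish, delete it entirely.
kind delete cluster --name ci-run-42
```

The next pipeline run gets a fresh, identical cluster, so no state ever accumulates and a broken cluster is never repaired, only replaced.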
Running Cattle rather than Pets is a computing infrastructure best practice at every layer of the stack. It provides customers a better service, reduces time to recovery, and cuts the revenue lost to production outages. Kubernetes became the standard for container orchestration because it allows you to treat your containers like Cattle. Kubernetes clusters themselves should be treated like Cattle too.