A Framework for Kubernetes Incident Response

Kubernetes is used by organizations of all sizes to run production, mission-critical applications. Attackers know this, and they see Kubernetes clusters as the “crown jewels” of an IT environment. Compromising a cluster can give an attacker access to sensitive data, control over business applications, and the ability to abuse an organization’s computing resources for cryptomining or other criminal activity.

Kubernetes security is a broad topic. In this article, I’ll focus on one critical aspect: incident response. Kubernetes clusters and pods are operated by DevOps engineers and developers, who are usually the first to see the warning signs of an attack. Will they know how to recognize an attack and escalate it to security teams? Do they share a common language with incident responders for describing what is happening in a containerized environment?

Read on to understand the basics of Kubernetes incident response and how to create this common language between DevOps, software engineers, and the security operations center (SOC).

Incident Response Basics

An incident response plan (IRP) is a documented, systematic process that outlines how the organization acts during a security incident. A solid IRP can significantly reduce the damage an incident causes. It standardizes the response across the organization, ensuring each role knows exactly how to act in a timely and appropriate manner.

To ensure the success of an IRP, all relevant stakeholders must know their responsibilities and agree to the plan. They should also be ready to coordinate their efforts around the IRP when an attack occurs. IRP stakeholders usually include members of the legal, operations, security, PR, development, and executive management teams, and may extend to customers and partners.

Here are several benefits of a solid incident response plan:

  1. Be ready for emergencies - security events occur without warning. It is critical to create a process in advance.
  2. Coordinate your efforts - organizations can find it difficult to keep all stakeholders in the loop when a crisis strikes. An IRP can standardize communication, ensuring everyone is on the same page and well informed.
  3. Expose security gaps - many organizations have limited technical maturity or limited staff. An IRP can help reveal obvious gaps in tooling or process, so the organization can address them before a crisis occurs.
  4. Practice your response - an IRP helps create clear and repeatable processes that relevant stakeholders can follow during every incident. This improves the coordination and effectiveness of your response over time.

Recent Kubernetes Security Breaches

Studying the details of real-life security breaches can help you prepare your organization’s incident response strategy.

Docker Hub Attack

Kubernetes environments are highly dynamic, which makes it difficult to pinpoint the source of an attack. In the well-known Docker Hub attack, attackers planted malicious container images in the Docker Hub registry. Anyone using these images was cryptojacked: users unwittingly deployed containers running cryptocurrency miners, which used their computing resources to mine cryptocurrency for the attackers.

Tesla Rogue Pod Attack

Cryptocurrencies are soaring in value, and the cloud’s seemingly unlimited computing resources can make resource hijacking more profitable than stealing information. Automaker Tesla was one of the earliest victims of cryptojacking, when one of its Kubernetes clusters was compromised because its administrative console was not password-protected.

The issue was discovered by the RedLock Cloud Security Intelligence team and disclosed in a public report. Inside a pod on the exposed cluster, attackers found credentials for Tesla’s AWS environment, which included an Amazon S3 bucket containing sensitive data. The attackers also ran scripts in Kubernetes pods on the cluster to perform illicit cryptomining.

Jenkins Cryptomining Exploit

Hackers exploited a Jenkins vulnerability to mine approximately $3.5 million worth of cryptocurrency (10,800 Monero coins) over 18 months.

After exploiting vulnerable Windows machines, the campaign evolved to target Jenkins CI servers. This recent version of the malware keeps updating itself and switching mining pools to evade detection.

Security Controls and Forensic Analysis for Kubernetes

Preparing an Incident Response Plan

An IRP is critical to managing incidents efficiently. A solid plan helps your organization recover effectively from incidents and prevent future ones.

Here are several important phases every IRP should include:

  • Identification - incidents should be detected as accurately and as early as possible. This is key to effective incident response and management. This phase involves monitoring security events on the Kubernetes control plane and worker nodes, detecting incidents, and reporting on potential risks.
  • Coordination - once incidents are reported, DevOps teams must coordinate with SOC analysts to evaluate each event and determine whether it is a real security incident. If the analysis confirms an incident, they initiate the incident response process.
  • Resolution - teams investigate the root cause of the reported incident. Responders strive to limit the impact and resolve all immediate risks. While remediating, DevOps and security teams implement necessary fixes and recover any affected data, services, and systems. A critical decision at this stage is whether to shut down the cluster, or known infected nodes, until the threat is eradicated.
  • Continuous improvement - each incident gives DevOps teams and SOC responders new insights. Using this information, teams can fine-tune Kubernetes clusters, implement new security measures, and improve the incident response process itself.
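One way to keep the four phases honest is to track every incident through them in order, with a note at each step. The following Python sketch illustrates the idea; the `Phase` and `Incident` names are hypothetical and not part of any incident management tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    # The four IRP phases described above, in the order they occur.
    IDENTIFICATION = 1
    COORDINATION = 2
    RESOLUTION = 3
    CONTINUOUS_IMPROVEMENT = 4


@dataclass
class Incident:
    summary: str
    phase: Phase = Phase.IDENTIFICATION
    history: list = field(default_factory=list)  # (phase, note) pairs

    def advance(self, note: str) -> Phase:
        """Record what happened in the current phase, then move to the next one."""
        if self.phase is Phase.CONTINUOUS_IMPROVEMENT:
            raise ValueError("incident is already in its final phase")
        self.history.append((self.phase, note))
        self.phase = Phase(self.phase.value + 1)
        return self.phase
```

Because each transition carries a note, the `history` list doubles as a timeline that teams can review during the continuous improvement phase.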

Make It Clear When to Escalate

Before launching your code into a production cluster, it’s important to understand your application and infrastructure security model. You should also clearly define what is considered a security incident that requires a response from your DevOps team. Additionally, you should set guidelines that explain when the in-house team should respond and when they should call in external experts.

For example, incident response can start when the ops team reports a potential event and classifies it as a security incident. The ops team then assigns the incident to the relevant security team member.

Your IRP should define when outside security experts are called in and how teams are engaged. Developing these processes is critical to effective incident response, because cluster administrators are often the first to discover and identify security issues.
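Escalation thresholds work best when they are explicit enough to write down as rules. The following Python sketch shows what such rules might look like. The event categories, the 1 to 5 severity scale, and the cut-off values are all hypothetical placeholders; your IRP would define its own.

```python
from dataclasses import dataclass

# Hypothetical categories that this example treats as security-relevant.
SECURITY_CATEGORIES = {
    "unexpected_exec",
    "privilege_escalation",
    "crypto_mining",
    "credential_access",
}


@dataclass
class Event:
    category: str
    severity: int        # assumed scale: 1 (low) to 5 (critical)
    nodes_affected: int


def route(event: Event) -> str:
    """Decide who owns an event, using example thresholds an IRP might define."""
    if event.category not in SECURITY_CATEGORIES:
        return "devops"    # operational issue: handled by the ops team
    if event.severity >= 5 or event.nodes_affected > 3:
        return "external"  # SOC plus outside experts: beyond in-house capacity
    return "soc"           # security incident: assigned to the SOC
```

Encoding the rules this way gives DevOps and the SOC the same unambiguous definition of "when to escalate", and the thresholds can be tuned after each incident review.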

Container Forensics

After implementing the necessary security mechanisms for your Kubernetes workloads and drafting an IRP, you should make sure that all roles involved in forensic analysis have access to the necessary information.

Here are several important data sources required for container forensics:

Logs  

Logs are critical for forensic investigations. Relevant sources include Kubernetes control plane logs, Kubernetes audit logs, cloud infrastructure logs, and application logs. You should also analyze operating system logs that record network connections, SSH sessions, processes, user logins, and command executions.
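As an example of putting audit logs to work, the Kubernetes API server's log backend writes one JSON audit event (`audit.k8s.io/v1`) per line, with fields such as `verb`, `user.username`, `objectRef`, and `requestReceivedTimestamp`. The following Python sketch filters those lines for actions that often deserve a closer look during an investigation, such as `exec` into a pod or access to secrets and role bindings. The choice of which resources count as suspicious is an assumption for illustration, and controllers legitimately read secrets, so expect noise.

```python
import json
from typing import Iterable, Iterator

# Heuristic watch lists; adjust them to your own environment.
SUSPICIOUS_SUBRESOURCES = {"exec", "attach", "portforward"}
SENSITIVE_RESOURCES = {"secrets", "clusterrolebindings", "rolebindings"}


def suspicious_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a short summary of audit events that match simple forensic heuristics.

    Each non-empty line is expected to be one JSON audit event. A single request
    may be logged once per audit stage, so the same action can appear twice.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        resource = ref.get("resource", "")
        subresource = ref.get("subresource", "")
        if subresource in SUSPICIOUS_SUBRESOURCES or resource in SENSITIVE_RESOURCES:
            yield {
                "time": event.get("requestReceivedTimestamp"),
                "user": (event.get("user") or {}).get("username"),
                "verb": event.get("verb"),
                "namespace": ref.get("namespace"),
                "name": ref.get("name"),
                "resource": f"{resource}/{subresource}" if subresource else resource,
            }
```

To use it, open the file configured with the API server's `--audit-log-path` flag and pass the file object to `suspicious_events`; the location of that file depends on how your cluster was set up.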

Node snapshots 

Snapshots of a node’s disks preserve evidence for later analysis. If needed, shift other workloads elsewhere and quarantine the affected nodes, then run additional analyses. To do this, you must be able to identify the affected nodes and any disks attached to them.

Where possible, duplicate the disks while they are online and send the copies for analysis. You can also use the Docker Explorer tool to examine the container file systems in those images, and compare binaries across disk snapshots to find differences.
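Comparing binaries across snapshots amounts to hashing the files in each snapshot and diffing the results. The following Python sketch assumes that a known-good baseline snapshot and the suspect snapshot are already mounted read-only at two directories; the mount points are up to you and are not part of any tool.

```python
import hashlib
import os
from pathlib import Path


def hash_tree(root: str) -> dict:
    """Map each regular file's path, relative to root, to its SHA-256 digest."""
    digests = {}
    root_path = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root_path):  # does not follow symlinked dirs
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_symlink() or not full.is_file():
                continue  # only hash regular files, skip symlinks, sockets, and devices
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
                    h.update(chunk)
            digests[full.relative_to(root_path).as_posix()] = h.hexdigest()
    return digests


def diff_snapshots(baseline: str, suspect: str) -> dict:
    """List files added, removed, or modified in the suspect snapshot."""
    before, after = hash_tree(baseline), hash_tree(suspect)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }
```

Hashing a whole disk can take a long time, so it is often practical to point both arguments at the same binary directory in each snapshot (for example, the `usr/bin` directory under each mount point) and widen the scope only if needed.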

Container visibility tools

Responders, whether they are DevOps engineers or security analysts, should use the tools available to them when working with Kubernetes and Docker. For example, the Docker stats API provides per-container resource usage metrics, which help responders understand how the container load is affecting the system.

Container visibility tools help responders detect activities occurring in the system and understand those behaviors, for example, whether files in a container are expected or unexpected. These tools can also provide real-time information without logging in to the node, and gather information remotely from multiple sources.
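As a concrete use of the stats API, the Docker Engine API endpoint `GET /containers/{id}/stats?stream=false` returns `cpu_stats` and `precpu_stats` objects, and the Docker documentation derives CPU usage from the difference between them. The following Python sketch implements that calculation. Treating a sustained high value as a possible sign of cryptomining is a heuristic, not a rule.

```python
def cpu_percent(stats: dict) -> float:
    """CPU usage of one container, where 100.0 means one full CPU.

    `stats` is one JSON object from the Docker stats API. On multi-core hosts
    the result can exceed 100 (up to online_cpus * 100).
    """
    cpu = stats["cpu_stats"]
    pre = stats.get("precpu_stats") or {}
    if "system_cpu_usage" not in pre:
        return 0.0  # no previous sample to compare against
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre.get("cpu_usage", {}).get("total_usage", 0)
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    # Prefer online_cpus; fall back to the per-CPU list, which cgroup v2 hosts may omit.
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return cpu_delta / system_delta * online * 100.0
```

With the Docker SDK for Python, `container.stats(stream=False)` returns a dictionary of this shape. A container that normally idles but suddenly runs near `online_cpus * 100` for a long period is worth investigating.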
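One simple way to decide whether a container's files are expected is to inspect the output of `docker diff`, which lists each changed path prefixed with `A` (added), `C` (changed), or `D` (deleted). The following Python sketch flags changes inside directories that normally hold binaries, where a running container rarely has a reason to write. The watched directories are an assumption for illustration, and the approach only applies on nodes where the Docker CLI is available.

```python
from typing import Iterable, List, Tuple

# Prefix letters used by `docker diff`.
KINDS = {"A": "added", "C": "changed", "D": "deleted"}

# Hypothetical watch list: directories that usually contain executables.
WATCHED_DIRS = ("/bin", "/sbin", "/usr/bin", "/usr/sbin", "/usr/local/bin")


def flag_changes(diff_lines: Iterable[str], watched_dirs=WATCHED_DIRS) -> List[Tuple[str, str]]:
    """Return (kind, path) for every docker diff entry inside a watched directory."""
    findings = []
    for raw in diff_lines:
        code, _, path = raw.strip().partition(" ")
        if code not in KINDS or not path:
            continue  # skip blank lines and anything that is not a diff entry
        # Match only paths strictly inside a watched directory, so parent
        # directories such as "C /usr" that docker diff also reports are ignored.
        if any(path.startswith(d.rstrip("/") + "/") for d in watched_dirs):
            findings.append((KINDS[code], path))
    return findings
```

The input can come from `subprocess.run(["docker", "diff", name], capture_output=True, text=True, check=True).stdout.splitlines()`, where `name` is the container you are investigating.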

Conclusion

In this article I explained the basics of Kubernetes incident response, and presented three essential components that will help your organization respond to and successfully contain an attack:

  • Preparing an incident response plan - defining the process of identifying an incident, coordinating with all stakeholders, resolving the incident, and making changes to the environment to prevent future incidents.
  • Make it clear when to escalate - give DevOps teams the tools and knowledge they need to identify warning signs of a cybersecurity incident, and clear thresholds for escalating issues to the SOC.
  • Container forensics - ensure you have logs of all security-relevant events in your environment, and the ability to quickly access these logs and perform forensic investigation in case of an attack.

I hope this helps as you build a coordinated effort to secure your Kubernetes environment.

Sascha Haase

VP Edge