Audit Logging in Clusters

August 19, 2021

Introduction

Whenever you change something in your cluster, you might want to log that change somewhere so that you can look it up later. Did you know that Kubernetes not only lets you log all changes but also fire events that respond to particular changes? Find out in this article how to do this!

Audit logging is a feature that was introduced in Kubernetes v1.11 and gained new features in v1.13. Every event is logged as a JSON document that holds information about the request, which can be useful when troubleshooting. This article provides an in-depth view of audit logging and explains how to configure it.

Why Do You Need Audit Logging?

One obvious use case for audit logging is cluster debugging. The logs record changes in the cluster state, and you can review them to see if misconfiguration is a source of the errors.

Another use case is troubleshooting. Since the logs record which request came from which component and provide a timestamp, you can use them as a time-series of events in your cluster that can tell you what went wrong.

You can also use audit logging for performance tuning. Since the logs document what happened and how often requests were made, reviewing them is sometimes the only way to find out whether some types of requests are fired more often than you would expect and are putting a heavy load on the cluster.
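
As a rough illustration of the performance-tuning use case, the following Python sketch counts how often each combination of verb and resource appears in an audit log. It assumes the log is in JSON Lines form with the `stage`, `verb`, and `objectRef` fields; the function name `count_requests` and the sample lines are hypothetical. Only one stage is counted so that each request is tallied once.

```python
import json
from collections import Counter


def count_requests(lines, stage="ResponseComplete"):
    """Count audit events per (verb, resource) pair for a single stage.

    `lines` is any iterable of JSON Lines strings, e.g. an open audit log file.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("stage") != stage:
            continue  # a request can be logged at several stages; count it once
        # Non-resource requests (e.g. /healthz) carry no objectRef.
        resource = (event.get("objectRef") or {}).get("resource", "<non-resource>")
        counts[(event.get("verb", "<unknown>"), resource)] += 1
    return counts


# Hypothetical sample lines standing in for a real audit log.
sample_lines = [
    '{"stage": "RequestReceived", "verb": "list", "objectRef": {"resource": "pods"}}',
    '{"stage": "ResponseComplete", "verb": "list", "objectRef": {"resource": "pods"}}',
    '{"stage": "ResponseComplete", "verb": "list", "objectRef": {"resource": "pods"}}',
    '{"stage": "ResponseComplete", "verb": "get", "requestURI": "/healthz"}',
]

for (verb, resource), total in count_requests(sample_lines).most_common():
    print(f"{total:5d}  {verb:<6} {resource}")
```

Sorting by frequency puts the noisiest request types at the top, which is usually where a load investigation starts.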

What Can Audit Logs Tell You?

Audit logs record the following information about every request:

What happened?
When did it happen?
Who initiated it?
On which component did it happen?
Where was it observed?
From where was it initiated?
To which component was it going?

Where Audit Log Events Are Generated

All audit log events are generated in the Kubernetes API server. Kube-apiserver is the central component of a cluster and controls the cluster state. All requests that modify the state of the cluster pass through the API server. This makes kube-apiserver the ideal choice for implementing audit logging.

Example of an Audit Log Event

Each event is recorded as a JSON document. To see an example of an audit log event, you can install the JSON parser jq and inspect the audit log with

tail -n 1 /var/log/audit/audit.log | jq .

This command reads the most recent logged event, pipes it to jq for pretty-printing, and prints the result. Here is an example:

{
"kind":"Event",
"apiVersion":"audit.k8s.io/v1beta1",
"metadata":{ "creationTimestamp":"2018-03-21T21:47:07Z" },
"level":"Metadata",
"timestamp":"2018-03-21T21:47:07Z",
"auditID":"20ac14d3-1214-42b8-af3c-31454f6d7dfb",
"stage":"RequestReceived",
"requestURI":"/api/v1/namespaces/default/persistentvolumeclaims",
"verb":"list",
"user": {
 "username":"irina@loodse.com",
  "groups":[ "system:authenticated" ]
},
"sourceIPs":[ "172.20.66.233" ],
"objectRef": {
  "resource":"persistentvolumeclaims",
  "namespace":"default",
  "apiVersion":"v1"
},
"requestReceivedTimestamp":"2018-03-21T21:47:07.603214Z",
"stageTimestamp":"2018-03-21T21:47:07.603214Z"
}
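
The fields of such an event map fairly directly onto the questions listed earlier. The Python sketch below pulls them out of a parsed event; the function name `summarize` and the keys of the returned dictionary are illustrative choices, not part of any Kubernetes API, and the sample is a shortened copy of the event above.

```python
import json


def summarize(event):
    """Return the who/what/when/where of a single audit event as a dict."""
    ref = event.get("objectRef") or {}
    target_parts = (ref.get("namespace"), ref.get("resource"), ref.get("name"))
    return {
        "what": f'{event.get("verb")} {event.get("requestURI")}',
        "when": event.get("stageTimestamp"),
        "who": (event.get("user") or {}).get("username"),
        "from_where": event.get("sourceIPs", []),
        # Join only the parts that are present, e.g. "default/persistentvolumeclaims".
        "on_what": "/".join(part for part in target_parts if part),
        "stage": event.get("stage"),
    }


raw_event = """
{
  "stage": "RequestReceived",
  "requestURI": "/api/v1/namespaces/default/persistentvolumeclaims",
  "verb": "list",
  "user": {"username": "irina@loodse.com", "groups": ["system:authenticated"]},
  "sourceIPs": ["172.20.66.233"],
  "objectRef": {"resource": "persistentvolumeclaims", "namespace": "default", "apiVersion": "v1"},
  "stageTimestamp": "2018-03-21T21:47:07.603214Z"
}
"""

summary = summarize(json.loads(raw_event))
for key, value in summary.items():
    print(f"{key:>10}: {value}")
```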

Searching Audit Logs Using Falco

Most users rely on jq to search audit logs, but Sysdig developer Mark Stemm states that the Falco tool is “the easy way” to search audit logs and is more user-friendly. If you are interested in Falco, his intro talk is a good place to start.

Components of Audit Logging

There are two components that handle the configuration of audit logging. The first is the audit policy, which controls which events get logged. The second is the audit backend, which persists audit events to external storage. Let’s look at both components in more detail.

Audit Policy

The audit policy defines the configuration of audit logging. It controls which events go into the audit stream. The audit policy is defined in a YAML file, in which you specify, for every type of event, whether it gets logged and at what level.

Here is an example of an audit policy YAML file:

apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]

  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]

The format is straightforward. First, you can use the parameter “omitStages” to define the stages at which requests don’t get logged. In this case these are events in the “RequestReceived” stage. What you need to know here is that a request is not just recorded once: it can be logged at several stages of its lifecycle.

Stages of a Request

The four stages at which a request can be logged are defined as follows:

RequestReceived: Events are generated as soon as the API server receives the request, before it has started issuing a response. In our case the administrator has decided that this stage is too inconsequential to be part of the logs.
ResponseStarted: The response headers have been sent, but the response body has not. This stage is only generated for long-running requests such as watch.
ResponseComplete: The response body has been completed and no more bytes will be sent.
Panic: Events are generated at this stage when a panic has occurred.
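
Because one request can produce several events, the events of a request share the same auditID and differ in their stage. The Python sketch below groups stages by auditID and derives a per-event latency from the `requestReceivedTimestamp` and `stageTimestamp` fields. The helper names and the two sample events are hypothetical, and the timestamp format is assumed to match the example event shown earlier (microsecond precision, trailing Z).

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2018-03-21T21:47:07.603214Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def stages_by_request(events):
    """Group the stages seen for each request, keyed by auditID."""
    seen = {}
    for event in events:
        seen.setdefault(event["auditID"], []).append(event["stage"])
    return seen


def latency_seconds(event):
    """Seconds between receipt of the request and the stage of this event."""
    received = datetime.strptime(event["requestReceivedTimestamp"], TS_FORMAT)
    staged = datetime.strptime(event["stageTimestamp"], TS_FORMAT)
    return (staged - received).total_seconds()


# Two hypothetical events belonging to the same request.
events = [
    {
        "auditID": "20ac14d3-1214-42b8-af3c-31454f6d7dfb",
        "stage": "RequestReceived",
        "requestReceivedTimestamp": "2018-03-21T21:47:07.603214Z",
        "stageTimestamp": "2018-03-21T21:47:07.603214Z",
    },
    {
        "auditID": "20ac14d3-1214-42b8-af3c-31454f6d7dfb",
        "stage": "ResponseComplete",
        "requestReceivedTimestamp": "2018-03-21T21:47:07.603214Z",
        "stageTimestamp": "2018-03-21T21:47:07.615214Z",
    },
]

print(stages_by_request(events))
print(f"{latency_seconds(events[1]) * 1000:.1f} ms until ResponseComplete")
```

With “RequestReceived” listed under omitStages, as in the policy above, only the later stages would appear in a real log.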

The parameter “rules” is the only required parameter in the audit policy YAML file. This is where you define which requests get logged at which level. For example, the first rule states that all changes in pods are logged at the RequestResponse level.

Levels of an Event

Each rule assigns an audit level to the events it matches. Rules are evaluated in order, and the first matching rule sets the audit level of the event. The defined levels are:

None: Events that match this rule shouldn’t get logged.
Metadata: Only logs the metadata of the request but not its request or response body.
Request: Logs the metadata and the request body of the event but not the response body.
RequestResponse: Logs event metadata, request and response body. This is the most informative level but can generate a high load on your cluster.
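
To make the rule evaluation more concrete, here is a deliberately simplified Python model of first-match evaluation, using the three rules from the example policy. It is a sketch for illustration, not how kube-apiserver is implemented: it only looks at the API group, resource, and resource name, and it ignores other selectors such as users, verbs, namespaces, and wildcards. The function names are hypothetical.

```python
# The three rules from the example policy, written as Python dicts.
RULES = [
    {"level": "RequestResponse",
     "resources": [{"group": "", "resources": ["pods"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["pods/log", "pods/status"]}]},
    {"level": "None",
     "resources": [{"group": "", "resources": ["configmaps"],
                    "resourceNames": ["controller-leader"]}]},
]


def rule_matches(rule, group, resource, name=None):
    """Return True if the rule applies to the request (simplified model)."""
    selectors = rule.get("resources")
    if not selectors:
        return True  # no resource selectors: the rule matches everything
    for selector in selectors:
        if selector.get("group", "") != group:
            continue
        wanted = selector.get("resources", [])
        if wanted and resource not in wanted:
            continue  # "pods" does not match the subresource "pods/log"
        names = selector.get("resourceNames", [])
        if names and name not in names:
            continue
        return True
    return False


def audit_level(rules, group, resource, name=None):
    """Return the level of the first matching rule, or "None" if none match."""
    for rule in rules:
        if rule_matches(rule, group, resource, name):
            return rule["level"]
    return "None"


print(audit_level(RULES, "", "pods"))
print(audit_level(RULES, "", "pods/log"))
print(audit_level(RULES, "", "configmaps", "controller-leader"))
print(audit_level(RULES, "", "secrets"))
```

Because evaluation stops at the first match, more specific rules belong before broader ones, with any catch-all rule at the end of the list.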

Audit Backends

Audit backends persist your audit events to external storage. Kubernetes provides two options for this:

the log backend writes logs to a filesystem
the webhook backend sends events to an external HTTP API

Log Backend

The log backend writes events to a file in the JSON Lines format, one JSON document per line. To configure the log backend, pass the path of the log file to kube-apiserver with the flag --audit-log-path.
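
Since every line of the file is a complete JSON document, the log can be processed as a stream without loading it into memory. A minimal Python sketch, assuming a file in that format; the function name `iter_events` is hypothetical, and a temporary file stands in for the real audit log:

```python
import json
import os
import tempfile


def iter_events(path):
    """Yield one audit event (a dict) per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# A temporary file with two made-up events stands in for the real audit log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('{"verb": "list", "stage": "ResponseComplete"}\n')
    tmp.write('{"verb": "get", "stage": "ResponseComplete"}\n')
    demo_path = tmp.name

verbs = [event["verb"] for event in iter_events(demo_path)]
os.remove(demo_path)
print(verbs)
```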

Webhook Backend

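The webhook backend sends your audit events to an external HTTP API. You configure it by passing a configuration file to kube-apiserver with the flag --audit-webhook-config-file. This file uses the kubeconfig format to specify the address of the remote service and the credentials used to connect to it.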
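
On the receiving side, the external API only has to accept HTTP POST requests. The Python sketch below shows what a bare-bones receiver could look like, assuming each request body is a JSON object of kind EventList with the events in its items array. The names `extract_events`, `AuditHandler`, and `serve`, as well as the port, are hypothetical, and the sketch leaves out TLS and authentication, which a real deployment would need.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_events(body):
    """Return the list of audit events contained in a webhook request body."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("kind") != "EventList":
        raise ValueError("expected a JSON object of kind EventList")
    return payload.get("items", [])


class AuditHandler(BaseHTTPRequestHandler):
    """Accepts POSTed event batches and prints one line per event."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = extract_events(self.rfile.read(length))
        except ValueError:  # also covers malformed JSON (JSONDecodeError)
            self.send_response(400)
            self.end_headers()
            return
        for event in events:
            print(event.get("stage"), event.get("verb"), event.get("requestURI"))
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted (hypothetical port, no TLS or auth)."""
    HTTPServer(("127.0.0.1", port), AuditHandler).serve_forever()
```

Calling `serve()` starts the receiver; the address it listens on is what the webhook configuration file would then point to.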

Conclusion

Audit logging provides a security-relevant, chronological record of the actions in a cluster. It covers activities generated by users, by applications that use kube-apiserver, and by the control plane itself. This makes audit logging a useful tool for troubleshooting and performance tuning in Kubernetes clusters.