Saturday, November 19, 2022

K8S Events

A few notes on K8S Events.  

K8S at its core is a database of configs - with a stable and well-defined schema. Different applications (controllers) use the database to perform actions - run workloads, set up networking and storage. The interface to the database is NoSQL-style - with a 'watch' interface similar to pubsub/MQTT that allows controllers to operate with very low latency, on every change.
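
This watch behavior is easy to see from the command line - kubectl can print the raw watch events (the --output-watch-events flag needs a reasonably recent kubectl):

```
# Stream changes to Pods as they happen; --output-watch-events also prints
# the watch event type (ADDED / MODIFIED / DELETED).
kubectl get pods -A --watch --output-watch-events
```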

Most features are defined in terms of API resources (built-in types or CRDs) - the database objects, with metadata (name, namespace, labels, version), data (spec) and status. The status is used by controllers to record how the object was actuated, and by users to inspect it. For example, a Pod represents a workload - the controllers will write the IP of the pod and 'Running' in its status. Other controllers will use this information to update other objects - like EndpointSlice.
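
A quick way to see what the controllers wrote (the pod name here is made up):

```
# Phase and IP are written to status by the kubelet and controllers, not by the user.
kubectl get pod my-pod -o jsonpath='{.status.phase} {.status.podIP}{"\n"}'
```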

K8S also has a less-used and more generic pubsub mechanism - the Event - for 'general purpose' events.
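
Events are ordinary objects in the core API - any client with the right RBAC can publish one. A minimal hand-written event as a sketch (all names are made up; 'involvedObject' is what ties it to a resource):

```
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Event
metadata:
  name: demo-event
  namespace: default
involvedObject:
  kind: Pod
  name: my-pod
  namespace: default
type: Normal
reason: Demo
message: Hand-published event, stored and watchable like any other
EOF
```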

Events, logs and traces are similar in structure and use - but different in persistence and in how the user interacts with them. While 'debugging' is the most obvious use case, the real power lies in analyzing and using them in code, to extract information and trigger actions.

The object 'status' is persistent and treated as a write to the object - all watchers will be notified, so the write is quite expensive. Logs are batched and generally written to specialized storage, then deleted after some time - far cheaper, but harder to use programmatically, since each log system has a different query API.
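
The difference shows up in how each one is retrieved (pod name made up; the log query would also differ per backend):

```
# Logs: fetched per container, from the node or a log backend.
kubectl logs my-pod --since=1h

# Status: read from the object itself, kept for the object's lifetime.
kubectl get pod my-pod -o jsonpath='{.status.conditions[*].type}{"\n"}'

# Events: uniform server-side queries, regardless of who emitted them.
kubectl get events --field-selector involvedObject.name=my-pod
```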

In K8S, Events are stored for 1h by default (the apiserver --event-ttl flag) - far less than logs, which are typically kept for weeks, or status, which lives as long as the object does. The K8S implementation may also optimize the storage - keeping Events in RAM longer or using specialized storage mechanisms. In GKE (and likely other providers) they are also logged to Stackdriver - and may have longer persistence.

Events are associated with other objects using the 'involvedObject' field, which links the event to an object and is used in 'kubectl describe'. This pattern is similar to the new Gateway 'policy attachment' - where config, overrides or defaults can be attached to other resources.

```
# Selectors filter on server side.
kubectl get events -A --field-selector involvedObject.kind!=Pod

kubectl get events -A --watch
```

Watching the events can be extremely instructive and reveal a lot of internal problems - status also includes errors, but you need to know which object to watch.
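
Filtering on the event type narrows the stream down to just the problems:

```
# Warnings only - usually the fastest way to surface cluster-wide issues.
kubectl get events -A --field-selector type=Warning --watch
```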

As a 'pubsub' system, Events are far from ideal - in storage, API and feature set - but they are close in semantics and easy to bridge to a real pubsub system, and within K8S they are very useful.
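
A sketch of such a bridge - the endpoint and the jq/curl plumbing here are illustrative, not a real integration:

```
# Watch events as a JSON stream and forward each one to an HTTP endpoint.
kubectl get events -A --watch -o json \
  | jq -c --unbuffered '.' \
  | while read -r ev; do
      curl -s -X POST -H 'Content-Type: application/json' \
        -d "$ev" http://pubsub-bridge.internal/publish
    done
```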

In the past I tried to add more Events to Istio - there was some interest, but I never got to finish the PR; maybe with Ambient we can try again. The real power of Events is not debugging, but synchronizing between applications in real time - for example, propagating the IP address and info about a node as soon as it connects to the control plane.

CNCF CloudEvents provides an API and integrations with various messaging and pubsub systems - it is a bit over-designed and more complex than it needs to be, but the integrations make it useful, and it provides a basic HTTP-based interface that is easy to work with.
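
The 'binary' HTTP binding is just a POST with ce-* headers - for example (endpoint and attribute values are made up):

```
# CloudEvents binary HTTP binding: event attributes travel as ce-* headers.
curl -X POST http://events.example.com/ \
  -H 'ce-specversion: 1.0' \
  -H 'ce-type: io.k8s.event' \
  -H 'ce-source: /clusters/my-cluster' \
  -H 'ce-id: 0001' \
  -H 'Content-Type: application/json' \
  -d '{"reason": "NodeReady", "message": "node connected"}'
```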

Istio also provides some events over XDS - and can act as a bridge, allowing components that use the control plane to get both configs and events.

Links:

  • https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#event-v1-core
  • https://www.bluematador.com/blog/kubernetes-events-explained - how to watch and filter
  • https://www.cncf.io/blog/2021/12/21/extracting-value-from-the-kubernetes-events-feed/

TODO:
  • Evaluate CloudEvents integrations with K8S Events and 'real' pubsub
  • Extend Istio XDS 'debug' bridge to Events, evaluate use for sync and ambient info if Events are as reliable as pubsub.
  • Generate events from Istiod - connect/disconnect are clear. Warnings about bad configs are unlikely to be good unless frequency can be controlled.
