Scaling#

This chapter is about strategies for scaling controllers and the tradeoffs these strategies make.

Motivating Questions#

Why is the reconciler lagging? Are there too many resources being reconciled?
What happens when your controller starts managing resource sets so large that it starts significantly impacting your CPU or memory use?

Scaling an efficient Rust application that spends most of its time waiting for network changes might not seem like a complicated affair, and indeed, you can scale a controller in many ways and achieve good outcomes. But in terms of costs, not all solutions are created equal:

Can you improve your algorithm, or should you throw more expensive machines at the problem?

Scaling Strategies#

We recommend trying the following scaling strategies in order:

Controller Optimizations (minimize expensive work to allow more work)
Vertical Scaling (more headroom for the single pod)
Sharding (horizontal scaling)

In other words, try to improve your algorithm first, and once you've reached a reasonable limit of what you can achieve with that approach, allocate more resources to the problem.

Controller Optimizations#

Look at common controller optimizations first to get the most out of your resources:

minimize network intensive operations
avoid caching large manifests unnecessarily, and prune unneeded data
cache/memoize expensive work
checkpoint progress on .status objects to avoid repeating work

When checkpointing, take care not to accidentally break reconciler#idempotency.
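One way to checkpoint without losing idempotency is to record which spec generation the last successful run handled, and to decide from that alone whether the expensive path is needed. The sketch below is a minimal illustration using only the standard library; `Checkpoint`, `Plan`, and `plan` are hypothetical names, and it assumes the checkpoint is written to `.status` only after the work has succeeded.

```rust
/// Hypothetical, trimmed-down view of what a controller might store
/// under `.status` to checkpoint progress. Not a kube type.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    /// Generation of the spec that the last successful run processed.
    pub observed_generation: Option<i64>,
}

/// What the reconciler should do for this invocation.
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Spec unchanged since the checkpoint: skip the expensive work.
    Skip,
    /// Spec is new or changed: do the work, then record this generation
    /// in the status once the work has succeeded.
    Run { record_generation: i64 },
}

/// Pure decision function: the same inputs always give the same plan,
/// so repeated or replayed reconciliations stay idempotent.
pub fn plan(generation: i64, status: Option<&Checkpoint>) -> Plan {
    match status.and_then(|s| s.observed_generation) {
        Some(seen) if seen == generation => Plan::Skip,
        _ => Plan::Run { record_generation: generation },
    }
}
```

Because the decision depends only on the object's current state, a crash between doing the work and writing the checkpoint simply causes the work to run again. Note that a generation check only detects spec changes; if the work also has to correct drift in owned resources, skipping on generation alone is probably too aggressive.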

Vertical Scaling#

increase CPU/memory limits
configure controller concurrency (as a multiple of CPU limits)

The controller::Config currently defaults to unlimited concurrency and may need tuning for large workloads.

It is possible to compute an optimal concurrency number based on the CPU resources you assign to your container, but this requires specific measurements against your workload.
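As a rough illustration of treating concurrency as a multiple of the CPU limit, the sketch below derives a bounded value from a limit given in millicores. The function name and the `per_core` multiplier are hypothetical; the multiplier is exactly the number you would have to find by measuring your own workload.

```rust
/// Derive a concurrency bound from a container CPU limit.
///
/// `cpu_limit_millis` is the CPU limit in millicores (500 for "500m").
/// `per_core` is a hypothetical multiplier: how many concurrent
/// reconciliations one full core is assumed to sustain.
pub fn concurrency_for(cpu_limit_millis: u32, per_core: u16) -> u16 {
    // Widen to u64 so the multiplication cannot overflow.
    let scaled = (cpu_limit_millis as u64 * per_core as u64) / 1000;
    // Never return 0: the default of unlimited concurrency is what this
    // sketch is trying to move away from, so keep at least one slot.
    scaled.clamp(1, u16::MAX as u64) as u16
}
```

For example, `concurrency_for(500, 8)` yields 4 and `concurrency_for(2000, 16)` yields 32. The result is the kind of value you would then hand to the concurrency setting on controller::Config.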

Aggressiveness meets fairness

A highly parallel reconciler might eventually be throttled by apiserver flow-control rules, which will degrade your controller's performance. Measurements, calculations, and observability (particularly for error rates) are useful for identifying such scenarios.

Sharding#

If you are unable to meet latency/resource requirements using the techniques above, you may need to consider partitioning/sharding your resources.

Sharding is splitting your workload into mutually exclusive groups that you grant exclusive access to. In Kubernetes, shards are commonly seen as a side-effect of certain deployment strategies:

sidecars :: pods are shards
daemonsets :: nodes are shards

Sidecars and Daemonsets

Several big agents use daemonsets and sidecars in situations that require higher than average performance. The pattern is commonly found in network components, service meshes, and sometimes observability collectors that benefit from co-location with a resource. This choice creates a very broad and responsive sharding strategy, but one that incurs a larger overhead by using more containers than is technically necessary.

Sharding can also be done in a more explicit way:

1 controller deployment per namespace (naive sharding)
1 controller deployment per labelled shard (precise, but requires labelling work)

Explicitly labelled sharding is less common, but it is a powerful option. It is used by fluxcd, which associates a resource with a shard via its sharding.fluxcd.io/key label. Flux's Stefan talks about scaling flux controllers at KubeCon 2024.
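To make the labelled approach concrete, the sketch below shows one possible way to assign a resource to a shard and to build the label selector a shard's controller would watch with. Only the label key comes from the text above; the `shardN` naming, the hash-based assignment, and the function names are assumptions made for illustration, not how fluxcd assigns shards.

```rust
/// Label key used by fluxcd to associate a resource with a shard.
pub const SHARD_LABEL: &str = "sharding.fluxcd.io/key";

/// FNV-1a 64-bit hash. Used instead of std's DefaultHasher because its
/// output is fixed by the algorithm, so every replica and every build
/// agrees on the same assignment.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Pick a shard name for an object (hypothetical `shardN` scheme).
/// Panics if `shard_count` is zero.
pub fn shard_for(namespace: &str, name: &str, shard_count: u32) -> String {
    assert!(shard_count > 0, "shard_count must be at least 1");
    let key = format!("{}/{}", namespace, name);
    format!("shard{}", fnv1a(key.as_bytes()) % shard_count as u64)
}

/// Equality-based label selector matching only objects in `shard`.
pub fn selector_for(shard: &str) -> String {
    format!("{}={}", SHARD_LABEL, shard)
}
```

Each controller deployment would then be configured with its own shard name and pass the resulting selector string to its watcher configuration, so that it only ever sees its own partition. Keep in mind that a modulo-based assignment reshuffles most objects when the shard count changes, which is one reason explicit labelling with deliberate rebalancing can be preferable.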

Automatic Labelling

A mutating admission policy can help automatically assign/label partitions cluster-wide based on constraints and rebalancing needs.

In cases where HA is required, leases can be used to gate access to particular shards. See availability#Leader Election.