Improving Kubernetes Security: Work on Your MUT
Every organization has a limited amount of time to spend on security, and sometimes it seems like there’s a never-ending panoply of things that need attention. In such a world, how do you pick where to start? A concept that I find useful is focusing on improving your “mess-up tolerance”, or MUT. As part of my Kubernetes Community Days UK talk on getting started with Kubernetes security, I covered some ideas about how organizations can use MUT to help prioritize their security efforts.
What is MUT?
This is an idea that I’ve shamelessly borrowed from a talk I saw some years ago by Dr. Ian Levy from the UK National Cyber Security Centre (NCSC), although he used a rather more direct version of it.
The basic idea is that mistakes or mess-ups will happen in every system. There will be exploitable bugs in software code, configurations will be mistakenly applied, and incorrect assumptions will be made about how things operate, leading to security exposures.
Mess-up tolerance asks, “How many mess-ups have to happen before something goes badly wrong for my organization?” The higher the number, the more things must go wrong at the same time, and the less likely it is that you’ll have a bad security incident.
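To put rough numbers on that intuition, here is a small sketch. It assumes each mess-up is independent of the others and uses a made-up 5% chance per mess-up; neither assumption comes from real data, and real mistakes are often correlated, so treat the output as an illustration rather than a risk model.

```python
from math import prod


def chance_all_happen(mistake_probabilities):
    """Return the chance that every listed mess-up is present at once.

    Assumes the mess-ups are independent, which is a simplification.
    """
    return prod(mistake_probabilities)


# Hypothetical figures, chosen only for illustration.
mut_of_one = chance_all_happen([0.05])
mut_of_three = chance_all_happen([0.05, 0.05, 0.05])

print(f"MUT of 1: {mut_of_one:.4%} chance of a serious incident")
print(f"MUT of 3: {mut_of_three:.4%} chance of a serious incident")
```

Under these assumptions, each additional mess-up an attacker needs multiplies the odds down rather than shaving a little off, which is why raising your MUT is a good place to start.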
So, when you’re getting started with Kubernetes security, you should focus on things that are likely to increase your MUT to reduce the risks of a breach.
An example of MUT improvement
Let’s provide a real-world example of how companies can make changes to their Kubernetes configurations to improve their MUT.
As we’ve covered on the blog previously, a common configuration for managed Kubernetes clusters is to have the Kubernetes API server on the Internet with anonymous authentication enabled. With this configuration, if a user applies a Kubernetes role-based access control (RBAC) manifest allowing privileged access to the system:unauthenticated group, an attacker would be able to compromise that cluster and potentially all its applications instantly.
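One way to look for this kind of mistake is to scan your cluster role bindings for unauthenticated subjects. The sketch below works on the JSON that `kubectl get clusterrolebindings -o json` produces. The function name and the `expected_roles` allowlist are my own illustrative choices; the allowlist assumes your cluster ships with a low-privilege default binding for `system:public-info-viewer`, so adjust it to match what you actually run.

```python
import json

# Subjects that represent requests made without valid credentials.
UNAUTHENTICATED_SUBJECTS = {"system:unauthenticated", "system:anonymous"}


def find_anonymous_bindings(binding_list, expected_roles=("system:public-info-viewer",)):
    """Return (binding name, subject, role) for bindings that grant a role
    to unauthenticated users, skipping roles listed in expected_roles."""
    findings = []
    for binding in binding_list.get("items", []):
        role = binding.get("roleRef", {}).get("name")
        if role in expected_roles:
            continue
        # "subjects" can be missing or null on a binding with no subjects.
        for subject in binding.get("subjects") or []:
            if subject.get("name") in UNAUTHENTICATED_SUBJECTS:
                findings.append((binding["metadata"]["name"], subject["name"], role))
    return findings


def load_bindings(path):
    """Load a file saved from: kubectl get clusterrolebindings -o json"""
    with open(path) as handle:
        return json.load(handle)
```

Anything this returns deserves a close look, since it is exactly the single mistake described above. Namespaced role bindings can cause the same problem, so a fuller check would also cover `kubectl get rolebindings -A -o json`.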
This is essentially a MUT of 1 — a single mistake could cause a serious security issue. So, how would we improve this?
A first step would be to disable anonymous authentication on your Kubernetes cluster. This is sometimes needed for monitoring systems but is often unused and just present as the default configuration. If you change this option, the attacker would need a valid set of credentials with the necessary privileges to compromise the cluster. That’s better, but still a bit risky.
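If you want a quick way to see whether anonymous authentication is on, you can send a request with no credentials and look at the status code. This sketch relies on typical API server behavior: a 401 means the request was rejected as unauthenticated, while a 403 means it was accepted as an anonymous user and then denied by authorization. The function names, the default path, and the timeout are illustrative choices of mine, and a proxy or load balancer in front of the API server could change what you see.

```python
import ssl
import urllib.error
import urllib.request


def classify_anonymous_response(status):
    """Interpret the status code of a request sent without credentials."""
    if status == 401:
        return "anonymous requests rejected"
    if status == 403:
        return "anonymous auth enabled, request not authorized"
    if 200 <= status < 300:
        return "anonymous auth enabled and request allowed"
    return "inconclusive"


def probe(api_server_url, path="/api/v1/namespaces"):
    """Send one unauthenticated GET and return the HTTP status code."""
    # Certificate checks are disabled to keep the sketch short.
    # Only point this at clusters you own.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(api_server_url + path, context=context, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx; the code is what we want.
        return error.code
```

You would call it as `classify_anonymous_response(probe("https://your-api-server:6443"))`, where the address is a placeholder for your own cluster.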
The other obvious improvement would be to avoid putting the Kubernetes API server directly on the Internet. This is again a common default, but in many cases there’s no real need to have it exposed. Using a VPN, bastion server, or the cloud provider’s remote access services can prevent the API server from being directly exposed.
This would improve our MUT again. Now the attacker would need to compromise our remote access service, find the Kubernetes API server, get valid credentials for it, and have an account that has sufficient rights to compromise the cluster.
It’s still possible for this to happen — remember, there is no such thing as absolute security — but taking these hardening steps has made it much less likely.
When starting with Kubernetes security — or indeed any other type of security — it’s important to focus on where you can make meaningful improvements quickly. It can be tempting to dive into complex security concepts like zero trust networks or using mTLS everywhere. But in reality, more basic measures like reducing your attack surface by removing unnecessary services from the Internet or making sure that all your management APIs require authentication to access them will provide a larger improvement to your overall security posture.
For more practical security advice, check out my talk at KCD-UK on getting started with Kubernetes security.