CVE-2022-0185 in Linux Kernel Can Allow Container Escape in Kubernetes
Last week, a new high-severity vulnerability affecting the Linux kernel, CVE-2022-0185, was disclosed. It allows an attacker who has access to a system as an unprivileged user to escalate those rights to root. To do this, the attacker needs a specific Linux capability, CAP_SYS_ADMIN, which limits the risk of container breakout in some environments. In many Kubernetes clusters, however, an attacker is likely to be able to exploit this issue.
At the moment, there is no public exploit code for this issue. However, one of the researchers who found it has posted a proof of concept showing a container breakout, and it's expected that exploit code will be released soon.
Exploitability for container breakout
When considering whether this vulnerability could be exploited to escape from a standard containerized environment, a good starting point is the vulnerability notification, which includes this section:
“Exploitation relies on the CAP_SYS_ADMIN capability; however, the permission only needs to be granted in the current namespace. An unprivileged user can use unshare(CLONE_NEWNS|CLONE_NEWUSER) to enter a namespace with the CAP_SYS_ADMIN permission, and then proceed with exploitation to root the system.”
The CAP_SYS_ADMIN capability is not in the default set granted by Docker or other container runtimes unless it has been added, either explicitly (for example, with --cap-add SYS_ADMIN) or by starting the container with the --privileged flag.
However, as the advisory notes, an unprivileged user can obtain the capability by calling unshare (the syscall, or the unshare command-line tool that wraps it) to create a new user namespace, inside which they hold CAP_SYS_ADMIN and can exploit this issue.
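To check for the capability directly, without extra tools, a process's effective capability set can be read from the CapEff field of /proc/<pid>/status, a hexadecimal bitmask in which CAP_SYS_ADMIN is bit 21. The POSIX shell sketch below assumes a Linux /proc filesystem and awk; the function name has_cap_sys_admin is hypothetical and not part of any existing tool.

```sh
# Hypothetical helper: succeeds (exit 0) if CAP_SYS_ADMIN (capability number 21)
# is in the effective set recorded in a /proc/<pid>/status file.
# Defaults to the status file of the current shell ($$).
has_cap_sys_admin() {
  local status_file="${1:-/proc/$$/status}"
  local capeff
  # CapEff is a 64-bit hex mask, e.g. "00000000a80425fb"
  capeff=$(awk '/^CapEff:/ {print $2}' "$status_file")
  [ -n "$capeff" ] || return 2    # no CapEff line found
  # Shift bit 21 down to position 0 and test it
  [ $(( (0x$capeff >> 21) & 1 )) -eq 1 ]
}
```

After sourcing the function, running has_cap_sys_admin && echo present || echo absent would be expected to print absent in a default container. Running unshare --user --map-root-user --mount (the command-line equivalent of the advisory's unshare(CLONE_NEWNS|CLONE_NEWUSER)) and sourcing the function again in the new shell would be expected to print present wherever the unshare call is not blocked.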
In a standard Docker environment, the unshare command is blocked by Docker’s default seccomp profile, which filters the syscall that the command relies on. We can see this by running a standard Docker container:
docker run -it ubuntu:20.04 /bin/bash
It's important to note, however, that when Docker (or another container runtime) is used in a Kubernetes cluster, workloads run without a seccomp filter by default, so this vulnerability could be exploited in those cases. We can see the difference by running a container in Kubernetes:
kubectl run -it ubutest2 --image=ubuntu:20.04 -- /bin/bash
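The seccomp state of a container's shell can also be read directly from the Seccomp field of /proc/<pid>/status, where 0 means no filter, 1 means strict mode, and 2 means a seccomp filter is applied. As a sketch (the helper name seccomp_mode is hypothetical), running it in the default Docker container would be expected to report filter, while the same check in a pod running with Kubernetes' default settings would be expected to report disabled.

```sh
# Hypothetical helper: print the seccomp mode recorded in a /proc/<pid>/status
# file. The "Seccomp:" field is 0 (disabled), 1 (strict) or 2 (filter).
# The ':' in the pattern keeps it from matching the "Seccomp_filters:" line.
seccomp_mode() {
  local status_file="${1:-/proc/$$/status}"
  local mode
  mode=$(awk '/^Seccomp:/ {print $2}' "$status_file")
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}
```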
Once the container is running, we can check which capabilities are present by installing and running the pscap utility (on Ubuntu, it is provided by the libcap-ng-utils package):
root@ubutest2:/# pscap -a
At this point, the relevant capability is not present. Now if we run the unshare command, we can see that it’s not blocked and that our new shell has “full” capabilities inside its new user namespace, including CAP_SYS_ADMIN, making the system vulnerable to this issue:
root@ubutest2:/# unshare -r
# pscap -a
ppid pid name command capabilities
0 1 root bash chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap
1 270 root sh full
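The same information pscap reports is exposed as hex masks such as the CapEff field of /proc/<pid>/status. As a rough illustration of how a mask maps to names, the sketch below decodes one; it assumes the capability numbering from <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40), and the helper name decode_caps is hypothetical. Decoding the mask 00000000a80425fb yields exactly the list shown for PID 1 above, while a mask with every bit set corresponds to what pscap labels “full”. libcap's capsh --decode option offers similar decoding if it is installed.

```sh
# Capability names in bit order (0-40), as numbered in <linux/capability.h>,
# written lowercase without the CAP_ prefix to match pscap's output.
CAP_NAMES="
chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod
lease audit_write audit_control setfcap mac_override mac_admin syslog
wake_alarm block_suspend audit_read perfmon bpf checkpoint_restore
"

# Hypothetical helper: decode a hex capability mask (for example, the CapEff
# value from /proc/<pid>/status) into a comma-separated list of names.
decode_caps() {
  local mask=$((0x$1)) bit=0 out="" name
  # Word-split CAP_NAMES; the loop counter tracks each name's bit number
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}$name"
    fi
    bit=$((bit + 1))
  done
  printf '%s\n' "$out"
}
```

For example, decode_caps "$(awk '/^CapEff:/ {print $2}' /proc/$$/status)" decodes the effective set of the current shell.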
All systems at risk of this vulnerability should apply the patch for their Linux distribution as quickly as possible. Installation of this patch will likely require a reboot of the host to be effective.
Where that’s not possible, there are other options to reduce the risk of container escapes using this vulnerability. First, organizations should minimize the use of containers that are granted CAP_SYS_ADMIN, including privileged containers.
For unprivileged containers, ensuring that a seccomp filter blocking the unshare syscall is in place will reduce the risk. This filter should be in place by default for all Docker installations; for Kubernetes, however, some additional work is needed.
For individual workloads, a seccomp profile can be applied through the seccompProfile field of the workload’s securityContext (for example, type: RuntimeDefault to use the container runtime’s default profile).
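As a sketch of the per-workload setting, the helper below prints a Pod manifest that sets seccompProfile type RuntimeDefault at the pod level, so its containers run under the container runtime's default profile. The helper name pod_manifest_with_seccomp and the pod name and image used with it are illustrative only. The output could be piped to kubectl apply -f -; with this profile in place, running unshare -r inside the pod would be expected to fail with a permission error, as it does under Docker.

```sh
# Hypothetical helper: print a Pod manifest whose containers use the
# container runtime's default seccomp profile (type: RuntimeDefault).
# Arguments: $1 = pod/container name, $2 = image.
pod_manifest_with_seccomp() {
  local name="$1" image="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: ${name}
    image: ${image}
    command: ["/bin/bash", "-c", "sleep infinity"]
EOF
}
```

For example: pod_manifest_with_seccomp ubutest3 ubuntu:20.04 | kubectl apply -f -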
There's also a plan to allow cluster operators to enable a seccomp profile by default for all workloads in a cluster. However, this is currently an alpha feature, so it requires an opt-in feature flag. Hopefully, this feature will graduate to beta in Kubernetes 1.24, which would make it more widely available.
Another option to mitigate exploitation from unprivileged containers is to prevent unprivileged users from creating user namespaces at the host level. This can be done by setting a sysctl on the host without rebooting, although care is required to ensure that it does not disrupt the operation of the system.
For example, on Ubuntu-based distributions, the following command will disable this feature:
sudo sysctl -w kernel.unprivileged_userns_clone=0
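Note that sysctl -w changes only the running kernel's value, which is lost on reboot. The sketch below checks the current setting; the helper name userns_clone_status is hypothetical, and the knob exists only on kernels carrying the Debian/Ubuntu patch that adds it. The comments show one way to persist the setting through a file under /etc/sysctl.d (the file name is only an example).

```sh
# Hypothetical helper: report whether unprivileged user namespace creation is
# allowed, based on the kernel.unprivileged_userns_clone sysctl, which is only
# present on kernels carrying the Debian/Ubuntu patch (1 = allowed, 0 = disabled).
userns_clone_status() {
  local knob="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
  if [ ! -r "$knob" ]; then
    echo "unavailable"    # knob not present on this kernel
    return 0
  fi
  case "$(cat "$knob")" in
    0) echo "disabled" ;;
    1) echo "enabled" ;;
    *) echo "unknown" ;;
  esac
}

# To persist the setting across reboots (example file name, run as root):
#   echo 'kernel.unprivileged_userns_clone = 0' > /etc/sysctl.d/99-disable-unprivileged-userns.conf
#   sysctl --system
```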
Container environments consist of several layers, and as a result, cluster operators must pay attention to security issues in each of these locations. Ultimately, most containers rely on the security of the Linux kernel, so it’s important to resolve any security issues promptly to ensure that your clusters remain secure.
CVE Resource: https://www.openwall.com/lists/oss-security/2022/01/18/7