How Do Containers Contain? Container Isolation Techniques
If you have worked with containers for long enough, you already know that they should not be treated as security boundaries. In this post, we’ll explore how different container isolation techniques try to address this problem, and whether their strengths and weaknesses make them a practical choice.
Container isolation techniques
Broadly speaking, for Linux containers, there are three approaches to isolating the contained application from the underlying host and from other applications running in the environment.
- Linux containers - Container runtimes like containerd (used by Docker or directly) or CRI-O isolate containers using Linux namespaces and use other operating system features such as cgroups, seccomp filters, and capabilities to restrict what the contained process can do. All the containers running on a single VM share a single operating system kernel. This is by far the most common use case.
- Sandbox-based containers - With a sandbox like Google gVisor, the contained processes still effectively share a kernel, but instead of using Linux features for isolation, a dedicated security sandbox is used to provide any resources (such as networking) and to arbitrate calls made by the application that require kernel resources.
- VM-based containers - With software like AWS Firecracker, a hypervisor is used to provide isolation. Generally, lightweight VMs are used to minimize any performance impact. Isolation is effectively the same as with any hypervisor setup, with the added advantage that many of the features of standard hypervisors, such as full virtual hardware support, can be disabled because they are not required.
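The namespace mechanism behind the first approach can be observed from inside any Linux process. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names are illustrative and not part of any container runtime. Two processes share a namespace of a given type when their /proc/<pid>/ns entries resolve to the same identifier.

```python
import os
import re

# Format of a namespace link target, for example "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid="self", proc_root="/proc"):
    """Return {entry_name: inode} for the namespaces of a process.

    Returns an empty dict when the ns directory cannot be read
    (non-Linux host, missing process, or insufficient privileges).
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result
    for name in entries:
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            _, inode = parse_ns_link(target)
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        result[name] = inode
    return result


def shared_namespaces(a, b):
    """Namespace entries for which two processes hold the same inode."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


if __name__ == "__main__":
    mine = read_namespaces()
    init = read_namespaces(1)  # may be empty without sufficient privileges
    print("namespaces shared with PID 1:", shared_namespaces(mine, init))
```

Run with enough privileges to read PID 1's entries, a process in a typical container would be expected to share few or no namespaces with the host's PID 1, while an ordinary host process shares all of them. In both cases the kernel underneath is the same one.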
A spectrum of isolation
Now that we’ve laid out our three approaches to isolation, we can return to the question of security boundaries. Critically, none of these approaches provides perfect isolation, because they can all have vulnerabilities. However, what we can look at is the relative attack surface of each one. The larger the attack surface, the more likely it is that breakouts will occur.
Of these three approaches, process-based isolation as used by Linux containers clearly has the largest attack surface. In addition to the exposed Linux kernel, with all its complexity, the process of establishing new containers and managing their separation makes use of a number of different mechanisms that were not designed with this task in mind. As a result, there have been several breakouts over the years at different layers of the stack. The recently disclosed runc issue CVE-2021-30465 and the ABSTRACT SHIMMER issue in containerd (CVE-2020-15257) are both good examples of that complexity and of how attackers may be able to exploit it to escape the confines of the container.
The larger attack surface and shared resources of process-based containers also show up in other areas. Additional controls (for example, cgroups) are needed to prevent denial-of-service attacks or the risks of “noisy neighbors” hosted on a shared VM. There are also risks related to potential information leaks between containers.
This isn’t to say that process-based containers provide no isolation at all; for some use cases, the isolation provided is going to be perfectly adequate. But for higher security scenarios, additional controls will be needed to supplement the containers’ isolation.
The main difference between process-based isolation and sandbox isolation lies in design intent. Solutions like gVisor are designed specifically for security isolation of processes on a host and, as such, can focus on this use case, hardening the interface between the container and the underlying host.
There are still some areas where a sandbox environment will need additional controls, specifically around resource management. Because the containers still run on the same host, cgroup configuration is required to mitigate the risk of noisy neighbors.
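As a rough illustration of what checking that cgroup configuration can look like, here is a minimal sketch that interprets the contents of two cgroup v2 interface files, memory.max and cpu.max. It assumes cgroup v2 and that the caller has already read the file contents; the function names are illustrative.

```python
import math

DEFAULT_CPU_PERIOD_US = 100000  # cgroup v2 default period, in microseconds


def parse_memory_max(text):
    """memory.max holds a byte count, or the literal 'max' for no limit."""
    value = text.strip()
    return math.inf if value == "max" else int(value)


def parse_cpu_max(text):
    """cpu.max holds '<quota> <period>' in microseconds; quota may be 'max'.

    Returns the limit expressed as a number of CPUs.
    """
    parts = text.split()
    if parts[0] == "max":
        return math.inf
    period = int(parts[1]) if len(parts) > 1 else DEFAULT_CPU_PERIOD_US
    return int(parts[0]) / period


def missing_limits(memory_max_text, cpu_max_text):
    """List the resources that have no limit and could affect neighbors."""
    missing = []
    if parse_memory_max(memory_max_text) == math.inf:
        missing.append("memory")
    if parse_cpu_max(cpu_max_text) == math.inf:
        missing.append("cpu")
    return missing
```

For instance, `missing_limits("max\n", "200000 100000\n")` reports that memory is unbounded while CPU is capped at two cores.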
Hypervisor-based container isolation takes a different approach from the sandbox, by providing a full Linux kernel to each contained process. Hypervisors are generally designed to provide a security boundary and have a considerably smaller attack surface than a full Linux kernel. Additionally, in cases such as Firecracker, the hypervisor interface is further hardened by removing facilities like virtual hardware, which have traditionally been a source of breakout attacks.
Of course, as with any technology choice, there are trade-offs in choosing an approach to container isolation. The first is a possible performance impact. While Firecracker has done considerable work in improving performance over earlier approaches to VM-based container isolation, starting a full Linux kernel per container will have some level of impact. Some studies have shown that sandbox approaches like gVisor will also tend to have an impact on performance.
The other area where using additional isolation layers may impact your container use is where your containers make use of the flexibility inherent in Linux containerization. For example, some containers need access to the host’s network stack for monitoring, and security solutions that are designed to work by analyzing container processes would likely need to be adapted to work with sandboxed or VM solutions.
Choosing a container isolation approach
So which of these approaches is right for your containers? In general, this will depend on the threat model of the application or environment you’re operating in, but there are some general principles to consider.
Ensuring that a process-based container being launched by an untrusted user cannot break out to the underlying host is a complex process that requires cooperation among a number of projects from different groups, and the interface between the container and host has been subject to several recent vulnerabilities (see Part I). If your applications are launched by a trusted team and are primarily internally facing, the risk of breakout attacks is reduced, and process-based isolation is likely to provide an appropriate level of isolation.
Since sandboxes or hypervisors can sometimes limit the flexibility inherent in a container security model, one of the best solutions to enforcing container isolation while still maintaining full flexibility is to combine strong default controls with a third-party security solution. For example, if you want additional protection for your workloads, areas such as admission control (to restrict the security context of containers in your environment) and runtime security controls (to detect and respond to attacks) can help mitigate these risks.
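To give a sense of what an admission control check on the security context might look at, here is a minimal sketch. It assumes the pod spec is available as a plain dictionary using the standard Kubernetes field names; the function itself and the choice of findings are illustrative, not the behavior of any particular admission controller.

```python
def risky_settings(pod_spec):
    """Return the settings in a pod spec (a dict) that weaken isolation."""
    findings = []

    # Pod-level switches that share host namespaces with the container.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            findings.append(field)

    containers = (pod_spec.get("containers") or []) + (
        pod_spec.get("initContainers") or []
    )
    for container in containers:
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("privileged"):
            findings.append(f"{name}: privileged")
        # Treated as risky unless it is explicitly set to False.
        if context.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation not disabled")
        added = (context.get("capabilities") or {}).get("add") or []
        if added:
            findings.append(f"{name}: adds capabilities {sorted(added)}")

    # hostPath volumes expose parts of the host filesystem.
    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            findings.append(f"volume {volume.get('name', '<unnamed>')}: hostPath")

    return findings
```

An admission policy built on a check like this could reject any pod for which the list is non-empty, with exceptions for workloads such as monitoring agents that genuinely need host access.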
Where you are looking to maintain segregation because the users launching containers are not fully trusted (for example, in a multi-tenant Kubernetes cluster), consider using sandboxes or hypervisors for isolation.
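The rule of thumb from the last few paragraphs can be sketched as a small helper. This is only an illustration: the two inputs simplify a real threat model considerably, and the RuntimeClass names "gvisor" and "microvm" are hypothetical and would need to be created by a cluster administrator. The runtimeClassName field it sets is the standard Kubernetes pod spec field for selecting a runtime.

```python
import copy

# Hypothetical RuntimeClass names; they must exist in the cluster to be usable.
RUNTIME_CLASSES = {"sandbox": "gvisor", "vm": "microvm"}


def choose_isolation(trusted_launchers, multi_tenant, stronger="sandbox"):
    """Pick 'process', 'sandbox', or 'vm' from a simplified threat model."""
    if stronger not in RUNTIME_CLASSES:
        raise ValueError(f"unknown isolation approach: {stronger!r}")
    if trusted_launchers and not multi_tenant:
        return "process"
    return stronger


def with_runtime_class(pod_spec, isolation):
    """Return a copy of the pod spec that requests the chosen isolation."""
    spec = copy.deepcopy(pod_spec)
    if isolation == "process":
        # Fall back to the cluster's default runtime.
        spec.pop("runtimeClassName", None)
    else:
        spec["runtimeClassName"] = RUNTIME_CLASSES[isolation]
    return spec
```

For example, `with_runtime_class(spec, choose_isolation(trusted_launchers=False, multi_tenant=True))` returns a copy of `spec` that requests the sandboxed runtime and leaves the original untouched.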
Several approaches to container isolation are available, each with different properties that will suit different scenarios. Choosing the right one for your applications is an important part of your container security architecture.