Achieving a Self-Securing Infrastructure in Public Clouds
When investigating security breaches, researchers look for a common element. In almost every cloud security breach we looked at, the kill chain involved the exploitation of misconfigured or mismanaged cloud infrastructure settings. The reason is simple: there are just too many settings to track. And even if you could track them all, the time required to fix them manually would still leave you exposed.
That’s why the remediation of public cloud configuration issues is the primary mission of Aqua CSPM. In fact, the latest release of CSPM includes even more flexible remediation options, enabling customers to safely invoke assisted, manual, or automated remediation – which is key in a self-securing infrastructure. But before describing these new options, let's start by revisiting the notion of misconfigurations.
Configuring Your Infrastructure is Not as Easy as You May Think
One possible culprit for misconfigurations is simply the success of cloud platforms. Their massive worldwide adoption and continued rapid evolution have led to a shortage of expertise in this area.
Today’s sophisticated cloud platforms include hundreds of services with thousands of different configurations. For example, as of 2020, AWS comprises at least 212 services, including computing, storage, networking, database, and many more. Each of these services is complex in its own right, with many different possible configurations, and mastering all of these options is anything but easy. Keeping up with continuous updates and best practices makes the job that much harder.
Another source of complexity in maintaining a well-configured infrastructure is the distributed nature of cloud platforms. Permissions can easily be delegated and scoped so that individual teams can work independently in an agile way and maintain their own portion of the infrastructure. However, these enablement options come at the price of less control over who makes changes and how they are made. More people, with varying skill sets, can now perform configuration tasks that in the past were reserved for highly trained ops teams. As a result, the chances of someone misunderstanding the downstream implications of a cloud configuration change have grown dramatically.
All these factors create the need for a system that constantly and systematically addresses these risks as soon as they are discovered.
Borrowing from the world of Cloud Operations, what’s needed is a tool that enables a self-healing infrastructure in complex and dynamic cloud environments. You need a solution that can autonomously scan, monitor, and remediate misconfiguration issues – and do all of this continuously and almost instantly.
This was the driving force behind our addition of self-securing, autonomous remediation to Aqua CSPM. However, to bring this level of autonomous remediation to market, several hurdles had to be overcome.
Trusting the Security Model
To start, fixing misconfigurations requires write permissions in the cloud account to perform the remediations. And since remediation might be needed for any cloud object, traditional CSPM solutions could end up requesting write permissions for practically everything. In other words, this effectively requires Owner-level permission sets to be granted to a third party – which clearly violates the principle of least-privilege access. For most organizations, this is simply a nonstarter.
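To make the least-privilege concern concrete, here is a minimal sketch that flags IAM-style policy statements granting every action on every resource. It relies only on the standard AWS policy JSON layout (`Statement`, `Effect`, `Action`, `Resource`); the function names and both sample policies are hypothetical and are not part of Aqua CSPM.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts either a single item or a list for these fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements that grant all actions on all resources."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policies, for illustration only.
owner_like = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutBucketPublicAccessBlock"],
            "Resource": "arn:aws:s3:::example-bucket",
        }
    ],
}

print(len(overly_broad_statements(owner_like)))
print(len(overly_broad_statements(scoped)))
```

The first policy is the kind of owner-level grant a remediation tool could end up asking for. The second shows the narrower shape a single remediation actually needs.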
A traditional approach to resolve this challenge involves the installation of remote components executing the remediation on the customers’ side (e.g., functions). These remote agents become the proxy for that owner-level permission. The challenge here is the security of the agents themselves, as well as the need to deploy and maintain these owner-level agents across all cloud accounts. For many large organizations, simply maintaining hundreds or more cloud accounts is a real challenge in and of itself.
Like all components of Aqua CSPM, the security of our remediation features is paramount. So, we’ve developed a security model that grants you complete control over how, when, and with which permissions Aqua connects to your account – while still being easy to use with minimal configuration requirements. In this model, a rotating key is used. The key is never stored by us and the customer only provides it (manually or automatically) upon request. By design, we don’t have access to perform any remediation without an explicit supporting action by the customer.
Figure 1 - Creating a remediation policy in CSPM
You can read more about the security model for each remediation type here.
Trusting the Automation
A second major consideration is the need to be absolutely sure that the automation will not inadvertently disrupt the production environment by “fixing” something that should have been left alone. When a person fixes a “problem,” they consider (we hope) the entirety of the app environment when assessing the potential risk of implementing a fix. For example, imagine that you “discover” an S3 bucket that’s open for public access. Should it be fixed? It depends: is the bucket mistakenly exposing sensitive data, or is it correctly hosting static web pages for your app? Finding the right answer requires context.
Figure 2 – Choosing an encryption key for remediation of an unencrypted S3 Bucket.
For an automated system to work, it needs to understand the context and decide what to do accordingly. We need a system that collects contextual information and applies analytics with policies to decide whether or not to fix a problem. If your organization is not 100% sure it can trust automated remediation, the system instead needs to be able to raise an alert about suspected misconfigurations and switch to a semi-automatic remediation process. In this procedure the decision making is left to humans, while the rest of the remediation process is automated.
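The decision described above can be sketched as a small function that takes context about a finding and returns one of three outcomes. This is an illustrative sketch only: the `BucketContext` fields, the `Action` names, and the decision rules are assumptions made for the public S3 bucket example, not Aqua CSPM's actual logic.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    ALERT_FOR_APPROVAL = "alert_for_approval"   # semi-automatic: a human decides
    AUTO_REMEDIATE = "auto_remediate"


@dataclass(frozen=True)
class BucketContext:
    """Hypothetical context collected about a publicly readable bucket."""
    hosts_static_website: bool
    contains_sensitive_data: bool
    auto_remediation_trusted: bool


def decide(context: BucketContext) -> Action:
    # A bucket that intentionally serves public web pages is not a problem.
    if context.hosts_static_website and not context.contains_sensitive_data:
        return Action.IGNORE
    # Otherwise treat it as a suspected misconfiguration. Fix it automatically
    # only if the organization trusts automation; else hand the decision over.
    if context.auto_remediation_trusted:
        return Action.AUTO_REMEDIATE
    return Action.ALERT_FOR_APPROVAL


website = BucketContext(True, False, True)
leaky = BucketContext(False, True, False)
print(decide(website).value)
print(decide(leaky).value)
```

A real system would draw on far richer context, but the shape is the same: the finding alone never decides the outcome.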
At the heart of the Remediations feature is the concept of policies. By default, Aqua CSPM will not make any changes to your account. Instead, we follow an explicit opt-in process for enabling the feature in your account. Once enabled, you must also define a policy that gives Aqua explicit permission to make the changes when requested.
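The two-step opt-in can be pictured as a default-deny gate: nothing runs unless the feature is switched on and a policy names the specific remediation. The sketch below is a hypothetical model of that idea; `AccountSettings`, `may_remediate`, and the remediation identifier are invented for illustration and do not reflect the product's API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AccountSettings:
    # Opt-in flag: off unless the customer explicitly enables the feature.
    remediations_enabled: bool = False
    # Policy: the set of remediations the customer has explicitly permitted.
    allowed_remediations: Set[str] = field(default_factory=set)


def may_remediate(settings: AccountSettings, remediation_id: str) -> bool:
    """Allow a change only if the feature is enabled AND a policy permits it."""
    return (
        settings.remediations_enabled
        and remediation_id in settings.allowed_remediations
    )


default_account = AccountSettings()
opted_in = AccountSettings(
    remediations_enabled=True,
    allowed_remediations={"s3-block-public-access"},  # hypothetical identifier
)
print(may_remediate(default_account, "s3-block-public-access"))
print(may_remediate(opted_in, "s3-block-public-access"))
```

Requiring both conditions means that enabling the feature by itself never authorizes a change.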
Figure 3 – Aqua CSPM remediation types.
You can read more about remediation policies here.
Keeping Track of Remediations
The last expectation before trusting the automation process is that the system will include full visibility into what was changed, why it was changed, who initiated the change, and how this change can be rolled back if needed.
Whether the remediation is assisted, manual, or automated, every remediation that Aqua CSPM performs is heavily audited. The audit trail includes a full picture of who initiated the remediation task, an image of the object before the change, the API that was used to make the change, and an image of the object after the change. In addition, fail-safes are included at every step of the process to ensure that errors are caught and reported back to you. Reporting is also available on top of this data so that you can govern the use and frequency of the remediation tasks and use this data for continuous organizational improvements.
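As a rough illustration of what such an audit trail could capture, the sketch below wraps a fix in a function that snapshots the object before and after the change and records who initiated it and which API was used. Everything here is an assumption for illustration: `AuditRecord`, `remediate_with_audit`, the dictionary standing in for a cloud object, and the initiator are hypothetical, and the API name is just a label stored in the record.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict

Resource = Dict[str, Any]


@dataclass(frozen=True)
class AuditRecord:
    initiated_by: str   # who started the remediation
    api_call: str       # API used to make the change
    before: Resource    # snapshot before the change (basis for a rollback)
    after: Resource     # snapshot after the change
    succeeded: bool
    error: str
    timestamp: str


def remediate_with_audit(initiated_by: str, api_call: str, resource: Resource,
                         fix: Callable[[Resource], Resource]) -> AuditRecord:
    before = copy.deepcopy(resource)
    try:
        after = fix(copy.deepcopy(resource))
        succeeded, error = True, ""
    except Exception as exc:  # fail-safe: surface the error in the record
        after, succeeded, error = copy.deepcopy(before), False, str(exc)
    return AuditRecord(initiated_by, api_call, before, after, succeeded, error,
                       datetime.now(timezone.utc).isoformat())


def block_public_access(bucket: Resource) -> Resource:
    """Hypothetical fix: mark the bucket as no longer public."""
    bucket["public"] = False
    return bucket


record = remediate_with_audit("alice@example.com", "PutPublicAccessBlock",
                              {"name": "example-bucket", "public": True},
                              block_public_access)
print(record.before["public"], record.after["public"], record.succeeded)
```

Keeping the `before` snapshot in the record is what makes a rollback possible, and a failed fix still produces a record rather than disappearing silently.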
Figure 4 – Sample of remediation steps.
Remediation is Here Now
Because so many cloud security breaches share exploited misconfigurations as a common element, simply hoping that your teams make changes with your entire cloud native infrastructure in mind is not a sustainable strategy. That’s why Aqua CSPM now offers assisted, manual, and automated remediations.
The solution includes a robust set of flows that can match the needs of different situations and balance the level of automation needed – in context – for each security finding.
Learn more in the Aqua CSPM Remediation Feature Overview
Get started with Aqua CSPM on Aqua Wave