Operations Infrastructure Month in Review #3
What’s this about?
One of the most important aspects of our security strategy on the Operations Engineering team is mitigating the risk of leaked AWS credentials. Even if you follow the AWS best practice of keeping your infrastructure inside a VPC, leaked AWS credentials hand an attacker the keys to the castle.
This post describes the strategy that we take to reduce the probability of AWS credentials being leaked, as well as reducing the risk in the event that they are leaked.
In the beginning, there were long-lived credentials.
The most straightforward way to give an application access to AWS APIs is to create an IAM user, generate an access key, and then pass the access key ID and secret to the application. However, this has a number of problems: the credentials are long-lived and never expire unless you rotate them manually, and because they are typically injected as environment variables in the ECS task definition, anyone with the ecs:DescribeTaskDefinition permission will have access to all secrets.
An alternative method is to use EC2 Instance Profiles. Instance profiles allow you to attach an IAM role to an EC2 instance, and applications running on the host can access an “instance metadata” endpoint to obtain temporary AWS credentials. This solves both trust (The EC2 instance authenticates itself with AWS) and credential expiration (credentials obtained from instance metadata only last for 1 hour, greatly minimizing the impact of a leak).
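To make the mechanics concrete, here is a minimal standard-library sketch of what retrieving instance-profile credentials from the metadata endpoint looks like (the role name is a hypothetical example; the endpoint is only reachable from within an EC2 instance):

```python
import json
import urllib.request

# Link-local instance metadata endpoint; only reachable from an EC2 instance.
METADATA_BASE = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def fetch_instance_credentials(role_name: str) -> dict:
    """Fetch temporary credentials for the instance profile's role."""
    with urllib.request.urlopen(METADATA_BASE + role_name, timeout=2) as resp:
        return parse_credentials(resp.read().decode())


def parse_credentials(payload: str) -> dict:
    """Extract the fields an AWS SDK needs from the metadata response."""
    doc = json.loads(payload)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": doc["Expiration"],  # credentials are short-lived
    }
```

In practice the AWS SDKs perform this lookup automatically as part of their credential-provider chain; no application code is needed.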
However, in the context of ECS, instance profiles have their own set of problems: every container running on the instance shares the instance's IAM role, so permissions cannot be scoped to an individual task, and the role ends up as the union of everything any task on the instance needs.
ECS introduced Task Roles, which are similar to Instance Profiles but allow you to attach an IAM role to an individual ECS task. This solves the problems above: each task receives its own temporary credentials, scoped to its own role, which the ECS agent exposes to the container through an endpoint referenced by the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable within the container.
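Inside the container, the AWS SDKs build the full credentials URL by appending AWS_CONTAINER_CREDENTIALS_RELATIVE_URI to the ECS agent's link-local address. A minimal standard-library sketch of that lookup:

```python
import os

# Link-local address where the ECS agent serves per-task credentials.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def task_credentials_url() -> str:
    """Build the per-task credentials URL that the AWS SDKs fetch from."""
    relative_uri = os.environ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"]
    return ECS_CREDENTIALS_HOST + relative_uri
```

As with instance profiles, the SDKs handle this automatically; applications never see the key material in their configuration.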
At Remind, we use Stacker to manage all of our infrastructure, and we run our services and applications with Empire. Through Stacker, we have a base “blueprint” for each Empire app which (among other things) provisions an IAM role for the application. Doing this ensures that we have a common starting point and convention for managing how all of our applications and services access AWS APIs.
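At its core, such a blueprint boils down to an IAM role that ECS tasks are allowed to assume. A sketch of the trust policy involved (an assumed shape for illustration, not our actual blueprint code; the service principal for ECS task roles is ecs-tasks.amazonaws.com):

```python
def task_role_trust_policy() -> dict:
    """Trust policy document allowing ECS tasks to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ECS task roles are assumed by the ECS tasks service principal.
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
```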
Not everything that runs on our ECS container instances runs under ECS/Docker. We wanted to continue using instance profiles for software running outside of Docker (generally infrastructure processes, like the Amazon SSM and ECS agents), while ensuring that our Docker containers (user-facing applications) cannot access user data or IAM credentials from the instance profile.
To do this, any request from a Docker container to the instance metadata endpoint gets re-routed to an nginx proxy on the host. This proxy denies the container access to the instance metadata endpoints for user data and IAM credentials.
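An illustrative sketch of what such a proxy configuration might look like (assumed port and paths for illustration, not our production config):

```nginx
server {
    listen 9080;

    # Every metadata request from a container is logged.
    access_log /var/log/nginx/metadata-proxy.log;

    # Deny access to IAM credentials and user data.
    location /latest/meta-data/iam/ { return 403; }
    location /latest/user-data      { return 403; }

    # Everything else is proxied through to the real metadata endpoint.
    location / {
        proxy_pass http://169.254.169.254;
    }
}
```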
This has the added benefit that any requests to instance metadata initiated from a Docker container get logged and forwarded to our log aggregation service.
With the above in place, if an application or service that we run on ECS contained a vulnerability that allowed an attacker to make arbitrary GET requests, we can be more confident that the attacker would not be able to obtain AWS credentials.