Operations Infrastructure Month in Review #3
What’s this about?
When you download something from the internet, a common method for determining both integrity and authenticity of an object is to generate a cryptographic hash of it, and compare it to what you expect. For example:
$ curl -LO https://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2017-08-17/2017-08-16-raspbian-stretch-lite.zip
$ echo "52e68130c152895905abe66279dd9feaa68091ba55619f5b900f2ebed381427b  2017-08-16-raspbian-stretch-lite.zip" | shasum -a 256 -c
52e68130c152895905abe66279dd9feaa68091ba55619f5b900f2ebed381427b is the content-addressable identifier for the 2017-08-16-raspbian-stretch-lite version of Raspbian. I verified both its integrity and its authenticity by comparing the sha256 hash of what I downloaded with the hash I know to be correct.
How confident are you that your infrastructure is running the Docker images that you expect to be running? If you deploy Docker images by tag, for example :latest, you shouldn't be. It's the equivalent of performing the download above without ever checking the sha256 hash of what you downloaded.
Let’s say you’re using Docker Hub to store images, and you’re also deploying Docker images to your infrastructure by specifying a tag, like acme/app:v1.2.11.
Now, one day, you discover that someone’s Docker Hub credentials on your team have been exposed. Docker Hub doesn’t support MFA, so you know an attacker could have had push access to your repositories.
How can you be sure that the acme/app:v1.2.11 you’re running hasn’t been overwritten with a malicious version? The short answer is that you can’t, because you’re not verifying what you’re downloading from the internet.
Has this ever happened to us? No, but it’s a scary thought.
The answer to this in the Docker world is digests (there’s also Content Trust, but I won’t get into that in this post).
Images that use the v2 or later format have a content-addressable identifier called a digest. As long as the input used to generate the image is unchanged, the digest value is predictable.
Instead of specifying acme/app:v1.2.11, we should have been specifying the content-addressable identifier for that tag: its digest, written in the form acme/app@sha256:<digest>.
When we use the digest as the identifier, Docker will not only pull the image with that digest, but also calculate the sha256 digest of what we downloaded, and verify it against what we specified.
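The check described above is the same idea as the Raspbian download, applied automatically. As a rough illustration of the principle only, here is a minimal sketch of a content-addressed check in shell. The verify_digest function is a hypothetical helper written for this post, not part of Docker, and Docker computes its digest over the image manifest rather than over a single file.

```shell
#!/bin/sh
# verify_digest FILE DIGEST
# Hypothetical helper (not a Docker command). DIGEST uses the
# "sha256:<hex>" form that Docker prints for image digests.
verify_digest() {
    file=$1
    expected=${2#sha256:}   # strip the algorithm prefix, if present

    # Hash the content itself; prefer sha256sum, fall back to shasum.
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
    else
        actual=$(shasum -a 256 < "$file" | cut -d ' ' -f 1)
    fi

    # The identifier is derived from the content, so any change to the
    # content produces a different hash and the comparison fails.
    if [ "$actual" = "$expected" ]; then
        echo "OK: content matches sha256:$expected"
    else
        echo "MISMATCH: expected sha256:$expected, got sha256:$actual" >&2
        return 1
    fi
}
```

Calling verify_digest with a file and the digest you expect returns success only when the content hashes to exactly that value, which is why an overwritten image cannot masquerade under a digest the way it can under a tag.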
This provides a number of protections:
docker pull’s; content addressable identifiers can never change, so they can be cached efficiently.
Can we do something better as an industry to build content addressability into dependency management? This is one of the reasons why I’m excited about distributed content-addressable filesystems like IPFS. If everything you depend on is specified by content, then the only remaining attack vector is a dependency that you consciously chose not to review.
Never deploy tags to production; always use the content-addressable identifier. Mutability is the devil of security and stability.
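One way to hold yourself to this rule is to reject tag-only references before they reach a deploy. The following is a minimal sketch, assuming your image references are available as plain strings; is_pinned is a hypothetical helper name, and the digest shown in the usage line is a placeholder, not the real digest of any acme/app image.

```shell
#!/bin/sh
# is_pinned IMAGE_REF
# Hypothetical helper: succeeds only if the reference ends in a full
# sha256 digest (64 lowercase hex characters), e.g. name@sha256:<hex>.
is_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# To look up the digest of an image you have already pulled, you can
# ask Docker for its repo digest:
#   docker inspect --format '{{index .RepoDigests 0}}' acme/app:v1.2.11

# Usage: refuse any reference that is only a tag.
for ref in "$@"; do
    if ! is_pinned "$ref"; then
        echo "refusing to deploy unpinned image: $ref" >&2
        exit 1
    fi
done
```

Run as a script with the image references as arguments, this exits non-zero on the first reference such as acme/app:v1.2.11 or acme/app:latest, and passes a reference such as acme/app@sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 (placeholder digest).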