Definition: The term ‘Baking’ refers to the process of creating machine images.

There’s a relatively new concept in the IT world called immutable infrastructure. This is the idea that once you create a server you should never change its running configuration. The advantages of this approach include: avoidance of configuration drift; no need to patch running systems; and no need for privileged access to running systems.

Configuration drift is where, over time, administrators log on to running systems and make changes. Unfortunately these changes are often undocumented and in some cases not persisted, so they aren’t applied on reboot. This leads to lots of unique servers which are impossible to manage at scale.

Everyone should be familiar with the idea of patching running servers. In my experience patching of live systems never goes smoothly, often due to the aforementioned configuration drift. If we don’t need to change the configuration of a running server, nor to patch it, then we’ve reached the point where there’s no need to log on as root or administrator. This is great news for tightly regulated organisations that often have to worry about privileged insider threats and spend vast sums of money to build systems that monitor what their administrators are doing.

The way to create immutable infrastructure, and to achieve these benefits, is to create a master image and use this to instantiate all of your servers. If you want to modify a server, changing its configuration or patching it, then you update your master image and redeploy your servers in a rolling upgrade. This may sound like a lot of work, but by adopting the processes and tooling of DevOps it’s actually quite simple to get up and running.

I’m doing a lot of work with Amazon Web Services (AWS) at the moment and their master images are called Amazon Machine Images (AMI). AWS also provides a number of DevOps tools that we can use to automate the process of creating AMIs.

### Building an AMI with Packer

I started out by creating an AMI manually using the Packer tool from HashiCorp. Packer is an open-source application written in Go that is designed to automate the production of machine images. The images are generated by taking a base image and then customising it based on a configuration file. For the purposes of my proof of concept I used the following Packer configuration file:

{
    "builders": [{
        "type": "amazon-ebs",
        "region": "eu-west-1",
        "vpc_id": "vpc-4925e72e",
        "subnet_id": "subnet-4d13d12a",
        "source_ami": "ami-01ccc867",
        "instance_type": "t2.micro",
        "ssh_username": "ec2-user",
        "ami_name": "yum-upgrade "
    }],
    "provisioners": [{
        "type": "shell",
        "inline": [
            "sleep 30",
            "sudo yum update -y"
        ]
    }]
}

The first part of the file, the builder, describes how the image will be built. In this example I am building an “amazon-ebs” image, i.e. an AMI backed by an Elastic Block Store (EBS) volume. The other values specify things like the AWS region, VPC, and EC2 instance type that will be used for the build process. One of the key fields is “source_ami”, which specifies the base AMI to use; here I am using the latest Amazon Linux AMI available at the time of writing.

The second part of the file, the provisioner, describes how the base image should be customised. In this example all I am doing is running YUM to apply all of the available package updates using an inline shell provisioner. There are lots of other provisioners described in the Packer documentation that may be more useful for complex configurations.

The other prerequisite is a set of valid AWS credentials. Check the AWS documentation on how to set these up.
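Packer’s Amazon builders use the standard AWS SDK credential chain, so the quickest option is to export an access key pair as environment variables (the values here are the placeholder examples from the AWS documentation) or to run aws configure and let Packer read ~/.aws/credentials:

export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY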

Once you’ve got your credentials configured, save the configuration file as packer.json; you can then check its validity by running:

packer validate packer.json

Assuming there are no syntax errors, building an AMI is as simple as:

packer build packer.json

The build might take a while to run, but once it’s finished you should be able to look at the AMIs section of the EC2 web console and see your newly baked image!
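If you prefer the CLI to the console, you can also list your own images; the name filter here assumes the ami_name value from the example configuration:

aws ec2 describe-images --owners self --filters "Name=name,Values=yum-upgrade*"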

### Automating the Process

The source code for my proof of concept AMI bakery is available from my GitHub account.

The automated process works by creating an AWS CodePipeline that is triggered by changes to an AWS CodeCommit Git repository. The pipeline has two stages: a source stage that monitors the Git repository, and a build stage, which is an AWS CodeBuild project that runs Packer to produce our new AMI. For simplicity I’ve written AWS CloudFormation templates to deploy all of these services and their supporting AWS IAM roles. For the steps to do this, see the README in the GitHub repository.

### AWS CodeCommit

AWS CodeCommit is a managed Git service, similar to GitHub. The service isn’t as feature rich as GitHub, but it has the advantages of being tightly integrated with the other AWS services and of using AWS IAM roles to control access. AWS CodePipeline supports GitHub repositories as well, though there are a couple of extra integration steps needed to set up access.

To create the AWS CodeCommit repository, deploy the codecommit.yaml AWS CloudFormation template using either the AWS web console or the CLI.
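If you prefer the CLI, a command along the following lines works for this and the other templates mentioned below; the stack name is my own choice rather than anything mandated by the repository:

aws cloudformation deploy --template-file codecommit.yaml --stack-name ami-bakery-codecommit

For the templates that create IAM roles, add --capabilities CAPABILITY_NAMED_IAM so that CloudFormation is allowed to create them.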

### AWS CodeBuild

AWS CodeBuild is a fully managed build service that covers all of the steps necessary to create software packages that are ready to be installed: compilation, testing, and packaging. AWS CodeBuild works by processing a build specification YAML file that describes the build environment and the build steps. Build environments are supplied as Docker containers; AWS provides a number of pre-built containers for common languages and platforms such as Java, Python, and Ruby.

Unfortunately, Packer is not one of the supplied build containers; fortunately, AWS CodeBuild lets you supply your own. This is the Dockerfile I put together to run Packer on the AWS CodeBuild service:


FROM ubuntu

# Install the tools needed by the build, then download and unpack the
# Packer binary into the root of the image (so it ends up at /packer)
RUN apt-get update && apt-get -y install curl unzip jq && \
    curl -o packer.zip https://releases.hashicorp.com/packer/1.0.0/packer_1.0.0_linux_amd64.zip && \
    unzip packer.zip

# Run Packer by default when the container starts
CMD ["/packer"]

Normally I would have built a minimal Packer container, but AWS CodeBuild requires a bunch of other commands to function and I couldn’t find these listed in the documentation, so I went with the quick solution of copying what Amazon do themselves!

AWS CodeBuild needs to pull the container from a registry. You can use the Docker Hub container registry, but I chose to use the AWS Elastic Container Registry because it integrates with AWS CodeBuild using IAM roles which makes configuring security simpler. To create the AWS Elastic Container Registry, deploy the ecr-repository.yaml AWS CloudFormation template using either the AWS web console or the CLI.

With the registry created, building and uploading the Packer container is simple:

 
docker build --rm -t packer:latest .
aws ecr get-login --region AWSREGION

Run the docker login command that’s output by aws ecr …, then:

 
docker tag packer:latest AWSACCOUNT.dkr.ecr.AWSREGION.amazonaws.com/packer:latest
docker push AWSACCOUNT.dkr.ecr.AWSREGION.amazonaws.com/packer:latest

The final piece of configuration for AWS CodeBuild is the buildspec.yml file. Normally I would just need a single phase, build, which would invoke Packer. However, there was a bug in the AWS Go SDK which meant that you had to manually set up the security credentials for Packer to be able to access EC2. This bug has since been fixed, and once the next version of Packer picks up the fix the install phase can be removed.
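For reference, here is a sketch of what the buildspec.yml can look like. The install phase is an illustration of the credential workaround rather than the exact file from my repository: it fetches the build job’s temporary credentials from the CodeBuild container credentials endpoint using curl and jq (both installed by the Dockerfile above) and writes them to a profile that Packer’s AWS SDK will pick up.

version: 0.2

phases:
  install:
    commands:
      # Workaround for the SDK bug described above: fetch the temporary
      # credentials that CodeBuild exposes inside the container and write
      # them where Packer's AWS SDK will find them.
      - curl -s "http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" > /tmp/creds.json
      - mkdir -p ~/.aws
      - printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\naws_session_token=%s\n' "$(jq -r .AccessKeyId /tmp/creds.json)" "$(jq -r .SecretAccessKey /tmp/creds.json)" "$(jq -r .Token /tmp/creds.json)" > ~/.aws/credentials
  build:
    commands:
      # Bake the AMI using the Packer binary built into the custom container
      - /packer build packer.json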

To create the AWS CodeBuild project, deploy the codebuild-role.yaml AWS CloudFormation template and then the codebuild-project.yaml AWS CloudFormation template using either the AWS web console or the CLI. Note that you will need to edit the codebuild-project.yaml template to reflect your own values for the container image and the source location.

### AWS CodePipeline

AWS CodePipeline is the glue that connects the AWS CodeCommit Git repository to the AWS CodeBuild project that invokes Packer to create an AMI. The pipeline I used has two stages: a source stage and a build stage. The source stage watches the Git repository for new commits and then invokes the build stage. The build stage kicks off the AWS CodeBuild project, which uses the Packer container I created to build my new AMI.

To create the AWS CodePipeline pipeline, deploy the codepipeline-role.yaml AWS CloudFormation template and then the codepipeline.yaml AWS CloudFormation template using either the AWS web console or the CLI.

### Building an AMI

At this point, to make the pipeline work, all you need to do is commit the files packer.json and buildspec.yml to the root of the AWS CodeCommit Git repository. Within a few seconds the source stage of the pipeline will notice the commit, package up the files into an S3 bucket, and invoke the build stage to actually create the AMI.

Note that you will need to edit the packer.json file to reflect the AWS Region you are using and the base AMI. You can omit the “vpc_id” field if the region you are using still has its default VPC. If, like me, you no longer have a default VPC, you can deploy the vpc.yaml AWS CloudFormation template to create a VPC and use the VPC ID of your new VPC in packer.json.
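Once packer.json and buildspec.yml are ready, committing them looks something like the following. The repository name (ami-bakery) and region are assumptions, so substitute your own; the credential helper lines are the standard way of letting Git authenticate to AWS CodeCommit with IAM credentials:

git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/ami-bakery
cd ami-bakery
# copy or create packer.json and buildspec.yml here, then:
git add packer.json buildspec.yml
git commit -m "Add Packer template and build specification"
git push origin master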

### Extra Credit

Once the basic AMI Bakery pipeline is up and running there are lots of enhancements you could make; here are some ideas:

If you are creating a VPC just for Packer, you will end up paying for the Internet Gateway. To avoid this you could create two additional pipeline stages, one to create the VPC and one to tear it down.

Pipelines can be configured to send messages to an AWS SNS topic when they complete. You could write an AWS Lambda function to listen for these messages and then trigger another pipeline or build project (in a different account) that bakes another AMI based on your newly created AMI. We’re looking at doing this to allow one team to manage the base operating system AMI that is then used by application teams to build their own AMIs.

You could create extra stages in the pipeline to perform automated testing of your newly baked AMI, or to add a manual approval step before the new image is released.