Air-Gapped Packer Builds
Introduction
It's pretty common to use HashiCorp Packer to build AMIs (primarily in AWS, but elsewhere too). Briefly, Packer talks to the AWS API to start an EC2 instance. In its "build" phase, it tells the instance to run commands, write files and whatever else is needed to set up the operating system as you want it. It then shuts down the instance, takes an AMI from it and cleans up.
If your CI runner runs on the same network as the EC2 instance being used to make the AMIs, then it's all pretty easy - Packer sets up keys and makes SSH connections to the EC2 instance to run commands, write files and so on.
Things get a bit trickier if the CI runner and the EC2 instance are separated by firewalls ("air gapped"). Packer can't communicate with the EC2 instance, so it may seem that it's no longer possible to make AMIs (the same may also appear to be true for AWS Image Builder and other tools).
There is a way to do it though. This article describes how we can use userdata to perform the build for us, completely "hands off" from Packer. We lose some console screen output and debugging capability, but we can make our AMIs quite successfully.
UserData to the Rescue
Packer has features to:
- Run UserData when the EC2 instance starts up
- Leave the instance running after its "build" steps to customise the server have finished
- Use a "none" communicator (which turns off SSH)
With these features, we can put all our customisation steps into UserData and, at the end of that process, shut the system down. Packer starts the instance up and then waits for it to shut itself down. When it does, Packer makes the AMI as it normally would, and we're all done.
Here's some sample Packer config (HCL format) to do this:
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.8"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

source "amazon-ebs" "example" {
  ami_name      = "my_new_ami"
  region        = "eu-west-1"
  vpc_id        = "vpc-xxxx"
  subnet_id     = "subnet-xxxx"
  instance_type = "t2.small"

  source_ami_filter {
    filters = {
      name                = "some-source-ami-name"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["xxxxxxxxxxxxxxx"]
  }

  # Turn off SSH connections
  communicator = "none"

  # Packer always makes a security group - we can limit what it can do here
  temporary_security_group_source_cidrs = ["10.0.0.0/32"]

  # Don't shut down the instance when the 'build' phase is finished
  disable_stop_instance = true

  # Specify user_data to run when the instance starts
  user_data_file = "./user_data.cloud_config"
}

# We do nothing here - it's all in user_data
build {
  name = "example"
  sources = [
    "source.amazon-ebs.example"
  ]
}
The above references a UserData file; an example of it looks like this:
#cloud-config
runcmd:
  - sudo apt update
  - sudo apt upgrade -y
  # Do other steps here
  # Tell cloud-init to clean up so it runs again on next boot, then shut down the system
  - sudo /usr/bin/cloud-init clean
  - sudo /usr/bin/shutdown -h now
This example runs quickly, so the Packer output looks much the same as it would if we were doing things traditionally. If your UserData takes (say) ten minutes to run, the screen output from Packer looks rather empty in the meantime.
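If you want some of that visibility back while the UserData is running, one option (not part of the Packer config above - just a sketch, assuming the AWS CLI can reach the AWS API and the instance ID is known, for example from Packer's early log output) is to poll the instance's console output, where cloud-init writes its progress on most distributions:
INSTANCE_ID="i-0123456789abcdef0"   # placeholder - take this from Packer's log output
while true; do
  # Fetch whatever the instance has written to its serial console so far
  aws ec2 get-console-output \
    --instance-id "$INSTANCE_ID" \
    --output text
  sleep 60
done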
This hands-off approach also means Packer has less control over the build process, which becomes a problem if something goes wrong during the UserData build. Timeouts help here. Firstly, Packer has its own timeout mechanism (which remains in effect even when disable_stop_instance is set). It is controlled by environment variables set before running Packer. For example:
AWS_POLL_DELAY_SECONDS: 5
AWS_MAX_ATTEMPTS: 100
Here, Packer will poll every 5 seconds to see whether the instance has shut down yet, and will make 100 polls before it times out - a little over eight minutes in total. When Packer times out, it should clean up properly.
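On a Linux runner, that could look something like the following (a minimal sketch; the packer commands assume the template above is in the current directory):
# Poll every 5 seconds, up to 100 times - a little over 8 minutes in total
export AWS_POLL_DELAY_SECONDS=5
export AWS_MAX_ATTEMPTS=100

packer init .
packer build .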
Secondly, it is probably also worth having a timeout on the CI job that runs Packer in the first place. Make this timeout a little longer than Packer's own. If the CI timeout is reached, it will kill off the Packer process, so obviously Packer won't be able to clean up; you'd need a manual clean-up job to run in that case.
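If your CI system only offers a hard kill, another option (again just a sketch, not something the config above relies on) is to wrap the build in GNU timeout and send an interrupt first, which Packer treats like Ctrl-C and responds to by cleaning up:
# Allow 15 minutes, then send SIGINT so Packer can clean up after itself;
# escalate to SIGKILL if it still hasn't exited a minute later
timeout --signal=INT --kill-after=60 15m packer build .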
As a side note, in our experience Packer always leaves a few resources behind if you run it enough times. Things get expensive if that means an EC2 instance gets left running, so a 'manual' clean-up job that runs after Packer has finished is usually a good idea on any setup. It's not shown in this example, but you can apply tags to all the resources Packer creates, so it's relatively easy to write a script that checks for resources with those tags and destroys them.
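As a sketch of what such a script might look like - assuming the Packer template applies a tag like packer-build = "true" to the instances it launches (for example via the builder's run_tags option; the tag name here is made up) - the AWS CLI can find and terminate anything left behind:
# Find instances left behind by earlier Packer runs, identified by an
# illustrative "packer-build=true" tag, and terminate them
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=tag:packer-build,Values=true" \
            "Name=instance-state-name,Values=running,stopped" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text)

if [ -n "$INSTANCE_IDS" ]; then
  aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
fi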
Something else to think about may be cleaning up old AMIs. They don't cost a lot to keep, but there's probably no need to have them, their snapshots and other resources hanging about for very long. Having some sort of "AMI cleaner" run after a new AMI is created may also be worth considering.
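A rough sketch of that idea, assuming your AMI names share a common prefix (in practice they would probably include a timestamp; the "my_new_ami" prefix and keep-the-latest policy here are illustrative):
# List our own AMIs matching the prefix, oldest first, and keep only the newest
OLD_AMIS=$(aws ec2 describe-images \
  --owners self \
  --filters "Name=name,Values=my_new_ami*" \
  --query "sort_by(Images, &CreationDate)[:-1].ImageId" \
  --output text)

for AMI_ID in $OLD_AMIS; do
  # Look up the snapshot(s) backing the AMI before deregistering it
  SNAPSHOT_IDS=$(aws ec2 describe-images \
    --image-ids "$AMI_ID" \
    --query "Images[].BlockDeviceMappings[].Ebs.SnapshotId" \
    --output text)
  aws ec2 deregister-image --image-id "$AMI_ID"
  for SNAP_ID in $SNAPSHOT_IDS; do
    aws ec2 delete-snapshot --snapshot-id "$SNAP_ID"
  done
done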
Conclusions
Packer normally needs an SSH control connection from the runner to the EC2 instance being built. If that connection isn't possible because of firewalls and air gaps, it is still possible to make AMIs with Packer.
Using UserData and some additional config options, it's possible to customise an operating system and make an AMI from it using Packer.
Pre-Emptive can help you get to operations nirvana, can help you make AMIs in challenging environments and much more. Please contact us - we can help you figure out what you need and make it work for you.