automated AWS snapshots with Ansible
I use AWS infrastructure as a service (IaaS) heavily, both personally and professionally. I have a slew of different types of instances doing different types of things. In many cases, I’ll want backups; basically, AWS snapshots. I needed a way to quickly create and manage those snapshots in an automated fashion.
Sounds like a job for Ansible - and that’s exactly what I did. I wrote a playbook to handle all of this management. This playbook:
- gets a list of instances in a region by tag (in this case “backup: yes”),
- takes a snapshot of each instance in that list with specific tags (identifier, instance, and incremental),
- finds the snapshots it has created for that incremental set,
- and deletes the oldest snapshots that exceed the specified COUNT.
Bitchin. Let’s do it.
the stuffs - preamble
The usual preamble. I build my playbooks to be as modular as possible; I wrap up everything, AWS secret and access keys included, into a vault file. This way I have one password file or password to protect, and everything stays nice and modular.
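For reference, the vault file ends up looking something like this (the layout and values here are just placeholders, but the variable names match how they’re referenced in the tasks below):

# vault file, encrypted with ansible-vault - the values below are placeholders
vault:
  aws_access_key: AKIAIOSFODNN7EXAMPLE
  aws_secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  region: us-east-1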
I’m currently using Ansible 2.0.0.2. I’d go to 2.0.1.0, but there’s a nasty little bug with AWS subnets right now. I use two customized modules contained in the "./library" folder:
- ec2_remote_facts, which adds the “root_device_name” resource; hopefully this will be added in a future release.
- ec2_snapshot_facts; again, hopefully this will be in future releases.
There are two extra variables passed in at run time:
- COUNT, the number of snapshots to keep in the rotation.
- INCREMENTAL, the variable that defines daily, weekly, or monthly backups. This allows us to maintain different backup schedules.
An example run of this play would look like this:
ansible-playbook --vault-password-file ~/.ansible/myvaultpassword -i inventory/localhost awsbackup.yml -e "INCREMENTAL=daily COUNT=3"
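And if you want these runs to happen on a schedule, a cron entry per increment works nicely. Here’s a quick sketch using Ansible’s cron module (the 02:00 timing and the playbook path are just examples):

# Not part of the backup playbook itself - just a sketch for scheduling it.
- name: Schedule the daily awsbackup run.
  cron:
    name: "awsbackup daily"
    minute: "0"
    hour: "2"
    job: "cd /path/to/playbook && ansible-playbook --vault-password-file ~/.ansible/myvaultpassword -i inventory/localhost awsbackup.yml -e 'INCREMENTAL=daily COUNT=3'"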
One final note - this playbook only works for instances with ONE volume. If an instance has multiple volumes, the rotation logic will need to be modified.
Here’s the playbook. Let’s walk through it.
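The full playbook lives in the repo linked at the bottom; at a high level, the play itself looks roughly like this (a sketch - the snapshot role name and the vault file path are assumptions based on the task files and output that follow):

---
# awsbackup.yml - rough outline of the top-level play
- hosts: localhost
  connection: local
  gather_facts: yes                # ansible_date_time is used in the snapshot description
  vars_files:
    - vault.yml                    # the ansible-vault encrypted variables (assumed path)
  # pre_tasks: the sanity checks shown in "the pretasks" below
  roles:
    - aws.facts
    - aws.snapshot                 # assumed name for the snapshot role
    - aws.findsnapshots
    - aws.deletesnapshots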
the pretasks
The first thing I like to do is run a few sanity checks as pre_tasks. I make sure the proper variables have been passed in:
pre_tasks:
  - name: Default to 3 snapshots if COUNT is not defined.
    set_fact:
      COUNT: "{{ default_count }}"
    when: COUNT is not defined

  - name: Verify incremental type (daily, weekly, monthly).
    fail:
      msg: "INCREMENTAL is not defined or invalid. Allowed values are daily, weekly, or monthly. Exiting."
    when: INCREMENTAL is not defined or (INCREMENTAL != 'daily' and INCREMENTAL != 'weekly' and INCREMENTAL != 'monthly')
It’s just a way I like to let myself know which variables are required and which have defaults.
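One assumption there: default_count has to be defined somewhere for the fallback to work - for example, in the play’s vars:

vars:
  default_count: 3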
first role - get AWS facts
Take note - I’m a big fan of using ec2_remote_facts as opposed to a dynamic inventory script like ec2.py. Probably the biggest benefit, IMHO, is that I don’t have to store AWS credentials in a boto profile, or worry about them lingering in a bash history. So here’s my first step - gather some AWS facts:
---
# tasks file for aws.facts
- name: Gather EC2 facts.
  ec2_remote_facts:
    aws_secret_key: "{{ vault.aws_secret_key }}"
    aws_access_key: "{{ vault.aws_access_key }}"
    region: "{{ vault.region }}"
    filters:
      "tag:backup": "yes"
  register: ec2_facts

- name: Print facts
  debug:
    msg: "{{ ec2_facts }}"
Note the “tag:backup”: “yes”. This is merely a tag for systems we want to back up. It could be anything, and any number of tags. We’ll use these facts not only to find the instances to snapshot, but also to drive the rotation logic later on.
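If an instance isn’t tagged yet, the stock ec2_tag module can take care of that - a quick sketch (the instance ID is a placeholder):

- name: Tag an instance so it gets picked up for backup.
  ec2_tag:
    aws_secret_key: "{{ vault.aws_secret_key }}"
    aws_access_key: "{{ vault.aws_access_key }}"
    region: "{{ vault.region }}"
    resource: i-12345678            # placeholder instance ID
    state: present
    tags:
      backup: "yes"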
create snapshots
Pretty straightforward here:
---
- name: Snapshot the instance.
  ec2_snapshot:
    aws_secret_key: "{{ vault.aws_secret_key }}"
    aws_access_key: "{{ vault.aws_access_key }}"
    region: "{{ vault.region }}"
    instance_id: "{{ item.id }}"
    device_name: "{{ item.root_device_name }}"
    description: "awsbackup snapshot taken on {{ ansible_date_time.date }} at {{ ansible_date_time.time }}"
    snapshot_tags:
      Name: "{{ item.tags.Name }}_{{ INCREMENTAL }}_{{ ansible_date_time.date }}"
      identifier: awsbackup
      instance: "{{ item.tags.Name }}"
      incremental: "{{ INCREMENTAL }}"
  with_items: "{{ ec2_facts.instances }}"
Note the tags.
- identifier - how I differentiate snapshots made by this playbook.
- instance - the instance name. Makes it easy to map the snapshot to the instance it’s associated with.
- incremental - allows separation of dailies, weeklies, and monthlies, if necessary.
These tags will be the basis for the rotation logic.
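To make that concrete, a daily snapshot of a hypothetical instance named web01, taken on 2016-03-15, would come out tagged roughly like this:

# Illustrative only - the instance name and date are made up.
Name: web01_daily_2016-03-15
identifier: awsbackup
instance: web01
incremental: daily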
find snapshots
Now, in order to rotate our snapshots, we need to find all of the snapshots for this particular incremental set:
---
# tasks file for aws.findsnapshots
- name: Find snapshots.
  ec2_snapshot_facts:
    aws_secret_key: "{{ vault.aws_secret_key }}"
    aws_access_key: "{{ vault.aws_access_key }}"
    region: "{{ vault.region }}"
    filters:
      "tag:identifier": "awsbackup"
      "tag:incremental": "{{ INCREMENTAL }}"
  register: ec2_snapshot_facts

- name: Print snapshot facts
  debug:
    msg: "{{ ec2_snapshot_facts }}"
Now on to the tricky part.
delete snapshots
This took me a while to figure out. For an unknown number of instances, how do I walk each of those and delete the oldest snapshots, keeping only COUNT snapshots per instance? In Python it’s easy: a nested loop. You can establish variables, and you have a lot of logic at your disposal. Loop through the instances, loop through the volumes, get the snapshots for those volumes, get the last X snapshots, and delete them. However, turning Ansible into a programming language isn’t quite as simple. So I cooked up a series of steps to get a list of lists:
- Get an ordered list of snapshots for each instance, sorted by date created.
- Cut each nested list by COUNT, removing the COUNT newest snapshots, giving us the snapshots to delete.
- Join the nested list of lists together into a single list, so we can iterate over them and delete them.
- Delete the snapshots.
The first thing I needed was an ordered list of snapshots for each instance. I needed to build a list of lists, using the ec2_facts registered variable from the aws.facts role to loop over each instance:
- name: Get ordered list of snapshots for each instance.
  set_fact:
    snaps_fact_list: "{{ ec2_snapshot_facts.snapshots|selectattr('tags.identifier', 'equalto', 'awsbackup')|selectattr('tags.incremental', 'equalto', INCREMENTAL)|selectattr('tags.instance', 'equalto', item.tags.Name)|sort(attribute='start_time')|reverse|map(attribute='id')|list }}"
  with_items: "{{ ec2_facts.instances }}"
  register: snaps_fact_list_register
(Pardon the word wrap; the Jinja filter gets a bit long…) Let’s check out the Jinja2 filter. I use a combination of selectattr, sort, reverse, and map:
- selectattr for filtering by attribute. We filter all snapshots by the identifier, incremental, and instance name, provided by with_items.
- sort for sorting by snapshot creation date.
- reverse for newest snapshot first.
- map for returning only the snapshot ID.
There’s an extra twist when using the set_fact module with with_items: it doesn’t append to the fact, it overwrites it, so the fact ends up being whatever the last item produced. However, if we register the task, we get the result for each iteration. If we debug the register snaps_fact_list_register.results, we’ll get each list of snapshots (output truncated for clarity):
TASK [aws.deletesnapshots : Debug ordered list of snapshots.] ******************
ok: [127.0.0.1] => {
"msg": [
{
"_ansible_no_log": false,
"ansible_facts": {
"snaps_fact_list": [
"snap-1bcef459",
"snap-5f3fbd0a",
"snap-bee3daee",
"snap-7f1b833a"
]
},
...
},
{
"_ansible_no_log": false,
"ansible_facts": {
"snaps_fact_list": [
"snap-f4938ab4",
"snap-59e46908",
"snap-ad9522e8",
"snap-7138382d"
]
},
...
My COUNT is 3, so the last snapshot in each list should be deleted. For the next step, I run another set_fact task that trims each list of snapshots:
- name: Cut the list by our COUNT variable.
  set_fact:
    snaps_cut: "{{ item.ansible_facts['snaps_fact_list'][COUNT|int:] }}"
  with_items: "{{ snaps_fact_list_register.results }}"
  register: snaps_cut_register
Again, if we debug the register snaps_cut_register.results, we’ll get (output truncated for clarity):
TASK [aws.deletesnapshots : Debug debug cut list.] *****************************
ok: [127.0.0.1] =>
...
"item": {
...
},
"msg": [
"snap-7f1b833a"
]
}
ok: [127.0.0.1] =>
...
"item": {
...
},
"msg": [
"snap-7138382d"
]
}
Now we have a list of the oldest snapshots for each instance. With this method, we can ensure that we always keep COUNT snapshots per instance, even with an indeterminate number of instances and snapshots.
The final step to the list creation is to join that nested list together:
- name: Join the nested list of snapshots that will be deleted.
  set_fact:
    snaps_joined: "{{ snaps_joined|default([]) }} + {{ item.ansible_facts.snaps_cut }}"
  with_items: "{{ snaps_cut_register.results }}"
  register: snaps_joined_register
And the results:
TASK [aws.deletesnapshots : Debug joined list.] ********************************
ok: [127.0.0.1] => {
"msg": [
"snap-7f1b833a",
"snap-7138382d"
]
}
This time, since our set_fact is a series of joined lists, the final iteration will include the results of all the with_items. We finally delete the old snapshots:
- name: Delete snapshots in the nested list of snapshots.
  ec2_snapshot:
    aws_secret_key: "{{ vault.aws_secret_key }}"
    aws_access_key: "{{ vault.aws_access_key }}"
    region: "{{ vault.region }}"
    state: absent
    snapshot_id: "{{ item }}"
  with_items: "{{ snaps_joined }}"
  register: deleted_snapshots
Success! All instances snapshotted and any extra snapshots are deleted.
summary
Boom, headshot. An AWS snapshot backup system using Ansible. The looping logic was a bit of a pain to navigate, but it worked out well in the end. Using Ansible to construct nested loops wasn’t as easy as I thought; either it’s genuinely difficult or I’m still learning. Either way, it’s a pretty cool setup. Links below.
- Ansible awsbackup playbook - https://github.com/bonovoxly/playbook/tree/master/old_format/awsbackup
- equivalent Python backup script - https://github.com/bonovoxly/playbook/blob/master/old_format/awsbackup/awsbackup.py
-b