Automate the Backup and Restore of Cloud Instances with Snapshots

5 min read

Learn how to use the IBM Cloud CLI and Terraform for instance snapshots.

A snapshot is a point-in-time copy of a volume. When a volume snapshot is requested, the volume contents at a single moment are identified. Writes performed while the snapshot is being persisted do not affect the snapshot contents. A snapshot of the boot volume captures all the blocks required to provision future instances.
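
For VPC block storage, requesting a snapshot is a single CLI call. Here is a minimal sketch, assuming the `ibmcloud is snapshot-create` command and flag names (verify with `ibmcloud is snapshot-create --help`); the `ibmcloud` function below is an offline stub so the snippet runs without a cloud login:

```shell
# Offline stub standing in for the real IBM Cloud CLI; delete it to run for real.
ibmcloud() { echo "Snapshot my-snapshot of volume my-data-volume created"; }

# Request a point-in-time snapshot of a data volume.
ibmcloud is snapshot-create --volume my-data-volume --name my-snapshot
```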

New instances can be provisioned from a boot snapshot or a new data volume can be restored from a snapshot. This blog describes how to create the instance, take snapshots of the volumes and then create a new instance from the snapshots. The basic steps are captured in this diagram:

It also describes some Linux commands for handling partitions (fdisk), file systems (mkfs), mounting (mount) and device unique identifiers (blkid).

Using automation to avoid repetitive steps

Going through this process for one instance with one volume, one time, is okay to start, but you may want to automate the steps so you can take the snapshots automatically. You can then adjust these scripts to fit into your CI/CD pipeline, backup schedule, automated testing environment, etc.

Once you go through the steps manually, it becomes clear that all those steps in the IBM Cloud console can be automated with their command line counterparts. Equipped with IBM Cloud CLI and Terraform, I decided to write a set of shell scripts and Terraform templates to prove it.

The files can be found in the Git repository. Note that you will find more VPC examples in other folders in this repository. These examples are companions to our VPC tutorials.

The scripts under vpc-snapshot/ do the work:

  1. Use Terraform to create the VPC, network resources, data volumes and instance.
  2. Create snapshots of the boot volume and the data volumes.
  3. Restore snapshots into a new instance with associated new data volumes.
  4. Test the restored instance.
  5. Delete the restored instance, restored volumes, snapshots, etc.
  6. Use Terraform destroy to delete the original instance, data volumes and network resources.
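
The numbered prefixes mean the scripts can be run end to end in order. A minimal driver loop (a sketch; run it from the vpc-snapshot/ directory):

```shell
# Run the numbered step scripts in order, stopping at the first failure.
# Scripts that are absent are skipped, so the loop is safe to re-run.
run_all() {
  for script in 0*.sh; do
    [ -f "$script" ] || continue
    echo ">>> running $script"
    ./"$script" || { echo ">>> $script failed, stopping"; return 1; }
  done
}
run_all
```

In practice you will want to run the scripts one at a time the first time through, inspecting the console between steps, as the next sections do.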

Running the automation scripts

To make it easier to understand what happens, I created several scripts mapping to the high-level steps:

$ git clone https://github.com/IBM-Cloud/vpc-tutorials
$ cd vpc-tutorials
$ cd vpc-snapshot

$ ls -1 0*
000-prereqs.sh
010-terraform-create.sh
020-snapshot-backup.sh
030-snapshot-restore.sh
040-snapshot-test.sh
080-snapshot-cleanup.sh
090-terraform-destroy.sh

Instructions on how to run the scripts can be found in the folder's README. You will need the right permissions to work with VPC. Try it in the IBM Cloud Shell or verify the prerequisites on your development computer: IBM Cloud CLI, Terraform, the IBM Cloud Provider for Terraform and jq. You will also need an SSH key, which you should be familiar with if you are working with virtual servers.

Once you have configured your environment with the local.env file, you can start running the scripts: ./000-prereqs.sh, ./010-terraform-create.sh, etc. If you change the values in local.env, start from scratch with 000. Below, I'll explain what is happening step by step.

Step by step

After executing the 000-prereqs.sh and 010-terraform-create.sh, navigate to Virtual server instances for VPC in the IBM Cloud console and find your instance. I kept the default PREFIX=snaps and got the instance "snaps," and it had the data volumes snaps0 and snaps1:

Navigate through the "snaps0" volume name and notice the volume and Snapshots panel:

You will see that it is empty: no snapshots yet. The script ssbackup.sh takes a snapshot of the boot and data volumes but requires an instance id as a parameter. The 020-snapshot-backup.sh script finds the instance id in the Terraform output before invoking ssbackup.sh.
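
That lookup can be as small as the following (a sketch with assumed names; `terraform output -raw` needs Terraform 0.15 or later, and the `terraform` function here is an offline stub echoing the id so the snippet runs without state):

```shell
# Offline stub standing in for the real Terraform CLI; delete it to run for real.
terraform() { echo "0717_31302db7-2fdc-433b-a0a8-e08dfb24805c"; }

# Read the instance id from the Terraform output, then hand it to ssbackup.sh.
instance_id=$(terraform -chdir=terraform output -raw instance_id)
echo ">>> snapshotting instance $instance_id"
# ./ssbackup.sh "$instance_id"   # snapshots the boot volume and each data volume
```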

After executing 020-snapshot-backup.sh, navigate back to the volumes in the IBM Cloud console and observe the snapshot created for each volume. This was my snaps0:

It is possible to initialize a new instance from these snapshots. The script ssrestore.sh does the heavy lifting, but it takes a lot of parameters:

$ ./ssrestore.sh
NAME:
  ssrestore.sh
USAGE:
  ssrestore.sh SNAPSHOT_BASENAME INSTANCE_NAME VPC SUBNET KEY PROFILE_NAME
  SNAPSHOT_BASENAME: initial characters of all snapshots previously created with ssbackup
  INSTANCE_NAME:     initial characters of all volumes and instances created by this script
  VPC:               ID of the VPC for the instance created by this script.
  SUBNET:            ID of the subnet for the instance created by this script.
  KEY:               ID of the SSH key for the instance created by this script.
  PROFILE_NAME:      Name of the profile for the instance created by this script.
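
The wiring boils down to reading each value from the Terraform output (a sketch with assumed names, not the real script; the stub below fakes `terraform output -raw NAME` by echoing "NAME-value" so the snippet runs without state):

```shell
# Offline stub: echoes "<output-name>-value" instead of querying real state.
terraform() { for a in "$@"; do name=$a; done; echo "${name}-value"; }

PREFIX=snaps
vpc=$(terraform -chdir=terraform output -raw vpc_id)
subnet=$(terraform -chdir=terraform output -raw subnet_id)
key=$(terraform -chdir=terraform output -raw key)
profile=$(terraform -chdir=terraform output -raw profile)

# Print the command instead of running it; drop the echo to restore for real.
echo ./ssrestore.sh "$PREFIX" "${PREFIX}-restore" "$vpc" "$subnet" "$key" "$profile"
```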

Execute the 030-snapshot-restore.sh script. While it is executing, open the script to see how these parameters are obtained from the Terraform output. After the script completes, navigate to Virtual server instances for VPC, then to the "snaps-restore" instance and check out the volumes created. Take note of the IP address of the restored instance:

>>> creating a floating ip for the instance
>>> floating ip for restored instance "52.118.151.41"

Execute 040-snapshot-test.sh to verify that the restored instance has the same contents as the original instance and volumes. The next section has the details. A shell script is run on the restored instance; the output looks like this:

$ ./040-snapshot-test.sh
>>> Testing instance created from snapshot, floating ip: 52.116.143.204
>>> wait for cloud-init to complete
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-1024-kvm x86_64)
...
To check for new updates run: sudo apt update




status: done
>>> verify volumes were restored
test /datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce
test /datavolumes/55570bca-d536-4428-aff2-68e38c0b6975
Connection to 52.116.143.204 closed.
>>> success

You can clean up in two steps, but you may want to first take a look at the next section and learn more about Linux volumes. Clean up the restored instance, volumes, snapshots, etc. with 080-snapshot-cleanup.sh. Then clean up the Terraform created resources with 090-terraform-destroy.sh.

Linux commands for partitions (fdisk), file systems (mkfs), mounting (mount) and device unique identifiers (blkid)

This section describes the details of partitioning disks, making file systems, etc. Study, skim or skip depending on your needs.  

The initial instance configuration was completed in the 010-terraform-create.sh step in a cloud-init user-data script. Check out the user_data in the terraform/main.tf instance configuration:

resource "ibm_is_instance" "main0" {
  vpc             = ibm_is_vpc.main.id
  user_data = file("${path.module}/user_data.sh")
  ...
}

The contents of user_data are read from the terraform/user_data.sh file:

$ cat terraform/user_data.sh
#!/bin/bash
set -x
sleep 60; # disks may not be mounted yet.... TODO

# step through the disks in /dev/disk/by-id and find just the data (unformatted) disks
# partition, make a file system, mount, add uuid to fstab, add a version.txt file

cd /dev/disk/by-id
for symlink in $(ls -1 virtio-* |sed -e /-part/d -e /-cloud-init/d); do
  disk=$(readlink $symlink)
  disk=$(realpath $disk)
  mount_parent=/datavolumes
  mkdir -p $mount_parent
  chmod 755 $mount_parent
  if fdisk -l $disk | grep Linux; then
    echo Disk: $disk is already partitioned
  else
    echo Partition
    # the sed is used for self documentation
    sed -e 's/\s*\([\+0-9a-zA-Z]*\).*/\1/' << ____EOF | fdisk $disk
n # new partition
p # primary partition
# default - partition #
# default - first sector
# default - last sector
w # write the partition table
____EOF
    echo mkfs
    disk_partition=${disk}1
    yes | mkfs -t ext4 $disk_partition
    uuid=$(blkid -sUUID -ovalue $disk_partition)
    mount_point=$mount_parent/$uuid
    echo add uuid $uuid to /etc/fstab
    echo "UUID=$uuid $mount_point ext4 defaults,relatime 0 0" >> /etc/fstab
    echo mount $mount_point
    mkdir -p $mount_point
    chmod 755 $mount_point
    mount $mount_point
    cat  > $mount_point/version.txt << ____EOF
    version=1
    initial_disk_partition=$disk_partition
    mount_point=$mount_point
____EOF
    echo wrote version to $mount_point/version.txt
    cat $mount_point/version.txt
    sync;sync
  fi
done

The /dev/disk/by-id directory contains the volumes identified by their volume attachment ids. It is not possible to predict these ids, so any disks that have not been previously partitioned are initialized.

Follow the instructions in the documentation for Using your block storage data volume (CLI). The following steps are automated.

The fdisk command creates a new primary partition; the rest of the parameters are defaulted. The sed script provides a self-documenting way to capture the inputs to the interactive fdisk command. The result is a new partition, ending in "1", that is about as big as the disk. One of my volumes was /dev/vde; the resulting primary partition was /dev/vde1.
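
You can watch the filter work by running the sed expression by itself; the heredoc below mirrors the one in user_data.sh, and the comments are stripped down to the one-character answers (blank lines take the defaults):

```shell
# Strip the trailing comments, leaving only the answers fdisk reads:
# "n", "p", three blank lines (defaults), "w".
filtered=$(sed -e 's/\s*\([\+0-9a-zA-Z]*\).*/\1/' << ____EOF
n # new partition
p # primary partition
# default - partition #
# default - first sector
# default - last sector
w # write the partition table
____EOF
)
printf '%s\n' "$filtered"
```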

The mkfs command makes a file system on the partition. This is required before the device can be mounted. The command was mkfs -t ext4 /dev/vde1. A unique ID, UUID, for the device is also available after the file system is created and is displayed with the command blkid -sUUID -ovalue /dev/vde1.

One can mount /dev/vde1, but it is also possible to mount the disk by its UUID. After the mount instructions are put into /etc/fstab, it is possible to mount using just the directory name, which also happens to match the UUID. For me, this was: mount /datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce.
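
Assembling the fstab entry is simple string work once the UUID is known (a sketch; blkid needs a real device, so the UUID from the example above is hard-coded here):

```shell
# UUID as reported by: blkid -sUUID -ovalue /dev/vde1
uuid=eaa95c2f-7b08-4aa8-a893-052d0a203fce
mount_point=/datavolumes/$uuid

# This line, appended to /etc/fstab, lets a bare `mount $mount_point`
# find the device by UUID no matter which /dev/vdX name it gets.
fstab_line="UUID=$uuid $mount_point ext4 defaults,relatime 0 0"
echo "$fstab_line"
```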

Finally, some data is put into a version.txt file.

If you ssh to the instance created by Terraform, you can see the /etc/fstab:

$ cd terraform
$ terraform output
floating_ip = "52.118.150.30"
instance_id = "0717_31302db7-2fdc-433b-a0a8-e08dfb24805c"
key = "r006-267ead51-fbd5-435c-a92b-c9cac0d217d6"
profile = "cx2-2x4"
resource_group_id = "27532db3eeab40c4a7f4bd751885e81f"
subnet_id = "0717-68a0a3ea-dd1d-46bd-934e-9ae712e57bd1"
vpc_id = "r006-965865da-0691-4960-9542-f19cbee691da"
z = {
  "ssh" = "ssh root@52.118.150.30"
}
zone = "us-south-1"
$ ssh root@52.118.150.30
...
root@snaps:~# cat /etc/fstab
LABEL=cloudimg-rootfs        /         ext4        defaults        0 0
LABEL=UEFI        /boot/efi        vfat        defaults        0 0
UUID=eaa95c2f-7b08-4aa8-a893-052d0a203fce /datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce ext4 defaults,relatime 0 0
UUID=55570bca-d536-4428-aff2-68e38c0b6975 /datavolumes/55570bca-d536-4428-aff2-68e38c0b6975 ext4 defaults,relatime 0 0

While on the instance, you can see the results of the cloud-init execution. Look in this file for the cloud-init steps. I'm a vi user:

root@snaps:~# vi /var/log/cloud-init-output.log

Check out the version.txt files:

root@snaps:~# cat /datavolumes/*/version.txt
    version=1
    initial_disk_partition=/dev/vde1
    mount_point=/datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce
...

All entries in /etc/fstab will be mounted during instance initialization. This is why the instance restored from snapshots is correctly initialized. This allows the 040-snapshot-test.sh script to verify that the volumes are working.
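
The check itself can be as simple as confirming that every data volume still carries its version.txt. Here is a hypothetical re-creation of the test logic (the real 040-snapshot-test.sh runs over ssh; a scratch directory stands in for /datavolumes so the sketch runs locally):

```shell
# Verify each mounted data volume still contains the version.txt
# written by cloud-init on the original instance.
verify_volumes() {
  parent=$1
  rc=0
  for f in "$parent"/*/version.txt; do
    if [ -f "$f" ]; then
      echo "test $(dirname "$f")"
    else
      echo ">>> missing $f" >&2
      rc=1
    fi
  done
  return $rc
}

# Local dry run against a scratch directory standing in for /datavolumes.
tmp=$(mktemp -d)
mkdir -p "$tmp/eaa95c2f-7b08-4aa8-a893-052d0a203fce"
echo "version=1" > "$tmp/eaa95c2f-7b08-4aa8-a893-052d0a203fce/version.txt"
verify_volumes "$tmp" && echo ">>> success"
```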

For me, the restored instance (snaps-restore /dev/vdc1) did not have the same volume devices as the original (snaps /dev/vde1). The /etc/fstab is referencing them by UUID to avoid any problems:

root@snaps:~# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vde1       10254612   36892   9677100   1% /datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce
/dev/vdd1       11286728   40988  10652692   1% /datavolumes/55570bca-d536-4428-aff2-68e38c0b6975

You can ssh to the restored instance. The output of ./030-snapshot-restore.sh had the IP address:

>>> creating a floating ip for the instance
>>> floating ip for restored instance "52.118.151.41"
...
$ ssh root@52.118.151.41
...
root@snaps-restore:~# df
Filesystem     1K-blocks    Used Available Use% Mounted on
...
/dev/vdb1       11286728   40988  10652692   1% /datavolumes/55570bca-d536-4428-aff2-68e38c0b6975
/dev/vdc1       10254612   36892   9677100   1% /datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce

While you are ssh'd to snaps-restore, check out the data volumes. For example:

root@snaps-restore:~# cat /datavolumes/*/version.txt
    version=1
    initial_disk_partition=/dev/vde1
    mount_point=/datavolumes/eaa95c2f-7b08-4aa8-a893-052d0a203fce
...

Clean up

Remove the snapshots and the instance that was restored from them, then destroy the Terraform resources:

./080-snapshot-cleanup.sh
./090-terraform-destroy.sh
