Doing Backups in AWS

This text is about doing backups for data already existing in AWS, not for outside data, although some methods apply for both cases. But let’s start from the beginning:

What Data?

Your data can be located on EC2 nodes (virtual servers) or you may be using some dedicated database service such as RDS. The dedicated services have the backup functionality built-in already, with settings easily accessible through the interface. I won’t deal with those but rather with the “raw” data you may have on a node.

The data on the node falls in 2 categories, or can be looked over from 2 different perspectives:

  1. When one wants to capture the “system state” at a certain point in time. This perspective does not consider the data composition, but the functionality that is being captured for use at a later date as a known good fallback point.

  2. When one wants to get the state of a specific subsystem (e.g. a subset of the local storage, a subset of the local database). This is the “classical backup” as it is widely known.

Capturing State

AWS offers full support for taking snapshots of volumes:

AWS Volume Snapshot example

One does not need to only use the interface; all the functionalities are available programatically. One may also want to look over Boto Python library.

Classical Backup

One can store files through programatical means (e.g. from cron-based scripts to full fledged backup software that runs on a schedule) in the Amazon Cloud to the following services:

  • Simple Storage Service (S3): this is the easiest to use as it offers instant storage, instant retrieval and also versioning (e.g. you may mirror some directory contents on the secure storage at various points in time). It is not a cost effective method of storage for huge amounts of data (multiple terabytes) over long periods of time.

  • Glacier: this is the equivalent of the tape storage. The retrieval is not instant (one must schedule such retrieval in advance). It does not support versioning by default. It is 3-4 times cheaper than S3, though.

  • A dedicated EC2 node (or multiple nodes organized as a backup storage cluster): this is not cost effective but may work in certain scenarios (e.g. live data mirroring).

  • A dedicated database in RDS: this is far from cost effective but is the solution if one wants to use some existing backup software that can store data to a database only.

That was my introduction on doing backups in AWS. Thank you for your read!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.