Doing Backups in AWS

This text is about backing up data that already lives in AWS, not data held elsewhere, although some of the methods apply to both cases. But let’s start from the beginning:

What Data?

Your data can be located on EC2 nodes (virtual servers) or you may be using some dedicated database service such as RDS. The dedicated services have the backup functionality built-in already, with settings easily accessible through the interface. I won’t deal with those but rather with the “raw” data you may have on a node.

The data on a node can be looked at from 2 different perspectives:

  1. When one wants to capture the “system state” at a certain point in time. This perspective is not concerned with what the data consists of, but with capturing a working system that can be used later as a known-good fallback point.

  2. When one wants to get the state of a specific subsystem (e.g. a subset of the local storage, a subset of the local database). This is the “classical backup” as it is widely known.

Capturing State

AWS offers full support for taking snapshots of volumes:

[Figure: AWS volume snapshot example]

One is not limited to the web interface; all of the functionality is available programmatically. The Boto library for Python is worth a look.
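
As a minimal sketch of the programmatic route, the function below requests a snapshot of a single volume. It assumes boto3 (the current generation of Boto) and takes the EC2 client as a parameter; the function name, the description format and the volume ID in the usage comment are made up for illustration.

```python
from datetime import datetime, timezone


def snapshot_volume(ec2_client, volume_id, note="scheduled snapshot"):
    """Ask EC2 for a point-in-time snapshot of one EBS volume."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        Description=f"{note} ({stamp})",
    )
    # The call returns right away; the snapshot itself completes in the background.
    return response["SnapshotId"]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   print(snapshot_volume(boto3.client("ec2"), "vol-0123456789abcdef0"))
```

Passing the client in, rather than creating it inside the function, keeps the function easy to try out against a stub before pointing it at a real account.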

Classical Backup

Files can be stored programmatically (by anything from cron-based scripts to full-fledged backup software running on a schedule) in the following Amazon services:

  • Simple Storage Service (S3): this is the easiest to use as it offers instant storage, instant retrieval and also versioning (e.g. you may mirror the contents of a directory to the secure storage at various points in time). It is not a cost-effective way of storing huge amounts of data (multiple terabytes) over long periods of time.

  • Glacier: this is the equivalent of tape storage. Retrieval is not instant (one must request it in advance and wait for the data to become available). It does not support versioning by default. It is 3-4 times cheaper than S3, though.

  • A dedicated EC2 node (or multiple nodes organized as a backup storage cluster): this is not cost-effective but may work in certain scenarios (e.g. live data mirroring).

  • A dedicated database in RDS: this is far from cost-effective, but it is the solution if one wants to use existing backup software that can only store data in a database.
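
For the S3 option, a cron-driven script can be very small. The sketch below packs a directory into a compressed archive and uploads it under a timestamped key. It assumes a boto3 S3 client is passed in; the function name, the `backups` prefix and the key layout are illustrative choices, not a convention of S3.

```python
import os
import tarfile
import tempfile
from datetime import datetime, timezone


def backup_directory(s3_client, source_dir, bucket, prefix="backups"):
    """Pack source_dir into a .tar.gz and upload it under a dated key."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    name = os.path.basename(os.path.normpath(source_dir))
    key = f"{prefix}/{name}-{stamp}.tar.gz"

    with tempfile.TemporaryDirectory() as workdir:
        archive_path = os.path.join(workdir, "archive.tar.gz")
        with tarfile.open(archive_path, "w:gz") as archive:
            archive.add(source_dir, arcname=name)
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call.
        s3_client.upload_file(archive_path, bucket, key)
    return key


# Usage, assuming boto3 is installed, credentials are configured and the
# (hypothetical) bucket exists:
#   import boto3
#   backup_directory(boto3.client("s3"), "/var/www", "my-backup-bucket")
```

Because every run writes a new key, older archives are kept until something removes them, so a retention policy has to be handled separately.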

That was my introduction to doing backups in AWS. Thank you for reading!

Crazy DevOps interview questions

Who likes interviews? Me neither. Well, depends…

If you get any of the following questions during an interview, then either the interviewer has read this post or is getting information from the same sources as I am. Either way, let’s get one step ahead.


Question 1:

Suppose you log into a node and find the following situation:

# ls -la /bin/chmod
-rw-r--r-- 1 root root 56032 ian 14  2015 /bin/chmod

Is it fixable?

… yes. Most of the time. Let’s remember how dynamically linked executables are started on Linux: through the dynamic loader, ld-linux-something. Let’s check:

# ldd /bin/chmod 
	linux-vdso.so.1 =>  (0x00007ffdf27fc000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb11a650000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fb11aa15000)

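OK, got it. The loader is itself an executable and only needs read access to the program it is asked to load, so it can run chmod for us even though chmod has lost its execute bit: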

# /lib64/ld-linux-x86-64.so.2 /bin/chmod +x /bin/chmod

Fun, isn’t it? The obvious follow-up question is “what if the loader’s permissions are improperly set” or something similar. The answer is that not all filesystem issues are fixable with easy commands. One may even have to mount the file system on a different installation (e.g. from a Live CD, or by attaching the virtual storage to another, “good” node) and fix things up from there.
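
A related route, if an interpreter such as Python happens to be installed on the node (an assumption; the question does not guarantee it): changing permissions is a system call, so any program that can issue it restores the bit without involving /bin/chmod at all. A minimal sketch, with a helper name of my own choosing, demonstrated on a throwaway file rather than on the real binary:

```python
import os
import stat
import tempfile


def restore_exec_bits(path):
    """Add execute permission for user, group and others, keeping the other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a scratch file that starts out as rw-r--r-- (0644).
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch_path = scratch.name
os.chmod(scratch_path, 0o644)
print(oct(restore_exec_bits(scratch_path)))
os.remove(scratch_path)
```

On the broken node, the same call would be pointed at /bin/chmod and run as root.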



A small introduction to Chef


If you know what Chef is, you may skip this one or actually stay for a (hopefully) interesting read. Either way, Chef is a tool that one may use for automating node configuration – e.g. installed software and its configuration files, system configuration files, NFS mounts and so on.

When interacting with Chef, one soon realizes that there are 3 main components involved:

  • The Chef Server, which is actually just a big data repository.
  • The Chef Client, which is installed and runs on endpoints and does all the “dirty work” such as changing files and installing packages. The client authenticates itself to the server using a public/private key pair.
  • Knife: this is the command-line tool used by sysadmins to actually do work with Chef.

What is stored in the repository known as Chef Server?

  • Cookbooks;
  • Data Bags;
  • Environment and Node data.

A Cookbook is a small project written in Ruby. This project contains files for:

  • Recipes – these are individual files that contain rules to be applied on clients (e.g. install an rpm package, deploy a config file from a template). The file content is plain Ruby code using constructs (calls to libraries) provided by Chef.
  • Attributes – or, better phrased, default attributes (e.g. port numbers, sizes, paths etc).
  • Templates – configuration file templates in which concrete values are replaced by template variables; the recipe fills these in from default or user-provided values.

These small projects are versioned and are usually also maintained – for development purposes – in an external repository (most likely git based). There is a system of includes and dependencies, so applying a single recipe may trigger the installation of a full environment.

The “data bag” is an interesting concept: these are pieces of JSON data stored inside the Chef Server. They can be encrypted, making them suitable for sensitive data such as passwords or private keys.

What is a (Chef) Environment? One may see it as a way of grouping nodes (servers). This allows for having shared, common data for all the nodes within a certain environment and also for running commands on groups of nodes identified by the environment they belong to. The (Chef) Node also has a data record associated with it that may override the settings inherited from the environment. This data record (also JSON encoded) contains overrides for the default attributes described before and, on individual nodes, the list of recipes to be applied (also called the “run list”).
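
The layering described above (cookbook defaults, then environment data, then node data) can be pictured as successive dictionary merges. The sketch below is a deliberate simplification in Python, not Chef's actual algorithm: the real precedence rules have more levels and live in the Ruby client. The attribute names and values are hypothetical.

```python
import copy


def deep_merge(base, override):
    """Return base with override laid on top, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_attributes(cookbook_defaults, environment_data, node_data):
    """Layer the three sources from least to most specific."""
    return deep_merge(deep_merge(cookbook_defaults, environment_data), node_data)


# Hypothetical values, for illustration only.
defaults = {"nginx": {"port": 80, "worker_processes": 2}}
staging = {"nginx": {"port": 8080}}
node = {"nginx": {"worker_processes": 8}}

print(effective_attributes(defaults, staging, node))
# {'nginx': {'port': 8080, 'worker_processes': 8}}
```

The point of the merge being recursive is that an environment or node only has to state the values it changes; everything else falls through from the layer below.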


As I have mentioned before, the actual work is performed with knife. The entire documentation is available on the Chef website.

From a DevOps perspective, the most frequent operations are:

  • Modifying environment and node data;
  • Uploading and downloading cookbooks without dependency checking; for dependencies there is another tool, berks (Berkshelf);
  • Running the same shell command on a group of nodes (query based).

Is it a fun tool? It sure is. But you know the saying, “don’t drink and drive”; in this context, due to the impact of individual commands, one must be extra careful (and most likely sober).

That’s it for a small introduction to Chef concepts. Thank you for reading and have a nice day!
