magpiebrain

Sam Newman's site, a Consultant at ThoughtWorks

Archive for ‘July, 2010’

As part of the day job at ThoughtWorks, I’m involved with training courses we run in partnership with Amazon, giving people an overview of the AWS service offerings. One of the things we cover that can often cause some confusion is the different types of images – specifically the difference between S3 and EBS-backed images. I tend to give specific advice over what type of instance I prefer, but before then some detail is in order.

S3-hosted Images

Instances launched from S3-hosted images use the local disk for the root partition. Once spun up, if you shut down machine down the state of the operating system is lost. The only way to persist data is to store it off-instance – for example on an attached EBS volume, on S3 or elsewhere. Aside from the lack of persistence of the state of the OS, these images are limited to being 10GB in size – not a problem for most Linux distros I know of, but a non-starter for some Microsoft images.

There is also a time delaying spinning these images up, due to the time taken to copy the Image from S3 to the physical machine – the whole image has to be downloaded before the instance can start the boot up process. In practice, popular images seem to be pretty well cached and so that the delay may not be an issue for you. As with everything in AWS, the barrier to entry is so low it is worth testing this yourself to understand if this is going to be a deal breaker.

EBS-backed Images

An EBS-backed instance is an EC2 instance which uses such a device as its root partition, rather than local disk. EBS-backed images can boot in seconds (as opposed to minutes for less-well cached S3 images) as they just need to stream enough of the image to start the boot process. They also can be up to 1TB in size – in other words the max size of any EBS volume. EBS volumes also have different performance characteristics than local drives, which may need to be taken into account depending on your particular use.

The key benefit is that EBS-backed instances can be ‘suspended’ – by stopping an EBS-backed instance, you can resume it later, with all EBS-backed state (e.g. the OS state) being maintained. In this way it is more akin to the slices at slicehost, or a real, physical machine.

There is one out and out downside when compared to S3-hosted images – your charges will be higher. You will be charged for IO, which could mount up depending on your particular usage patterns. You’re also charged for the allocated size of the EBS volume too.

So which is best?

Aside from the increased cost, and the differing benefits of a block-storage rather than local disk in terms of performance characteristics, there is one key downside to an EBS. It encourages you to think of EC2 instances as persistent, stable entities which are around forever. In general, I believe you’ll gain far more from embracing the ‘everything fails, deal with it’ ethos – that is, have your machines configure themselves afresh atop vanilla gold S3-backed AMIs on boot (e.g. using EC2 instance data to download & install your software). This means you cannot fall back into the old thought patterns of constantly patching, tweaking an OS, to the point where attempting to recreate it would be akin to an act of sysadmin archeology.

There will remain scenarios where EBS-backed instances make sense (and those of you using Windows Server 2008 have no choice), but I always recommend that people moving to AWS limit their use and instead adopt their practices to use the S3 – thereby also embracing more of the promise of the AWS stack as a whole.