Prior to Beanstalk, if you wanted to create an 'elastic' web application that scaled automatically depending on its resource needs, you would need to manually provision an elastic IP, a load balancer (maybe), and build some logic into your web application to know when to spin up new machine images to handle the load. (If I've lost you so far, you might as well move on to some lighter reading).
Beanstalk is profound in that it creates this automatically managed environment for you, and lets you simply upload your web application to the "cloud." Beanstalk monitors the health and load of your application, and along with some features you can tweak, will know when to spin up new machine images to handle the load.
If this sounds cool to you, it's because it is! Read more here: http://aws.amazon.com/elasticbeanstalk/
As of the writing of this blog, Elastic Beanstalk only supports Java web applications, and they deploy on Amazon's own flavor of Linux in a Tomcat 6.x container. Good news! This is exactly the environment I prefer. Those of you who like to deploy on Windows are out of luck for now, but I hear Amazon is working on Windows server images for Beanstalk. You might even be able to roll your own after looking at one of their Linux images.
Ok, so if this is so cool, why would I need to make my own customized machine image (AMI)? Well, unless a vanilla Linux server environment has everything your web application needs, you may need some extras. For example, you may want to have some custom packages installed on the server, like ImageMagick or server-side log or security monitoring. These things are easy to configure on your own dedicated servers, but in Beanstalk land, every machine image that boots is vanilla unless you customize it. This article exists because it took me a while to figure all this stuff out, so now my hard work helps you out.
For my latest project, I needed some custom software and services to run on my Linux machine images, and you simply can't install software packages, compile code, start/stop services from a Java application (not easily anyway). So that leaves me with one option: roll my own image and use it to run my Beanstalk app.
The first thing you might consider is to just launch your Beanstalk app, SSH into the server and customize it on the fly, then bundle the disk image. Nice concept, but no go. If you customize your disk image while it is running in a Beanstalk environment, bundling it simply doesn't work and you get a non-bootable machine image.
No, in fact the way to do this is to spin up an EC2 instance manually, using one of Amazon's Beanstalk AMIs. I used a 64-bit image, ami-100fff79. Launching in EC2 as opposed to Beanstalk means that just the core services get launched, which makes it easy for us to customize, and most importantly, means that bundling it into your own custom AMI actually works.
Go through the normal steps of launching an EC2 instance, create a keypair (newbies read http://chris-richardson.blog-city.com/amazon_ec2_keypairs_and_other_stumbling_blocks_1.htm), and SSH into your instance.
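If you prefer the command line over the AWS console, the launch-and-connect steps look roughly like this with the EC2 API tools. This is a sketch: the keypair name, instance type, and hostname are placeholders, and it assumes the tools are installed with EC2_PRIVATE_KEY and EC2_CERT already exported.

```shell
# Launch the Beanstalk AMI as a plain EC2 instance (keypair/type are yours to choose)
ec2-run-instances ami-100fff79 -k my-keypair -t m1.large

# Once the instance shows as "running", SSH in using the keypair's private key
# (substitute your instance's public DNS name)
ssh -i my-keypair.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```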
Now, one of the things I needed on my custom machine image is a common location to mount an Amazon S3 bucket. Why? Because my web application needs temporary storage, and there may be multiple machine images running at the same time, so they all need access to the same temporary storage. For that there is a nifty Linux tool called s3fs, which lets you mount your Amazon S3 bucket as part of your file system.
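Once s3fs is built and installed, mounting a bucket is just a couple of commands. A sketch, with a placeholder bucket name and mount point, assuming your AWS keys are in /etc/passwd-s3fs (the default credentials file s3fs looks for):

```shell
# Create a mount point shared by the web application
mkdir -p /mnt/shared-temp

# Mount the S3 bucket; allow_other lets non-root processes (like Tomcat) use it
s3fs my-temp-bucket /mnt/shared-temp -o allow_other
```

Every instance that runs this against the same bucket sees the same files, which is exactly what you want for shared temporary storage.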
The problem I ran into was that Amazon's Linux machine image doesn't have much of the stuff I need to compile custom code, so I was forced to install a metric crap-ton of packages just to get basic compilation done, and even then I had issues with dependencies on certain versions of packages that Amazon doesn't make available and you have to compile yourself. Lots of work. Well, lots of work UNLESS you find wonderful people who work hard and post their instructions for everyone's benefit (kind of why I am doing this, ya know?). One such honorable person is Matthew Stump. He figured out all the packages needed to compile s3fs and packaged them up into RPMs that install nicely on Amazon's Linux AMIs. Check them out here: http://eclecticengineer.blogspot.com/2011/03/amazon-ami-linux-rpms-for-s3fs.html.
Quick note: When you install your own packages, you may need to turn off yum's gpg checking. To do that, edit /etc/yum.conf, and set gpgcheck to 0.
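That edit can also be done in one line if you'd rather not open an editor. A small sketch (the sed pattern assumes yum.conf currently has gpgcheck=1):

```shell
# Flip gpgcheck from 1 to 0 in yum.conf, keeping a backup of the original
sed -i.bak 's/^gpgcheck=1/gpgcheck=0/' /etc/yum.conf

# Verify the change took effect
grep '^gpgcheck' /etc/yum.conf
```

Remember to flip it back to 1 when you're done installing your custom packages.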
Ok back to the main topic:
Now that you have installed and configured all your custom packages, and you are happy with your environment, you need to save it to your very own AMI to use with Beanstalk. I found out at this point that this is not so easy. Yes, yes, the nifty tools that are available for Eclipse are easy for bundling Windows images, but they simply don't work in this case for the Linux image I want. You have to do it the old-fashioned way, with the command line.
In order to save your own AMI, you need an Amazon S3 bucket. Be smart, create a separate bucket for the disk image so you don't accidentally overwrite or delete part of it later. Also, make sure your bucket name is all letters and numbers, no underscores. For some reason underscores caused me grief when using Amazon's API tools and caused "v2 compatibility" bucket warnings. Safer just to use no underscores.
Update on 3/19/2011 - I found out that the newest Amazon AMIs already have the API tools installed, so this saves you some time. They are all located in
You will also need to have your X.509 cert. When you sign up for Amazon Web Services, they give you an access key id and a secret id, but then you can create an X.509 cert. Do it -- you will need the private key (pk) and certificate (cert) to create your own AMIs. Once you have those files, you will need them on your running EC2 instance. I installed ftp (yum install ftp) so I could connect out to another account and grab my certs. Once I had them, I put them in the /mnt directory on my running EC2 instance. Get them on the server any way you prefer.
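If you'd rather not set up ftp, scp works fine for getting the certs onto the instance, since you already have the keypair from launching it. A sketch, with placeholder file and host names:

```shell
# Copy the X.509 private key and certificate to /mnt on the running instance,
# authenticating with the same keypair used to launch it
scp -i my-keypair.pem pk-XXXX.pem cert-XXXX.pem \
    ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:/mnt/
```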
Now comes the magic, here is the consolidation of all my effort: the command line entries that actually do the work.
Notes: The reason I use "nohup" is so that this runs in the background, so even if I log out, or the ssh connection gets dropped (happened to me many times), it will keep running until it is done. It generates a text file in your user directory called "nohup.out" that has all the output of the process. The pk-XXXX.pem and cert-XXXX.pem are the X.509 certs you got from the AWS security credentials page and that you put on the server. The ######## at the end is your Amazon account ID. If you log into your AWS account, it is displayed with dashes, like this: ####-####-#### (where #'s are numbers of course). The number you use in this command is the same number, just without the dashes. I know it's confusing.
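Putting those notes together, the bundling step follows the standard ec2-bundle-vol form. This is a sketch matching the notes above: the pem file names and the account ID are placeholders, and -d /mnt tells the tool to write the image parts to /mnt.

```shell
# Bundle the running volume into image.xxx part files under /mnt.
# nohup + & keeps it running in the background even if the SSH session drops;
# progress goes to nohup.out in the current directory.
nohup ec2-bundle-vol -d /mnt \
    -k /mnt/pk-XXXX.pem \
    -c /mnt/cert-XXXX.pem \
    -u ############ &
```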
This can take quite a while to complete in my experience if you are using one of Amazon's small EC2 instances with limited CPU. Much faster on a normal or larger instance.
You MUST wait until this process finishes before you move on. Check the nohup.out file for progress. Errors will be obvious. Successful completion will be obvious too, and will result in many image.xxx files in the /mnt directory.
2. Upload the image files to your S3 bucket:
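The upload uses ec2-upload-bundle with the manifest that the bundling step produced. A sketch, with placeholder bucket name and credentials (these are your access key ID and secret key, not the X.509 files):

```shell
# Upload all the image parts to S3, driven by the manifest from ec2-bundle-vol
ec2-upload-bundle -b my-ami-bucket \
    -m /mnt/image.manifest.xml \
    -a YOUR_ACCESS_KEY_ID \
    -s YOUR_SECRET_ACCESS_KEY
```

If the bucket doesn't exist yet, the tool will create it for you, which is another reason to stick to a bucket name with only letters and numbers.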