Thursday, February 19, 2009

Herd It on Amazon Web Services

I've spent the past month figuring out all the details and complications of hosting my Facebook music annotation game, Herd It, on Amazon Web Services. We figured out that one of the reasons the game didn't work well when more than 5 people connected was a lack of bandwidth on our server to serve the Flash components. Also, we want to be ready for when Herd It becomes the next Desktop Tower Defense, so AWS seemed like a great solution. It was an omen that, on the day that I was debating whether or not to bother to try to figure it all out, a senior manager from AWS gave a talk at UCSD...

Anyway, now that I've figured it all out, I think that AWS is great. However, there are a lot of hurdles to overcome in getting it to work, so hopefully this will benefit someone (maybe even me, when I forget what I did).

Step 1 - Register for an AWS account.
If you've ever bought anything from Amazon, this is as simple as adding AWS to your existing account. In particular, you will need 2 of their (many) web services:
EC2 (Elastic Compute Cloud) - this is the "cloud" of servers that does all the processing.
S3 (Simple Storage Service) - this is the storage bucket where you will keep all your data.

Step 2 - Figure out EC2.
EC2 works as follows:
You create an AMI - an "Amazon Machine Image" which is basically a complete copy of the OS, programs and data of the machine that you want to run in Amazon's cloud. Imagine you wanted to back up your computer's entire hard disk so that you could reconstruct the entire system - this is what you would need. You will replicate this image on one or more of Amazon's cloud machines.

I highly recommend following the AWS tutorial. It covers everything you'll need to know and doesn't have any distracting details. You will learn how to use images that Amazon have pre-made, how to set them up, how to change them, how to save them, and how to kill them.

Step 3 - Create your own AMI
For this, I started with the most current Ubuntu AMI at alestic.com. Some nice guy (Eric Hammond) has created a bunch of basic AMIs that have nothing more than a simple OS. From here, you will need to install all the programs that you're going to need. As someone who wasn't very familiar with Unix administration, this seemed daunting but it was surprisingly easy and I like Ubuntu a lot now. For example, this page shows you that, by typing 4 lines, you can get an Apache web server and PHP running (this is all I needed for Herd It). You will also need to copy all your data onto the AWS machine using FTP or scp (for example, I copied all the PHP and Flash files that make Herd It work).
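
As a rough sketch of what this setup might look like (not necessarily the four lines from the page mentioned above), the commands below install Apache and PHP and copy site files over. The package names (apache2, php5, libapache2-mod-php5), the root login, the /var/www/ web root, and the function names are all assumptions for a 2009-era Ubuntu image, so adjust them for yours. Setting DRY_RUN=1 prints each command instead of running it.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run this on the EC2 instance: install Apache and PHP, then restart Apache.
# Package names are assumptions for an Ubuntu release of that period.
install_web_stack() {
    run sudo apt-get update
    run sudo apt-get install -y apache2 php5 libapache2-mod-php5
    run sudo /etc/init.d/apache2 restart
}

# Run this on your local machine: copy a directory of site files to the instance.
# $1 = private key file, $2 = local directory, $3 = instance public DNS name
# The root login and /var/www/ destination are assumptions.
copy_site_files() {
    run scp -i "$1" -r "$2" "root@$3:/var/www/"
}
```

For example, `copy_site_files my-keypair.pem ./herdit ec2-a-bunch-of-numbers.compute.amazonaws.com` would copy a local herdit directory to the instance (the key file and directory names here are placeholders).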

Once you've got everything on the AMI running as you want it, you'll need to bundle your AMI and copy it into your S3 bucket. Again the AWS tutorial covers all of this.
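
For orientation, the bundling step usually comes down to three commands from Amazon's AMI and API tools: ec2-bundle-vol and ec2-upload-bundle run on the instance, and ec2-register runs wherever the API tools are installed. This is only a sketch: the function names, the /mnt destination, and every argument value are placeholders rather than values from this post, so check them against the tutorial. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bundle the running instance's disk into /mnt (destination is an assumption).
# $1 = private key file, $2 = certificate file, $3 = AWS account number
bundle_image() {
    run ec2-bundle-vol -d /mnt -k "$1" -c "$2" -u "$3"
}

# Upload the bundle to S3. The manifest name assumes the default "image" prefix.
# $1 = S3 bucket, $2 = access key ID, $3 = secret access key
upload_image() {
    run ec2-upload-bundle -b "$1" -m /mnt/image.manifest.xml -a "$2" -s "$3"
}

# Register the uploaded bundle as a new AMI.
# $1 = S3 bucket
register_image() {
    run ec2-register "$1/image.manifest.xml"
}
```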

Step 4 - Elastic DNS
This was the trickiest part... If you only want to run a single instance then you don't need to worry about this. But, the whole point of AWS is to let you create many instances to power your new web app that's going to take over the world. To achieve this, I found some help from this page but there was still a bit of work to do:

Step 4a - Register your domain.
There are a million places where you can get the domain "mykillerapp.com" or whatever.
I got www.herdit.org for $9/year from NameCheap.com
(I just discovered that, of course, there is already a site at "mykillerapp.com" and that it's a sweet applied maths quiz! For the rest of this tutorial, I'll just refer to my domain: herdit.org).

Step 4b - Set up a DNS forwarding service.
The domain name of all your new EC2 machines will be something like
http://ec2-a-bunch-of-numbers.compute.amazonaws.com/ and
http://ec2-more-different-numbers.compute.amazonaws.com/
In order for these to all map to your new domain name, herdit.org, you need to set up DNS forwarding. For this, you need a DNS service provider. Your domain name service may provide this but, whether or not it does, ZoneEdit is a free service that gets the job done. You will need to transfer the DNS from your domain name provider and set up ZoneEdit (or whatever DNS service you use) to handle your new domain (it may take a day or two for these changes to register).

Step 4c - Tell your AMIs to register with your DNS service
Once your DNS service is running, you want it to forward requests for your domain ("herdit.org") to the EC2 machines ("ec2-0112358132134.amazon.com", etc.). To do this, you need those EC2 machines to tell the DNS service that they are ready.

The DNS registration works using a program called "ez-ipupdate" that you can install on your (Ubuntu) AMI by typing:

sudo apt-get install ez-ipupdate

Now all you need to do is to get ez-ipupdate to run whenever you start a new instance so that you don't have to log in manually. The Spatten Design blog has a good post on how to do this using Ruby. However, since I don't use Ruby, I wrote an init.d script that will run when the instance starts. Copy the following and save it on your AMI as '/etc/init.d/update-dynamic-dns':

#!/bin/sh

### BEGIN INIT INFO
# Provides: update-dynamic-dns
# Required-Start: $local_fs $remote_fs $network
# Required-Stop: $local_fs $remote_fs $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Update dynamic DNS on startup
# Description: Uses ez-ipupdate to send the current Dynamic IP address
# to ZoneEdit Dynamic DNS provider
### END INIT INFO

# Author: Luke Barrington <lukeinusa@gmail.com>

DYNAMIC_DNS_CONFIG_FILE=/etc/ez-ipupdate/dynamic_dns.yml
AMAZON_INSTANCE_DATA_ADDRESS=http://169.254.169.254
API=latest

# Read the current instance's public IP address from the Amazon metadata service
IP=`curl -s "$AMAZON_INSTANCE_DATA_ADDRESS/$API/meta-data/public-ipv4"`
echo "Instance Dynamic IP Address = $IP"

# Pull each "key: value" setting out of the config file by stripping the key prefix
SERVICE=`sed -n -e "s/^service:[ ]*//p" "$DYNAMIC_DNS_CONFIG_FILE"`
USERNAME=`sed -n -e "s/^username:[ ]*//p" "$DYNAMIC_DNS_CONFIG_FILE"`
PASSWORD=`sed -n -e "s/^password:[ ]*//p" "$DYNAMIC_DNS_CONFIG_FILE"`
HOST=`sed -n -e "s/^host:[ ]*//p" "$DYNAMIC_DNS_CONFIG_FILE"`

case "$1" in
    start)
        echo "Using dynamic DNS service = $SERVICE"
        echo "Connecting with username = $USERNAME"
        echo "Mapping IP to host = $HOST"

        # ZoneEdit server name has changed since ez-ipupdate was last built
        if [ "$SERVICE" = 'zoneedit' ]; then
            ez-ipupdate --address "$IP" --service-type "$SERVICE" --server=dynamic.zoneedit.com --user "$USERNAME:$PASSWORD" --host "$HOST"
        else
            ez-ipupdate --address "$IP" --service-type "$SERVICE" --user "$USERNAME:$PASSWORD" --host "$HOST"
        fi
        ;;
    *)
        echo "Usage: update-dynamic-dns start"
        ;;
esac
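
One thing the script above does not do is check that the metadata request actually returned an address before passing it to ez-ipupdate. A minimal sketch of such a guard, assuming a hypothetical helper named is_ipv4 that is not part of the original script:

```shell
# Return 0 if $1 looks like a dotted-quad IPv4 address, 1 otherwise.
# is_ipv4 is a hypothetical helper, not something ez-ipupdate provides.
is_ipv4() {
    # Require exactly four dot-separated groups of 1-3 digits.
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1

    # Split on dots and check that every group is at most 255.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}
```

In the init script, this could be called as `is_ipv4 "$IP" || exit 1` just before the case statement, so that an empty or malformed response never reaches the DNS service.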


You will also need to create a file at '/etc/ez-ipupdate/dynamic_dns.yml' (or whatever you call it in the script above) that contains the following:


# service should be one of the services supported by ez-ipupdate.
# Possible values: null ezip pgpow dhs dyndns dyndns-static
# dyndns-custom ods tzo easydns easydns-partner
# gnudip justlinux dyns hn zoneedit heipv6tb
# (The above list is from man ez-ipupdate)
service: zoneedit
username: YOUR DNS SERVICE USERNAME
password: YOUR DNS SERVICE PASSWORD
host: YOUR HOST NAME (e.g., herdit.org)
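
Because this file holds your DNS password in plain text, it may be worth making it readable only by its owner. A small sketch, assuming a hypothetical helper write_dns_config (the name and argument order are illustrative and not part of ez-ipupdate):

```shell
# Write the dynamic DNS settings to a file that only its owner can read.
# $1 = target file, $2 = service, $3 = username, $4 = password, $5 = host
write_dns_config() {
    config_file=$1
    old_umask=$(umask)
    umask 077                      # new files get no group/other permissions
    cat > "$config_file" <<EOF
service: $2
username: $3
password: $4
host: $5
EOF
    umask "$old_umask"             # restore the caller's umask
    chmod 600 "$config_file"       # also tighten a file that already existed
}
```

Run as root, something like `write_dns_config /etc/ez-ipupdate/dynamic_dns.yml zoneedit myuser mypassword herdit.org` would produce a file in the format shown above (the username and password here are placeholders).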


Finally, make the script executable and run it by typing:

sudo chmod +x /etc/init.d/update-dynamic-dns
sudo /etc/init.d/update-dynamic-dns start

To register this init.d script to run automatically at startup, use this command:

sudo update-rc.d update-dynamic-dns defaults

Now, as soon as the EC2 machine boots (well, after a few minutes), it should register itself with your DNS service and tell it to send requests for "herdit.org" to its address (e.g., ec2-123456789.amazon.com). The cool thing about ZoneEdit (or any DNS service that has "round robin" DNS) is that, if multiple machines all register to the same host, the DNS service will send requests to each one in turn. This will spread your millions of users across all the instances that you run.
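
To see whether round-robin registration is working, you can list the A records that the DNS service returns for your host. The sketch below assumes dig is installed (on Ubuntu it comes from the dnsutils package); count_a_records is a hypothetical helper that counts the IPv4 addresses in dig's short output.

```shell
# Read `dig +short` style output on stdin and print how many lines
# are IPv4 addresses (CNAME lines and blanks are ignored).
count_a_records() {
    grep -Ec '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || true
}

# Example usage (needs network access and dig):
#   dig +short herdit.org A | count_a_records
```

With two instances registered you would hope to see 2 here, though it may take a while for the DNS changes to show up.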

At this stage, you will want to bundle up the AMI again. Now you are ready for Step 5...

Step 5 - Try and take over the world
Run hundreds of instances, pay thousands of dollars to Amazon, get millions of users, sell your site for billions of dollars.



Notes and next steps
Now that I've set all this up, we are testing Herd It to see how it can handle the load of many simultaneous users. Herd It uses a Java server to coordinate everything (via XML events) and saves all the info in a MySQL database. These are both still running on my local server. I plan to put them on AWS sometime as well and I expect that this tutorial will help with that.

There are apps out there that can monitor your site's traffic and automatically create or kill instances in response but, for the moment, I will be monitoring it manually.


Now, after all that work, why not go and play Herd It?!