We were sitting on an aging set of servers, with limited resources for hosting our own software, and we were growing faster than our capacity to provision new hardware.
We hate screwing around with switches, routers, cables. And Q9 is loud, and cold. And sometimes really hot.
And Q9 was costing us thousands a month, a number we wanted to shrink, not grow with new requisitions.
Buy new hardware to meet our growing demand, or find some other way to deliver our stuff to the world.
Use cloud computing.
Either rent some virtual infrastructure somewhere or build on top of a platform.
We had recently built a new component for Agility called UGC which is basically a set of APIs invoked from our customers' websites. We hosted this service at Q9 for a few months, but after watching our servers get pinned at 100% on Fridays at 4pm when traffic started ramping, we got serious about cloud.
We started an experiment.
We put UGC on Azure in beta and used it internally. It performed beautifully, and deployments were SO EASY. We were forced to make decisions in the application architecture that made us do things in much smarter ways: everything is load balanced; everything is just an API, which is forwards and backwards compatible with all versions; everything that can be cached, is.
We did run into a couple of hitches in early 2011.
9:30am, on a company ski trip, error messages start coming in on my phone. I call Azure support and the signal is bad (I am on a ski hill north of Ajax). Nobody can figure out what's going on. The deployment is down all day until I get home and do a new deployment. Of course, I'm pissed. Where the hell was Microsoft Support on this?
The next day I get a phone call from an engineer with an explanation of the issue and a workaround. Turned out there was a bug with disk quotas: our IIS logs exceeded 4 GB or so and weren't getting cleaned up automatically like they should have been. So, we had some downtime, but we learned a lesson.
We decided to go with Azure: Windows Azure Compute for the VMs, SQL Azure for databases, Blob Storage for persistent files.
Rework the rest of our multi-tenanted systems and website hosting environment to work on Azure.
Do it in three months - because by this time our lease was up for renewal at Q9.
Actually, we started about a year earlier - not long after we got UGC running on Azure - prototyping the website and application hosting logic that would let us take the Azure VMs created as part of a standard deployment and put whatever IIS 7 apps on them we wanted.
We started with a beta environment in March that was part of our SXSW marketing push. Folks could come onto our website and create their own managed site, with hosting, UGC, managed content, everything, by filling out their name and email address. 3 fields, submit, website created.
Holy smokes. This is powerful stuff!
Hundreds of people created websites in the first few months.
We had to get this thing finished up and move everything else over to the cloud before our servers ran out of gas.
First, we had to finish all the application hosting work and do a proper beta. We also needed a backup and recovery plan. Our SQL Server dbs were all being backed up using a combination of partial and full backups based on transaction logs. In a way, SQL Azure negates that need - backups are needed more for application failure now, not hardware failure - so we built a set of backup and restore routines based on BCP, which works a treat.
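BCP-based routines of that sort boil down to scripting the `bcp` command line, once per table, in each direction. Here's a minimal sketch of what that scripting might look like; the database, table, server, and credential names are placeholders, not our actual setup (SQL Azure of this era expected the `user@servername` login form):

```python
import subprocess

def bcp_export_cmd(db, table, out_file, server, user, password):
    """Build a BCP command that bulk-exports one table to a native-format file."""
    return [
        "bcp", f"{db}.dbo.{table}", "out", out_file,
        "-n",                                  # native format: fast and type-safe
        "-S", server,
        "-U", f"{user}@{server.split('.')[0]}",  # SQL Azure user@servername form
        "-P", password,
    ]

def bcp_import_cmd(db, table, in_file, server, user, password):
    """Build the matching BCP command to bulk-load that file back in."""
    return [
        "bcp", f"{db}.dbo.{table}", "in", in_file,
        "-n",
        "-S", server,
        "-U", f"{user}@{server.split('.')[0]}",
        "-P", password,
        "-E",                                  # preserve identity values on restore
    ]

if __name__ == "__main__":
    cmd = bcp_export_cmd("AgilityDB", "Customers", "Customers.dat",
                         "myserver.database.windows.net", "admin", "secret")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment where bcp.exe is installed
```

Exporting in native format (`-n`) keeps the files compact and round-trippable, which matters when the "backup" is really an application-level restore path rather than a hardware one.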
Now, we have the ability to host all of our apps and API end-points in a customized Web Role deployment, which can be quickly scaled up or down as necessary. We monitor the apps and the deployment using an HTTP "heartbeat" call that tells us what's going on inside the VM.
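A heartbeat check like that can be sketched roughly as follows; the response fields, app names, and thresholds here are illustrative, not our actual API:

```python
import json
from urllib.request import urlopen

# Apps a healthy VM is expected to report; names are hypothetical.
REQUIRED_APPS = {"ugc-api", "content-api"}

def evaluate_heartbeat(payload):
    """Decide whether a VM looks healthy from its parsed heartbeat payload.

    Expected shape (illustrative):
      {"apps": {"ugc-api": "ok", "content-api": "ok"}, "disk_free_mb": 1200}
    """
    apps = payload.get("apps", {})
    missing = REQUIRED_APPS - set(apps)
    failing = {name for name, status in apps.items() if status != "ok"}
    low_disk = payload.get("disk_free_mb", 0) < 500   # arbitrary threshold
    healthy = not missing and not failing and not low_disk
    return healthy, {"missing": missing, "failing": failing, "low_disk": low_disk}

def poll(url):
    """Fetch and evaluate one heartbeat; in practice this runs on a timer."""
    with urlopen(url, timeout=10) as resp:
        return evaluate_heartbeat(json.loads(resp.read()))

if __name__ == "__main__":
    sample = {"apps": {"ugc-api": "ok", "content-api": "ok"}, "disk_free_mb": 1200}
    print(evaluate_heartbeat(sample))
```

The useful property of a heartbeat endpoint is that it reports from *inside* the VM - per-app status, disk headroom - rather than just proving the load balancer can reach port 80.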
One by one, we started to move over our customer data to SQL Azure, and to migrate their hosted websites to the new platform. Database migration was done using a simple BCP command line, and it allowed me to log and monitor the transfers in fine detail. The website migration was a tad trickier, as we were moving off IIS 6 and onto IIS 7.5.
Interestingly enough, because Amazon S3 was made available so early, we’ve been using it for several years. We are only just now engineering a change to allow our customers to store static website files in Blob storage, delivered via a custom CDN endpoint (more on that later). It's cool to have a system designed to live in separate places, all accessed via APIs. This is the way of the world now.
SQL Azure as a database service has been a bit of a challenge, to say the least. We didn't have to make a ton of changes to our schemas in order to put them in Azure; our multi-tenant system was already fairly well-tuned, but service performance issues in SQL Azure have lately led us to do a slew of performance engineering and tweaking that I would have liked to save for another time. It's always better to do that kind of stuff when you aren't under pressure, but we couldn't leave our customers hanging, so we've been pressing forward with quite a few changes to help make things better.
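One coping pattern that became near-universal with SQL Azure in this period (not necessarily what we did, but representative) is wrapping each query in a retry with exponential backoff, since the service could drop connections during throttling or failover. A minimal sketch, with placeholder exception types and delays:

```python
import random
import time

def with_retries(fn, attempts=4, base_delay=0.5, retryable=(ConnectionError,)):
    """Call fn(), retrying transient failures with exponential backoff plus jitter.

    `retryable` would normally list the driver's transient error types;
    ConnectionError stands in for them here.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise          # out of attempts: surface the real error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)  # back off before the next attempt
```

Usage is just `with_retries(lambda: run_query(...))`; the jitter keeps a fleet of Web Role instances from hammering the database in lockstep after a shared hiccup.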
We have a ton of ideas here, and some stuff is obvious. Things like geo-located services (beyond CDN) and global fail-over using Azure Traffic Manager are no-brainers, and an area where Azure has a great emerging feature set. Other stuff is going to be based on customer feedback, and feeding the needs of our growth. Stuff like easier deployments of customer code, easier and faster account sign-ups, online self-service and subscription payments, continued customer support.
I write "tentative" here because, as always, the Agility system is a thing in motion, one that will always progress and move forward. Windows Azure is the same way - we're onboard and moving with it.
The cloud, as I see it, is our chance to implement technology in a way that just works. It's got to be simple, it’s got to be easy, and it's got to be reliable. With our latest move, we're getting closer to where we want to be.