Rackspace Cloud Servers: what happens when the host fails

When developing applications for the cloud, everybody knows (or should know) that the failure of a host, network or disk (in short, of any resource) is not an exceptional event but a rather common one. It becomes a normal part of the application lifecycle, much like the occasional bug.

Amazon's approach is that some services (like databases) come with a certain degree of built-in resiliency, while others (e.g. EC2 instances) are expected to fail relatively often, and it is left to the developer to put backup, redundancy and availability measures in place.

My understanding is that other providers, like Rackspace, take a more traditional approach instead and automatically restart virtual servers that fail because of a host failure. If the failed cloud server image cannot be recovered, the server is rebuilt from the most recent backup. This means that, depending on the requirements, one could move a traditional application to the cloud without worrying too much about backups and high availability, as these are, to a certain degree, built into the offering.

Yesterday I had proof that Rackspace does, in fact, do exactly what it says on the tin:

I keep a small cloud server instance running on Rackspace for testing purposes. I have set up daily backups because they come (almost) free with the package and they're dead easy to set up: just a couple of clicks in the web control panel.
Yesterday Rackspace sent me an email saying that the machine hosting my instance had "become unresponsive" (that was the exact wording) and had been rebooted. Half an hour later I received another email stating that the host had suffered a hardware failure and that my cloud server had been shut down and was in the process of being moved to a new host.
Three and a half hours later another email informed me that my server was up and running again, which I confirmed by logging in. The cloud server kept its IP address and, luckily, did not lose any data; it had simply been moved to a new host.
Note that I was on the road, coming back from holiday, while all this happened, and I did not have to lift a finger, except to log into the server at the end.
