
Rackspace Cloud Servers: what happens when the host fails

When developing applications for the cloud, everybody knows (or should know) that a host, network or disk failure (in short, any resource failure) is not an exceptional event but a common one. A resource failure becomes a 'normal' part of the application lifecycle, like the occasional bug.

The Amazon approach is that some services (like databases) come with a certain degree of resiliency built in, while others (e.g. EC2 instances) are expected to fail relatively frequently, leaving it to the developer to put backup, redundancy and availability countermeasures in place.

My understanding is that other providers, like Rackspace, take a more traditional approach instead and will automatically restart failed virtual servers in case of a host failure. If the failed cloud server's image cannot be recovered, it will be bootstrapped from the most recent backup. This means that, depending on the requirements, one could move a traditional application to the cloud without having to worry too much about backups and high availability, as they are, to a certain degree, built into the offering.

Yesterday I had proof that Rackspace does, in fact, do exactly what it says on the tin:

I keep a small cloud server instance running on Rackspace for testing purposes. I have set up daily backups because they come (almost) free with the package and they're dead easy to set up: just a couple of clicks on the web control panel (a scripted alternative is sketched at the end of this post).
Yesterday Rackspace sent me an email saying that the host running my instance 'became unresponsive' (that was the exact wording) and was rebooted. Half an hour later I received another email stating that the host had suffered a hardware failure and that my cloud server had been shut down and was in the process of being moved to a new host.
Three and a half hours later another email informed me that my server was up and running again, which I could confirm by logging in. The cloud server kept its IP address, had luckily not lost any data, and had simply been moved to a new host.
Note that I was on the road coming back from holidays while all this happened, and I did not have to lift a finger, well, except for logging into the server at the end.
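
As an aside, the daily backup schedule I mentioned above does not have to be set through the control panel. The minimal sketch below shows how it could be scripted against the first-generation Cloud Servers API's backup_schedule resource; the endpoint paths, header names and payload fields are assumptions recalled from the v1.0 API of the time (and the credentials and server id are placeholders), so check the current documentation before relying on it.

import requests

AUTH_URL = "https://auth.api.rackspacecloud.com/v1.0"
USERNAME = "myuser"           # placeholder account name
API_KEY = "0123456789abcdef"  # placeholder API key
SERVER_ID = "12345"           # placeholder cloud server id

# Authenticate: the auth endpoint is assumed to return the management URL
# and an auth token in the response headers (as the v1.0 API did).
auth = requests.get(AUTH_URL, headers={
    "X-Auth-User": USERNAME,
    "X-Auth-Key": API_KEY,
})
auth.raise_for_status()
token = auth.headers["X-Auth-Token"]
management_url = auth.headers["X-Server-Management-Url"]

# Enable a daily backup window and a weekly backup day for the server.
schedule = {
    "backupSchedule": {
        "enabled": True,
        "daily": "H_0000_0200",   # assumed identifier for a 00:00-02:00 window
        "weekly": "SUNDAY",
    }
}
resp = requests.post(
    f"{management_url}/servers/{SERVER_ID}/backup_schedule",
    json=schedule,
    headers={"X-Auth-Token": token},
)
resp.raise_for_status()
print("Backup schedule updated:", resp.status_code)

The same resource could presumably be queried with a GET to confirm the schedule took effect, which is a handy sanity check if you manage more than a couple of servers.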
