Cloud Servers – cheap, fast & reliable?


One of the things the cloud has taught us over the past few years is to expect outages. Lots of them. Many of those outages have been pretty big! This is because in the world of public cloud (a.k.a. Infrastructure as a Service), running servers in reliable, physical data centres has been replaced with a new concept: running cloud servers in Availability Zones. Which is fine, so long as you understand the implications and can configure your website or application to operate optimally in this type of environment.

Don’t get me wrong, cloud servers have a lot going for them. It’s just that while they solve many problems, they also introduce a few biggies of their own, and you really should be across the pros and cons before jumping on the bandwagon. Cloud servers aren’t a silver bullet when you’re looking to host a website, application or other workload – read on to understand why.

Why an Availability Zone (AZ) is not the same as a data centre

Cloud providers are guilty of pushing the myth that an Availability Zone is the same as a data centre. It’s not. A data centre is ultra-reliable; an Availability Zone is not.

An Availability Zone is the term given to a collection of physical infrastructure managed by the “Cloud Service Orchestration” (CSO) software that underpins your cloud service provider. CloudStack, OpenStack, Eucalyptus and VMware vCloud, along with the proprietary platforms behind Azure and Amazon EC2, are some of the best-known examples.

The CSO platform itself is a complex piece of software that runs the whole shebang. It interfaces with and manages the hypervisors running on the underlying physical servers, the storage platforms and physical networking to provide you with a virtual environment where you can spin up metered cloud servers and configure virtual networking, firewalls and load-balancing services on demand. Which is all very impressive, but it does introduce a large amount of complexity, several single points of failure and plenty of opportunity for it all to go wrong. Very wrong.

In what other form of hosting could something as mundane as a single expired SSL certificate bring down an entire hosting provider and hundreds of thousands of customers, worldwide? And that’s just the tip of the mega outage iceberg!

The primary reason you’d use multiple data centres is for assurance that your data is safe and that your applications can continue to run even in the event of a major incident (such as a terrorist attack, earthquake or other natural disaster) taking an entire data centre offline. Multiple, redundant upstream network providers, feeds into multiple power grids and backup diesel generators have meant that data centres tend to run for decades with nary a blip. Suffice it to say, chances are that you are pretty safe running your workloads from a single data centre. In fact, a modern Tier III data centre typically guarantees at least 99.98% availability.
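To put those percentages in perspective, a quick back-of-the-envelope calculation converts an availability figure into the downtime it actually permits per year (the 99.98% figure comes from the paragraph above; 99.9% is included for comparison):

```python
# Convert an availability percentage into permitted downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year allowed by a given availability %."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.98):
    print(f"{pct}% availability -> {annual_downtime_hours(pct):.2f} h/year of downtime")
# 99.9% works out to roughly 8.8 hours a year; 99.98% to under 2 hours.
```

In other words, a single good data centre already leaves very little downtime on the table.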

Conversely, when it comes to a cloud Availability Zone (AZ) the chances of an outage impacting every single customer in that Zone are exceptionally high. In fact, an outage is so likely that the majority of cloud providers simply won’t honour the SLA or provide an uptime guarantee at all – unless you’ve first architected your application to run in a 2-tier or 3-tier configuration, deployed servers across multiple Availability Zones and configured load balancers to failover in the likely event of an entire AZ going down.

For clarity: unless you architect a load-balanced solution across multiple Availability Zones, your public cloud provider is most likely going to offer you an uptime guarantee of 0.00%.

This is quite clearly monumentally time-consuming to set up, expensive, complex and difficult to manage and, depending on your application, may not even work out for you. What’s more, many applications, through either design or licensing restrictions, simply won’t scale horizontally and therefore won’t play well in the cloud at all.

If you’re worried about availability, a single, well-architected VPS will provide similar (and probably better) uptime than two or three cloud servers load balanced across multiple AZs. It would seem cloud servers really aren’t as inexpensive and reliable as the cloud vendors’ marketing machines would have you believe.

Why are Availability Zones unreliable?

While powerful, the CSO software represents a major single point of failure in cloud architectures. It oversees and connects every server running in the AZ, manages their VLANs and controls internet access. And if HA is active (a feature of some CSOs that will automatically power up your cloud server on a new physical host should the current host die – also known as automatic VM migration), some form of shared storage platform is usually required – which represents yet another single point of failure.

When the CSO software has a hiccup (it happens from time to time) the results can be mixed. Your server may keep running but lose internet connectivity. Perhaps you’ll lose the ability to start and stop servers. Perhaps your server will just restart. Perhaps it will shut down and not restart. Perhaps your root disk will go missing. Perhaps an entire server will go missing. Sometimes permanently!

I’ve seen all of these conditions first hand, many of them affecting not just a handful of servers but every server across the entire Availability Zone. This has also led to another phenomenon: cloud providers often won’t reveal the actual locations of their AZs. Why wouldn’t they, you ask? Because they’ll often run multiple AZs from within the same data centre. On paper this gives you, the customer, the opportunity to deploy your servers across multiple zones to guard against the failure of a single zone – but if those zones share a building, a power feed or a network core, that protection is largely illusory.

The responsibility for uptime is passed from the provider to the client and you may now be paying for twice as many servers as you really need – yet still without any geographic redundancy!

Architecting for the Cloud

Given the poor reliability of AZs, the only way to build a reliable server infrastructure in the cloud with the backing of a 99.9% SLA is to architect your application to run load balanced across multiple AZs. The industry is shifting responsibility away from infrastructure and operations teams and onto software developers, who need to write applications that subscribe to the “design for failure” mantra. That is, to expect that the entire cloud infrastructure (or perhaps just one or more tiers of the underlying technology stack) is going to break in some way, quite frequently, at any time.
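The arithmetic behind multi-AZ redundancy is easy to sketch. If each AZ fails independently, a service that survives as long as at least one of its n replicas is up has availability 1 − (1 − a)^n. The 99% figure below is purely illustrative, not any provider’s real number – and the independence assumption is exactly what breaks when a CSO bug or shared storage failure takes out every zone at once:

```python
def parallel_availability(az_availability: float, n_zones: int) -> float:
    """Availability of a service that stays up while at least one of
    n independent replicas is up: 1 - (probability all are down)."""
    return 1 - (1 - az_availability) ** n_zones

# A hypothetical AZ that is only 99% available on its own:
for n in (1, 2, 3):
    print(f"{n} zone(s): {parallel_availability(0.99, n):.6%}")
# Two independent 99% zones would, in theory, give 99.99%.
```

The formula flatters the cloud: correlated failures (the article’s whole point) make the real number much worse than the theory suggests.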

To guard against this risk you’ll need to run at least two cloud servers (but probably four) and do some load balancing work to separate your server’s single-tiered application stack into a lean, mean, multi-zone, n-tier configuration. And this is just for a simple application with little traffic!

Which brings me to another misleading headline that gets perpetuated in the industry: that by moving to the cloud you’ll save money. To get uptime equivalent to a single VPS in a good data centre, you’re now paying for multiple servers, inter-zone traffic, a load balancer, a DevOps engineer and your software developer’s time to get all this working. The fact that you’ll be leaving it running 24×7 doesn’t leverage the whole “pay by the hour” thing very effectively either, IMHO.
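A rough cost sketch shows why “pay by the hour” loses its appeal for an always-on workload. Every figure below is a hypothetical placeholder, not any vendor’s actual rate:

```python
HOURS_PER_MONTH = 730  # roughly 24x7 across an average month

def monthly_cloud_cost(servers: int, hourly_rate: float,
                       load_balancer: float = 0.0, traffic: float = 0.0) -> float:
    """Monthly bill for an always-on, multi-AZ cloud setup."""
    return servers * hourly_rate * HOURS_PER_MONTH + load_balancer + traffic

# Hypothetical: four $0.10/h cloud servers plus a load balancer and
# inter-zone traffic, versus one VPS at a flat (made-up) monthly price.
cloud = monthly_cloud_cost(servers=4, hourly_rate=0.10, load_balancer=20, traffic=15)
vps = 60.0
print(f"multi-AZ cloud: ${cloud:.2f}/month vs single VPS: ${vps:.2f}/month")
```

Hourly billing only pays off when the servers are actually turned off for a meaningful part of the month.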

Congratulations, your application now qualifies for the 99.9% SLA but it took some time to set up, costs a lot, is complex to manage and could very well still be in a single geographic location. Was it worth it?

Own your base and rent your peaks

The fact is, if you’re looking for a hosting solution that is reliable, fast and cost-effective, then a VPS or dedicated server is likely to be the best option for many businesses. Hybrid cloud hosting architectures leverage the best of both infrastructure types. You run your core workloads on Linux VPS or dedicated servers, leveraging solid, consistent performance, enterprise-grade reliability and monthly billing. You then use the public cloud to horizontally scale your server tiers when the capacity of those servers is exhausted, or to cope with surges in traffic driven by seasonal demand or other factors, leveraging rapid provisioning, almost infinite scalability and hourly billing. This is exactly the sort of thing that the cloud’s hourly billing model and API-driven burstable characteristics are designed for.
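“Own your base and rent your peaks” reduces to a simple capacity decision: keep a fixed baseline on VPS or dedicated servers, and only rent cloud servers for whatever demand exceeds it. A minimal sketch, with all capacities and demand figures made up for illustration:

```python
import math

def burst_servers_needed(demand: int, base_capacity: int,
                         cloud_server_capacity: int) -> int:
    """Cloud servers to rent for the load your base can't absorb."""
    overflow = max(0, demand - base_capacity)
    return math.ceil(overflow / cloud_server_capacity)

# Baseline handles 1000 req/s; each rented cloud server adds 250 req/s.
for demand in (800, 1000, 1600):
    print(f"{demand} req/s -> rent {burst_servers_needed(demand, 1000, 250)} cloud server(s)")
```

Off-peak, the answer is zero and the hourly meter stops running – which is the whole appeal of the hybrid model.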

At Anchor, we have deep experience building hybrid cloud hosting solutions that offer the best of both worlds. Contact us with your requirements and we’ll design and build an infrastructure to support them.

Hosting on a VPS or dedicated server will almost certainly be a great place from which to get your website or app really moving. Why? Because you can keep it simple. A virtual server (running on well-built, redundant hardware) in a Tier III data centre will enjoy consistent performance and a realistic 99.9% uptime. Need more power? It’s easy to scale up: increase the RAM and CPU allocated to your VPS with only a reboot. This approach has proven very reliable and cost-effective for over a decade.

Deciding whether you should deploy your workloads on dedicated servers, VPS or cloud servers typically means weighing up the needs of SLA accountability, performance, uptime, speed of provisioning, variable hourly billing vs fixed monthly pricing, scalability, management complexity and software development & operations costs. There is a lot to consider!

Public cloud services are always improving and maturing, but they aren’t a silver bullet for every hosting problem. Long term, all application developers should be building their apps around n-tier architectures, because they’ll be more reliable, scalable and secure. It also means much easier adoption of the cloud once your business reaches the scale to need it.

Really, I love cloud servers

I do. In fact, it’s been fun taking the opposing view here, as I think the development of IaaS is one of the most exciting things to have happened to the technology industry in a long time. Cloud servers are flexible. They can be automated, created, cloned, backed up, moved, upgraded or destroyed in minutes and can be managed via a web console or API. Available in almost infinite quantities, you can spin up 2 (or 200!) servers in minutes, and hourly pricing allows you to scale your servers up or down (or turn them off) at any time – you’re only paying for the resources you use, never more.

While there can be no doubt that there are immense technical and business benefits to this type of infrastructure, as you can see, there are a few considerations to work through before deciding whether or not cloud servers make sense for your particular use case. Cloud servers are perfect for workloads that are seasonal or where there are significant fluctuations in load. They are great for test and dev environments as they can be spun up and torn down in no time. There are plenty of other use cases too. It’s just worth acknowledging that you need to spend quite a lot of money and time to realise the uptime and scalability benefits of public cloud for production workloads.

What do you think? Do you agree? Disagree? Please drop a comment in below. 🙂