Disaster recovery in the cloud
What you need to know about protecting your data
Max Lee | April 8, 2015
As an engineer in the Bay Area, I seldom come across businesses that aren’t in the cloud, whether it’s a hybrid or all-in solution. In just a few years, “aaS” (IaaS, PaaS, SaaS) has become the norm—and for good reason. Cloud providers have liberated us from the hassle of colocation and the tedium of scheduled tape backups. They provide solutions to the once-difficult problems of scalability and monitoring with a few lines of code and mouse clicks. And they’ve helped transform high availability, durability, and fault tolerance from aspirations into assumed givens.
With overall reliability easily implemented or inherent in many cloud services, the coveted “five nines”—or 99.999% uptime over a year—is more attainable now than ever before.
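To put "five nines" in concrete terms, here's a quick back-of-the-envelope calculation (the arithmetic is mine, not from any provider's SLA) of how little downtime each availability tier actually permits:

```python
# Allowed downtime per year at a given availability level.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignoring leap years)

def allowed_downtime_seconds(availability: float) -> float:
    """Seconds of downtime per year permitted at the given availability."""
    return SECONDS_PER_YEAR * (1 - availability)

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    minutes = allowed_downtime_seconds(availability) / 60
    print(f"{label}: {minutes:.1f} minutes of downtime per year")
```

Five nines works out to roughly five minutes of total downtime per year, which is why it remains coveted rather than commonplace.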
Durability through redundancy
Cloud storage is often relied upon as the final destination for critical data, such as database backups, and should be an integral component of any disaster-resilient cloud infrastructure. All the well-known cloud providers (Amazon Web Services, Azure, Rackspace) offer durable storage solutions: every file or object uploaded is replicated to multiple locations, either within one data center or across several. The durability achieved through this level of redundancy is staggering.
For example, Amazon S3 is designed for 99.999999999% durability over the course of a year. In other words, if you store 10,000 objects with S3, the expected average loss is one object every ten million years—a number that should be attractive to even the most risk-averse businesses.
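The math behind that claim is simple expected-value arithmetic, sketched below using S3's published design target (the function names are mine):

```python
# S3 is designed for "eleven nines" of durability per object per year.
ANNUAL_LOSS_RATE = 1 - 0.99999999999  # about 1e-11

def expected_losses_per_year(num_objects: int) -> float:
    """Expected number of objects lost per year across the collection."""
    return num_objects * ANNUAL_LOSS_RATE

def years_per_single_loss(num_objects: int) -> float:
    """Average number of years between losing any single object."""
    return 1 / expected_losses_per_year(num_objects)

print(years_per_single_loss(10_000))  # on the order of ten million years
```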
Keep your eggs in multiple baskets
Traditional disaster recovery practices, which trade longer recovery times for cost-effectiveness, typically involve a cold, warm, or hot standby site. These standbys are complete replicas of the primary site, with varying degrees of capacity and recovery time in the event of a failover. The most resilient configuration, however, active-active clustering, involves no standby site at all. In this setup, two identical sites run simultaneously in geographically dispersed data centers, with a load balancer directing traffic between them. If one data center suffers an outage, failover to the other is nearly instantaneous. Although technically attractive, this approach traditionally came at considerable cost, at least until cloud-based high availability arrived.
Cloud providers simplify redundant deployments across multiple servers or multiple geographic locations, and the cost is typically limited to the standard price per instance. Active-active, or even active-active-active, configurations are not only possible but fairly simple to implement, and they offer the added benefit of improved performance.
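As a toy illustration (the class names and structure are mine, not any provider's API), an active-active setup boils down to a load balancer that spreads requests across all healthy sites and drops a site the moment its health check fails:

```python
import itertools

class Site:
    """A deployment in one data center, with a simulated health status."""
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

class ActiveActiveBalancer:
    """Round-robin requests across all currently healthy sites."""
    def __init__(self, sites):
        self.sites = sites
        self._ring = itertools.cycle(sites)

    def route(self) -> str:
        if not any(s.healthy for s in self.sites):
            raise RuntimeError("no healthy sites")
        # Advance the ring, skipping any site that failed its health check.
        while True:
            site = next(self._ring)
            if site.healthy:
                return site.name

us_west = Site("us-west")
us_east = Site("us-east")
lb = ActiveActiveBalancer([us_west, us_east])

print([lb.route() for _ in range(4)])  # traffic alternates between sites

us_west.healthy = False  # simulate a data-center outage
print([lb.route() for _ in range(2)])  # all traffic now flows to us-east
```

Both sites serve traffic in normal operation, which is where the performance benefit comes from; the "failover" is simply the unhealthy site dropping out of the rotation.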
Are two clouds better than one?
History has shown that cloud providers do, if only rarely, experience widespread outages. To mitigate this risk, some businesses are opting to distribute their infrastructure across more than one cloud. The hybrid cloud approach involves maintaining certain components and services on premises, while leveraging public cloud services for redundancy or increased capacity.
Another emerging strategy, dubbed multi-cloud, is a more generalized extension of this approach: it uses multiple cloud providers to safeguard against the technical (e.g. outages) and operational (e.g. vendor lock-in) risks of relying on a single provider. The flexibility these approaches provide comes at a cost, however. Greater investment must be made in personnel with the technical skills to build and maintain these disparate systems, and compliance and security must be managed across two or more systems instead of just one.
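One common way to tame that complexity is to hide each provider behind a shared interface so that critical writes, like backups, fan out to every cloud. The sketch below is hypothetical and uses in-memory stand-ins rather than real provider SDKs:

```python
from typing import Protocol

class ObjectStore(Protocol):
    """Minimal interface each provider backend must implement."""
    def put(self, key: str, data: bytes) -> None: ...

class InMemoryStore:
    """Stand-in for a real provider SDK (S3, Azure Blob Storage, etc.)."""
    def __init__(self, provider: str):
        self.provider = provider
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

class MultiCloudStore:
    """Writes every object to all configured providers."""
    def __init__(self, stores: list[ObjectStore]):
        self.stores = stores

    def put(self, key: str, data: bytes) -> None:
        for store in self.stores:
            store.put(key, data)

aws = InMemoryStore("aws")
azure = InMemoryStore("azure")
backup = MultiCloudStore([aws, azure])
backup.put("db-backup.sql.gz", b"backup contents")

print(all("db-backup.sql.gz" in s.objects for s in (aws, azure)))
```

The design choice here is the narrow `ObjectStore` interface: by committing only to a lowest-common-denominator API, each provider backend stays swappable, which is precisely the defense against vendor lock-in.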
Weathering a disaster without missing a beat
Why build an infrastructure capable of mere disaster recovery when you can build one that will weather any disaster without ever missing a beat?
Happen to be on AWS when an earthquake knocks out an entire availability zone (AZ), or even an entire region? No problem, and no need for standby servers or tape restores: Elastic Load Balancing and Route 53 will automatically route all traffic to unaffected AZs and regions.
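The failover logic behind a health-checked DNS policy like Route 53's can be pictured in a few lines (a simplified sketch of my own; real Route 53 failover is configured through record sets and health checks, not application code):

```python
def resolve(primary_healthy: bool, primary: str, secondary: str) -> str:
    """Failover routing: answer DNS queries with the primary endpoint
    while it passes health checks, otherwise answer with the secondary."""
    return primary if primary_healthy else secondary

# Normal operation: queries resolve to the primary region.
print(resolve(True, "app.us-west.example.com", "app.us-east.example.com"))

# A regional outage fails the health check; answers flip automatically,
# so clients reroute without anyone restoring from a standby.
print(resolve(False, "app.us-west.example.com", "app.us-east.example.com"))
```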
What happens when a fire wipes out an entire data center hosting your data on Azure? Again, no worries, provided you enabled geo-redundant storage (GRS): all your data is safe and sound, in triplicate, hundreds of miles away.
So go “all in” on high availability in the cloud and say good night to traditional DR practices.
Max Lee is a full-stack engineer and certified AWS solutions architect based out of San Francisco.