Improve Application Resilience with Distributed Cloud Services

Redundancy is the linchpin of resilience. Organizations commonly implement redundant IT infrastructure to ensure the availability of applications in the event of hardware failure or network outage. The same principles apply to cloud applications. Although public cloud services are highly reliable, major cloud providers have had recent service interruptions lasting several hours or more, such as the Amazon Web Services (AWS) outage in December 2021. In that event, several web service providers that run their applications in the AWS cloud experienced service interruptions.

A multi-hour outage could be devastating to an organization that cannot afford any downtime, and compensation from the service provider is limited. Cloud providers’ service-level agreements (SLAs) typically apply to their services, not their customers’ applications. Distributing cloud applications across multiple virtual machines (VMs), availability zones or regions can reduce the risk of downtime in the event of a service provider outage. There are multiple ways to design a distributed cloud service, with varying costs and levels of protection.

Location, Location, Location

A distributed cloud service is simply an architecture in which a workload runs across multiple cloud environments or locations to meet availability, performance or compliance requirements. The workload could be distributed within the same service provider’s environment, to another cloud provider’s infrastructure, or to a private cloud maintained on-premises or in a co-location provider’s facility.

For resilience purposes, it isn’t necessary to use multiple providers or a private cloud. Organizations can improve resilience significantly by distributing workloads across regions or availability zones within the same service provider’s infrastructure. A “region” is a geographic area in which the service provider operates data centers, while an “availability zone” is a physically isolated location within a region. For example, AWS operates 26 geographic regions with two to five availability zones per region.
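
To make the idea concrete, here is a minimal sketch of spreading application replicas evenly across availability zones and checking how much capacity survives if one zone fails. The zone names and function names are hypothetical placeholders, not any provider's API; a real deployment would rely on the provider's own placement and load-balancing features.

```python
from collections import Counter
from itertools import cycle


def place_replicas(replica_count, zones):
    """Assign replicas to zones round-robin so the load is spread evenly."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(replica_count)]


def surviving_fraction(placement, failed_zone):
    """Return the fraction of replicas still running if one zone goes down."""
    survivors = sum(1 for zone in placement if zone != failed_zone)
    return survivors / len(placement)


# Hypothetical zone names, for illustration only.
placement = place_replicas(6, ["zone-a", "zone-b", "zone-c"])
print(Counter(placement))
print(surviving_fraction(placement, "zone-a"))
```

With six replicas spread over three zones, losing any single zone still leaves two-thirds of the replicas running, whereas a single-zone deployment would lose all of them.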

A distributed workload costs more but creates an important hedge against downtime. Although cloud providers are responsible for the availability of services they manage, such as load balancers, customers are responsible for ensuring the resilience of applications hosted in an Infrastructure-as-a-Service environment. Furthermore, any compensation for a cloud outage reflects only the cost of the failed service, not the business impact of application downtime.

Cost-Benefit Analysis

A 2022 Uptime Institute study measured both the cost and resiliency benefits of various distributed architectures. Distributing workloads across VMs within the same availability zone in an active-active configuration protects only against VM outages, yet costs 43 percent more than a single VM instance. Distributing workloads across multiple availability zones costs the same as the single-zone configuration, yet increases availability from 99.5 percent to 99.9 percent by also protecting against availability zone outages.
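
The difference between 99.5 percent and 99.9 percent is easier to appreciate as hours of downtime. The short sketch below converts an availability percentage into expected downtime per year, assuming a 365-day year; the function name is our own illustration and is not taken from the study.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(availability_percent):
    """Convert an availability percentage into expected downtime hours per year."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for pct in (99.5, 99.9):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% availability: about {hours:.2f} hours of downtime per year")
```

By this arithmetic, 99.5 percent availability allows roughly 43.8 hours of downtime per year, while 99.9 percent allows roughly 8.76 hours.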

Organizations can gain nearly 100 percent availability by distributing workloads across regions. Regional outages are exceedingly rare.

In a “pilot light” failover model, data in the failover region is live but the services remain idle until needed. It could take 15 minutes or more to start the services, but this configuration costs only about 20 percent more than distribution across availability zones.

In warm standby mode, services are always running at reduced capacity and can handle some traffic immediately. This configuration costs about 90 percent more than distribution across availability zones. A dual-region, active-active configuration costs more than twice as much as distribution across availability zones but provides instantaneous recovery from an outage.
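
To compare these options on one scale, the sketch below expresses each configuration as a multiple of a single VM instance. It rests on two assumptions: that distribution across availability zones costs 1.43 times a single VM (that is, the same as the single-zone active-active configuration), and that the regional premiums apply on top of that figure. Because the dual-region active-active cost is given only as "more than twice," its result is a lower bound. The names are illustrative and are not taken from the study.

```python
# Relative cost, with a single VM instance as the 1.0 baseline.
SINGLE_VM = 1.00
MULTI_AZ = SINGLE_VM * 1.43  # assumed: 43 percent more than a single VM

# Premiums over the multi-availability-zone configuration.
PREMIUM_OVER_MULTI_AZ = {
    "pilot light": 0.20,                 # about 20 percent more
    "warm standby": 0.90,                # about 90 percent more
    "dual-region active-active": 1.00,   # "more than twice": lower bound only
}


def relative_cost(premium):
    """Return cost as a multiple of a single VM, given a premium over multi-AZ."""
    return MULTI_AZ * (1 + premium)


print(f"multi-AZ: about {MULTI_AZ:.2f}x a single VM")
for name, premium in PREMIUM_OVER_MULTI_AZ.items():
    print(f"{name}: roughly {relative_cost(premium):.2f}x a single VM")
```

Under these assumptions, pilot light works out to roughly 1.7 times the cost of a single VM, warm standby to roughly 2.7 times, and dual-region active-active to at least 2.9 times.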

How DeSeMa Can Help

Few organizations have the expertise to weigh the cost of distributed cloud services against the risk of downtime. Fewer still have the skill sets needed to architect a distributed cloud service that meets business requirements cost-effectively.

The DeSeMa team helped build the cloud, including major public cloud services such as AWS, Azure and Google Cloud Platform. We can help you determine the best way to distribute your cloud workloads and handle the design and implementation from end to end.