Disaster Recovery On Demand

Organizations today rely largely on Disaster Recovery (DR) services to keep man-made or natural disasters from causing expensive service disruptions. Unfortunately, current DR services come either at very high cost or with weak guarantees about the amount of data lost and the time required to restart operation after a failure.

Virtualized cloud platforms are well suited to providing DR. Under normal operating conditions, a cloud-based DR service may need only a small share of resources to synchronize state from the primary site to the cloud. The full amount of resources required to run the application needs to be provisioned (and paid for) only if a disaster actually happens. The use of automated virtualization platforms for disaster recovery means that additional resources can be brought online rapidly once the disaster is detected. This can dramatically reduce the recovery time after a failure, which is a key component in enabling business continuity.

Key Requirements for Effective DR Service

Some requirements for an effective DR service may be based on business decisions, such as the monetary cost of system downtime or data loss, while others can be tied directly to application performance and accuracy.

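The level of data protection and the speed of recovery depend on the type of backup mechanism used and the nature of the resources available at the backup site. Two measures capture these properties: the Recovery Point Objective (RPO), the maximum acceptable window of data loss, and the Recovery Time Objective (RTO), the maximum acceptable time to restore service. In general, DR services fall under one of the following categories: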

Hot Backup Site: A hot backup site typically provides a set of mirrored standby servers that are always available to run the application once a disaster occurs, providing minimal RTO and RPO. Hot standbys typically use synchronous replication to prevent any data loss due to a disaster.

Warm Backup Site: A warm backup site may keep state up to date with either synchronous or asynchronous replication schemes, depending on the necessary RPO. Standby servers to run the application after failure are available, but are kept only in a “warm” state, in which it may take minutes to bring them online.

Cold Backup Site: In a cold backup site, data is often only replicated on a periodic basis, leading to an RPO of hours or days. In addition, servers to run the application after failure are not readily available, and there may be a delay of hours or days as hardware is brought out of storage or re-purposed from test and development systems, resulting in a high RTO. It can be difficult to support business continuity with cold backup sites, but they are a very low cost option for applications that do not require strong protection or availability guarantees.
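
The three categories above can be summarized as a small lookup structure. The following is a minimal sketch in Python; the class name, field names, and helper function are illustrative assumptions, and the qualitative values simply restate the descriptions given in the text rather than any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSite:
    """One DR category as described above (field names are illustrative)."""
    replication: tuple  # replication schemes the category typically uses
    servers: str        # readiness of the standby servers
    rto: str            # qualitative time to restart operation
    rpo: str            # qualitative window of data loss


# Qualitative values taken from the category descriptions in the text.
BACKUP_SITES = {
    "hot": BackupSite(("synchronous",),
                      "mirrored, always available", "minimal", "minimal"),
    "warm": BackupSite(("synchronous", "asynchronous"),
                       "available, minutes to bring online", "minutes",
                       "depends on replication scheme"),
    "cold": BackupSite(("periodic",),
                       "not readily available", "hours or days",
                       "hours or days"),
}


def sites_using(replication):
    """Return the category names that typically use the given scheme."""
    return [name for name, site in BACKUP_SITES.items()
            if replication in site.replication]


if __name__ == "__main__":
    for name, site in BACKUP_SITES.items():
        print(f"{name}: RTO {site.rto}, RPO {site.rpo}")
```

For instance, sites_using("synchronous") lists both the hot and warm categories, reflecting that a warm site may also replicate synchronously when the RPO requires it.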

Cloud-based Disaster Recovery (DR)

The on-demand nature of cloud computing means that it provides the greatest cost benefit when peak resource demands are much higher than average-case demands. As a result, cloud platforms offer the most benefit to DR services that require warm standby replicas. In this case, the cloud can be used to cheaply maintain the state of an application using low-cost resources under ordinary operating conditions.

Only after a disaster occurs does a cloud-based DR service pay for the more powerful, and more expensive, resources required to run the full application. These resources can be provisioned in a matter of seconds or minutes. In contrast, an enterprise using its own private resources for DR must always have servers available to meet the resource needs of the full disaster case, resulting in a much higher cost during normal operation.
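
The cost argument can be illustrated with a small calculation. The sketch below is a simplified model, not a pricing formula from any provider: it assumes the cloud-based service pays a low synchronization cost in normal operation and the full cost only for the fraction of time spent in disaster mode, while a private DR site pays the full cost all the time. The function names and all figures are made up for illustration.

```python
def cloud_dr_monthly_cost(sync_cost, full_cost, disaster_fraction):
    """Expected monthly cost when full resources are paid for only in a disaster.

    sync_cost:         monthly cost of the small share of resources that
                       synchronizes state during normal operation
    full_cost:         monthly cost of the resources that run the full application
    disaster_fraction: fraction of the month spent in disaster mode (0.0 to 1.0)
    """
    if not 0.0 <= disaster_fraction <= 1.0:
        raise ValueError("disaster_fraction must be between 0 and 1")
    normal_fraction = 1.0 - disaster_fraction
    return sync_cost * normal_fraction + full_cost * disaster_fraction


def private_dr_monthly_cost(full_cost):
    """A private DR site keeps full-capacity servers available at all times."""
    return full_cost


if __name__ == "__main__":
    # Made-up figures: 200 per month to synchronize, 2000 per month at full capacity,
    # and 1% of the time spent in disaster mode.
    cloud = cloud_dr_monthly_cost(sync_cost=200, full_cost=2000,
                                  disaster_fraction=0.01)
    private = private_dr_monthly_cost(full_cost=2000)
    print(f"cloud: {cloud:.2f}  private: {private:.2f}")
```

Under these assumed numbers the cloud option costs roughly a tenth of the private one, and the gap narrows as the share of time spent in disaster mode grows.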

Disaster Recovery as a (Cloud) Service

“Cloud-based DR moves the discussion from data center space and hardware to one about cloud capacity planning,” noted Lauren Whitehouse, senior analyst at Enterprise Strategy Group (ESG) in Milford, Massachusetts.

Identifying critical resources and recovery methods is the most important part of this planning, since an organization needs to ensure that all critical applications and data are included in the blueprint. With applications identified and prioritized, and RTOs defined, the organization can then determine the best and most cost-effective methods of achieving the RTOs (by application and service). A combination of cost and recovery objectives drives different levels of disaster recovery.
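
One way to picture this selection step is to pick, for each application, the cheapest DR level that still meets its RTO. The sketch below assumes three hypothetical levels with made-up achievable RTOs and relative costs; the application names and their RTOs are likewise invented for illustration.

```python
from datetime import timedelta

# Hypothetical DR levels: (name, achievable RTO, relative monthly cost).
TIER_OPTIONS = [
    ("cold", timedelta(days=2), 1),
    ("warm", timedelta(minutes=30), 5),
    ("hot", timedelta(seconds=30), 20),
]


def cheapest_tier(required_rto):
    """Return the lowest-cost level whose achievable RTO meets the requirement."""
    candidates = [tier for tier in TIER_OPTIONS if tier[1] <= required_rto]
    if not candidates:
        raise ValueError(f"no DR level can meet an RTO of {required_rto}")
    name, _rto, _cost = min(candidates, key=lambda tier: tier[2])
    return name


if __name__ == "__main__":
    # Invented applications with the RTO the business has defined for each.
    required_rtos = {
        "payments": timedelta(minutes=1),
        "email": timedelta(hours=1),
        "intranet wiki": timedelta(days=3),
    }
    for app, rto in required_rtos.items():
        print(f"{app}: {cheapest_tier(rto)}")
```

A fuller plan would also check each application's RPO against the replication scheme of the chosen level; that is omitted here to keep the sketch short.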