Disaster Recovery for PaaS

Current version: 9.3

Disaster recovery allows an organization to maintain or quickly resume mission-critical functions following a disaster that requires manual intervention. The Sitecore Disaster Recovery (DR) service offers two service types: DR Basic (failover starts after customer confirmation) and DR Managed (failover starts automatically). Each is paired with an infrastructure type: Cold Standby (minimum infrastructure) or Hot Standby (full replica):

  • DR Basic Cold Standby: a reactive service. The recovery and failover process starts after the event and requires confirmation from the customer. This is a cost-effective option with a longer Recovery Time Objective (RTO). It provisions the minimum infrastructure required and is often used for non-critical applications or in situations where data changes infrequently.

    Basic Cold Standby Disaster Recovery includes Geo-Replication. In the event of a disaster, failover to the secondary region and database happens with minimal downtime.

  • DR Managed Hot Standby: a proactive service. The failover and failback processes start automatically. This option provisions an entire replica of the primary site and provides the shortest RTO.

The recovery option that works best with your environment depends on whether you require failover initiation to be proactive or reactive and how quickly you want to be back online when an outage occurs.

Note

For more detailed information on Sitecore's disaster recovery processes, please review the Disaster Recovery Policy document.

Recovery option considerations

To decide which recovery option matches your requirements, use the following table as a reference and consider:

  • How quickly your site needs to be back online in the event of an outage.

  • The recovery point objective (RPO).

  • The recovery time objective (RTO).

| Specifications | DR Basic cold standby | DR Managed hot standby |
| --- | --- | --- |
| Backup technologies | SQL Azure Geo-Replication; Azure APIs | SQL Azure Geo-Replication; Azure APIs |
| Recovery process | 1. Customer request/approval for failover; 2. Deploy; 3. Restore; 4. Switch over; 5. Go live; 6. Customer validation | 1. Switch over; 2. Go live; 3. Customer validation |
| Secondary environment state | Created on demand | Exact replica of the primary environment. Sitecore Azure Web Apps fully deployed and stopped |
| Recovery Point Objective (RPO) | SQL: 5 seconds; WebApp: 12 hours | SQL: 5 seconds; WebApp: 12 hours |
| Recovery Time Objective (RTO) - Technology only | 4 hours | < 1 minute |
| Failback Time - Technology only | 4 hours | 10 minutes |

Note

The technology RTO values depend on how long the system takes to restore the Sitecore platform. If manual steps are required involving the customer or partner, this may extend the effective RTO.
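As a rough illustration of how the technology RTO figures and the note about manual steps combine, the following Python sketch filters the two options against a target RTO. The class and function names (`DrOption`, `effective_rto`, `options_meeting`) are hypothetical and not part of any Sitecore tooling; the durations are copied from the specifications table, with "< 1 minute" modelled as an upper bound of one minute.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrOption:
    name: str
    technology_rto: timedelta      # upper bound taken from the specifications table
    failback_time: timedelta
    needs_customer_approval: bool  # Basic starts only after customer confirmation


# Assumption: "< 1 minute" is treated as an upper bound of exactly one minute.
OPTIONS = [
    DrOption("DR Basic cold standby", timedelta(hours=4), timedelta(hours=4), True),
    DrOption("DR Managed hot standby", timedelta(minutes=1), timedelta(minutes=10), False),
]


def effective_rto(option: DrOption, manual_steps: timedelta = timedelta(0)) -> timedelta:
    """Technology RTO plus any time spent on manual customer or partner steps."""
    return option.technology_rto + manual_steps


def options_meeting(target_rto: timedelta, manual_steps: timedelta = timedelta(0)) -> list[str]:
    """Names of the options whose effective RTO fits within the target."""
    return [o.name for o in OPTIONS if effective_rto(o, manual_steps) <= target_rto]
```

For example, `options_meeting(timedelta(minutes=30))` leaves only the hot standby option, and adding 30 minutes of manual steps pushes the cold standby option past a 4-hour target.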

Replication between regions

A typical Sitecore environment comprises five Azure resource types: App Services, Azure SQL, Application Insights, Azure Search/SOLR, and Redis Cache. Sitecore ensures that the sizes and instance counts of all these resources are replicated to a secondary data center. However, only App Services and Azure SQL have their files and data backed up and restored. The other services do not have their data replicated because it is either transient or not required for a successful restoration. Specifically:

  • App Services – All files/data are backed up and restored.

  • Azure SQL – All files/data are backed up and restored.

  • Redis Cache – Data is not replicated. Redis Cache contains user session data that typically expires before a Sitecore site can be restored, so it is not included in the disaster recovery strategy.

  • Application Insights – Data is not replicated because Application Insights only contains health monitoring data, which is not required to run the Sitecore site.
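The replication behaviour described above can be summarised as a small lookup table. This is only an illustrative sketch: the `REPLICATION` mapping and the helper function are hypothetical names, and the entry for Azure Search/SOLR reflects the general statement that only App Services and Azure SQL have their data restored, since no specific reason is listed for it.

```python
# Resource type -> (data backed up and restored?, reason given in the text)
REPLICATION = {
    "App Services": (True, "all files/data are backed up and restored"),
    "Azure SQL": (True, "all files/data are backed up and restored"),
    "Redis Cache": (False, "user session data typically expires before the site is restored"),
    "Application Insights": (False, "health monitoring data is not required at runtime"),
    # Assumption: no specific reason is listed for this resource type.
    "Azure Search/SOLR": (False, "transient or not required for restoration"),
}


def resources_with_data_restored() -> list[str]:
    """Resource types whose files/data are carried over to the secondary region."""
    return sorted(name for name, (restored, _) in REPLICATION.items() if restored)
```

In every case the sizes and instance counts of the resources are still replicated; the flag only describes whether the data itself is restored.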
