DR Basic Cold Standby

Abstract

Learn about the processes involved in recovering your environment with the Basic DR service.

With DR Basic Cold Standby, the Sitecore Managed Cloud disaster recovery service sets a recovery process into action in the event of an outage. When there is an outage in the primary region, a new Sitecore production environment is created in a secondary data center. While the secondary environment is being created, a simple outage page is displayed to make customers aware that the site is temporarily down. Because the new environment must be created in a secondary data center, this recovery option has the longer recovery time objective (RTO) but is the less expensive option.

[Figure: HADR_Basic.PNG]

The setup steps are:

  • Provision the Control Resource Group and the relevant underlying resources and services that monitor DR states.

  • Set up replication for SQL Server.

  • Provision backup automation, which backs up the web apps and synchronizes the SQL Server databases in the failover group.

  • Set up the outage page.

  • Set up Traffic Manager to redirect traffic and switch between the primary CD and the outage page.

  • Set up email alerts to notify the Managed Cloud operations team when availability tests fail.

Sitecore Managed Cloud continuously checks the health of the primary region environment. If three out of five data centers report an issue, the Managed Cloud operations team investigates the Sitecore environment in the primary data center to determine whether the issue is legitimate or a false positive. The operations team performs the following validation checks in the primary data center:

  • Check for alerts raised by the Azure Resources used by the Sitecore site.

  • Check if the Traffic Manager is reporting a degraded endpoint.

  • Check the Azure Status site for known data center issues.
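The three-out-of-five rule that triggers an investigation can be illustrated with a minimal Python sketch. It is not Sitecore's monitoring code: the function name, the location labels, and the reading of "three out of five" as "at least three" are all assumptions made for illustration.

```python
from typing import Mapping

# From the text: an investigation starts when three of five data centers report an issue.
# Treating this as "at least three" is an assumption.
FAILURE_QUORUM = 3


def should_investigate(failed_by_location: Mapping[str, bool],
                       quorum: int = FAILURE_QUORUM) -> bool:
    """Return True when at least `quorum` locations report a failed health check."""
    failures = sum(1 for failed in failed_by_location.values() if failed)
    return failures >= quorum


# Hypothetical reports: True means the location reported an issue.
reports = {
    "location-1": True,
    "location-2": True,
    "location-3": False,
    "location-4": True,
    "location-5": False,
}

if should_investigate(reports):
    print("Quorum reached: operations team starts the validation checks")
```

Requiring a quorum rather than reacting to a single failed report reduces the chance that one unreachable test location starts an investigation.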

If the Managed Cloud operations team determines that there is an unrecoverable issue in part or all of the underlying infrastructure in the primary data center, the failover confirmation process begins and the customer is contacted.

When the customer confirms, Sitecore triggers the recovery procedure using the following steps:

  1. Deploy a new Sitecore environment in the secondary region.

  2. Restore the web apps from the last backup.

  3. Switch over the geo-replicated SQL Server.

  4. Update the connection strings with the credentials from the primary instance.

  5. Re-index content and xDB indexes.

  6. Switch the Traffic Manager to the secondary Sitecore instance.
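The recovery procedure is an ordered sequence gated on customer confirmation, which can be sketched as a simple runbook in Python. This is only an illustration under stated assumptions: `run_failover`, `placeholder`, and `RECOVERY_STEPS` are hypothetical names, and the actions are empty stand-ins rather than real Sitecore or Azure calls.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def placeholder() -> None:
    """Hypothetical stand-in for the real automation behind a step."""


# Step names follow the list above; the actions are placeholders.
RECOVERY_STEPS: List[Step] = [
    ("Deploy a new Sitecore environment in the secondary region", placeholder),
    ("Restore the web apps from the last backup", placeholder),
    ("Switch over the geo-replicated SQL Server", placeholder),
    ("Update the connection strings", placeholder),
    ("Re-index content and xDB indexes", placeholder),
    ("Switch Traffic Manager to the secondary instance", placeholder),
]


def run_failover(steps: List[Step], customer_confirmed: bool) -> List[str]:
    """Run the steps in order and return the names of the completed steps."""
    if not customer_confirmed:
        raise RuntimeError("Failover requires customer confirmation")
    completed: List[str] = []
    for name, action in steps:
        action()  # an exception here stops the run, so later steps do not execute
        completed.append(name)
    return completed


for done in run_failover(RECOVERY_STEPS, customer_confirmed=True):
    print("completed:", done)
```

Traffic Manager is switched last in this sketch, mirroring the list above, so that traffic reaches the secondary instance only after the environment is restored and re-indexed.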

After Sitecore has finished the failover process and the cause of the disaster has been fixed, the customer and the Managed Cloud operations team agree on a time to return to the primary region environment. After a failback, the primary environment is resumed from its state before the failure.