Disaster Recovery 2.0 procedure
This topic describes the different steps in the disaster recovery process. For more information, including the roles and responsibilities for deployment, invocation, and failback tasks, see Disaster Recovery 2.0 roles and responsibilities.
Sitecore Cloud Operations creates your DR environment in the Secondary Azure region in advance, at your request. For example, you could request that this is done shortly after the Production environment is deployed or at a later time. The environment will initially match your Production environment, but will then be scaled down to lower tiers to save costs. The environment is scaled up again once a disaster recovery process is initiated.
You are responsible for requesting, or giving permission for, a disaster recovery process to start. Once you have done so, the recovery process proceeds through the following steps:
- Switch to the secondary Azure region (DR region)
- The DR environment is scaled up
- Failover to the DR region completed
- Go live from the DR region
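The sequence above can be pictured as a small ordered state machine. The following is only an illustrative sketch, not Sitecore tooling: the `DrStage` and `next_stage` names are hypothetical, and the approval check stands in for your request or permission to start the process.

```python
from enum import Enum


class DrStage(Enum):
    """Stages of the recovery process, in the order listed above (hypothetical names)."""
    SWITCH_TO_DR_REGION = 1
    SCALE_UP_DR_ENVIRONMENT = 2
    FAILOVER_COMPLETED = 3
    GO_LIVE_FROM_DR_REGION = 4


def next_stage(current, customer_approved):
    """Return the stage that follows `current`, or None when the process is done.

    `current` is None before the process has started. Nothing proceeds
    without the customer's request or permission.
    """
    if not customer_approved:
        raise PermissionError("DR process requires customer request or permission")
    stages = list(DrStage)  # Enum members iterate in definition order
    if current is None:
        return stages[0]
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None
```

Calling `next_stage(None, True)` returns the first stage, and calling it on the last stage returns `None`, which signals that the environment is live in the DR region.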
Scaling Azure services
The following Azure services will be scaled up to match the production SKUs and tiers as part of the DR failover process. Review the following articles for a complete list of Azure SKUs included in the default Production and DR topologies and tiers:
Azure Application Gateway
All Application Gateway instances will be scaled up to match their corresponding Production SKUs and tiers.
Azure App services
All Azure App services will be scaled up to match their corresponding Production SKUs and tiers.
Azure SQL database
All Azure SQL elastic pools will be scaled up to match their corresponding Production SKUs and tiers.
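Application Gateway, App Service, and SQL elastic pools all follow the same rule: each scaled-down DR resource is raised to the SKU of its Production counterpart. A minimal sketch of that comparison is shown below. The function name, resource names, and SKU labels are placeholders for illustration and do not reflect the actual Managed Cloud topology.

```python
def plan_scale_up(production_skus, dr_skus):
    """Return {resource: (current_dr_sku, target_sku)} for every DR resource
    whose SKU does not yet match its Production counterpart."""
    plan = {}
    for resource, target in production_skus.items():
        current = dr_skus.get(resource)  # None if the resource is missing in DR
        if current != target:
            plan[resource] = (current, target)
    return plan


# Placeholder names and tiers, for illustration only.
production = {
    "app-gateway": "tier-large",
    "app-service-cd": "tier-large",
    "sql-elastic-pool": "tier-large",
}
dr = {
    "app-gateway": "tier-small",
    "app-service-cd": "tier-large",
    "sql-elastic-pool": "tier-small",
}

scale_up_plan = plan_scale_up(production, dr)
```

With these placeholder values, the plan lists only `app-gateway` and `sql-elastic-pool`, because `app-service-cd` already matches its Production tier.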
Azure Redis cache
Azure Redis requires two failover scaling steps:
- Basic C0 is scaled to Standard C0.
- Standard C0 is scaled to Premium P1.
Redis Cache scaling activities take approximately 30 minutes. During this time, access to the Azure Redis cache might be intermittent. The scale-up activity depends entirely on Microsoft Azure, so Sitecore cannot influence or guarantee the time needed to scale the Redis Cache instance.
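The two-step Redis path can be expressed as an ordered list of SKUs, from which the remaining scaling operations are derived. This is a sketch only: the tier names come from the steps above, while `REDIS_FAILOVER_PATH` and `redis_scaling_steps` are hypothetical names that do not correspond to any Azure or Sitecore API.

```python
# Tier order taken from the two failover scaling steps described above.
REDIS_FAILOVER_PATH = ["Basic C0", "Standard C0", "Premium P1"]


def redis_scaling_steps(current_sku, path=REDIS_FAILOVER_PATH):
    """Return the (from_sku, to_sku) operations still needed from `current_sku`."""
    if current_sku not in path:
        raise ValueError(f"unknown Redis SKU: {current_sku!r}")
    start = path.index(current_sku)
    # Pair each tier with the one that follows it on the path.
    return list(zip(path[start:], path[start + 1:]))
```

Starting from `"Basic C0"`, the function returns both steps in order. Starting from `"Premium P1"`, it returns an empty list because no further scaling is needed.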
Search services (Solr)
Managed Cloud includes the provision of Solr search services as part of the initial Production deployment. Where DR is included, Sitecore also provisions a dedicated DR Solr search cluster. When the DR process is initiated, the DR search cluster is attached to the failover environment. To meet these business objectives, Sitecore takes a backup once every 3 hours, including all Solr configuration files, Solr collections, aliases, security files, and custom JAR files. After a backup from the Production cluster completes successfully, the restoration process is initiated in the DR environment.
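The 3-hour backup cadence implies that the search data available in the DR cluster can lag behind Production. The sketch below estimates which scheduled backup would be the most recent one at a given failure time. It assumes backups start on a fixed 3-hour schedule from a known first backup, and it ignores how long the backup and restore themselves take, so the real lag may be larger. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=3)  # cadence stated above


def latest_backup_before(failure_time, first_backup_time, interval=BACKUP_INTERVAL):
    """Return the start time of the most recent scheduled backup at or before
    `failure_time`, assuming a fixed schedule that began at `first_backup_time`."""
    if failure_time < first_backup_time:
        raise ValueError("no backup exists before the failure time")
    # timedelta // timedelta yields the whole number of elapsed intervals.
    completed_intervals = (failure_time - first_backup_time) // interval
    return first_backup_time + completed_intervals * interval


# Example with made-up timestamps.
first_backup = datetime(2024, 1, 1, 0, 0)
failure = datetime(2024, 1, 1, 7, 30)
last_backup = latest_backup_before(failure, first_backup)
staleness = failure - last_backup
```

In this example the most recent scheduled backup started at 06:00, so the restored Solr data would be at least 1 hour 30 minutes older than Production at the moment of failure. Under these assumptions the lag from the schedule alone stays below 3 hours.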
Post-failover activities
Following a successful failover from the Primary environment, you are advised to avoid changing or updating the Web App folder contents and structure. To preserve the integrity of the source location (the initial Primary location), Sitecore turns off App Service synchronization. SQL Database synchronization is maintained as long as the Primary location is available.
During the failover of the Primary environment, you can expect a momentary disruption in service. Active user sessions on the source location, including active session data stored in the primary Redis cache instance, will be dropped. Users will need to re-authenticate before new sessions are established in the secondary location.
Failback
Failing back to the Production environment is only possible when all services in the previously impacted region are back online. Sitecore will initiate the failback process at your request. The failback process will not synchronize App Service updates and changes to the primary environment.
During the failback to the primary environment, there will be a momentary disruption in service. Active user sessions in the source location, including active session data stored in the Redis Cache instance, will be dropped. Once the primary location is back online, users who were logged in will need to re-authenticate before a new session is opened in the primary environment.
Customer-custom Azure subscription
The Customer-custom Azure subscription is a blank Azure subscription that you can use as the location for any non-Sitecore components that might be needed after DR failover.
When you receive the subscription, it will only contain an empty Azure resource group. Sitecore will not deploy any Azure services within this subscription. You own the deployment of resources within the Customer-custom Production subscription and the corresponding Customer-custom DR subscription.