Managed Cloud Premium

High availability

High availability (HA) is the ability of a system or system component to remain continuously operational for a desirably long period of time. In Managed Cloud, HA is achieved by introducing redundancy, and the availability of the underlying infrastructure is guaranteed by Azure as the provider. Within the Managed Cloud containers configuration, we target 99.99% availability for our infrastructure components. High availability is provided within the same location (region) only and does not cover the failure of an entire region. Region-wide failures must be addressed by the Disaster Recovery scenario.
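To put an availability target such as 99.99% in perspective, it can be converted into a downtime budget. The following is a minimal sketch in Python. The function name and the 30-day period are illustrative assumptions and are not part of any Azure SLA definition.

```python
def downtime_budget_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Targets mentioned in this article; the 30-day window is an assumption.
    for target in (99.5, 99.9, 99.95, 99.99):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per 30 days")
```

Under these assumptions, 99.99% leaves a budget of about 4.3 minutes per 30 days, while 99.9% leaves about 43 minutes.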

High availability is implemented with the support of the service vendors' built-in features.

Azure Kubernetes Service (AKS) Uptime SLA

AKS does not provide an SLA with the default configuration because that configuration is free. However, Microsoft endeavours to provide 99.5% uptime with the default configuration.

The Uptime SLA guarantees 99.95% availability of the Kubernetes API server endpoint for clusters that use Availability Zones and 99.9% availability for clusters that don't use Availability Zones. AKS uses master node replicas across update and fault domains to ensure that the SLA requirements are met.

For more information, see Azure Kubernetes Service (AKS) with Uptime SLA.

AKS Workload Availability Zones

When availability zones are used, the Managed Cloud deployment model ensures that nodes in one availability zone are physically separated from nodes in another availability zone. AKS clusters deployed across multiple availability zones provide a higher level of availability and protect against hardware failures and planned maintenance events.

For more details, read Use availability zones in Azure Kubernetes Service (AKS).

Known limitations:

You can only define availability zones when the cluster or node pool is created.
Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones.
The chosen node size (VM SKU) must be available in all selected availability zones.
Clusters with availability zones enabled require Azure Standard Load Balancers for distribution across zones. This load balancer type can only be defined when the cluster is created. For more information and the limitations of the standard load balancer, see Azure load balancer standard SKU limitations.
Additional charges may apply for data transfer. See Pricing - Bandwidth | Microsoft Azure.
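The limitation on VM SKU availability can be checked before a cluster is created. The following Python sketch shows the idea as a simple set comparison. The SKU names and the zone data are hypothetical placeholders. In practice the data would come from the Azure tooling of your choice.

```python
def zones_missing_sku(sku_zones: dict[str, set[str]], sku: str, wanted_zones: set[str]) -> set[str]:
    """Return the requested zones in which the SKU is NOT offered.

    An empty result means the SKU can be used in every requested zone.
    """
    offered = sku_zones.get(sku, set())
    return wanted_zones - offered


# Hypothetical availability data for one location (not real Azure data).
EXAMPLE_SKU_ZONES = {
    "Example_SKU_A": {"1", "2", "3"},
    "Example_SKU_B": {"1"},
}

if __name__ == "__main__":
    for sku in EXAMPLE_SKU_ZONES:
        missing = zones_missing_sku(EXAMPLE_SKU_ZONES, sku, {"1", "2"})
        status = "ok" if not missing else f"not offered in zones {sorted(missing)}"
        print(f"{sku}: {status}")
```

Because zone settings cannot be changed after creation, running a check like this up front avoids having to recreate the cluster or node pool.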

SQL Elastic Pool

Azure SQL Database is a fully managed relational database with built-in regional high availability.

The Azure SQL Managed Instance has an availability guarantee of at least 99.99%. This applies to both the Business Critical and General Purpose tiers.

Azure SQL Database has three service tiers:

General Purpose/Standard: for common workloads
Business Critical/Premium: for high-throughput OLTP applications requiring low latency and high resilience
Hyperscale: for very large OLTP systems, with auto-scaling of storage and compute

The guarantees for Azure SQL Database are as follows:

1. Azure SQL Database Business Critical or Premium tiers configured as Zone Redundant Deployments have an availability guarantee of at least 99.995%.
2. Azure SQL Database Business Critical or Premium tiers not configured as Zone Redundant Deployments have an availability guarantee of at least 99.99%.
3. Azure SQL Database General Purpose, Standard, or Basic tiers, or the Hyperscale tier with two or more replicas, have an availability guarantee of at least 99.99%.
4. Azure SQL Database Hyperscale tier has an availability guarantee of at least 99.95% with one replica and at least 99.9% with zero replicas.
5. Azure SQL Database Business Critical tier configured with geo-replication has a guaranteed recovery point objective (RPO) of 5 seconds for 100% of deployed hours.
6. Azure SQL Database Business Critical tier configured with geo-replication has a guaranteed recovery time objective (RTO) of 30 seconds for 100% of deployed hours.

SearchStax

High availability is built in. The uptime depends on the selected tier and ranges from 99.5% (Gold) to 99.95% (Platinum Plus).

For details, see Managed Solr Pricing and Features.

Front Door

Azure guarantees that at least 99.99% of the time Azure Front Door Service will respond to client requests and deliver the requested content without error.

Azure Container Registry (ACR)

Azure guarantees that at least 99.9% of the time the Managed Registry will successfully process Registry Transactions. The SLA for the Classic Registry is provided through Azure Storage.

If you are using a public image, consider importing it into your own container registry so that its availability aligns with your SLO. Otherwise, the image might not be available when you need it, which can cause operational issues.

See the SLA for Container Registry from Azure.

Decisions

The following high availability decisions apply to the Production configuration.

Table 1. High availability solutions and potential target SLA

Resource	Solution	Potential target SLA	Comments
AKS	Enable the Uptime SLA feature by default for production deployments	99.95% when paired with Availability Zones enabled for the workload	Review the list of supported locations.
Windows Node Pool	Configure 2 availability zones; use a single node pool (remove the 2nd node pool); configure the scale set with at least 2 nodes	99.99% with configured Availability Zones	Review the list of public locations where availability zones are supported.
Linux Node Pool	Configure 2 availability zones; configure the scale set with at least 2 nodes	99.99% with configured Availability Zones	Review the list of public locations where availability zones are supported.
SQL Elastic Pool	General Purpose tier	99.99%
SearchStax	Platinum tier	99.9%
Front Door	High availability provided by default	99.99%
ACR	Premium tier; geo-replication to paired locations; enable zone redundancy	99.9%	Pull all images (Sitecore and third-party) locally at provisioning time.
Storage Account	LRS replication	99.999999999% (11 nines) durability
Kubernetes Workload	Roles that can be scaled horizontally should be configured with at least 2 pods
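The targets in Table 1 apply to individual resources. As a rough illustration of how they combine, the Python sketch below multiplies the availabilities of components that all must be up (a serial chain) and shows how redundant replicas raise availability. It assumes that every listed component is required for each request and that failures are independent. Neither assumption is stated in the SLAs themselves, and the AKS figure covers the Kubernetes API server endpoint rather than the workload. The function names are illustrative.

```python
from math import prod


def serial_availability(availabilities) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(single: float, replicas: int) -> float:
    """Availability of N independent replicas where one healthy replica is enough."""
    return 1 - (1 - single) ** replicas


# Target SLAs taken from Table 1, expressed as fractions.
TABLE_TARGETS = {
    "AKS": 0.9995,
    "SQL Elastic Pool": 0.9999,
    "SearchStax": 0.999,
    "Front Door": 0.9999,
    "ACR": 0.999,
}

if __name__ == "__main__":
    composite = serial_availability(TABLE_TARGETS.values())
    print(f"Composite availability of the chain: {composite:.4%}")
    # A hypothetical pod that is up 99% of the time, run as 2 replicas.
    print(f"Two replicas at 99% each: {redundant_availability(0.99, 2):.4%}")
```

Under these assumptions, the chain comes to roughly 99.73%, which is lower than the target of any single component. This is one reason why roles that can be scaled horizontally should run with at least 2 pods.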
