High availability
High availability (HA) is the ability of a system or system component to remain continuously operational for a desirably long period of time. In Managed Cloud, HA is achieved by introducing redundancy, and the availability of the underlying infrastructure is guaranteed by Azure as the provider. Within the Managed Cloud containers configuration, we target 99.99% availability for our infrastructure components. High availability is provided within a single location only and does not cover scenarios where an entire region fails. Region-wide failures must be addressed by the Disaster Recovery scenario.
High availability is implemented with the support of the service vendors' built-in features.
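To make an availability target such as 99.99% more tangible, it can be converted into a downtime budget. The snippet below is a minimal sketch of that arithmetic. The function name `allowed_downtime_minutes` is illustrative only, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


# Targets that appear in this document
for target in (99.5, 99.9, 99.95, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

Under these assumptions, a 99.99% target leaves a budget of about 52.6 minutes of downtime per year.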
Azure Kubernetes Service (AKS) Uptime SLA
The default AKS configuration is free and carries no SLA, although Microsoft endeavours to provide 99.5% uptime for it.
Uptime SLA guarantees 99.95% availability of the Kubernetes API server endpoint for clusters that use Availability Zones and 99.9% availability for clusters that don't. AKS uses master node replicas across update and fault domains to ensure SLA requirements are met.
For more information, see Azure Kubernetes Service (AKS) with Uptime SLA.
AKS Workload Availability Zones
When availability zones are used, the Managed Cloud deployment model ensures that nodes in one availability zone are physically separated from nodes in another. AKS clusters deployed across multiple availability zones provide a higher level of availability and protect against hardware failures and planned maintenance events.
For more details, see Use availability zones in Azure Kubernetes Service (AKS).
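Zone separation only helps a workload if its pods actually land in different zones. The following is a minimal sketch, not the Managed Cloud configuration itself, that builds a Kubernetes Deployment manifest with two replicas and a topology spread constraint on the standard `topology.kubernetes.io/zone` node label. The role name `example-role` and the image reference are hypothetical placeholders.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Build a Deployment manifest that spreads replicas across zones."""
    if replicas < 2:
        raise ValueError("horizontally scalable roles should run at least 2 pods")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Prefer an even spread of pods over availability zones
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "ScheduleAnyway",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Hypothetical role name and image, for illustration only
manifest = build_deployment("example-role", "example.azurecr.io/example-role:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest. `ScheduleAnyway` is chosen here so that a zone outage does not block scheduling; `DoNotSchedule` would enforce the spread strictly instead.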
Known limitations:
You can only define availability zones when the cluster or node pool is created.
Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones.
The chosen node size (VM SKU) must be available across all selected availability zones.
Clusters with availability zones enabled require the use of Azure Standard Load Balancers for distribution across zones. This load balancer type can only be defined at cluster create time. For more information and the limitations of the standard load balancer, see Azure load balancer standard SKU limitations.
Data transfer between availability zones (egress and ingress) is billable and can add cost. See Pricing – Bandwidth | Microsoft Azure.
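The limitations above can be checked before a cluster or node pool is requested. The sketch below is one hypothetical way to do so: the function `check_zone_request` and the `sku_zones` lookup are assumptions for illustration, and the availability data is made up rather than taken from Azure.

```python
from typing import Dict, List, Set


def check_zone_request(vm_sku: str,
                       zones: List[str],
                       sku_zones: Dict[str, Set[str]],
                       load_balancer_sku: str,
                       pool_exists: bool) -> List[str]:
    """Return the known limitations that a zone request would violate."""
    problems = []
    if pool_exists:
        problems.append("availability zones can only be set at creation time")
    # The VM SKU must be offered in every requested zone
    missing = sorted(set(zones) - sku_zones.get(vm_sku, set()))
    if missing:
        problems.append(f"{vm_sku} is not available in zone(s): {', '.join(missing)}")
    if load_balancer_sku.lower() != "standard":
        problems.append("zone-enabled clusters require the Standard load balancer")
    return problems


# Made-up availability data: the SKU is assumed to exist in zones 1 and 2 only
sku_zones = {"Standard_D4s_v3": {"1", "2"}}
problems = check_zone_request("Standard_D4s_v3", ["1", "2", "3"],
                              sku_zones, "standard", pool_exists=False)
print(problems)
```

An empty list means none of the listed limitations is violated by the request.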
SQL Elastic Pool
Azure SQL Database is a fully managed relational database with built-in regional high availability.
Azure SQL Managed Instance has an availability guarantee of at least 99.99% in both the Business Critical and General Purpose tiers. Azure SQL Database offers three service tiers:
General Purpose/Standard: for common workloads
Business Critical/Premium: for high-throughput OLTP applications requiring low latency and high resilience
Hyperscale: for very large OLTP systems, with auto-scaling of storage and compute
The guarantees for Azure SQL Database depend on the tier and configuration:
Azure SQL Database Business Critical or Premium tiers configured as Zone Redundant Deployments have an availability guarantee of at least 99.995%.
Azure SQL Database Business Critical or Premium tiers not configured for Zone Redundant Deployments have an availability guarantee of at least 99.99%.
Azure SQL Database General Purpose, Standard, Basic tiers, or Hyperscale tier with two or more replicas have an availability guarantee of at least 99.99%.
Azure SQL Database Hyperscale tier has an availability guarantee of at least 99.95% with one replica and at least 99.9% with zero replicas.
Azure SQL Database Business Critical tier configured with geo-replication has a guaranteed recovery point objective (RPO) of 5 seconds for 100% of deployed hours.
Azure SQL Database Business Critical tier configured with geo-replication has a guaranteed recovery time objective (RTO) of 30 seconds for 100% of deployed hours.
Search Stax
High availability is built in. The uptime guarantee depends on the tier and ranges from 99.5% (Gold) to 99.95% (Platinum Plus).
For details, see Managed Solr Pricing and Features.
Front Door
Azure guarantees that at least 99.99% of the time Azure Front Door Service will respond to client requests and deliver the requested content without error.
Azure Container Registry (ACR)
Azure guarantees that at least 99.9% of the time Managed Registry will successfully process Registry Transactions. The SLA for Classic Registry is provided through Azure Storage.
If you are using a public image, consider importing it into a container registry of your own that aligns with your SLO. Otherwise, the image might be subject to unexpected availability issues, which can cause operational problems if the image isn't available when you need it.
Decisions
The following high availability decisions apply to the Production configuration.
Resource | Solution | Potential target SLA | Comments |
---|---|---|---|
AKS | Enable the Uptime SLA feature by default for production deployments | 99.95% when paired with Availability Zones enabled for the workload | Review the list of supported locations. |
Windows Node Pool | | 99.99% with configured Availability Zones | Review the list of public locations where availability zones are supported. |
Linux Node Pool | | 99.99% with configured Availability Zones | Review the list of public locations where availability zones are supported. |
SQL Elastic Pool | General Purpose tier | 99.99% | |
Search Stax | Platinum tier | 99.9% | |
Front Door | High availability provided by default | 99.99% | |
ACR | | 99.9% | Pull all images (Sitecore and third-party) locally during provisioning |
Storage Account | LRS replication | 99.999999999% (11 nines) durability | |
Kubernetes Workload | Roles that can be scaled horizontally should be configured with at least 2 pods | | |
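The per-resource targets in the table can be combined into a rough end-to-end estimate. The sketch below multiplies the individual availabilities, which assumes that every listed component is a serial dependency of the solution and that failures are independent. Both are simplifying assumptions rather than statements about the actual architecture. The Storage Account row is left out because its 11-nines figure describes durability rather than request availability.

```python
from math import prod

# Targets taken from the Decisions table (percent)
component_sla_percent = {
    "AKS API server (Uptime SLA with Availability Zones)": 99.95,
    "Windows node pool (Availability Zones)": 99.99,
    "Linux node pool (Availability Zones)": 99.99,
    "SQL Elastic Pool (General Purpose)": 99.99,
    "Search Stax (Platinum)": 99.9,
    "Front Door": 99.99,
    "ACR": 99.9,
}


def composite_availability(percentages) -> float:
    """Multiply availabilities of components that must all be up at once."""
    return 100 * prod(p / 100 for p in percentages)


composite = composite_availability(component_sla_percent.values())
print(f"Composite availability: {composite:.2f}%")
```

With these inputs the composite figure comes out at roughly 99.7%, which is lower than any single component target. This is why the weakest links (here the 99.9% components) deserve the most attention.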