Observability

For peak performance, Sitecore Content Hub uses an observability strategy consisting of three main pillars as illustrated in the following diagram:

Diagram of three observability pillars

| Pillar | Description | Tools used |
| --- | --- | --- |
| Watch | To evaluate performance, we collect data using industry-leading tools. We constantly monitor the infrastructure, operating system, and application metrics. In addition to technology metrics, we integrate business metrics into the data, focusing on the customer experience and expected performance. | Elastic Stack, Prometheus, Site 24x7 |
| Learn | The collected data is aggregated to create meaningful dashboards. These dashboards provide a critical visual aid for the immediate identification of issues. Long-term aggregation of data provides the ability to predict or identify potential trends that could lead to failure and creates opportunities to act before customer environments are impacted. | TIG (Telegraf, InfluxDB, Grafana) |
| Act | When issues or potential impacts are identified, our suite of alerting tools works in unison to collect the relevant information and generate alerts targeted to the specific person, team, or automation agent required to facilitate rapid response and recovery. | OpsGenie |
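
The Learn pillar's idea of spotting a trend before it becomes a failure can be illustrated with a small sketch. It fits a least-squares line to evenly spaced samples and estimates how long it would take the trend to reach a limit. The metric (disk usage), the sample values, and the function name `hours_until_threshold` are hypothetical and are not taken from Content Hub's actual tooling.

```python
def hours_until_threshold(samples, threshold, interval_hours=1.0):
    """Estimate hours from the last sample until a linear trend reaches threshold.

    Returns None when there are too few samples or the trend is flat or falling,
    and 0.0 when the fitted line has already reached the threshold.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [i * interval_hours for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])


# Hypothetical hourly disk-usage percentages.
usage = [70.0, 71.0, 72.0, 73.0, 74.0]
eta = hours_until_threshold(usage, threshold=90.0)
```

A straight-line fit is the simplest possible model; real capacity trends are often seasonal or bursty, so a production system would likely use something more robust.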

Observability and alerting tools

Observability and alerting tools give Content Hub constant insight into our systems, and continuous feedback from them, through monitoring and logs.

Content Hub uses the following tools:

| Tool | Description |
| --- | --- |
| Elastic Stack | An open-source logging platform used to consume and deliver detailed logging information in a unified format for easy ingestion and aggregation. |
| Prometheus | Fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. On top of providing multi-dimensional data collection and powerful querying, Prometheus can monitor Kubernetes environments, which makes it a must-have for Content Hub. |
| Site 24x7 | Provides a global perspective of website performance from more than 100 locations worldwide, checking that public-facing websites and APIs that access back-end services are up, performing well, and returning the expected data. |
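
To make "a unified format for easy ingestion and aggregation" concrete, the following sketch renders each log record as one JSON object per line, a shape that log pipelines commonly ingest. The field names and the `asset-service` logger name are assumptions for illustration; the source does not describe Content Hub's actual log schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render every log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            # Field names are hypothetical, chosen for illustration only.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


# Build a record by hand so the example has no side effects.
record = logging.LogRecord(
    "asset-service", logging.WARNING, "example.py", 1,
    "rendition took %d ms", (1250,), None,
)
line = JsonFormatter().format(record)
```

Because every service emits the same keys, the aggregation layer can filter and group on `level` or `logger` without per-service parsing rules.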

Metrics tools

Tools for machine and service metrics offer a standard, versatile approach to monitoring that accommodates almost any business monitoring need.

Content Hub uses the TIG (Telegraf, InfluxDB, Grafana) stack:

| Tool | Description |
| --- | --- |
| Telegraf | Active agent used to collect metrics. |
| InfluxDB | Time-series database used to store metrics collected by Telegraf. |
| Grafana | Metric analytics and visualization suite that provides the ability to visualize time series data for infrastructure and application analysis. Aggregates and visualizes data from InfluxDB. Enables the creation of pre-defined alerting rules. |
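
As an illustration of what flows from a collecting agent into the time-series database, the sketch below formats one data point in InfluxDB line protocol (`measurement,tags fields timestamp`). The helper name `to_line_protocol`, the measurement, and the tag and field values are hypothetical, and the escaping covers only the common cases.

```python
def _escape(value, chars):
    """Backslash-escape each of the given characters in value."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    name = _escape(measurement, ", ")
    # Tags are sorted by key, which keeps output deterministic.
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(str(v), ',= ')}"
        for k, v in sorted(tags.items())
    )
    field_items = []
    for key, value in fields.items():
        if isinstance(value, bool):        # check bool before int
            rendered = "true" if value else "false"
        elif isinstance(value, int):       # integers carry an "i" suffix
            rendered = f"{value}i"
        elif isinstance(value, float):
            rendered = repr(value)
        else:                              # strings are double-quoted
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        field_items.append(f"{_escape(key, ',= ')}={rendered}")
    return f"{name}{tag_part} {','.join(field_items)} {timestamp_ns}"


# Hypothetical CPU sample with a nanosecond timestamp.
point = to_line_protocol(
    "cpu", {"host": "web 01"}, {"usage_percent": 42.5, "cores": 8},
    1700000000000000000,
)
```

In this model, tags identify the series (for example, which host) while fields carry the measured values, which is what lets a dashboard group and aggregate by tag.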

Incident management tool

Content Hub uses OpsGenie as an incident management platform.

OpsGenie:

  • ensures that critical incidents are never missed and that the right people take appropriate actions in the shortest possible time.

  • categorizes the alerts received from monitoring systems and custom applications based on importance and timing.

  • provides on-call schedules to notify the appropriate people through multiple communication channels (voice calls, email, SMS, and push messages) with automatic escalation procedures.
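
The categorization and escalation behavior described in the list above can be sketched generically. This is not the OpsGenie API: the severity levels, office hours, channel choices, escalation chain, and all names below are hypothetical, chosen only to show how importance and timing might drive who is notified and how.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    source: str          # monitoring system that raised the alert
    severity: str        # "critical", "warning", or "info" (hypothetical levels)
    raised_at: datetime


# Hypothetical escalation chain: first, second, and final responder.
ESCALATION_CHAIN = ["on-call engineer", "team lead", "engineering manager"]


def channels_for(alert):
    """Pick notification channels from the alert's importance and timing."""
    off_hours = alert.raised_at.hour < 8 or alert.raised_at.hour >= 18
    if alert.severity == "critical":
        return ["voice", "sms", "push"]
    if alert.severity == "warning":
        return ["push"] if off_hours else ["email", "push"]
    return ["email"]


def responder_for(minutes_unacknowledged, step_minutes=10):
    """Escalate one level for every step_minutes without acknowledgement."""
    level = min(minutes_unacknowledged // step_minutes, len(ESCALATION_CHAIN) - 1)
    return ESCALATION_CHAIN[level]


night_alert = Alert("prometheus", "critical", datetime(2024, 1, 1, 3, 0))
channels = channels_for(night_alert)
```

With these assumed settings, an alert left unacknowledged for 25 minutes would have moved from the on-call engineer to the last entry in the chain.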
