Platform Administration and Architecture

Sitecore Azure Search overview

Abstract

An introduction to Sitecore Azure Search for Sitecore administrators to read before installation or use.

The Sitecore Azure Search provider integrates the Sitecore Search engine with the Microsoft Azure Search service. The Microsoft Azure Search service is part of the Microsoft Azure computing platform. You can read more about the Microsoft Azure Search service on the Microsoft Azure website.

This topic applies to Sitecore Experience Platform 8.2 Update-1 and later.

The Microsoft Azure Search service provides the following features:

  • Extreme scalability, simplicity, and stability.

  • A highly available infrastructure with 99.95% uptime as a part of the Microsoft Azure service level agreement (SLA).

  • An easy way to scale up and scale down as needed.

The Sitecore Azure Search provider includes the following features:

  • Support for all Sitecore search-driven UIs, including user-typed queries and faceted searches.

  • Support for the majority of LINQ expressions, to enable rapid development of search-powered applications.

  • Native support for fundamental data types, such as numbers and dates, in faceting and range queries.

  • Flexible configuration and precise control over the schema of the indexes.

  • Support for running Sitecore in geo-replicated scenarios.

Note

Sitecore Azure Search behaves slightly differently from the Lucene and Solr search providers; this is important to consider if you are going to switch between search providers. Read more about Sitecore Azure Search limitations and behavioral differences in the Limitations of Azure Search section.

Sitecore Azure Search is the default provider for Sitecore instances that are deployed using the Sitecore Azure SDK. It also supports on-premises and IaaS deployments. Follow the instructions in Configure Azure Search to configure Sitecore Azure Search.

Compared with Sitecore Search on Lucene and Solr, Sitecore Search on Azure Search has several limitations:

  • Automatic tokenization by the Azure Search service of document field values and queries when searching and faceting. This means that:

    • Substring searches that are limited to a single term (for instance, the predicates .StartsWith(), .EndsWith(), and .Contains()) match parts of terms, and match terms that are located in any part of the field value. When multiple terms are passed, each term is searched separately, which can return more results than expected.

    • Regular expressions that span multiple terms (that is, contain spaces) return 0 results.

    • Multiple terms that are passed to .Wildcard() are interpreted as individual wildcards in a field-scoped query.

    • Unlike in Lucene and Solr, when a value in a faceted field contains multiple words, the facet values are calculated based on the individual terms, not on the whole field value.

    Note

    This limitation applies only to Sitecore versions 8.2.7 and 9.0.1 or earlier. For later versions, you can change the behavior only by applying a lowercase analyzer to specific fields, for example:

    <fieldNames hint="raw:AddFieldByFieldName"> <field fieldName="_fullpath" … cloudAnalyzer="lowercase_keyword" />…

  • Same name fields - The Azure Search service has a strong schema. This means, for example, that fields cannot have the same name but different types in different documents.

  • Joining queries - Operators that join queries, for example, .GroupJoin() and .SelfJoin(), are not supported and result in an error.

  • Maximum content length - For filterable, sortable, or facetable fields, the maximum length is 32766 bytes.

  • Retrieve specific fields from documents with Azure Search - Even though this is possible, the functionality is not currently visible through the Sitecore Content Search API.

  • Switch-on rebuild - Is only supported from Sitecore versions 8.2.7, 9.0.2, and later.

  • Media indexing - Is not supported.

  • Language-specific analysis - Is only supported from Sitecore versions 8.2.7, 9.0.1, and later.

  • Range queries - Are always expressed as filters. As a result:

    • Combining range queries with search queries using the logical operator OR (||) produces an error.

    • Range queries on string fields always operate on the whole field value without tokenization and are case-sensitive.

  • Date-time and numeric values - The Azure Search service stores date-time and numeric values as native types and only allows filtering on these fields. Search and filter parts can only be combined with the logical operator AND (&&). As a result:

    • Complex queries involving fields with different types that are combined with the logical operator OR (||) can return an error.

    • .Union() and .Except() operators may generate queries that return an error, depending on the types of the fields used.

    • Certain user queries in the Content Editor that span multiple fields with different types (such as creation date or version) return an error.

  • Fields - An Azure Search index can contain a maximum of 1000 fields. This may be an issue for the Master and Web indexes, which both have a default setup that starts with ~550 fields. If you reach the 1000-field limit, create a new index that is specifically dedicated to indexing your custom templates and fields, then exclude your custom fields from the Master and Web indexes.

    Note

    The limitation of 1000 fields per index means the Azure Search capabilities for multilingual solutions are also limited.

  • Pivot faceting - Pivot faceting with the FacetPivotOn operator is not supported.

  • Fuzzy query semantics - Are different in Azure Search, for example:

    • .Like(pattern, similarity) interprets the similarity parameter as the Damerau-Levenshtein distance (a value between 0 and 2). This is different from the way Lucene implements the similarity parameter in Sitecore.

    • The similarity and slop parameters cannot be combined in the Azure Search Lucene syntax. This means that multiple-word fuzzy queries, such as .Like(), are always interpreted as a phrase query with a slop.
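
To make the tokenization limitation at the top of this list more concrete, the following Python sketch models term-based matching and term-based faceting. It is only an illustration and does not use the Sitecore or Azure Search APIs. The function names (tokenize, contains, facet_counts), the sample titles, and the simple lowercase word-splitting rule are all assumptions made for this example; the analyzers that Azure Search actually applies may split values differently.

```python
import re
from collections import Counter


def tokenize(value):
    """Split a value into lowercase terms (a simplified stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", value.lower())


def contains(field_value, query):
    """Model a tokenized substring search.

    Assumption: each query term is searched separately, and a document
    matches if any query term is a substring of any term in the field.
    """
    field_terms = tokenize(field_value)
    return any(q in term for q in tokenize(query) for term in field_terms)


def facet_counts(values, tokenized):
    """Count facet values per individual term (tokenized) or per whole value."""
    counts = Counter()
    for value in values:
        counts.update(set(tokenize(value)) if tokenized else [value])
    return counts


# Hypothetical field values used only for this illustration.
titles = ["Red Wine", "White Wine", "Red Apple"]

# A two-term query also matches "Red Wine", because "wine" is searched on its own.
print([t for t in titles if contains(t, "white wine")])

# Term-based facets (modeled Azure Search behavior) versus whole-value facets.
print(facet_counts(titles, tokenized=True))
print(facet_counts(titles, tokenized=False))
```

In this model, the query "white wine" returns both "Red Wine" and "White Wine", and the term-based facets count "wine" and "red" twice each instead of counting each title once. This is the kind of difference to look for when you move queries from Lucene or Solr to Azure Search.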

The following Azure Search features are not currently supported by the Sitecore Azure Search provider:

  • Geospatial data types

  • Scoring profiles

  • Indexers

  • Suggestions

  • Highlighters