Optimize keyword searches

Types of analyzers

When you configure an attribute for use with a feature, analyzers process and transform input text to improve search relevance and functionality.

The default analyzers that Search applies are recommended for most use cases, but more advanced analyzers are also available.

Basic analyzers

Basic analyzers handle straightforward text processing tasks, like making text lowercase, removing punctuation, and creating exact matches. They are used for most common search scenarios.

Standard `rfk_standard`

The Standard analyzer is an older, English-only version of the Multi locale standard analyzer. It performs all of the same operations as Multi locale standard but without taking locale into account.

If you only work with English-language data, you can use this analyzer for textual relevance; however, we recommend you use the Multi locale standard analyzer.

Multi locale standard `rfk_standard_multi_locale`

The Multi locale standard analyzer processes input by making it lowercase, determining the root form of each word using stemming, applying synonyms, and removing stop words and punctuation. It takes locale into account while it does this.

For example, if a visitor searches for How can I improve search results?, the Multi locale standard analyzer outputs the following tokens: how, can, i, improv, search, and result. In this example, the capital letters have been made lowercase, the words improve and results have been reduced to their root forms (improv and result), and the question mark has been removed.

The Multi locale standard analyzer is able to work differently in different locales. For example, there are different stop words in different languages: here is a stop word in English, and aquí is the corresponding stop word in Spanish. In Spanish-speaking locales, the Multi locale standard analyzer takes this difference into account.
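The locale-dependent behavior described above can be sketched roughly as follows. This is a minimal illustration, not how Search implements it: the `STOP_WORDS` table, the `remove_stop_words` function, and the `en_us`-style locale strings are assumptions made for this example, and stemming and synonyms are left out.

```python
import re

# Hypothetical per-language stop-word lists, limited to the two words
# mentioned in the text. The real lists used by Search are not shown here.
STOP_WORDS = {
    "en": {"here"},
    "es": {"aquí"},
}

def remove_stop_words(text: str, locale: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop the
    stop words that belong to the locale's language."""
    language = locale.split("_")[0].lower()  # assumed locale format, e.g. "es_es"
    stop_words = STOP_WORDS.get(language, set())
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]

print(remove_stop_words("Stores here", "en_us"))   # ['stores']
print(remove_stop_words("Tiendas aquí", "es_es"))  # ['tiendas']
print(remove_stop_words("Tiendas aquí", "en_us"))  # ['tiendas', 'aquí']
```

The last call shows why locale matters: with an English stop-word list, aquí is kept as an ordinary token.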

Use the Multi locale standard analyzer for textual relevance, even if your domain does not support multiple locales.

Alphanumeric only `rfk_alphanumeric_only_analyzer`

The Alphanumeric only analyzer performs all of the same transformations as the Standard analyzer, but it strips all non-alphanumeric characters instead of using them as token separators.

For example, for the document ID 1235-abhe-3f34s, the Alphanumeric only analyzer generates a single token: 1235abhe3f34s. This is useful when you want visitors to be able to search both with and without the hyphens. In contrast, the Standard analyzer uses the hyphens to separate the ID into three tokens: 1235, abhe, and 3f34s.
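The difference between stripping and splitting can be sketched as follows. Both functions are hypothetical stand-ins written for this example. They handle ASCII letters and digits only, assume that whitespace still separates words, and skip the stemming, synonym, and stop-word steps.

```python
import re

def alphanumeric_only(text: str) -> list[str]:
    """Strip non-alphanumeric characters inside each whitespace-separated
    word, so an ID with hyphens becomes one token."""
    tokens = []
    for word in text.lower().split():
        cleaned = re.sub(r"[^0-9a-z]", "", word)
        if cleaned:
            tokens.append(cleaned)
    return tokens

def split_on_separators(text: str) -> list[str]:
    """Treat non-alphanumeric characters as separators instead."""
    return re.findall(r"[0-9a-z]+", text.lower())

print(alphanumeric_only("1235-abhe-3f34s"))    # ['1235abhe3f34s']
print(split_on_separators("1235-abhe-3f34s"))  # ['1235', 'abhe', '3f34s']
```

Because a query is processed the same way in this sketch, 1235abhe3f34s and 1235-abhe-3f34s both produce the same token.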

This analyzer is often used for sorting or filtering.

Keyword `rfk_keyword`

The Keyword analyzer generates the input text as a single token.

For example, if a visitor searches for Sitecore Search, the Keyword analyzer creates a single token: Sitecore Search. This means a search for sitecore or search alone does not match, because the Keyword analyzer only matches the full, exact text.
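As a rough sketch of this exact-match behavior, assuming that matching means comparing the single tokens for equality (the function names are made up for this example):

```python
def keyword_tokens(text: str) -> list[str]:
    """Return the whole input, unchanged, as one token."""
    return [text]

def is_exact_match(indexed_value: str, query: str) -> bool:
    """A query matches only when its single token equals the indexed one."""
    return keyword_tokens(query) == keyword_tokens(indexed_value)

print(keyword_tokens("Sitecore Search"))                     # ['Sitecore Search']
print(is_exact_match("Sitecore Search", "Sitecore Search"))  # True
print(is_exact_match("Sitecore Search", "sitecore"))         # False
```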

This analyzer is useful for filters or other special cases where you need an exact match.

Lowercase `rfk_lowercase`

The Lowercase analyzer produces a single output token with the whole input in lowercase form.

For example, if a visitor searches for How to create Search experiences, the Lowercase analyzer generates the following token: how to create search experiences.
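A minimal sketch of this behavior, with a hypothetical `lowercase_token` function, also shows why a lowercased value helps with sorting: values sort alphabetically regardless of their original capitalization.

```python
def lowercase_token(text: str) -> list[str]:
    """Return the whole input as one lowercase token."""
    return [text.lower()]

print(lowercase_token("How to create Search experiences"))
# ['how to create search experiences']

titles = ["apple", "Banana", "cherry"]
print(sorted(titles))                                      # ['Banana', 'apple', 'cherry']
print(sorted(titles, key=lambda t: lowercase_token(t)[0])) # ['apple', 'Banana', 'cherry']
```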

This analyzer is often used for sorting or filtering.

Prefix match `rfk_prefix_match`

The Prefix match analyzer strips all non-alphanumeric characters from the input, then generates lowercase prefixes with lengths ranging from 3 to 15 characters.

For example, if a visitor searches for the ISBN 978-3-16-148410-0, the tokens generated include 978, 9783, 97831, 978316, and so on.
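The prefix generation can be sketched as follows. The function name and parameters are assumptions for this example, and the sketch treats the whole input as one value and handles ASCII letters and digits only.

```python
import re

def prefix_tokens(text: str, min_len: int = 3, max_len: int = 15) -> list[str]:
    """Strip non-alphanumeric characters, lowercase the result, and
    return every prefix from min_len up to max_len characters."""
    cleaned = re.sub(r"[^0-9a-z]", "", text.lower())
    longest = min(len(cleaned), max_len)
    return [cleaned[:n] for n in range(min_len, longest + 1)]

tokens = prefix_tokens("978-3-16-148410-0")
print(tokens[:4])   # ['978', '9783', '97831', '978316']
print(tokens[-1])   # '9783161484100'
```

The ISBN has 13 digits once the hyphens are removed, so the longest prefix here is the full number.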

This analyzer is often used in textual relevance for matching unique IDs.

Advanced analyzers

Advanced analyzers handle complex text manipulation tasks, like creating n-grams, handling compound words, and generating word pairs.

Ngram based matching `rfk_ngram_analyzer`

The Ngram based matching analyzer breaks text into words, then creates n-grams of length n for each word.

For example, if a visitor searches for Sitecore Search and the value of n is 2, the tokens generated include Si, it, te, ec, co, or, re, Se, ea, and so on.
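A minimal sketch of per-word n-gram generation, assuming words are separated by whitespace and case is preserved as in the example above (the function name is hypothetical):

```python
def word_ngrams(text: str, n: int = 2) -> list[str]:
    """Split the text into words, then slide a window of n characters
    across each word."""
    tokens = []
    for word in text.split():
        # Words shorter than n produce an empty range and no tokens.
        tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens

print(word_ngrams("Sitecore Search", n=2))
# ['Si', 'it', 'te', 'ec', 'co', 'or', 're', 'Se', 'ea', 'ar', 'rc', 'ch']
```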

This analyzer is useful for querying languages that don’t use spaces, like Japanese, and languages that have long compound words, like German. It is also useful when working with prefixes. Ngram based matching is often used in suggestion blocks.

Partial match `rfk_partial_match`

The Partial match analyzer generates lowercase variants of the input tokens, both splitting and joining on special characters and removing stop words.

For example, if a visitor searches for How do I keep Sitecore Search results up-to-date, it generates the following tokens: how, do, i, keep, sitecore, search, results, up, date, and uptodate. In this example, all of the words are converted to lowercase, and the hyphenated word up-to-date is split into separate tokens (while removing the stop word to) and joined into a single token: uptodate.
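The split-and-join behavior can be sketched as follows. This is an illustration under stated assumptions: the stop-word list contains only the word to, which is the one stop word named in the example, and the function name is made up.

```python
import re

# Hypothetical stop-word list; only "to" is confirmed by the example above.
STOP_WORDS = {"to"}

def partial_match_tokens(text: str) -> list[str]:
    """Lowercase each word, split it on special characters, drop stop
    words, and add a joined variant for words that were split."""
    tokens = []
    for word in text.lower().split():
        parts = [p for p in re.split(r"[^0-9a-z]+", word) if p]
        tokens.extend(p for p in parts if p not in STOP_WORDS)
        if len(parts) > 1:
            # The joined variant keeps every part, including stop words.
            tokens.append("".join(parts))
    return tokens

print(partial_match_tokens("How do I keep Sitecore Search results up-to-date"))
# ['how', 'do', 'i', 'keep', 'sitecore', 'search', 'results', 'up', 'date', 'uptodate']
```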

Shingle generator `rfk_shingle_analyzer`

The Shingle generator analyzer works by creating word-level n-grams called shingles.

For example, if a visitor searches for How to improve search results and the analyzer is configured to create two-word shingles, the following tokens are generated: How to, to improve, improve search, and search results.
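Shingle generation can be sketched as a sliding window over words. The function name and the `size` parameter are assumptions for this example, and words are assumed to be separated by whitespace.

```python
def shingles(text: str, size: int = 2) -> list[str]:
    """Return every run of `size` consecutive words as one token."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

print(shingles("How to improve search results", size=2))
# ['How to', 'to improve', 'improve search', 'search results']
```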

This analyzer is useful for extracting partial data and matching against it. The Shingle generator analyzer is often used in suggestion blocks.

Standard no stemmer `rfk_no_stemmer_analyzer`

The Standard no stemmer analyzer performs the same operations as the Standard analyzer but without reducing tokens to their root form using stemming.

For example, if a visitor searches for How to improve search results?, the Standard no stemmer analyzer generates the following tokens: how, improve, search, and results. The words are converted to lowercase, the stop word to is removed, and the question mark is removed. In contrast to the Standard analyzer, the word improve is not changed to its root form, improv.

Exact prefix match `rfk_exact_prefix_match`

The Exact prefix match analyzer removes all non-alphanumeric characters from the input text, then creates lowercase prefixes ranging from 3 to 10 characters. Unlike the basic Prefix match analyzer, which produces prefixes from both document text and search input text, the Exact prefix match analyzer only generates prefixes from documents. The search input text is kept intact and is not split into prefixes.

For example, searching for AB012 only returns results beginning with ab012, such as ab0123, ab0124, or ab0125. It won’t match results like ab0223 or ab0133, which do not start with the exact prefix.
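The asymmetry between document text and search input can be sketched as follows. The function names are hypothetical, and the sketch assumes that the query is lowercased and stripped of non-alphanumeric characters but otherwise compared as a whole against the document's prefixes.

```python
import re

def normalize(text: str) -> str:
    """Lowercase the text and strip non-alphanumeric characters."""
    return re.sub(r"[^0-9a-z]", "", text.lower())

def document_prefixes(value: str, min_len: int = 3, max_len: int = 10) -> set[str]:
    """Prefixes are generated for document values only."""
    cleaned = normalize(value)
    longest = min(len(cleaned), max_len)
    return {cleaned[:n] for n in range(min_len, longest + 1)}

def is_prefix_match(document_value: str, query: str) -> bool:
    """The query is kept whole and must equal one of the document's prefixes."""
    return normalize(query) in document_prefixes(document_value)

print(is_prefix_match("ab0123", "AB012"))  # True
print(is_prefix_match("ab0223", "AB012"))  # False
print(is_prefix_match("ab0133", "AB012"))  # False
```

If the query were also split into prefixes, its shorter prefix ab0 would match all three documents, which is the looser behavior that this analyzer avoids.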
