Best practices for indexing

The following are recommended indexing practices for successful Sitecore Search implementations.

Define triggers and schedules

Before indexing items, prepare a plan for your implementation. The plan guides you when you set the triggers and schedules in your sources, which can affect the speed and accuracy of updates.

Important

Do not recrawl or reindex a website multiple times a day. Every run also recrawls or reindexes unchanged data, which can cause delays.

Validate your items

To ensure your implementation meets your indexing needs, periodically validate the configuration and performance of the sources and make adjustments as necessary.

Update or delete only after a successful run

To prevent issues with incremental updates or deletions, make sure there has been at least one successful indexing run of all items.

Preprocess metadata

You can use analyzers and extractors to preprocess metadata before indexing. Metadata that is already formatted for output reduces processing time in the browser.

Always overestimate URLs to be crawled

Set MAX URLS in crawler settings to a number greater than the estimated number of URLs. This prevents a crawl from stopping before completion.
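
One way to choose the number is to add headroom to your estimate. The following Python sketch is illustrative only: the function name suggested_max_urls and the 25% default headroom are assumptions made for this example, not values recommended by Sitecore Search.

```python
import math


def suggested_max_urls(estimated_urls, headroom=0.25):
    """Return a URL limit that is strictly greater than the estimate.

    headroom is the extra fraction to add (0.25 adds 25%). This default
    is an assumption for illustration, not a product recommendation.
    """
    if estimated_urls < 0:
        raise ValueError("estimated_urls must be non-negative")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    padded = estimated_urls + math.ceil(estimated_urls * headroom)
    # Guarantee the limit is above the estimate even for tiny sites.
    return max(padded, estimated_urls + 1)


limit = suggested_max_urls(4000)
print(limit)
```

You would then enter the resulting number as the MAX URLS value in the crawler settings.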

Create multiple taggers

For a document extractor, create multiple taggers where each tagger is linked to a unique tag. This way each tagger:

  • Generates a set of index documents.

  • Can have multiple rules where each rule defines the extraction logic for one attribute.
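
The relationship between taggers, tags, and rules can be pictured with a small data model. The following Python sketch is illustrative only: the class names Tagger and Rule, the selector field, the sample expressions, and the duplicate_tags helper are all assumptions made for this example and are not part of the Sitecore Search API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    attribute: str  # the one attribute this rule extracts
    selector: str   # hypothetical extraction expression


@dataclass
class Tagger:
    tag: str  # the tag this tagger is linked to
    rules: List[Rule] = field(default_factory=list)


def duplicate_tags(taggers):
    """Return the tags linked to more than one tagger, sorted."""
    counts = Counter(tagger.tag for tagger in taggers)
    return sorted(tag for tag, count in counts.items() if count > 1)


# Hypothetical configuration: two taggers, each linked to its own tag.
taggers = [
    Tagger("article", [Rule("title", "//h1"), Rule("summary", "//p[1]")]),
    Tagger("faq", [Rule("question", "//h2")]),
]

print(duplicate_tags(taggers))
```

An empty result means that every tagger is linked to a unique tag, which is the arrangement described above.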

Use the wildcard symbol

Use the wildcard symbol (*) to create a glob expression that matches multiple URLs following a URL pattern. The symbol stands for any number of characters. This eliminates the need to enter a long list of URLs.
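
To see how a single glob expression can stand in for a list of URLs, you can test a pattern locally. The following sketch uses Python's standard fnmatch module, whose matching rules might differ in detail from the glob matching in Sitecore Search. The pattern, the URLs, and the matching_urls helper are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Hypothetical glob expression and candidate URLs, for illustration only.
PATTERN = "https://www.example.com/blog/*"

urls = [
    "https://www.example.com/blog/2024/indexing-tips",
    "https://www.example.com/blog/",
    "https://www.example.com/products/widget",
]


def matching_urls(pattern, candidates):
    """Return the candidates that match the glob pattern (case-sensitive)."""
    return [url for url in candidates if fnmatchcase(url, pattern)]


matched = matching_urls(PATTERN, urls)
for url in matched:
    print(url)
```

In this sketch, the two blog URLs match the pattern and the product URL does not.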

Reindex or recrawl

For most Sitecore Search implementations, whether for content or commerce, searchable items commonly change or are updated. You can create rules to exclude items with an expiry date. In other cases, you need to reindex or recrawl.

  • Reindex when there are changes to domain-level search configuration settings. This can include changes to analyzers, entities, sorting, and facet options, among others.

  • Recrawl when the settings in the source or indexing mechanism change. This can include edits to extractors, entity settings, and tags, among others.
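
The two cases above can be summarized as a simple lookup. The following Python sketch is illustrative only: the set names, the required_actions helper, and the string labels are assumptions made for this example, and the sets contain only the change types named above, which are not exhaustive.

```python
# Change types named in this article. Both lists are examples, not complete.
REINDEX_CHANGES = {"analyzers", "entities", "sorting", "facet options"}
RECRAWL_CHANGES = {"extractors", "entity settings", "tags"}


def required_actions(changed_settings):
    """Return the sorted actions ("recrawl", "reindex") that the changes call for."""
    actions = set()
    for setting in changed_settings:
        if setting in REINDEX_CHANGES:
            actions.add("reindex")
        if setting in RECRAWL_CHANGES:
            actions.add("recrawl")
    return sorted(actions)


print(required_actions(["sorting"]))
print(required_actions(["extractors", "analyzers"]))
```

A change type that appears in neither set returns no action in this sketch, so treat an empty result as a prompt to check the product documentation rather than as confirmation that nothing is needed.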

Important

Reindexing and recrawling can affect the speed and accuracy of any updates.

Use checklists when updating with crawlers

The following topics are best practices you can use when updating with crawlers:
