Best practices for indexing
The following are recommended indexing best practices for successful Sitecore Search implementations.
Define triggers and schedules
Before indexing items, prepare a plan for your implementation. The plan guides you when you set the triggers and schedules in the sources, which can affect the speed and accuracy of any updates.
Do not recrawl or reindex a website multiple times a day because unchanged data is also recrawled or reindexed in every run. This can cause delays.
Validate your items
To ensure your implementation meets your indexing needs, periodically validate the configuration and performance of the sources and make adjustments as necessary.
Update or delete only after a successful run
To prevent issues with incremental updates or deletions, make sure there has been at least one successful indexing run of all items.
Preprocess metadata
Use analyzers and extractors to preprocess metadata before indexing. Metadata that is already formatted for output reduces processing time in the browser.
Always overestimate the URLs to be crawled
Set MAX URLS in crawler settings to a number greater than the estimated number of URLs. This prevents a crawl from stopping before completion.
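As a rough illustration of the arithmetic, the following Python sketch pads an estimated URL count with some headroom. The function name `suggest_max_urls` and the 25% default are assumptions made for this example, not values recommended by Sitecore Search.

```python
import math


def suggest_max_urls(estimated_urls: int, headroom: float = 0.25) -> int:
    """Return a MAX URLS value that is strictly greater than the estimate.

    Both the helper and the default headroom are hypothetical; pick a
    margin that reflects how quickly your site grows.
    """
    if estimated_urls < 0:
        raise ValueError("estimated_urls must be non-negative")
    if headroom <= 0:
        raise ValueError("headroom must be positive")
    padded = math.ceil(estimated_urls * (1 + headroom))
    # Guarantee the limit exceeds the estimate even for very small sites.
    return max(padded, estimated_urls + 1)


max_urls = suggest_max_urls(8000)
```

You would then enter the resulting number as MAX URLS in the crawler settings.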
Create multiple taggers
For a document extractor, create multiple taggers where each tagger is linked to a unique tag. This way each tagger:
- Generates a set of index documents.
- Can have multiple rules where each rule defines the extraction logic for one attribute.
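The relationship between taggers, tags, and rules can be pictured with a small Python model. This is only a conceptual sketch: the `Tagger` class, the `check_unique_tags` helper, and the page dictionary are hypothetical and do not represent the Sitecore Search extractor API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Page = Dict[str, str]         # hypothetical: a crawled page as a flat dict
Rule = Callable[[Page], str]  # hypothetical: extraction logic for one attribute


@dataclass
class Tagger:
    tag: str                                              # the unique tag this tagger is linked to
    rules: Dict[str, Rule] = field(default_factory=dict)  # one rule per attribute

    def extract(self, page: Page) -> List[Dict[str, str]]:
        """Build this tagger's set of index documents for a page."""
        document = {"tag": self.tag}
        for attribute, rule in self.rules.items():
            document[attribute] = rule(page)
        return [document]


def check_unique_tags(taggers: List[Tagger]) -> None:
    """Raise if two taggers are linked to the same tag."""
    tags = [t.tag for t in taggers]
    duplicates = sorted({t for t in tags if tags.count(t) > 1})
    if duplicates:
        raise ValueError(f"taggers share a tag: {duplicates}")


taggers = [
    Tagger("article", {"title": lambda p: p["h1"], "author": lambda p: p["byline"]}),
    Tagger("product", {"name": lambda p: p["h1"], "price": lambda p: p["price"]}),
]
check_unique_tags(taggers)

page = {"h1": "Blue Widget", "byline": "", "price": "19.99"}
product_docs = taggers[1].extract(page)
```

Keeping one tag per tagger means each set of index documents has a single, predictable shape.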
Use the wildcard symbol
Use the wildcard symbol (*) to create a glob expression that matches multiple URLs following a URL pattern. The symbol stands for any number of characters. This eliminates the need to enter a long list of URLs.
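To get a feel for how a single glob expression replaces a list of URLs, here is a minimal Python sketch that uses the standard library's `fnmatch` module. The example URLs and the `urls_matching` helper are made up for illustration, and Sitecore Search's own glob matching may treat some characters (for example, `/`) differently from `fnmatch`.

```python
from fnmatch import fnmatchcase
from typing import List


def urls_matching(pattern: str, urls: List[str]) -> List[str]:
    """Return the URLs that match a glob pattern (case-sensitive)."""
    return [url for url in urls if fnmatchcase(url, pattern)]


# Hypothetical URLs used only for this example.
urls = [
    "https://www.example.com/blog/intro",
    "https://www.example.com/blog/2024/search-tips",
    "https://www.example.com/products/widget",
]

# In fnmatch, * stands for any number of characters, including "/".
matched = urls_matching("https://www.example.com/blog/*", urls)
```

Here `matched` contains the two blog URLs and excludes the product URL.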
Reindex or recrawl
For most Sitecore Search implementations for content or commerce, it is common for searchable items to change or to be updated. You can create rules to exclude items with an expiry date; in other cases, however, you will need to reindex or recrawl.
- Reindex when there are changes to domain-level search configuration settings. This can include changes to analyzers, entities, sorting, and facet options, among others.
- Recrawl when the settings in the source or indexing mechanism change. This can include edits to extractors, entity settings, and tags, among others.
Reindexing and recrawling can affect the speed and accuracy of any updates.
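The two cases can be summarized as a small lookup. The Python sketch below encodes only the examples named in this section; the set names and the `required_action` helper are hypothetical, and because the lists are not exhaustive, anything unrecognized returns "unknown".

```python
# Examples of domain-level search configuration settings (not exhaustive).
DOMAIN_LEVEL = {"analyzers", "entities", "sorting", "facet options"}

# Examples of source or indexing mechanism settings (not exhaustive).
SOURCE_LEVEL = {"extractors", "entity settings", "tags"}


def required_action(changed_setting: str) -> str:
    """Map a changed setting to 'reindex', 'recrawl', or 'unknown'."""
    key = changed_setting.strip().lower()
    if key in DOMAIN_LEVEL:
        return "reindex"
    if key in SOURCE_LEVEL:
        return "recrawl"
    return "unknown"


action = required_action("Facet options")
```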
Use checklists when updating with crawlers
The following topics are best practices you can use when updating with crawlers: