Keep indexes up to date
After you publish a source and the first crawl is complete, your indexed content is searchable. From then on, you need to make sure that indexed content continues to reflect both your original content and your latest search configuration.
Here, search configuration refers to features like suggestions, sorting options, facets, and more, that you have configured for your implementation. Search stores information relevant to these configurations in each index document.
Sitecore Search offers two mechanisms to update indexes: reindexing and recrawling.
Reindexing
Reindexing is a process in which Search updates indexes to reflect domain-level changes to search settings.
Reindex a source when you make changes to domain-level search configuration settings (but not to source settings) and you want your search experience to reflect these updates. For example, if you add new features like a suggestion block, or create a new sorting option, you'll need to reindex sources before these changes appear in a search experience.
Sitecore Search doesn't connect to your original content during a reindex. To get the latest changes from your original content into index documents, you need to recrawl sources. Additionally, if you make any changes to the source configuration itself, you have to recrawl the source.
Reindexing uses fewer resources than recrawling.
There are two ways you can reindex sources:
- When you make any domain configuration change, Search asks whether you want to reindex all sources. If you agree, Search reindexes them automatically.
- You can manually reindex sources at any time.
Recrawling
Recrawling is a process in which Search crawls your original content and then recreates indexes according to the latest search configuration settings. You can think of recrawling as reindexing with the additional step of getting the latest content.
You can recrawl sources to ensure that index documents reflect the most current version of your original content. You must also recrawl sources when you make any source configuration changes.
There are two ways you can recrawl sources:
- You can schedule scans. This is the easiest way to ensure that indexed content stays up to date. Set a frequency, and Search automatically recrawls your original content according to the schedule.
  We strongly recommend that you schedule frequent scans for all sources.
- You can manually recrawl sources at any time.
  For example, if you know that your original content has important updates and you don't want to wait for the next scheduled scan, you can start a manual recrawl of that source.
If a developer has used the Ingestion API to update index documents, these updates will be overwritten with information from your original content during a recrawl.
If you want changes made using the Ingestion API to persist, you'll need to make the change in your original content so that the crawler picks it up.
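The overwrite behavior described above can be pictured with a toy model. This is only an illustrative sketch: plain dictionaries stand in for your original content and for the index, and the functions `recrawl` and `ingestion_update` are hypothetical names invented for this example. They are not part of Sitecore Search or the Ingestion API.

```python
# Toy model only: dicts stand in for original content and index documents.
# `recrawl` and `ingestion_update` are hypothetical helpers, not Sitecore APIs.

original_content = {"doc-1": {"title": "Pricing", "price": 10}}
index = {}


def recrawl(original, index_docs):
    """Rebuild each index document from the original content."""
    for doc_id, fields in original.items():
        # Replaces whatever the index previously held for this document.
        index_docs[doc_id] = dict(fields)


def ingestion_update(index_docs, doc_id, fields):
    """Stand-in for an index-only update: the original content is untouched."""
    index_docs[doc_id].update(fields)


recrawl(original_content, index)                  # first crawl
ingestion_update(index, "doc-1", {"price": 12})   # index-only change
price_after_update = index["doc-1"]["price"]

recrawl(original_content, index)                  # next recrawl overwrites it
price_after_recrawl = index["doc-1"]["price"]

original_content["doc-1"]["price"] = 12           # change the original instead
recrawl(original_content, index)
price_after_content_fix = index["doc-1"]["price"]
```

In this model, the index-only change disappears after the next recrawl, while the change made in the original content persists across recrawls.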
Sample scenarios
Here are some sample scenarios that might help you decide whether to reindex or recrawl a source:
- Scenario: You created a new sorting option.
  Source update required: Reindex the source.
- Scenario: You edited an existing attribute for use as a facet.
  Source update required: Reindex the source.
- Scenario: You added a new attribute.
  Source update required: Update the source's document extractor to extract the new attribute and then recrawl the source.
- Scenario: You changed the source trigger from request to sitemap.
  Source update required: Recrawl the source.
- Scenario: You added a new locale to the domain.
  Source update required: Update the source to crawl localized content and then recrawl the source.
- Scenario: You modified the facet configuration at the widget variation level.
  Source update required: Nothing. You don't have to reindex or recrawl sources for global widget or widget-level changes.
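The rules behind these scenarios can be summarized as a small decision helper. The following is a minimal sketch for reasoning about which update to run, not a Sitecore Search API: the `Change` and `Action` names and the `required_action` function are assumptions made up for this example. It relies on the point made earlier that a recrawl also recreates indexes with the latest search configuration, so a recrawl covers anything a reindex would.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical categories of change, based on the scenarios above."""
    DOMAIN_CONFIG = "domain-level search configuration"
    SOURCE_CONFIG = "source configuration"
    ORIGINAL_CONTENT = "original content"
    WIDGET_LEVEL = "global widget or widget-level setting"


class Action(Enum):
    NONE = "nothing"
    REINDEX = "reindex"
    RECRAWL = "recrawl"


def required_action(changes):
    """Return the one action that covers every change in `changes`."""
    changes = set(changes)
    # Source or content changes need fresh data from the original content.
    if changes & {Change.SOURCE_CONFIG, Change.ORIGINAL_CONTENT}:
        return Action.RECRAWL
    # Domain-level changes alone only need the cheaper reindex.
    if Change.DOMAIN_CONFIG in changes:
        return Action.REINDEX
    # Widget-level changes need neither.
    return Action.NONE


# New sorting option: domain-level change only.
new_sort = required_action([Change.DOMAIN_CONFIG])
# New attribute: domain change plus a document extractor (source) change.
new_attribute = required_action([Change.DOMAIN_CONFIG, Change.SOURCE_CONFIG])
# Facet change at the widget variation level.
widget_facet = required_action([Change.WIDGET_LEVEL])
```

Here `new_sort` evaluates to `Action.REINDEX`, `new_attribute` to `Action.RECRAWL`, and `widget_facet` to `Action.NONE`, matching the scenarios listed above.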