Web crawler optimizations
Sitecore Search web crawlers have been optimized to ensure timely content updates in search results and improved indexing speed and efficiency.
Generally, a web crawler crawls all the URLs provided in the connector configuration. With these optimizations, the crawler crawls only the URLs that have changed since the previous run.
It identifies changed URLs using the lastmod field in a sitemap.
These optimizations apply only to web crawlers that use the sitemap or sitemap_index trigger, with the lastmod field available in the sitemap, depth set to 0, and incremental updates disabled.
The optimizations provide the following benefits:
- Receive up-to-date search results due to rapid indexing and identification of content changes.
- Users can trigger a full crawl from the Sitecore Search user interface for comprehensive site indexing.
- Even if the lastmod date is absent, periodic full crawls ensure no updates are missed.
If a sitemap contains URLs without a lastmod field, they're crawled regardless of the status of the previous run.
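The selection rule described above can be illustrated with a minimal Python sketch. This is not Sitecore Search's implementation; the function names (parse_lastmod, urls_to_crawl), the sample sitemap, and the previous-run timestamp are all hypothetical and exist only to show the logic. The sketch assumes lastmod values are full dates or date-times in ISO 8601 form, and it treats date-only values as midnight UTC.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a lastmod value (assumption: full date or date-time, ISO 8601)."""
    value = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if value.tzinfo is None:
        # Assumption: values without a time zone are treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value


def urls_to_crawl(sitemap_xml, previous_run):
    """Return URLs changed after previous_run, plus URLs with no lastmod."""
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        if loc is None:
            continue  # Skip malformed entries that have no location.
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        # URLs without lastmod are always crawled; others only if newer.
        if not lastmod or parse_lastmod(lastmod) > previous_run:
            selected.append(loc.strip())
    return selected


# Hypothetical sitemap with one stale URL, one updated URL, one without lastmod.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2024-06-15T08:30:00Z</lastmod></url>
  <url><loc>https://example.com/no-date</loc></url>
</urlset>"""

previous_run = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(urls_to_crawl(SAMPLE, previous_run))
# prints ['https://example.com/new', 'https://example.com/no-date']
```

In this sketch, the URL last modified before the previous run is skipped, while the updated URL and the URL without a lastmod field are both selected.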