Web crawler optimizations
Sitecore Search web crawlers support an optimized crawling mode, called delta crawling, that keeps search results up to date and improves indexing speed and efficiency.
Generally, a web crawler crawls all the URLs provided in the connector configuration. With delta crawling, the crawler only crawls URLs that have changed since the previous run.
The crawler identifies changed URLs using the lastmod field in the sitemap.
Delta crawling applies only to web crawlers that use the sitemap or sitemap_index trigger, have the lastmod field available in the sitemap, have depth set to 0, and have incremental updates disabled.
Delta crawling provides the following benefits:
- Receive up-to-date search results due to rapid indexing and identification of content changes.
- Users can trigger a full crawl from the Sitecore Search user interface for comprehensive site indexing.
- Even if the lastmod date is absent, periodic full crawls ensure no updates are missed.
If a sitemap contains URLs without a lastmod field, they're crawled regardless of the status of the previous run.
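The selection rule described above can be illustrated with a minimal Python sketch. It is not Sitecore Search's implementation: the function names urls_to_crawl and parse_lastmod, the sample sitemap, and the strict "later than the previous run" comparison are all assumptions made for illustration. It only shows the idea of crawling a URL when its lastmod is newer than the previous run, or when lastmod is missing.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a lastmod value into a timezone-aware UTC datetime.

    Handles full dates (YYYY-MM-DD) and full timestamps. The shorter
    W3C forms (YYYY, YYYY-MM) are not handled in this sketch.
    """
    value = text.strip()
    if value.endswith("Z"):
        # Older Python versions do not accept a trailing "Z".
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def urls_to_crawl(sitemap_xml, previous_run):
    """Return the URLs a delta crawl would visit (hypothetical helper).

    A URL is selected when its lastmod is later than previous_run,
    or when it has no lastmod at all.
    """
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        if loc is None:
            continue
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if lastmod is None or not lastmod.strip():
            selected.append(loc.strip())  # no lastmod: always crawled
        elif parse_lastmod(lastmod) > previous_run:
            selected.append(loc.strip())  # changed since the previous run
    return selected


# Illustrative sitemap; the URLs and dates are made up.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-05-02</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2024-04-01T08:30:00Z</lastmod></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""

previous_run = datetime(2024, 5, 1, tzinfo=timezone.utc)
changed = urls_to_crawl(SAMPLE_SITEMAP, previous_run)
print(changed)
```

In this sample, /a is selected because its lastmod is later than the previous run, /b is skipped because it is older, and /c is selected because it has no lastmod field.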