Best practices to crawl frequent new updates
This topic describes considerations for capturing frequent changes to a site that has previously been crawled and indexed.
Create the crawler
- Ensure that the crawler type is appropriate for the updated items: web crawler or API crawler.
- Configure all the basic settings of the crawler, including name, URL, and type.
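The basic settings above can be sketched as a simple configuration check. The field names (`name`, `url`, `crawler_type`) are illustrative assumptions for this sketch, not a specific product's API:

```python
# A minimal sketch of a crawler definition. The field names here are
# illustrative assumptions, not a particular product's API.
crawler_config = {
    "name": "news-incremental",          # descriptive crawler name
    "url": "https://example.com/news",   # root URL of the frequently updated area
    "crawler_type": "web",               # "web" or "api", matching the source
}

def validate_config(config):
    """Check that the basic required settings are present and consistent."""
    required = {"name", "url", "crawler_type"}
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing settings: {sorted(missing)}")
    if config["crawler_type"] not in {"web", "api"}:
        raise ValueError("crawler_type must be 'web' or 'api'")
    return True
```

Validating the configuration up front catches a missing URL or an unsupported crawler type before the first scheduled run.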
Configure settings
- Enable incremental updates on the crawler so that it indexes only items added or changed since the last crawl.
- Confirm that the crawler focuses only on areas of the site with dynamic or frequently changing content.
- Include all the URLs for the focused areas, along with their authentication information.
- Confirm that the crawler type can index all item types at the URLs or endpoints.
- Support the crawler with extractors for PDFs, images, or localized content, as applicable.
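The core of an incremental update is selecting only items that changed since the last crawl. A minimal sketch, assuming each item carries a last-modified timestamp (for example, from a sitemap `lastmod` value or an HTTP `Last-Modified` header):

```python
from datetime import datetime, timezone

def items_to_recrawl(items, last_crawl):
    """Return only the items modified after the previous crawl.

    Illustrative sketch: the item shape (a dict with "url" and
    "last_modified") is an assumption, not a product schema.
    """
    return [item for item in items if item["last_modified"] > last_crawl]

last_crawl = datetime(2024, 1, 1, tzinfo=timezone.utc)
items = [
    {"url": "https://example.com/news/a",
     "last_modified": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"url": "https://example.com/news/b",
     "last_modified": datetime(2023, 12, 30, tzinfo=timezone.utc)},
]
changed = items_to_recrawl(items, last_crawl)  # only /news/a is newer
```

Persisting the timestamp of each successful crawl is what lets the next run skip unchanged items.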
Assign document extractors
- Confirm that the document extractors can extract values from crawled items, for example, using XPath or JSON.
- Ensure that the document extractors are configured to handle incremental changes.
- Validate that the document extractors accurately extract the desired values.
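A quick way to validate extractors is to run them against a sample item and assert on the result. This sketch shows the two extractor styles mentioned above, using only the standard library (note that `xml.etree` supports only a limited XPath subset):

```python
import json
import xml.etree.ElementTree as ET

# Illustrative extractor sketches: real extractors are configured in the
# crawler; these functions only demonstrate the two selection styles.

def extract_title_xpath(markup):
    """Extract the <title> text with a (limited) XPath expression."""
    root = ET.fromstring(markup)
    node = root.find(".//title")
    return node.text if node is not None else None

def extract_title_json(payload):
    """Extract a title field from a JSON document."""
    return json.loads(payload).get("title")

sample_html = "<html><head><title>Release notes</title></head><body/></html>"
sample_json = '{"title": "Release notes", "updated": "2024-01-02"}'
```

Running both extractors on known samples, and on samples that are missing the field, verifies that they return the desired values and fail gracefully.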
Configure triggers
- Implement request or JavaScript triggers to initiate incremental updates.
- Schedule the trigger to run at regular intervals, such as hourly or every few hours, matching the frequency of the changes.
- Set the trigger to run at a low-traffic time to minimize the impact on site performance.
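The scheduling rules above can be combined into a single check: trigger when the chosen interval has elapsed and the current time falls inside a low-traffic window. The window boundaries and interval below are illustrative assumptions:

```python
from datetime import time

# Illustrative low-traffic window; pick times that match your site's traffic.
LOW_TRAFFIC_START = time(1, 0)   # 01:00
LOW_TRAFFIC_END = time(5, 0)     # 05:00

def should_trigger(now_time, minutes_since_last_run, interval_minutes=60):
    """Trigger when the interval has elapsed and we are in the quiet window."""
    in_window = LOW_TRAFFIC_START <= now_time <= LOW_TRAFFIC_END
    return in_window and minutes_since_last_run >= interval_minutes
```

Tuning `interval_minutes` to the site's actual change frequency avoids wasted crawls, while the window check keeps the load off peak hours.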
Add tags
- Add tags to your crawler to support your categorization and prioritization needs.