Validate extractors
When you configure a web crawler in Sitecore Search, you can verify your attribute extraction logic before publishing. This check, called validation, shows the attribute values the crawler will extract from a given URL based on your defined settings. It lets you refine your document extractor settings before you commit to the resource-intensive process of publishing and indexing items.
Validation only previews what extracted attributes would look like. No index documents are created during the validation process.
You must define extraction logic for all attributes on the Attribute Extraction page before you can validate.
JavaScript rendering
Validation does not execute client-side JavaScript or apply crawler-specific settings such as JavaScript rendering. As a result, validation results might be incomplete or inaccurate for websites that rely on JavaScript frameworks such as React.
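To illustrate why, here is a minimal sketch that reads raw HTML the way a non-rendering check does: it parses markup but never runs scripts. It uses Python's standard-library `html.parser` and assumes a hypothetical extraction rule that maps the text of a page's `<h1>` to a `title` attribute. The rule, the attribute name, and the sample pages are illustrative assumptions; this is not how Sitecore Search implements validation.

```python
from html.parser import HTMLParser


class StaticH1Extractor(HTMLParser):
    """Collects <h1> text from raw HTML without executing any JavaScript.

    Hypothetical illustration: the <h1> -> "title" rule is an assumption,
    not Sitecore Search's extraction logic.
    """

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.h1_chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only text literally present in the markup reaches this method;
        # content a script would insert at runtime never appears here.
        if self._in_h1:
            self.h1_chunks.append(data)


def extract_title(html: str) -> dict[str, str]:
    """Return {"title": ...} if an <h1> has text in the raw HTML, else {}."""
    parser = StaticH1Extractor()
    parser.feed(html)
    parser.close()
    title = "".join(parser.h1_chunks).strip()
    return {"title": title} if title else {}


# Server-rendered page: the heading is already in the HTML response.
SERVER_RENDERED = "<html><body><h1>Product A</h1></body></html>"

# Typical single-page-app shell: the heading only exists after bundle.js runs.
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script src="/static/bundle.js"></script></body></html>'
)

print(extract_title(SERVER_RENDERED))  # {'title': 'Product A'}
print(extract_title(CLIENT_RENDERED))  # {}
```

The same rule finds the heading on the server-rendered page but returns nothing for the client-rendered shell, because the heading is created only when the browser runs the script. This is the kind of gap the note above describes.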
For websites that require JavaScript rendering:
- Configure your crawler with the appropriate rendering settings.
- Run the crawler with a low MAX_URLS value to test extraction logic.
- Review the indexed results to validate your configuration.