Configure an advanced web crawler

The Sitecore Search advanced web crawler crawls your content and adds it to an index. It can handle complex use cases, such as accessing source content that requires authentication, creating index documents in multiple languages, and using JavaScript to extract attribute values.

Important

Refer to detailed specifications of crawler types when choosing an indexing method.

Follow the best practices for indexing content and crawling updates.

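This walkthrough describes how to create a source, configure crawler settings, configure a trigger, create a document extractor, schedule the crawler, and publish updates to the source.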

Note

If your original content requires authentication before the crawler can access it, configure crawler authentication settings. For example, your original content might need a GUI-based user name and password, or an access token or key in the request header. You can do this at any point after source creation.
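
As a rough illustration of the request header case, the following sketch builds a request that carries an access token in a header. The URL, the token value, and the use of the `Authorization` header with a `Bearer` scheme are assumptions made for this example. Your site might expect a different header or key name, and in Sitecore Search you provide these values in the crawler authentication settings rather than in code.

```python
from urllib.request import Request

# Hypothetical values for illustration only.
CONTENT_URL = "https://www.example.com/protected/page"
ACCESS_TOKEN = "example-token"


def build_authenticated_request(url: str, token: str) -> Request:
    """Return a request that carries the token in a request header."""
    request = Request(url)
    # Assumption: the site accepts a bearer token in the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Building the request does not send anything over the network.
request = build_authenticated_request(CONTENT_URL, ACCESS_TOKEN)
print(request.get_header("Authorization"))
```

A crawler that sends this header with every request can reach pages that would otherwise return an authorization error.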

Create a source

To create a source:

  1. In the CONNECTOR drop-down list, click Web Crawler (Advanced).

Configure crawler settings

Web crawler settings include, among others, the allowed domains and the maximum number of URLs.

To configure crawler settings:

  1. On the menu bar, click Sources and then click the source you created.

  2. In the left pane, click Web Crawler Settings, and then in the Web Crawler Settings section, click Edit.

  3. Click Save.
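
To show the general idea behind an allowed-domains list and a URL limit, here is a minimal conceptual sketch. It is not how Sitecore Search implements these settings. The function name `select_urls`, the sample URLs, and the rule that an allowed domain also covers its subdomains are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def select_urls(candidates, allowed_domains, max_urls):
    """Keep URLs on an allowed domain, in order, up to max_urls."""
    selected = []
    seen = set()
    for url in candidates:
        if len(selected) >= max_urls:
            break
        host = urlparse(url).hostname or ""
        # Assumption: an allowed domain also covers its subdomains.
        on_allowed_domain = any(
            host == domain or host.endswith("." + domain)
            for domain in allowed_domains
        )
        if on_allowed_domain and url not in seen:
            seen.add(url)
            selected.append(url)
    return selected


urls = [
    "https://www.example.com/",
    "https://docs.example.com/guide",
    "https://other.test/page",
    "https://www.example.com/",
    "https://www.example.com/about",
]
print(select_urls(urls, allowed_domains=["example.com"], max_urls=2))
```

In this sketch, URLs on other domains are skipped, duplicates are ignored, and the crawl stops collecting URLs as soon as the limit is reached.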

Configure a trigger

Configuring a trigger for an advanced web crawler gives it a starting point to look for content to index. In addition to using the request trigger, as described in this procedure, you can use the following types of triggers:

Create a document extractor

A document extractor specifies how to extract attribute values for one or more entities.
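
The following sketch illustrates the idea of mapping attributes to XPath expressions. It uses Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath and requires well-formed markup, so treat it as a way to reason about expressions rather than as the crawler's own extraction engine. The sample page, the attribute names, and the `extract_document` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html>
  <head>
    <title>Product overview</title>
    <meta name="description" content="A short summary of the product." />
  </head>
  <body>
    <h1>Product overview</h1>
  </body>
</html>
"""

# Hypothetical attribute names mapped to XPath expressions.
RULES = {
    "title": ".//head/title",
    "description": ".//meta[@name='description']",
    "heading": ".//body/h1",
}


def extract_document(page: str, rules: dict) -> dict:
    """Build one index document by evaluating each XPath against the page."""
    root = ET.fromstring(page)
    document = {}
    for attribute, xpath in rules.items():
        node = root.find(xpath)
        if node is None:
            continue  # leave the attribute unset when nothing matches
        # Meta tags keep their value in the content attribute, not in text.
        document[attribute] = node.get("content") or (node.text or "").strip()
    return document


print(extract_document(PAGE, RULES))
```

Each rule pairs one attribute with one expression, and an attribute whose expression matches nothing is simply left out of the resulting document.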

In addition to the XPath document extractor type described in this procedure, you can use:

Schedule the crawler

To schedule the crawler:

Publish updates to the source

To start the first crawl and index items, publish the source. You must also publish the source again each time you change it.
