Identify the required source type
You can create different types of sources in Sitecore Search. Depending on your business requirements, you can use one, some, or all of them. However, crawling an entire site can affect site performance, so we recommend that you follow the best practices outlined in these topics.
The following table provides a quick way to identify which type of source you need based on item type and number of items to update:
Crawlable content | For indexing | For frequent new updates | For extensive changes |
---|---|---|---|
Yes | Scheduled crawler | Ingestion API | Reindex using a crawler |
No | API Push | Ingestion API | API Push |
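The table can also be read as a simple lookup. The following is a minimal sketch of that lookup in Python, not part of any Sitecore SDK: the `Scenario` enum, the `RECOMMENDED_SOURCE` dictionary, and the `recommend_source` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Scenario(Enum):
    """The three update scenarios from the table columns (hypothetical names)."""
    INDEXING = "For indexing"
    FREQUENT_UPDATES = "For frequent new updates"
    EXTENSIVE_CHANGES = "For extensive changes"


# Keys are (content is crawlable, scenario); values are the table cells.
RECOMMENDED_SOURCE = {
    (True, Scenario.INDEXING): "Scheduled crawler",
    (True, Scenario.FREQUENT_UPDATES): "Ingestion API",
    (True, Scenario.EXTENSIVE_CHANGES): "Reindex using a crawler",
    (False, Scenario.INDEXING): "API Push",
    (False, Scenario.FREQUENT_UPDATES): "Ingestion API",
    (False, Scenario.EXTENSIVE_CHANGES): "API Push",
}


def recommend_source(crawlable: bool, scenario: Scenario) -> str:
    """Return the table's recommendation for the given content and scenario."""
    return RECOMMENDED_SOURCE[(crawlable, scenario)]


if __name__ == "__main__":
    print(recommend_source(True, Scenario.FREQUENT_UPDATES))
```

Keeping the recommendations in a dictionary rather than in nested conditionals keeps the sketch aligned with the table one cell at a time.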
Search provides different types of crawlers. Each one is designed for a specific business requirement. The following lists provide more information on different source types.
Crawler sources
- If you have content for only one locale and language, and all the content is available on an HTML page, create a web crawler. The web crawler is usually able to cover all basic crawling requirements.
  We recommend starting with a web crawler and then converting it to an advanced web crawler if necessary.
  Note: To extract values for multiple entities in a single crawl, you need to convert the web crawler to an advanced web crawler. A web crawler can extract values for only one entity.
- If you create a web crawler but then reach a point where you need additional settings, convert it to an advanced web crawler. For example, you need the advanced web crawler to handle authentication requirements, use JavaScript expressions to extract attribute values, or index content in multiple languages.
- If your content can only be accessed by an API endpoint, and the endpoint returns JSON, use an API crawler.
- If you want to add a new index document to an existing index, or quickly update or delete existing index documents, use the Ingestion API.
  For example, you have an advanced web crawler that frequently crawls a website for blogs and adds them to an index. You get a new blog that you urgently need to make available to your visitors, and you cannot wait for the next scheduled crawl. Use the Ingestion API to add the blog.
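The crawler choices in this list can be summarized as a small decision function. This is only an illustrative sketch under stated assumptions: the `ContentProfile` fields and the `choose_crawler` function are hypothetical names, not Sitecore Search settings, and the checks cover only the criteria mentioned in the list above.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Hypothetical description of the content to index (illustrative only)."""
    api_only_json: bool = False         # content reachable only through a JSON API endpoint
    needs_authentication: bool = False  # pages sit behind authentication
    needs_js_extraction: bool = False   # attribute values need JavaScript expressions
    multiple_languages: bool = False    # more than one locale or language
    multiple_entities: bool = False     # more than one entity in a single crawl


def choose_crawler(profile: ContentProfile) -> str:
    """Pick a crawler type using the criteria from the list above."""
    if profile.api_only_json:
        return "API crawler"
    if (
        profile.needs_authentication
        or profile.needs_js_extraction
        or profile.multiple_languages
        or profile.multiple_entities
    ):
        return "Advanced web crawler"
    # Single locale and language, plain HTML, one entity: start simple.
    return "Web crawler"


if __name__ == "__main__":
    print(choose_crawler(ContentProfile(multiple_languages=True)))
```

The function falls through to the web crawler by default, which mirrors the recommendation to start with a web crawler and convert it only when additional settings are needed.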
API Push source
- If you want to exclusively push content to Sitecore Search, create an API push source. This creates an empty index. Then, a developer can use the Ingestion API to add index documents.
The Crawler specifications topic compares the various crawlers in a table.