Sources
Introduction to Sitecore Search sources, including what sources are and the types available.
A source defines the content you want Sitecore Search to access. When you create and configure a source, you define a starting point and rules that define which content to crawl and index. When your visitors search for content, Search searches the indexed documents, applies the Search algorithm, and shows relevant, personalized content. Without a source, Search cannot show your visitors any content.
Important
You create a source after a Search representative sets up your domain, but before you integrate Search with your website or mobile application.
To add a source, you create a Search advanced web crawler, a program that searches content and creates index documents. When you configure an advanced web crawler, you give Sitecore a starting point, called a trigger, like a sitemap or a link to an RSS feed, and define rules, such as how many levels of the URL directory structure you want the search to include.
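As a rough illustration of how a trigger and a depth rule work together, the following Python sketch models them with a small data class. The names CrawlerSketch, trigger_url, max_depth, url_depth, and in_scope are hypothetical and exist only for this example. They are not Sitecore Search settings or API fields, and the real crawler is configured in the product, not in code like this.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlerSketch:
    """Hypothetical stand-in for a crawler configuration (not Sitecore's schema)."""
    trigger_url: str  # starting point, for example a sitemap or RSS feed URL
    max_depth: int    # how many levels of the URL directory structure to include


def url_depth(url: str) -> int:
    """Count the non-empty path segments of a URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def in_scope(config: CrawlerSketch, url: str) -> bool:
    """Return True if the URL is on the trigger's host and within max_depth."""
    same_host = urlparse(url).netloc == urlparse(config.trigger_url).netloc
    return same_host and url_depth(url) <= config.max_depth


# Assumed example values; www.example.com is a placeholder domain.
config = CrawlerSketch(trigger_url="https://www.example.com/sitemap.xml", max_depth=2)
shallow = in_scope(config, "https://www.example.com/docs/search")
deep = in_scope(config, "https://www.example.com/docs/search/sources/crawler")
print(shallow, deep)
```

In this sketch, the two-level URL is in scope and the four-level URL is not, which mirrors the idea of limiting how deep into the directory structure the crawl goes.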
You do not upload any content into Search. Instead, Search creates index documents from the content it searches and uses them to show relevant, personalized content to your visitors.
You can create more than one source and combine documents from multiple sources into one search experience. You can also archive a source and restore it later.
The advanced web crawler source can crawl and index HTML pages, PDFs, and all Microsoft Office formats.
Sitecore Search uses Elasticsearch to handle indexing. When Search indexes content, it does two things based on the rules you define:
It creates an index document and adds it to an internal Elasticsearch index. An index document is a JSON object and is the base unit of storage. Search creates one index document from each URL or document. For example, a 1000-word HTML page becomes one document, and a 10-page PDF becomes one document.
It extracts attributes, such as title, description, and image_url, for each index document and stores them as metadata for that document. Later, when you use the Search and Recommendation API to create search experiences, you can use these attributes.
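To make the two steps above concrete, here is a minimal Python sketch that turns one HTML page into one JSON-style document with title, description, and image_url attributes. It uses only the standard library. The AttributeExtractor class, the build_index_document function, the url field, and the choice of the meta description and og:image tags as attribute sources are assumptions made for illustration. The sketch does not describe how Search itself extracts attributes or the exact shape of its index documents.

```python
import json
from html.parser import HTMLParser


class AttributeExtractor(HTMLParser):
    """Collect title, description, and image_url from an HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.attributes = {"title": "", "description": "", "image_url": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attr_map.get("content") or ""
            if attr_map.get("name") == "description":
                self.attributes["description"] = content
            elif attr_map.get("property") == "og:image":
                self.attributes["image_url"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> contributes to the title attribute.
        if self._in_title:
            self.attributes["title"] += data.strip()


def build_index_document(url: str, html: str) -> dict:
    """Build one document per URL, with extracted attributes as metadata."""
    parser = AttributeExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, **parser.attributes}


# Placeholder page and domain, assumed for this example.
page = """<html><head>
<title>Sources</title>
<meta name="description" content="Introduction to sources.">
<meta property="og:image" content="https://www.example.com/img/sources.png">
</head><body><p>One page, one index document.</p></body></html>"""

doc = build_index_document("https://www.example.com/sources", page)
print(json.dumps(doc, indent=2))
```

However long the page is, the function returns a single dictionary for the URL, which matches the one-document-per-URL rule described above.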
You can view documents and their attributes on the Catalog page of the Customer Engagement Console (CEC).
Note
When you index content for the first time, it takes a few minutes for the Catalog to refresh after the crawler completes indexing.