Configure a feed crawler
In Sitecore Search, a feed crawler is used to index data from documents such as:

- Text files containing delimiters.
- CSV files.
- JSON string files.
It systematically extracts data from these files and transforms it into searchable content within Sitecore Search. This process involves creating a feed crawler source, configuring document extractors, and setting up transformers to handle the data.
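For illustration, a delimited feed and its JSON equivalent might look like the following. The file name, column headers, and values are hypothetical examples, not fields that Search requires:

```
products.csv
id,name,price
101,Blue T-shirt,19.99
102,Red Hoodie,39.99
```

```json
[
  { "id": "101", "name": "Blue T-shirt", "price": "19.99" },
  { "id": "102", "name": "Red Hoodie", "price": "39.99" }
]
```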
If you select the sales feed crawler connector type, certain sales-specific configurations must also be present.
This walkthrough describes how to:

- Create a feed crawler source.
- Choose locales.
- Define tags.
- Create a document extractor.
- Add document and column transformers.
- Set incremental updates.
- Publish the source.

Note: Documents can only be uploaded to the feed crawler via Secure File Transfer Protocol (SFTP). To receive your SFTP credentials, contact Sitecore support.
Create a feed crawler source
The first step is to create a source. This is a connector that retrieves items from a specified website, document, or repository, and indexes them so they are searchable.
To create a source:
1. In the menu bar, click SOURCES.
2. Click Add Source to add a source.
3. In the SOURCE NAME field, enter a name for the source.
4. In the DESCRIPTION field, enter a few lines to describe the source you want to configure.
5. In the CONNECTOR drop-down list, click Feed Crawler.
6. Click Save. If there are no errors, Search creates a new source.
Choose locales
Locales specify the geographical and linguistic regions for which the data is relevant. Search uses locale values to ensure that data indexed by the feed crawler is appropriate for the language and regional settings of the target audience, to direct visitors to locale-specific pages, and to create locale-specific rules for all types of widgets.
To choose locales for your source:
1. On the menu bar, click Sources, then select your feed crawler source.
2. In the Source Settings menu, click Available Locales.
3. In the Locales drop-down list, select a locale that you have configured for the domain.
4. Click Save.
Define tags
Tags are used to create search experiences for specific entities. When you set up a source, the Tags Definition window is used to specify which entities will be updated by that source.
To use tags to update specific entities with this source:
1. On the menu bar, click Sources, then select your feed crawler source.
2. In the Source Settings menu, click Tags Definition.
3. In the Entity drop-down list, select an entity to be updated by this source.
4. To configure a basic tag, in the From drop-down list, click Tags.
5. In the Tags field, enter a name for the tag.
6. Click Save.
Create a document extractor
Document extractors process and extract data from the files uploaded to the feed crawler, and convert it into a structured format that can be indexed by Search. Document extractors come with default JavaScript logic to perform this conversion, but you can edit this logic to suit your needs if you require further configuration.
To create a document extractor:
1. On the menu bar, click Sources and then select your feed crawler source.
2. In the left pane, click Document Extractors and then, in the Document Extractors section, click Edit.
3. To create an extractor, on the Document Extractors page, click Add Extractor, and then do the following:
   - In the Basepath field, enter the file's basepath that is appended to /upload/. For example, enter files.
   - In the File Name field, enter the filename. For example, enter filename.json.
   - In the File Type drop-down menu, select CSV or JSON.
   - If the file is in GZip format, turn on the isGzip switch.

   Note: Uploading a document using SFTP automatically triggers the source.
4. In the Taggers section, click Add Tagger. Then, in the tag editor, select a tag in the Tag drop-down list. For example, select content.
5. In the Extraction Type field, select either Base or JavaScript from the extraction logic drop-down list.
6. To use existing column headers from the uploaded document as indexed attribute names, select Base from the drop-down list, enter the ID column header in the ID Field field, and enter the separator used in the CSV file in the Separator field.
7. To alter Search's default extractor logic, select JavaScript from the drop-down list, and edit the function in the JS Source window:

   ```javascript
   // Sample extractor function. Change the function to suit your individual needs.
   function extract(headerSegments, lineSegments) {
       const response = {}
       for (let i = 0; i < lineSegments.length; i++) {
           response[headerSegments[i]] = lineSegments[i]
       }
       return [response]
   }
   ```

   Note: If the function name in the Function Name field and the JS Source window do not match, an error occurs when the extractor is initiated.
8. In the tag editor, click Save.
9. (Optional) To extract attributes for another tag, click Add Tagger, click a tag in the Tag drop-down list, and repeat steps 5 through 8.
10. (Optional) To add another document extractor, repeat steps 3 through 9.
11. On the Document Extractors page, click Save.
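The JavaScript extraction logic can go beyond a one-to-one mapping of columns to attributes. The following sketch assumes a CSV feed with a hypothetical tags column whose pipe-separated values should be indexed as an array; the column names are illustrative only, not fields Search requires:

```javascript
// Custom extractor sketch: same signature as the default extract function.
// Assumes a CSV row where one hypothetical column, "tags", holds a
// pipe-separated list that should be indexed as an array.
function extract(headerSegments, lineSegments) {
    const response = {}
    for (let i = 0; i < lineSegments.length; i++) {
        // Normalize header names so attribute names are predictable.
        const header = headerSegments[i].trim().toLowerCase()
        const value = lineSegments[i]
        // Split the illustrative "tags" column into an array attribute;
        // copy every other column through unchanged.
        response[header] = header === 'tags' ? value.split('|') : value
    }
    return [response]
}
```

The default logic also returns a one-element array, which suggests an extractor can emit more than one document per input line by returning additional objects.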
Add document and column transformers
When you configure a feed crawler, Search uses transformers to index structured documents like CSV and JSON. Additional transformers can be added and configured to further alter data before indexing.
To add document and column transformers:
1. In the left pane, click Transformers and then, in the Transformers section, click Edit.
2. On the Transformers page, in the left pane, click Document Transformers.
3. In the Document Transformers pane, add a document transformer.
4. To add another document transformer, repeat step 3.
5. In the left pane, next to Column Transformers, click Column Transformer.
6. In the Column Transformers pane, add a column transformer.
7. To add another column transformer, repeat step 6.
8. Click Save.
Set incremental updates
Uploading a file via SFTP triggers the source, but you can also update your source through API requests, such as PATCH or PUT. To update the source with API requests, enable incremental updates.
To enable incremental updates to your source:
1. On the menu bar, click Sources and then select your feed crawler source.
2. In the Source Settings menu, click Incremental Updates.
3. Click the Enable Incremental Updates switch to enable the feature, then click Save.
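With incremental updates enabled, individual documents can be changed without re-uploading the whole feed. The sketch below only builds a hypothetical PATCH request: the endpoint URL, headers, and payload shape are assumptions for illustration, and the real contract must be taken from the Sitecore Search ingestion API reference.

```javascript
// Sketch: assembling an incremental PATCH update for one document.
// The host, path segments, auth header, and payload structure below
// are illustrative assumptions, not the documented Search API.
function buildIncrementalUpdate({ domainId, sourceId, entity, documentId, fields }) {
    return {
        url: `https://example-ingestion.sitecore.test/v1/domains/${domainId}` +
             `/sources/${sourceId}/entities/${entity}/documents/${documentId}`,
        method: 'PATCH',
        headers: {
            'Content-Type': 'application/json',
            Authorization: 'Bearer <api-key>', // placeholder credential
        },
        // Only the changed fields are sent in a PATCH-style update.
        body: JSON.stringify({ document: { fields } }),
    }
}

// Example: update a single attribute on an already indexed document.
const request = buildIncrementalUpdate({
    domainId: 'my-domain',
    sourceId: 'feed-source',
    entity: 'content',
    documentId: 'doc-101',
    fields: { price: 17.99 },
})
// The request could then be sent with fetch(request.url, request).
```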
Publish the source
You must publish the source to start the first scan and index.
To publish the source:
1. To open the Publish Source dialog, in the upper right corner of the source page, click Publish.