Configuring the xConnect Search Indexer

Abstract

An overview of Search Indexer configuration settings, including settings for different search collection providers.

This topic describes the configuration settings that affect the xConnect Search Indexer. The xConnect Search Indexer has five XML configuration files. Going from generic configuration to provider-specific configuration, these are: sc.XConnect.SearchIndexer.xml, sc.Xdb.Collection.IndexerSettings.xml, sc.Xdb.Collection.Data.Sql.xml, sc.Xdb.Collection.IndexWriter.AzureSearch.xml, and sc.Xdb.Collection.IndexWriter.SOLR.xml.

The xConnect Search Indexer is affected by the underlying provider’s change tracking retention period configuration.

In a single-machine on-prem deployment, indexer configuration is located under C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config.

Important

The xConnect Search Indexer does not use all of the configuration settings available in the \App_data\Config sub-folder. For example, the indexer does not use the settings in sc.Xdb.Collection.RepositorySettings.xml.

The following settings control how often the indexer checks for changes.

File path: C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\SearchIndexer\sc.XConnect.SearchIndexer.xml

Setting

Description

Frequency

Controls how often the indexer checks for changes.

For example, assuming a frequency of 0.25 seconds: if the indexer is busy for 0.20 seconds, it waits 0.05 seconds before checking again.

If the indexer is busy for longer than 0.25 seconds, it checks for changes immediately after indexing the previous set.

DelayAfterError

Controls how long the indexer waits before retrying after encountering an error.

DelayAfterRecurringError

Controls how long the indexer waits before retrying after hitting the RecurringErrorThreshold.

If an error occurs, increase this value to reduce the number of log entries.

RecurringErrorThreshold

Controls how many times an error can occur before the DelayAfterRecurringError setting takes effect.
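The timing rules in this table can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the indexer's actual implementation. The function names, the millisecond units, and the sample values are hypothetical. Treating the threshold as "reached when the error count is equal to or greater than RecurringErrorThreshold" is also an assumption.

```python
def next_check_delay_ms(frequency_ms: int, busy_ms: int) -> int:
    """Time to wait before the next check for changes.

    If indexing took less time than the configured frequency, wait for the
    remainder. If it took longer, check again immediately (wait 0).
    """
    return max(0, frequency_ms - busy_ms)


def retry_delay_s(consecutive_errors: int,
                  delay_after_error_s: int,
                  delay_after_recurring_error_s: int,
                  recurring_error_threshold: int) -> int:
    """Pick the retry delay based on how many errors have occurred in a row.

    Assumption: the longer delay applies once the number of consecutive
    errors reaches the threshold.
    """
    if consecutive_errors >= recurring_error_threshold:
        return delay_after_recurring_error_s
    return delay_after_error_s


if __name__ == "__main__":
    # Hypothetical values: a 250 ms frequency and a 200 ms indexing pass.
    print(next_check_delay_ms(250, 200))   # 50
    print(next_check_delay_ms(250, 300))   # 0
    # Hypothetical values: 5 s normal delay, 60 s after 3 errors in a row.
    print(retry_delay_s(1, 5, 60, 3))      # 5
    print(retry_delay_s(3, 5, 60, 3))      # 60
```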

The following settings control memory usage during index rebuild.

File path: C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\SearchIndexer\sc.Xdb.Collection.IndexerSettings.xml

Setting

Description

IncomingDataLagOnCompletion

At the end of an index rebuild, the indexer must catch up with incoming data. When the indexer is behind by less than the value specified by IncomingDataLagOnCompletion, the cores are swapped.

ParallelizationDegree

Controls how many parallel streams of data are processed during an index rebuild. Use together with BatchSize to tune memory usage during an index rebuild.

If the indexer is consuming too much memory, reduce this value.

If the indexer is not consuming very much memory, you can try increasing this value.

BatchSize

Controls how many contacts or interactions are loaded at once per parallel stream during an index rebuild. To tune memory usage during an index rebuild, use this together with ParallelizationDegree.

If an index rebuild is consuming too much memory, reduce this value.

If the indexer is not consuming very much memory, you can try increasing this value.

SplitRecordsThreshold

Controls the limit for the number of records loaded into memory (per core) by the indexer at any one time.

If the indexer is using too much memory, decrease this value.

To disable splitting, set the value to 0, a negative value, or remove the element completely.

SyncLatestChangesIntervalSec

Controls how often synchronization between the indexer and the database happens. Measured in seconds. The default value is 3600 seconds. To avoid losing data between syncs, we recommend that this value is always lower than the change tracking retention period. If the interval were longer than the retention period, changes could be purged before the indexer reads them.
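Because ParallelizationDegree and BatchSize are tuned together, it can help to reason about their product. The following Python sketch is a rough estimate, not a description of the indexer's internals. It assumes that the number of records held in memory during a rebuild is bounded by streams multiplied by batch size. The function names and the record budget are hypothetical.

```python
def max_records_in_flight(parallelization_degree: int, batch_size: int) -> int:
    """Rough upper bound on records loaded at once during a rebuild.

    Assumption: each parallel stream holds at most one batch in memory.
    """
    if parallelization_degree < 1 or batch_size < 1:
        raise ValueError("both settings must be positive")
    return parallelization_degree * batch_size


def batch_size_for_budget(parallelization_degree: int, record_budget: int) -> int:
    """Largest batch size that keeps the estimate within a record budget."""
    if parallelization_degree < 1 or record_budget < 1:
        raise ValueError("both arguments must be positive")
    return max(1, record_budget // parallelization_degree)


if __name__ == "__main__":
    # Hypothetical values: 4 streams of 1000 records each.
    print(max_records_in_flight(4, 1000))   # 4000
    # To stay under a hypothetical budget of 2000 records with 4 streams:
    print(batch_size_for_budget(4, 2000))   # 500
```

Actual memory use also depends on the size of individual contacts and interactions, so treat the estimate as a starting point and confirm it by monitoring the indexer.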


The following settings apply if you are using the SQL xDB Collection provider.

File path: C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\Collection\sc.Xdb.Collection.Data.Sql.xml

Setting

Description

NumberOfChangeVersions

Controls the number of transactions (not records) that the change table returns. For example, 500 transactions might include many more than 500 changes.

To prevent the indexer from falling behind and to keep the number of pending changes low, the default value is high (50000). However, high values increase the indexer’s memory requirements.

If the indexer is consuming too much memory, try decreasing the value of SplitRecordsThreshold. Reducing the NumberOfChangeVersions setting does not work as well, because only a small amount of data (IDs, facet keys, and types of change) is loaded from the change table.

GetChangesCommandTimeoutInSeconds

Controls the timeout for the get changes command. If you experience timeouts during indexing, increase this value.

CommandTimeoutInSeconds

Controls the timeout for collection provider commands such as GET and SAVE.

AbsoluteExpiration

The SQL provider caches metadata about sharded cluster configuration. This setting controls the expiration of the cache.
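The difference between transactions and changes for NumberOfChangeVersions can be illustrated with a small Python sketch. The data is made up for the example, and the function name is hypothetical. It only shows why limiting the number of change versions does not directly limit the number of changed records.

```python
from itertools import islice
from typing import Iterable


def changes_in_first_versions(changes_per_version: Iterable[int],
                              number_of_change_versions: int) -> int:
    """Total changes contained in the first N change versions (transactions)."""
    return sum(islice(changes_per_version, number_of_change_versions))


if __name__ == "__main__":
    # Hypothetical change table: each entry is the number of changed records
    # in one transaction.
    changes_per_version = [1, 40, 3, 250, 7]
    # Reading 3 transactions returns 44 changes, not 3.
    print(changes_in_first_versions(changes_per_version, 3))   # 44
```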

The following settings are specific to the Azure Search provider.

File path: C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\SearchIndexer\sc.Xdb.Collection.IndexWriter.AzureSearch.xml

Setting

Description

MaximumUpdateBatchSize

Controls the number of contacts or interactions that are sent in a single post. Cannot be larger than 1000. If large contacts or interactions lead to rejected posts, decrease this value.

ParallelizationDegree

If you use multiple Azure Search partitions, we recommend that you increase this value. Increasing this value causes higher memory consumption.

We recommend that you monitor memory usage of a busy indexer when changing this value. To monitor the memory usage, use these performance counters: IndexWriteAvgTime and IndexWriteAvgBatchSize.

Note

If you remove this setting from the configuration, the provider falls back to the number of cores available on the host machine.

MaximumRetryDelayMilliseconds

Controls the maximum amount of time the provider waits when Azure Search fails temporarily due to load.

If Azure Search consistently returns a 503 error, try increasing this value. A longer delay might give the service time to recover before the next retry.

We recommend that you analyze your Azure Search traffic early and frequently.

RetryCount

Controls how many times the provider retries before reporting an error to the indexer. When the provider exceeds this value, it logs an error. Modify together with MaximumRetryDelayMilliseconds.

MaximumWaitTimeoutMilliseconds

Controls the maximum time the indexer waits for changes to be indexed. If the limit is hit, the indexer retries indexing, and adds the following entry to the logs: "Waiting documents in index is failed by timeout." Recurring timeout issues indicate that Azure Search is under heavy load. If this happens, consider using Azure Search partitions or increasing the value of the setting.

DataReplicationTimeoutMilliseconds

If you are using Azure Search replicas, this setting controls the amount of time to wait after initially detecting changes. When this wait time has passed, the indexer signals to the xConnect Search service that changes are available to be searched. The delay allows time for data to propagate across replicas. There is currently no way to determine if data is available in all replicas, and it is therefore difficult to tune this setting.

If you are not using replicas, you can set this to 0.
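The effect of MaximumUpdateBatchSize can be pictured as splitting pending records into posts of a fixed maximum size. The Python sketch below is an illustration under that assumption, not the provider's code. The function name is hypothetical, and the only value taken from this topic is the limit of 1000.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Limit stated in this topic for the Azure Search provider.
AZURE_SEARCH_BATCH_LIMIT = 1000


def update_batches(records: Sequence[T],
                   maximum_update_batch_size: int) -> Iterator[Sequence[T]]:
    """Split records into consecutive batches, one batch per post."""
    if not 1 <= maximum_update_batch_size <= AZURE_SEARCH_BATCH_LIMIT:
        raise ValueError("batch size must be between 1 and 1000")
    for start in range(0, len(records), maximum_update_batch_size):
        yield records[start:start + maximum_update_batch_size]


if __name__ == "__main__":
    # 2500 hypothetical record IDs split with the maximum batch size.
    sizes: List[int] = [len(b) for b in update_batches(list(range(2500)), 1000)]
    print(sizes)   # [1000, 1000, 500]
```

Lowering the batch size produces more, smaller posts, which is why it can help when large contacts or interactions cause posts to be rejected.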

The following settings are specific to the Solr search provider.

File path: C:\<Path to xConnect>\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\SearchIndexer\sc.Xdb.Collection.IndexWriter.SOLR.xml

Setting

Description

MaximumUpdateBatchSize

Controls the number of contacts or interactions in a single post.

MaximumDeleteBatchSize

Controls the number of contact or interaction deletes in a single post. Although deletes usually affect IDs, Solr's OR clause limit might also affect the deletion of child documents.

MaximumCommitMilliseconds

Controls how fast data is soft committed in Solr with commitWithin.

ParallelizationDegree

If you are using multiple Solr replicas, consider increasing this value. Increasing this value causes higher memory consumption.

We recommend that you monitor memory usage of a busy indexer when changing this value. To monitor the indexer, use the performance counters IndexWriteAvgTime and IndexWriteAvgBatchSize.

Note

If you remove this setting from the configuration, the provider falls back to the number of cores available on the host machine.

MaximumRetryDelayMilliseconds

Controls the maximum amount of time the provider waits when Solr fails temporarily due to load.

If Solr consistently returns a 503 error, consider increasing this value. A longer delay might give Solr time to recover before the next retry.

RetryCount

Controls how many times the provider retries before reporting an error to the indexer. When the provider hits this retry count, the indexer logs an error. Modify together with MaximumRetryDelayMilliseconds.

Encoding

Controls the encoding for Solr posts. Only utf-8 has been tested.
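RetryCount and MaximumRetryDelayMilliseconds are meant to be adjusted together. The Python sketch below shows one common way such a pair can interact: a retry loop whose delay grows but never exceeds the maximum. This topic does not document the provider's actual retry schedule, so the exponential backoff, the 100 ms base delay, the TransientError type, and the function names are all assumptions made for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a temporary failure such as a 503 response."""


def capped_delays_ms(retry_count: int, maximum_retry_delay_ms: int,
                     base_delay_ms: int = 100) -> List[int]:
    """Delays before each retry: doubling, but capped at the maximum.

    Assumption: exponential backoff from a hypothetical 100 ms base.
    """
    return [min(base_delay_ms * 2 ** attempt, maximum_retry_delay_ms)
            for attempt in range(retry_count)]


def post_with_retries(post: Callable[[], T], retry_count: int,
                      maximum_retry_delay_ms: int,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call post(), retrying up to retry_count times on transient failures."""
    for delay_ms in capped_delays_ms(retry_count, maximum_retry_delay_ms):
        try:
            return post()
        except TransientError:
            sleep(delay_ms / 1000)
    # Final attempt: if this also fails, the error propagates to the caller.
    return post()


if __name__ == "__main__":
    # Hypothetical values: 5 retries with a 1000 ms maximum delay.
    print(capped_delays_ms(5, 1000))   # [100, 200, 400, 800, 1000]
```

In this sketch, raising the maximum delay only matters when the retry count is high enough for the delay to reach the cap, which is one reason to review both values at the same time.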