Index update strategies

Abstract

Describes the options you have for updating search indexes.

You use index update strategies to maintain indexes. You can configure each index with its own set of index update strategies. For performance reasons, we recommend that you specify no more than three update strategies per index. You can also update indexes manually, for example from the Indexing Manager or from custom code.

Sitecore provides a varied set of index update strategies, and you can extend this set with more strategies. All the strategies that are delivered with Sitecore are defined under the following node in the Sitecore.ContentSearch configuration files:

sitecore/contentSearch/indexConfigurations/indexUpdateStrategies
<manual type="Sitecore.ContentSearch.Maintenance.Strategies.ManualStrategy,Sitecore.ContentSearch" />

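Sitecore comes with the following strategies:

  • RebuildAfterFullPublish

  • OnPublishEndAsync

  • OnPublishEndAsyncSingleInstance

  • IntervalAsynchronous

  • Synchronous

  • RemoteRebuild

  • Manual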

Note

Some of these strategies use the CrawlingLog file. To enable messages in the CrawlingLog file, you must use a patch file to enable the DEBUG level in the Sitecore.Diagnostics.Crawling logger. For example:

<logger name="Sitecore.Diagnostics.Crawling" additivity="false">
    <level value="DEBUG"/>
    <appender-ref ref="CrawlingLogFileAppender"/>
</logger>
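
As a minimal sketch of such a patch file, the following changes only the level of the existing logger and leaves its appender untouched. It assumes a Sitecore version that keeps the log4net configuration under the sitecore node, so that it can be patched. In versions where log4net is configured in Web.config, you would edit the logger there instead. The file name and location (for example, a .config file under App_Config/Include) are assumptions:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <log4net>
      <!-- Matches the existing logger by its name attribute -->
      <logger name="Sitecore.Diagnostics.Crawling">
        <level>
          <!-- Overrides only the value attribute of the level element -->
          <patch:attribute name="value">DEBUG</patch:attribute>
        </level>
      </logger>
    </log4net>
  </sitecore>
</configuration>
```

DEBUG logging can produce large log files, so consider reverting the level after you have verified the strategy messages.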

The RebuildAfterFullPublish strategy is defined in the following way in the configuration file:

<rebuildAfterFullPublish type="Sitecore.ContentSearch.Maintenance.Strategies.RebuildAfterFullPublishStrategy,Sitecore.ContentSearch" />

During initialization, this strategy subscribes to the OnFullPublishEnd event. When the event is raised, the strategy triggers a full index rebuild.

In a distributed environment, the index rebuild is triggered on all remote servers where this strategy is configured. In this case, you must enable the event queue.

In environments where a full publish must run regularly, we recommend that you do not trigger incremental index rebuilds after each full publish, because this uses a lot of resources. Instead, use this strategy, which triggers a single full index rebuild when the full publish process has completed.

When you attach this strategy to an index, you see the following message in the CrawlingLog file when it is initialized:

Initializing RebuildAfterFullPublishStrategy for index '<index_name>'

When this strategy is triggered, you see the following message in the CrawlingLog file:

RebuildAfterFullPublishStrategy triggered on index '<index_name>'

Attaching the RebuildAfterFullPublish strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="folder">$(id)</param>
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
   </strategies>
   <Analyzer ref="search/analyzer" />
   ...
</index>

Best practice

You must not combine this strategy with the Synchronous Strategy, but you can combine it with any of the other strategies.

Because this strategy causes a full index rebuild, you must combine it with SwitchOnRebuildSolrSearchIndex.
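
A sketch of this combination is shown below, assuming that a second Solr core exists to receive the rebuild. The rebuildcore parameter and the $(id)_rebuild core name follow the pattern commonly used with SwitchOnRebuildSolrSearchIndex. Treat both as assumptions and check them against the index definitions in your own installation:

```xml
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <!-- Assumed name of the secondary core that is rebuilt and then swapped in -->
   <param desc="rebuildcore">$(id)_rebuild</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
   </strategies>
   ...
</index>
```

With this arrangement, searches continue to use the primary core while the full rebuild runs on the secondary core.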

The OnPublishEndAsync strategy is defined in the following way in the configuration file:

<onPublishEndAsync type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">web</param>
   <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsync>

During initialization, this strategy subscribes to the OnPublishEnd event. When the event is raised, the strategy triggers an incremental index update.

If you have separate CM and CD servers, this event is triggered via the EventQueue object. This means that you must enable the EventQueue object for this strategy to work in this kind of environment.

Note

An additional database parameter is passed to the constructor of the OnPublishEndAsynchronousStrategy class. This parameter defines the database in which the strategy looks up item changes.

When you attach this strategy to an index and it is initialized, you see the following message in the CrawlingLog file:

Initializing OnPublishEndAsynchronousStrategy for index '<index_name>'.

When this strategy is triggered, you see the following message in the CrawlingLog file:

"<index_name> OnPublishEndAsynchronousStrategy executing."

Processing

This strategy uses the EventQueue object from the database it was initialized with:

<param desc="database">web</param>

This means that this strategy depends on a number of things:

  • This database must be specified in the <databases /> section of the configuration file.

  • The EnableEventQueues setting must be true.

  • The EventQueue table within the preconfigured database must have entries that are dated later than the last update timestamp of the index.
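
As a sketch, you can make the EnableEventQueues requirement explicit with a patch file like the following. The file name and location are assumptions, and in many installations the setting is already true, in which case this patch changes nothing:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Event queues must be enabled for the strategy to receive remote events -->
      <setting name="EnableEventQueues">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```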

If the number of unprocessed events related to item changes exceeds a threshold, then a full index rebuild is triggered instead of an incremental update.

Events related to item changes are:

  • RemovedVersionRemoteEvent

  • SavedItemRemoteEvent

  • DeletedItemRemoteEvent

  • MovedItemRemoteEvent

  • AddedVersionRemoteEvent

  • CopiedItemRemoteEvent

  • RestoreItemCompletedEvent

Unprocessed events are item change events that have a stamp value higher than the value of the LAST_UPDATED_TIMESTAMP property. This property is stored in the system properties table that is determined by the defaultStore attribute of the PropertyStoreProvider setting. The property is unique per search index and instance, for example: CORE_SITECORE_MASTER_INDEX_MyMachineName-MySite.local_LAST_UPDATED_TIMESTAMP.

The threshold value is set by the ContentSearch.FullRebuildItemCountThreshold setting and is shared by all index update strategies. The setting is hidden: it is not available in the configuration, but you can add it manually. The default value of the setting is 100,000.

The optimal value for the threshold depends on:

  • The total number of documents in a search index. For example, if a search index contains 50,000 documents then the threshold value can be set to 25,000.

  • The ratio between add and remove operations (an update is equivalent to remove and then add). You can lower the threshold if remove operations are more frequent than add or update operations.

If there are many operations, consider whether it is faster to build the index from scratch (using add operations) than to process all the delete, add, and update operations separately.
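
Because the setting is not present in the delivered configuration, you add it yourself. The following sketch adds it in a patch file and uses the 25,000 value from the 50,000-document example above. The file name and location are assumptions, and the value is illustrative rather than a recommendation:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Shared by all index update strategies; replaces the default of 100,000 -->
      <setting name="ContentSearch.FullRebuildItemCountThreshold" value="25000" />
    </settings>
  </sitecore>
</configuration>
```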

The check for the threshold value can be disabled for each strategy: <CheckForThreshold>false</CheckForThreshold>. If you set this setting to true, we recommend that you also use the SwitchOnRebuildSolrSearchIndex implementation for any index that uses this strategy.

The value of the ContentSearch.FullRebuildItemCountThreshold setting has a default of 100,000.

Attaching the OnPublishEndAsync strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
   </strategies>
   ...
</index>

Best practice

Do not combine this strategy with any of these strategies:

  • Synchronous

  • IntervalAsynchronous

  • OnPublishEndAsyncSingleInstance

You can combine it with these strategies:

  • RebuildAfterFullPublish

  • RemoteRebuild

You can use this strategy for multiserver/multi-instance environments, where you have already enabled the EventQueue.
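
The permitted combinations can be sketched as a single strategies list. This example attaches OnPublishEndAsync together with the two strategies it can be combined with, which also stays within the recommended limit of three strategies per index. The index id is only an example:

```xml
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   ...
   <strategies hint="list:AddStrategy">
      <!-- Incremental updates after each publish -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      <!-- Full rebuild after a full publish -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/rebuildAfterFullPublish" />
      <!-- Rebuild when the same index is rebuilt on another instance -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   ...
</index>
```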

The OnPublishEndAsyncSingleInstance strategy is defined in the following way in the configuration file:

<onPublishEndAsyncSingleInstance type="Sitecore.ContentSearch.Maintenance.Strategies.OnPublishEndAsynchronousSingleInstanceStrategy, Sitecore.ContentSearch" singleInstance="true">
  <param desc="database">web</param>
  <CheckForThreshold>true</CheckForThreshold>
</onPublishEndAsyncSingleInstance>

Processing

Like the OnPublishEndAsync strategy, this strategy is triggered by the OnPublishEnd event. It launches an incremental update operation for item modifications as determined by the event queue.

The key difference between these strategies is that when the OnPublishEndAsyncSingleInstance strategy is triggered, it retrieves the event records only once and reuses them for all indexes it is attached to, while the OnPublishEndAsync strategy retrieves the records individually for each index.

This different behavior reduces the load on the database and decreases the resource consumption by the Sitecore instance.

Attaching the OnPublishEndAsyncSingleInstance strategy to an index

You attach this strategy to an index in the following way:

<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
    ...
   <strategies hint="list:AddStrategy">
        <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance" />
    </strategies>
        ...
</index>

Best practice

We recommend that you use the onPublishEndAsyncSingleInstance strategy if you have multiple indexes that cover the same database with published content.

Do not combine this strategy with any of these strategies:

  • Synchronous

  • IntervalAsynchronous

  • OnPublishEndAsync  

You can combine it with these strategies:

  • RebuildAfterFullPublish

  • RemoteRebuild
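
To illustrate the recommendation, the following sketch attaches the same strategy to two indexes that cover the web database, so that both share a single retrieval of the event records. The second index id, custom_web_index, is hypothetical:

```xml
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   ...
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance" />
   </strategies>
   ...
</index>
<!-- Hypothetical second index on the same published database -->
<index id="custom_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   ...
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsyncSingleInstance" />
   </strategies>
   ...
</index>
```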

The IntervalAsynchronous strategy is defined in the following way in the configuration file:

<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">master</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncMaster>

  • The database parameter specifies the database in which the strategy looks up item changes to process.

  • The interval parameter specifies how often the strategy is triggered.

When you attach this strategy to an index and it is initialized, you can see the following message in the CrawlingLog file:

Initializing IntervalAsynchronousUpdateStrategy for index '<index_name>'.

When this strategy is triggered, you can see the following message in the CrawlingLog file:

IntervalAsynchronousUpdateStrategy triggered on index '<index_name>'

Processing

This strategy is triggered by a time interval and not the OnPublishEnd event. It uses the EventQueue table of the source database. The source database is specified by the database parameter of the strategy. For example:

<param desc="database">web</param>

The preconditions for using this strategy are:

  • The EnableEventQueues setting must be true.

  • The referenced database must be defined in the <databases> configuration section.

  • The referenced database must match at least one database that is defined in a search index to be crawled.

The strategy uses an internal timer that is initialized with a predefined interval value. The strategy is triggered when the timer fires. In this example, the timer is set to fire every 10 seconds:

<intervalAsync type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">web</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsync>

The threshold value is set by the ContentSearch.FullRebuildItemCountThreshold setting and is shared by all index update strategies. The setting is hidden: it is not available in the configuration, but you can add it manually. The default value of the setting is 100,000.

The optimal value for the threshold depends on:

  • The total number of documents in a search index. For example, if a search index contains 50,000 documents then the threshold value can be set to 25,000.

  • The ratio between add and remove operations (an update is equivalent to remove and then add). You can lower the threshold if remove operations are more frequent than add or update operations.

If there are many operations, consider whether it is faster to build the index from scratch (using add operations) than to process all the delete, add, and update operations separately.

The check for the threshold value can be disabled for each strategy: <CheckForThreshold>false</CheckForThreshold>. If you set this setting to true, we recommend that you also use the SwitchOnRebuildSolrSearchIndex implementation for any index that uses this strategy.

The ContentSearch.FullRebuildItemCountThreshold setting is not enabled in the configuration files that Sitecore delivers. It defaults to 100,000.

Attaching the IntervalAsynchronous strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsync" />
   </strategies>
   ...
</index>

Best practice

Do not combine this strategy with these strategies:

  • Synchronous

  • OnPublishEndAsync

  • OnPublishEndAsyncSingleInstance

You can combine it with these strategies:

  • RebuildAfterFullPublish

  • RemoteRebuild

We recommend that you use this strategy for the master database indexes and for single-server environments where you want to use as few resources as possible.

This strategy is also useful for less critical indexes that you do not need to update frequently. You can adjust the interval to fit your needs.
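
As a sketch of adjusting the interval for a less critical index, you could define an additional strategy node with a longer interval and reference it from that index. The node name intervalAsyncWebSlow and the five-minute interval are hypothetical. The type and parameters are the same as in the delivered definitions:

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <indexUpdateStrategies>
          <!-- Hypothetical strategy: checks the web database every five minutes -->
          <intervalAsyncWebSlow type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
            <param desc="database">web</param>
            <param desc="interval">00:05:00</param>
            <CheckForThreshold>true</CheckForThreshold>
          </intervalAsyncWebSlow>
        </indexUpdateStrategies>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

An index would then reference it with ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsyncWebSlow" in its strategies list.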

This strategy is created for the core and master databases in the setup that Sitecore delivers:

<intervalAsyncCore type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">core</param>
   <param desc="interval">00:01:00</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncCore>
<intervalAsyncMaster type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
   <param desc="database">master</param>
   <param desc="interval">00:00:10</param>
   <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncMaster>

The Synchronous strategy is the index update strategy closest to real-time. It is also the most expensive strategy in terms of CPU and I/O.

Before you use this strategy, you must be familiar with the best practices.

You specify this strategy in the following way:

<sync type="Sitecore.ContentSearch.Maintenance.Strategies.SynchronousStrategy, Sitecore.ContentSearch" />

When you attach this strategy to an index and it is initialized, you see the following message in the CrawlingLog file:

Initializing SynchronousStrategy for index '<index_name>'.

When this strategy is triggered, you see this message in the CrawlingLog file:

SynchronousStrategy triggered on index '<index_name>'

Processing

This strategy subscribes to low-level DataEngine events, such as ItemSaved and ItemSavedRemote. When you use it on a single-server instance, it guarantees an index update immediately after an item update.

In a multiserver environment, the strategy relies on the EventQueue, which broadcasts ItemSavedRemote events. When an item is published and the ItemSavedRemote event is raised, the strategy is triggered.

Attaching the Synchronous strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/sync" />
   </strategies>
   ...
</index>

Best practice

Use this strategy if you need immediate index updates and you have a dedicated indexing server infrastructure that has plenty of processing resources. Only use the Synchronous strategy on CM servers for the indexes that process the master database and where the timing of the index update is critical.

If you use this strategy on a CM server where many entries are added and changed, it can degrade system performance severely. In most cases, the IntervalAsynchronous strategy configured for the master database is sufficient.

Any changes that occur in the BulkUpdateContext are not processed by this strategy, and a full index rebuild is required to bring the search index back in sync. If you use BulkUpdateContext on a regular basis, we recommend that you use asynchronous strategies.

You can only combine this strategy with the following strategy:

  • RemoteRebuild

The strategy has these prerequisites:

  • This strategy does not require the EventQueue to be enabled when the strategy is used on the same instance that the item changes occur on. For example, if your solution only has a single CM instance, the Synchronous strategy can be used to process changes in the master database. However, if you have multiple CM instances, the EventQueue must be enabled to share events across the different instances.
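
A sketch of the only permitted combination, for an index that processes the master database on a CM server, could look like this. The index id is only an example:

```xml
<index id="sitecore_master_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   ...
   <strategies hint="list:AddStrategy">
      <!-- Immediate index update on item changes -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/sync" />
      <!-- Rebuild when the same index is rebuilt on another instance -->
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   ...
</index>
```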

The RemoteRebuild strategy subscribes to the OnIndexingEndedRemote event. This event is triggered when a particular index is rebuilt. The strategy is only activated when a full index rebuild takes place.

You use this mechanism to rebuild remote indexes when you force an index rebuild. You specify this strategy like this:

<remoteRebuild type="Sitecore.ContentSearch.Maintenance.Strategies.RemoteRebuildStrategy, Sitecore.ContentSearch" />

Attaching the RemoteRebuild strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
   </strategies>
   ...
</index>

Best practice

You can combine this strategy with any other strategy. You use it in multiserver environments, where each Sitecore instance maintains its own copy of the index. You can then trigger a full rebuild from one CM server, and all remote servers where the index is configured with this strategy will rebuild.

The strategy has these prerequisites:

  • The name of the index on the remote server must be identical to the name of the index that you forced to rebuild.

  • You must enable the EventQueue.

  • The database you assign for system event queue storage (core by default) must be shared between the Sitecore instance where the rebuild takes place and the other instances.

The Manual strategy disables any automatic index updates. When you use this strategy for an index, you must rebuild this index manually.

You specify this strategy like this:

<manual type="Sitecore.ContentSearch.Maintenance.Strategies.ManualStrategy, 
          Sitecore.ContentSearch" />

When you attach this strategy to an index and it is initialized, you see the following message in the CrawlingLog file:

Initializing ManualStrategy for index '<index_name>'.

Index will have to be rebuilt manually

Attaching the Manual strategy to an index

Attach this strategy to an index in the following way:

<index id="sitecore_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
   <param desc="name">$(id)</param>
   <param desc="core">$(id)</param>
   <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
   <strategies hint="list:AddStrategy">
      <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/manual" />
   </strategies>
   ...
</index>

Best practice

Do not combine this strategy with any other strategy. It is reserved for special situations where you have to outsource the whole indexing process to a dedicated server and you do not want any index updates on other Sitecore instances.