Multiple indexes (sharding)
Index sharding is a process that splits the documents in an index into smaller partitions. These smaller partitions are called shards. The result is that instead of all documents being in one large index, documents are distributed between shards. The sharding process has logic (the "sharding strategy") that decides how the documents are allocated to the shards.
One index satisfies the needs of most Sitecore solutions, but multiple indexes offer better scaling when needed.
Sharding and Solr
When you use Solr, Sitecore does not handle the sharding. Instead, the SolrCloud feature of the Solr application handles the sharding.
Solr can automatically assign documents to shards and it has extra features, such as replicated shards. Replicated shards are useful for handling failure and failover scenarios.
The Sitecore implementation of Solr handles a sharded endpoint in the same way it handles an unsharded endpoint. You do not need any extra configuration to work with Solr sharded indexes.
Sitecore does not fully support failover. Specifically, Sitecore (as a Solr client) cannot switch between Solr servers (Solr replicas) if the current server (leader) goes down.
For more information about configuring SolrCloud, see https://cwiki.apache.org/confluence/display/solr/SolrCloud
Sharding and Lucene
When you use Lucene, the data from each of the three Sitecore databases (master, web, and core) is, by default, stored in a single search index. As your search index grows, you can implement a sharding strategy to store the data from each database in its own separate search index.
You can also shard in other ways. For example, you can have a separate index for the media library.
If you use buckets and have thousands or millions of items, sharding is an approach you can use if you want to continue using Lucene. If your search indexes continue to grow and become too large for this strategy, you should switch to using Solr.
If you use sharding, you must turn off the other Lucene configuration files because leaving these enabled will create redundant indexes.
Configure multiple search indexes
Sitecore provides the following example configuration files that help you create an index for each database:
Sitecore.ContentSearch.Lucene.Indexes.Sharded.Core.config.example
Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config.example
Sitecore.ContentSearch.Lucene.Indexes.Sharded.Web.config.example
These files are stored in the wwwroot\<site name>\App_Config\Include\Examples folder.
If these example files do not split the indexes finely enough, you can change the configuration to fit your needs.
Use the following code sample and table to see what you need to add:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="sitecore_core_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes the index property store. The id has to be set to the index id. -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these strategies controls the execution order. -->
              <strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncCore" />
            </strategies>
            <commitPolicy hint="raw:SetCommitPolicy">
              <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
            </commitPolicy>
            <commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
              <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
                <Database>core</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
Name | Description | Example |
---|---|---|
<Root> | Specify the root node of the content tree to be included in the index. | /sitecore |
id | Name of the search index. | sitecore_core_index |
<Database> | Database name. | core |
<strategies> | List of index strategies to run. | intervalAsyncCore |
<commitPolicy> | Controls when the index commits what it has in memory or in temporary files to disk. This can be time based or document count based. | TimeIntervalCommitPolicy |
<commitPolicyExecutor> | The class that executes the commit. | CommitPolicyExecutor |
Index context switcher
If you use sharding, Sitecore uses the <Root> element in relation to the Context.Item to determine which index to use. This index switching is automatic.
The more specific your <Root> is, the higher it needs to be listed in the configuration file. The index context switcher uses the indexes in the order that they are listed.
For example, an index with a <Root> element of /sitecore/content/Home must be listed below an index with a <Root> element of /sitecore/content/Home/Flights:
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <indexes hint="list:AddIndex">
          <index id="sitecore_core_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes the index property store. The id has to be set to the index id. -->
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these strategies controls the execution order. -->
              <strategy ref="contentSearch/indexUpdateStrategies/intervalAsyncCore" />
            </strategies>
            <commitPolicy hint="raw:SetCommitPolicy">
              <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
            </commitPolicy>
            <commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
              <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
                <Database>core</Database>
                <Root>/sitecore</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
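The ordering rule for <Root> elements can be illustrated with a small first-match lookup. This is only a conceptual sketch in Python, not Sitecore's implementation: the index ids flights_index and home_index and the function names are hypothetical, and the matching rule (an item belongs to a root if its path equals the root or lies beneath it) is an assumption made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical (index id, <Root>) pairs, in configuration-file order.
INDEXES: List[Tuple[str, str]] = [
    ("flights_index", "/sitecore/content/Home/Flights"),
    ("home_index", "/sitecore/content/Home"),
]


def is_under(item_path: str, root: str) -> bool:
    """True if item_path is the root itself or one of its descendants."""
    root = root.rstrip("/")
    return item_path == root or item_path.startswith(root + "/")


def pick_index(item_path: str, indexes: List[Tuple[str, str]] = INDEXES) -> Optional[str]:
    """Return the first listed index whose root contains the item."""
    for index_id, root in indexes:
        if is_under(item_path, root):
            return index_id
    return None


print(pick_index("/sitecore/content/Home/Flights/LHR"))  # flights_index
print(pick_index("/sitecore/content/Home/About"))        # home_index
# With the order reversed, the broader root shadows the more specific one.
print(pick_index("/sitecore/content/Home/Flights/LHR", INDEXES[::-1]))  # home_index
```

Because the lookup stops at the first match, listing the broader /sitecore/content/Home root first would mean the Flights index is never selected.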
Default sharding strategy
Sitecore provides a default sharding strategy called the LucenePartitionShardingStrategy. This strategy takes a document and calculates a hash of the ID to determine which shard to put it into. This hashing is very fast and does not rely on any shared state or ID generation. This approach does not give you a completely even distribution (for example, 100 documents are not necessarily split exactly 50/50 between two shards), but it improves performance considerably.
This strategy has only one option: the shardDistribution parameter, which specifies how many shards the index is split into. You must set this parameter to a power of 2 (2, 4, 8, 16, …).
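The idea of hash-based partitioning can be sketched in a few lines of Python. This is a conceptual illustration only, not the code of LucenePartitionShardingStrategy: the function name shard_for, the use of SHA-256 as the hash, and the bitmask step are all assumptions. The bitmask shows one common reason a shard count is restricted to powers of 2; the source does not say whether that is Sitecore's reason.

```python
import hashlib
import uuid
from collections import Counter


def shard_for(item_id: uuid.UUID, shard_distribution: int) -> int:
    """Map an item ID to a shard number in range(shard_distribution)."""
    # A power of 2 has exactly one bit set, so n & (n - 1) is 0.
    if shard_distribution < 2 or shard_distribution & (shard_distribution - 1):
        raise ValueError("shard_distribution must be a power of 2 (2, 4, 8, ...)")
    # SHA-256 is a stand-in hash for this sketch, not the hash Sitecore uses.
    digest = hashlib.sha256(item_id.bytes).digest()
    value = int.from_bytes(digest[:8], "big")
    # For a power-of-2 count, masking the low bits equals value % shard_distribution.
    return value & (shard_distribution - 1)


# 100 hypothetical item IDs over 2 shards: usually close to even,
# but not necessarily exactly 50/50.
ids = [uuid.UUID(int=i) for i in range(100)]
counts = Counter(shard_for(item_id, 2) for item_id in ids)
print(sorted(counts.items()))
```

The shard number depends only on the document ID and the shard count, so no shared state or coordination is needed to decide where a document goes.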
Create a new sharding strategy
If the default strategy is not what you need, you can implement your own strategy. You do this by implementing the Sitecore.ContentSearch.Sharding.IShardingStrategy interface and passing the implementation into the index.
You should rebuild your index after applying a strategy. It is not essential, but it will give the index a more even distribution of documents.
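As a rough picture of what a custom strategy decides, the following Python sketch routes documents by content path instead of by hash. It is purely conceptual: the ShardingStrategy protocol, its shard_for method, and the MediaLibraryStrategy class are hypothetical and do not mirror the members of IShardingStrategy, which are not described here. It also assumes item paths can be compared case-insensitively.

```python
from typing import Protocol


class ShardingStrategy(Protocol):
    """Hypothetical stand-in for a strategy; not the IShardingStrategy API."""

    def shard_for(self, item_path: str) -> int:
        ...


class MediaLibraryStrategy:
    """Send media library items to shard 1 and everything else to shard 0."""

    MEDIA_ROOT = "/sitecore/media library"

    def shard_for(self, item_path: str) -> int:
        path = item_path.lower()
        in_media = path == self.MEDIA_ROOT or path.startswith(self.MEDIA_ROOT + "/")
        return 1 if in_media else 0


strategy: ShardingStrategy = MediaLibraryStrategy()
print(strategy.shard_for("/sitecore/media library/Images/logo"))  # 1
print(strategy.shard_for("/sitecore/content/Home"))               # 0
```

A rule like this changes where documents belong, which is why documents indexed under an earlier strategy only move to their new shard when the index is rebuilt.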