Batch aggregation

Current version: 9.3

In the Sitecore Experience Database (xDB), the aggregation process groups and reduces live or historical data from the collection database so that it can be used by the reporting database and Sitecore reporting applications. In a scalable xDB architecture, aggregation is usually performed on one or more dedicated Processing servers.

When you configure a processing server for aggregation, you can specify the number of agents or threads that you want to run concurrently. Batch aggregation enables you to group interactions together into batches to improve the performance and throughput of the aggregation process.

Batch processing:

  • Makes better use of SQL Server resources.

  • Processes more interactions in fewer SQL Server transactions.

  • Can improve the performance of your existing aggregation framework.

  • Can reduce network traffic.

  • Means that fewer database input/output operations are required to process interactions.

In xDB, batch aggregation is part of the standard Sitecore installation. The default number of interactions that can be processed in a single batch has been chosen so that, for each transaction, the cost and execution time per row is low. However, solutions vary, and you may need to configure the batch aggregation settings to suit your own requirements.

You can change the default number of interactions that are processed in each batch by using the MaximumBatchSize setting, which you can set separately for live and history aggregation.

Note

Do not change the configuration files directly. Instead, create a custom configuration patch file that applies the required changes at run time.

Batch aggregation components

When you use batch aggregation, there are several components containing settings that you can change to improve the performance of your solution.

The batch aggregation agent

The batch aggregation agent is a background service that you can schedule to run at regular intervals to process live interactions. Each time it runs, it gathers a batch of interactions from the collection database and runs them through the batch aggregator. When the aggregator has finished, it marks each interaction that it has processed as complete and reschedules any interactions that have failed.

You can configure the batch aggregation agent using the Sitecore.Analytics.Processing.Aggregation.Services.config file.

The following example shows the default batch aggregation configuration:

<aggregator type="Sitecore.Analytics.Aggregation.InteractionAggregationAgent, Sitecore.Analytics.Aggregation">
    <param desc="xdbContextFactory" type="Sitecore.Analytics.Aggregation.XConnect.DefaultXdbContextFactory, Sitecore.Analytics.Aggregation" />
    <param desc="context" ref="aggregation/aggregatorContexts/interaction/live" />
    <param desc="aggregator" type="Sitecore.Analytics.Aggregation.BatchOptimizedInteractionBatchAggregator, Sitecore.Analytics.Aggregation" singleInstance="true" resolve="true">
        <MultiplexingTimeout>0.00:00:01</MultiplexingTimeout>
    </param>
    <param desc="dateTimeStrategy" ref="aggregation/dateTimePrecisionStrategy" />
    <param desc="maximumBatchSize" type="Sitecore.Analytics.Core.ConfigurationHelper, Sitecore.Analytics.Core" factoryMethod="ToShort" arg0="64" />
</aggregator>

You can change the following settings in the configuration file for the batch aggregation agent:

Configuration node: Description

  • Context: Specify the path or location of the data that you want to aggregate and the location where you want the results saved.

  • Aggregator: Specify the batch aggregator that you want to use to process interactions.

  • MaximumBatchSize: Specify the maximum number of interactions to include in a single batch.
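
As a minimal sketch of the patch-file approach, the following fragment raises the batch size for live aggregation without editing the default file. It assumes that the aggregator element is located at sitecore/aggregation/aggregator (inferred from ref paths such as aggregation/aggregatorContexts/interaction/live). The file itself is hypothetical and the value 128 is purely illustrative, not a recommendation. Check the element path against your own merged configuration before you rely on it.

```xml
<!-- Hypothetical custom patch file. The element path and the value are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <aggregator>
        <!-- Match the existing parameter by its desc attribute and override arg0. -->
        <param desc="maximumBatchSize">
          <patch:attribute name="arg0">128</patch:attribute>
        </param>
      </aggregator>
    </aggregation>
  </sitecore>
</configuration>
```

Because the patch only touches the arg0 attribute, the remaining parameters of the agent keep their default values.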

The batch aggregator

The batch aggregator takes one or more interactions at a time, runs the aggregation pipeline for that batch or for one interaction at a time depending on the processor type, and combines the aggregated data into a larger data set. The combined data set is then saved back to the reporting database.

Note

Sitecore 9.3 and later versions support two types of interaction aggregation processors: the classic processor, which handles one interaction at a time, and the batch processor, which iterates over the entire batch of interactions. Although classic processors remain supported for backward compatibility, we recommend that you upgrade to batch processors.

The multiplexer

The multiplexer reduces the number of requests made to the reporting database by combining the output of individual aggregation threads into a single batch or data set, which can then be saved more efficiently to the reporting database. This can significantly reduce the amount of traffic sent across the network.

The MultiplexingTimeout configuration setting enables you to specify the maximum time that you want the multiplexer to wait for other batch aggregators before saving the data set.
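
The following sketch shows one way that the timeout could be overridden for the live agent in a custom patch file. It assumes the same sitecore/aggregation/aggregator element path as the default configuration suggests, and it assumes that the patch replaces the text of the matching MultiplexingTimeout element. The two-second value is illustrative only.

```xml
<!-- Hypothetical custom patch file. The element path and the value are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <aggregator>
        <!-- Match the batch aggregator parameter by its desc attribute. -->
        <param desc="aggregator">
          <!-- Same d.hh:mm:ss format as the default value of 0.00:00:01. -->
          <MultiplexingTimeout>0.00:00:02</MultiplexingTimeout>
        </param>
      </aggregator>
    </aggregation>
  </sitecore>
</configuration>
```

A longer timeout gives the multiplexer more time to collect data from other batch aggregators before saving, at the cost of a longer wait before each save.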

The Microsoft SQL Server reporting storage provider now supports storing batches of aggregation data sets, and its robustness has been improved. It has been optimized to save large data sets in a single transaction while at the same time minimizing the use of system resources.

The history worker

The history worker agent enables you to rebuild the reporting database, and it also comes with support for batch aggregation.

You can configure the history worker agent using the Sitecore.Analytics.Processing.Aggregation.Services.config file.

The history worker uses the same parameters as the live batch aggregation agent: MultiplexingTimeout and MaximumBatchSize.

The following example shows the default configuration for the history worker:

<historyWorker type="Sitecore.Analytics.Aggregation.Data.Processing.InteractionHistoryWorker, Sitecore.Analytics.Aggregation">
    <param desc="aggregator" type="Sitecore.Analytics.Aggregation.BatchOptimizedInteractionBatchAggregator, Sitecore.Analytics.Aggregation" singleInstance="true" resolve="true">
        <MultiplexingTimeout>0.00:00:01</MultiplexingTimeout>
    </param>
    <param desc="historyTaskManager" ref="aggregation/historyTaskManager" />
    <param desc="aggregatorContext" ref="aggregation/aggregatorContexts/interaction/history" />
    <param desc="dateTimePrecisionStrategy" ref="aggregation/dateTimePrecisionStrategy" />
    <param desc="maximumBatchSize" type="Sitecore.Analytics.Core.ConfigurationHelper, Sitecore.Analytics.Core" factoryMethod="ToShort" arg0="128" />
</historyWorker>
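
Because the history worker has its own maximumBatchSize parameter, it can be tuned independently of the live agent. The sketch below assumes that the historyWorker element is located at sitecore/aggregation/historyWorker. That path, the patch file, and the value 256 are assumptions made for illustration.

```xml
<!-- Hypothetical custom patch file. The element path and the value are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <historyWorker>
        <!-- Override only the batch size used when rebuilding the reporting database. -->
        <param desc="maximumBatchSize">
          <patch:attribute name="arg0">256</patch:attribute>
        </param>
      </historyWorker>
    </aggregation>
  </sitecore>
</configuration>
```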

Processors may execute multiple times for the same interaction

In some scenarios, an interaction may be processed multiple times. Although trail tables ensure that duplicate data is never added to the Reporting database, there is no guarantee that a processor will only be executed once per interaction.

If an agent fails at any point between a batch being checked out and data being flushed to SQL, the progress of the batch is not reported back to the agent and the batch will be processed again.

The same thing can happen during history aggregation if a cursor expires and another agent picks up the same cursor for processing.
