Platform Administration and Architecture

Processing and aggregation

When Sitecore collects customer data and interactions through xConnect, the Sitecore Experience Database (xDB) stores the information. However, as it is being collected, the xDB also processes the data – in near real-time – to use in analytics aggregation and to trigger marketing automation.

The xDB processes data as it is submitted, either in real time or on demand. Processing can continually aggregate the collected data and make it available for actionable insights and reporting, either through Experience Analytics or through external business intelligence tools.

The xConnect Collection role exposes a single point where internal and external trusted systems can plug in and react to data being collected or updated about a contact. For example, to ensure that your Sitecore solution is GDPR or privacy compliant, systems can execute the right to be forgotten operation, and any plug-in in xConnect can respond to this and notify surrounding systems to forget the contact.

processing.1.png
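
The plug-in idea can be illustrated with a minimal sketch. The names used here (CollectionHub, subscribe, publish, and the event name right_to_be_forgotten) are hypothetical and are not part of the xConnect API; the sketch only shows how several plug-ins can react to one operation executed at a single collection point.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names exist in xConnect.
Handler = Callable[[str], None]


class CollectionHub:
    """A single point where plug-ins register and react to contact events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        """Register a plug-in handler for a named event."""
        self._handlers[event].append(handler)

    def publish(self, event: str, contact_id: str) -> int:
        """Call every handler registered for the event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(contact_id)
        return len(handlers)


# Two surrounding systems that must also forget the contact (assumed examples).
notified: List[str] = []
hub = CollectionHub()
hub.subscribe("right_to_be_forgotten", lambda cid: notified.append(f"crm:{cid}"))
hub.subscribe("right_to_be_forgotten", lambda cid: notified.append(f"email:{cid}"))

hub.publish("right_to_be_forgotten", "contact-42")
```

After the publish call, both assumed downstream systems have been told to forget the contact, without the caller knowing which plug-ins are installed.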

The concept of plug-ins is also used for aggregating the data as it is being collected. This is called live aggregation.

For example, when a session on the Content Delivery role ends, it submits an interaction to the xConnect Collection role and the live aggregation plug-in in xConnect reacts.

This plug-in saves a record in the xDB Processing Pools database and relays information to the xDB Processing application role about how to handle the new interaction.

processing.2.png

The xDB Processing application role continuously polls the xDB Processing Pools database and will pull the recently added aggregation task and start the aggregation process.

During the processing, the xDB Processing role will pull the new interaction from the xConnect Collection role – and can pull additional data needed in the aggregation from other sources, for example the Reference Data service.

Finally, once the aggregation is done, the resulting data is stored in the xDB Reporting database.
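
The live aggregation flow described above can be sketched roughly as follows. The in-memory structures stand in for the xDB Collection data, the xDB Processing Pools database, and the xDB Reporting database; all function and variable names are assumptions made for illustration, and the aggregation (a visit count per channel) is a deliberately simple example.

```python
from collections import Counter, deque

# Hypothetical stand-ins, not Sitecore APIs.
collection = {}            # interactions held behind the xConnect Collection role
processing_pool = deque()  # pending tasks in the xDB Processing Pools database
reporting = Counter()      # aggregated data in the xDB Reporting database


def submit_interaction(interaction_id, channel):
    """Collection side: store the interaction, then queue an aggregation task."""
    collection[interaction_id] = {"channel": channel}
    processing_pool.append(interaction_id)


def poll_once():
    """Processing side: take one pending task, pull the interaction, aggregate it."""
    if not processing_pool:
        return False
    interaction_id = processing_pool.popleft()
    interaction = collection[interaction_id]
    reporting[interaction["channel"]] += 1
    return True


submit_interaction("i1", "web")
submit_interaction("i2", "email")
submit_interaction("i3", "web")

# The processing role keeps polling until the pool is empty.
while poll_once():
    pass
```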

After you deploy a new Sitecore version or if you extend your solution with new reporting dimensions or datasets, you may need to reprocess all of the interactions in the xDB. This process is called historical aggregation.

To enable historical aggregation, you must set up an additional secondary xDB Reporting database.

processing.3.png

When you attach a secondary xDB Reporting database to the xDB Processing role, both the primary and secondary xDB Reporting databases will store all live aggregation data.

Note

You should not add a secondary xDB Reporting database unless you plan to run historical re-aggregation, because it requires the system to write to both the primary and secondary xDB Reporting databases, which increases the overall load on the system.

An administrator can begin the historical re-aggregation process through the Sitecore administrative interface. The Content Management role then triggers the processing operation on the xDB Processing role.

processing.4.png

This initially erases all the data in the secondary xDB Reporting database and creates a historical re-aggregation task in the xDB Processing Tasks database. Subsequently, the xDB Processing role extracts data to get an enumerator for the entire set of interactions in the xDB Collection database.

processing.5.png

Processing all historical interactions in the Sitecore Experience Database can be very heavy on resources. To avoid system strain, you can scale the xDB Processing role horizontally to split the aggregation task across multiple servers and threads.

The initiating xDB Processing role splits the dataset in the xDB Processing Tasks database and assigns a part of the dataset - also called a cursor - to each processing worker or thread on each xDB Processing role.

processing.6.png
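
As a rough illustration of splitting a dataset into cursors, the sketch below divides an ordered list of interaction IDs into contiguous, near-equal parts, one per processing worker. The function name and the contiguous-range strategy are assumptions for illustration; the actual partitioning used by the xDB Processing role is not described here.

```python
def split_into_cursors(interaction_ids, worker_count):
    """Split ids into contiguous, near-equal parts, one part per worker."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    base, extra = divmod(len(interaction_ids), worker_count)
    cursors = []
    start = 0
    for worker in range(worker_count):
        # The first `extra` workers each take one additional item.
        size = base + (1 if worker < extra else 0)
        cursors.append(interaction_ids[start:start + size])
        start += size
    return cursors


ids = [f"i{n}" for n in range(10)]
cursors = split_into_cursors(ids, worker_count=4)
```

Every interaction lands in exactly one cursor, so workers can run in parallel without processing the same interaction twice.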

The aggregation process then runs on all processing workers using the saved cursors in the xDB Processing Tasks database. Each processing worker retrieves interaction data from the xConnect Collection role and pulls any additional data needed in the aggregation from other sources, for example the Reference Data service.

If the aggregation of a single interaction fails, it is added to the xDB Processing Pools database and the aggregation is retried at the end of the historical re-aggregation process.
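
The retry behavior can be sketched as a two-pass loop: failures are set aside during the main pass and retried once everything else has been processed. The names below are hypothetical, the retry list merely stands in for the xDB Processing Pools database, and a single retry pass is an assumption of this sketch.

```python
def reaggregate(interaction_ids, aggregate):
    """Aggregate every id; retry failed ids once after the main pass."""
    done = []
    retry_pool = []  # stands in for the xDB Processing Pools database
    for interaction_id in interaction_ids:
        try:
            aggregate(interaction_id)
            done.append(interaction_id)
        except Exception:
            retry_pool.append(interaction_id)

    still_failing = []
    for interaction_id in retry_pool:
        try:
            aggregate(interaction_id)
            done.append(interaction_id)
        except Exception:
            still_failing.append(interaction_id)
    return done, still_failing


attempts = {}


def flaky_aggregate(interaction_id):
    """Example aggregation step that fails the first time it sees 'i2'."""
    attempts[interaction_id] = attempts.get(interaction_id, 0) + 1
    if interaction_id == "i2" and attempts[interaction_id] == 1:
        raise RuntimeError("transient failure")


done, failed = reaggregate(["i1", "i2", "i3"], flaky_aggregate)
```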

If any new interactions come in through the xConnect Collection role, the live aggregation process writes them to both the primary and secondary databases. This ensures that the new interactions that are submitted to xConnect during the historical aggregation process are not lost.

processing.7.png
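
A minimal sketch of the dual write, assuming a hypothetical LiveAggregator class and using counters as stand-ins for the two xDB Reporting databases: once a secondary database is attached, every live aggregation result is written to both.

```python
from collections import Counter


class LiveAggregator:
    """Hypothetical illustration of live aggregation with an optional secondary target."""

    def __init__(self, primary):
        self.primary = primary
        self.secondary = None

    def attach_secondary(self, secondary):
        """Start writing live aggregation data to a secondary database as well."""
        self.secondary = secondary

    def aggregate(self, channel):
        """Record one interaction in every attached reporting database."""
        self.primary[channel] += 1
        if self.secondary is not None:
            self.secondary[channel] += 1


primary, secondary = Counter(), Counter()
live = LiveAggregator(primary)

live.aggregate("web")             # arrives before the secondary is attached
live.attach_secondary(secondary)
live.aggregate("email")           # arrives while historical re-aggregation runs
```

The interaction that arrived before the secondary was attached is missing from it, which is the gap that historical re-aggregation fills.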

When aggregation completes, the secondary xDB Reporting database contains the newly aggregated data. However, the primary xDB Reporting database always serves the reporting, insights, and analytics applications, so the secondary and primary xDB Reporting databases need to be switched to ensure the new data is live. System administrators can manually switch the databases, typically by updating the connection strings.
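
Conceptually, the switch amounts to exchanging two connection string values. The sketch below models this with a plain dictionary; the key names reporting and reporting.secondary and the connection string values are assumptions for illustration, so check your own configuration for the actual names.

```python
def switch_reporting(connection_strings):
    """Return a copy with the primary and secondary reporting entries exchanged."""
    switched = dict(connection_strings)
    # Key names are assumed for this sketch.
    switched["reporting"] = connection_strings["reporting.secondary"]
    switched["reporting.secondary"] = connection_strings["reporting"]
    return switched


before = {
    "reporting": "Database=Reporting_A",
    "reporting.secondary": "Database=Reporting_B",
    "core": "Database=Core",
}
after = switch_reporting(before)
```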

The last type of processing handled by the xDB Processing role is distributed processing.

Distributed processing allows systems to schedule xDB data processing tasks and distribute them to other databases. For example, the Path Analyzer uses distributed processing operations to process interactions and store aggregated traffic maps.

You can queue distributed processing operations using the xDB Processing API, for example through a scheduled task that runs on the xDB Processing role.

processing.8.png

When a distributed processing task is triggered, the xDB Processing Tasks database creates a task record. The xDB Processing role performs data extraction to get an enumerator for the desired set of entities, such as interactions or contacts, in the xDB Collection database. You can limit the data set based on time range. The dataset is then split up into parts - or cursors - in the xDB Processing Tasks database. There is one cursor per thread for each xDB Processing role.

processing.9.png
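
The planning step for a distributed processing task might look roughly like the sketch below: entities are limited to a time range and then dealt out to one cursor per thread on each processing role. The function name, the role names, and the round-robin assignment are assumptions for illustration, not the documented behavior of the xDB Processing API.

```python
from datetime import datetime


def plan_cursors(entities, start, end, roles, threads_per_role):
    """Filter entities to [start, end) and assign them to one cursor per role thread."""
    selected = [entity_id for entity_id, timestamp in entities if start <= timestamp < end]
    slots = [(role, thread) for role in roles for thread in range(threads_per_role)]
    plan = {slot: [] for slot in slots}
    for index, entity_id in enumerate(selected):
        # Round-robin assignment is an assumption of this sketch.
        plan[slots[index % len(slots)]].append(entity_id)
    return plan


entities = [
    ("i1", datetime(2024, 1, 5)),
    ("i2", datetime(2024, 2, 1)),
    ("i3", datetime(2024, 1, 20)),
    ("i4", datetime(2024, 1, 25)),
    ("i5", datetime(2023, 12, 31)),
]
plan = plan_cursors(
    entities,
    start=datetime(2024, 1, 1),
    end=datetime(2024, 2, 1),
    roles=["processing-a", "processing-b"],
    threads_per_role=1,
)
```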

Custom logic runs on all xDB Processing roles and processes the entities. During processing, the custom processing logic continually sends the processed data for storage or handling in other systems. For example, the specific aggregated data for the Path Analyzer is stored in separate tables in the xDB Reporting database.

Privacy and security

Refer to the Architecture and Roles documentation for privacy and security considerations for each role in the processing and aggregation data flow.
