Sitecore Experience Platform

Data extraction

Data extraction is a function of the xConnect Client API that allows all or a subset of contacts and interactions to be exported for use in a third-party application. The process of rebuilding the reporting database (also known as history aggregation) uses the data extraction feature.

Data extraction in a multi-shard environment

Data extraction uses a round-robin strategy to call shards.

  • Shards are queried one at a time, in order.

  • If a shard does not have any relevant data, it is skipped.
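The strategy above can be illustrated with a small simulation. This is only a sketch of the described behavior, not the xConnect implementation: the function `read_batch`, its parameters, and the rule that the next read starts one shard after the previous starting shard are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def read_batch(
    remaining: List[int], start: int, batch_size: int
) -> Tuple[Dict[int, int], int]:
    """Simulate one read cursor operation (hypothetical model).

    remaining  -- records still unread on each shard (mutated in place)
    start      -- index of the shard this read begins with
    batch_size -- maximum number of records to return

    Returns (records taken per shard index, starting shard for the next read).
    """
    taken: Dict[int, int] = {}
    needed = batch_size
    shard_count = len(remaining)
    for offset in range(shard_count):
        shard = (start + offset) % shard_count
        if remaining[shard] == 0:
            continue  # shard has no relevant data, so it is skipped
        take = min(needed, remaining[shard])
        remaining[shard] -= take
        taken[shard] = take
        needed -= take
        if needed == 0:
            break  # batch is full
    # Assumption: the next read starts one shard after this read's start.
    next_start = (start + 1) % shard_count
    return taken, next_start


# Three shards; the middle one is empty and is skipped.
remaining = [3, 0, 5]
taken, next_start = read_batch(remaining, start=0, batch_size=4)
print(taken, next_start, remaining)
```

Here the read takes three records from the first shard, skips the empty second shard, and fills the rest of the batch from the third.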

The following example demonstrates the process of data extraction in an environment with three shards, and a requested batch size of 1000.

  • Before data extraction begins, data is distributed across the shards as follows (as implied by the reads below: 1500 records on Shard 1, 2000 on Shard 2, and 1500 on Shard 3):

    de-shard-11.png
  • First read cursor operation (start at Shard 1): Shard 1 (1000) = 1000 records returned:

    de-shard-21.png
  • Second read cursor operation (start at Shard 2): Shard 2 (1000) = 1000 records returned:

    de-shard-31.png
  • Third read cursor operation (start at Shard 3): Shard 3 (1000) = 1000 records returned:

    de-shard-41.png
  • Fourth read cursor operation (start at Shard 1): Shard 1 has only 500 records left, so the read continues on Shard 2 to fill the batch: Shard 1 (500) + Shard 2 (500) = 1000 records returned:

    de-shard-51.png
  • Fifth read cursor operation (start at Shard 2): Shard 2 (500) + Shard 3 (500) = 1000 records returned:

    de-shard-61.png

At this point, all shards are exhausted and data extraction is complete: five reads of 1000 records each have returned 5000 records in total.
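The five read operations above can be reproduced with a short simulation. This is a sketch under stated assumptions, not the xConnect implementation: `extract_all` is a hypothetical helper, the initial distribution of 1500, 2000, and 1500 records is inferred by adding up the per-shard counts in the reads, and each read is assumed to start one shard after the previous read's starting shard.

```python
from typing import List


def extract_all(shard_sizes: List[int], batch_size: int) -> List[List[int]]:
    """Return, for each read operation, the records contributed by each shard."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    remaining = list(shard_sizes)
    shard_count = len(remaining)
    reads: List[List[int]] = []
    start = 0
    while sum(remaining) > 0:
        contributed = [0] * shard_count
        needed = batch_size
        for offset in range(shard_count):
            shard = (start + offset) % shard_count
            # take is 0 for an empty shard, which is effectively skipped
            take = min(needed, remaining[shard])
            remaining[shard] -= take
            contributed[shard] = take
            needed -= take
            if needed == 0:
                break  # batch is full
        reads.append(contributed)
        # Assumption: the next read starts one shard after this read's start.
        start = (start + 1) % shard_count
    return reads


# Inferred initial distribution: Shard 1 = 1500, Shard 2 = 2000, Shard 3 = 1500.
for number, contributed in enumerate(extract_all([1500, 2000, 1500], 1000), start=1):
    print(f"Read {number}: per-shard {contributed}, total {sum(contributed)}")
```

Under these assumptions the simulation finishes after five reads, with the fourth and fifth reads each combining 500 records from two shards, matching the walkthrough.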