Data lake export service
Sitecore CDP securely stores your organization's data in an Amazon Simple Storage Service (Amazon S3) bucket. You can request access to your data from your Sitecore representative.
After you get access, the Sitecore CDP data lake export service will run daily. Every day, all your organization's data will be exported into an export folder in the Sitecore CDP Amazon S3 bucket. You can interact with the export folder in the following ways:
- Download it locally.
- Copy it to another Amazon S3 bucket of your choice.
- Perform any Amazon S3 action on it that starts with Get or List.
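As a minimal sketch of the first option, the following Python code lists every object under an export folder and downloads it to a local directory. It relies only on the boto3 list_objects_v2 paginator and download_file, which correspond to List and Get actions. The function name download_export, the bucket name, and the prefix are hypothetical; use the values that your Sitecore representative provides.

```python
import os


def download_export(s3_client, bucket, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir` and return the local paths.

    `s3_client` is expected to behave like a boto3 S3 client,
    for example the object returned by boto3.client("s3").
    """
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder keys
            # Keep the folder layout below the prefix; fall back to the file name.
            relative = key[len(prefix):].lstrip("/") or key.rsplit("/", 1)[-1]
            local_path = os.path.join(dest_dir, *relative.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded


# Example call (hypothetical bucket and prefix; requires boto3 and AWS credentials):
#   import boto3
#   download_export(boto3.client("s3"), "example-cdp-bucket", "export/", "./cdp_export")
```

Passing the client in as an argument keeps the credentials and region configuration outside the function, so the same code can run on a workstation or inside an AWS service.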
Accessing your data lets you:
- Interrogate your organization's data in a flexible, secure way.
- Load the data into your organization's dashboards.
- Combine disparate datasets and analyze them with Sitecore CDP data to produce useful insights.
- Build and train analytical models.
Contents of the exported data
The export contains your organization's entire tenant history, including all the data that you send to Sitecore CDP using Sitecore CDP APIs, up until midnight of the previous day. For example, an export folder with a datestamp of January 27, 2024, contains a snapshot of data taken at midnight on January 26, 2024.
The exported files use the Apache Parquet file format. Parquet is an open source format that stores nested data structures in a flat columnar layout. For analytical workloads, columnar storage is generally more efficient in terms of storage and query performance than row-based storage, and most modern databases support it.
Your data is partitioned into the following entities (see also the entity relationship diagram):
Every time the data lake export service runs, it fully rebuilds historic data from the guests, orders, order_items, and experience_definitions entities and includes it in the export.
For the events and sessions entities, the data lake export service rebuilds data for the last three days: yesterday, the day before yesterday, and the day before that. Event and session data prior to this is not rebuilt, but is still included in the exported files.
Export frequency
The data lake export service runs daily. For more information about your organization's schedule, contact your Sitecore representative.
To find out when the daily Sitecore CDP data lake export service has finished running, poll the export folder. When the service finishes running, it creates a _SUCCESS file in the export folder. AWS Lambda and AWS Step Functions can execute code and access the Amazon S3 bucket to check whether the _SUCCESS file is present.
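The polling approach could look like the following Python sketch. It checks for the _SUCCESS file with the boto3 list_objects_v2 call, which is a List action. The function names, the bucket and prefix values, the polling interval, and the assumption that the _SUCCESS file sits directly under the export folder prefix are all illustrative.

```python
import time


def success_marker_exists(s3_client, bucket, export_prefix):
    """Return True if the _SUCCESS file is present in the export folder."""
    marker_key = export_prefix.rstrip("/") + "/_SUCCESS"
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=marker_key, MaxKeys=1)
    # Compare the full key so that a longer key sharing the prefix does not match.
    return any(obj["Key"] == marker_key for obj in response.get("Contents", []))


def wait_for_export(s3_client, bucket, export_prefix,
                    interval_seconds=300, max_attempts=12, sleep=time.sleep):
    """Poll until the _SUCCESS file appears; return False if attempts run out."""
    for attempt in range(max_attempts):
        if success_marker_exists(s3_client, bucket, export_prefix):
            return True
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return False


# Example call (hypothetical bucket and prefix; requires boto3 and AWS credentials):
#   import boto3
#   wait_for_export(boto3.client("s3"), "example-cdp-bucket", "export/")
```

In AWS Lambda, a long sleep loop counts against the function timeout, so a scheduled invocation that calls success_marker_exists once per run, or an AWS Step Functions wait-and-retry loop around it, is likely a better fit than wait_for_export.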