Data lake export service

Sitecore CDP securely stores your organization's data in an Amazon Simple Storage Service (Amazon S3) bucket. You can request access to your data by creating a support case.

After you get access, the Sitecore CDP data lake export service runs daily. Each run exports all your organization's data into an export folder in the Sitecore CDP Amazon S3 bucket. You can interact with the export folder in the following ways:

  • Download it locally.

  • Copy it to another Amazon S3 bucket of your choice.

  • Perform any Amazon S3 action on it that starts with Get or List.
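As a rough sketch of the list-then-download pattern described above, the following Python takes an S3 client (for example, one created with `boto3.client("s3")`) and uses only List and Get actions. The bucket name, the export folder prefix, and the helper names `list_export_keys` and `download_export` are hypothetical; the real bucket and folder layout come with your access details.

```python
import os
import shutil
from typing import Any, Iterator, List


def list_export_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix` (a List action: ListObjectsV2)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def download_export(s3_client: Any, bucket: str, prefix: str, dest_dir: str) -> List[str]:
    """Download every object under `prefix` into `dest_dir` (a Get action: GetObject)."""
    saved = []
    for key in list_export_keys(s3_client, bucket, prefix):
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        local_path = os.path.join(dest_dir, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        response = s3_client.get_object(Bucket=bucket, Key=key)
        with open(local_path, "wb") as fh:
            # Stream the body to disk instead of loading it into memory.
            shutil.copyfileobj(response["Body"], fh)
        saved.append(local_path)
    return saved
```

A call might look like `download_export(boto3.client("s3"), "example-bucket", "example-export-folder/", "./cdp_export")`, where both the bucket and the prefix are placeholders.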

Accessing your data lets you:

  • Interrogate your organization's data in a flexible, secure way.

  • Load the data into your organization's dashboards.

  • Combine disparate datasets and analyze them with Sitecore CDP data to produce useful insights.

  • Build and train analytical models.

Contents of the exported data

The export contains your organization's entire tenant history, including all the data sent to Sitecore CDP using Sitecore CDP APIs, up until midnight of the previous day. For example, an export folder with a datestamp of January 27, 2025, contains a snapshot of data taken at midnight on January 26, 2025.
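The relationship between a folder's datestamp and its snapshot date can be written as a one-day offset. This is a minimal sketch that only restates the example above; the function name `snapshot_date` is hypothetical, and how the datestamp is encoded in the folder name is not covered here.

```python
from datetime import date, timedelta


def snapshot_date(folder_datestamp: date) -> date:
    """Return the date of the snapshot held in a folder with the given datestamp."""
    # The snapshot is taken the day before the folder's datestamp.
    return folder_datestamp - timedelta(days=1)


print(snapshot_date(date(2025, 1, 27)))  # prints 2025-01-26
```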

The exported files are in the Apache Parquet format. Parquet is an open source format that stores nested data structures in a flat columnar layout. For analytical workloads, columnar storage is generally more efficient in terms of storage and query performance than storing data in rows, and Parquet is supported by most modern databases.

Your data is partitioned into the following entities (see also the entity relationship diagram):

  • guests

  • orders

  • order_items

  • experience_definitions

  • events

  • sessions

Every time the data lake export service runs, it fully rebuilds historic data from the guests, orders, order_items, and experience_definitions entities and includes it in the export.

For the events and sessions entities, the data lake export service rebuilds data for the last three days: yesterday, the day before yesterday, and the day before that. Event and session data prior to this is not rebuilt, but is still included in the exported files.
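If a downstream job needs to know which data may have changed since the previous export, the rebuild rules above can be expressed as a small helper. This is only a sketch: the function names are hypothetical, and it assumes that "yesterday" is counted from the date on which the export service runs.

```python
from datetime import date, timedelta
from typing import List

# Entities whose full history is rebuilt on every run.
FULLY_REBUILT = {"guests", "orders", "order_items", "experience_definitions"}
# Entities for which only the most recent days are rebuilt.
WINDOWED = {"events", "sessions"}
REBUILD_DAYS = 3


def rebuilt_dates(run_date: date) -> List[date]:
    """Return the days rebuilt for events and sessions, most recent first."""
    return [run_date - timedelta(days=n) for n in range(1, REBUILD_DAYS + 1)]


def is_rebuilt(entity: str, record_date: date, run_date: date) -> bool:
    """Report whether data for `entity` on `record_date` is rebuilt by this run."""
    if entity in FULLY_REBUILT:
        return True
    if entity in WINDOWED:
        return record_date in rebuilt_dates(run_date)
    raise ValueError(f"unknown entity: {entity}")
```

Under these assumptions, a job that loads events incrementally would re-read only the three dates returned by `rebuilt_dates` and reload the other four entities in full.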

Export frequency

The data lake export service runs daily. For more information about your organization's schedule, create a support case.

Tip

Use a polling approach to find out when the daily Sitecore CDP data lake export service has finished running. When the service finishes, it creates a _SUCCESS file in the export folder. AWS Lambda and AWS Step Functions can execute code that accesses the Amazon S3 bucket and checks whether the _SUCCESS file is present.
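One possible shape for such a polling check is sketched below. It takes an S3 client (for example, one created with `boto3.client("s3")`) and uses only a List action. The bucket name, the export folder prefix, the function names, and the interval and attempt limits are all assumptions made for illustration.

```python
import time
from typing import Any, Callable


def export_finished(s3_client: Any, bucket: str, export_prefix: str) -> bool:
    """Return True if the _SUCCESS marker exists in the export folder."""
    marker_key = export_prefix.rstrip("/") + "/_SUCCESS"
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=marker_key, MaxKeys=1)
    # Compare the key exactly so that a longer key sharing the prefix does not match.
    return any(obj["Key"] == marker_key for obj in response.get("Contents", []))


def wait_for_export(
    s3_client: Any,
    bucket: str,
    export_prefix: str,
    interval_seconds: float = 300,
    max_attempts: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the marker appears; return False if it never does."""
    for attempt in range(max_attempts):
        if export_finished(s3_client, bucket, export_prefix):
            return True
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return False
```

In AWS Lambda, it is usually cheaper to run `export_finished` once per invocation and let a schedule or an AWS Step Functions wait state handle the delay, rather than sleeping inside the function as `wait_for_export` does.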
