Accessing your data

This walkthrough describes how to access your organization's data in the Sitecore Amazon S3 bucket using the Amazon Web Services Command Line Interface (AWS CLI).

This walkthrough assumes that you have:

  • An Amazon Web Services (AWS) account with access to the AWS Management Console and permission to create an IAM role.

  • The AWS Command Line Interface (AWS CLI), configured to access your AWS instance using the IAM role.

To prepare to access your data, you first create an IAM role and update its policy. Next, create a support case to request access to your data by authorizing the IAM role. After the IAM role is authorized, you can use the IAM role in the AWS CLI to securely access the data.

This walkthrough describes how to:

  1. Create an IAM role

  2. Configure the IAM role policy

  3. Request access

  4. Understand the exported data

  5. Access your data

Create an IAM role

You can use the AWS Management Console to create an IAM role that will grant you, as the creator, exclusive read access to your organization's data.

To create an IAM role:

  1. In the AWS Management Console, create an IAM role that will be authorized to access your organization's data in the Sitecore Amazon S3 bucket.

  2. Make a note of the IAM role Amazon Resource Name (ARN). It has the following format, where <aws_account_id> is your AWS account ID and <role_name_with_path> is the name of the role, including its path if it has one.

    arn:aws:iam::<aws_account_id>:role/<role_name_with_path>

    Example:

    arn:aws:iam::012345678901:role/sitecore-access-s3-role
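
Because the support case depends on this exact string, it can help to assemble and sanity-check the ARN before you submit it. The following is a minimal sketch in Python. The helper name build_role_arn and the simplified character check are illustrative assumptions and are not part of Sitecore CDP or AWS tooling.

```python
import re

# Simplified pattern for an IAM role ARN in the standard "aws" partition:
# a 12-digit account ID followed by the role name, optionally with a path.
# The character class is an assumption that covers common role names and
# paths; it is not the full IAM naming specification.
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[A-Za-z0-9_+=,.@/-]+")


def build_role_arn(aws_account_id: str, role_name_with_path: str) -> str:
    """Return the role ARN, or raise ValueError if it looks malformed."""
    arn = f"arn:aws:iam::{aws_account_id}:role/{role_name_with_path}"
    if not ROLE_ARN_PATTERN.fullmatch(arn):
        raise ValueError(f"Unexpected IAM role ARN format: {arn}")
    return arn


# Prints arn:aws:iam::012345678901:role/sitecore-access-s3-role
print(build_role_arn("012345678901", "sitecore-access-s3-role"))
```

A check like this only catches formatting slips, such as a truncated account ID. It does not confirm that the role exists in your AWS account.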
Important

The IAM role you created grants read access exclusively to you, the user who created it. When requesting access to your organization's data, you must provide the ARN of this specific role.

Configure the IAM role policy

After you create the IAM role, you must attach a permission policy to it. The permissions in this policy determine whether your request to access your organization's data is allowed or denied.

To configure the IAM role policy:

  • In the AWS Management Console, in the access management area for the IAM role you created in the previous procedure, create the following inline policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowS3Access",
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::bx-<client_key>-production-<region_code>/*",
            "arn:aws:s3:::bx-<client_key>-production-<region_code>"
          ]
        }
      ]
    }

    Replace the placeholder values with details from your Sitecore CDP instance.
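
If you prefer to generate the policy document instead of editing the placeholders by hand, the substitution can be sketched as follows. This is an illustrative Python example, and the function name build_inline_policy is an assumption. It keeps the production segment hard-coded in the bucket name, exactly as in the policy above.

```python
import json


def build_inline_policy(client_key: str, region_code: str) -> dict:
    """Fill the placeholders of the inline policy shown above."""
    # The bucket name pattern is taken from the policy in this walkthrough.
    bucket_arn = f"arn:aws:s3:::bx-{client_key}-production-{region_code}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3Access",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # Object-level access uses the "/*" resource;
                # listing uses the bucket resource itself.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


# The client key below is the example value from the placeholder reference.
policy = build_inline_policy("ZpHxO9WvLOfQRVPlvo0BqB8YjGYuFfNe", "eu-west-1")
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the inline policy editor in the AWS Management Console.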

Request access

After you configure the IAM role policy, you must request access to your organization's data by creating a support case to enable the data lake export.

To request access:

  1. Create a support case and provide your IAM role ARN.

  2. Wait for confirmation. You will be notified once Sitecore has enabled the data lake export, granting access to the created IAM role. Only the specific IAM role ARN you provided will be authorized to access your organization's data.

Important

Do not modify the IAM role after the data lake export has been enabled, as any changes will disrupt access. The Sitecore Amazon S3 bucket is strictly configured to recognize only the original IAM role ARN that you provided. Changing the permissions or details of the IAM role will result in conflicts, preventing you from accessing your organization's data.

Understand the exported data

Before accessing your organization's data, it's important to understand where it's stored and what data is included in the export.

Data storage location

After access is granted, the Sitecore CDP data lake export service runs daily, creating a full export of your organization's data. The exported data is stored in a designated folder in the Sitecore Amazon S3 bucket. The export folder follows this format (replace the placeholder values with details from your Sitecore CDP instance):

s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data/

Data redundancy

Sitecore provides a level of designed redundancy in the export folder in case of failures or errors in the export process. To ensure data reliability, Sitecore stores the last three days of full data lake exports in the designated folder.

For example, on May 5th, 2024, you'll find three subfolders in the export folder. Each is labeled with a date in the ISO 8601 YYYY-MM-DD format and contains the following data:

  • 2024-05-04 - the entire contents of the data lake, including new and updated data from May 3rd.

  • 2024-05-03 - the entire contents of the data lake, including new and updated data from May 2nd.

  • 2024-05-02 - the entire contents of the data lake.

If the export fails on May 6th, the three previous exports remain available, but no 2024-05-05 subfolder is created. When the service resumes on May 7th, the folder structure will look like this:

  • 2024-05-06 - the entire contents of the data lake, including new and updated data from May 4th and May 5th.

  • 2024-05-04 - the entire contents of the data lake, including new and updated data from May 3rd.

  • 2024-05-03 - the entire contents of the data lake.
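
Because a failed run leaves a gap in the dated subfolders, a download script should not assume that yesterday's folder exists. The sketch below, in Python, picks the most recent folder that is actually present and reports any skipped dates. It assumes you have already obtained the list of subfolder names (for example, from a bucket listing). The function names are illustrative.

```python
from datetime import date, timedelta


def latest_export_folder(folder_names):
    """Return the most recent YYYY-MM-DD folder name."""
    return max(folder_names, key=date.fromisoformat)


def missing_export_dates(folder_names):
    """Return dates between the oldest and newest folder that have no export."""
    present = {date.fromisoformat(name) for name in folder_names}
    oldest, newest = min(present), max(present)
    gaps = []
    current = oldest + timedelta(days=1)
    while current < newest:
        if current not in present:
            gaps.append(current.isoformat())
        current += timedelta(days=1)
    return gaps


# Folder names from the failure scenario described above.
folders = ["2024-05-06", "2024-05-04", "2024-05-03"]
print(latest_export_folder(folders))  # 2024-05-06
print(missing_export_dates(folders))  # ['2024-05-05']
```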

Data partitioning

The export folder is further partitioned into subfolders for different Sitecore CDP entities. You'll find separate folders for events, guests, and sessions.

The events subfolder contains the largest dataset in the data lake, and downloading this data daily can be inefficient. To optimize this process, Sitecore uses date-based partitioning for the events data. Each day's events are stored in a separate folder labeled meta_created_at_date=YYYY-MM-DD. For example: meta_created_at_date=2024-05-04, meta_created_at_date=2024-05-03, and so on.

The events data is additive, meaning previous events are not deleted. As a result, the events subfolder in each daily export contains partitioned subfolders for each day that the data lake export has been active. This structure makes it unnecessary to download the entire events subfolder every day. Instead, you only need to pull the latest day's partitioned events data and add it to your existing dataset to keep it updated.

The sessions subfolder is partitioned by date in the same way as the events subfolder, and it is recommended that you follow the same approach.

The guests subfolder is partitioned by guest type: CUSTOMER, VISITOR, and RETIRED. Because this dataset is constantly changing, it is recommended that you take a full pull of this folder each day.
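
The recommended daily pull (one date partition each for events and sessions, plus the full guests folder) can be sketched as a list of source prefixes. This Python example is illustrative only. It assumes the entity subfolders are named events, sessions, and guests and sit directly under the dated export folder, which you should confirm against a listing of your own bucket. Which partition date counts as the latest within a given export is also left as a parameter for the same reason.

```python
def incremental_sources(export_root, export_date, partition_date):
    """Return the S3 prefixes to copy for one daily incremental pull."""
    base = f"{export_root.rstrip('/')}/{export_date}"
    partition = f"meta_created_at_date={partition_date}"
    return [
        f"{base}/events/{partition}/",    # one day of events only
        f"{base}/sessions/{partition}/",  # one day of sessions only
        f"{base}/guests/",                # full pull, all guest types
    ]


# Placeholder bucket name; replace with your own values.
root = "s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data"
for prefix in incremental_sources(root, "2024-05-04", "2024-05-04"):
    print(prefix)
```

Each printed prefix can then be used as the source of a recursive copy.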

Access your data

After the IAM role is authorized, only you, the creator of the IAM role, will be able to access your organization's data in the Sitecore Amazon S3 bucket.

Important

Only the specific IAM role ARN you provided is authorized to access your organization's data. No other users can access your data using this IAM role. If you grant the role to a different user than originally specified, and they attempt to carry out the export process, they will be denied for security reasons.

If you need to change the assigned user after the data lake export has been set up, you must create a support case to request this update. This will reset the process, so make sure you have the correct user, role, and ARN before requesting access.

This section describes how to use aws s3 cp (copy) commands in the AWS Command Line Interface (AWS CLI) to download your data or copy it to another Amazon S3 bucket of your choice. Alternatively, you can perform any Amazon S3 action that starts with Get or List to access the data.

To access your data:

  1. Make sure you have AWS CLI installed and configured to access your AWS instance using the IAM role.

  2. Determine which folders and subfolders you want to copy from the export folder. You can select different sets of data depending on your specific requirements.

  3. Open a terminal or command prompt and run the following aws s3 cp commands to copy your organization's data. Make sure to replace the placeholder values with details from your Sitecore CDP instance:

    Frequently used aws s3 cp commands you can enter in your terminal:

    Download your organization's data to your local machine. This includes the last three days of full data lake exports.

    aws s3 cp s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data . --recursive

    Download a full data lake export for a specific date to your local machine.

    aws s3 cp s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data/<date> . --recursive

    Copy all your organization's data to another Amazon S3 bucket of your choice. This includes the last three days of full data lake exports.

    aws s3 cp s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data <destination> --recursive

    Copy a full data lake export for a specific date to another Amazon S3 bucket of your choice.

    aws s3 cp s3://bx-<client_key>-<env>-<region_code>/analytics/bdl/exports/data/<date> <destination> --recursive


After you run one of these commands, your organization's data is either downloaded locally or copied to another Amazon S3 bucket of your choice.
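
If you want to run these copies from a scheduled script instead of typing them, the four commands above reduce to one parameterized command. The following Python sketch builds the argument list and runs it with the AWS CLI. It assumes the aws executable is installed, on your PATH, and configured to use the authorized IAM role. The function names are illustrative.

```python
import subprocess


def build_copy_command(client_key, env, region_code, destination,
                       export_date=None, dry_run=False):
    """Build the aws s3 cp argument list for the export folder."""
    source = f"s3://bx-{client_key}-{env}-{region_code}/analytics/bdl/exports/data"
    if export_date:
        # Restrict the copy to the full export for one date (YYYY-MM-DD).
        source = f"{source}/{export_date}"
    command = ["aws", "s3", "cp", source, destination, "--recursive"]
    if dry_run:
        # --dryrun lists the operations without performing them.
        command.append("--dryrun")
    return command


def run_copy(command):
    """Run the command; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(command, check=True)


# Example values from the placeholder reference; nothing is executed here.
print(build_copy_command("ZpHxO9WvLOfQRVPlvo0BqB8YjGYuFfNe", "production",
                         "eu-west-1", "s3://my-bucket/myData",
                         export_date="2025-01-27", dry_run=True))
```

Passing the arguments as a list avoids shell quoting problems. Running with dry_run=True first is a cheap way to check the source path before a large download.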

Reference for placeholder values

In the example commands, replace the placeholder values with the required details from your Sitecore CDP instance and with export details depending on your specific needs.

Attribute: <client_key>
Type: string
Description: Your Sitecore CDP client key from your Sitecore CDP instance. This is your organization's unique and public identifier. To find your client key, in Sitecore CDP, on the navigation pane, click > API access > Client key.
Example: ZpHxO9WvLOfQRVPlvo0BqB8YjGYuFfNe

Attribute: <env>
Type: string
Description: The deployment environment. Typically set to production unless you are informed otherwise.
Example: production

Attribute: <region_code>
Type: string
Description: The region code corresponding to your Sitecore CDP instance's environment. To find the region code, in Sitecore CDP, on the navigation pane, click > Company information > Environment.
Must be one of:

  • ap-southeast-2

  • eu-west-1

  • ap-northeast-1

  • us-east-1

Attribute: <date>
Type: string
Description: A specific date in the past or today's date to copy a full data lake export for that date. Format: YYYY-MM-DD
Example: 2025-01-27

Attribute: <destination>
Type: string
Description: Your local machine denoted by a period (.) or another Amazon S3 bucket where you want to copy the data.
Example: s3://my-bucket/myData
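
The constraints in this reference can be checked before a command is run. The Python sketch below is one illustrative way to do that. The function name validate_placeholders and the error messages are assumptions. The region list and date format come from the reference above.

```python
import re
from datetime import date

# Region codes listed in the placeholder reference.
ALLOWED_REGION_CODES = {"ap-southeast-2", "eu-west-1", "ap-northeast-1", "us-east-1"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_placeholders(client_key, env, region_code, export_date, today=None):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not client_key:
        problems.append("client_key is empty")
    if not env:
        problems.append("env is empty")
    if region_code not in ALLOWED_REGION_CODES:
        problems.append(f"region_code {region_code!r} is not a listed region")
    if not DATE_PATTERN.fullmatch(export_date):
        problems.append(f"date {export_date!r} is not in YYYY-MM-DD format")
    else:
        try:
            parsed = date.fromisoformat(export_date)
        except ValueError:
            problems.append(f"date {export_date!r} is not a real calendar date")
        else:
            # The reference allows only today's date or a date in the past.
            if parsed > (today or date.today()):
                problems.append(f"date {export_date!r} is in the future")
    return problems


print(validate_placeholders("ZpHxO9WvLOfQRVPlvo0BqB8YjGYuFfNe", "production",
                            "eu-west-1", "2025-01-27", today=date(2025, 1, 28)))
```

The today parameter exists only to make the check repeatable in tests. In normal use you can omit it.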
