Importing a batch file into Sitecore CDP

Abstract

Describes how to import a batch file using the Sitecore CDP Batch API (data model 2.1).

Before you import the batch file, you must ensure that the file meets formatting requirements.

Complete the following steps to upload guests, orders, products and tracking events:

1. Compress the batch file: Apply GZIP compression to the batch import file, using either operating system utilities or a programmatic approach. Batch file uploads are limited to 50 MB.

Tip

If the compressed batch file exceeds the 50 MB limit, split the data into two or more batch files that each compress to under 50 MB. Then upload the compressed files as separate batches.
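As a minimal sketch, the compression step might look like this in Python (the filenames are illustrative):

```python
import gzip
import shutil

# Write a small stand-in batch file (in practice this is your full import file)
with open("import.json", "w") as f:
    f.write('{"ref": "example-guest"}\n')

# GZIP-compress it; the resulting .gz file must stay under the 50 MB limit
with open("import.json", "rb") as src, gzip.open("import.json.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```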

2. Produce checksum: Generate a hex-encoded MD5 checksum of the compressed file. You provide this checksum during the upload process so that the service can verify the integrity of the import file.

The following command generates the MD5 hash:

$ md5 import.json.gz

The following is an example of the generated output:

MD5 (import.json.gz) = 69b8a56502866f460e5930f7e53d4bf9
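If you prefer to generate the checksum programmatically, a sketch using Python's hashlib:

```python
import hashlib

def md5_hex(path: str) -> str:
    """Return the hex-encoded MD5 checksum of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```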

3. Generate unique batch identifier: Generate a unique identifier for the batch. This will be used when interacting with the Batch API endpoint and should be unique across all batches. We recommend using a UUID for this purpose. When you append this identifier to the Batch API base endpoint, it forms the unique URL for this batch upload request.
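Generating the identifier and the resulting batch URL can be sketched as:

```python
import uuid

BATCH_API_BASE = "https://api.boxever.com/v2/batches"

# A random UUID makes a suitable unique batch identifier
batch_ref = str(uuid.uuid4())

# Appending the identifier to the base endpoint forms the batch's unique URL
batch_url = f"{BATCH_API_BASE}/{batch_ref}"
```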

4. Request pre-signed URL: Issue a JSON HTTP PUT request to the Sitecore CDP Batch API to allocate a location to upload the batch file. The following table defines the fields in the body of the request:

| Attribute | Type | Description |
| --- | --- | --- |
| checksum | string | MD5 checksum of the compressed batch file. |
| size | integer | Size in bytes of the compressed batch file. |

The following is an example of a pre-signed URL PUT request:

PUT https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785
Authorization: Basic aHR0cHdhdGNoOmY=
Content-Type: application/json
Accept: application/json

{
  "checksum": "40d9a12f0a3c93c8ed66a3b6f3735790",
  "size": 3456
}
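The request above can be sketched with Python's standard library. The credentials are the placeholder from the example; the request is built but not sent here:

```python
import json
import urllib.request

BATCH_API_BASE = "https://api.boxever.com/v2/batches"

def build_presign_request(batch_ref: str, checksum: str, size: int) -> urllib.request.Request:
    """Build the PUT request that asks the Batch API for an upload location."""
    body = json.dumps({"checksum": checksum, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BATCH_API_BASE}/{batch_ref}",
        data=body,
        method="PUT",
        headers={
            "Authorization": "Basic aHR0cHdhdGNoOmY=",  # placeholder from the example
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )

# To send it: response = urllib.request.urlopen(build_presign_request(...))
```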

The following table defines the fields in the body of the PUT response:

| Attribute | Type | Description |
| --- | --- | --- |
| href | string | Echo back of the URL used in this request. |
| ref | string | UUID of the batch; forms part of the href. |
| checksum | string | MD5 checksum of the compressed batch file. |
| size | integer | Size in bytes of the compressed batch file. |
| location | object | Contains details of the file upload location. |
| location.href | string | URL of the file upload location. |
| location.expiry | date time | Date and time that the upload location href expires. |
| status | object | Contains the status of the file upload. |
| status.code | string | Current batch file processing status. Possible values: uploading (initial state while awaiting client upload of the import), fileprocessing (import file has been uploaded and processing has started), success (import file has been successfully processed), corrupted (import file did not match the checksum), error (import has failed; see the log file). |
| status.log | string | Optional; included if the batch upload request contains any errors. The location of a log file describing the errors. |
| summaryStats | object | Contains summary statistics on the file upload. |
| summaryStats.timeToProcessMillis | integer | The amount of time, in milliseconds, that the batch import job took to run. |
| summaryStats.totalCount | integer | The total number of records read from the import file by the batch import job. |
| summaryStats.succeededCount | integer | The number of records that the batch import job successfully processed. |
| summaryStats.failedCount | integer | The number of records that the batch import job failed to process. |
| createdAt | date time | Date and time the request was created. |
| modifiedAt | date time | Date and time the request was last updated. |

The following is an example of the PUT response:

{
  "href" : "https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785",
  "ref" : "3ee694e5-0b77-2d1e-af19-1aa78f500785",
  "checksum" : "40d9a12f0a3c93c8ed66a3b6f3735790",
  "size" : 3456,
  "location" : {
    "href" : "https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D",
    "expiry" : "2016-02-06T14:41:54.251Z"
  },
  "status" : { },
  "createdAt" : "2016-02-06T13:41:54.251Z",
  "modifiedAt" : "2016-02-06T13:41:54.251Z"
}
5. Base64 encode the MD5 checksum: Base64 encode the MD5 checksum of the batch file to be uploaded, which was produced in Step 2. First convert the hexadecimal checksum to bytes, then Base64 encode the bytes.
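Using the checksum from the earlier example, the conversion can be sketched as:

```python
import base64

# Hex-encoded checksum from Step 2's example
hex_md5 = "40d9a12f0a3c93c8ed66a3b6f3735790"

# Hex -> raw bytes -> Base64, as required by the Content-Md5 header
content_md5 = base64.b64encode(bytes.fromhex(hex_md5)).decode("ascii")
# content_md5 is now "QNmhLwo8k8jtZqO283NXkA=="
```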

6. Upload the batch file: Issue an HTTP PUT request to the file upload location. This is the URL provided by the location.href attribute in the PUT response. Include the following additional headers:

| Header | Value |
| --- | --- |
| x-amz-server-side-encryption | Constant value of AES256. |
| Content-Md5 | Base64-encoded MD5 checksum of the compressed batch file. |

The following is an example of a file upload request:

PUT https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D
x-amz-server-side-encryption: AES256
Content-Md5: QNmhLwo8k8jtZqO283NXkA==

<BATCH FILE CONTENT AS BYTES>
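A sketch of building this upload request with Python's standard library (the request is built but not sent here; only the two required headers are set):

```python
import urllib.request

def build_upload_request(upload_url: str, gz_bytes: bytes, content_md5: str) -> urllib.request.Request:
    """Build the PUT that uploads the compressed batch file to the
    pre-signed location.href URL from the earlier response."""
    return urllib.request.Request(
        url=upload_url,
        data=gz_bytes,
        method="PUT",
        headers={
            "x-amz-server-side-encryption": "AES256",
            "Content-Md5": content_md5,
        },
    )
```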

Caution

Do not add a content-type header to the request. If you are using HTTP utility classes in programming languages, a content-type header can be added to the request automatically. This causes the calculated signature of the upload request to differ from the signature of the pre-signed URL, resulting in HTTP 401 Unauthorized errors.

The following are additional details on the batch import process:

If the size of the uploaded file does not match the size specified in the size field of the request, the service returns an HTTP 400 response. If an attempt is made to alter the checksum or size fields after their initial creation, the service returns an HTTP 409 response.

You can import batch files in parallel. If there are any interdependencies between imports, it is best practice to wait until the initial import completes before starting any dependent imports. The file upload can stall for a number of reasons, such as network latency. The server timeout for the batch upload is set to 60 minutes.
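When coordinating dependent imports, a small helper can classify the status.code values from the response table as terminal or still in progress. This is a sketch; the field names follow the PUT response shown earlier:

```python
# status.code values that indicate the batch job has finished
TERMINAL_STATUSES = {"success", "corrupted", "error"}

def is_finished(batch_response: dict) -> bool:
    """True once the batch has reached a terminal state (success or failure)."""
    return batch_response.get("status", {}).get("code") in TERMINAL_STATUSES
```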