Import a batch file into Sitecore CDP
Describes how to import a file into Sitecore CDP, using the Sitecore CDP Batch API (Data model 2.1).
You can upload guests, orders, products and tracking events to Sitecore CDP. Before you import the batch file, you must ensure that the file meets formatting requirements.
To import a batch file into Sitecore CDP:
Compress the batch file you want to import. Apply GZIP compression to the batch import file using either operating system utilities or a programmatic approach. There is a 50MB size limit for uploading batch files.
Tip
If the size of the compressed batch file exceeds the 50MB limit, split the data into two or more batch files and compress each one so that no compressed file exceeds the 50MB size limit. Then upload the compressed files as separate batches.
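As a programmatic alternative to operating system utilities, the compression and size check can be sketched in Python with the standard gzip module. The helper names gzip_file and within_limit are hypothetical, and the sketch assumes that "50MB" means 50 * 1024 * 1024 bytes, which this article does not confirm.

```python
import gzip
import os
import shutil

# Assumption: the 50MB limit is interpreted as 50 MiB.
MAX_BATCH_BYTES = 50 * 1024 * 1024


def gzip_file(src_path: str, dst_path: str) -> int:
    """Gzip src_path into dst_path and return the compressed size in bytes."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(dst_path)


def within_limit(size_bytes: int, limit: int = MAX_BATCH_BYTES) -> bool:
    """Return True if a compressed file of this size may be uploaded as one batch."""
    return size_bytes <= limit
```

If within_limit returns False for a compressed file, split the source data and compress each part separately.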
Generate a hex-encoded MD5 checksum for the compressed file. You must provide this checksum during the upload process so that the service can verify the integrity of the import file it receives.
The following command generates the MD5 hash on macOS (on most Linux distributions, the equivalent command is md5sum):
$ md5 import.json.gz
The following is an example of the generated output:
MD5 (import.json.gz) = 69b8a56502866f460e5930f7e53d4bf9
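The same hex-encoded digest can be produced programmatically. The following Python sketch uses the standard hashlib module and reads the file in chunks so that a large file is not loaded into memory at once. The function name md5_hex is hypothetical.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex-encoded MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_hex("import.json.gz") returns a 32-character lowercase hexadecimal string in the same form as the command output above.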
If you are importing the file using a PUT request, generate a unique identifier for the batch. This is used when interacting with the Batch API endpoint and must be unique across all batches.
Important
The identifier must be in UUID format. When you append this identifier to the Batch API endpoint, it forms the unique URL for this batch upload request.
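One way to generate the identifier is a random (version 4) UUID, sketched below in Python. The endpoint constant is taken from the example request in this article and may differ for your environment; the function name new_batch_url is hypothetical.

```python
import uuid
from typing import Tuple

# Assumption: base URL as it appears in this article's example request.
BATCH_ENDPOINT = "https://api.boxever.com/v2/batches"


def new_batch_url(endpoint: str = BATCH_ENDPOINT) -> Tuple[str, str]:
    """Return (batch UUID, URL for this batch upload request)."""
    batch_ref = str(uuid.uuid4())
    return batch_ref, f"{endpoint.rstrip('/')}/{batch_ref}"
```

Store the returned UUID alongside the batch file, because the same value is needed later to check the import status.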
Issue an HTTP PUT request with a JSON body to the Sitecore CDP Batch API to allocate a location to which you subsequently upload the batch file. The following table defines the fields in the body of the request:
Attribute | Type | Description
checksum | string | MD5 checksum of the compressed batch file, produced in step 2.
size | integer | Size in bytes of the compressed batch file.
The following is an example of a pre-signed URL PUT request:
Authorization: Basic aHR0cHdhdGNoOmY=
Content-Type: application/json
Accept: application/json
PUT https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785
{
  "checksum": "40d9a12f0a3c93c8ed66a3b6f3735790",
  "size": 3456
}
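The request above can be assembled in Python with the standard urllib.request module, as in the following sketch. The function name build_allocation_request and its parameters are hypothetical; the headers and body fields are the ones shown in the example. The function only builds the request object and sends nothing.

```python
import json
import urllib.request


def build_allocation_request(batch_url: str, checksum_hex: str,
                             size_bytes: int, basic_auth_token: str) -> urllib.request.Request:
    """Build the PUT request that asks the Batch API for an upload location."""
    body = json.dumps({"checksum": checksum_hex, "size": size_bytes}).encode("utf-8")
    return urllib.request.Request(
        batch_url,
        data=body,
        method="PUT",
        headers={
            # basic_auth_token is the already Base64-encoded credential string.
            "Authorization": f"Basic {basic_auth_token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the response body as JSON.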
The following fields can be returned in the body of the PUT response:
Attribute | Description | Type
href | Echo back of the URL used in this request. | string
ref | UUID of the batch, forms part of the href. | string
checksum | Echo back of checksum of the uploading file. | string
size | Echo back of the size of the uploading file. | integer
location | Details of the file upload location. | object
location.href | The allocated URL to which you upload the batch file. This URL is valid for one hour. | string
location.expiry | Date and time that the upload location href expires. | date time
status | The status of the file upload. | object
status.code | The batch file processing status. Possible values: uploading (the import process was initialized by the client); fileprocessing (import file was uploaded and processing has started); success (import file was successfully processed); corrupted (import file did not match the checksum); error (import has failed, see the log file; the location of the log file is the value of the status.log parameter). | string
status.log | If the batch upload request contains errors, the location of the log file that includes details of the errors. | string
summaryStats | Contains a summary of statistics on the file upload. | object
summaryStats.timeToProcessMillis | The amount of time, in milliseconds, that the batch import job took to run. | integer
summaryStats.totalCount | The total number of records read from the import file by the batch import job. | integer
summaryStats.succeededCount | The number of records that the batch import job successfully processed. | integer
summaryStats.failedCount | The number of records that the batch import job failed to process. | integer
createdAt | Date and time the request was created. | date time
modifiedAt | Date and time the request was last updated. | date time
The following is an example of the PUT response:
{
  "href": "https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785",
  "ref": "3ee694e5-0b77-2d1e-af19-1aa78f500785",
  "checksum": "40d9a12f0a3c93c8ed66a3b6f3735790",
  "size": 3456,
  "location": {
    "href": "https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D",
    "expiry": "2016-02-06T14:41:54.251Z"
  },
  "status": {
    "code": "uploading"
  },
  "createdAt": "2016-02-06T13:41:54.251Z",
  "modifiedAt": "2016-02-06T13:41:54.251Z"
}
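Because the upload location is valid only for a limited time, a client may want to read location.href and location.expiry from the response before uploading. The following Python sketch assumes that the expiry timestamp always has the form shown in the example (UTC with fractional seconds and a trailing Z); the function names upload_target and is_expired are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple


def upload_target(response_body: str) -> Tuple[str, datetime]:
    """Return (upload URL, expiry as an aware UTC datetime) from the PUT response."""
    doc = json.loads(response_body)
    location = doc["location"]
    # Assumption: timestamps look like "2016-02-06T14:41:54.251Z".
    expiry = datetime.strptime(location["expiry"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return location["href"], expiry.replace(tzinfo=timezone.utc)


def is_expired(expiry: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the upload location can no longer be used."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry
```

If is_expired returns True, the upload URL can no longer be used.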
Apply Base64 encoding to the MD5 checksum of the batch file to be uploaded, which was produced in step 2. First convert the hexadecimal value of the checksum to raw bytes, then encode those bytes as Base64. Do not Base64-encode the hexadecimal string itself.
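The conversion can be sketched in Python as follows. The function name md5_hex_to_base64 is hypothetical; the sample value is the checksum used in this article's example requests.

```python
import base64


def md5_hex_to_base64(checksum_hex: str) -> str:
    """Convert a hex-encoded MD5 checksum to the Base64 form of its raw bytes."""
    raw = bytes.fromhex(checksum_hex)  # 32 hex characters become 16 bytes
    return base64.b64encode(raw).decode("ascii")


print(md5_hex_to_base64("40d9a12f0a3c93c8ed66a3b6f3735790"))
# prints QNmhLwo8k8jtZqO283NXkA==
```

A correctly converted MD5 checksum is always 24 characters long and ends in ==. A much longer result usually means the hexadecimal text was encoded instead of the bytes it represents.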
To upload the batch file, issue an HTTP PUT request to the file upload location. This is the URL provided in the location.href attribute in the PUT response. Include the following additional headers:
Header Name | Value
x-amz-server-side-encryption | Constant value of AES256.
Content-Md5 | Base64 encoded checksum.
The following is an example of a file upload request:
x-amz-server-side-encryption: AES256
Content-Md5: QNmhLwo8k8jtZqO283NXkA==
PUT https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D
<BATCH FILE CONTENT AS BYTES>
Caution
The content-type header must not be added to the request. If you are using HTTP utility classes in programming languages, the content-type header can be added to the request automatically. This can cause the calculated signature of the upload request to not match the signature of the pre-signed URL, resulting in HTTP 401 unauthorized errors.
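Because an automatically added content-type header can invalidate the signature of the pre-signed URL, the following Python sketch uses the standard http.client module. For a bytes body it sends only the supplied headers plus Host, Accept-Encoding, and Content-Length. The function names are hypothetical, and the sketch reads the whole file into memory, which is assumed to be acceptable given the 50MB limit.

```python
import http.client
from typing import Dict, Tuple
from urllib.parse import urlsplit


def upload_headers(checksum_b64: str) -> Dict[str, str]:
    """Return the two additional headers required for the file upload."""
    return {
        "x-amz-server-side-encryption": "AES256",
        "Content-Md5": checksum_b64,
    }


def split_upload_url(location_href: str) -> Tuple[str, str]:
    """Split the pre-signed URL into (host, request target), keeping the query as-is."""
    parts = urlsplit(location_href)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return parts.netloc, target


def upload_batch(location_href: str, gz_path: str, checksum_b64: str) -> int:
    """PUT the compressed file to the upload location and return the HTTP status."""
    host, target = split_upload_url(location_href)
    with open(gz_path, "rb") as f:
        payload = f.read()
    conn = http.client.HTTPSConnection(host, timeout=60)
    try:
        # No Content-Type header is passed, and http.client does not add one.
        conn.request("PUT", target, body=payload, headers=upload_headers(checksum_b64))
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

By contrast, urllib.request adds a default Content-Type header to requests that carry a body when none is set.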
The following are additional details on the batch import process:
The file you upload is checked against the checksum and file size you provided in the original PUT request. If the size of the import does not match the one specified in the size field in the request, the service returns an HTTP 400 response. If an attempt is made to alter the checksum, the service returns an HTTP 409 response.
You can import batch files in parallel. If there are any interdependencies between imports, it is best practice to wait until the initial import is completed before starting any subsequent dependent imports. The file upload can stall for a number of reasons, such as network latency. The server timeout interval of the batch upload is set to 60 minutes.
You can check the Batch API import status by issuing a GET request to the batches endpoint, using the same UUID you generated in step 3.
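Checking the status repeatedly until the import finishes can be sketched as a small polling loop in Python. The fetch_status argument is a hypothetical callable that performs the GET request and returns the decoded JSON body. Treating success, corrupted, and error as final states is an assumption based on the status.code descriptions above, and the polling interval and retry count are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these status.code values mean processing has finished.
TERMINAL_CODES = {"success", "corrupted", "error"}


def wait_for_batch(fetch_status: Callable[[], Dict[str, Any]],
                   poll_seconds: float = 30.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until status.code is terminal and return the final response document."""
    for _ in range(max_polls):
        doc = fetch_status()
        if doc["status"]["code"] in TERMINAL_CODES:
            return doc
        sleep(poll_seconds)
    raise TimeoutError("batch did not reach a terminal status in time")
```

On an error result, the status.log value in the returned document gives the location of the log file. On success, summaryStats.failedCount shows whether any individual records were rejected.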