Importing a batch file into Sitecore CDP

Abstract

Describes how to import a batch file using the Sitecore CDP Batch API (Data model 2.1).

Before you import the batch file, you must ensure that the file meets formatting requirements.

Complete the following steps to upload guests, orders, products and tracking events:

  1. Compress the batch file: Apply GZIP compression to the batch import file using either operating system utilities or a programmatic approach. There is a 50MB size limit for uploading batch files.

    Tip

    If the compressed batch file exceeds the 50MB limit, split the data into two or more batch files and compress each one so that no compressed file exceeds the limit. Then upload the compressed files as separate batches.
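The programmatic approach to step 1 could look roughly like the sketch below. It uses only the Python standard library. The helper names are hypothetical, and `MAX_BATCH_BYTES` assumes that "50MB" means 50 × 1024 × 1024 bytes, which this page does not state.

```python
import gzip
import os
import shutil

# Assumption: the documented 50MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_BATCH_BYTES = 50 * 1024 * 1024


def compress_batch(src_path: str, dest_path: str) -> int:
    """GZIP-compress src_path into dest_path and return the compressed size in bytes."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)  # streams the file, so large imports are not held in memory
    return os.path.getsize(dest_path)


def fits_limit(size_bytes: int, limit: int = MAX_BATCH_BYTES) -> bool:
    """Return True when a compressed file of size_bytes is within the upload limit."""
    return size_bytes <= limit
```

If `fits_limit` returns False, the source data would need to be divided into smaller batch files before compressing again, as the tip describes.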

  2. Produce checksum: Generate a hex-encoded MD5 checksum for the compressed file. You must provide this checksum during the upload process so that the integrity of the import file can be verified.

    The following command generates the MD5 hash:

    $ md5 import.json.gz

    The following is an example of the generated output:

    MD5 (import.json.gz) = 69b8a56502866f460e5930f7e53d4bf9
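Where the `md5` utility is not available, the same hex-encoded checksum can be computed in code. The following is a minimal sketch using Python's `hashlib`; the function name `md5_hex` is hypothetical.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex-encoded MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The checksum must be taken over the compressed file, not over the uncompressed source.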

  3. Generate unique batch identifier: Generate a unique identifier for the batch. This will be used when interacting with the Batch API endpoint and should be unique across all batches. We recommend using a UUID for this purpose. When you append this identifier to the Batch API base endpoint, it forms the unique URL for this batch upload request.
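As an illustration of step 3, the sketch below generates a random (version 4) UUID and appends it to the base endpoint. The base URL is copied from the example request in step 4; the function name `new_batch_url` is hypothetical.

```python
import uuid
from typing import Tuple

# Base endpoint as it appears in the example request on this page.
BATCH_API_BASE = "https://api.boxever.com/v2/batches"


def new_batch_url(base: str = BATCH_API_BASE) -> Tuple[str, str]:
    """Return (batch_ref, batch_url) for a newly generated batch identifier."""
    batch_ref = str(uuid.uuid4())  # random UUID, lowercase with hyphens
    return batch_ref, f"{base}/{batch_ref}"
```

Keeping the returned `batch_ref` is useful because the same identifier is needed for any later request about this batch.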

  4. Request pre-signed URL: Issue an HTTP PUT request with a JSON body to the Sitecore CDP Batch API to allocate a location to upload the batch file. The following table defines the fields in the body of the request:

    Attribute  Type     Description
    checksum   string   MD5 checksum of the compressed batch file.
    size       integer  Size in bytes of the compressed batch file.

    The following is an example of a pre-signed URL PUT request:

    Authorization: Basic aHR0cHdhdGNoOmY=
    Content-Type: application/json
    Accept: application/json
    PUT https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785
    
    {
      "checksum": "40d9a12f0a3c93c8ed66a3b6f3735790",
      "size": 3456
    }
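The request above could be assembled in code along these lines. This is a sketch, not the documented client: it builds a `urllib.request.Request` without sending it, and the `username` and `password` parameters are hypothetical stand-ins for whatever credentials go into the Basic authorization header.

```python
import base64
import json
import urllib.request


def build_presign_request(batch_url: str, checksum_hex: str, size_bytes: int,
                          username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that allocates an upload location."""
    body = json.dumps({"checksum": checksum_hex, "size": size_bytes}).encode("utf-8")
    # Basic auth is "username:password", Base64-encoded (credential names are assumptions).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        batch_url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is the JSON document described next.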

    The following table defines the fields in the body of the PUT response:

    Attribute                         Type       Description
    href                              string     Echo back of the URL used in this request.
    ref                               string     UUID of the batch; forms part of the href.
    checksum                          string     Echo back of the checksum of the file being uploaded.
    size                              integer    Echo back of the size of the file being uploaded.
    location                          object     Contains details on the file upload location.
    location.href                     string     URL of the file upload location.
    location.expiry                   date time  Date and time that the upload location href expires.
    status                            object     Contains the status of the file upload.
    status.code                       string     Description of the current batch file processing status. Possible values: uploading (initial state while awaiting client upload of the import), fileprocessing (import file has been uploaded and processing has started), success (import file has been successfully processed), corrupted (import file did not match the checksum), error (import has failed, see the log file).
    status.log                        string     Optional. Included if the batch upload contains any errors; gives the location of the log file that describes the errors.
    summaryStats                      object     Contains a summary of statistics on the file upload.
    summaryStats.timeToProcessMillis  integer    The amount of time, in milliseconds, that the batch import job took to run.
    summaryStats.totalCount           integer    The total number of records read from the import file by the batch import job.
    summaryStats.succeededCount       integer    The number of records that the batch import job successfully processed.
    summaryStats.failedCount          integer    The number of records that the batch import job failed to process.
    createdAt                         date time  Date and time the request was created.
    modifiedAt                        date time  Date and time the request was last updated.

    The following is an example of the PUT response:

    {
     "href":
     "https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785",
     "ref" : "3ee694e5-0b77-2d1e-af19-1aa78f500785",
     "checksum" : "40d9a12f0a3c93c8ed66a3b6f3735790",
     "size" : 3456,
     "location" : {
       "href" : "https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D",
       "expiry" : "2016-02-06T14:41:54.251Z"
     },
     "status" : {
       "code" : "uploading"
     },
     "createdAt" : "2016-02-06T13:41:54.251Z",
     "modifiedAt" : "2016-02-06T13:41:54.251Z"
    }
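A client mainly needs two things from this response: the upload URL and the current status. The sketch below extracts both. The function names are hypothetical, and treating `success`, `corrupted`, and `error` as final states is an interpretation of the status descriptions on this page, not something the page states outright.

```python
import json

# Assumption: these status codes mean processing has ended and will not change again.
TERMINAL_STATUSES = {"success", "corrupted", "error"}


def upload_target(response_text: str) -> str:
    """Return the pre-signed upload URL (location.href) from a PUT response body."""
    return json.loads(response_text)["location"]["href"]


def is_finished(response_text: str) -> bool:
    """Return True when status.code is one of the assumed terminal states."""
    return json.loads(response_text)["status"]["code"] in TERMINAL_STATUSES
```

For the example response above, `is_finished` would return False because the status code is still `uploading`.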

  5. Base64 encode the MD5 checksum: Base64 encode the MD5 checksum of the batch file to be uploaded, which you produced in step 2. First convert the hexadecimal value of the checksum to bytes, and then encode those bytes as Base64. Do not Base64 encode the hexadecimal text itself.
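The hex-to-bytes-to-Base64 conversion in step 5 is easy to get wrong, so a short sketch may help. The function name `md5_hex_to_base64` is hypothetical. The checksum from the example request on this page converts to the `Content-Md5` value used in the example upload request.

```python
import base64


def md5_hex_to_base64(checksum_hex: str) -> str:
    """Convert a hex-encoded MD5 checksum to the Base64 form of its raw bytes."""
    raw = bytes.fromhex(checksum_hex)  # 32 hex characters -> 16 raw bytes
    if len(raw) != 16:
        raise ValueError("an MD5 digest is 16 bytes (32 hex characters)")
    return base64.b64encode(raw).decode("ascii")


print(md5_hex_to_base64("40d9a12f0a3c93c8ed66a3b6f3735790"))  # QNmhLwo8k8jtZqO283NXkA==
```

Encoding the 32-character hex string directly would yield a longer, different value that does not describe the same digest.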

  6. Upload the batch file: Issue an HTTP PUT request to the file upload location. This is the URL provided by the location.href attribute in the PUT response. Include the following additional headers:

    Header name                   Value
    x-amz-server-side-encryption  Constant value of AES256.
    Content-Md5                   Base64 encoded checksum.

The following is an example of a file upload request:

x-amz-server-side-encryption: AES256
Content-Md5: QNmhLwo8k8jtZqO283NXkA==
PUT https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D

<BATCH FILE CONTENT AS BYTES>
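One way to issue this upload from code is sketched below with Python's `http.client`, which sends only the headers it is given plus basic ones such as Host and Content-Length. All function names are hypothetical, and the 60-second socket timeout is an arbitrary choice for the sketch.

```python
import http.client
from urllib.parse import urlsplit


def upload_headers(content_md5_b64: str) -> dict:
    """Return the two additional headers required for the file upload request."""
    return {
        "x-amz-server-side-encryption": "AES256",
        "Content-Md5": content_md5_b64,
    }


def request_target(location_href: str) -> str:
    """Return the path plus query string of the pre-signed URL, left exactly as encoded."""
    parts = urlsplit(location_href)
    return parts.path + ("?" + parts.query if parts.query else "")


def upload_batch(location_href: str, payload: bytes, content_md5_b64: str,
                 timeout: float = 60.0) -> int:
    """PUT the compressed batch bytes to the pre-signed URL and return the HTTP status."""
    conn = http.client.HTTPSConnection(urlsplit(location_href).netloc, timeout=timeout)
    try:
        conn.request("PUT", request_target(location_href), body=payload,
                     headers=upload_headers(content_md5_b64))
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

The query string is passed through untouched because the signature in the pre-signed URL is already percent-encoded.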

Caution

The content-type header must not be added to the request

If you are using HTTP utility classes in programming languages, the content-type header can automatically be added to the request. This can cause the calculated signature of the upload request to not match the signature of the pre-signed URL and can cause HTTP 401 unauthorized errors.
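Because an automatically added content-type header can invalidate the signature, it may be worth checking the headers of the upload request before sending it. The guard below is a hypothetical helper, and it only sees headers set explicitly in the dictionary. It cannot detect a header that a library injects at send time; Python's `urllib.request`, for example, adds a default `Content-type` to requests that carry a body.

```python
def has_content_type(headers: dict) -> bool:
    """Return True if any header name equals Content-Type, ignoring case."""
    return any(name.lower() == "content-type" for name in headers)


def checked_upload_headers(headers: dict) -> dict:
    """Return a copy of headers, or raise if a Content-Type header is present."""
    if has_content_type(headers):
        raise ValueError("remove the Content-Type header before uploading to the pre-signed URL")
    return dict(headers)
```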

The following are additional details on the batch import process:

If the size of the import does not match the size specified in the size field in the request, the service returns an HTTP 400 response. If an attempt is made to alter the checksum or size fields after their initial creation, the service returns an HTTP 409 response.
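A size mismatch can be caught locally before the upload, and the two response codes can be mapped to readable messages. The sketch below shows one way to do this; both helper names and the message wording are hypothetical.

```python
import os


def size_matches(path: str, declared_size: int) -> bool:
    """Return True when the file on disk has exactly the size declared in the request."""
    return os.path.getsize(path) == declared_size


def explain_status(http_status: int) -> str:
    """Map the HTTP status codes described in this section to a short explanation."""
    reasons = {
        400: "uploaded size does not match the size field of the request",
        409: "checksum or size was changed after the batch was created",
    }
    return reasons.get(http_status, "not covered by this section")
```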

You can import batch files in parallel. If there are any interdependencies between imports, it is best practice to wait until the initial import is completed before starting any subsequent dependent imports. The file upload can stall for a number of reasons, such as network latency. The server timeout interval of the batch upload is set to 60 minutes.
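The parallel-with-dependencies advice can be sketched as staged execution: batches within a stage run concurrently, and a later stage starts only after the previous one has finished. The `upload` callable is a hypothetical function supplied by the caller; it is assumed to block until the import completes and to return its final status.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_in_stages(stages: Iterable[List[str]], upload: Callable[[str], str],
                  max_workers: int = 4) -> List[str]:
    """Run each stage's batches in parallel, finishing a stage before starting the next."""
    results: List[str] = []
    for stage in stages:
        # Leaving the with-block waits for every upload in this stage to complete.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results.extend(pool.map(upload, stage))
    return results
```

For example, if orders depended on guests and products already being imported (an illustrative assumption), the call would be `run_in_stages([["guests.gz", "products.gz"], ["orders.gz"]], upload)`.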