Import a batch file into Sitecore CDP

Abstract

Describes how to import a file into Sitecore CDP, using the Sitecore CDP Batch API (Data model 2.1).

You can upload guests, orders, products, and tracking events to Sitecore CDP. Before you import the batch file, ensure that it meets the formatting requirements.

To import a batch file into Sitecore CDP:

  1. Compress the batch file you want to import. Apply GZIP compression using either an operating system utility or code. The compressed file must not exceed the 50MB upload limit.

    Tip

    If the compressed batch file exceeds the 50MB limit, split the data into two or more batch files, compress each one so that it stays within the limit, and then upload the compressed files as separate batches.
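For the programmatic approach, a minimal Python sketch of the compression step might look like the following. The function names and the reading of "50MB" as 50,000,000 bytes are assumptions made for illustration (the conservative interpretation); they are not part of the Batch API.

```python
import gzip
import os
import shutil

# Assumption: "50MB" is read as 50 * 1000 * 1000 bytes, the stricter interpretation.
MAX_BATCH_BYTES = 50 * 1000 * 1000


def gzip_file(src_path: str, dst_path: str) -> int:
    """GZIP-compress src_path into dst_path and return the compressed size in bytes."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the file, so large inputs are not loaded whole
    return os.path.getsize(dst_path)


def fits_upload_limit(size_bytes: int, limit: int = MAX_BATCH_BYTES) -> bool:
    """Return True when a compressed file of size_bytes is within the upload limit."""
    return size_bytes <= limit
```

If `fits_upload_limit` returns False, split the source data into smaller batch files and compress each one separately, as described in the tip above. The size returned by `gzip_file` is also the value you later send in the size field.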

  2. Generate a hex-encoded MD5 checksum for the compressed file. You provide this checksum during the upload process so that the service can verify the integrity of the file it receives.

    The following command generates the MD5 hash on macOS (on Linux, the equivalent utility is md5sum):

    $ md5 import.json.gz

    The following is an example of the generated output:

    MD5 (import.json.gz) = 69b8a56502866f460e5930f7e53d4bf9
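If you prefer to compute the checksum in code, the following sketch uses Python's standard hashlib module. The function name md5_hex is an illustrative choice, not something defined by Sitecore CDP.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex-encoded MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_hex("import.json.gz")` should return the same 32-character hexadecimal string that the command-line utility prints for that file.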

  3. If you are importing the file using a PUT request, generate a unique identifier for the batch. This is used when interacting with the Batch API endpoint and must be unique across all batches.

    Important

    The identifier must be in UUID format. When you append this identifier to the Batch API endpoint, it forms the unique URL for this batch upload request.
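One way to generate the identifier and the resulting batch URL is sketched below. The use of a random (version 4) UUID is an assumption, since the text only requires UUID format, and the endpoint constant is copied from the example request in step 4. The helper name new_batch_url is hypothetical.

```python
import uuid
from typing import Tuple

# Endpoint as it appears in the example request in step 4 of this article.
BATCH_ENDPOINT = "https://api.boxever.com/v2/batches"


def new_batch_url(endpoint: str = BATCH_ENDPOINT) -> Tuple[str, str]:
    """Return (batch_ref, batch_url) for a new batch upload request."""
    batch_ref = str(uuid.uuid4())  # assumption: a random UUID is acceptable
    return batch_ref, endpoint.rstrip("/") + "/" + batch_ref
```

Keep the returned identifier: the same value is needed later to check the status of the import.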

  4. Issue an HTTP PUT request with a JSON body to the Sitecore CDP Batch API to allocate a location to which you subsequently upload the batch file. The following table defines the fields in the body of the request:

    | Attribute | Type | Description |
    | --- | --- | --- |
    | checksum | string | MD5 checksum of the compressed batch file, produced in step 2. |
    | size | integer | Size in bytes of the compressed batch file. |

    The following is an example of a PUT request that asks for a pre-signed upload URL:

    Authorization: Basic aHR0cHdhdGNoOmY=
    Content-Type: application/json
    Accept: application/json
    PUT https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785

    {
      "checksum": "40d9a12f0a3c93c8ed66a3b6f3735790",
      "size": 3456
    }

    The following fields can be returned in the body of the PUT response:

    | Attribute | Type | Description |
    | --- | --- | --- |
    | href | string | Echo back of the URL used in this request. |
    | ref | string | UUID of the batch, forms part of the href. |
    | checksum | string | Echo back of checksum of the uploading file. |
    | size | integer | Echo back of the size of the uploading file. |
    | location | object | Details of the file upload location. |
    | location.href | string | The allocated URL to which you upload the batch file. This URL is valid for one hour. |
    | location.expiry | date time | Date and time that the upload location href expires. |
    | status | object | The status of the file upload. |
    | status.code | string | The batch file processing status. Possible values: uploading (the import process was initialized by the client); fileprocessing (import file was uploaded and processing has started); success (import file was successfully processed); corrupted (import file did not match the checksum); error (import has failed, see the log file; the location of the log file is the value of the status.log parameter). |
    | status.log | string | If the batch upload request contains errors, the location of the log file that includes details of the errors. |
    | summaryStats | object | Contains a summary of statistics on the file upload. |
    | summaryStats.timeToProcessMillis | integer | The amount of time, in milliseconds, that the batch import job took to run. |
    | summaryStats.totalCount | integer | The total number of records read from the import file by the batch import job. |
    | summaryStats.succeededCount | integer | The number of records that the batch import job successfully processed. |
    | summaryStats.failedCount | integer | The number of records that the batch import job failed to process. |
    | createdAt | date time | Date and time the request was created. |
    | modifiedAt | date time | Date and time the request was last updated. |

    The following is an example of the PUT response:

    {
     "href":
     "https://api.boxever.com/v2/batches/3ee694e5-0b77-2d1e-af19-1aa78f500785",
     "ref" : "3ee694e5-0b77-2d1e-af19-1aa78f500785",
     "checksum" : "40d9a12f0a3c93c8ed66a3b6f3735790",
     "size" : 3456,
     "location" : {
       "href" : "https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D",
       "expiry" : "2016-02-06T14:41:54.251Z"
     },
     "status" : {
       "code" : "uploading"
     },
     "createdAt" : "2016-02-06T13:41:54.251Z",
     "modifiedAt" : "2016-02-06T13:41:54.251Z"
    }
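
The allocation request and the handling of its response can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the username and password parameters stand in for whatever Basic authentication credentials your Sitecore CDP instance uses, and the code builds the request without sending it.

```python
import base64
import json
import urllib.request
from datetime import datetime
from typing import Tuple


def build_allocation_request(batch_url: str, checksum_hex: str, size_bytes: int,
                             username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the JSON PUT request that allocates an upload location."""
    body = json.dumps({"checksum": checksum_hex, "size": size_bytes}).encode("utf-8")
    # Basic authentication: base64("username:password"), as in the example header.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        batch_url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def parse_upload_location(response_body: str) -> Tuple[str, datetime]:
    """Extract location.href and location.expiry from the PUT response body."""
    location = json.loads(response_body)["location"]
    # Replace the trailing "Z" so that older Python versions can parse the timestamp.
    expiry = datetime.fromisoformat(location["expiry"].replace("Z", "+00:00"))
    return location["href"], expiry
```

To send the request, pass it to `urllib.request.urlopen` and feed the decoded response body to `parse_upload_location`. Comparing the returned expiry with the current UTC time tells you whether the upload URL is still valid.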

  5. Apply Base64 encoding to the MD5 checksum of the batch file to be uploaded, which was produced in step 2. The hexadecimal value of the checksum must first be converted to bytes, and then the bytes must be converted to Base64.
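
The conversion is easy to get wrong by Base64-encoding the hexadecimal text itself instead of the bytes it represents. The following sketch shows the intended order of operations. The function name is an illustrative choice.

```python
import base64


def md5_hex_to_base64(checksum_hex: str) -> str:
    """Convert a hex-encoded MD5 checksum to the Base64 form used in the upload header."""
    raw = bytes.fromhex(checksum_hex)  # 32 hex characters become 16 raw bytes
    return base64.b64encode(raw).decode("ascii")


# The checksum from the example request in step 4:
print(md5_hex_to_base64("40d9a12f0a3c93c8ed66a3b6f3735790"))  # QNmhLwo8k8jtZqO283NXkA==
```

A correctly converted MD5 checksum is always 24 characters long and ends with two padding characters.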

  6. To upload the batch file, issue an HTTP PUT request to the file upload location. This is the URL provided in the location.href attribute in the PUT response. Include the following additional headers:

    | Header Name | Value |
    | --- | --- |
    | x-amz-server-side-encryption | Constant value of AES256. |
    | Content-Md5 | Base64 encoded checksum. |

The following is an example of a file upload request:

x-amz-server-side-encryption: AES256
Content-Md5: QNmhLwo8k8jtZqO283NXkA==
PUT https://boxever-batch-service-production-eu-west-1.s3.amazonaws.com/xyzsla2xze5vxn02kf283wo020jg/3ee694e5-0b77-2d1e-af19-1aa78f500785/import.gz?AWSAccessKeyId=AKIAI2JLVI7OT2L6QDRQ&Expires=1459953714&Signature=yIG7nFv5w%2B2N%2Fkz11Eh7BjqSt2U%3D

<BATCH FILE CONTENT AS BYTES>

Caution

Do not add a Content-Type header to the request. HTTP utility classes in some programming languages add this header automatically. If that happens, the calculated signature of the upload request might not match the signature of the pre-signed URL, which results in HTTP 403 Forbidden errors.
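
The upload can be sketched with Python's low-level http.client module, which sends only the headers it is given plus Host, Accept-Encoding, and Content-Length, so no Content-Type header is introduced that could affect the signature. The function names and the timeout value are assumptions for illustration.

```python
import http.client
from typing import Dict, Tuple
from urllib.parse import urlsplit


def upload_headers(content_md5_b64: str) -> Dict[str, str]:
    """Return exactly the two additional headers listed for the file upload request."""
    return {
        "x-amz-server-side-encryption": "AES256",
        "Content-Md5": content_md5_b64,
    }


def split_upload_target(location_href: str) -> Tuple[str, str]:
    """Split the pre-signed URL into (host, request target) without altering the query."""
    parts = urlsplit(location_href)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return parts.netloc, target


def upload_batch_file(location_href: str, gz_path: str, content_md5_b64: str) -> int:
    """PUT the compressed file to the upload location and return the HTTP status code."""
    host, target = split_upload_target(location_href)
    with open(gz_path, "rb") as fh:
        payload = fh.read()  # at most 50MB, so reading it into memory is acceptable
    conn = http.client.HTTPSConnection(host, timeout=300)  # assumption: 5-minute socket timeout
    try:
        conn.request("PUT", target, body=payload, headers=upload_headers(content_md5_b64))
        return conn.getresponse().status
    finally:
        conn.close()
```

The query string of the pre-signed URL is passed through unchanged because it carries the signature. Decoding or re-encoding it could invalidate the request.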

The following are additional details on the batch import process:

The file you upload is checked against the checksum and file size you provided in the original PUT request. If the size of the import does not match the one specified in the size field in the request, the service returns an HTTP 400 response. If an attempt is made to alter the checksum, the service returns an HTTP 409 response.

You can import batch files in parallel. If there are any interdependencies between imports, it is best practice to wait until the initial import is completed before starting any subsequent dependent imports. The file upload can stall for a number of reasons, such as network latency. The server timeout interval of the batch upload is set to 60 minutes.

You can check the status of a batch import by issuing a GET request to the batches endpoint, using the same UUID you generated in step 3.
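
A status check can be wrapped in a simple polling loop, sketched below. Several things here are assumptions: that the GET request goes to the same batch URL as the earlier PUT request, that success, corrupted, and error are the final states, and the polling interval. The fetch_status parameter is a hypothetical callable that sends the GET request and returns the parsed JSON response.

```python
import time
import urllib.request
from typing import Callable, Dict

# Assumption: these status codes mean that processing has finished.
TERMINAL_STATUSES = {"success", "corrupted", "error"}


def build_status_request(batch_url: str, authorization: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the batch identified by batch_url."""
    return urllib.request.Request(
        batch_url,
        method="GET",
        headers={"Authorization": authorization, "Accept": "application/json"},
    )


def wait_for_batch(fetch_status: Callable[[], Dict], poll_seconds: float = 30.0,
                   max_polls: int = 120, sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until status.code reaches a terminal value, then return the last response."""
    for attempt in range(max_polls):
        batch = fetch_status()
        if batch["status"]["code"] in TERMINAL_STATUSES:
            return batch
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError("batch did not reach a terminal status in time")
```

When the returned status code is error, the status.log value in the same response points to the log file that describes the failures.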