Walkthrough: Uploading a batch file
In this walkthrough, you upload a batch file to Sitecore CDP.
This walkthrough assumes that you:
- Have a JSON file (import.json) that contains valid JSON records. If you do not have your own JSON records, you can use our sample JSON records.
- Have access to a file archiver.
This walkthrough describes how to:
- Gzip your JSON file.
- Collect required details.
- Upload the gzipped file.
- Verify that Sitecore CDP ingested the uploaded data.
Gzip your JSON file
Before you can upload a batch file, you gzip your JSON file and use the gzipped file in the upload process.
To gzip your JSON file:
- Use either a file archiver or a programmatic approach, for example, the command-line sketch shown after this step.
  For example, if the data you want to upload is in import.json, the gzipped file is import.json.gz.
  There is a 50MB size limit for uploading batch files. If the size of the gzipped file exceeds the 50MB limit, recompress the original JSON files into two or more gzipped files that do not exceed the 50MB size limit. Then, upload the gzipped files one at a time.
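If you use a command-line approach, the following is a minimal sketch using the standard gzip utility in a Unix-like shell (the commands are illustrative and not part of Sitecore CDP):

```
# Compress import.json into import.json.gz, keeping the original file
gzip -c import.json > import.json.gz

# Check that the gzipped file does not exceed the 50MB upload limit
ls -lh import.json.gz
```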
Collect required details
After you have created the gzipped file, you collect required details. You use these details when you start the upload process.
| Detail | Type | Description | Example |
| --- | --- | --- | --- |
| File size in bytes | integer | The size in bytes of the gzipped file (import.json.gz). | |
| MD5 checksum | string | A hex-encoded MD5 checksum for the gzipped file. This provides assurance that the integrity of the gzipped file is intact. | |
| Base64 string | string | The same MD5 checksum, converted from hex to Base64. | |
| UUID for batch reference | string | The UUID of a batch upload. This is a UUID that you generate. It must be unique across all batches. You can use online tools to generate a UUID. | |
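For illustration, here is one way to gather these details in a Unix-like shell, assuming openssl and uuidgen are available (any equivalent tooling works just as well):

```
# File size in bytes of the gzipped file
wc -c < import.json.gz

# Hex-encoded MD5 checksum of the gzipped file
openssl dgst -md5 import.json.gz

# The same checksum as a Base64 string (Base64 of the raw MD5 digest)
openssl dgst -md5 -binary import.json.gz | base64

# A UUID to use as the batch reference
uuidgen
```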
Upload the gzipped file
After you have collected the required details, you can start the upload process. In this process, you first allocate a batch upload location, then start uploading the gzipped file, then retrieve the status of the upload.
To upload the gzipped file:
- Make a PUT request to allocate a batch upload location.

  Request:

  ```
  curl -X PUT '<baseURL>/v2/batches/<batchRef_UUID>' \
  -u '<username>:<password>' \
  --data-raw '{
    "checksum": "<MD5_checksum>",
    "size": <file_size_in_bytes>
  }'
  ```

  Note that you must:

  - Use basic authentication.
  - Replace <baseURL> with your base URL. Replace <batchRef_UUID>, <MD5_checksum>, and <file_size_in_bytes> with the values you collected in the previous procedure.

  In the response, in the location object, the href key contains the batch upload path. You use the batch upload path in the next step. The batch upload path is valid for 1 hour.

  Example batch upload path: https://sitecore-batch-service-dev-eu-west-1.s3.amazonaws.com/...

  Warning: If you try to upload too many batch files for processing, you'll receive an error stating that you've exceeded the batch upload limit. You must upload one batch file at a time.
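  The response is not reproduced in full here; based on the fields described above, it contains a location object whose href key holds the batch upload path, along these lines (illustrative only, other fields omitted):

  ```
  {
    "location": {
      "href": "https://sitecore-batch-service-dev-eu-west-1.s3.amazonaws.com/..."
    }
  }
  ```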
- Make a PUT request to the batch upload path. This request starts the upload.

  Request:

  ```
  curl -X PUT 'https://sitecore-batch-service-dev-eu-west-1.s3.amazonaws.com/...' \
  -H 'x-amz-server-side-encryption: AES256' \
  -H 'Content-Md5: <Base64_string>' \
  --data-binary '<path_to_gzipped_file>'
  ```

  Note that you must:

  - Not use basic authentication, contrary to the previous step.
  - Specify the headers as seen in the code sample above. Replace <Base64_string> with the value you collected in the previous procedure.
  - In --data-binary, specify the path to the gzipped file. You must upload the file in the body as binary data. Example: @/C:/Users/user/Desktop/import.json.gz

  Tip: If you are using Postman, click Body > binary > Select File.

  This request returns an empty 200 OK response.
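  As an illustration of how the pieces fit together, the following sketch (assuming a Unix-like shell with openssl) computes the Base64 string and performs the upload in one pass; <batch_upload_path> is a hypothetical placeholder for the path returned in the previous step:

  ```
  # Base64 encoding of the raw MD5 digest, as used in the Content-Md5 header
  BASE64_MD5=$(openssl dgst -md5 -binary import.json.gz | base64)

  # Upload the gzipped file as binary data to the batch upload path
  curl -X PUT '<batch_upload_path>' \
    -H 'x-amz-server-side-encryption: AES256' \
    -H "Content-Md5: ${BASE64_MD5}" \
    --data-binary '@import.json.gz'
  ```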
- Make a GET request to retrieve the batch file upload status.

  Request:

  ```
  curl -X GET '<baseURL>/v2/batches/<batchRef_UUID>' \
  -u '<username>:<password>' \
  -H 'Accept: application/json'
  ```

  Note that you must:

  - Use basic authentication.
  - Replace <baseURL> with your base URL. Replace <batchRef_UUID> with the same batch reference you used when you made the PUT request to allocate a batch upload location.

  In the response, the status object contains information about the upload status. First, the status is processing. After the upload succeeds, the status is success.

  Tip: We recommend checking that each file upload status is successful before uploading the next file. This can save you from having to upload files again because you only discover later that they failed due to a formatting issue or other error.
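  If you script your uploads, one way to follow this tip is to poll the status endpoint until the batch is no longer processing. A minimal sketch in a Unix-like shell (the grep check is illustrative; inspect the full response before you rely on it):

  ```
  # Poll the batch status and wait while it is still "processing"
  while curl -s '<baseURL>/v2/batches/<batchRef_UUID>' \
          -u '<username>:<password>' \
          -H 'Accept: application/json' | grep -q 'processing'; do
    echo "Batch still processing, waiting..."
    sleep 10
  done

  # At this point, check the response for a success status before uploading the next file
  ```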
Verify that Sitecore CDP ingested the uploaded data
After the upload succeeds, you log in to Sitecore CDP to find the uploaded data. In this example, you search for the guests listed in our sample JSON records.
To verify that Sitecore CDP ingested the uploaded data:
- In Sitecore CDP, click Guests. In the Search field, enter the email address of one of the uploaded guest records, for example, [email protected].
  The guest displays.
- Click the guest. The guest profile displays.
- On the guest profile page, click Properties. The uploaded details about the guest display. For the guests listed in our sample JSON records, the details include First Seen, Last Seen, and data extensions in the Ext attribute, for example, LoyaltyNumber.
After you verify that the upload was successful and Sitecore CDP ingested the data, you can start uploading the rest of the batch files, one at a time.
You have now successfully uploaded a batch file to Sitecore CDP. You gzipped your JSON file, started the upload, and verified that Sitecore CDP ingested the uploaded data.