Sitecore Cortex Processing Engine

Workers

Workers are responsible for performing a task, such as projecting data from an external data source into temporary storage or evaluating contacts against a trained model. Workers are called by task agents, which can run in parallel.

As a developer, you can:

Register a task or chain of tasks to be processed by existing workers.
Create custom workers.

Types of workers

There are two types of workers: distributed workers and deferred workers.

In a complex business scenario such as training a machine learning model, a combination of distributed and deferred workers completes a chain of tasks in a specific order.
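To make the idea of an ordered chain concrete, here is a minimal sketch in plain Python. It does not use the Processing Engine API: the `Task` class, the `run_chain` function, and the task names are hypothetical and only model tasks that wait for their prerequisites.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a registered task (not a Sitecore type)."""
    name: str
    kind: str  # "distributed" or "deferred"
    prerequisites: list = field(default_factory=list)


def run_chain(tasks):
    """Return task names in an order that respects prerequisites."""
    completed = []
    pending = list(tasks)
    while pending:
        # A task is ready when all of its prerequisites have completed.
        ready = [t for t in pending
                 if all(p in completed for p in t.prerequisites)]
        if not ready:
            raise ValueError("circular or missing prerequisite")
        for task in ready:
            completed.append(task.name)
            pending.remove(task)
    return completed


# Registration order does not matter; prerequisites decide execution order.
chain = [
    Task("train", "deferred", prerequisites=["merge"]),
    Task("merge", "deferred", prerequisites=["project"]),
    Task("project", "distributed"),
]
order = run_chain(chain)
print(order)
```

In this sketch the distributed projection task runs first, and each deferred task starts only after the task it depends on has finished.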

Distributed workers

Distributed workers read data in batches from an external data source and perform processing on that data, such as projecting it into a format that is suitable for machine learning. The data set is split into cursors and processed by multiple agents in parallel, which means that several workers are working on the same task. Examples of distributed workers include:

Projection worker (ProjectionWorker)

Important

The external system must support parallel reads. If parallel reads are not supported, you should not use a distributed worker to extract data. Use a deferred worker instead.
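The cursor-based parallelism described above can be sketched as follows. This is an illustration in plain Python, not the Processing Engine implementation: `split_into_cursors`, `process_batch`, and the record shape are assumptions made for the example, and a thread pool stands in for the task agents.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_cursors(records, cursor_count):
    """Split the data set into roughly equal slices, one per cursor."""
    return [records[i::cursor_count] for i in range(cursor_count)]


def process_batch(batch):
    """Hypothetical worker logic: project each record into a feature row."""
    return [{"id": r["id"], "visits": len(r["pages"])} for r in batch]


# Made-up source data: contact n has visited n pages.
records = [{"id": n, "pages": ["/home"] * n} for n in range(1, 7)]
cursors = split_into_cursors(records, cursor_count=3)

# Each agent handles one cursor; all agents work on the same task.
with ThreadPoolExecutor(max_workers=3) as agents:
    results = list(agents.map(process_batch, cursors))

# Flatten the per-cursor results back into one list of rows.
rows = sorted((row for batch in results for row in batch),
              key=lambda r: r["id"])
print(rows)
```

The sketch only works because the in-memory list can be read from several cursors at once, which mirrors the requirement that the external system supports parallel reads.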

Other uses for distributed workers include:

Performing business logic that includes updating a large number of contacts. For example, evaluating contacts against a trained model and updating a facet.
Aggregating data from the xDB Collection database into a reporting database.

Note

If you are importing data into xConnect without performing any processing, use the Data Exchange Framework instead of the Processing Engine.

Deferred workers

Deferred workers are called once by a single task agent and do not accept a data source by default. Examples of deferred workers include:

Merge worker (MergeWorker)
Training worker (TrainingWorker)

Other uses for deferred workers include:

Exporting a trained model to other systems.
Importing a model that has already been trained.
Running a calculation.
Importing data from a system that does not support parallel reads.

Worker options dictionaries

All workers accept an options dictionary object that inherits DeferredWorkerOptionsDictionary or DistributedWorkerOptionsDictionary, depending on the type of worker. You must pass in a worker options dictionary when registering a task.

An options dictionary serves two main purposes:

It contains the fully qualified type name of the worker that should handle the task.
It contains all the information that a worker requires to complete a task. For example, the ContactTrainingWorkerOptions class contains information about:
- Which model to use for training
- Where to find training data in temporary storage
- The training data schema
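The two purposes of an options dictionary can be illustrated with a small sketch. It is plain Python rather than the Sitecore types: `WorkerOptionsDictionary`, `ExampleTrainingWorker`, the `WorkerType` key, and the type name `Example.Workers.TrainingWorker` are all hypothetical.

```python
class WorkerOptionsDictionary(dict):
    """Hypothetical base: a string dictionary that names its worker type."""

    def __init__(self, worker_type, **settings):
        super().__init__({k: str(v) for k, v in settings.items()})
        # Purpose 1: record which worker should handle the task.
        self["WorkerType"] = worker_type


class ExampleTrainingWorker:
    """Hypothetical worker that reads everything it needs from options."""

    def __init__(self, options):
        self.options = options

    def run(self):
        return f"training {self.options['model']} on {self.options['table']}"


# Assumed lookup from fully qualified type name to worker class.
WORKER_TYPES = {"Example.Workers.TrainingWorker": ExampleTrainingWorker}


def create_worker(options):
    """Resolve the worker from the type name stored in the options."""
    return WORKER_TYPES[options["WorkerType"]](options)


# Purpose 2: carry the information the worker needs to complete the task.
options = WorkerOptionsDictionary(
    "Example.Workers.TrainingWorker",
    model="PurchaseModel",
    table="contacts_projection",
    schema="id:int,visits:int",
)
message = create_worker(options).run()
print(message)
```

Because the options carry the worker type name, the code that registers a task never needs a reference to the worker instance itself.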

Default worker options dictionaries

All default workers are matched by one or more strongly typed worker options dictionaries. The following table lists several default workers and matching worker options dictionaries:

Worker	Worker options dictionaries
`TrainingWorker`	`InteractionTrainingWorkerOptionsDictionary`, `ContactTrainingWorkerOptionsDictionary`
`MergeWorker`	`MergeWorkerOptionsDictionary`
`ProjectionWorker`	`InteractionProjectionWorkerOptionsDictionary`, `ContactProjectionWorkerOptionsDictionary`

Note

It is recommended that you use a specialized options dictionary when registering a task. However, you can technically register a task for any worker using the DeferredWorkerOptionsDictionary or DistributedWorkerOptionsDictionary base classes.

Distributed worker data sources

Distributed workers accept a data source options dictionary in addition to a worker options dictionary. You must specify a data source when registering a distributed task.

There are four default data sources that get data from xConnect via data extraction or search:

Data source	Data source options dictionary
`ContactDataSource`	`ContactDataSourceOptionsDictionary` (uses data extraction)
`InteractionDataSource`	`InteractionDataSourceOptionsDictionary` (uses data extraction)
`ContactSearchDataSource`	`ContactSearchDataSourceOptionsDictionary` (uses xConnect search)
`InteractionSearchDataSource`	`InteractionSearchDataSourceOptionsDictionary` (uses xConnect search)

All xConnect data sources support expand options and sampling. You can also create your own data source.
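The difference between registering a distributed task and a deferred task can be sketched as follows. The function names `register_distributed_task` and `register_deferred_task`, the `RegisteredTask` class, and the dictionary keys are hypothetical and only illustrate that a distributed task needs a data source in addition to worker options; they are not the Processing Engine API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisteredTask:
    """Hypothetical record of a registered task."""
    worker_options: dict
    data_source_options: Optional[dict] = None

    @property
    def kind(self):
        has_source = self.data_source_options is not None
        return "distributed" if has_source else "deferred"


def register_distributed_task(data_source_options, worker_options):
    """A distributed task is rejected unless a data source is specified."""
    if not data_source_options:
        raise ValueError("a distributed task requires a data source")
    return RegisteredTask(worker_options, data_source_options)


def register_deferred_task(worker_options):
    """A deferred task takes worker options only."""
    return RegisteredTask(worker_options)


projection = register_distributed_task(
    {"DataSource": "ContactDataSource"},  # assumed key and value
    {"WorkerType": "ProjectionWorker"},
)
training = register_deferred_task({"WorkerType": "TrainingWorker"})
print(projection.kind, training.kind)
```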

Model wrappers

Some workers require a model wrapper to be passed in as part of the worker options dictionary when registering a task. A model wrapper defines:

Data projection logic
Training logic (specific to machine learning)
Evaluation logic (specific to machine learning)
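As a rough picture of how these three responsibilities fit together, the sketch below groups them in one class. It is plain Python, not the Sitecore model wrapper contract: `ModelWrapper`, `AverageVisitsModel`, and the method names are hypothetical, and the "model" is simply an average.

```python
from abc import ABC, abstractmethod


class ModelWrapper(ABC):
    """Hypothetical interface grouping the three responsibilities."""

    @abstractmethod
    def project(self, contact):
        """Turn raw data into a row suitable for training."""

    @abstractmethod
    def train(self, rows):
        """Build a model from projected rows."""

    @abstractmethod
    def evaluate(self, model, row):
        """Score one projected row against a trained model."""


class AverageVisitsModel(ModelWrapper):
    def project(self, contact):
        return {"id": contact["id"], "visits": len(contact["pages"])}

    def train(self, rows):
        # The "model" here is just the mean visit count.
        return sum(r["visits"] for r in rows) / len(rows)

    def evaluate(self, model, row):
        # A contact scores True when it visits more than the average.
        return row["visits"] > model


wrapper = AverageVisitsModel()
contacts = [{"id": n, "pages": ["/p"] * n} for n in (1, 2, 6)]
rows = [wrapper.project(c) for c in contacts]
model = wrapper.train(rows)
scores = [wrapper.evaluate(model, r) for r in rows]
print(model, scores)
```

Keeping the three steps in one object means a projection worker, a training worker, and an evaluation worker can all be pointed at the same wrapper.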

Sample and base workers

The following base workers can be inherited and adapted to fit your business scenario:

Evaluation worker

Note

There is no default evaluation worker, because what happens to the result of an evaluation (for example, writing data to an xConnect facet) depends on the implementation.
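The reason a base worker leaves the result handling open can be shown with a short sketch. It is plain Python and does not reproduce the Sitecore base class: `EvaluationWorkerBase`, `FacetWritingWorker`, `consume_result`, and the `LikelyBuyer` facet name are hypothetical.

```python
class EvaluationWorkerBase:
    """Hypothetical base: scores each contact, leaves storage to subclasses."""

    def __init__(self, score):
        self.score = score  # callable: contact -> evaluation result

    def process_batch(self, batch):
        for contact in batch:
            self.consume_result(contact, self.score(contact))

    def consume_result(self, contact, result):
        raise NotImplementedError("decide what to do with the result")


class FacetWritingWorker(EvaluationWorkerBase):
    def consume_result(self, contact, result):
        # Stand-in for updating a facet on the contact.
        contact.setdefault("facets", {})["LikelyBuyer"] = result


contacts = [{"id": 1, "visits": 5}, {"id": 2, "visits": 1}]
worker = FacetWritingWorker(lambda c: c["visits"] > 3)
worker.process_batch(contacts)
print(contacts)
```

A different subclass could send the same results to a reporting store instead, without changing the scoring loop in the base class.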
