Projection framework
The projection framework is a fluent API that transforms source data into tabular form, including the ability to group data by a key. The projection framework is available in the context of the projection worker and saves projected data to the Cortex Processing Storage database by default.
Keys, attributes, and measures
A projection includes at least one key, optional attributes, and optional measures.
- A key uniquely identifies a row and determines how source data should be grouped. Composite keys are supported. Assuming the source data is a list of contacts, examples of keys include:
  - A contact ID. This key is unique per contact and the resulting rows will not be grouped.
  - A combination of job title and year of birth. This key is unlikely to be unique per contact and the resulting rows will be grouped.
- An attribute represents additional data about the key:
  - Given that the key is a contact ID, you might output that contact's total engagement value as an attribute.
  - Given that the key is a combination of job title and year of birth, you might call a method that returns the average starting salary as an attribute.
- A measure is a property of the key on which calculations can be made. Measures are only relevant if the projection reduces the source data set. For example, a source list of 40 contacts grouped by job title may result in six rows. In this scenario, you can:
  - Use `.Measure("Count", x => 1)` to return the number of contacts that have each job title.
  - Use `.Measure("YearsOfExperience", c => c.CareerFacet().YearsOfExperience)` to return the total number of years of experience represented by contacts with each job title.
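To make the role of a measure concrete, the following minimal sketch reproduces the same idea with plain LINQ rather than the projection framework itself. It assumes that measure values are summed within each group, which is what the two `Measure` examples above imply. The `SampleContact` class and the `MeasureSketch` helper are hypothetical names used only for illustration; they are not xConnect or Cortex types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a contact; not an xConnect type.
public class SampleContact
{
    public string JobTitle { get; set; }
    public int YearsOfExperience { get; set; }
}

public static class MeasureSketch
{
    // Groups by job title (the key) and sums each measure within the group.
    public static List<(string JobTitle, int Count, int YearsOfExperience)> Project(
        IEnumerable<SampleContact> contacts)
    {
        return contacts
            .GroupBy(c => c.JobTitle)
            .Select(g => (
                JobTitle: g.Key,
                // Summing a constant 1 per contact yields the group size,
                // mirroring the x => 1 lambda in the text above.
                Count: g.Sum(c => 1),
                YearsOfExperience: g.Sum(c => c.YearsOfExperience)))
            .ToList();
    }

    public static void Main()
    {
        var contacts = new List<SampleContact>
        {
            new SampleContact { JobTitle = "Developer", YearsOfExperience = 10 },
            new SampleContact { JobTitle = "Developer", YearsOfExperience = 4 },
            new SampleContact { JobTitle = "Programmer Writer", YearsOfExperience = 7 },
        };

        // Three contacts reduce to two rows, one per job title.
        foreach (var row in Project(contacts))
        {
            Console.WriteLine($"{row.JobTitle} | {row.Count} | {row.YearsOfExperience}");
        }
    }
}
```

With this sample data the two Developer contacts collapse into a single row with a count of 2 and 14 years of experience, which is the kind of reduction that makes a measure meaningful.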
The following contact projection example groups contacts by birth year and includes a measure and an attribute:
Projection without grouping
The following sample projection uses the contact ID as the key and outputs each contact's job title and number of identifiers:
A sample set of 3 contacts would produce the following output:
| ContactID | IdentifierCount | JobTitle |
|---|---|---|
| a3102a8c-ed77-492b-b447-4b6a0df3c412 | 3 | Programmer Writer |
| c35556da-17dc-476a-98d5-56df981e4529 | 2 | Developer |
| 0600f5a2-4491-491e-b568-be09a8d801fe | 5 | Junior Developer |
This example uses a unique contact ID as a key and the data set has not been reduced. There is a one-to-one relationship between the number of contacts in the source data set and the number of rows in the projected table. In scenarios like this:
- Do not use `Measure()`. Every contact ID in the source data set is unique, and the data set will not be reduced. Using `Measure()` creates unnecessary performance overhead.
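The one-to-one relationship can be illustrated with a short sketch in plain LINQ, not the projection framework. Because the key is unique per contact, a simple per-contact mapping is enough and nothing is aggregated. `SampleContact` and `UngroupedSketch` are hypothetical names introduced for this illustration, and the identifier strings are made-up sample values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a contact; not an xConnect type.
public class SampleContact
{
    public Guid Id { get; set; }
    public string JobTitle { get; set; }
    public List<string> Identifiers { get; set; } = new List<string>();
}

public static class UngroupedSketch
{
    // The key is unique per contact, so each contact maps to exactly one row.
    // The other columns behave like attributes; nothing is summed.
    public static List<(Guid ContactId, int IdentifierCount, string JobTitle)> Project(
        IEnumerable<SampleContact> contacts)
    {
        return contacts
            .Select(c => (
                ContactId: c.Id,
                IdentifierCount: c.Identifiers.Count,
                JobTitle: c.JobTitle))
            .ToList();
    }

    public static void Main()
    {
        var contacts = new List<SampleContact>
        {
            new SampleContact
            {
                Id = Guid.NewGuid(),
                JobTitle = "Developer",
                Identifiers = new List<string> { "email", "crm" },
            },
            new SampleContact
            {
                Id = Guid.NewGuid(),
                JobTitle = "Junior Developer",
                Identifiers = new List<string> { "email" },
            },
        };

        var rows = Project(contacts);

        // The row count matches the contact count: the data set is not reduced.
        Console.WriteLine(rows.Count == contacts.Count); // prints "True"
    }
}
```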
Be wary of including contact IDs in a projection. Contact IDs uniquely identify a contact in the xDB and are not removed if a contact executes the right to be forgotten, so they should not be persisted outside of the xDB. Projected data is deleted from the Cortex Processing Storage database at configurable intervals.
Projection with grouping
The following sample projection groups rows by contact birth year and job title, and outputs a contact count and the total years of experience for each row:
A sample set of 17 contacts that span 3 birth years and 3 job titles would produce the following output:
| BirthYear | JobTitle | Count | YearsExperience |
|---|---|---|---|
| 1987 | Programmer Writer | 1 | 10 |
| 1987 | Developer | 5 | 50 |
| 1972 | Developer | 10 | 92 |
| 1990 | Junior Developer | 1 | 3 |
This example uses a composite key consisting of contact birth years and job titles, and the data set has been reduced. In scenarios like this:
- Do use `Measure()`. This example includes:
  - A sum of the number of contacts within each group.
  - The total number of years of work experience represented by the group.
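As a rough illustration of a composite key, the sketch below groups a small in-memory list by birth year and job title using plain LINQ, not the projection framework, and sums two measures per group. `SampleContact` and `GroupedSketch` are hypothetical names, and the sample values are invented for the example rather than taken from the table above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a contact; not an xConnect type.
public class SampleContact
{
    public int BirthYear { get; set; }
    public string JobTitle { get; set; }
    public int YearsOfExperience { get; set; }
}

public static class GroupedSketch
{
    // Groups by the composite key (birth year, job title) and
    // aggregates the two measures within each group.
    public static List<(int BirthYear, string JobTitle, int Count, int YearsExperience)> Project(
        IEnumerable<SampleContact> contacts)
    {
        return contacts
            .GroupBy(c => (BirthYear: c.BirthYear, JobTitle: c.JobTitle))
            .Select(g => (
                BirthYear: g.Key.BirthYear,
                JobTitle: g.Key.JobTitle,
                Count: g.Count(),
                YearsExperience: g.Sum(c => c.YearsOfExperience)))
            .ToList();
    }

    public static void Main()
    {
        var contacts = new List<SampleContact>
        {
            new SampleContact { BirthYear = 1987, JobTitle = "Developer", YearsOfExperience = 10 },
            new SampleContact { BirthYear = 1987, JobTitle = "Developer", YearsOfExperience = 12 },
            new SampleContact { BirthYear = 1987, JobTitle = "Programmer Writer", YearsOfExperience = 10 },
            new SampleContact { BirthYear = 1972, JobTitle = "Developer", YearsOfExperience = 20 },
        };

        // Four contacts reduce to three rows, one per distinct key combination.
        foreach (var r in Project(contacts))
        {
            Console.WriteLine($"{r.BirthYear} | {r.JobTitle} | {r.Count} | {r.YearsExperience}");
        }
    }
}
```

Two contacts share the key (1987, Developer), so they are merged into one row, while the same job title with a different birth year stays in a separate row.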