Sitecore Cortex Content Tagging architecture
Overview of the architecture of Sitecore Cortex™ Content Tagging.
This topic describes the architecture of the Sitecore Cortex™ Content Tagging feature in Sitecore.
The Sitecore Cortex Content Tagging feature in Sitecore consists of the following:
Providers (IContentProvider, IDiscoveryProvider, ITaxonomyProvider, and ITagger) – contain the business logic that performs content tagging operations.
Configuration services (IItemContentTaggingProviderSetBuilder, IItemContentTaggingConfigurationService) – build the combination of providers that performs content tagging operations, based on the configuration.
Pipelines (getTaggingConfiguration, tagContent, normalizeContent) – give you extension points to inject custom logic into the content tagging process.
The process of content tagging consists of four steps, each with its own abstraction:
IContentProvider – takes objects of type T (for example, a Sitecore item) as input and returns TaggableContent objects. You can implement a custom version of IContentProvider.
IDiscoveryProvider – takes TaggableContent objects as input and returns TagData objects. You can implement a custom version of IDiscoveryProvider.
ITaxonomyProvider – takes TagData objects as input and returns Tags objects. It can also return the parent and/or children of a tag if you have implemented a structured taxonomy in the provider. You can implement a custom version of ITaxonomyProvider.
ITagger – takes an object of generic type T (for example, a Sitecore item) and a collection of Tags objects as input, and assigns the tags to the T object. You can implement a custom version of ITagger.
The following diagram shows the dependencies between all provider types:

You can configure each part of the content tagging process. When a user triggers the tagging process, the getTaggingConfiguration pipeline reads the Sitecore configuration and builds a named set of providers based on that configuration.
The IItemContentTaggingConfigurationService service reads the names of the providers that are specified in the content tagging configuration and returns an ItemContentTaggingConfiguration object.
The IItemContentTaggingProviderSetBuilder service uses the ItemContentTaggingConfiguration object to build the set of providers that will be used for content tagging.
The getTaggingConfiguration pipeline reads the configuration name and then builds a provider set for that configuration.
The tagContent pipeline uses the set of providers created by the getTaggingConfiguration pipeline to tag content. The tagContent pipeline consists of the following pipeline processors:
RetrieveContent – uses the configured content provider to get taggable content from the context item.
Normalize – takes TaggableContent objects and processes them to normalize the content before passing it to the GetTags pipeline processor.
GetTags – gets TagData objects for TaggableContent objects, using the configured discovery provider. The output is the list of TagData objects related to the input content.
StoreTags – stores the received tags, using the configured taxonomy provider. The default implementation creates tag items in the Sitecore tag repository.
ApplyTags – marks the context item with tags, using the configured tagger provider. It adds the IDs of the tag items created by the StoreTags pipeline processor to the Semantics field in the Tagging section of the context item.
The normalizeContent pipeline is a separate pipeline that prepares TaggableContent objects for tagging. It is triggered by the Normalize pipeline processor in the tagContent pipeline.
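Because normalizeContent is an extension point, a typical customization is to append your own processor to it. The following is a minimal sketch of a Sitecore include patch file that does this. The processor type MyCompany.ContentTagging.Pipelines.StripMarkup and the assembly MyCompany.ContentTagging are hypothetical, and placing the pipeline under a pipeline group with groupName="ContentTagging" is an assumption. Check the content tagging configuration files that ship with your Sitecore instance for the actual element path and attributes before relying on it.

```xml
<!-- Hypothetical include patch file; element path below is an assumption, verify it against the shipped config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <!-- Assumption: the content tagging pipelines are defined in a group named "ContentTagging" -->
      <group groupName="ContentTagging">
        <pipelines>
          <normalizeContent>
            <!-- Hypothetical processor class you would write to clean up TaggableContent
                 (for example, stripping markup) before the GetTags processor runs.
                 With no patch:before/patch:after attribute, it is added after the existing processors. -->
            <processor type="MyCompany.ContentTagging.Pipelines.StripMarkup, MyCompany.ContentTagging" />
          </normalizeContent>
        </pipelines>
      </group>
    </pipelines>
  </sitecore>
</configuration>
```

Putting the change in a separate include file, rather than editing the shipped configuration, keeps the customization isolated from the default files.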
The code for Sitecore Cortex Content Tagging is broken down into three DLLs.
The Sitecore.ContentTagging.Core DLL contains abstractions, default implementations, and infrastructural code. You can reference this DLL to run individual parts of Sitecore Cortex Content Tagging. For example, to get tags for some text without storing them, you can use the IDiscoveryProvider CreateDiscoveryProvider(string providerName) method to instantiate a discovery provider that is registered by name in the configuration file. More generally, you can use the IContentTaggingProviderFactory interface to get an instance of any of the four provider types by name.
The Sitecore.ContentTagging DLL integrates Sitecore Cortex Content Tagging with Sitecore. This DLL contains the infrastructure to run content tagging from the Sitecore UI. It contains extension points (pipelines).
The Sitecore.ContentTagging.OpenCalais DLL implements the discovery provider for Refinitiv Intelligent Tagging Open Calais. This allows Sitecore to use Open Calais for content tagging.
The configuration file contains the <contentTagging> section, which contains the following:
<providers> – contains all registered providers, grouped into the following sections:
  <content> – aggregates IContentProvider implementations.
  <discovery> – aggregates IDiscoveryProvider implementations.
  <tagger> – aggregates ITagger implementations.
  <taxonomy> – aggregates ITaxonomyProvider implementations.
<configurations> – defines different configuration sets that use the providers defined in the <providers> section.
<contentTagging>
  <providers>
    <content>
      <add name="DefaultContentProvider" type="Sitecore.ContentTagging.Core.Providers.DefaultContentProvider, Sitecore.ContentTagging.Core" />
    </content>
    <discovery>
      <add name="DefaultDiscoveryProvider" type="Sitecore.ContentTagging.Core.Providers.DummyDiscoveryProvider, Sitecore.ContentTagging.Core" />
    </discovery>
    <tagger>
      <add name="DefaultTagger" type="Sitecore.ContentTagging.Core.Providers.DefaultTagger, Sitecore.ContentTagging.Core" />
    </tagger>
    <taxonomy>
      <add name="DefaultTaxonomyProvider" type="Sitecore.ContentTagging.Core.Providers.DefaultTaxonomyProvider, Sitecore.ContentTagging.Core" />
    </taxonomy>
  </providers>
  <configurations>
    <config name="Default">
      <content>
        <provider name="DefaultContentProvider"/>
      </content>
      <tagger>
        <provider name="DefaultTagger"/>
      </tagger>
      <taxonomy>
        <provider name="DefaultTaxonomyProvider"/>
      </taxonomy>
      <discovery>
        <provider name="DefaultDiscoveryProvider"/>
      </discovery>
    </config>
  </configurations>
</contentTagging>
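To illustrate how the <providers> and <configurations> sections fit together, the following sketch registers one additional discovery provider and defines a second configuration set that reuses the default content, tagger, and taxonomy providers. It follows the same element structure as the default configuration shown above. The provider name MyDiscoveryProvider, the configuration name MyConfig, and the type MyCompany.ContentTagging.MyDiscoveryProvider in the assembly MyCompany.ContentTagging are hypothetical; that class would have to implement IDiscoveryProvider.

```xml
<contentTagging>
  <providers>
    <discovery>
      <!-- Hypothetical custom IDiscoveryProvider implementation, registered by name -->
      <add name="MyDiscoveryProvider" type="MyCompany.ContentTagging.MyDiscoveryProvider, MyCompany.ContentTagging" />
    </discovery>
  </providers>
  <configurations>
    <!-- Hypothetical second configuration set: same providers as "Default" except for discovery -->
    <config name="MyConfig">
      <content>
        <provider name="DefaultContentProvider"/>
      </content>
      <tagger>
        <provider name="DefaultTagger"/>
      </tagger>
      <taxonomy>
        <provider name="DefaultTaxonomyProvider"/>
      </taxonomy>
      <discovery>
        <provider name="MyDiscoveryProvider"/>
      </discovery>
    </config>
  </configurations>
</contentTagging>
```

Because each <config> element refers to providers only by name, a single provider registration can be shared by several configuration sets, and the Default set is left unchanged.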
Video: Sitecore Cortex - Content Tagging Architecture
You can watch this video to see the customization and extension points included in the Sitecore Cortex content tagging feature. The video demonstrates how to configure new providers and configuration sets.