Crawler tags

In Sitecore Search, a tag is a source-level mechanism that gives you precise control over the attributes you want to use to create a specific search experience. Tags establish a connection between entities and sources, allowing the flexibility to create index documents with combinations and aggregations of attributes from the same or different entities.

Note

You can use tags in the advanced web crawler and API crawler sources.

When you configure attribute extraction rules, you do so per tag and not per entity. As a result, you get one set of index documents for each tag for which you configure extraction rules. Keep this in mind when you plan and configure tags.

Tags are specific to a source and do not persist across sources. For example, if you define a tag called actor with movies in source A, the tag is unique to source A. If you create source B and define another tag named actor with movies, Search considers it a new tag with its own logic.
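
The scoping described above can be pictured with a small sketch. It is an illustrative model only, not the Sitecore Search API: the source names, the `extraction_rules` dictionary, and the `index_documents_per_tag` function are all hypothetical. It assumes extraction rules are looked up by the pair (source, tag), which is why two tags with the same name in different sources stay independent and each produces its own set of index documents.

```python
# Illustrative model only; none of these names come from Sitecore Search.
from collections import defaultdict

# Hypothetical rules keyed by (source, tag). The same tag name in two
# sources maps to two unrelated rule sets.
extraction_rules = {
    ("source_a", "actor with movies"): ["actor_name", "filmography"],
    ("source_b", "actor with movies"): ["actor_name", "actor_age"],
}


def index_documents_per_tag(items, rules):
    """Return one list of index documents for each (source, tag) with rules."""
    document_sets = defaultdict(list)
    for (source, tag), attributes in rules.items():
        for item in items:
            # Keep only the attributes this tag's rules ask for.
            document_sets[(source, tag)].append(
                {name: item[name] for name in attributes if name in item}
            )
    return dict(document_sets)


items = [
    {"actor_name": "Tom Cruise", "actor_age": 61, "filmography": ["Top Gun"]},
]
document_sets = index_documents_per_tag(items, extraction_rules)

for key, documents in document_sets.items():
    print(key, documents)
```

In this model, one content item yields two different index documents because two tags have rules, which mirrors the point that you get one set of index documents per configured tag.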

When you create a source, you get some default tags. You can also create custom tags to build specific search experiences.

Default tags

Sitecore Search automatically generates default tags when you create a source. Each tag corresponds to an entity within your domain. You can use default tags to define attribute extraction rules in the document extractor.

Default tags suffice when:

  • You want to use one entity's attributes to create only one search experience.

    And

  • You want each index document to have attribute values from only one content item. In other words, you do not want to combine or aggregate attributes from multiple content items.

For example, consider a film and television website where visitors can search for actors, TV shows, and movies. You want to create a search experience where each result shows only the attributes of the entity it belongs to. To do this, you can simply define extraction rules for the default tags in the document extractor.

Custom tags

Custom tags are any tags you create. Use custom tags when default tags are not enough. You can create as many custom tags as you want.

Usually, you'll need to configure custom tags when:

  • You want to use the attributes of one entity to create more than one search experience.

    Or

  • You want to enhance index documents with aggregations of attribute values or combinations of attributes from more than one entity into an index document.

To continue with the example of the film and TV website, you might want to create a search experience where you show a list of movies an actor has acted in, along with general information like name and age. In this example, movie information is not available in any attribute of the Actor entity. However, the Movie entity has a movie_actor attribute. You can use this relationship to tie actors to a list of movies they acted in.

There are two ways to configure custom tags in Search: basic tag configuration and aggregated tag configuration.

Basic tags

A basic tag is a simple setup you use when you want to create more than one set of index documents from an entity. Usually, you define one or more tags and assign them to an entity.

Tip

We recommend that you assign one basic tag to only one entity. Assigning basic tags to multiple entities defeats the purpose of entities as distinct, search-indexed object types. Because there is no mapping definition in basic tags, if you use the same basic tag across multiple entities, you might get disorganized index documents that are not conducive to useful search experiences.

Instead, if you want to pull attributes from multiple entities into a single index document, we recommend using an aggregated tag configuration.

For example, on a website, some blogs are written by internal writers and some by external freelancers. Both types of blogs have the same attributes but require different attribute extraction logic. You want your visitors to have a unified search experience where they can search among all blogs. Your implementation has one entity, Blog.

Here's one way to handle this requirement: You can configure basic tags called internal blogs and external blogs under the Blog entity. When you extract attributes for the internal blogs tag, use extraction logic that suits the internal content. When you extract attributes for the external blogs tag, use extraction logic that suits the external content.

Note

Another way to handle this requirement is to configure two sources, one for external content and one for internal content. Using tags is easier because developers don't need to keep track of multiple source IDs to pass at runtime.

Here's a sample index document extracted for the internal blogs tag:

{
  "id": "xvsusdua232533ygd",
  "name": "Top 10 Most Anticipated Cars in 2024",
  "type": "blog",
  "description": "some description"
}

Here's a sample index document extracted for the external blogs tag:

{
  "id": "agahsfeev1nnbsa7d",
  "name": "How to Save 10% on Your New Car",
  "type": "blog",
  "description": "some other description"
}

As you can see, index documents extracted by both tags have the same structure. Attributes in both sets of index documents come from the Blog entity. The only difference is in their extraction logic. To a visitor, there is no difference between freelancer content and internal content.
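
The idea of two basic tags with different extraction logic but one output shape can be sketched as follows. This is a minimal illustration, not how the document extractor is actually configured: the raw payload layouts, the field names `title`, `summary`, `post`, `headline`, `teaser`, and `uid`, and the `EXTRACTORS` mapping are all assumptions made for the example.

```python
# Illustrative model only; payload layouts and names are assumptions.

def extract_internal_blog(raw):
    """Hypothetical logic for internally written blogs (flat payload)."""
    return {
        "id": raw["id"],
        "name": raw["title"],
        "type": "blog",
        "description": raw["summary"],
    }


def extract_external_blog(raw):
    """Hypothetical logic for freelancer blogs (fields nested under 'post')."""
    post = raw["post"]
    return {
        "id": raw["uid"],
        "name": post["headline"],
        "type": "blog",
        "description": post["teaser"],
    }


# One extraction function per basic tag, both under the Blog entity.
EXTRACTORS = {
    "internal blogs": extract_internal_blog,
    "external blogs": extract_external_blog,
}

internal_doc = EXTRACTORS["internal blogs"](
    {"id": "xvsusdua232533ygd",
     "title": "Top 10 Most Anticipated Cars in 2024",
     "summary": "some description"}
)
external_doc = EXTRACTORS["external blogs"](
    {"uid": "agahsfeev1nnbsa7d",
     "post": {"headline": "How to Save 10% on Your New Car",
              "teaser": "some other description"}}
)

print(internal_doc)
print(external_doc)
```

Both documents end up with the same attribute names, so a search over all blogs treats them uniformly even though they were extracted differently.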

Aggregated tags

An aggregated tag configuration is an advanced setup where you can create index documents that have related attributes from more than one content item. You define mappings to aggregate related attributes. Then, you create a new attribute to combine and hold this aggregated information. Configuring tags through aggregation is a powerful way to enhance search experiences when dealing with relational data.

Use aggregated tags when you want to take advantage of a relationship between attributes in the same entity or different entities. An example of this relationship is when items have a shared attribute value, like name or id.

Depending on whether you want to define a relationship between attributes in different entities or within one entity, use flat aggregation or hierarchical aggregation, respectively.

Flat aggregation to combine attributes from different entities

Use flat aggregation when you want to enhance index documents with a new attribute that contains aggregated information from related items across multiple entities.

For example, you have a website that has information about actors, movies, and TV shows. There is a search bar where visitors can search for results. Your implementation has the Actor and Movie entities. The Actor entity has attributes like actor_name, actor_age, and so on. It does not have attributes about movies the actor has been in. The Movie entity has attributes like movie_name, movie_year, and a movie_cast attribute with a list of actors.

You want to create the following search experience: when an actor appears, also show a list of movies they acted in.

To support this experience, you'll need index documents with Actor attributes and an attribute that lists movies they have acted in.

To get these index documents, you can create a tag through flat aggregation under the Actor entity and associate it with the Movie entity. Then, configure this tag to use the actor's name as a baseline to find movies in which they've acted.

Here's a sample index document extracted for this tag:

{
  "actor_id": "A101",
  "actor_name": "Tom Cruise",
  "actor_description": "A critically acclaimed actor...",
  "actor_age": 61,
  "actor_gender": "male",
  "filmography": [
    {
      "movie_name": "Top Gun",
      "movie_year": 1986,
      "movie_genre": "Action"
    },
    {
      "movie_name": "Edge of Tomorrow",
      "movie_year": 2014,
      "movie_genre": "Science Fiction"
    },
    ...
  ]
}

Note

In this sample index document, attributes within the filmography array come from items in the Movie entity whose movie_cast value includes Tom Cruise (assuming that movie_cast is an array of strings).
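
The matching step behind flat aggregation can be sketched in a few lines of Python. This is an illustrative model only, not the Sitecore Search implementation or its configuration syntax: the `flat_aggregate` function is hypothetical, and it assumes that movie_cast is an array of strings matched exactly against actor_name. The third movie and the extra cast members are sample data added for the example.

```python
# Illustrative model only; flat_aggregate is a hypothetical helper.

actors = [
    {"actor_id": "A101", "actor_name": "Tom Cruise", "actor_age": 61},
]

movies = [
    {"movie_name": "Top Gun", "movie_year": 1986, "movie_genre": "Action",
     "movie_cast": ["Tom Cruise", "Val Kilmer"]},
    {"movie_name": "Edge of Tomorrow", "movie_year": 2014,
     "movie_genre": "Science Fiction",
     "movie_cast": ["Tom Cruise", "Emily Blunt"]},
    {"movie_name": "Heat", "movie_year": 1995, "movie_genre": "Crime",
     "movie_cast": ["Al Pacino", "Robert De Niro", "Val Kilmer"]},
]


def flat_aggregate(actors, movies):
    """Attach to each actor the movies whose movie_cast contains the actor."""
    documents = []
    for actor in actors:
        filmography = [
            # Copy the movie attributes except the cast used for matching.
            {key: value for key, value in movie.items() if key != "movie_cast"}
            for movie in movies
            if actor["actor_name"] in movie["movie_cast"]
        ]
        documents.append({**actor, "filmography": filmography})
    return documents


actor_documents = flat_aggregate(actors, movies)
print(actor_documents[0]["filmography"])
```

The actor's name acts as the baseline value, and every Movie item that contains it contributes one entry to the new filmography attribute.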

Hierarchical aggregation to combine attributes of one entity

Use hierarchical aggregation when you want to enhance index documents with a new attribute that contains aggregated information from related items in one entity.

For example, you have a website where visitors can search for employees who work for your company. Your implementation has an entity called Employee, and one of the attributes in this entity is manager.

You want to create the following search experience: when an employee appears in the search results, also show a list of their colleagues.

To support this experience, you'll need index documents that have regular Employee attributes as well as an attribute that lists colleagues.

To get these index documents, you can create a tag under the Employee entity. Then, configure this tag to use the manager attribute as a baseline to find employees with the same manager, that is, to find colleagues.

Here's a sample index document extracted for this tag:

{
  "employee_id": 1,
  "name": "John Doe",
  "position": "Senior Developer",
  "department": "R&D",
  "manager": "Mary Smith",
  "colleagues": [
    {
      "employee_id": 2,
      "name": "Jane Smith",
      "position": "Lead Developer"
    },
    {
      "employee_id": 3,
      "name": "David Johnson",
      "position": "Senior Developer"
    }
  ]
}

Note

In this sample index document, attributes within the colleagues array come from other content items in the Employee entity that have a manager value of Mary Smith.
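
Hierarchical aggregation can be pictured as a self-join on one entity. The sketch below is an illustrative model only, not the Sitecore Search implementation: the `hierarchical_aggregate` function is hypothetical, it assumes an exact match on the manager attribute, and the fourth employee and their manager are sample data added to show a non-matching item.

```python
# Illustrative model only; hierarchical_aggregate is a hypothetical helper.

employees = [
    {"employee_id": 1, "name": "John Doe", "position": "Senior Developer",
     "department": "R&D", "manager": "Mary Smith"},
    {"employee_id": 2, "name": "Jane Smith", "position": "Lead Developer",
     "department": "R&D", "manager": "Mary Smith"},
    {"employee_id": 3, "name": "David Johnson", "position": "Senior Developer",
     "department": "R&D", "manager": "Mary Smith"},
    {"employee_id": 4, "name": "Ana Lopez", "position": "Designer",
     "department": "UX", "manager": "Raj Patel"},
]

COLLEAGUE_FIELDS = ("employee_id", "name", "position")


def hierarchical_aggregate(employees):
    """Attach to each employee the other employees with the same manager."""
    documents = []
    for employee in employees:
        colleagues = [
            {field: other[field] for field in COLLEAGUE_FIELDS}
            for other in employees
            if other["manager"] == employee["manager"]
            and other["employee_id"] != employee["employee_id"]  # skip self
        ]
        documents.append({**employee, "colleagues": colleagues})
    return documents


employee_documents = hierarchical_aggregate(employees)
print(employee_documents[0]["colleagues"])
```

Here both sides of the relationship come from the Employee entity, which is the difference from flat aggregation, where the related items come from another entity.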

Example: Tags for an implementation with three entities

Consider a film and television website where visitors can search for actors, TV shows, and movies. Your implementation has the Actor, Movie, and TVShow entities.

The Actor entity has the following attributes: actor_awards, actor_age, actor_gender, actor_movies, actor_name, actor_image, actor_tv, actor_bio.

The Movie entity has the following attributes: movie_description, movie_awards, movie_actor, movie_name, movie_rating, movie_studio, movie_director, movie_poster.

The TVShow entity has the following attributes: tv_name, tv_cast, tv_rating, tv_poster, tv_episodes, tv_seasons.

You want to create the following search experiences:

  • When an actor appears, visitors see all the details of that actor.

  • When a TV show appears, visitors see TV show details and some related cast details.

  • When a movie appears, visitors see movie details and some related cast details.

The following image shows a sample source and tag configuration that you can use to get the desired search experiences:

Image showing the relationship between entities, attributes, sources, and tags. The entities are Actor, Movie, and TV Show. There are three sources. Each source has one tag each.

Here's an explanation of this image:

  • To show all actor information, you can create a source (source 1) and then use the default actor tag Search creates for the Actor entity. In the document extractor, for the actor tag, extract all available attributes. This will get you an index document that has all the actor details.

    Note

    You do not need to define any custom tags to achieve this search experience.

  • To show cast details along with TV show details when a TV show appears, you can create another source (source 2). In this source, you can create a tag through flat aggregation called tv with actor, associate it with the TVShow entity, and configure it to borrow from the Actor entity. In the document extractor, for this tag, extract required attributes from both TVShow and Actor entities.

  • To show cast details along with movie details when a movie appears, you can create another source (source 3). In this source, you can create a tag through flat aggregation called movie with actor, associate it with the Movie entity, and configure it to borrow from the Actor entity. In the document extractor, for this tag, extract required attributes from both Movie and Actor entities.

Note

For brevity, this example is restricted to three sources with one tag each. In reality, you might need more than one tag in a source, or you might need more sources.

For example, you might want to have a search experience that draws only from the movie details. To enable this, create a source to extract only movie content and use the default movie tag to extract all attributes.
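
When planning sources and tags like the ones in this example, it can help to write the plan down as data before you configure anything. The sketch below is a planning aid only and does not reflect any Sitecore Search API or configuration format: the `Tag` class, its `borrows_from` field, and the `plan` dictionary are hypothetical names chosen for illustration.

```python
# Planning aid only; the Tag class and plan layout are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    name: str
    entity: str                         # entity the tag is associated with
    borrows_from: Optional[str] = None  # set only for flat-aggregation tags

    def entities_used(self):
        """List the entities whose attributes this tag extracts."""
        if self.borrows_from:
            return [self.entity, self.borrows_from]
        return [self.entity]


plan = {
    "source 1": [Tag("actor", "Actor")],  # default tag, no aggregation
    "source 2": [Tag("tv with actor", "TVShow", borrows_from="Actor")],
    "source 3": [Tag("movie with actor", "Movie", borrows_from="Actor")],
}

for source, tags in plan.items():
    for tag in tags:
        entities = ", ".join(tag.entities_used())
        print(f"{source}: tag '{tag.name}' extracts from {entities}")
```

Adding a source for movie-only results would be one more entry in `plan` with a default movie tag and no `borrows_from` value.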
