Sorting search results by relevance

To help you organize search results, you can sort them by how relevant they are to your search criteria. The most relevant results are then displayed at the top of the search results list.

When generating search results, Sitecore Content Hub:

  1. Reduces the number of search result candidates by applying the search criteria you define.

  2. Scores and ranks the search results. The output of this step is every asset that matches the defined search or filter criteria, in ranked order.

In the second step, Content Hub calculates a score for each asset in the candidate set. This score reflects how relevant the asset is to the defined query. After every asset has a relevance score, the search results are sorted and ranked by that score.
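
The two steps can be pictured as a filter followed by a sort. The following is a minimal sketch, not Content Hub's implementation: the asset dictionaries, the `matches` filter, and the placeholder `score` function (a plain count of query-term occurrences) are all assumptions made for illustration.

```python
def matches(asset, terms):
    """Step 1 filter: keep an asset if its description contains any query term."""
    words = asset["description"].lower().split()
    return any(term in words for term in terms)


def score(asset, terms):
    """Placeholder relevance score: total occurrences of the query terms."""
    words = asset["description"].lower().split()
    return sum(words.count(term) for term in terms)


def search(assets, query):
    terms = query.lower().split()
    # Step 1: reduce the candidate set by applying the search criteria.
    candidates = [a for a in assets if matches(a, terms)]
    # Step 2: score each candidate and rank from most to least relevant.
    return sorted(candidates, key=lambda a: score(a, terms), reverse=True)


# Hypothetical sample assets.
assets = [
    {"title": "Classic Cocktails recipe book", "description": "cocktail recipes for any cook"},
    {"title": "Winter cookbook", "description": "cook warm meals and cook ahead"},
    {"title": "Brand guidelines", "description": "logo and color usage"},
]
for asset in search(assets, "cook"):
    print(asset["title"])
```

Assets that fail the filter never reach the scoring step, so the ranking only has to order the assets that already match.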

Relevance score

Content Hub uses the BM25 (Best Matching 25) algorithm to calculate relevance. This algorithm uses three factors to determine each asset's score, as described in the following table.

| Factor | Description |
| --- | --- |
| Term frequency (TF) | The number of times the search term is repeated in the asset fields. The more often it is repeated, the more relevant the asset is. For example, the Winter cookbook and the Classic Cocktails recipe book are both assets in Content Hub. In the asset descriptions, the term cook is used more often in the Winter cookbook description than in the Classic Cocktails recipe book description. This means that when a user searches for the term cook, the Winter cookbook is more relevant than the Classic Cocktails recipe book in terms of term frequency. |
| Inverse document frequency (IDF) | The number of assets that contain the search term. The more assets contain the term, the less important that term is. For example, consider the Winter cookbook and the Classic Cocktails recipe book from the previous example along with eight more assets in the same context. When a user searches for the term famous chef, nine out of the ten assets include the term famous in their description. However, only three assets have the term chef in the description. This means that the term famous is less important than the term chef in this search, and that the three assets that have the term chef are more relevant to the famous chef search than the other assets. |
| Field length | An asset that contains the search term in a shorter field is considered more relevant than an asset that contains the same term in a longer field. For example, the Winter cookbook has a description of 350 characters while the Classic Cocktails recipe book has a description of 1200 characters. The user searches for the term ingredient and both assets have this term in the description. However, because of the differing field lengths, the Winter cookbook is more relevant than the Classic Cocktails recipe book. |
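
To show how the three factors combine into one score, here is a minimal sketch of a widely used BM25 formulation. It is an illustration under stated assumptions, not Content Hub's code: the parameter values `k1=1.2` and `b=0.75` are common defaults in open-source search engines, the whitespace tokenization is simplistic, and the sample descriptions are invented.

```python
import math


def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against the query with a common BM25 variant.

    k1 and b are assumed defaults, not values documented for Content Hub.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    total = 0.0
    for term in query_terms:
        tf = doc.count(term)  # term frequency in this document
        if tf == 0:
            continue
        containing = sum(1 for d in corpus if term in d)
        # IDF: terms that appear in many documents contribute less.
        idf = math.log(1 + (n_docs - containing + 0.5) / (containing + 0.5))
        # Field length: longer-than-average fields dampen the term frequency.
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        total += idf * tf * (k1 + 1) / norm
    return total


# Hypothetical descriptions, already split into tokens.
corpus = [
    "famous chef shares winter recipes".split(),
    "famous cocktails for every season and every occasion at home".split(),
    "famous landmarks".split(),
]
query = ["famous", "chef"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
for doc, s in zip(corpus, scores):
    print(round(s, 3), " ".join(doc))
```

In this sample, the first description ranks highest because it is the only one that contains the rarer term chef. The other two both match only famous, and the shorter one scores higher because of the field-length normalization.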

Boosting an asset

You can influence how an asset is ranked in search results by turning on the Boost field of a property member.

For example, on the M.Asset entity definition, you boost the Author field but not the Information About Author field.

A user then uploads two cookbooks written by Sara Dubler and does the following:

  • On the Summer Salads cookbook asset, they add the name of the author in the Author field, but leave the other field blank.

  • On the Mediterranean Salads cookbook asset, they leave the Author field blank but add information about Sara Dubler in the Information About Author field.

When someone searches for Sara Dubler, the Summer Salads cookbook appears first in the results because it has Sara Dubler in the Author field, and this field is boosted while the Information About Author field is not.
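
The effect of a boosted field can be sketched as a per-field weight applied to each match. This is a simplified illustration only: the `FIELD_WEIGHTS` values, the `boosted_score` helper, and the sample field text are hypothetical, and this article does not state the factor Content Hub actually applies.

```python
# Hypothetical weights: a match in the boosted field counts double.
FIELD_WEIGHTS = {"Author": 2.0, "Information About Author": 1.0}


def boosted_score(asset, query):
    """Sum per-field phrase matches, each scaled by its field weight."""
    phrase = query.lower()
    return sum(
        weight * asset.get(field, "").lower().count(phrase)
        for field, weight in FIELD_WEIGHTS.items()
    )


assets = [
    {"title": "Mediterranean Salads cookbook",
     "Information About Author": "Cookbook author Sara Dubler"},
    {"title": "Summer Salads cookbook", "Author": "Sara Dubler"},
]
ranked = sorted(assets, key=lambda a: boosted_score(a, "Sara Dubler"), reverse=True)
print([a["title"] for a in ranked])
```

Both assets match the query once, so without the weights they would tie. The weight on the Author field is what moves the Summer Salads cookbook to the top.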

Note

The boost feature is compatible with wildcards in the Search component.
