Use locale extractors for localized content

You can index localized content with some Sitecore Search sources. To do so, you extract each content item's locale, such as English (US) (en-US) or Japanese (Japan) (ja-JP), and generate a common ID for the localized versions of the same content. Then, when you specify the locale context at runtime, you can show locale-specific content to your site visitors.

To index localized content, you'll need to define available locales and configure locale extractors.

Note

Ensuring that localized versions of a content piece share the same ID

When you create a source for localized content, you have to ensure that the index documents for localized versions of the same content item share the same ID. This means that you have to explicitly configure how to extract the id attribute when you have localized content.

For example, your company's About Us page is available in six locales, including English (US). If you don't configure how to extract the id attribute, Search generates six different IDs for the six About Us index documents. This creates a problem when you configure anything that uses the ID of an index document. For example, many pin rules are based on pinning a content item with a specific ID to a specific slot. If you want to pin About Us, and you use the ID of the English (US) version, only users in the English (US) locale will see that content item pinned. Users in the other five locales will not see a localized version of About Us pinned. To avoid this problem, always ensure that localized versions of the same content have the same ID.
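One way to picture this requirement is to derive the ID from the part of the URL that stays the same across locales. The following is a minimal sketch in plain JavaScript, not a Sitecore Search API. It assumes hypothetical URLs of the form https://www.example.com/<locale>/<path>, and the function name commonId is illustrative only.

```javascript
// Hypothetical helper: builds a locale-independent ID from a page URL.
// Assumes the first path segment is a locale such as "en-us" or "ja-jp".
function commonId(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  // Drop the leading segment if it looks like a locale ("xx-yy").
  if (segments.length > 0 && /^[a-z]{2}-[a-z]{2}$/i.test(segments[0])) {
    segments.shift();
  }
  // Join what is left into a stable ID; use "home" for the root page.
  return segments.length > 0 ? segments.join('_').toLowerCase() : 'home';
}

// Both localized versions of About Us resolve to the same ID.
console.log(commonId('https://www.example.com/en-us/about-us')); // about-us
console.log(commonId('https://www.example.com/ja-jp/about-us')); // about-us
```

With an ID built this way, a pin rule that targets about-us would apply to every localized version of the page, not only the English (US) one.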

Locale extractors

In Sitecore Search, when you configure a source to crawl and index localized content, you must add the locale to the metadata of each index document. To do this, configure a locale extractor.

Configure the following settings to define how the crawler extracts the locale information from a content item:

Settings	Description
Name	A meaningful name for the locale extractor.
Extractor Type	The type of locale extractor to use. The options are: URL, which uses a regular expression to extract the locale from the URL of each page; Header, which extracts the locale from a request or response header; and JS, which uses a JavaScript function to extract the locale from the URL of each page.
URLs to Match	The pattern that defines the URLs to which this extractor and its rules apply. You can use a regular expression or a glob expression. Use this field to create different extractors for different areas of your source content. This is an optional setting.
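To illustrate the URL extractor type, the sketch below tests a regular expression that captures a locale from the first path segment of a URL. The pattern, the helper name matchLocale, and the example.com URLs are assumptions for illustration; the exact expression syntax that the extractor field accepts may differ, so treat this as a way to check a pattern locally before you configure it.

```javascript
// Hypothetical pattern: captures a locale such as "en-us" when it is
// the first path segment of an http or https URL.
const localePattern = /^https?:\/\/[^/]+\/([a-z]{2}-[a-z]{2})(?:\/|$)/i;

// Returns the captured locale in lowercase, or null if the URL has none.
function matchLocale(url) {
  const match = localePattern.exec(url);
  return match ? match[1].toLowerCase() : null;
}

console.log(matchLocale('https://www.example.com/ja-JP/products'));
console.log(matchLocale('https://www.example.com/products'));
```

The first call finds a locale segment and the second does not, which is the kind of case where a default locale or a separate extractor scoped with URLs to Match can help.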

JavaScript locale extractor

Add a JS function to extract locales from each page.

The JavaScript function you define must:

Use Cheerio syntax. That is, it must use this format:
function extract(request, response) { $ = response.body; }
Return an array of objects.
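The locale-parsing logic inside such a function might look like the sketch below. It covers only how to work out the locale from the page URL and does not show the return value that Search expects. It assumes that request.url holds the absolute URL of the crawled page. The helper name localeFromRequest, the list of locales, and the fallback locale are all hypothetical.

```javascript
// Hypothetical list of locales defined for the domain, plus a fallback.
const AVAILABLE_LOCALES = ['en-us', 'ja-jp', 'es-es'];
const DEFAULT_LOCALE = 'en-us';

// Assumes `request.url` is the absolute URL of the crawled page.
function localeFromRequest(request) {
  // The pathname starts with "/", so index 1 is the first path segment.
  const firstSegment = new URL(request.url).pathname.split('/')[1] || '';
  const candidate = firstSegment.toLowerCase();
  // Only accept locales that are actually defined; otherwise fall back.
  return AVAILABLE_LOCALES.includes(candidate) ? candidate : DEFAULT_LOCALE;
}

console.log(localeFromRequest({ url: 'https://www.example.com/ja-JP/about-us' }));
console.log(localeFromRequest({ url: 'https://www.example.com/about-us' }));
```

Checking the candidate against a known list avoids tagging a document with a path segment that only looks like a locale.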

Header locale extractor

Add the header key whose value you want to use as locale. If the advanced web crawler does not find this key in the request header, it looks for it in the response header.

For example, if you add Accept-Language as the header, the crawler looks for the key Accept-Language and uses the value as the locale for that document. If the request header is Accept-Language: es-ES, the index document has metadata that tags it as an es-ES (Spain, Spanish) document.
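The lookup order described above (request header first, then response header) can be sketched as follows. This is an illustration of the behavior, not the crawler's actual code. Plain objects stand in for the request and response headers, and the function name localeFromHeaders is hypothetical.

```javascript
// Illustrative only: plain objects stand in for the crawler's headers.
function localeFromHeaders(headerKey, requestHeaders, responseHeaders) {
  const find = (headers) => {
    // HTTP header names are case-insensitive, so compare in lowercase.
    const key = Object.keys(headers).find(
      (name) => name.toLowerCase() === headerKey.toLowerCase()
    );
    return key === undefined ? undefined : headers[key];
  };
  // Look in the request headers first, then in the response headers.
  const fromRequest = find(requestHeaders);
  return fromRequest !== undefined ? fromRequest : find(responseHeaders);
}

// The request header wins when both are present.
console.log(
  localeFromHeaders(
    'Accept-Language',
    { 'Accept-Language': 'es-ES' },
    { 'Accept-Language': 'en-US' }
  )
); // es-ES
```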