Configure a sitemap in the Content Editor
Marketers, strategists, and content authors use industry-standard SEO tools to ensure that their content is discoverable and included in search engine indexes. To enable this, a sitemap is required for each site.
A sitemap helps search engine crawlers navigate your site and improves search engine optimization (SEO). By default, the sitemap is generated for the whole site and stored in an XML file. An XML sitemap is created specifically for search engines to show details of the available pages in a website, their relative importance, and the frequency of content updates. In the sitemap, each page is represented by a URL element.
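As a rough illustration of the structure described above, the following Python sketch builds a small sitemap with one url element per page. It is not how SitecoreAI generates its sitemap. The function name build_sitemap and the page dictionaries are hypothetical; only the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the public sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML with one <url> element per page.

    `pages` is a hypothetical list of dicts with a required "loc" key and
    optional "lastmod", "changefreq", and "priority" keys.
    """
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        # Optional children are written only when the page supplies them.
        for name in ("lastmod", "changefreq", "priority"):
            if name in page:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{name}")
                child.text = str(page[name])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "https://www.example.com/", "changefreq": "daily", "priority": 1.0},
        {"loc": "https://www.example.com/about", "lastmod": "2024-01-15"},
    ]))
```

Each dictionary becomes one url entry, which mirrors how a page maps to a single URL element in the generated file.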
Sitemap generation uses Content Management data to list publishable items and does not verify which ones are actually published. This means pages or language versions in a final workflow state may appear in the sitemap even if not published. Using a workflow can prevent unpublished pages from being included.
To ensure that a sitemap link is generated properly for each of your sites, you must configure the target hostname for the site host. This field specifies the base URL that will be used in the sitemap entries, such as www.example.com.
This topic describes how to configure the sitemap of a site using the Content Editor. You can also configure the sitemap using the sites settings.
Configure the hostname
To ensure that the sitemap link is generated properly, you must configure the target hostname and scheme on the site host item.
By default, the sitemap uses the Host Name defined in the Basic section in the site host item of your site (<site collection>/<site>/Settings/Site Grouping/Site). If both the Host Name and the TargetHostName fields are empty, the sitemap returns a 404 error.
To configure the hostname for your sitemap:
- In the Content Editor, in the content tree, navigate to the site you want to configure, and find the <site collection>/<site>/Settings/Site Grouping/Site item.
- Update the HostName and TargetHostName fields with the correct domains for your sitemap.
- Publish the item.
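The hostname rules above can be modeled with a short Python sketch. This is only an illustration, not SitecoreAI code: the function resolve_sitemap_base_url is hypothetical, and the order of precedence (a non-empty TargetHostName wins over HostName) is an assumption that this topic does not state explicitly. The None return value stands in for the 404 response described above.

```python
from typing import Optional


def resolve_sitemap_base_url(
    host_name: str, target_host_name: str, scheme: str = "https"
) -> Optional[str]:
    """Pick the base URL used for sitemap entries.

    Assumption (not confirmed by the documentation): a non-empty
    TargetHostName takes precedence over HostName.
    """
    chosen = target_host_name.strip() or host_name.strip()
    if not chosen:
        # Both fields are empty: this models the 404 case.
        return None
    return f"{scheme}://{chosen}"


if __name__ == "__main__":
    print(resolve_sitemap_base_url("cm.example.com", "www.example.com"))
    print(resolve_sitemap_base_url("www.example.com", ""))
    print(resolve_sitemap_base_url("", ""))
```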
Configure the sitemap
When an item is published on a site, SitecoreAI generates the Sitemap media item. This process can occur no sooner than a defined period of time (the refresh threshold) after completion of the previous sitemap generation job. The generated media item contains the Sitemap.xml file that is served to Experience Edge.
To configure the sitemap for your site:
- In the Content Editor, navigate to <site>/Settings/Sitemap.
- Fill in the following fields:
  - Refresh threshold - how much time must pass after a sitemap is generated on publish before it can be generated again. Measured in minutes.
  - Cache expiration - a time, in minutes, after which the current cache expires. Set this to match the refresh threshold value.
  - Maximum number of pages per sitemap - if specified, this determines the maximum number of pages in a sitemap.
  - Generate sitemap media items - this must be enabled to allow SitecoreAI to work properly with Experience Edge.
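To make the refresh threshold concrete, here is a minimal Python sketch of the timing rule described above: a new generation job is allowed only once the threshold has elapsed since the previous job completed. The function can_regenerate and its parameters are hypothetical names chosen for illustration; the actual scheduling logic inside SitecoreAI is not shown in this topic.

```python
from datetime import datetime, timedelta


def can_regenerate(
    last_completed: datetime, now: datetime, refresh_threshold_minutes: int
) -> bool:
    """Return True when enough time has passed since the last generation job."""
    elapsed = now - last_completed
    return elapsed >= timedelta(minutes=refresh_threshold_minutes)


if __name__ == "__main__":
    finished = datetime(2024, 1, 15, 12, 0)
    # A publish 10 minutes later with a 30-minute threshold does not regenerate.
    print(can_regenerate(finished, datetime(2024, 1, 15, 12, 10), 30))
    # With a threshold of 0, every publish is allowed to regenerate.
    print(can_regenerate(finished, datetime(2024, 1, 15, 12, 10), 0))
```

Under this model, a threshold of 0 lets every publish trigger a new sitemap, which is convenient for testing but means more generation jobs on busy sites.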
The following table gives a full description of all the parameters available for sitemap configuration.
| Tab | Field | Description |
|---|---|---|
| Alternate links | Generate alternate links | Select to add xhtml:link elements to the url elements in the sitemap. The xhtml:link element is used for alternate links, for example, to link to other language versions of the same page. |
| | Include x-default | Select to add the xhtml:link element with hreflang set to x-default to the url element. The x-default value signals to the search algorithm that the page does not target any specific language or region. Example: <xhtml:link rel="alternate" hreflang="x-default" href="https://sxa" /> |
| | hreflang | Specify language and region options for the hreflang attribute. language and region - URLs are rendered for both region-dependent and region-independent codes. Example: <xhtml:link rel="alternate" hreflang="en" href="https://sxa/en" /> <xhtml:link rel="alternate" hreflang="en-US" href="https://sxa/en-US" /> <xhtml:link rel="alternate" hreflang="en-CA" href="https://sxa/en-CA" />. with region only - URLs are rendered only for region-dependent codes. Example: <xhtml:link rel="alternate" hreflang="en-US" href="https://sxa/en-US" /> <xhtml:link rel="alternate" hreflang="en-CA" href="https://sxa/en-CA" />. with language only - URLs are rendered only for region-independent codes. Example: <xhtml:link rel="alternate" hreflang="en" href="https://sxa/en" /> <xhtml:link rel="alternate" hreflang="da" href="https://sxa/da" /> |
| Urlset attributes | lastmod | Select to render the lastmod element inside the url element. It specifies the date when the page was last modified. |
| | changefreq | Select to render the changefreq element inside the url element. It specifies how often the page content is changed. |
| | priority | Select to render the priority element inside the url element. It specifies a number between 0 and 1 that represents the importance of a specific page. |
| URL options | Link provider name | Specify a custom link provider. It is added to the providers node under the linkManager node in the Sitecore.config file. If you leave it blank, the default link provider of the site is used. |
| Content crawling | Crawler | Specify the name of the item crawler that fetches items from your site. The default value is itemCrawler. You can define multiple item crawlers in the sitemapItemCrawler element. |

Exclude an item from the sitemap
By default, pages that are excluded from publication by site publication restrictions or approval workflows are not included in the sitemap. However, you might want to exclude other pages from the sitemap that have been approved and published, such as the error 404 page.
To exclude an item from the sitemap using the Content Editor:
- In the Content Editor, in the content tree, click the content item that you want to exclude from the sitemap. For example, the homepage located at <site collection>/<site>/Home.
- Under the Sitemap settings section for the item, configure the Change frequency field to Do not include.
- Click Save.
- Click Publish to publish the item.
The next time your sitemap is generated and the cache expires, you can confirm that the item has been excluded.
To check that the item has been excluded from the sitemap without having to wait for the refresh threshold and the site cache expiration, go to the Sitemap configuration item <site collection>/<site>/Settings/Sitemap and set both the Refresh threshold and the Cache expiration to 0.
Remember to restore both fields to their previous values once you complete testing.
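One way to confirm the exclusion is to scan the generated sitemap for the page URL. The following Python sketch assumes you have already obtained the sitemap XML as a string (how you fetch it depends on your setup and is not shown). The function names and the sample URLs are hypothetical; the namespace is the one defined by the public sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Prefix map for the public sitemap protocol namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text: str) -> set:
    """Collect every <loc> value found under <url> elements."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def is_excluded(xml_text: str, page_url: str) -> bool:
    """Return True when the page URL does not appear in the sitemap."""
    return page_url not in sitemap_locations(xml_text)


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.example.com/</loc></url>"
        "<url><loc>https://www.example.com/about</loc></url>"
        "</urlset>"
    )
    print(is_excluded(sample, "https://www.example.com/404"))
    print(is_excluded(sample, "https://www.example.com/about"))
```

The comparison is an exact string match, so pass the URL in the same form (scheme, hostname, trailing slash) that the sitemap uses.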
Locate a site's sitemap
When configuring the sitemap in the Content Editor, if you've enabled the Generate sitemap media items setting, the generated sitemap will be stored in the Media Library.
When configuring the sitemap using the Sites UI, the Generate sitemap media items setting is enabled by default.
You can locate a sitemap by doing any of the following:
- On the <site collection>/<site> item, the Sitemap media items field displays the sitemaps associated with the selected site. With raw values turned on, you can see the sitemap's ID, which you can then search for in the content tree.
- In the Media Library, you can find a list of sitemaps for a specific site in the following folder: Project<site>/<site>/Sitemaps/<site>.
- Use the following query to retrieve the sitemap for a site from Experience Edge: