Configure the robots.txt file

Abstract

How to configure the robots.txt file for your domain or subdomain.

The robots.txt file is located in the root folder of your website and controls which files on your website the search engines can index. The robots.txt file consists of rules that allow or block a particular crawler's access to a file path on the domain or subdomain where the robots.txt file is hosted.
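For example, a rule set that blocks one crawler from a specific folder while leaving the rest of the site open to all crawlers might look like the following (the /private/ folder name is illustrative):

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:

An empty Disallow directive means that nothing is blocked for the crawlers that the rule group applies to.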

Important

If you do not add any rules, the following code is written to the file:

User-agent: *
Disallow: /

This means that no crawlers can access the content.

To configure the robots.txt file:

  1. In the content tree, navigate to your site and click the Settings item.

  2. Scroll down to the Robots section and, in the Robots content field, enter the rules.

  3. Save and publish the changes.

Example

In the following example, the website is called http://www.mywebsite.com, and you want to instruct all search engines not to index any of the content in the ignorethesepages folder:

User-agent: *
Disallow: /ignorethesepages/

Note

You do not have to specify where the sitemap.xml is located. SXA adds this information to the robots.txt file automatically.
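Continuing the example above, the published robots.txt file might therefore look like the following (the sitemap URL is illustrative and depends on your site configuration):

User-agent: *
Disallow: /ignorethesepages/
Sitemap: http://www.mywebsite.com/sitemap.xml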