Configure the robots.txt file
The robots.txt file is located in the root folder of your website and controls which files on your website search engines can index. The robots.txt file consists of rules that allow or block a particular crawler's access to a file path on the domain or subdomain where the robots.txt file is hosted.
If you do not add any rules, the following code is written to the file:
User-agent: *
Disallow: /
This means that no crawlers can access the content.
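By contrast, standard robots.txt syntax (a general convention, not specific to SXA) lets you grant all crawlers full access by leaving the Disallow directive empty:
User-agent: *
Disallow: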
To configure the robots.txt file:
1. In the content tree, navigate to your site and click the Settings item.
2. Scroll down to the Robots section and, in the Robots content field, enter the rules.
3. Save and publish the changes.
Example
In the following example, the website is called http://www.mywebsite.com, and you want to instruct all search engines not to index any of the content in the ignorethesepages folder:
User-agent: *
Disallow: /ignorethesepages/
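Because rules are grouped per user agent, you can also target a single crawler. The following sketch uses Google's standard Googlebot token as an example; it blocks only that crawler from the folder while leaving all other crawlers unrestricted:
User-agent: Googlebot
Disallow: /ignorethesepages/

User-agent: *
Disallow: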
You do not have to specify where the sitemap.xml file is located. SXA adds this information to the robots.txt file automatically.
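For instance, assuming the sitemap is served from the site root (the exact URL depends on your sitemap configuration, so treat this as a sketch rather than the literal output), the published robots.txt for the example above might look similar to this:
User-agent: *
Disallow: /ignorethesepages/
# The Sitemap URL below is an assumption; SXA generates the actual entry.
Sitemap: http://www.mywebsite.com/sitemap.xml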