Configure robot detection functionality

Abstract

How to filter out unwanted visitors by user agent, by IP address, or by using the robot detection feature.

Sitecore can detect robots automatically by using the user agent string. You must enable Device Detection for robot detection to work. To identify and block custom robots, you can configure the Sitecore.Analytics.ExcludeRobots.config file to filter out unwanted visitors by user agent or IP address. Additionally, you can use the robot detection component to filter out robots by identifying a lack of human behavior.

In the Sitecore.Analytics.ExcludeRobots.config file, you can list the custom user agents and IP addresses that you want to exclude. If a visitor comes to your website from an IP address or with a user agent that is on the exclude list, the page request is ignored and not tracked.

You can edit this list manually: in the <analyticsExcludeRobots> section, add the user agents that you want to block under the <excludedUserAgents> node and the IP addresses under the <excludedIPAddresses> node.

To filter by user agent:

  1. Go to the App_Config\Sitecore\Marketing.Tracking folder and open the Sitecore.Analytics.ExcludeRobots.config file.

  2. Under the <excludedUserAgents> node, enter each user agent that you want to block on a separate line:

    <excludedUserAgents>
      UserAgent 1.0
      UserAgent 2.0
      UserAgent 3.0
    </excludedUserAgents>
    

To filter by IP address:

  1. Go to the App_Config\Sitecore\Marketing.Tracking folder and open the Sitecore.Analytics.ExcludeRobots.config file.

  2. Under the <excludedIPAddresses> node, enter each IP address that you want to block on a separate line:

    <excludedIPAddresses>
      10.1.2.3
      12.9.2.2
      35.2.5.4
    </excludedIPAddresses>
    

    Note

    Ensure that IP addresses conform to the following supported formats:

    IP address example: 10.2.3.4

    IP range example: 10.1.2.3 - 10.1.2.30
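
Putting the two lists together, the relevant part of the file might look like the following sketch. It assumes that both nodes sit under the <analyticsExcludeRobots> node. The user agents and addresses are placeholders, not recommendations.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <analyticsExcludeRobots>
      <!-- One user agent per line (placeholder values) -->
      <excludedUserAgents>
        UserAgent 1.0
        UserAgent 2.0
      </excludedUserAgents>
      <!-- One IP address or IP range per line (placeholder values) -->
      <excludedIPAddresses>
        10.2.3.4
        10.1.2.3 - 10.1.2.30
      </excludedIPAddresses>
    </analyticsExcludeRobots>
  </sitecore>
</configuration>
```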

The robot detection component is enabled by default. To fully implement it, however, you must add the Visitor Identification control to the layout of each page on your website.

Note

The Visitor Identification control is stored in the Website\layouts\system folder.

To use the robot detection component to identify human behavior:

  • Add the VisitorIdentification control to the layout of every page on the website.

The Sitecore Sample layout contains an example reference to the VisitorIdentification control for the website page layout.

<head runat="server">
  <title>Welcome to Sitecore</title>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta name="CODE_LANGUAGE" content="C#" />
  <meta name="vs_defaultClientScript" content="JavaScript" />
  <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5" />
  <link href="/default.css" rel="stylesheet" />
  <sc:VisitorIdentification runat="server" />
</head>

Note

If you are using MVC layouts, use the @Html.Sitecore().VisitorIdentification() helper method to render the Visitor Identification control.

By default, robot detection is enabled globally for all sites. You might want to disable robot detection, for example, if you are running automated performance tests. You can disable robot detection globally, or for a specific site.

Important

If robot detection is disabled globally, you cannot enable it for a specific site. The global setting takes priority.

To disable robot detection globally for all sites:

  • Using a patch file, change the value of the Analytics.AutoDetectBots setting to false:

    <setting name="Analytics.AutoDetectBots" value="false" />
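
A complete patch file for this change could look like the following sketch. The file name (for example, DisableRobotDetection.config) and its location are assumptions. Use a folder that your installation loads after the default Sitecore configuration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Overrides the default value so that robot detection is off for all sites -->
      <setting name="Analytics.AutoDetectBots">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```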

To disable robot detection for a specific site:

  • Go to configuration/sitecore/tracking/siteSettings and add the following configuration:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <sitecore>
        <tracking>
          <siteSettings>
            <site name="mySite" autoDetectBots="false" />
          </siteSettings>
        </tracking>
      </sitecore>
    </configuration>

You can change the default robot session timeout to minimize the time that robot-initiated interactions are stored in the session database. The default session timeout for robots is one minute. For human contacts, the default is 20 minutes.

To change the default session timeout:

  1. Go to the App_Config\Sitecore\Marketing.Tracking folder and open the Sitecore.Analytics.Tracking.config file.

  2. In the Sitecore.Analytics.Tracking.config file, go to the Analytics.Robots.SessionTimeout setting.

  3. To change the default, enter a different time, in minutes, in the value attribute:

    <setting name="Analytics.Robots.SessionTimeout" value="1" />

By default, the Analytics.Robots.IgnoreRobots setting is set to true, so robot visits are ignored and not saved to xDB. When you change this setting to false, all robot visits are saved.

To generate statistics on robot visits:

  1. Go to the App_Config\Sitecore\Marketing.Tracking folder and open the Sitecore.Analytics.Tracking.config file.

  2. In the Sitecore.Analytics.Tracking.config file, go to the Analytics.Robots.IgnoreRobots setting.

  3. Change the value of this setting to false:

    <setting name="Analytics.Robots.IgnoreRobots" value="false" />
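
Instead of editing the Sitecore.Analytics.Tracking.config file directly, you could also change the two robot settings described above in a patch file. The following is a sketch, not the documented procedure: the file name and the timeout value of 2 minutes are illustrative assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical patch file, for example RobotTracking.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Robot session timeout in minutes (illustrative value; the default is 1) -->
      <setting name="Analytics.Robots.SessionTimeout">
        <patch:attribute name="value">2</patch:attribute>
      </setting>
      <!-- Save robot visits to xDB so that statistics can be generated -->
      <setting name="Analytics.Robots.IgnoreRobots">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```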
    

You can also generate statistics on the following performance counters:

  • Human requests

  • Robot requests

  • Malicious robot requests

Note

To change or customize the default robot detection logic, you must edit both the Sitecore.Analytics.RobotDetection.config file and the web.config file. The Sitecore.Analytics.RobotDetection.config file is the main configuration file for the robot detection component. The web.config file contains the media request session module for the robot detection component.