Crawler authentication

Note

This topic only covers authentication configuration required in your crawler settings when your content needs authentication before access. For information on how to authenticate with Sitecore Search source when you integrate your website or app, see API authentication and authorization.

If your original content requires authentication before the crawler can access it, you can define authentication settings. You can add a key, access token, or password to the source configuration. In Search, this mechanism is available in Source Settings > Authentication.

Basic authentication settings

Use basic authentication when your original content requires a key or access token that you add to the request header.

Configure the following basic authentication settings:

Table 1. 

Setting

Description

Authentication Type

Type of authentication you want to use. Select Basic.

URL

The URL where your source content requires authentication. For example, enter www.acme.com/login.

If you use a request trigger, the URL is usually the same as the request URL.

BODY

Body of the request. Use this when you send a POST, PUT, or PATCH request.

METHOD

HTTP method of the request. You can use GET, POST, PUT, or PATCH.

TIMEOUT

Time, in milliseconds, the crawler waits to get a response from the URL. If the TIMEOUT expires before the crawler gets a response, the crawler does not crawl the URL.

HEADERS

Request header that the crawler sends to authenticate when it accesses your source content. Set it as a key and a value. For example, enter the key as authorization and the value as the key or access token that the source content requires.

You can add multiple headers.
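Before you enter these values in Search, it can help to confirm that the same combination of URL, method, headers, and body is accepted by your site. The following is a minimal sketch using only the Python standard library. It is not how the crawler itself is implemented: the function name build_auth_request, the https:// scheme added to the example URL, the placeholder token, and the sample body are all assumptions made for illustration.

```python
import urllib.request

ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH"}


def build_auth_request(url, method="GET", headers=None, body=None, timeout_ms=5000):
    """Mirror the URL, METHOD, HEADERS, BODY, and TIMEOUT settings as a urllib request.

    Hypothetical helper for local testing; returns the request and the
    timeout converted to seconds.
    """
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")

    # BODY only applies to POST, PUT, and PATCH requests.
    data = body.encode("utf-8") if body and method != "GET" else None

    request = urllib.request.Request(url, data=data, method=method)
    # HEADERS are key/value pairs; more than one can be added.
    for key, value in (headers or {}).items():
        request.add_header(key, value)

    # TIMEOUT is configured in milliseconds; urllib expects seconds.
    return request, timeout_ms / 1000.0


if __name__ == "__main__":
    request, timeout_s = build_auth_request(
        "https://www.acme.com/login",  # assumed scheme; the setting example omits it
        method="POST",
        headers={"authorization": "<your key or access token>"},
        body='{"example": "payload"}',  # placeholder body
        timeout_ms=3000,
    )
    # Sending is commented out so the sketch runs without network access.
    # with urllib.request.urlopen(request, timeout=timeout_s) as response:
    #     print(response.status)
    print(request.get_method(), request.full_url, timeout_s)
```

If the request fails or times out when you send it locally, the crawler is likely to hit the same problem with those settings.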



Browser authentication

Use browser authentication when your website requires a GUI-based username and password, rather than a key or access token in the request header. If visitors need to enter a username and password to access content, you'll need browser authentication.

Configure the following browser authentication settings:

Setting

Description

Authentication Type

Type of authentication you want to use. Select Browser.

URL

The URL where your website requires authentication. Usually, this is the login page. For example, enter www.acme.com/login.

If you use a request trigger, the URL is usually the same as the request trigger URL.

USERNAME SELECTOR

CSS notation for the username field. For example, this can be the Username, USERNAME or EMAIL, or Enter email field on your content login page. To get the CSS notation value, inspect the username field in the browser.

We recommend that you add more than one username selector to make sure that the crawler finds the right username field.

For example, to use the id and name CSS selectors to find the username field, enter:

#username,[name=UserName]

USERNAME VALUE

Username your website expects, in plain text.

PASSWORD SELECTOR

CSS notation for the password field. For example, this can be the Password or Enter password field on your content login page. To get the CSS notation value, inspect the password field in the browser.

We recommend that you add more than one password selector to make sure that the crawler finds the correct password field.

For example, to use the id, name, and type CSS selectors for the password field, enter:

#passwrd,[name=Password],[type=password]

PASSWORD VALUE

Password your website expects, in plain text.

SUBMIT SELECTOR

CSS notation for the submit button. For example, this can be the Login, Submit, or Sign in button on your website's login page. To get the CSS notation value, right-click the button and inspect the element.

We recommend that you add more than one submit selector to make sure that the crawler finds the right submit field.

For example, to use the id and type CSS selectors for the submit field, enter:

#log_in,[type=submit]

MIN WAIT

Minimum time, in milliseconds, that the crawler waits to get a response from the URL.

MAX WAIT

Maximum time, in milliseconds, that the crawler waits to get a response from the URL. If the MAX WAIT expires before the crawler gets a response, the crawler does not crawl your content.
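The selector settings above accept a comma-separated list, which in CSS means an element matches if any one selector in the list matches it. The following sketch illustrates that fallback behavior against a sample login form, using only the Python standard library. It is a simplified illustration, not the crawler's implementation: it handles only the #id and [attr=value] forms shown in the examples, splits on commas naively, and the sample form and all function names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical login form used only for this illustration.
LOGIN_FORM = """
<form>
  <input id="email" name="UserName" type="text">
  <input id="passwrd" name="Password" type="password">
  <button id="log_in" type="submit">Sign in</button>
</form>
"""


def parse_selector_list(selector_list):
    """Split 'a,b,c' into individual selectors, trimming whitespace."""
    return [s.strip() for s in selector_list.split(",") if s.strip()]


def selector_matches(selector, attrs):
    """Match the two selector forms used above: #id and [attr=value]."""
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    if selector.startswith("[") and selector.endswith("]") and "=" in selector:
        name, value = selector[1:-1].split("=", 1)
        return attrs.get(name.lower()) == value.strip("\"'")
    return False  # other selector forms are out of scope for this sketch


class FieldCollector(HTMLParser):
    """Collect the attributes of every input and button element."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "button"):
            self.elements.append({k: (v or "") for k, v in attrs})


def find_field(html, selector_list):
    """Return the first element matched by any selector in the list, or None."""
    collector = FieldCollector()
    collector.feed(html)
    selectors = parse_selector_list(selector_list)
    for attrs in collector.elements:
        if any(selector_matches(s, attrs) for s in selectors):
            return attrs
    return None


if __name__ == "__main__":
    # '#username' matches nothing here, so '[name=UserName]' acts as the fallback.
    print(find_field(LOGIN_FORM, "#username,[name=UserName]"))
    print(find_field(LOGIN_FORM, "#passwrd,[name=Password],[type=password]"))
    print(find_field(LOGIN_FORM, "#log_in,[type=submit]"))
```

In this sample form the username field has the id email, so #username alone would find nothing. Listing [name=UserName] as a second selector is what lets the lookup succeed, which is the reason for adding more than one selector per field.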
