Tutorial: Configuring the robots.txt file

You can control how visiting web robots access your site by configuring the robots.txt file, which usually sits at the root level of your web server. Web robots are programs that crawl the web to collect content from the sites that they visit, and index that content so that search engines can return better results. You can also specify separate rules for different robots.

Why would I want to edit Drupal's pre-existing robots.txt file?

Malicious robots might choose not to honor the robots.txt file, and because the file is publicly readable, editing it broadcasts which parts of your site you do not want others to see. Therefore, do not use this file to hide sensitive data. Instead, you might want to edit your robots.txt file to:

  • Prevent duplicate information from being identified on your site
  • Prevent internal pages from appearing in search engines
  • Prevent private pages from appearing in search engines
  • Prevent particular images, files, and so on, from being crawled
  • Specify a Crawl-delay directive to prevent robots from overloading your server with requests
  • Exclude a particular robot
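
Several of the goals above can be sketched as a single policy and checked before you publish it. The following is a minimal Python sketch, not a recommended configuration: the paths /internal/, /private/, and /images/drafts/, the robot name GoodBot, and the 10-second delay are hypothetical placeholders, and Crawl-delay is a nonstandard directive that some crawlers ignore. The sketch uses the standard library's urllib.robotparser to evaluate the rules roughly the way a compliant robot might.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: every path and value here is a placeholder.
POLICY = """\
User-agent: *
Crawl-delay: 10
Disallow: /internal/
Disallow: /private/
Disallow: /images/drafts/

User-agent: BadBot
Disallow: /
"""


def load_policy(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_policy(POLICY)

# Robots without their own group fall back to the "*" rules.
print(parser.can_fetch("GoodBot", "/internal/report"))  # False
print(parser.can_fetch("GoodBot", "/docs/"))            # True

# BadBot has its own group, which blocks the whole site.
print(parser.can_fetch("BadBot", "/docs/"))             # False

# Crawl-delay from the "*" group, in seconds.
print(parser.crawl_delay("GoodBot"))                    # 10
```

Testing a draft policy this way only shows how a robot that honors the file would behave; it says nothing about robots that ignore it.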

Before you begin

You must have a Developer Portal enabled, and you must have administrator access to complete this tutorial.

About this tutorial

You will edit the pre-existing robots.txt file to deny access to a visiting robot called BadBot.

  1. Log in to your Developer Portal as an administrator.
  2. If the administrator dashboard isn’t displayed, click Manage to display it.
  3. Navigate to Configuration > Search and Metadata > RobotsTxt.

  4. Enter the policy that denies access to a robot called BadBot:
    User-agent: BadBot
    Disallow: /
  5. Click Save Configuration to save your changes.
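
If you need to exclude more than one robot, the policy from step 4 can be generated rather than typed by hand. This is a sketch only: exclusion_policy is a hypothetical helper, not part of the Developer Portal or the RobotsTxt module, and OtherBot is a made-up name used to show that unlisted robots are unaffected.

```python
from urllib.robotparser import RobotFileParser


def exclusion_policy(robot_names):
    """Build robots.txt text that denies each named robot access to the whole site."""
    groups = [f"User-agent: {name}\nDisallow: /" for name in robot_names]
    # Groups are separated by a blank line, as in a hand-written robots.txt file.
    return "\n\n".join(groups) + "\n"


policy = exclusion_policy(["BadBot"])
print(policy)

# Check the generated text locally before pasting it into the RobotsTxt page.
checker = RobotFileParser()
checker.parse(policy.splitlines())
print(checker.can_fetch("BadBot", "/"))    # False: BadBot is excluded
print(checker.can_fetch("OtherBot", "/"))  # True: no rule applies to OtherBot
```

The generated text for a single robot matches the two lines entered in step 4, so you can paste it into the same field.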

What you did in this tutorial

You have now customized the robots.txt file. Robots that honor the file use the updated version to decide where they can crawl on your site, and the BadBot robot is denied access.

You can check that your robots.txt file was changed by appending /robots.txt to your site URL in a browser. You should see the content that you entered.
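
The same check can be scripted. The following Python sketch assumes a placeholder site URL (https://portal.example.com) and uses hypothetical helper names (robots_url, fetch_robots_txt, contains_policy); it downloads the published file and confirms that the lines you entered are present.

```python
from urllib.request import urlopen

# The policy entered in this tutorial.
EXPECTED = "User-agent: BadBot\nDisallow: /\n"


def robots_url(site_url):
    """Append /robots.txt to the site URL, as described in the tutorial."""
    return site_url.rstrip("/") + "/robots.txt"


def fetch_robots_txt(site_url, timeout=10):
    """Download the published robots.txt file and return it as text."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def policy_lines(text):
    """Return the non-blank lines of a policy with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def contains_policy(published, expected):
    """Return True if every expected line appears somewhere in the published file."""
    published_lines = policy_lines(published)
    return all(line in published_lines for line in policy_lines(expected))
```

For example, contains_policy(fetch_robots_txt("https://portal.example.com"), EXPECTED) returns True only if both lines of the BadBot policy are being served. Replace the placeholder URL with the address of your own site.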

For more information on how to edit your robots.txt file, see https://www.robotstxt.org/.

What to do next

You can edit the robots.txt file at any time by returning to the RobotsTxt page in the configuration settings. You might choose to duplicate this file across all of your sites, or to set different policies for different sites.