Providing a robots.txt file

Web robots are programs that make automatic requests to servers. For example, search engines use robots, sometimes known as web crawlers, to retrieve pages to include in their search databases. You can provide a robots.txt file to identify URLs that robots should not visit.

About this task

On visiting a website, a robot makes a request for the document robots.txt, using the following URL:
http://www.example.com/robots.txt
where www.example.com is the host name for the site. If you have host names that can be accessed using more than one port number, robots request the robots.txt file for each combination of host name and port number. The policies listed in the file can apply to all robots or can name specific robots. Disallow statements are used to name URLs that the robots must not visit. Even when you provide a robots.txt file, any robots that do not comply with the robots exclusion standard might still access and index your web pages.
If a web browser requests a robots.txt file and you do not provide one, CICS® sends an error response to the web browser:
  • If you are using the CICS-supplied default analyzer DFHWBAAX, a 404 (Not Found) response is returned. No CICS message is issued in this situation.
  • If you are using the sample analyzer DFHWBADX or a similar analyzer that can interpret only the URL format that was required before CICS TS Version 3, the analyzer can misinterpret the path robots.txt as an incorrectly specified converter program name. In this case, message DFHWB0723 is issued, and a 400 (Bad Request) response is returned to the web browser. To avoid this situation, you can either modify the analyzer program to recognize the robots.txt request and provide a more suitable error response, or provide a robots.txt file using a URIMAP definition. Both actions bypass the sample analyzer program for these requests.

To provide a robots.txt file for all or some of your host names:

Procedure

  1. Create the text content for the robots.txt file.
    Information and examples about creating a robots.txt file are available from several websites. Search on robots.txt or robots exclusion standard and select an appropriate site.
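    For example, a minimal robots.txt file might exclude one named robot from the whole site and all other robots from a single directory. The robot name and paths here are placeholders for illustration only:

    ```
    User-agent: BadBot
    Disallow: /

    User-agent: *
    Disallow: /private/
    ```

    A compliant robot uses the record whose User-agent line matches its own name; the * record applies to any robot that is not named in another record.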
  2. Decide how to store and provide the robots.txt file. You can provide the file either as a static response, using only a URIMAP definition, or from an application program.
    • You can store the robots.txt file on z/OS® UNIX System Services and provide the file as a static response using a URIMAP definition. Most web servers store the robots.txt file in the root directory for the host name. For CICS, a URIMAP definition can provide a file stored anywhere on z/OS UNIX, and the same file can be used for more than one host name.

      If you use a file stored on z/OS UNIX, the CICS region must have permissions to access z/OS UNIX, and it must have permission to access the z/OS UNIX directory containing the file and the file itself. Giving CICS regions access to z/OS UNIX directories and files explains how to grant these permissions.

    • You can make the robots.txt file into a CICS document, and provide it either as a static response using a URIMAP definition or as a response from an application program. Creating a document explains how to create a CICS document template. A document template can be held in a partitioned data set, a CICS program, a file, a temporary storage queue, a transient data queue, an exit program, or a z/OS UNIX System Services file.
    • If you want to provide the contents of the robots.txt file using an application program, create a suitable web-aware application program, as described in Developing HTTP applications. For example, you can use the EXEC CICS WEB SEND command with the FROM option to specify a buffer of data containing your robots.txt information. Alternatively, you can use the application program to deliver a CICS document from a template. Specify a media type of text/plain.

      You might want to use an application program to handle requests from robots, so that you can track which robots are visiting your web pages. The User-Agent header in a request from a robot gives the name of the robot, and the From header includes contact information for the owner of the robot. Your application program can read and log these HTTP headers.
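      A web-aware program of this kind might be sketched as follows in C with embedded EXEC CICS commands. The program name ROBOTS and the file content are assumptions for illustration; the source must be processed by the CICS translator and cannot be compiled as ordinary C:

      ```
      /* Sketch of a web-aware ROBOTS program (hypothetical name) in C
         with embedded EXEC CICS commands; requires the CICS translator. */
      #include <string.h>

      static char robots_data[] =
          "User-agent: *\n"
          "Disallow: /private/\n";

      void main(void)
      {
          char agent[256] = "";
          int  agentlen   = sizeof(agent);
          char name[]     = "User-Agent";
          int  namelen    = (int)strlen(name);

          /* Identify the robot: read the User-Agent header, which the
             program could then log. NOTFND is raised if it is absent. */
          EXEC CICS WEB READ HTTPHEADER(name) NAMELENGTH(namelen)
                    VALUE(agent) VALUELENGTH(agentlen) NOHANDLE;

          /* Send the robots.txt content with media type text/plain. */
          EXEC CICS WEB SEND FROM(robots_data)
                    FROMLENGTH((int)strlen(robots_data))
                    MEDIATYPE("text/plain");

          EXEC CICS RETURN;
      }
      ```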

  3. Using the CICS Explorer, create a URIMAP definition that matches requests made by web robots for the robots.txt file.
    The following sample URIMAP definition attributes show how to match a request for a robots.txt file for any host name:
    Table 1. Example robots values for a URIMAP definition

    Attribute      Value         Description
    URIMAP         robots        The name of the URIMAP definition
    Group          MYGROUP       Any suitable group name
    Description    Robots.txt
    Status         Enabled
    Usage          Server        For CICS as an HTTP server
    Scheme         HTTP          Will also match HTTPS requests
    Host           *             * matches any host name. Specify a host name if you provide separate robots.txt files for different hosts
    Path           /robots.txt   Robots use this path to request robots.txt
    TCPIPSERVICE                 Blank matches any port. Specify the TCPIPSERVICE definition name if you provide different robots.txt files depending on port
    Remember that the path components of URLs are case-sensitive. The path /robots.txt must be specified in lowercase.
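    If you define the resource with CEDA rather than the CICS Explorer, the attributes in Table 1 correspond to a definition along these lines (the URIMAP and group names are the sample values from the table):

    ```
    CEDA DEFINE URIMAP(robots) GROUP(MYGROUP)
         DESCRIPTION(Robots.txt)
         STATUS(ENABLED) USAGE(SERVER)
         SCHEME(HTTP) HOST(*) PATH(/robots.txt)
    ```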
  4. If you are providing the robots.txt file as a static response, complete the URIMAP definition to specify the file location and the other information that CICS web support uses to construct responses.
    For example, you might specify the following URIMAP attributes to provide a robots.txt file that was created using the EBCDIC code page 037 and stored in the /u/cts/CICSHome directory:
    Table 2. Example static document properties in a URIMAP definition

    Attribute        Value
    Media type       text/plain
    Character set    iso-8859-1
    Host codepage    037
    HFS file         /u/cts/CICSHome/robots.txt
    The HFS file name is case-sensitive.
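    As an illustration, the attributes in Table 2 map to URIMAP attributes like these in a CEDA definition (the file location is the sample value from the table):

    ```
    CEDA ALTER URIMAP(robots) GROUP(MYGROUP)
         MEDIATYPE(text/plain)
         CHARACTERSET(iso-8859-1) HOSTCODEPAGE(037)
         HFSFILE(/u/cts/CICSHome/robots.txt)
    ```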
  5. If you are providing the content of the robots.txt file using an application program, complete the URIMAP definition to specify that the program must handle requests.
    For example, you might use the following URIMAP definition attributes to make the web-aware application program ROBOTS handle the request, with no analyzer or converter program involved:
    Table 3. Example associated CICS resource properties in a URIMAP definition

    Attribute      Value     Description
    Analyzer       No        An analyzer program is not used for the request
    Converter                Blank means no converter program is used
    Transaction              Blank defaults to CWBA
    Program        ROBOTS    The web-aware application program that handles the request
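    As an illustration, the attributes in Table 3 map to URIMAP attributes like these in a CEDA definition (ROBOTS is the sample program name from the table):

    ```
    CEDA ALTER URIMAP(robots) GROUP(MYGROUP)
         ANALYZER(NO) PROGRAM(ROBOTS)
    ```

    Leaving the CONVERTER and TRANSACTION attributes blank gives the defaults shown in the table: no converter program, and the CWBA transaction.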