Web robots are programs that make automatic requests to
servers. For example, search engines use robots, sometimes known as
web crawlers, to retrieve pages to include in their search databases.
You can provide a robots.txt file to identify
URLs that robots should not visit.
About this task
On visiting a website, a robot makes a request for the document robots.txt,
using the following URL:
http://www.example.com/robots.txt
where www.example.com is the host name for the site. If you have host
names that can be accessed using more than one port number, robots
request the robots.txt file for each combination of host name and port
number. The policies listed in the file can apply to all robots or can
name specific robots. Disallow statements are used to name URLs that the
robots must not visit. Even when you provide a robots.txt file, any robots
that do not comply with the robots exclusion standard might still
access and index your web pages.
If a web browser requests a robots.txt file and you do not provide one,
CICS® sends an error response to the web browser:
- If you are using the CICS-supplied default analyzer DFHWBAAX,
a 404 (Not Found) response is returned. No CICS message
is issued in this situation.
- If you are using the sample analyzer DFHWBADX or a similar analyzer
that can interpret only the URL format that was required before CICS TS
Version 3, the analyzer can misinterpret the path robots.txt as
an incorrectly specified converter program name. In this case, message
DFHWB0723 is issued, and a 400 (Bad Request) response is returned
to the web browser. To avoid this situation, you can either modify
the analyzer program to recognize the robots.txt request
and provide a more suitable error response, or provide a robots.txt file
using a URIMAP definition. Both actions bypass the sample analyzer
program for these requests.
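The request pattern and policy rules described above can be illustrated from the robot's side. The following is a minimal sketch in Python that uses only the standard library; it is not CICS code. The function names, the robot name ExampleBot, and the sample policy text are assumptions made for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host name and port in page_url."""
    parts = urlsplit(page_url)
    # netloc keeps an explicit port number, so each combination of
    # host name and port number maps to its own robots.txt URL.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical policy: one rule set for all robots, one for a named robot.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a compliant robot may visit url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(robots_url("http://www.example.com:8080/docs/page.html"))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot",
                     "http://www.example.com/index.html"))
```

In this sketch, a robot that is not named in the file falls under the `User-agent: *` rules, while ExampleBot is excluded from the whole site. A robot that ignores the standard is not constrained by either rule set.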
To provide a robots.txt file for all or some of your
host names:
Procedure
- Create the text content for the robots.txt file.
Information and examples about creating a robots.txt file are
available from several websites. Search on "robots.txt" or
"robots exclusion standard" and select an appropriate site.
- Decide how to store and provide the robots.txt file.
You can provide the file either by using a URIMAP definition alone or by
using an application program.
- You can store the robots.txt file on z/OS® UNIX System
Services and provide the file as a static response using a URIMAP
definition. Most web servers store the robots.txt file
in the root directory for the host name. For CICS,
a URIMAP definition can provide a file stored anywhere on z/OS UNIX,
and the same file can be used for more than one host name.
If you use a file stored on z/OS UNIX,
the CICS region must have permissions to access z/OS UNIX,
and it must have permission to access the z/OS UNIX directory
containing the file and the file itself. Giving CICS regions access to z/OS
UNIX directories and files explains how to grant these permissions.
- You can make the robots.txt file into a CICS document,
and provide it either as a static response using a URIMAP definition
or as a response from an application program. Creating a document explains
how to create a CICS document template. A document template
can be held in a partitioned data set, a CICS program,
a file, a temporary storage queue, a transient data queue, an exit
program, or a z/OS UNIX System Services file.
- If you want to provide the contents of the robots.txt file
using an application program, create a suitable web-aware application
program, as described in Developing HTTP applications. For example,
you can use the EXEC CICS WEB SEND command with the
FROM option to specify a buffer of data containing your robots.txt information.
Alternatively, you can use the application program to deliver a CICS document
from a template. Specify a media type of text/plain.
You might want to use an application program to handle requests from robots,
so that you can track which robots are visiting your web pages. The
User-Agent header in a request from a robot gives the name of the
robot, and the From header includes contact information for the owner
of the robot. Your application program can read and log these HTTP
headers.
- Using the CICS Explorer, create a URIMAP definition that
matches requests made by web robots for the robots.txt file.
The following sample URIMAP definition attributes show how
to match a request for a robots.txt file for any host name:
Table 1. Example robots values for a URIMAP definition

| Attribute | Value | Description |
| --- | --- | --- |
| URIMAP | robots | The name of the URIMAP |
| Group | MYGROUP | Any suitable group name |
| Description | Robots.txt | |
| Status | Enabled | |
| Usage | Server | For CICS as an HTTP server |
| Scheme | HTTP | Will also match HTTPS requests |
| Host | * | * matches any host name. Specify the host name if you provide separate robots.txt files |
| Path | /robots.txt | Robots use this path to request robots.txt |
| TCPIPSERVICE | | Blank matches any port. Specify the TCPIPSERVICE definition name if you provide different robots.txt files depending on port |

Remember that the path components of URLs
are case-sensitive. The path /robots.txt must be
specified in lowercase.
- If you are providing the robots.txt file
as a static response, complete the URIMAP definition to specify the
file location and the other information that CICS web
support uses to construct responses.
For example, you might specify
the following URIMAP attributes to provide a robots.txt file
that was created using the EBCDIC code page 037 and stored in the
/u/cts/CICSHome directory:
Table 2. Example static document properties in a URIMAP definition

| Attribute | Value |
| --- | --- |
| Media type | text/plain |
| Character set | iso-8859-1 |
| Host codepage | 037 |
| HFS file | /u/cts/CICSHome/robots.txt |

The HFS file name is case-sensitive.
- If you are providing the content of the robots.txt file
using an application program, complete the URIMAP definition to specify
that the program must handle requests.
For example, you might use the following
URIMAP definition attributes to make the web-aware application program
ROBOTS handle the request, with no analyzer or converter program involved:
Table 3. Example associated CICS resource properties in a URIMAP definition

| Attribute | Value | Description |
| --- | --- | --- |
| Analyzer | No | Analyzer is not used for the request |
| Converter | | Blank equals no converter program |
| Transaction | | Blank defaults to CWBA |
| Program | ROBOTS | The web-aware application program that handles the request |
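
The tracking idea from the procedure (read the User-Agent and From headers, log them, and return the policy as text/plain) can be sketched outside CICS as follows. This is plain Python used only to show the logic; it is not the EXEC CICS WEB API. The function name, the logger name, and the policy text are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("robots")

# Hypothetical policy text; replace with your own rules.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


def handle_robots_request(headers: dict) -> tuple:
    """Log who asked for robots.txt and build a text/plain response.

    headers maps HTTP request header names to values.
    Returns (status code, response headers, response body).
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    robot_name = lowered.get("user-agent", "(not supplied)")
    contact = lowered.get("from", "(not supplied)")
    logger.info("robots.txt requested by %s, contact %s", robot_name, contact)

    body = ROBOTS_TXT.encode("iso-8859-1")
    response_headers = {
        "Content-Type": "text/plain; charset=iso-8859-1",
        "Content-Length": str(len(body)),
    }
    return 200, response_headers, body


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    status, response_headers, body = handle_robots_request(
        {"User-Agent": "ExampleBot/1.0", "From": "owner@example.com"}
    )
    print(status, response_headers["Content-Type"])
```

A request without a From header is still answered; the sketch logs a placeholder instead, because robots are not obliged to send contact information.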