HTTP runs on top of TCP, but an HTTP message also carries
information about its own destination. For that reason, the HTTP
proxy and the TCP proxy are configured differently.
HTTP traffic includes the destination host and port for the message,
and is sent over a TCP connection to a TCP endpoint,
that is, a specific host and port. Typically, the HTTP message specifies
the same TCP endpoint as the one to which the underlying TCP connection
is made. If you change the client configuration to use an HTTP proxy,
the TCP connection is made to a different host and port than the one
in the HTTP URLs, which means that the TCP endpoint in the message
is different from the endpoint being connected to. For example, if
an HTTP request is sent to http://192.0.2.1:8080/operation, the request
includes "192.0.2.1:8080" in the "Host" header of the HTTP message
that is sent to the TCP port 8080 on host 192.0.2.1.
However, if you configure the HTTP client to use a proxy, the underlying
TCP connection goes to the TCP endpoint for the proxy, while the messages
still contain the original TCP endpoint. For example, if you configure
the client to send its messages to a proxy at 198.51.100.1 port 3128,
and the client sends a request for http://192.0.2.1:8080/operation,
the message still contains "192.0.2.1:8080" in the "Host" header,
and now also in the "Request-Line", which carries the full URL instead
of only the path. However, this message is
now sent over a TCP connection to the proxy at 198.51.100.1:3128.
In this way, the HTTP proxy can receive messages on a single port
and can forward those messages to many different services based on
the destination information in the message.
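
The difference between the two cases can be sketched in a few lines of Python. This is only an illustration of the message layout described above, not part of the product: the function name build_request and its proxy parameter are hypothetical, and the sketch handles only plain http:// URLs and GET requests.

```python
from urllib.parse import urlsplit


def build_request(url, proxy=None):
    """Return (tcp_endpoint, request_bytes) for a GET request to `url`.

    `proxy` is an optional (host, port) tuple. Hypothetical helper for
    illustration only; it handles plain http:// URLs.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    if proxy is None:
        # Direct: connect to the endpoint named in the URL and send
        # only the path in the Request-Line.
        endpoint = (parts.hostname, parts.port or 80)
        target = path
    else:
        # Proxied: connect to the proxy and send the full URL in the
        # Request-Line so that the proxy knows the final destination.
        endpoint = proxy
        target = url

    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    ).encode("ascii")
    return endpoint, request


URL = "http://192.0.2.1:8080/operation"
direct_endpoint, direct_request = build_request(URL)
proxy_endpoint, proxy_request = build_request(URL, proxy=("198.51.100.1", 3128))
```

In both cases the "Host" header names 192.0.2.1:8080. Only the TCP endpoint and the Request-Line differ.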
Note: The "Host" header was introduced in HTTP/1.1, which requires it.
HTTP/1.0 does not define this header, so HTTP/1.0 messages might not
include it. For this reason, an HTTP/1.0 message that does not pass
through a proxy might not include the destination host and port.
However, HTTP/1.0 messages that are sent to a proxy
still contain the destination host and port in the "Request-Line";
therefore, the absence of a "Host" header does not cause a problem
for proxies.
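
As a rough sketch of why a missing "Host" header is harmless for a proxy, the following Python function looks for the destination in the Request-Line first and falls back to the "Host" header only if the Request-Line holds just a path. The function name destination_of is hypothetical, and the parsing is deliberately minimal: it assumes a well-formed three-part Request-Line, plain http:// URLs, and IPv4 addresses or host names.

```python
from urllib.parse import urlsplit


def destination_of(raw_request):
    """Return the (host, port) an HTTP request is addressed to, or None.

    Hypothetical helper for illustration only.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)

    # A request sent to a proxy carries the full URL in the Request-Line,
    # so the destination is known even without a "Host" header.
    if target.lower().startswith("http://"):
        parts = urlsplit(target)
        return (parts.hostname, parts.port or 80)

    # Otherwise fall back to the "Host" header, if the client sent one.
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host, _, port = value.strip().partition(":")
            return (host, int(port) if port else 80)

    # No absolute URL and no "Host" header: the message itself does not
    # say where it is going.
    return None


via_proxy_10 = destination_of(b"GET http://192.0.2.1:8080/operation HTTP/1.0\r\n\r\n")
direct_11 = destination_of(b"GET /operation HTTP/1.1\r\nHost: 192.0.2.1:8080\r\n\r\n")
direct_10 = destination_of(b"GET /operation HTTP/1.0\r\n\r\n")
```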
To enable a TCP proxy, you change the client configuration from
the live system TCP endpoint to the TCP endpoint for the proxy. Unlike
HTTP, TCP does not provide the built-in ability to use a proxy. That
is, if you connect to a proxy through TCP, no mechanism is defined
to communicate the intended final destination to the proxy. The only
way for a TCP proxy to allow connections to multiple live systems
(that is, to final destinations, or onward endpoints),
without knowing what traffic will be sent over those connections,
is to listen on a different port for each live system it allows connections
to, and to maintain the information about which of its port numbers
corresponds to each onward endpoint. The client is then configured
with the appropriate proxy port corresponding to each live system
that it needs to communicate with. The TCP proxy ports to listen on,
and their corresponding onward endpoints, are configured in <forward>
statements in the proxy configuration file,
RTCP_install_dir/httptcp/registration.xml.
In the following example, 198.51.100.1 is the IP address of the proxy.
Any traffic sent to port 3333 on the proxy is forwarded to port 80
at www.example.com:
<forward bind="198.51.100.1:3333" destination="www.example.com:80"/>
You must therefore add a <forward> statement to the proxy configuration
file, and point the client configuration at the new proxy port, whenever
you add a new destination for proxy traffic. This restriction does
not apply to HTTP proxies.
To understand how port numbers are handled differently in the HTTP
proxy and the TCP proxy, assume that you have two services, one at
192.0.2.1:8080 and one at 192.0.2.1:8081, and a proxy that is running
on 198.51.100.1. (If the two services differed in IP address rather
than in port number, this example would be the same except for the
appropriate IP address for each service.) If these two services expect
HTTP traffic, a single HTTP proxy port (such as 3128) is opened, and
requests for both TCP endpoints can be sent to that port. When the
HTTP proxy sees that a message is addressed to 192.0.2.1:8080, the
proxy either forwards the message to that address or applies any
rules that it has for that service. The same procedure applies to
192.0.2.1:8081, using the same proxy port.
If these two services instead expect TCP traffic, two TCP proxy
ports must be opened, defined by two <forward> elements in the
configuration file:
<forward bind="198.51.100.1:3333" destination="192.0.2.1:8080"/>
<forward bind="198.51.100.1:3334" destination="192.0.2.1:8081"/>
The client configuration for the first service changes from "192.0.2.1:8080"
to "198.51.100.1:3333" and for the second service from "192.0.2.1:8081"
to "198.51.100.1:3334". To reach the first service, the client opens a
TCP connection to 198.51.100.1:3333 and sends its data over that
connection. The proxy accepts the connection on that port (3333), but
does not know what data is being sent over it. All it knows is that
the connection was made to port 3333.
Therefore the proxy consults its configuration and sees that traffic
to that port must be forwarded to 192.0.2.1:8080 (or that a rule for
that service must be applied to it).
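
The lookup that the TCP proxy performs can be sketched as a table keyed by listening port. The following Python example is an assumption-laden illustration, not the product's implementation: the <proxy-config> wrapper element is hypothetical (this section shows only the <forward> elements, not the full layout of registration.xml), and the function names are invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; only the <forward> elements come from
# the text above.
CONFIG = """
<proxy-config>
  <forward bind="198.51.100.1:3333" destination="192.0.2.1:8080"/>
  <forward bind="198.51.100.1:3334" destination="192.0.2.1:8081"/>
</proxy-config>
"""


def split_endpoint(text):
    """Split 'host:port' into (host, port)."""
    host, _, port = text.rpartition(":")
    return host, int(port)


def load_forwards(xml_text):
    """Map each listening port to its onward endpoint.

    For simplicity the table is keyed by port only and ignores the
    bind address.
    """
    table = {}
    for element in ET.fromstring(xml_text).iter("forward"):
        _, bind_port = split_endpoint(element.get("bind"))
        table[bind_port] = split_endpoint(element.get("destination"))
    return table


forwards = load_forwards(CONFIG)


def onward_endpoint(listening_port):
    """The only thing the TCP proxy knows is the port that was connected to."""
    return forwards[listening_port]


first_service = onward_endpoint(3333)
second_service = onward_endpoint(3334)
```

Because the table is the proxy's only source of destination information, each new live system needs its own entry and its own listening port.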
If you cannot route all of your HTTP traffic through a proxy server
because the client configuration does not support HTTP proxy configuration,
use a reverse HTTP proxy instead. With a reverse HTTP proxy, you change
the destination URL in the client instead of configuring proxy settings.
This process is similar to that for setting up a TCP proxy in that you
specify the proxy as the TCP endpoint for the message in the client
system and create a forwarding rule in the proxy. The difference is
that you add a type attribute to the rule that specifies HTTP,
as in the following example:
<forward bind="198.51.100.1:3333" destination="192.0.2.1:8080" type="HTTP"/>
Now that the proxy server is configured to receive only HTTP traffic
on the designated port (3333 in the example), the server can apply
the richer filtering that is available from the HTTP proxy to messages
that are addressed to stubs. For example, the server can filter out
traffic to the stub that does not have a certain path in its URL,
or that does not use a certain HTTP method, such as POST. However,
because a stub is not always running, the server still needs the destination
from the <forward> element to be able to send traffic to the live
system. For example, assume a client needs to
connect to a service on 192.0.2.1:8080 and uses a reverse HTTP proxy
on 198.51.100.1:3333. Before the client can use the proxy, the client
configuration for that service must be changed from a URL such as
http://192.0.2.1:8080/operation to http://198.51.100.1:3333/operation.
A request that is sent to that new URL reaches the proxy. The request
message contains the TCP endpoint for the proxy (198.51.100.1:3333)
in the "Host" header rather than the address of the live system because
the client is not aware that it is sending the message to a proxy
rather than a normal server. The fact that the client needs no proxy
awareness is what characterizes a reverse proxy. The proxy therefore
uses the <forward> elements to determine that a request that comes in
on port 3333 requires one of the following actions:
- The request must be forwarded to the live system at 192.0.2.1:8080,
and the "Host" header in the message must be updated to specify that
live system.
- Any rules for that service must be applied to the message, such
as routing it to a stub instead.
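
The two actions above can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the proxy's actual logic: the function names, the stub_rule dictionary, and its "method" and "path" keys are all hypothetical, and the rule matching is reduced to an HTTP method check and a path prefix check.

```python
def rewrite_host(raw_request, live_host, live_port):
    """Replace the "Host" header so that it names the live system."""
    head, separator, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    new_host = f"Host: {live_host}:{live_port}".encode("ascii")
    rewritten = [lines[0]]  # keep the Request-Line unchanged
    for line in lines[1:]:
        if line.lower().startswith(b"host:"):
            rewritten.append(new_host)
        else:
            rewritten.append(line)
    return b"\r\n".join(rewritten) + separator + body


def handle(raw_request, live, stub_rule=None):
    """Return ('stub', request) or ('live', rewritten_request).

    `live` is the (host, port) from the <forward> destination.
    `stub_rule` is a hypothetical filter such as
    {"method": "POST", "path": "/operation"}.
    """
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = request_line.split(" ", 2)
    if (
        stub_rule is not None
        and method == stub_rule["method"]
        and path.startswith(stub_rule["path"])
    ):
        # A rule matches: route the message to the stub instead.
        return "stub", raw_request
    # No rule matches: forward to the live system with an updated Host.
    return "live", rewrite_host(raw_request, *live)


incoming = (
    b"POST /operation HTTP/1.1\r\n"
    b"Host: 198.51.100.1:3333\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
live_action, live_request = handle(incoming, ("192.0.2.1", 8080))
stub_action, stub_request = handle(
    incoming, ("192.0.2.1", 8080), {"method": "POST", "path": "/operation"}
)
```

Without a matching rule, the request leaves the proxy with "Host: 192.0.2.1:8080". With the rule, the original message is handed to the stub unchanged.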
In conclusion, for efficiency and ease of configuration, use the
standard HTTP proxy whenever possible. When you cannot, use the reverse
proxy. Use the TCP proxy when you work with TCP traffic that is not
HTTP.