Web Client
The Web Client processor sends requests to a resource endpoint and writes responses to records. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The Web Client processor requires that Data Collector use Java version 17. For more information, see ../../../reusable-content/datacollector/../../datacollector/UserGuide/Installation/JavaVersions-Features.html.
The Web Client processor provides much of the same functionality as the HTTP Client processor. It also provides functionality not available in the HTTP Client processor. For more information, see Comparing Web Client and HTTP Client Processors.
For each request, the processor writes data from the response to the specified output field. When the response contains multiple values, the processor can write either the first value, all values to a list in a single record, or all values to separate records.
You can use the Web Client processor to perform a range of standard requests or you can use an expression to determine the request for each record.
When you configure the Web Client processor, you define the request endpoint, optional headers, and method to use for the requests. You can also use a connection to configure the processor.
You configure the processor to generate one request for each record or to generate a single request containing all records in the batch.
You define the pagination mode, optional status response actions, and an optional response endpoint for responses.
You can configure the timeout, request transfer encoding, and authentication type for both requests and responses.
You can optionally use a proxy server and configure TLS properties. You can also configure the processor to use the OAuth 2 protocol to connect to an HTTP service.
Comparing Web Client and HTTP Client Processors
Data Collector provides two processors that send requests to HTTP endpoints and write data to records. The HTTP Client processor is the older of the two. The newer Web Client processor includes key functionality available in the older processor, as well as improvements and new features.
The following is a list of key differences between the two processors:
- The Web Client processor allows you to configure different data formats for request data and response data.
- The Web Client processor supports parallel HTTP requests.
- The Web Client processor allows you to configure per-timeout actions.
- The HTTP Client processor can be configured to use Universal authentication. Both processors can be configured to use Basic, Digest, OAuth 1, and OAuth 2 authentication.
HTTP Method
You can use the following methods with the Web Client processor:
- GET
- POST
- PUT
- PATCH
- DELETE
- HEAD
- Expression - An expression that evaluates to one of the other methods.
Expression Method
The Expression method allows you to write an expression that evaluates to a standard HTTP method. Use the Expression method to generate a workflow. For example, you can use an expression that passes data to the server using the PUT method based on the data in a field.
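The per-record decision that such an expression encodes can be illustrated in Python. This is only a sketch of the logic, not Data Collector expression syntax. The `operation` field name and the mapping from its values to methods are hypothetical.

```python
# Hypothetical illustration: pick an HTTP method per record from a field value.
# The field name "operation" and its values are assumptions for this sketch.
ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"}


def choose_method(record: dict) -> str:
    """Return the HTTP method to use for a record."""
    method = "PUT" if record.get("operation") == "update" else "POST"
    # The result must always be one of the standard methods.
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return method


print(choose_method({"id": 1, "operation": "update"}))
print(choose_method({"id": 2, "operation": "create"}))
```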
Headers
You can configure optional headers to include in the request made by the stage. Configure the headers in the following properties on the Request tab:
- Security Headers
- Common Headers
You can define headers in either property. However, only security headers support using credential functions to retrieve sensitive information from supported credential stores.
If you define the same header in both properties, security headers take precedence.
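The precedence rule can be pictured as a dictionary merge in which security headers are applied last. The following is a minimal sketch, not the processor's implementation. The header names and values are made up, and it assumes header names match exactly (real HTTP header names are case-insensitive).

```python
def effective_headers(common: dict, security: dict) -> dict:
    """Merge both header sets; security headers win on conflicts."""
    merged = dict(common)    # start with the common headers
    merged.update(security)  # security headers overwrite duplicates
    return merged


# Hypothetical header values for illustration only.
common = {"Accept": "application/json", "X-Api-Key": "placeholder"}
security = {"X-Api-Key": "value-from-credential-store"}

print(effective_headers(common, security))
```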
Grouping Style
The Web Client processor can generate one HTTP request for each record, or it can generate a single request containing all records in the batch.
Configure the processor to generate requests in one of the following ways:
- Multiple requests per batch
- If you set the Grouping Style property to One Request per Record, the processor generates one HTTP request for each record in the batch and sends multiple requests at a time. To preserve record order, the processor waits until all requests for the entire batch are completed before processing the next batch.
- Single request per batch
- If you set the Grouping Style property to One Request per Batch, the processor generates a single HTTP request containing all records in the batch.
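The difference between the two grouping styles can be sketched as follows. This is an illustration only. It assumes JSON request bodies and assumes that a batch request carries the records as a JSON array; the actual body depends on the request data format configured for the stage.

```python
import json


def build_request_bodies(batch: list, grouping_style: str) -> list:
    """Return one request body per request that would be sent for the batch."""
    if grouping_style == "One Request per Record":
        # One body, and therefore one request, for each record.
        return [json.dumps(record) for record in batch]
    if grouping_style == "One Request per Batch":
        # A single body containing every record in the batch.
        return [json.dumps(batch)]
    raise ValueError(f"unknown grouping style: {grouping_style}")


batch = [{"id": 1}, {"id": 2}, {"id": 3}]
print(len(build_request_bodies(batch, "One Request per Record")))
print(len(build_request_bodies(batch, "One Request per Batch")))
```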
Event Generation
The Web Client processor can generate events that you can use in an event stream. When you enable event generation, the processor generates event records each time the processor completes processing all available data.
You can use Web Client events in any logical way. For example:
- With the Email executor to send a custom email after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
Event Records
Event records generated by the Web Client processor include the following event-related record header attributes. Record header attributes are stored as String values:
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the following types: finished, start. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The processor can generate the following types of event records:
- finished
- The processor generates a finished event record when the processor finishes processing data from the endpoint.
- start
- The processor generates a start event record when the processor starts reading data from the instance.
Per-Status Actions
The Web Client processor accepts only responses that include a status code that has been configured to be read as successful by the stage. When the response includes any other status code, the processor generates an error and handles the record based on the error record handling configured for the stage.
You can configure the processor to perform one of several actions when it encounters an unsuccessful status code.
- Retry with constant backoff
- Retry with linear backoff
- Retry with exponential backoff
- Generate output record
- Generate error record
- Abort pipeline
When defining a retry action with constant, linear, or exponential backoff, you also specify the backoff interval to wait, in milliseconds. When defining any of the retry actions, you specify the maximum number of retries and a status failure action. If the stage receives a successful status code during a retry, then it processes the response. If the stage doesn't receive a successful status code after the maximum number of retries, then the stage performs the specified status failure action. You can only specify a status failure action for a retry action.
You can add multiple status codes and configure a specific action for each code.
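To make the three backoff styles concrete, the following sketch computes the wait before each retry using the conventional definitions of constant, linear, and exponential backoff. The exact formulas that the processor applies are not stated here, so treat these as assumptions.

```python
def backoff_delays(strategy: str, interval_ms: int, max_retries: int) -> list:
    """Return the wait in milliseconds before each retry attempt."""
    delays = []
    for attempt in range(1, max_retries + 1):
        if strategy == "constant":
            delay = interval_ms                         # same wait every time
        elif strategy == "linear":
            delay = interval_ms * attempt               # grows by one interval per retry
        elif strategy == "exponential":
            delay = interval_ms * 2 ** (attempt - 1)    # doubles on each retry
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        delays.append(delay)
    return delays


for strategy in ("constant", "linear", "exponential"):
    print(strategy, backoff_delays(strategy, 100, 4))
```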
Per-Timeout Actions
By default, the Web Client processor retries an operation five times before generating an error. You can configure the stage to use different timeout criteria and perform one of several actions when a specific type of timeout has reached its configured timeout limit.
- Retry with constant backoff
- Retry with linear backoff
- Retry with exponential backoff
- Generate output record
- Generate error record
- Abort pipeline
When defining the retry with a constant, linear, or exponential backoff action, you also specify the backoff interval to wait in milliseconds. When defining any of the retry actions, you specify the maximum number of retries and timeout failure action. If the stage receives a response during a retry, then it processes the response. If the stage doesn't receive a response after the maximum number of retries, then the stage performs the specified timeout failure action.
You can add multiple timeout types and specify timeout criteria and actions for each of them.
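The retry-then-fall-back behavior described above can be sketched as a small loop. This is an illustration under stated assumptions: `send` is a hypothetical callable that raises `TimeoutError` on a timeout, the initial attempt is not counted as a retry, and the backoff wait between attempts is omitted for brevity.

```python
def call_with_timeout_retries(send, max_retries: int, failure_action: str):
    """Try the request, retrying on timeout; fall back to the failure action."""
    for _ in range(max_retries + 1):  # initial attempt plus the retries
        try:
            return ("response", send())
        except TimeoutError:
            continue  # a real client would wait for the backoff interval here
    return ("failure_action", failure_action)


class FlakyEndpoint:
    """Test double that times out a fixed number of times, then succeeds."""

    def __init__(self, timeouts_before_success: int):
        self.remaining = timeouts_before_success

    def __call__(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("simulated timeout")
        return {"status": 200}


print(call_with_timeout_retries(FlakyEndpoint(2), 5, "Generate error record"))
print(call_with_timeout_retries(FlakyEndpoint(10), 5, "Generate error record"))
```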
Pagination
The Web Client processor can use pagination to retrieve a large volume of data from a paginated API.
When configuring the Web Client processor to use pagination, use the pagination type supported by the API of the HTTP client. You will likely need to consult the documentation for the origin system API to determine the pagination type to use and the properties to set.
The Web Client processor supports the following common pagination types:
- Link in Header
- After processing the current page, the stage uses the link in the HTTP header to access the next page. The link in the header can be an absolute URL or a URL relative to the next page link base URL configured for the stage. For example, let's say you configure the following next page link base URL for the stage:
https://myapp.com/api/objects?page=1
- Link in Body
- After processing the current page, the stage uses the link in a field in the response body to access the next page. The link in the response field can be an absolute URL or a URL relative to the next page link base URL configured for the stage. For example, let's say you configure the following next page link base URL for the stage:
http://myapp.com/api/tickets.json?start_time=138301982
- Page
- The stage begins processing with the specified initial page, and then requests the following page. Use the ${startAt} variable in the resource URL as the value of the page number to request. You can optionally set a final page or offset for the stage to stop reading data.
- Offset
- The stage begins processing with the specified initial offset, and then requests the following offset. Use the ${startAt} variable in the resource URL as the value of the offset number to request.
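For the Link in Header and Link in Body types, a next-page link may be absolute or relative to the configured base URL. The sketch below shows how such a link could be resolved, assuming standard RFC 3986 relative-reference resolution as implemented by Python's `urllib.parse.urljoin`. Whether the stage resolves relative links in exactly this way is an assumption, and the link values are made up.

```python
from urllib.parse import urljoin

# Base URL taken from the Link in Header example above.
BASE_URL = "https://myapp.com/api/objects?page=1"


def resolve_next_page(base_url: str, link: str) -> str:
    """Resolve a next-page link that may be absolute or relative."""
    return urljoin(base_url, link)


# A relative link is resolved against the base URL.
print(resolve_next_page(BASE_URL, "/api/objects?page=2"))
# An absolute link is used as-is.
print(resolve_next_page(BASE_URL, "https://other.example.com/objects?page=2"))
```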
Page or Offset Number
When using page or offset pagination, the API of the HTTP client typically requires that you include a page or offset parameter at the end of the response endpoint URL. The parameter determines the next page or offset of data to request.
The name of the parameter used by the API varies. For example, it might be offset, page, start, or since. Consult the documentation for the origin system API to determine the name of the page or offset parameter.
The Web Client processor provides a ${startAt} variable that you can use in the URL as the value of the page or offset. For example, your resource URL might be any of the following:
http://webservice/object?limit=15&offset=${startAt}
https://myapp.com/product?limit=5&since=${startAt}
https://myotherapp.com/api/v1/products?page=${startAt}
When the pipeline starts, the Web Client stage uses the value of the Initial Page or Initial Offset property as the ${startAt} variable value. After the stage reads a page of results, the stage increments the ${startAt} variable by one if using page pagination, or by the number of records read from the page if using offset pagination.
Example
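Suppose you configure offset pagination with an initial offset of 0 and use the following resource URL:
https://myapp.com/product?limit=5&since=${startAt}
When the pipeline starts, the stage uses the initial offset as the ${startAt} variable value, so the URL is resolved to:
https://myapp.com/product?limit=5&since=0
The first page of results includes 5 items. After reading the first page, the stage increments the ${startAt} variable by 5, such that the next response endpoint is resolved to:
https://myapp.com/product?limit=5&since=5
The second page of results also includes 5 items, starting at the 5th item.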
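The increment rule can be expressed in a few lines. The sketch below is an illustration rather than the stage's implementation. The function names are hypothetical, and the variable is substituted with a plain string replacement.

```python
def resolve_url(template: str, start_at: int) -> str:
    """Substitute the current value for the ${startAt} variable."""
    return template.replace("${startAt}", str(start_at))


def next_start_at(start_at: int, mode: str, records_read: int) -> int:
    """Advance by one page, or by the number of records read for offsets."""
    if mode == "page":
        return start_at + 1
    if mode == "offset":
        return start_at + records_read
    raise ValueError(f"unknown pagination mode: {mode}")


template = "https://myapp.com/product?limit=5&since=${startAt}"
start_at = 0  # initial offset
print(resolve_url(template, start_at))
start_at = next_start_at(start_at, "offset", records_read=5)
print(resolve_url(template, start_at))
```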
OAuth 2 Authentication
The Web Client processor can use the OAuth 2 protocol to connect to an HTTP service that uses basic or digest authentication, OAuth 2 client credentials, OAuth 2 username and password, or OAuth 2 access token.
The OAuth 2 protocol authorizes third-party access to HTTP service resources without sharing credentials. The Web Client processor uses credentials to request an access token from the service. The service returns the token to the processor, and then the processor includes the token in a header in each request to the request endpoint.
The processor can use the following grant types:
- Client credentials grant
- The stage sends its own credentials (the client ID and client secret, or the basic or digest authentication credentials) to the HTTP service. For example, use the client credentials grant to process data from the Twitter API or from the Microsoft Azure Active Directory (Azure AD) API.
For more information about the client credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.4.
- Access token grant
- The stage sends an access token to an authorization service and obtains an access token for the HTTP service.
- Owner credentials grant
- The stage sends the credentials for the resource owner (the resource owner user name, password, client ID, and client secret) to the HTTP service. Alternatively, you can use this grant type to migrate existing clients using basic or digest authentication to OAuth 2 by converting the stored credentials to an access token.
For example, you can use this grant to process data from the Getty Images API. For more information about using OAuth 2 to connect to the Getty Images API, see http://developers.gettyimages.com/api/docs/v3/oauth2.html.
For more information about the resource owner password credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.3.
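As an illustration of the client credentials grant, the sketch below builds (but does not send) a token request in the shape described by RFC 6749 section 4.4, and then the header that would carry the returned token. The client ID, secret, and token values are placeholders. The `Bearer` header scheme follows RFC 6750 and is an assumption about the service. For simplicity, the credentials are not form-encoded before Base64 encoding.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id: str, client_secret: str):
    """Return (headers, body) for a client credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        # Client authentication using HTTP Basic.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials"})
    return headers, body


def bearer_header(access_token: str) -> dict:
    """Header to include in each request once a token is obtained."""
    return {"Authorization": f"Bearer {access_token}"}


headers, body = build_token_request("my-client", "my-secret")  # placeholders
print(body)
print(bearer_header("example-token"))
```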
Generated Output
For each request that returns a successful status code, the Web Client processor writes the response to the specified output field. The processor parses data in the response body into values based on the selected data format. You configure how the processor writes multiple values. The processor can write either the first value to a single record, all values to a list in a single record, or all values to separate records.
For HEAD responses, the response body contains no data. Therefore, the processor writes output only to record header attributes, leaving the output field empty.
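The three ways of writing multiple values can be sketched as follows. This is a simplified model, not the processor's code. Records are plain dictionaries, the mode names and the `response` output field name are hypothetical, and the response is assumed to contain at least one value.

```python
def write_values(record: dict, values: list, mode: str, output_field: str = "response") -> list:
    """Return the output records produced for one parsed response."""
    if mode == "first":
        # Only the first value, written to a single record.
        return [{**record, output_field: values[0]}]
    if mode == "list":
        # All values, written to a list in a single record.
        return [{**record, output_field: list(values)}]
    if mode == "separate":
        # One output record per value.
        return [{**record, output_field: value} for value in values]
    raise ValueError(f"unknown mode: {mode}")


record = {"id": 7}
values = [{"name": "a"}, {"name": "b"}]
for mode in ("first", "list", "separate"):
    print(mode, write_values(record, values, mode))
```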
Data Formats
The Web Client processor parses each server response based on the selected data format and writes the response to the specified output field in the selected format.
You configure how the processor writes parsed responses that contain multiple values. The processor can write either the first value to a single record, all values to a list in a single record, or all values to separate records.
Available data formats include:
- Avro
- Generates a record for every message. Includes a precision and scale field attribute for each Decimal field.
- Binary
- Generates a record with a single byte array field at the root of the record.
- Datagram
- Generates a record for every message. The processor can process collectd messages, NetFlow 5 and NetFlow 9 messages, and the following types of syslog messages:
- Delimited
- The processor parses each line in the response as a value, and either writes only the first delimited line to a single record, writes all delimited lines to a single record with each line written to a list item, or writes each delimited line to separate records.
- JSON
- The processor parses each object in the response into a value, and either writes only the first object to a single record, writes all objects to a list in a single record, or writes each object to separate records.
- Log
- Generates a record for every log line.
- Protobuf
- Generates a record for every protobuf message. By default, the processor assumes messages contain multiple protobuf messages.
- Text
- If you specify a custom delimiter, the processor parses the data into values based on the delimiter. Otherwise, the processor parses each line into a value. Then, the processor either writes only the first value to a single record, writes all values to a list in a single record, or writes each value to separate records.
- XML
- If you specify a delimiter element, the processor uses the delimiter element to parse the response into values. The processor either writes only the first delimited element to a single record, writes all delimited elements to a list in a single record, or writes each delimited element to separate records.
Configuring a Web Client Processor
Configure a Web Client processor to perform requests against a resource endpoint.