Web Client
The Web Client source reads data from an HTTP endpoint. For information about supported versions, see Supported systems and versions.
Data Collector provides several HTTP sources to address different needs. For a quick comparison chart to help you choose the right one, see Comparing HTTP sources.
When you configure the Web Client source, you define the request endpoint, optional headers, and method to use for the requests. You can also use a connection to configure the source.
You configure the source to generate one request for each record or to generate a single request containing all records in the batch.
You can configure the actions to take based on the response status and configure pagination properties to enable processing large volumes of data from paginated APIs.
You can configure the timeout, request transfer encoding, and authentication type for both requests and responses.
You can optionally use a proxy server and configure TLS properties. You can also configure the source to use the OAuth 2 protocol to connect to an HTTP service.
Ingestion mode
The Web Client source can use one of the following processing modes to read source data:
- Streaming
- The source maintains a connection and processes data as it becomes available. Use to process streaming data in real time.
- Polling
- The source polls the server at the specified interval for available data. Use to access data periodically, such as metrics and events at a REST endpoint. Note: After the polling interval passes, the source continues processing from where it stopped. For example, say that you configured the source to use the polling mode with an interval of two hours and to use page number pagination. After the source reads 25 pages of results, the 26th page returns no results and so the source stops reading. After the two-hour interval passes, the source polls the server again, reading the results starting with page 26.
- Batch
- The source processes all available data and then stops the flow. Use to process data as needed.
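The resume behavior described in the polling note can be sketched in a few lines. This is only an illustration of the idea, not the stage's implementation: `poll_once` and `fetch_page` are hypothetical names, and the sketch assumes page number pagination where an empty page means "no more data for now".

```python
from typing import Callable, Dict, List, Tuple


def poll_once(
    fetch_page: Callable[[int], List[Dict]], next_page: int
) -> Tuple[List[Dict], int]:
    """Read pages starting at next_page until an empty page is returned.

    Returns the records read during this poll and the page number to
    resume from on the next poll.
    """
    records: List[Dict] = []
    while True:
        page = fetch_page(next_page)
        if not page:
            # Empty page: stop reading, but keep this page number so the
            # next poll requests it again instead of starting over.
            return records, next_page
        records.extend(page)
        next_page += 1
```

With 25 pages of data available, the first call returns the records from pages 1 to 25 and a resume point of 26. A later call that passes 26 as `next_page` starts at page 26 rather than at page 1.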
HTTP method
- GET
- POST
- PUT
- PATCH
- DELETE
- HEAD
- Expression - An expression that evaluates to one of the other methods.
Expression method
The Expression method allows you to write an expression that evaluates to a standard HTTP method. Use the Expression method to generate a workflow. For example, you can use an expression that passes data to the server using the PUT method based on the data in a field.
Headers
- Security Headers
- Common Headers
You can define headers in either property. However, only security headers support using credential functions to retrieve sensitive information from supported
If you define the same header in both properties, security headers take precedence.
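The precedence rule can be pictured as a simple dictionary merge. The sketch below is an assumption about the effect, not the stage's code: `merge_headers` is a hypothetical helper, and it matches header names exactly as typed, without the case-insensitive comparison that HTTP itself allows.

```python
from typing import Dict


def merge_headers(common: Dict[str, str], security: Dict[str, str]) -> Dict[str, str]:
    """Combine both header properties; security headers win on conflicts."""
    merged = dict(common)    # start from the common headers
    merged.update(security)  # overwrite duplicates with the security values
    return merged
```

For example, if both properties define `Authorization`, the merged result carries the value from the security headers, while headers defined only in the common headers are kept unchanged.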
Grouping style
The Web Client source can generate one HTTP request for each record, or it can generate a single request containing all records in the batch.
Configure the source to generate requests in one of the following ways:
- Multiple requests per batch
- If you set the Grouping Style property to One Request per Record, the source generates one HTTP request for each record in the batch and sends multiple requests at a time. To preserve record order, the source waits until all requests for the entire batch are completed before processing the next batch.
- Single request per batch
- If you set the Grouping Style property to One Request per Batch, the source generates a single HTTP request containing all records in the batch.
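The difference between the two grouping styles can be shown with a small sketch. `build_request_bodies` is a hypothetical function, and serializing records as JSON is an assumption made only for the illustration; the actual body depends on the data format configured for the stage.

```python
import json
from typing import Dict, List


def build_request_bodies(batch: List[Dict], grouping_style: str) -> List[str]:
    """Return one request body per HTTP request that would be sent."""
    if grouping_style == "One Request per Record":
        # One request for each record in the batch.
        return [json.dumps(record) for record in batch]
    if grouping_style == "One Request per Batch":
        # A single request that contains every record in the batch.
        return [json.dumps(batch)]
    raise ValueError(f"Unknown grouping style: {grouping_style}")
```

A batch of three records yields three bodies with the first style and one body with the second.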
Event generation
The Web Client source can generate events that you can use in an event stream. With event generation enabled, the source generates event records each time the source completes processing all available data.
- With the Pipeline Finisher executor to stop the flow and transition the flow to a Finished state when the source completes processing available data.
For an example, see Stopping a flow after processing all available data.
- With a target to store event information.
For an example, see Preserving an audit trail of events.
Event records
| Record Header Attribute | Description |
|---|---|
| sdc.event.type | Event type. Uses one of the following types: finished, no-more-data, start. |
| sdc.event.version | Integer that indicates the version of the event record type. |
| sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
- finished
- The source generates a finished event record when the source finishes reading data from the endpoint.
- no-more-data
- The source generates a no-more-data event record when the source completes processing all data returned by all queries.
- start
- The source generates a start event record when the source starts reading data from the endpoint.
Per-status actions
The Web Client source accepts only responses that include a status code that has been configured to be read as successful by the stage. When the response includes any other status code, the source generates an error and handles the record based on the error record handling configured for the stage.
You can configure the source to perform one of several actions when it encounters an unsuccessful status code.
- Retry with constant backoff
- Retry with linear backoff
- Retry with exponential backoff
- Generate output record
- Generate error record
- Abort flow
When defining the retry with a constant, linear, or exponential backoff action, you also specify the backoff interval to wait in milliseconds. When defining any of the retry actions, you specify the maximum number of retries and a status failure action. If the stage receives a successful status code during a retry, then it processes the response. If the stage doesn't receive a successful status code after the maximum number of retries, then the stage performs the specified status failure action. You can only specify a status failure action for a retry action.
You can add multiple status codes and configure a specific action for each code.
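To compare the three backoff styles, the sketch below computes the wait before each retry from a base interval in milliseconds. The formulas are the common textbook definitions and are an assumption here; this page does not state how the stage calculates linear or exponential delays. `backoff_delay_ms` is a hypothetical name.

```python
def backoff_delay_ms(strategy: str, interval_ms: int, attempt: int) -> int:
    """Return the wait before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    if strategy == "constant":
        return interval_ms                        # same wait every time
    if strategy == "linear":
        return interval_ms * attempt              # grows by one interval per retry
    if strategy == "exponential":
        return interval_ms * 2 ** (attempt - 1)   # doubles on every retry
    raise ValueError(f"Unknown strategy: {strategy}")
```

Under these assumed formulas, a 100 ms interval gives waits of 100, 100, 100 for constant, 100, 200, 300 for linear, and 100, 200, 400 for exponential backoff over the first three retries.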
Per-timeout actions
By default, the Web Client source retries an operation five times before generating an error. You can configure the stage to use different timeout criteria and perform one of several actions when a specific type of timeout has reached its configured timeout limit.
- Retry with constant backoff
- Retry with linear backoff
- Retry with exponential backoff
- Generate output record
- Generate error record
- Abort flow
When defining the retry with a constant, linear, or exponential backoff action, you also specify the backoff interval to wait in milliseconds. When defining any of the retry actions, you specify the maximum number of retries and timeout failure action. If the stage receives a response during a retry, then it processes the response. If the stage doesn't receive a response after the maximum number of retries, then the stage performs the specified timeout failure action.
You can add multiple timeout types and specify timeout criteria and actions for each of them.
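The relationship between the maximum number of retries and the timeout failure action can be sketched as a loop. This is a simplified illustration under stated assumptions: `call_with_retries`, `send`, and `on_failure` are hypothetical names, a timeout is modeled as Python's built-in `TimeoutError`, and the wait between retries is left out.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T], max_retries: int, on_failure: Callable[[], T]
) -> T:
    """Try once, retry up to max_retries times, then run the failure action."""
    for _ in range(1 + max_retries):  # the first attempt plus the retries
        try:
            return send()             # a response arrived: process it
        except TimeoutError:
            continue                  # timed out: move on to the next attempt
    return on_failure()               # retries exhausted: timeout failure action
```

Here `on_failure` stands in for whichever timeout failure action is configured, such as generating an error record.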
Pagination
The Web Client source can use pagination to retrieve a large volume of data from a paginated API.
When configuring the Web Client source to use pagination, use the pagination type supported by the API of the HTTP client. You will likely need to consult the documentation for the source system API to determine the pagination type to use and the properties to set.
The Web Client source supports the following common pagination types:
- Link in Header
- After processing the current page, the stage uses the link in the HTTP header to access the next page. The link in the header can be an absolute URL or a URL relative to the next page link base URL configured for the stage. For example, let's say you configure the following next page link base URL for the stage:
https://myapp.com/api/objects?page=1
- Link in Body
- After processing the current page, the stage uses the link in a field in the response body to access the next page. The link in the response field can be an absolute URL or a URL relative to the next page link base URL configured for the stage. For example, let's say you configure the following next page link base URL for the stage:
http://myapp.com/api/tickets.json?start_time=138301982
- Page
- The stage begins processing with the specified initial page, and then requests the following page. Use the ${startAt} variable in the resource URL as the value of the page number to request. You can optionally set a final page or offset for the stage to stop reading data.
- Offset
- The stage begins processing with the specified initial offset, and then requests the following offset. Use the ${startAt} variable in the resource URL as the value of the offset number to request.
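For the two link-based pagination types, a next page link may be absolute or relative to the configured base URL. The sketch below shows one way such a link could be resolved, using the standard URL resolution rules in Python's `urllib.parse.urljoin`. Whether the stage applies exactly these rules is an assumption, and `resolve_next_link` and the sample link values are hypothetical.

```python
from urllib.parse import urljoin


def resolve_next_link(base_url: str, link: str) -> str:
    """Resolve a next page link against the next page link base URL.

    Absolute links are returned unchanged; relative links are combined
    with the scheme and host of the base URL.
    """
    return urljoin(base_url, link)


base = "https://myapp.com/api/objects?page=1"
print(resolve_next_link(base, "/api/objects?page=2"))
print(resolve_next_link(base, "https://other.example.com/objects?page=3"))
```

The first call resolves the relative link against the host of the base URL, and the second returns the absolute link as is.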
Page or offset number
When using page or offset pagination, the API of the HTTP client typically requires that you include a page or offset parameter at the end of the request endpoint URL. The parameter determines the next page or offset of data to request.
The name of the parameter used by the API varies. For example, it
might be offset, page, start, or
since. Consult the documentation for the source system API to
determine the name of the page or offset parameter.
The Web Client source provides a ${startAt} variable that you can
use in the URL as the value of the page or offset. For example, your resource URL might
be any of the following:
http://webservice/object?limit=15&offset=${startAt}
https://myapp.com/product?limit=5&since=${startAt}
https://myotherapp.com/api/v1/products?page=${startAt}
When the flow starts, the Web Client stage
uses the value of the Initial Page or Initial
Offset property as the ${startAt} variable
value. After the stage reads a page of results, the stage increments the
${startAt} variable by one if using page pagination, or by
the number of records read from the page if using offset pagination.
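Example
Say that you configure the source to use offset pagination with an initial offset of 0 and the following resource URL:
https://myapp.com/product?limit=5&since=${startAt}
When the flow starts, the stage resolves the resource URL to:
https://myapp.com/product?limit=5&since=0
The first page of results includes 5 items, so the stage increments the ${startAt} variable by 5, such that the next request resolves to:
https://myapp.com/product?limit=5&since=5
The second page of results also includes 5 items, starting at the 5th item.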
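The increment rule for the ${startAt} variable can be written out as a short sketch: add one for page pagination, or add the number of records read for offset pagination. The function names are hypothetical, and plain string substitution is an assumption about how the variable is resolved.

```python
def resolve_url(resource_url: str, start_at: int) -> str:
    """Substitute the current value for the ${startAt} variable."""
    return resource_url.replace("${startAt}", str(start_at))


def next_start_at(pagination: str, start_at: int, records_read: int) -> int:
    """Compute the ${startAt} value to use for the following request."""
    if pagination == "page":
        return start_at + 1             # move to the next page number
    if pagination == "offset":
        return start_at + records_read  # skip the records already read
    raise ValueError(f"Unknown pagination type: {pagination}")


url = "https://myapp.com/product?limit=5&since=${startAt}"
start = 0                               # Initial Offset
print(resolve_url(url, start))
start = next_start_at("offset", start, records_read=5)
print(resolve_url(url, start))
```

Run as written, the two printed URLs end in `since=0` and `since=5`.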
OAuth 2 authentication
- OAuth 2 client credentials
- OAuth 2 access token
- OAuth 2 owner credentials
The OAuth 2 protocol authorizes third-party access to HTTP service resources without sharing credentials. The Web Client source uses credentials to request an access token from the service. The service returns the token to the source, and then the source includes the token in a header in each request to the request endpoint.
- Client credentials grant
- The stage sends its own credentials - such as the client ID and client secret, or the basic or digest authentication credentials - to the HTTP service. For example, use the client credentials grant to process data from the Twitter API or from the Microsoft Azure Active Directory (Azure AD) API.
- Access token grant
- The stage sends an access token to an authorization service and obtains an access token for the HTTP service. You can specify the details to use in the access token request, such as token headers and claims, and a signing algorithm and key. When using Data Collector 7.2 and later, if you specify a signing key, it must be Base64-encoded.
- Owner credentials grant
- The stage sends the credentials for the resource owner - such as the resource owner user name, password, client ID, and client secret - to the HTTP service. Or, you can use this grant type to migrate existing clients using basic or digest authentication to OAuth 2 by converting the stored credentials to an access token.
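The token exchange for the client credentials grant can be sketched with the Python standard library. The sketch only builds the requests and does not send them. It assumes the form-encoded token request of the OAuth 2 specification and a Bearer `Authorization` header; the helper names and the token URL are hypothetical, and a real service may expect the credentials in a different place.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def authorized_headers(access_token: str, headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Add the access token to the headers of a request to the endpoint."""
    merged = dict(headers or {})
    merged["Authorization"] = f"Bearer {access_token}"
    return merged


# Hypothetical token URL and credentials, for illustration only.
token_request = build_token_request(
    "https://auth.example.com/oauth/token", "my-client-id", "my-client-secret"
)
```

After the service returns a token, each request to the request endpoint would carry the headers produced by `authorized_headers`.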
Generated records
The Web Client source generates records based on the responses it receives.
Data in the response body is parsed based on the selected data format. For HEAD responses, when the response body contains no data, the source creates an empty record. Information returned from the HEAD response appears in record header attributes. For all other methods, when the response body contains no data, no records are created.
In generated records, all standard response header fields, such as Content-Encoding and Content-Type, are written to corresponding record header attributes. Custom response header fields are also written to record header attributes. Record header attribute names match the original response header names.
When you configure the source to generate records for unsuccessful statuses that are not added as per-status actions, then the record might also include a field that contains the error response body.
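The rules for empty response bodies and header attributes can be summarized in a small sketch. The record layout used here, a dictionary with `header` and `value` keys, is a hypothetical stand-in for a real record, and `records_from_response` is not part of the product.

```python
from typing import Any, Dict, List


def records_from_response(
    method: str, response_headers: Dict[str, str], parsed_body: List[Any]
) -> List[Dict[str, Any]]:
    """Turn one HTTP response into zero or more records."""
    # Attribute names match the original response header names.
    attributes = dict(response_headers)
    if parsed_body:
        # One record per parsed item, each carrying the response headers.
        return [{"header": dict(attributes), "value": item} for item in parsed_body]
    if method == "HEAD":
        # HEAD with no body: an empty record that still carries the headers.
        return [{"header": attributes, "value": {}}]
    # Any other method with no body: no records.
    return []
```

A HEAD response with no body produces one empty record with its header attributes populated, while a GET response with no body produces nothing.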
Data formats
The Web Client source processes data differently based on the data format that you select.
The Web Client source processes data formats as follows:
- Avro
- Generates a record for every message. Includes a precision and scale field attribute for each Decimal field.
- Binary
- Generates a record with a single byte array field at the root of the record.
- Datagram
- Generates a record for every message. The source can process collectd messages, NetFlow 5 and NetFlow 9 messages, and the following types of syslog messages:
- Delimited
- Generates a record for each delimited line.
- JSON
- Generates a record for each JSON object. You can process JSON files that include multiple JSON objects or a single JSON array.
- Log
- Generates a record for every log line.
- Protobuf
- Generates a record for every protobuf message. By default, the source assumes messages contain multiple protobuf messages.
- Text
- Generates a record for each line of text.
- XML
- Generates records based on a user-defined delimiter element. Use an XML element directly under the root element or define a simplified XPath expression. If you do not define a delimiter element, the source treats the XML file as a single record.
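As an illustration of the JSON entry above, the sketch below turns a response body into records whether it holds several concatenated JSON objects or a single JSON array. It is not the stage's parser: `json_records` is a hypothetical helper, and treating each array element as its own record is an assumption.

```python
import json
from typing import Any, List


def json_records(text: str) -> List[Any]:
    """Return one record per JSON object found in the text."""
    decoder = json.JSONDecoder()
    records: List[Any] = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return records
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            records.extend(value)   # a JSON array: one record per element
        else:
            records.append(value)   # a standalone JSON object
```

Both `{"a": 1}` followed by `{"a": 2}` on separate lines and the array `[{"a": 1}, {"a": 2}]` produce the same two records.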
Configuring a Web Client source
About this task
Configure a Web Client source to read data from an HTTP endpoint.