HTTP Client
The HTTP Client processor sends requests to an HTTP resource URL and writes responses to records. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The newer Web Client processor provides much of the same functionality as the HTTP Client processor. It also provides functionality not available in the HTTP Client processor. For more information, see Comparing Web Client and HTTP Client Processors.
For each request, the processor writes data from the response to the specified output field. When the response contains multiple values, the processor can write either the first value, all values to a list in a single record, or all values to separate records.
You can use the HTTP Client processor to perform a range of standard requests or you can use an expression to determine the request for each record.
When you configure the HTTP Client, you define the resource URL, optional headers, and method to use. For some methods, you can specify the request body and default content type.
You can configure the actions to take based on the response status and configure pagination properties to enable processing large volumes of data from paginated APIs.
You can configure the processor to include response header fields in the record as a set of record header attributes or as a map in a record field. You can configure the processor to log request and response information, and you can write the resolved resource URL to the Data Collector log.
You can also configure the timeout, request transfer encoding, and authentication type. You can optionally use an HTTP proxy and configure SSL/TLS properties.
You can also configure the processor to use the OAuth 2 protocol to connect to an HTTP service.
HTTP Method
You can use the following methods with the HTTP Client processor:
- GET
- PUT
- POST
- DELETE
- HEAD
- PATCH
- Expression - An expression that evaluates to one of the other methods.
Expression Method
The Expression method allows you to write an expression that evaluates to a standard HTTP method. Use the Expression method to determine the method for each record at run time. For example, you can use an expression that performs a lookup (GET) or passes data to the server (PUT) based on the data in a field.
Headers
You can define request headers in the following properties:
- Headers
- Additional Security Headers
You can define headers in either property. However, only additional security headers support using credential functions to retrieve sensitive information from supported credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
If you define the same header in both properties, additional security headers take precedence.
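As a rough illustration of the precedence rule, the Python sketch below merges the two properties so that additional security headers are applied last. It is a hypothetical model, not the processor's code. It assumes header names match only when written identically, although HTTP treats header names as case-insensitive, and the token values are placeholders.

```python
def effective_headers(headers: dict, security_headers: dict) -> dict:
    """Combine the Headers and Additional Security Headers properties.

    In a dict merge, later entries win, so a header defined in both
    properties takes its value from the additional security headers.
    Names are compared exactly as written (a simplifying assumption).
    """
    return {**headers, **security_headers}


request_headers = effective_headers(
    {"Accept": "application/json", "Authorization": "Bearer placeholder"},
    {"Authorization": "Bearer token-from-credential-store"},  # hypothetical value
)
print(request_headers)
```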
Per-Status Actions
By default, the HTTP Client processor accepts only responses that include a 2xx success status code. When the response includes any other status code, such as a 4xx or 5xx status code, the processor generates an error and handles the record based on the error record handling configured for the stage.
You can configure the processor to perform one of the following actions when it encounters an unsuccessful status code, that is, any non-2xx status code:
- Retry with linear backoff
- Retry with exponential backoff
- Retry immediately
- Cause the stage to fail and stop the pipeline
- Generate errors
When defining the retry with linear or exponential backoff action, you also specify the backoff interval to wait in milliseconds. When defining any of the retry actions, you specify the maximum number of retries. If the stage receives a 2xx status code during a retry, then it processes the response. If the stage doesn't receive a 2xx status code after the maximum number of retries, then the stage enters its error handling routine.
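The sketch below models the retry actions in Python to show how the wait grows and when the stage gives up. The formulas are assumptions: linear backoff is taken to wait the backoff interval multiplied by the attempt number, and exponential backoff is taken to double the interval on each retry. The processor's exact schedule may differ. To keep the retry loop easy to test, send_with_retries omits the waits.

```python
from typing import Callable


def retry_delays(action: str, backoff_ms: int, max_retries: int) -> list:
    """Wait, in milliseconds, before each retry (assumed formulas)."""
    delays = []
    for attempt in range(1, max_retries + 1):
        if action == "linear":
            delays.append(backoff_ms * attempt)  # 1x, 2x, 3x, ...
        elif action == "exponential":
            delays.append(backoff_ms * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
        elif action == "immediate":
            delays.append(0)
        else:
            raise ValueError(f"unknown retry action: {action}")
    return delays


def send_with_retries(send: Callable[[], int], max_retries: int) -> tuple:
    """Call send() until it returns a 2xx status or the retries run out.

    Returns the last status and the number of retries used. A non-2xx
    final status means the stage would enter its error handling routine.
    """
    status = send()
    retries = 0
    while not 200 <= status < 300 and retries < max_retries:
        retries += 1
        status = send()
    return status, retries


print(retry_delays("linear", 1000, 4))       # [1000, 2000, 3000, 4000]
print(retry_delays("exponential", 1000, 4))  # [1000, 2000, 4000, 8000]
responses = iter([503, 503, 200])
print(send_with_retries(lambda: next(responses), max_retries=3))  # (200, 2)
```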
You can add multiple status codes and configure a specific action for each code.
You can also configure the stage to generate records for all unsuccessful statuses that are not added to the Per-Status Actions list. You then specify the output field name that stores the error response body for those records.
For example, say that when the stage receives a 400 Bad Request code, you want the pipeline to process the response body, which contains a description of the error. When configuring the stage, you do not add an action for the 400 status code because you don't need the stage to retry the request. Instead, you select the Records for Remaining Statuses property and use the default value, outErrorBody, as the name of the error response body field.
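The following minimal sketch shows how a response might be routed. It assumes the record is a flat Python dict and uses hypothetical action names. The order of checks follows the description above: a configured per-status action comes first, then the Records for Remaining Statuses behavior, then the default error handling.

```python
def route_response(status, body, record, per_status_actions,
                   records_for_remaining_statuses,
                   error_body_field="outErrorBody"):
    """Return (outcome, record) for one response (illustrative model)."""
    if 200 <= status < 300:
        return "process", record
    if status in per_status_actions:
        # An explicitly configured action, such as a retry, takes priority.
        return per_status_actions[status], record
    if records_for_remaining_statuses:
        # Keep the record and store the error response body in a field.
        out = dict(record)
        out[error_body_field] = body
        return "process", out
    # Default: generate an error and apply the stage's error record handling.
    return "error", record


actions = {503: "retry_exponential"}  # hypothetical action name
print(route_response(400, '{"error": "missing id"}', {"id": 7}, actions, True))
print(route_response(503, "", {"id": 7}, actions, True))
```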
Pass Records after Retry or Timeout Failures
You can configure the HTTP Client processor to pass input records downstream when you configure per-status actions or timeout options.
For example, you might configure the processor to pass a record downstream after all timeout retries fail. Or, you might configure a per-status action to pass a record downstream after all retries fail after receiving a 503 Service Unavailable code.
The processor can pass input records downstream in the following situations:
- When you configure the Action for Status property or the Action for Timeout property to one of the retry options. When you enable the Pass Records property, the processor passes the input record downstream after it reaches the configured maximum number of retries.
- When you configure the Action for Status property or the Action for Timeout property to Generate Errors. When you enable the Pass Records property, the processor generates a stage error instead of an error record, and passes the input record downstream.
Record Header Attributes for Passed Records
When you pass an input record downstream using one of the Pass Records properties, the processor adds the following record header attributes to the record:
Record Header Attribute | Description |
---|---|
httpClientError | Error message generated by the processor. |
httpClientStatus | Error status that triggers the record to be passed. This is the first HTTP status code that exceeds the defined number of retries. |
httpClientLastAction | Per-status action or timeout action that the processor is configured to perform. |
httpClientTimeoutType | Timeout type causing the record to be passed. |
httpClientRetries | Number of retries that were attempted for the error status in httpClientStatus. |
Pagination
The HTTP Client processor can use pagination to retrieve a large volume of data from a paginated API.
When configuring the HTTP Client processor to use pagination, use the pagination type supported by the API of the HTTP service. You will likely need to consult the API documentation to determine the pagination type to use and the properties to set.
The HTTP Client processor supports the following common pagination types:
- Link in HTTP Header - After processing the current page, uses the link in the HTTP header to access the next page. The link in the header can be an absolute URL or a URL relative to the resource URL configured for the stage. For example, let's say you configure the following resource URL for the stage:
https://myapp.com/api/objects?page=1
- Link in Response Field - After processing the current page, uses the link in a field in the response body to access the next page. The link in the response field can be an absolute URL or a URL relative to the resource URL configured for the stage. For example, let's say you configure the following resource URL for the stage:
http://myapp.com/api/tickets.json?start_time=138301982
- By Page Number - Begins processing with the specified initial page, and then requests the following page. Use the ${startAt} variable in the resource URL as the value of the page number to request.
- By Offset Number - Begins processing with the specified initial offset, and then requests the following offset. Use the ${startAt} variable in the resource URL as the value of the offset number to request.
For the link in response field pagination type, you must define a stop condition that determines when there are no more pages to process. For all other pagination types, the stage stops reading when the API returns a page that contains no records.
When you use any pagination type, you must specify a result field path and can choose whether to include all other fields in the record.
Page or Offset Number
When using page number or offset number pagination, the API of the HTTP service typically requires that you include a page or offset parameter at the end of the resource URL. The parameter determines the next page or offset of data to request.
The name of the parameter used by the API varies. For example, it might be offset, page, start, or since. Consult the API documentation to determine the name of the page or offset parameter.
The HTTP Client processor provides a ${startAt} variable that you can use in the URL as the value of the page or offset. For example, your resource URL might be any of the following:
http://webservice/object?limit=15&offset=${startAt}
https://myapp.com/product?limit=5&since=${startAt}
https://myotherapp.com/api/v1/products?page=${startAt}
When the pipeline starts, the HTTP Client stage uses the value of the Initial Page/Offset property as the ${startAt} variable value. After the stage reads a page of results, the stage increments the ${startAt} variable by one if using page number pagination or by the number of records read from the page if using offset number pagination.
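For example, say you use offset number pagination with the following resource URL:
https://myapp.com/product?limit=5&since=${startAt}
You set the Initial Page/Offset property to 0, so the stage resolves the first resource URL to:
https://myapp.com/product?limit=5&since=0
The first page of results includes 5 items. After reading the page, the stage increments the ${startAt} variable by 5, such that the next resource URL is resolved to:
https://myapp.com/product?limit=5&since=5
The second page of results also includes 5 items, starting at offset 5.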
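The following Python sketch replays the offset example to show how ${startAt} advances. The function names and page sizes are hypothetical. Only the increment rules (one per page for page numbers, and the number of records read for offsets) come from the description above.

```python
def resolve_url(template: str, start_at: int) -> str:
    """Substitute the current ${startAt} value into the resource URL."""
    return template.replace("${startAt}", str(start_at))


def next_start_at(start_at: int, mode: str, records_read: int) -> int:
    """Advance ${startAt} after a page is read."""
    if mode == "page":
        return start_at + 1              # page number pagination
    if mode == "offset":
        return start_at + records_read   # offset number pagination
    raise ValueError(f"unknown pagination mode: {mode}")


template = "https://myapp.com/product?limit=5&since=${startAt}"
start_at = 0                 # Initial Page/Offset property
page_sizes = [5, 5, 2]       # hypothetical number of records on each page
urls = []
for size in page_sizes:
    urls.append(resolve_url(template, start_at))
    start_at = next_start_at(start_at, "offset", size)
print("\n".join(urls))
```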
Result Field Path
When using any pagination type, you must specify the result field path. The result field path is the location in the response that contains the data that you want to process.
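The result field path must be a list or array. For example, say the response contains the following data, and you specify /results as the result field path: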
{
"count":"1023",
"startAt":"2",
"maxResults":"2",
"total":"6",
"results":[
{
"firstName":"Joe",
"lastName":"Smith",
"phone":"555-555-5555"
},
{
"firstName":"Jimmy",
"lastName":"Smott",
"phone":"333-333-3333"
},
{
"firstName":"Joanne",
"lastName":"Smythe",
"phone":"777-777-7777"
}
]
}
The processor creates records from the result field path based on how you configure the Multiple Values Behavior property on the HTTP tab:
- First value only - When configured to write the first value only, the processor creates the following single record from this sample data:
{ "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" }
- All values as a list - When configured to write all values to a list in a single record, the processor creates the following single record from this sample data:
[ { "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" }, { "firstName":"Jimmy", "lastName":"Smott", "phone":"333-333-3333" }, { "firstName":"Joanne", "lastName":"Smythe", "phone":"777-777-7777" } ]
- Split into multiple records - When configured to write all values, each to a separate record, the processor creates three records from this sample data:
{ "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" }
{ "firstName":"Jimmy", "lastName":"Smott", "phone":"333-333-3333" }
{ "firstName":"Joanne", "lastName":"Smythe", "phone":"777-777-7777" }
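The three Multiple Values Behavior options can be sketched in Python against the same sample data. This is an illustrative model, not the processor's implementation. It assumes a single-level result field path such as /results and returns each record body as a list element.

```python
SAMPLE = {
    "count": "1023",
    "startAt": "2",
    "maxResults": "2",
    "total": "6",
    "results": [
        {"firstName": "Joe", "lastName": "Smith", "phone": "555-555-5555"},
        {"firstName": "Jimmy", "lastName": "Smott", "phone": "333-333-3333"},
        {"firstName": "Joanne", "lastName": "Smythe", "phone": "777-777-7777"},
    ],
}


def build_records(response: dict, result_field_path: str, behavior: str) -> list:
    """Return the list of records created from one response."""
    key = result_field_path.strip("/")  # assumes a single-level path
    values = response[key]
    if not isinstance(values, list):
        raise ValueError(f"{result_field_path} must be a list")
    if behavior == "first":
        return values[:1]            # one record: the first value
    if behavior == "list":
        return [values]              # one record: all values as a list
    if behavior == "split":
        return list(values)          # one record per value
    raise ValueError(f"unknown behavior: {behavior}")


for behavior in ("first", "list", "split"):
    print(behavior, len(build_records(SAMPLE, "/results", behavior)))
```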
Keep All Fields
When using any pagination type, you can configure the processor to keep all fields in addition to those in the specified result field path. The resulting record includes all fields in the original structure and the result field path that includes the data.
By default, the processor returns only the data within the specified result field path.
For example, say you use the same sample data as above, with /results as the result field path, and you configure the processor to keep all fields. The processor creates records from the result field path based on how you configure the Multiple Values Behavior property on the HTTP tab:
- First value only - When configured to write the first value only, the processor creates the following single record that keeps the existing record structure and the first set of data in the /results field:
{ "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" } }
- All values as a list - When configured to write all values to a list in a single record, the processor creates the following single record that keeps the existing record structure with each set of data in the /results field:
[ { "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" } }, { "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Jimmy", "lastName":"Smott", "phone":"333-333-3333" } }, { "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Joanne", "lastName":"Smythe", "phone":"777-777-7777" } } ]
- Split into multiple records - When configured to write all values, each to a separate record, the processor generates three records that keep the existing record structure, each with one set of data in the /results field:
{ "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Joe", "lastName":"Smith", "phone":"555-555-5555" } }
{ "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Jimmy", "lastName":"Smott", "phone":"333-333-3333" } }
{ "count":"1023", "startAt":"2", "maxResults":"2", "total":"6", "results":{ "firstName":"Joanne", "lastName":"Smythe", "phone":"777-777-7777" } }
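This hypothetical Python sketch shows how keeping all fields can produce the records above. Each record is a copy of the top-level response in which the result field holds a single value. It assumes a single-level result field path and is not the processor's implementation.

```python
SAMPLE = {
    "count": "1023",
    "startAt": "2",
    "maxResults": "2",
    "total": "6",
    "results": [
        {"firstName": "Joe", "lastName": "Smith", "phone": "555-555-5555"},
        {"firstName": "Jimmy", "lastName": "Smott", "phone": "333-333-3333"},
        {"firstName": "Joanne", "lastName": "Smythe", "phone": "777-777-7777"},
    ],
}


def build_records_keep_all(response, result_field_path, behavior):
    """Return records that keep every field of the response."""
    key = result_field_path.strip("/")  # assumes a single-level path
    values = response[key]

    def with_value(value):
        record = dict(response)  # shallow copy keeps count, startAt, and so on
        record[key] = value      # the result field now holds one value
        return record

    if behavior == "first":
        return [with_value(values[0])] if values else []
    if behavior == "list":
        return [[with_value(v) for v in values]]  # one record holding a list
    if behavior == "split":
        return [with_value(v) for v in values]    # one record per value
    raise ValueError(f"unknown behavior: {behavior}")


for record in build_records_keep_all(SAMPLE, "/results", "split"):
    print(record)
```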
Pagination Examples
Let's look at some examples of how you might configure the supported pagination types.
Example for Link in HTTP Header
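The API of the HTTP service includes links to the next and last pages of results in the HTTP header, as follows:
link:<https://myapp.com/api/objects?page=2>; rel="next",
<https://myapp.com/api/objects?page=9>; rel="last"
So after the HTTP Client processor reads the first page of results, it can use the next link in the HTTP header to read the next page.
You configure the following resource URL for the stage:
https://myapp.com/api/objects?page=1
Then, you set the Multiple Values Behavior property to write all values to a list in a single record or all values to separate records.
Each page of results that the API returns contains data like the following: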
{
"total":"2000",
"limit":"10",
"results":[
{
"firstName":"Joe",
"lastName":"Smith"
},
...
{
"firstName":"Joanne",
"lastName":"Smythe"
}
]
}
On the Pagination tab of the stage, you simply set Pagination Mode to link in HTTP header, and then you set the result field path to the /results field.
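The following simplified Python parser shows how a stage can find the next page in a Link header like the one above. It is a sketch, not a complete RFC 8288 parser. It assumes each entry has the form <URL>; rel="..." with rel as the first parameter. It also resolves relative links against the resource URL with the standard library function urllib.parse.urljoin.

```python
import re
from urllib.parse import urljoin

# Matches each <target>; rel="name" entry in a Link header value.
LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def next_page_url(link_header, resource_url):
    """Return the absolute URL of the rel="next" link, or None when absent."""
    for target, rel in LINK_ENTRY.findall(link_header):
        if "next" in rel.split():
            # urljoin returns absolute targets unchanged and resolves
            # relative targets against the resource URL.
            return urljoin(resource_url, target)
    return None


header = ('<https://myapp.com/api/objects?page=2>; rel="next", '
          '<https://myapp.com/api/objects?page=9>; rel="last"')
print(next_page_url(header, "https://myapp.com/api/objects?page=1"))
```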

Example for Link in Response Field
The API of the HTTP service uses a field in the response body to access the next page. It requires that you include a timestamp in the resource URL indicating which items you want to start reading. You configure the following resource URL for the stage:
http://myapp.com/api/tickets.json?start_time=138301982
Then, you set the Multiple Values Behavior property to write all values to a list in a single record or all values to separate records.
Each page of results that the API returns contains data like the following:
{
"ticket_events":[
{
"ticket_id":27,
"timestamp":138561439,
"via":"Email"
},
...
{
"ticket_id":30,
"timestamp":138561445,
"via":"Phone"
}
],
"next_page":"http://myapp.com/api/tickets.json?start_time=1389078385",
"count":1000,
"end_time":1389078385
}
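On the Pagination tab of the stage, you set Pagination Mode to link in response field, and set the next page link field to the /next_page field.
You also define the following stop condition, which ends pagination when the count field in the response is less than 1000:
${record:value('/count') < 1000}
Then you set the result field path to the /ticket_events field.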
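The Python sketch below models the link-in-response-field loop, including the stop condition. The pages are hypothetical data keyed by URL, and the count values are illustrative. The sketch assumes that the records on the page that satisfies the stop condition are still processed before pagination ends.

```python
def paginate(fetch, first_url, next_page_field, result_field, stop):
    """Yield result values page by page until the stop condition is true.

    Assumptions: records on the page that satisfies the stop condition
    are still processed, and a missing next page link also ends pagination.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page[result_field]
        if stop(page):
            break
        url = page.get(next_page_field)


# Hypothetical pages keyed by URL; the count values are illustrative.
PAGES = {
    "http://myapp.com/api/tickets.json?start_time=138301982": {
        "ticket_events": [{"ticket_id": 27}, {"ticket_id": 30}],
        "next_page": "http://myapp.com/api/tickets.json?start_time=1389078385",
        "count": 1000,
    },
    "http://myapp.com/api/tickets.json?start_time=1389078385": {
        "ticket_events": [{"ticket_id": 31}],
        "next_page": "http://myapp.com/api/tickets.json?start_time=1389078390",
        "count": 1,
    },
}

events = list(paginate(
    fetch=PAGES.__getitem__,
    first_url="http://myapp.com/api/tickets.json?start_time=138301982",
    next_page_field="next_page",
    result_field="ticket_events",
    stop=lambda page: page["count"] < 1000,  # mirrors the stop condition above
))
print([e["ticket_id"] for e in events])
```

The third URL is never requested, because the second page satisfies the stop condition.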
Example for Page Number
The API of the HTTP service uses page number pagination. It requires that you include a page parameter in the URL that specifies the page number to return from the results.
You configure the following resource URL for the stage, using the ${startAt} variable as the page number:
https://myotherapp.com/api/v1/products?page=${startAt}
Then, you set the Multiple Values Behavior property to write all values to a list in a single record or all values to separate records.
Each page of results that the API returns contains data like the following:
{
"total":"2000",
"items":[
{
"item":"pencil",
"cost":"2.00"
},
...
{
"item":"eraser",
"cost":"1.10"
}
]
}
On the Pagination tab of the stage, you set Pagination Mode to by page number. You want to begin processing from the first page in the results, so you set the initial page to 0.
Then you set the result field path to the /items field.
Example for Offset Number
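The API of the HTTP service uses offset number pagination. It requires that you include the following parameters in the URL:
- limit - Specifies the number of results per page.
- offset - Specifies the offset value.
You configure the following resource URL for the stage, using the ${startAt} variable as the offset value:
https://myapp.com/product?limit=10&offset=${startAt}
Then, you set the Multiple Values Behavior property to write all values to a list in a single record or all values to separate records.
Each page of results that the API returns contains data like the following: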
{
"total":"2000",
"limit":"10",
"results":[
{
"firstName":"Joe",
"lastName":"Smith"
},
...
{
"firstName":"Joanne",
"lastName":"Smythe"
}
]
}
On the Pagination tab of the stage, you set Pagination Mode to by offset number. You want to begin processing from the first item in the results, so you set the initial offset to 0.
Then you set the result field path to the /results field.
OAuth 2 Authorization
You can configure the HTTP Client processor to use the OAuth 2 protocol to connect to an HTTP service that uses basic, digest, or universal authentication, OAuth 2 client credentials, OAuth 2 username and password, or OAuth 2 JSON Web Tokens (JWT).
The OAuth 2 protocol authorizes third-party access to HTTP service resources without sharing credentials. The HTTP Client processor uses credentials to request an access token from the service. The service returns the token to the processor, and then the processor includes the token in a header in each request to the resource URL.
You can use the following OAuth 2 grant types with the HTTP Client processor:
- Client credentials grant - HTTP Client sends its own credentials (the client ID and client secret, or the basic, digest, or universal authentication credentials) to the HTTP service. For example, use the client credentials grant to process data from the Twitter API or from the Microsoft Azure Active Directory (Azure AD) API.
  For more information about the client credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.4.
- Resource owner password credentials grant - HTTP Client sends the credentials for the resource owner (the resource owner username and password) to the HTTP service. Or, you can use this grant type to migrate existing clients using basic, digest, or universal authentication to OAuth 2 by converting the stored credentials to an access token.
  For example, use this grant to process data from the Getty Images API. For more information about using OAuth 2 to connect to the Getty Images API, see http://developers.gettyimages.com/api/docs/v3/oauth2.html.
  For more information about the resource owner password credentials grant, see https://tools.ietf.org/html/rfc6749#section-4.3.
- JSON Web Tokens - HTTP Client sends a JSON Web Token (JWT) to an authorization service and obtains an access token for the HTTP service. For example, use JSON Web Tokens to process data with the Google API.
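The Python sketch below builds the token endpoint request body for each grant type, following RFC 6749 (sections 4.4 and 4.3) and RFC 7523 (JWT bearer assertions). It only prepares the request. It does not show the processor's actual requests, extra parameters that a given service may require (such as a scope), or client authentication for the password grant. The Bearer header helper assumes the RFC 6750 convention for sending the access token.

```python
import base64
from urllib.parse import quote_plus, urlencode

JWT_BEARER = "urn:ietf:params:oauth:grant-type:jwt-bearer"  # RFC 7523


def token_request(grant, **creds):
    """Return (headers, form_body) for an OAuth 2 token endpoint request."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if grant == "client_credentials":          # RFC 6749, section 4.4
        # Client ID and secret go in HTTP Basic authentication, each
        # form-encoded first as RFC 6749 section 2.3.1 describes.
        pair = f"{quote_plus(creds['client_id'])}:{quote_plus(creds['client_secret'])}"
        headers["Authorization"] = "Basic " + base64.b64encode(pair.encode()).decode()
        body = {"grant_type": "client_credentials"}
    elif grant == "password":                  # RFC 6749, section 4.3
        body = {"grant_type": "password",
                "username": creds["username"],
                "password": creds["password"]}
    elif grant == "jwt":                       # RFC 7523 JWT bearer grant
        body = {"grant_type": JWT_BEARER, "assertion": creds["assertion"]}
    else:
        raise ValueError(f"unsupported grant: {grant}")
    return headers, urlencode(body)


def authorized_headers(access_token):
    """Header added to each request once a token is issued (RFC 6750 style)."""
    return {"Authorization": f"Bearer {access_token}"}


print(token_request("password", username="joe", password="s3cret")[1])
```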
Let’s look at some examples of how to configure authentication and OAuth 2 authorization to process data from Twitter, Microsoft Azure AD, and Google APIs.
Example for Twitter
To use OAuth 2 authorization to read from Twitter, configure HTTP Client to use basic authentication and the client credentials grant.
For more information about configuring OAuth 2 authorization for Twitter, see https://developer.twitter.com/en/docs/authentication/oauth-2-0/application-only.
Example for Microsoft Azure AD
To use OAuth 2 authorization to read from Microsoft Azure AD, configure HTTP Client to use no authentication and the client credentials grant.
For more information about configuring OAuth 2 authorization for Microsoft Azure AD, see https://docs.microsoft.com/en-us/azure/active-directory/develop/active-directory-protocols-oauth-code.
Example for Google
Configure the HTTP Client processor to use OAuth 2 authorization to read from Google service accounts. The stage sends a JSON Web Token in a request to the Google Authorization Server and obtains an access token for calls to the Google API.
Before you configure the stage, create a service account and delegate domain-wide authority to the service account. For details, see the Google Identity documentation: Using OAuth 2.0 for Server to Server Applications.
For more information about Google service accounts, see the Google Cloud documentation: Understanding service accounts.
For more information about configuring OAuth 2 authorization for Google, see the Google Identity documentation: Using OAuth 2.0 to Access Google APIs.
Generated Output
For each request that returns a 2xx success status code, the HTTP Client processor writes the response to the specified output field. The processor parses data in the response body into values based on the selected data format. You configure how the processor writes multiple values. The processor can write either the first value to a single record, all values to a list in a single record, or all values to separate records.
When you configure the processor to generate records for unsuccessful statuses that are not added to the Per-Status Actions list, then the HTTP Client processor might also write an error response body to the specified error response body field.
For HEAD responses, the response body contains no data. Therefore, the processor writes output only to record header attributes, leaving the output field empty.
Response Headers
The HTTP Client processor can include response headers in the record in the following ways:
- Record header attributes - The processor writes data in response headers to corresponding record header attributes.
- Record field - The processor can also write the response headers to a field in the record. The processor writes the response headers to the record field as a map of key-value pairs where the key is the response header name.
Logging Request and Response Data
The HTTP Client processor can log request and response data to the Data Collector log.
When enabling logging, you configure the following properties:
- Verbosity - The type of data to include in logged messages:
  - Headers_Only - Includes request and response headers.
  - Payload_Text - Includes request and response headers as well as any text payloads.
  - Payload_Any - Includes request and response headers and the payload, regardless of type.
- Log Level - The level of messages to include in the Data Collector log. When you select a level, higher-level messages are also logged. That is, if you select the Warning log level, then Severe and Warning messages are written to the Data Collector log.
- Max entity size - The maximum size of message data to write to the log. Use it to limit the volume of data written to the Data Collector log for any single message.
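The Log Level property works as a threshold. The minimal Python sketch below models that rule. It assumes the level names and ordering of java.util.logging, which the Severe and Warning levels above suggest, so the exact list of levels that the property offers may differ.

```python
# java.util.logging level names and their numeric values, highest first.
LEVELS = {
    "SEVERE": 1000,
    "WARNING": 900,
    "INFO": 800,
    "CONFIG": 700,
    "FINE": 500,
    "FINER": 400,
    "FINEST": 300,
}


def is_logged(message_level, configured_level):
    """A message is written when its level is at or above the configured level."""
    return LEVELS[message_level] >= LEVELS[configured_level]


def logged_levels(configured_level):
    """All levels that reach the log for a given configured level."""
    return [name for name in LEVELS if is_logged(name, configured_level)]


print(logged_levels("WARNING"))  # ['SEVERE', 'WARNING']
```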
Logging the Resolved Resource URL
You can write the resolved resource URL to the Data Collector log.
The resolved resource URL is the URL that is defined in the Resource URL property after resolving any expressions included in the URL.
For example, say you define the following resource URL, which includes an expression:
https://api.twitter.com/1.1/search/tweets.json?q=${record:value('/text')}
This expression builds the URL from the /text field of each record. So, when a record contains %23DataOps in the /text field, the resolved URL is:
https://api.twitter.com/1.1/search/tweets.json?q=%23DataOps
To write the resolved resource URL to the Data Collector log, set the Data Collector log level to DEBUG or a more verbose level. You do not need to use the Enable Request Logging property in the processor to log the resolved resource URL.
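The Python sketch below models resolution as plain text substitution. It shows why the field in the example holds %23DataOps rather than #DataOps: an unencoded # would start the URL fragment and leave the q parameter empty. This section does not state whether the processor encodes expression results, so the no-encoding behavior is an assumption.

```python
from urllib.parse import quote, urlsplit

TEMPLATE = "https://api.twitter.com/1.1/search/tweets.json?q=${record:value('/text')}"


def resolve(template, text_value):
    """Simulate resolving the expression by plain substitution (no encoding)."""
    return template.replace("${record:value('/text')}", text_value)


encoded = resolve(TEMPLATE, "%23DataOps")
raw = resolve(TEMPLATE, "#DataOps")

print(encoded)                     # the resolved URL from the example
print(urlsplit(raw).query)         # q= (an unencoded # starts the URL fragment)
print(quote("#DataOps", safe=""))  # %23DataOps
```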
Data Formats
The HTTP Client processor parses each server response based on the selected data format and writes the response to the specified output field in the selected format.
You configure how the processor writes parsed responses that contain multiple values. The processor can write either the first value to a single record, all values to a list in a single record, or all values to separate records.
The processor handles each data format as follows:
- Delimited - The processor parses each line in the response as a value, and either writes only the first delimited line to a single record, writes all delimited lines to a single record with each line written to a list item, or writes each delimited line to separate records.
- JSON - The processor parses each object in the response into a value, and either writes only the first object to a single record, writes all objects to a list in a single record, or writes each object to separate records.
- Text - If you specify a custom delimiter, the processor parses the data into values based on the delimiter. Otherwise, the processor parses each line into a value. Then, the processor either writes only the first value to a single record, writes all values to a list in a single record, or writes each value to separate records.
- XML - If you specify a delimiter element, the processor uses the delimiter element to parse the response into values. The processor either writes only the first delimited element to a single record, writes all delimited elements to a list in a single record, or writes each delimited element to separate records.
Configuring an HTTP Client Processor
Configure an HTTP Client processor to perform requests against a resource URL.