Azure Event Hubs
The Azure Event Hubs origin reads data from a single event hub in Microsoft Azure Event Hubs.
When you configure the origin, you specify the event hub to use and connection information for the event hub. You define a consumer group name and select the default offset to use: latest or earliest. When needed, you can define specific offsets for individual partitions. You can also configure the maximum batch size to use.
Before you use the Azure Event Hubs origin, complete the prerequisite tasks.
Prerequisites
Complete the following prerequisites, as needed, before you configure the Azure Event Hubs origin.
- Authorize access to the event hub using shared access signatures.
The Azure Event Hubs origin requires read access to the event hub. For information about assigning access to Azure Event Hubs resources, see the Azure documentation.
The origin does not support access through Active Directory at this time.
- Retrieve the Azure Event Hubs connection string.
When you configure the Azure Event Hubs origin, you must provide the namespace, shared access policy, and shared access key. These details are included in the Azure Event Hubs connection string, as follows:
Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<shared access policy>;SharedAccessKey=<shared access key>
For information about retrieving the connection string, see the Azure documentation.
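As a rough illustration of how the three values map onto the connection string format shown above, the following Python sketch splits a connection string into its parts. The function name, the returned keys, and the sample values are hypothetical and are not part of the origin or of any Azure SDK. The sketch assumes the string follows the format above exactly.

```python
from urllib.parse import urlparse


def parse_connection_string(conn_str):
    """Split an Event Hubs connection string into the values the origin asks for.

    Hypothetical helper for illustration only. Assumes the format:
    Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...
    """
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only, since keys are base64 and may end in "=".
        name, _, value = segment.partition("=")
        parts[name] = value

    # The namespace is the first label of the endpoint host name.
    host = urlparse(parts["Endpoint"]).hostname
    return {
        "namespace": host.split(".")[0],
        "shared_access_policy": parts["SharedAccessKeyName"],
        "shared_access_key": parts["SharedAccessKey"],
    }


# Made-up sample values, not real credentials.
details = parse_connection_string(
    "Endpoint=sb://myns.servicebus.windows.net/;"
    "SharedAccessKeyName=readpolicy;SharedAccessKey=abc123=="
)
```

With the sample string, `details` holds the namespace `myns`, the policy `readpolicy`, and the key `abc123==`, which are the three values to enter when configuring the origin.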
Default and Specific Offsets
The Azure Event Hubs origin can start reading from the earliest record in partitions or from a specified offset for partitions. It can also read only new data that arrives as the pipeline runs.
When you configure the origin, you select the default offset to use:
- Earliest - The origin reads all available data in the partition before processing incoming data.
- Latest - The origin reads only the data that arrives after you start the pipeline.
You can optionally define specific offsets to use for individual partitions. When you define a specific offset, you specify the partition ID and the sequence number where you want to start the read. Both partition IDs and sequence numbers typically start with 0. The default offset applies to any partition without a specific offset.
For example, say you have four partitions with IDs 0-3. You want to read the first partition from the 100th record in the partition. For all other partitions, you want only events that arrive after you start the pipeline. The following configuration achieves this behavior:
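The way the default and specific offsets combine in this example can be sketched in Python. This is only a model of the behavior described above, not the origin's implementation. The function and parameter names are hypothetical and do not correspond to the origin's property names. The sketch also assumes that sequence numbers in partition 0 start at 0 and that no records have expired, so the 100th record has sequence number 99.

```python
def resolve_start_positions(partition_ids, default_offset, specific_offsets=None):
    """Return the starting position for each partition.

    Hypothetical model: a partition listed in specific_offsets starts at the
    given sequence number; every other partition uses the default offset.
    """
    if default_offset not in ("earliest", "latest"):
        raise ValueError("default_offset must be 'earliest' or 'latest'")
    specific_offsets = specific_offsets or {}
    return {
        pid: specific_offsets.get(pid, default_offset)
        for pid in partition_ids
    }


# Four partitions with IDs 0-3: partition 0 starts at its 100th record
# (sequence number 99 under the stated assumption), the rest read only new data.
positions = resolve_start_positions(
    partition_ids=["0", "1", "2", "3"],
    default_offset="latest",
    specific_offsets={"0": 99},
)
```

In this sketch, `positions` maps partition `0` to sequence number 99 and partitions `1` through `3` to `latest`. If sequence numbers in a partition do not start at 0, adjust the number accordingly.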