UDP Multithreaded Source
The UDP Multithreaded Source source reads messages from one or more UDP ports. The source can create multiple worker threads to enable parallel processing in a multithreaded flow. For information about supported versions, see Supported systems and versions.
When processing NetFlow messages, the stage generates different records based on the NetFlow version. When processing NetFlow 9, the records are generated based on the NetFlow 9 configuration properties. For more information, see NetFlow data processing.
The source can also read binary or character-based raw data.
When you configure UDP Multithreaded Source, you specify the ports to use, the batch size, and the wait time. You also specify the number of worker threads to use in multithreaded processing, and you can specify the packet queue size. When epoll is available on the Data Collector machine, you can also specify the number of receiver threads to use to increase the throughput of packets to the flow.
You specify the data format for the data, then configure any related properties.
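The interaction between batch size and wait time can be illustrated with a minimal Python sketch. It is not Data Collector code: the function name `next_batch` and its parameters are hypothetical, and it assumes that a batch closes when it reaches the batch size or when the wait time elapses, whichever comes first.

```python
import queue
import time


def next_batch(packet_queue, max_batch_size, max_wait_seconds):
    """Collect packets until the batch is full or the wait time elapses.

    Hypothetical helper for illustration; not a Data Collector API.
    """
    batch = []
    deadline = time.monotonic() + max_wait_seconds
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait time elapsed: return a partial (possibly empty) batch
        try:
            batch.append(packet_queue.get(timeout=remaining))
        except queue.Empty:
            break  # nothing more arrived before the deadline
    return batch


packets = queue.Queue()
for i in range(5):
    packets.put(f"packet-{i}".encode())

# The first batch fills up immediately; the second is cut short by the wait time.
first = next_batch(packets, max_batch_size=3, max_wait_seconds=0.2)
second = next_batch(packets, max_batch_size=3, max_wait_seconds=0.2)
```

In this sketch, a larger batch size favors throughput when traffic is heavy, while the wait time bounds how long a partial batch is held back when traffic is light.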
When a flow stops, the source notes where it stops reading. When the flow starts again, the source continues processing from where it stopped by default. You can reset the offset to process all requested data.
Processing raw data
Use the Raw/Separated Data data format to enable the UDP Multithreaded Source source to generate records from binary or character-based raw data.
When processing raw data, the source can generate a record for each UDP packet that it receives. Or, if you specify a separator character, then the source can generate multiple records from each UDP packet.
When generating multiple records, you specify the multiple value behavior: one record with only the first value, one record with all values as a list, or multiple records with one record for each value.
You can optionally specify an output field to use for the data. When not specified, the source writes the raw data to the root field.
You might use the Raw/Separated Data data format to write raw data to a field that you later process using the Data Parser processor. This allows you to retain the raw data for another use.
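The separator and multiple value options described above can be sketched in Python as follows. This is an illustration only: `records_from_packet`, the mode names `"first"`, `"list"`, and `"split"`, and the representation of a record as a plain value or a dictionary are all assumptions made for the example, not the stage's actual property values or record model.

```python
def records_from_packet(payload, separator=None, multiple_values="split",
                        output_field=None):
    """Turn one UDP payload into a list of records (illustrative only)."""
    if separator is None:
        # No separator: one record per packet.
        values = [payload]
    else:
        parts = payload.split(separator)
        if multiple_values == "first":
            values = [parts[0]]      # one record with only the first value
        elif multiple_values == "list":
            values = [parts]         # one record with all values as a list
        elif multiple_values == "split":
            values = parts           # one record for each value
        else:
            raise ValueError(f"unknown mode: {multiple_values!r}")

    if output_field is None:
        return values                # data is written to the root of the record
    return [{output_field: value} for value in values]


payload = b"a|b|c"
print(records_from_packet(payload))                      # [b'a|b|c']
print(records_from_packet(payload, b"|", "first"))       # [b'a']
print(records_from_packet(payload, b"|", "list"))        # [[b'a', b'b', b'c']]
print(records_from_packet(payload, b"|", "split", "raw"))
# [{'raw': b'a'}, {'raw': b'b'}, {'raw': b'c'}]
```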
Receiver and worker threads
- Receiver threads
  Used to pass data from the operating system socket to the source's packet queue.
  By default, the source uses a single receiver thread.
  You can configure the source to use multiple receiver threads when Data Collector runs on a machine enabled for epoll. Epoll requires native libraries and is only available when Data Collector runs on recent versions of 64-bit Linux.
  When you enable multiple receiver threads, you increase the rate at which data can be passed to the source, at the cost of the additional overhead that thread management normally incurs.
  To use additional receiver threads, select the Use Native Transports (epoll) property, and then configure Number of Receiver Threads.
- Worker threads
  Used to perform multithreaded flow processing. By default, the source uses a single thread for flow processing. You can increase the number of threads to use to perform parallel processing of larger volumes of data. For more information, see Multithreaded flows.
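As a rough planning aid, the relationship between epoll and the receiver thread count can be expressed in a few lines of Python. Both function names are hypothetical. The epoll check only reports whether the Python runtime exposes epoll on the current machine, which approximates but does not replace Data Collector's own check for native libraries. The fallback to a single receiver thread is an assumption based on the default described above.

```python
import select
import sys


def epoll_available():
    """Report whether this Python runtime exposes epoll (Linux only)."""
    return sys.platform.startswith("linux") and hasattr(select, "epoll")


def effective_receiver_threads(use_native_transports, requested_threads, has_epoll):
    """Return the receiver thread count this sketch assumes will be used.

    Assumption: without native transports or without epoll, the source
    stays at its default of one receiver thread.
    """
    if use_native_transports and has_epoll:
        return max(1, requested_threads)
    return 1


print(effective_receiver_threads(True, 4, epoll_available()))
```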
Packet queue
The UDP Multithreaded Source source uses a packet queue to hold incoming data in memory until the data can be incorporated in a batch and passed through the flow. When the packet queue is full, incoming packets are dropped. The number of packets that are dropped is noted in stage metrics.
When you configure the source, you can specify the maximum number of packets to allow in the queue. The default is 200,000. Because the packet queue uses Data Collector heap memory, when increasing the size of the queue, you should consider increasing the Data Collector heap size as well.
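The drop-when-full behavior can be modeled with a bounded queue. The following Python sketch is not the source's implementation; the class `PacketQueue` and its attribute names are hypothetical and only mirror the behavior and counters described in this topic.

```python
import queue
import threading


class PacketQueue:
    """Bounded in-memory queue that drops packets when full (illustrative)."""

    def __init__(self, max_packets=200_000):
        self._queue = queue.Queue(maxsize=max_packets)
        self._lock = threading.Lock()
        self.dropped_packets = 0   # packets rejected because the queue was full
        self.queued_packets = 0    # packets accepted into the queue so far

    def offer(self, packet):
        """Try to enqueue without blocking; return False if the packet is dropped."""
        try:
            self._queue.put_nowait(packet)
        except queue.Full:
            with self._lock:
                self.dropped_packets += 1
            return False
        with self._lock:
            self.queued_packets += 1
        return True

    def size(self):
        return self._queue.qsize()


pq = PacketQueue(max_packets=3)
accepted = [pq.offer(bytes([i])) for i in range(5)]
print(accepted)             # [True, True, True, False, False]
print(pq.dropped_packets)   # 2
```

Because every queued packet is held in memory, the memory needed grows with both the maximum queue size and the typical packet size, which is why a larger queue may call for a larger heap.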
Multithreaded flows
The UDP Multithreaded Source source performs parallel processing and enables the creation of a multithreaded flow.
When you enable multithreaded processing, the UDP Multithreaded Source source uses multiple concurrent threads for flow processing based on the Number of Worker Threads property. When you start the flow, the source creates the number of threads specified in the property.
As packets arrive from the specified UDP ports, they enter the packet queue. There is a single instance of the packet queue per flow. All receiver threads (which can be more than one, when using epoll) place packets onto the queue. At the same time, each worker thread removes packets from the queue, parses them according to the specified data format, and processes the rest of the flow using a flow runner.
A flow runner is a sourceless flow instance - an instance of the flow that includes all of the processors, executors, and targets in the flow and handles all flow processing after the source. Each flow runner processes one batch at a time, just like a flow that runs on a single thread. When the flow of data slows, the flow runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time flow property to specify the interval or to opt out of empty batch generation.
Multithreaded flows preserve the order of records within each batch, just like a single-threaded flow. But since batches are processed by different flow runners, the order in which batches are written to targets is not guaranteed.
For example, say you enable multithreaded processing and set the Number of Worker Threads property to 5. When you start the flow, the source creates five threads, and Data Collector creates a matching number of flow runners. The source adds incoming data to the packet queue, creates batches of data from the queue and then passes the batches to the flow runners for processing.
Each flow runner performs the processing associated with the rest of the flow. After a batch is written to flow targets, the flow runner becomes available for another batch of data. Each batch is processed and written as quickly as possible, independent from other batches processed by other flow runners, so batches may be written in a different order from the order in which they were read.
At any given moment, the five flow runners can each process a batch, so this multithreaded flow processes up to five batches at a time. When incoming data slows, the flow runners sit idle, available for use as soon as the data flow increases.
For more information about multithreaded flows, see Multithreaded flow overview.
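The threading model described in this section can be approximated with a short Python sketch: receiver threads place packets on one shared queue, and each worker thread removes packets, groups them into a batch, and hands the batch to its own runner. Everything here is hypothetical (`run_flow`, `receiver`, `worker`, and the `STOP` marker), the queue is unbounded for simplicity, and "writing to a target" is reduced to appending to a list.

```python
import queue
import threading

STOP = object()  # marker that tells a worker to shut down (illustrative)


def receiver(packets, packet_queue):
    """Stand-in for a receiver thread: move packets onto the shared queue."""
    for packet in packets:
        packet_queue.put(packet)


def worker(runner_id, packet_queue, batch_size, written, lock):
    """Stand-in for a worker thread and its flow runner."""
    done = False
    while not done:
        batch = []
        item = packet_queue.get()              # block until something arrives
        while True:
            if item is STOP:
                done = True
                break
            batch.append(item)
            if len(batch) == batch_size:
                break
            try:
                item = packet_queue.get_nowait()
            except queue.Empty:
                break                          # queue drained: close the batch
        if batch:
            with lock:
                # Batches land in completion order, not read order.
                written.append((runner_id, batch))


def run_flow(receiver_inputs, worker_count, batch_size):
    packet_queue = queue.Queue()               # one queue per flow
    written, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker,
                                args=(i, packet_queue, batch_size, written, lock))
               for i in range(worker_count)]
    receivers = [threading.Thread(target=receiver, args=(inp, packet_queue))
                 for inp in receiver_inputs]
    for thread in workers + receivers:
        thread.start()
    for thread in receivers:
        thread.join()
    for _ in workers:
        packet_queue.put(STOP)                 # one stop marker per worker
    for thread in workers:
        thread.join()
    return written


inputs = [[f"r{r}-p{i}" for i in range(20)] for r in range(2)]
written = run_flow(inputs, worker_count=5, batch_size=4)
```

Running the sketch several times typically produces different batch boundaries and a different order of entries in `written`, while every packet still appears exactly once.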
Metrics for performance tuning
The UDP Multithreaded Source source provides packet queue metrics that you can use to tune flow performance.
- Dropped Packets - The number of packets that were dropped because the packet queue was full.
- Queue Size - The current size of the packet queue.
- Queued Packets - The total number of packets that have passed through the packet queue for processing.
These metrics can help you determine how to improve flow performance. For example, if you have a high volume of dropped packets and the queue size seems to be maxed out as you monitor the flow, you might increase the number of worker threads for the flow to allow for greater throughput. Or, if you have relatively high bursts of data volume and find packets getting dropped during those bursts, consider increasing the packet queue size to better accommodate them.
If the queue size is not maxed out, but the number of queued packets does not seem as high as you expect, you might be dropping packets on the operating system side. When epoll is available - that is, when Data Collector runs on recent versions of 64-bit Linux - increasing the number of receiver threads can increase the volume of packets that are passed to the source.
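The tuning guidance above can be summarized as a small decision helper. This Python sketch is illustrative only: the function `tuning_hints` and its parameters are hypothetical, and the two boolean inputs (`bursty` and `fewer_queued_than_expected`) stand for judgments that you make while monitoring the flow, not for values that the stage reports.

```python
def tuning_hints(dropped_packets, queue_size, max_queue_size,
                 bursty=False, fewer_queued_than_expected=False,
                 epoll_available=False):
    """Map observed packet queue metrics to the suggestions in this topic."""
    hints = []
    queue_full = queue_size >= max_queue_size

    if dropped_packets > 0 and queue_full:
        # The queue is saturated: the workers are not draining it fast enough.
        hints.append("increase the number of worker threads")
    if dropped_packets > 0 and bursty:
        # Drops happen during bursts: give the queue more headroom.
        hints.append("increase the packet queue size")
    if not queue_full and fewer_queued_than_expected:
        # Packets may be lost before they ever reach the queue.
        if epoll_available:
            hints.append("increase the number of receiver threads")
        else:
            hints.append("check for dropped packets on the operating system side")
    return hints


print(tuning_hints(dropped_packets=1200, queue_size=200_000,
                   max_queue_size=200_000))
# ['increase the number of worker threads']
```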
Configuring a UDP Multithreaded Source
About this task
Configure a UDP Multithreaded Source source to use multiple worker threads to process messages from one or more UDP ports.