Data in motion
Data passes through the flow in batches. This is how it works at a Data Collector level:
The source creates a batch as it reads data from the source system or as data arrives from the source system, noting the offset. The offset is the location where the source stops reading.
The source sends the batch when the batch is full or when the batch wait time limit elapses. The batch moves through the flow from processor to processor until it reaches flow targets.
Targets write the batch to target systems, and Data Collector commits the offset internally after receiving confirmation of the write from all target systems. After the offset commit, the source stage creates a new batch.
Note that this describes general flow behavior. Behavior can differ based on the specific flow configuration. For example, the Kafka Multitopic Consumer stores the offset in Kafka. For source systems that do not store data, such as HTTP Client, offsets are not stored because they are not relevant.
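The batch lifecycle described above can be sketched in a few lines of Python. This is a simplified illustration, not Data Collector code: `ListSource`, `ListTarget`, and `run_flow` are hypothetical names, the offset is modeled as a list index, and the batch wait time limit is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ListSource:
    """Hypothetical source that reads records from an in-memory list."""
    records: list
    batch_size: int = 3

    def read_batch(self, offset):
        # Read up to batch_size records starting at the committed offset.
        batch = self.records[offset:offset + self.batch_size]
        # The new offset is the location where the source stopped reading.
        return batch, offset + len(batch)


@dataclass
class ListTarget:
    """Hypothetical target that confirms each write by returning True."""
    written: list = field(default_factory=list)

    def write(self, batch):
        self.written.extend(batch)
        return True


def run_flow(source, processors, targets):
    """Move batches from source to targets, committing the offset per batch."""
    committed_offset = 0
    while True:
        batch, new_offset = source.read_batch(committed_offset)
        if not batch:
            break  # nothing left to read
        # The batch moves from processor to processor.
        for process in processors:
            batch = [process(record) for record in batch]
        # Every target writes the batch and reports back.
        confirmations = [target.write(batch) for target in targets]
        if not all(confirmations):
            break  # stop without committing; the batch is reread on restart
        # Commit only after all targets confirmed; then start a new batch.
        committed_offset = new_offset
    return committed_offset
```

For example, running `run_flow(ListSource(list(range(7))), [lambda r: r * 10], [ListTarget()])` reads three batches (3, 3, and 1 records) and returns a committed offset of 7.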
Single and multithreaded flows
In a standard single-threaded flow, the source creates a batch and passes it through the flow, creating a new batch only after processing the previous batch.
Some sources can generate multiple threads to enable parallel processing in multithreaded flows. In a multithreaded flow, you configure the source to create the number of threads or amount of concurrency that you want to use. Data Collector then creates a number of flow runners, based on the flow Max Runners property, to perform flow processing. Each thread connects to the source system, creates a batch of data, and passes the batch to an available flow runner.
Each flow runner processes one batch at a time, just like a flow that runs on a single thread. When the flow of data slows, the flow runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time flow property to specify the interval or to opt out of empty batch generation.
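As a rough illustration of how source threads hand batches to a fixed pool of flow runners, consider the following Python sketch. It is an analogy under stated assumptions, not the Data Collector implementation: `run_multithreaded`, the `partitions` input, and the integer runner IDs are hypothetical, and idle-time empty batches are not modeled.

```python
import queue
import threading


def run_multithreaded(partitions, max_runners, process):
    """Each source thread reads its own batches and borrows a runner per batch."""
    # Pool of idle flow runners, identified here by plain integers.
    runners = queue.Queue()
    for runner_id in range(max_runners):
        runners.put(runner_id)

    results = []
    results_lock = threading.Lock()

    def source_thread(batches):
        for batch in batches:
            runner_id = runners.get()  # block until a runner is available
            try:
                # The runner processes one batch at a time.
                output = [process(record) for record in batch]
                with results_lock:
                    results.append((runner_id, output))
            finally:
                runners.put(runner_id)  # return the runner to the pool

    threads = [
        threading.Thread(target=source_thread, args=(batches,))
        for batches in partitions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With three source threads and `max_runners=2`, at most two batches are processed at the same time; the third thread waits until a runner is returned to the pool. The order of the results is not deterministic.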
All general references to flows in this guide describe single-threaded flows, but this information generally applies to multithreaded flows. For more details specific to multithreaded flows, see Multithreaded flow overview.
Delivery guarantee
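Data Collector flows process all expected data. In effect, this is at-least-once delivery: no data is skipped, but some data can be written more than once after a failure.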
If a failure causes a flow to stop while processing a batch of data, when the flow restarts, it reprocesses the batch.
Data Collector commits the offset after receiving write confirmation from all target systems. If a failure occurs after Data Collector passes data to target systems but before receiving confirmation and committing the offset, up to one batch of data might be duplicated in target systems.
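The following Python sketch simulates the failure window described above, where a batch is written but its offset is never committed. It is a toy model: `run_until_failure` and its `fail_before_commit_at` parameter are hypothetical, and the target system is a plain list.

```python
def run_until_failure(records, batch_size, committed_offset, target,
                      fail_before_commit_at=None):
    """Process batches starting at committed_offset.

    If fail_before_commit_at equals the offset of a batch, the run stops
    after that batch is written but before its offset is committed.
    Returns the last committed offset.
    """
    offset = committed_offset
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        target.extend(batch)  # the target system now holds the batch
        if fail_before_commit_at == offset:
            # Simulated failure: the write happened, the commit did not.
            return offset
        offset += len(batch)  # commit the offset
    return offset


records = list("abcdefg")
target = []

# First run fails after writing the second batch (d, e, f).
committed = run_until_failure(records, 3, 0, target, fail_before_commit_at=3)

# Restart from the last committed offset: the second batch is reprocessed.
committed = run_until_failure(records, 3, committed, target)
```

After the restart, `target` contains every record, and the records `d`, `e`, and `f` appear twice: exactly one batch is duplicated, and nothing is lost.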
Data Collector data types
| Data Type | Description |
|---|---|
| BOOLEAN | True or False. Corresponds to the Boolean Java data type. |
| BYTE | 8-bit signed whole number. Ranges from -128 to 127. Corresponds to the Byte Java data type. |
| BYTE_ARRAY | Array of byte values. Corresponds to the Byte[] Java data type. |
| CHAR | Stores a single Unicode character. Corresponds to the Char Java data type. |
| DATE | Time object that includes the day, month, and four-digit year. Does not include time zone. |
| DATETIME | Time object that includes the day, month, four-digit year, hours, minutes, and seconds. Precise to the millisecond. Does not include time zone. |
| DECIMAL | Arbitrary-precision signed decimal numbers. Corresponds to the BigDecimal Java class. |
| DOUBLE | 64-bit double-precision IEEE 754 floating point. Corresponds to the Double Java data type. |
| FILE_REF | Internal type for whole file format. |
| FLOAT | 32-bit single-precision IEEE 754 floating point. Corresponds to the Float Java data type. |
| INTEGER | 32-bit signed whole number. Corresponds to the Int Java data type. |
| LIST | Nested type. Contains an enumerated list of values. Values can be any type, and each value in a list can be a different type. |
| LIST_MAP | Nested type. Contains a list of key-value pairs kept in a specific order. Keys are always strings. Values can be any type, and each value can be a different type. |
| LONG | 64-bit signed whole number. Ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Corresponds to the Long Java data type. |
| MAP | Nested type. Contains a list of key-value pairs. Keys are always strings. Values can be any type, and each value can be a different type. |
| SHORT | 16-bit signed whole number. Ranges from -32,768 to 32,767. Corresponds to the Short Java data type. |
| STRING | Text. Corresponds to the String Java class. |
| TIME | Time object that includes the hours, minutes, and seconds. Precise to the millisecond. Does not include the time zone. |
| ZONED_DATETIME | Time object that includes day, month, four-digit year, hours, minutes, seconds, and time zone. Precise to the nanosecond. Corresponds to the ZonedDateTime Java class. |
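
The ranges listed for the whole-number types follow directly from their bit widths, because the corresponding Java types are two's-complement signed integers. The short Python sketch below derives them; `BIT_WIDTHS`, `signed_range`, and `smallest_type_for` are hypothetical helpers for illustration and are not part of Data Collector.

```python
# Bit widths taken from the data type table.
BIT_WIDTHS = {"BYTE": 8, "SHORT": 16, "INTEGER": 32, "LONG": 64}


def signed_range(bits):
    """Return (min, max) for a two's-complement signed integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(value):
    """Return the narrowest whole-number type from the table that holds value."""
    for name, bits in BIT_WIDTHS.items():
        low, high = signed_range(bits)
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in LONG")
```

For example, `signed_range(16)` gives the SHORT range of -32,768 to 32,767, and `smallest_type_for(128)` returns `"SHORT"` because 128 is one past the largest BYTE value.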