Couchbase
The Couchbase source reads JSON documents from Couchbase Server and generates a record for each document in the bucket. Couchbase Server is a distributed NoSQL document-oriented database. The Couchbase source can process objects in parallel with multiple threads. For information about supported versions, see Supported systems and versions.
When you configure the Couchbase source, you enter connection information, such as the nodes and bucket to connect to, as well as timeout properties for the connection. Optionally, you can enable TLS for the connection. You also enter information to authenticate with Couchbase Server.
When a flow stops, the Couchbase source notes where it stops reading. When the flow starts again, the source continues processing from where it stopped by default. You can reset the offset to process all requested data.
The source can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow triggers overview.
Prerequisites
Connecting to a Couchbase Server bucket requires that the bucket have a primary index, which makes the bucket queryable.
Before you configure the Couchbase source to connect to a bucket, you must create a primary index for that bucket. For information on creating a primary index, see the Couchbase documentation.
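As a minimal sketch, a primary index is typically created with a N1QL statement of the form CREATE PRIMARY INDEX ON `bucket`. The helper below only builds that statement string. The function name and the bucket name travel-sample are illustrative assumptions, and how you run the statement (for example, from the query editor or an SDK) is left to the Couchbase documentation.

```python
def primary_index_statement(bucket: str) -> str:
    """Build a N1QL statement that creates a primary index on a bucket.

    Illustrative helper only; it is not part of the Couchbase source.
    """
    if not bucket or "`" in bucket:
        # Guard added for this sketch: avoid having to escape backticks.
        raise ValueError("bucket name must be non-empty and contain no backticks")
    # Backticks quote the identifier so that names with hyphens are accepted.
    return f"CREATE PRIMARY INDEX ON `{bucket}`;"


statement = primary_index_statement("travel-sample")
print(statement)
```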
Offset
When adding new documents to a bucket read by the Couchbase source, be sure to add them to the end of the configured read order.
The source uses Couchbase offset capabilities, which use a position in the bucket to determine where to start reading data. Adding new documents before the offset can cause the source to skip those documents and to read other documents again.
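The effect of a position-based offset can be illustrated with a simplified model. The sketch below treats the read order as a Python list and the offset as an index into it. This is an assumption made for illustration, not the source's actual implementation, and the function and variable names are hypothetical.

```python
def read_from(ordered_docs, offset):
    """Return the documents at or after the saved position and the new offset."""
    batch = ordered_docs[offset:]
    return batch, len(ordered_docs)


docs = ["a", "b", "d"]
first_read, offset = read_from(docs, 0)  # reads a, b, d and saves position 3

# Case 1: the new document is added at the end of the read order.
appended = ["a", "b", "d", "e"]
after_append, _ = read_from(appended, offset)  # only the new document is read

# Case 2: the new document lands before the saved offset.
inserted = ["a", "b", "c", "d"]
after_insert, _ = read_from(inserted, offset)  # "c" is skipped, "d" is read again

print(first_read, after_append, after_insert)
```

In the second case the inserted document shifts every later document by one position, so the saved offset no longer points just past the last document that was read.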
Event generation
The Couchbase source can generate events that you can use in an event stream. When you enable event generation, the source generates an event when it completes processing the data returned by the specified query.
Couchbase events can be used in any logical way. For example:
- With the Pipeline Finisher executor to stop the flow and transition the flow to a Finished state when the source completes processing available data.
  For an example, see Stopping a flow after processing all available data.
- With the Email executor to send a custom email after receiving an event.
  For an example, see Sending email during flow processing.
- With a target to store information about completed queries.
  For an example, see Preserving an audit trail of events.
For more information about dataflow triggers and the event framework, see Dataflow triggers overview.
Event record
| Record Header Attribute | Description |
|---|---|
| sdc.event.type | Event type. Uses one of the following types: no-more-data, no-more-bucket-data. |
| sdc.event.version | Integer that indicates the version of the event record type. |
| sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The Couchbase source can generate the following event records:
- no-more-data: The Couchbase source generates a no-more-data event record when the source completes processing all data returned by the queries for all buckets.
- no-more-bucket-data: The Couchbase source generates a no-more-bucket-data event record when the source completes processing all data returned by the queries for a single bucket.
  The no-more-bucket-data event record generated by the source has the sdc.event.type record header attribute set to no-more-bucket-data and does not include any additional fields.
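As a rough sketch of how downstream logic might inspect these header attributes, the example below models the event record headers as a plain dictionary. The helper names, the default version value of 1, and the use of milliseconds for the epoch timestamp are assumptions made for illustration. Only the attribute names and event types come from the descriptions above.

```python
import time

# Event types named in this section.
EVENT_TYPES = {"no-more-data", "no-more-bucket-data"}


def make_event_headers(event_type, version=1):
    """Build a dictionary that mimics the event record header attributes."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return {
        "sdc.event.type": event_type,
        "sdc.event.version": version,  # assumed value for the sketch
        # Assumption: epoch time in milliseconds; the unit is not stated here.
        "sdc.event.creation_timestamp": int(time.time() * 1000),
    }


def all_data_processed(headers):
    """True only for the event that signals all buckets are finished."""
    return headers.get("sdc.event.type") == "no-more-data"


bucket_done = make_event_headers("no-more-bucket-data")
everything_done = make_event_headers("no-more-data")
print(all_data_processed(bucket_done), all_data_processed(everything_done))
```

A check like all_data_processed reflects the distinction between the two events: a no-more-bucket-data event covers a single bucket, so only no-more-data indicates that every query has completed.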
Multithreaded processing
The Couchbase source uses multiple concurrent threads to process data based on the Number of Threads property.
As the flow runs, each thread connects to the source system, creates a batch of data, and passes the batch to an available flow runner. A flow runner is a sourceless flow instance - an instance of the flow that includes all of the processors, executors, and targets in the flow and handles all flow processing after the source.
Each flow runner processes one batch at a time, just like a flow that runs on a single thread. When the flow of data slows, the flow runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time flow property to specify the interval or to opt out of empty batch generation.
Multithreaded flows preserve the order of records within each batch, just like a single-threaded flow. However, because batches are processed by different flow runners, the order in which batches are written to targets is not guaranteed.
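The ordering behavior described above can be sketched with standard Python threads. This is a simplified model under stated assumptions, not the product's implementation: the names run_flow, source_thread, and flow_runner are hypothetical, and uppercasing a record stands in for the processing done after the source.

```python
import queue
import threading


def run_flow(batches, num_threads=2, num_runners=2):
    """Source threads hand batches to runners; return batches in written order."""
    handoff = queue.Queue()
    written = []
    lock = threading.Lock()

    def source_thread(my_batches):
        # Each source thread creates batches and passes them on.
        for batch in my_batches:
            handoff.put(batch)

    def flow_runner():
        # Each runner processes one batch at a time.
        while True:
            batch = handoff.get()
            if batch is None:  # sentinel: no more batches
                return
            processed = [record.upper() for record in batch]  # order kept in batch
            with lock:
                written.append(processed)

    runners = [threading.Thread(target=flow_runner) for _ in range(num_runners)]
    sources = [
        threading.Thread(target=source_thread, args=(batches[i::num_threads],))
        for i in range(num_threads)
    ]
    for thread in runners + sources:
        thread.start()
    for thread in sources:
        thread.join()
    for _ in runners:
        handoff.put(None)  # one sentinel per runner
    for thread in runners:
        thread.join()
    return written


result = run_flow([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]])
print(result)  # the order of the batches may differ from run to run
```

Each inner list always keeps its record order, while the position of a batch in the result depends on thread scheduling.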
For more information about multithreaded flows, see Multithreaded flow overview.
Configuring a Couchbase source
About this task
Configure a Couchbase source to read data from Couchbase Server.