Couchbase
The Couchbase origin reads JSON documents from Couchbase Server and generates a record for each document in the bucket. Couchbase Server is a distributed NoSQL document-oriented database. The Couchbase origin can process objects in parallel with multiple threads. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
When you configure the Couchbase origin, you enter connection information, such as the nodes and bucket to connect to, as well as timeout properties for the connection. Optionally, you can enable TLS for the connection. You also enter information to authenticate with Couchbase Server. You can also use a connection to configure the origin.
When a pipeline stops, the Couchbase origin notes where it stops reading. When the pipeline starts again, the origin continues processing from where it stopped by default. You can reset the origin to process all requested data.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Prerequisites
Connecting to a Couchbase Server bucket requires that the bucket have a primary index, which makes the bucket queryable.
Before you configure the Couchbase origin to connect to a bucket, you must create a primary index for that bucket. For information on creating a primary index, see the Couchbase documentation.
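As a rough illustration of the prerequisite, the sketch below builds the N1QL statement that creates a primary index on a bucket. The helper name `primary_index_statement` and the bucket name `travel-sample` are assumptions made for this example. Run the resulting statement with a Couchbase query tool of your choice, and treat the Couchbase documentation as the authority on index options.

```python
def primary_index_statement(bucket_name: str) -> str:
    """Return a N1QL statement that creates a primary index on a bucket.

    Hypothetical helper for illustration only; it is not part of
    Data Collector or of any Couchbase SDK.
    """
    # Backticks quote the bucket name so that names containing hyphens work.
    # Names that are empty or contain a backtick are rejected rather than escaped.
    if not bucket_name or "`" in bucket_name:
        raise ValueError("bucket name must be non-empty and contain no backticks")
    return f"CREATE PRIMARY INDEX ON `{bucket_name}`"


if __name__ == "__main__":
    # "travel-sample" is an assumed bucket name.
    print(primary_index_statement("travel-sample"))
```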
Offset
When adding new documents to a bucket read by the Couchbase origin, be sure to add them to the end of the configured read order.
The origin uses Couchbase offset capabilities, which use the position in a bucket to determine where to start reading data within the bucket. Adding new documents before the offset can cause the origin to skip those documents and to read other documents again.
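The effect of a position-based offset can be sketched with a simplified model. The code below is not the origin's implementation. It assumes that the read order is a plain list and that the offset is an index into that list, and the names `read_from_offset` and `simulate` are hypothetical.

```python
def read_from_offset(documents, offset, batch_size):
    """Read up to batch_size documents starting at a positional offset.

    Returns the batch and the new offset.
    """
    batch = documents[offset:offset + batch_size]
    return batch, offset + len(batch)


def simulate(insert_at_end):
    """Read two documents, add a new one, then read the rest."""
    docs = ["doc1", "doc2", "doc3", "doc4"]
    first, offset = read_from_offset(docs, 0, 2)
    if insert_at_end:
        docs.append("new")      # added at the end of the read order
    else:
        docs.insert(0, "new")   # added before the saved offset
    second, _ = read_from_offset(docs, offset, 10)
    return first + second


if __name__ == "__main__":
    print(simulate(insert_at_end=True))
    print(simulate(insert_at_end=False))
```

In this model, appending the new document lets every document be read exactly once. Inserting it before the saved offset shifts the existing documents by one position, so `doc2` is read twice and `new` is never read.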
Event Generation
The Couchbase origin can generate events that you can use in an event stream. When you enable event generation, the origin generates an event when it completes processing the data returned by the specified query.
Couchbase events can be used in any logical way. For example:
- With the Pipeline Finisher executor to stop the pipeline and transition the pipeline to a Finished state when the origin completes processing available data.
When you restart a pipeline stopped by the Pipeline Finisher executor, the origin continues processing from the last-saved offset unless you reset the origin.
For an example, see Stopping a Pipeline After Processing All Available Data.
- With the Email executor to send a custom email after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store information about completed queries.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Record
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Set to the type of the generated event record, such as no-more-data or no-more-bucket-data, as described below. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The Couchbase origin can generate the following event records:
- no-more-data: The Couchbase origin generates a no-more-data event record when the origin completes processing all data returned by the queries for all buckets.
- no-more-bucket-data: The Couchbase origin generates a no-more-bucket-data event record when the origin completes processing all data returned by the queries for a single bucket. The no-more-bucket-data event record has the sdc.event.type record header attribute set to no-more-bucket-data and does not include any additional fields.
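As a minimal sketch of how a downstream stage might act on these event records, the function below inspects the sdc.event.type header attribute and reports whether all data has been processed. It models the record header as a plain dictionary, which is an assumption made for illustration. The names `EVENT_TYPES` and `should_finish` are hypothetical and are not part of Data Collector.

```python
# Event types that this page says the Couchbase origin can generate.
EVENT_TYPES = {"no-more-data", "no-more-bucket-data"}


def should_finish(header: dict) -> bool:
    """Return True when the event signals that all buckets are done.

    `header` is a plain dict standing in for the record header attributes.
    """
    event_type = header.get("sdc.event.type")
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unexpected event type: {event_type!r}")
    # Only no-more-data covers every bucket; no-more-bucket-data covers one.
    return event_type == "no-more-data"


if __name__ == "__main__":
    print(should_finish({"sdc.event.type": "no-more-bucket-data"}))
    print(should_finish({"sdc.event.type": "no-more-data"}))
```

Under this sketch, only a no-more-data event would be routed to something like the Pipeline Finisher executor, while a no-more-bucket-data event would be passed over or sent to an audit destination.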
Multithreaded Processing
The Couchbase origin uses multiple concurrent threads to process data based on the Number of Threads property.
As the pipeline runs, each thread connects to the origin system, creates a batch of data, and passes the batch to an available pipeline runner. A pipeline runner is a sourceless pipeline instance - an instance of the pipeline that includes all of the processors, executors, and destinations in the pipeline and handles all pipeline processing after the origin.
Each pipeline runner processes one batch at a time, just like a pipeline that runs on a single thread. When the flow of data slows, the pipeline runners wait idly until they are needed, generating an empty batch at regular intervals. You can configure the Runner Idle Time pipeline property to specify the interval or to opt out of empty batch generation.
Multithreaded pipelines preserve the order of records within each batch, just like a single-threaded pipeline. However, because batches are processed by different pipeline runners, the order in which batches are written to destinations is not guaranteed.
For more information about multithreaded pipelines, see Multithreaded Pipeline Overview.
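The ordering behavior described above can be illustrated with a small thread-based model. It is a sketch only, not the Data Collector runtime: the name `run_pipeline` is hypothetical, each runner is a plain Python thread, and uppercasing a record stands in for pipeline processing.

```python
import queue
import threading


def run_pipeline(batches, num_runners):
    """Process batches with several runner threads and return them in completion order."""
    work = queue.Queue()
    for batch in batches:
        work.put(batch)

    results = []
    lock = threading.Lock()

    def runner():
        # Each runner handles one batch at a time until no work remains.
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return
            processed = [record.upper() for record in batch]  # stand-in for processing
            with lock:
                results.append(processed)  # write order depends on thread timing

    threads = [threading.Thread(target=runner) for _ in range(num_runners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]], num_runners=2)
    print(out)
```

Records inside each returned batch keep their original order, but the order of the batches themselves can vary from run to run.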
Configuring a Couchbase Origin
Configure a Couchbase origin to read data from Couchbase Server.