SQL Server Change Tracking
The SQL Server Change Tracking source processes data from Microsoft SQL Server change tracking tables. The source can process data from tables with simple numeric primary keys. The source cannot process data from tables with compound or non-numeric primary keys. For information about supported versions, see Supported systems and versions.
By default, the source generates a record with change tracking information and the latest version of each record from the data tables. You can configure it to use only the change tracking information.
Use the SQL Server Change Tracking source to generate records from change tracking tables. To read data from Microsoft SQL Server change data capture (CDC) tables, use the SQL Server CDC Client source. For more information about the differences between change tracking and CDC data, see the Microsoft documentation. To read data from SQL Server temporal tables, use the JDBC Multitable Consumer source or the JDBC Query Consumer source. For more information about temporal tables, see the Microsoft documentation.
The SQL Server Change Tracking source includes the CRUD operation type in a record header attribute so generated records can be easily processed by CRUD-enabled targets. For an overview of Data Collector changed data processing and a list of CRUD-enabled targets, see Processing changed data.
You might use this source to perform database replication. You can use a separate flow with the JDBC Query Consumer or JDBC Multitable Consumer source to read existing data. Then start a flow with the SQL Server Change Tracking source to process subsequent changes.
When you configure the source, you can define groups of change tracking tables in the same database and any initial offsets to use. When you omit initial offsets, the source processes only incoming data.
To determine how the source connects to the database, you specify connection information, a query interval, number of retries, and any custom JDBC configuration properties that you need. You can also use a connection to configure the source.
You specify whether you want to include the latest version of the data in generated records or whether to include only change tracking data. You also define the initial order that the source uses to read the tables.
The source can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow triggers overview.
When a flow stops, the SQL Server Change Tracking source notes where it stops reading. When the flow starts again, the source continues processing from where it stopped by default. You can reset the offset to process all requested data.
JDBC driver
When connecting to Microsoft SQL Server, you do not need to install a JDBC driver. Data Collector includes the JDBC driver required for SQL Server.
Prerequisites
Enable change tracking
Before you configure the SQL Server Change Tracking source, enable change tracking for the SQL Server database and for each table within that database that you want tracked.
For instructions, see the Microsoft SQL Server documentation.
Assign permissions
- VIEW CHANGE TRACKING permission on the database.
- When using the default record generation to join change tracking data with the current version
of the data, the user must have SELECT permission on at least the primary key column for each
associated data table.
If processing data only from the change tracking data, the user does not need this permission.
Table configuration
- Schema - The schema where the tables reside.
- Table name pattern - Use SQL LIKE syntax to define a set of tables to process. For example, the
table name pattern st% matches tables whose names start with "st". The default pattern, %, matches
all tables in the schema.
For more information about valid patterns for the SQL LIKE syntax, see the Microsoft documentation.
- Table exclusion pattern - When necessary, use a regex pattern to exclude certain tables that
match the table name pattern from being read.
For example, say you want to process all change tracking tables in the schema except for those that start with "dept". You can use the default % for the table name pattern, and enter dept* for the table exclusion pattern.
For more information about using regular expressions with Data Collector, see Regular expressions overview.
- Initial offset - The SQL Server Change Tracking source uses the
SYS_CHANGE_VERSION column as the offset column. To process existing data, define the offset value to
use. The offset is used for all tables included in the table configuration.
When not set, the source processes only incoming data.
Important: When processing an offset, the source starts with the first value greater than the specified offset.
Initial table order strategy
You can define the initial order that the source uses to read the tables.
- None
- Reads the tables in the order that they are listed in the database.
- Alphabetical
- Reads the tables in alphabetical order.
Generated record
- Change tracking and current data
-
By default, when the SQL Server Change Tracking source generates a record, it includes the data from the change tracking table and performs an outer join with the current version of the table.
The resulting record includes the following:- Change tracking fields such as SYS_CHANGE_VERSION, SYS_CHANGE_CREATION_VERSION, SYS_CHANGE_OPERATION, etc.
- The latest version of the record, when available.
Important: Unlike a record generated by a CDC source, the change tracking record includes the latest version of the record, not the version of the record created by the change. - Change tracking only
- You can configure the source to omit the join and produce a record with only the change tracking data. The resulting record includes the following:
All generated records include change tracking information in record header attributes.
The source also generates field attributes that provide additional information about field.
Record header attributes
The SQL Server Change Tracking source generates JDBC record header attributes that provide additional information about each record, such as the original data type of a field or the source tables for the record.
The source also includes the sdc.operation.type attribute and
information from the SQL Server change tracking table. The SQL Server Change Tracking
header attributes are prefixed with "jdbc".
You can use the record:attribute or
record:attributeOrDefault functions to access the information
in the attributes. For more information about working with record header attributes,
see Working with header attributes.
| Header Attribute Name | Description |
|---|---|
| sdc.operation.type |
Provides one of the following values to represent the operation
type:
|
| jdbc.tables |
Provides a
comma-separated list of source tables for the fields in the
record.
Note: Not all JDBC drivers
provide this information.
|
| jdbc.<column name>.jdbcType | Provides the numeric value of the original SQL data type for each field in the record. See the Java documentation for a list of the data types that correspond to numeric values. |
| jdbc.<column name>.jdbc.precision | Provides the original precision for all numeric and decimal fields. |
| jdbc.<column name>.jdbc.scale | Provides the original scale for all numeric and decimal fields. |
| jdbc.primaryKeySpecification |
Provides a JSON-formatted
string that lists the columns that form the primary key in the table and the
metadata for those columns. For example, a table with a
composite primary key contains the following attribute:
A table without a primary key
contains the attribute with an empty value:
|
| jdbc.primaryKey.before.<primary key column name> | Provides the old
value for the specified primary key column. Generated for an update operation on a table with a primary key. |
| jdbc.primaryKey.after.<primary key column name> | Provides the new
value for the specified primary key column. Generated for an update operation on a table with a primary key. |
| jdbc.SYS_CHANGE_COLUMNS | Lists the columns that have changed since the last
sync. Returns NULL when column change tracking is not enabled, when the operation is insert or delete, or when all non-primary key columns were updated at once. |
| jdbc.SYS_CHANGE_CONTEXT | Provides change context information when available. |
| jdbc.SYS_CHANGE_CREATION_VERSION | Provides the version number associated with the last insert operation. |
| jdbc.SYS_CHANGE_OPERATION | Indicates the type of change that occurred:
|
| jdbc.SYS_CHANGE_VERSION | Provides the version number of the most recent change to the row. |
For details about the SYS_CHANGE change tracking attributes, see the SQL Server documentation.
CRUD operation header attributes
- sdc.operation.type
- The SQL Server Change Tracking source writes the operation type to the sdc.operation.type record header attribute.
- jdbc.SYS_CHANGE_OPERATION
- The SQL Server Change Tracking source also writes the CRUD operation type to the jdbc.SYS_CHANGE_OPERATION record header attribute. However note that CRUD-enabled stages only use the sdc.operation.type header attribute, they do not check the jdbc.SYS_CHANGE_OPERATION attribute.
Field attributes
The SQL Server Change Tracking source generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.
- SQL Server Decimal and Numeric data types are converted to the Data Collector Decimal data type, which does not store scale and precision.
- SQL Server Datetime, Datetime2, and Smalldatetime data types are converted to the Data Collector Datetime data type, which does not store nanoseconds.
| Data Collector Data Type | Generated Field Attribute | Description |
|---|---|---|
| Decimal | precision | Provides the original precision for every decimal or numeric column. |
| Decimal | scale | Provides the original scale for every decimal or numeric column. |
| Datetime | nanoSeconds | Provides the original nanoseconds for every datetime, datetime2, or smalldatetime column. |
You can use the record:fieldAttribute or
record:fieldAttributeOrDefault functions to access the information
in the attributes. For more information about working with field attributes, see Field attributes.
Event generation
The SQL Server Change Tracking source can generate events that you can use in an event stream. When you enable event generation, the source generates an event when it completes processing the data returned by the specified queries for all tables.
- With the Pipeline Finisher executor to
stop the flow and transition the flow to a Finished state when the source completes processing available data.
For an example, see Stopping a flow after processing all available data.
For more information about dataflow triggers and the event framework, see Dataflow triggers overview.
Event record
| Record Header Attribute | Description |
|---|---|
| sdc.event.type | Event type. Uses the following type:
|
| sdc.event.version | Integer that indicates the version of the event record type. |
| sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The no-more-data event record includes no record fields.