SQL Server Change Tracking

The SQL Server Change Tracking source processes data from Microsoft SQL Server change tracking tables. The source can process data from tables with simple numeric primary keys. The source cannot process data from tables with compound or non-numeric primary keys. For information about supported versions, see Supported systems and versions.

By default, the source generates a record with change tracking information and the latest version of each record from the data tables. You can configure it to use only the change tracking information.

Use the SQL Server Change Tracking source to generate records from change tracking tables. To read data from Microsoft SQL Server change data capture (CDC) tables, use the SQL Server CDC Client source. For more information about the differences between change tracking and CDC data, see the Microsoft documentation. To read data from SQL Server temporal tables, use the JDBC Multitable Consumer source or the JDBC Query Consumer source. For more information about temporal tables, see the Microsoft documentation.

The SQL Server Change Tracking source includes the CRUD operation type in a record header attribute so generated records can be easily processed by CRUD-enabled targets. For an overview of Data Collector changed data processing and a list of CRUD-enabled targets, see Processing changed data.

You might use this source to perform database replication. You can use a separate flow with the JDBC Query Consumer or JDBC Multitable Consumer source to read existing data. Then start a flow with the SQL Server Change Tracking source to process subsequent changes.

When you configure the source, you can define groups of change tracking tables in the same database and any initial offsets to use. When you omit initial offsets, the source processes only incoming data.

To determine how the source connects to the database, you specify connection information, a query interval, number of retries, and any custom JDBC configuration properties that you need. You can also use a connection to configure the source.

You specify whether you want to include the latest version of the data in generated records or whether to include only change tracking data. You also define the initial order that the source uses to read the tables.

The source can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow triggers overview.

When a flow stops, the SQL Server Change Tracking source notes where it stops reading. When the flow starts again, the source continues processing from where it stopped by default. You can reset the offset to process all requested data.

JDBC driver

When connecting to Microsoft SQL Server, you do not need to install a JDBC driver. Data Collector includes the JDBC driver required for SQL Server.

Prerequisites

Before you configure the SQL Server Change Tracking source, complete the following prerequisites in the SQL Server database:
  1. Enable change tracking for the database.
  2. Assign the required permissions to the database user.

Enable change tracking

Before you configure the SQL Server Change Tracking source, enable change tracking for the SQL Server database and for each table within that database that you want tracked.

For instructions, see the Microsoft SQL Server documentation.

Assign permissions

Before you configure the SQL Server Change Tracking source, create a database user that the source can use to connect to the database. Assign the database user the following permissions:
  • VIEW CHANGE TRACKING permission on the database.
  • When using the default record generation to join change tracking data with the current version of the data, the user must have SELECT permission on at least the primary key column for each associated data table.

    If processing data only from the change tracking data, the user does not need this permission.

Table configuration

When you configure the SQL Server Change Tracking source, you define multiple tables using a single set of table configuration properties and you can define multiple table configurations to process multiple groups of change tables. When you define table configurations you can define the following properties for each set of tables:
  • Schema - The schema where the tables reside.
  • Table name pattern - Use SQL LIKE syntax to define a set of tables to process. For example, the table name pattern st% matches tables whose names start with "st". The default pattern, %, matches all tables in the schema.

    For more information about valid patterns for the SQL LIKE syntax, see the Microsoft documentation.

  • Table exclusion pattern - When necessary, use a regex pattern to exclude certain tables that match the table name pattern from being read.

    For example, say you want to process all change tracking tables in the schema except for those that start with "dept". You can use the default % for the table name pattern, and enter dept* for the table exclusion pattern.

    For more information about using regular expressions with Data Collector, see Regular expressions overview.

  • Initial offset - The SQL Server Change Tracking source uses the SYS_CHANGE_VERSION column as the offset column. To process existing data, define the offset value to use. The offset is used for all tables included in the table configuration.

    When not set, the source processes only incoming data.

    Important: When processing an offset, the source starts with the first value greater than the specified offset.

Initial table order strategy

You can define the initial order that the source uses to read the tables.

Define one of the following initial table order strategies:
None
Reads the tables in the order that they are listed in the database.
Alphabetical
Reads the tables in alphabetical order.

Generated record

The SQL Server Change Tracking source can generate a record in the following ways:
Change tracking and current data

By default, when the SQL Server Change Tracking source generates a record, it includes the data from the change tracking table and performs an outer join with the current version of the table.

The resulting record includes the following:
  • Change tracking fields such as SYS_CHANGE_VERSION, SYS_CHANGE_CREATION_VERSION, SYS_CHANGE_OPERATION, etc.
  • The latest version of the record, when available.
Important: Unlike a record generated by a CDC source, the change tracking record includes the latest version of the record, not the version of the record created by the change.
Change tracking only
You can configure the source to omit the join and produce a record with only the change tracking data. The resulting record includes the following:
  • Change tracking fields such as SYS_CHANGE_VERSION, SYS_CHANGE_CREATION_VERSION, etc.
  • Primary key field for the changed record, provided by the change tracking table.

All generated records include change tracking information in record header attributes.

The source also generates field attributes that provide additional information about field.

Record header attributes

The SQL Server Change Tracking source generates JDBC record header attributes that provide additional information about each record, such as the original data type of a field or the source tables for the record.

The source also includes the sdc.operation.type attribute and information from the SQL Server change tracking table. The SQL Server Change Tracking header attributes are prefixed with "jdbc".

You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with header attributes.

The source provides the following header attributes:
Header Attribute Name Description
sdc.operation.type
Provides one of the following values to represent the operation type:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE
jdbc.tables
Provides a comma-separated list of source tables for the fields in the record.
Note: Not all JDBC drivers provide this information.
jdbc.<column name>.jdbcType Provides the numeric value of the original SQL data type for each field in the record. See the Java documentation for a list of the data types that correspond to numeric values.
jdbc.<column name>.jdbc.precision Provides the original precision for all numeric and decimal fields.
jdbc.<column name>.jdbc.scale Provides the original scale for all numeric and decimal fields.
jdbc.primaryKeySpecification Provides a JSON-formatted string that lists the columns that form the primary key in the table and the metadata for those columns.
For example, a table with a composite primary key contains the following attribute:
jdbc.primaryKeySpecification = 
     {{"<primary key column 1 name>":
         {"type": <type>, 
          "datatype": "<data type>", 
          "size": <size>, 
          "precision": <precision>, 
          "scale": <scale>, 
          "signed": <Boolean>,  
          "currency": <Boolean> }},
          ...,
     {"<primary key column N name>":
         {"type": <type>, 
          "datatype": "<data type>", 
          "size": <size>, 
          "precision": <precision>, 
          "scale": <scale>, 
          "signed": <Boolean>,  
          "currency": <Boolean> } } }
A table without a primary key contains the attribute with an empty value:
jdbc.primaryKeySpecification = {} 
jdbc.primaryKey.before.<primary key column name> Provides the old value for the specified primary key column.

Generated for an update operation on a table with a primary key.

jdbc.primaryKey.after.<primary key column name> Provides the new value for the specified primary key column.

Generated for an update operation on a table with a primary key.

jdbc.SYS_CHANGE_COLUMNS Lists the columns that have changed since the last sync.

Returns NULL when column change tracking is not enabled, when the operation is insert or delete, or when all non-primary key columns were updated at once.

jdbc.SYS_CHANGE_CONTEXT Provides change context information when available.
jdbc.SYS_CHANGE_CREATION_VERSION Provides the version number associated with the last insert operation.
jdbc.SYS_CHANGE_OPERATION Indicates the type of change that occurred:
  • I for insert
  • D for delete
  • U for update
jdbc.SYS_CHANGE_VERSION Provides the version number of the most recent change to the row.

For details about the SYS_CHANGE change tracking attributes, see the SQL Server documentation.

CRUD operation header attributes

When generating records, the SQL Server Change Tracking source specifies the operation type in both of the following record header attributes:
sdc.operation.type
The SQL Server Change Tracking source writes the operation type to the sdc.operation.type record header attribute.
The source uses the following values in the sdc.operation.type record header attribute to represent the operation type:
  • 1 for INSERT
  • 2 for DELETE
  • 3 for UPDATE

If you use a CRUD-enabled target in the flow such as JDBC Producer or Elasticsearch, the target can use the operation type when writing to target systems. When necessary, you can use an Expression Evaluator processor or any scripting processor to manipulate the value in the header attribute. For an overview of Data Collector changed data processing and a list of CRUD-enabled targets, see Processing changed data.

When using CRUD-enabled targets, the target looks for the operation type in the sdc.operation.type attribute only.
jdbc.SYS_CHANGE_OPERATION
The SQL Server Change Tracking source also writes the CRUD operation type to the jdbc.SYS_CHANGE_OPERATION record header attribute. However note that CRUD-enabled stages only use the sdc.operation.type header attribute, they do not check the jdbc.SYS_CHANGE_OPERATION attribute.

Field attributes

The SQL Server Change Tracking source generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.

The following data type conversions do not include all information in the corresponding Data Collector type:
  • SQL Server Decimal and Numeric data types are converted to the Data Collector Decimal data type, which does not store scale and precision.
  • SQL Server Datetime, Datetime2, and Smalldatetime data types are converted to the Data Collector Datetime data type, which does not store nanoseconds.
To preserve this information during data type conversion, the source generates the following field attributes for these Data Collector data types:
Data Collector Data Type Generated Field Attribute Description
Decimal precision Provides the original precision for every decimal or numeric column.
Decimal scale Provides the original scale for every decimal or numeric column.
Datetime nanoSeconds Provides the original nanoseconds for every datetime, datetime2, or smalldatetime column.

You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field attributes.

Event generation

The SQL Server Change Tracking source can generate events that you can use in an event stream. When you enable event generation, the source generates an event when it completes processing the data returned by the specified queries for all tables.

SQL Server Change Tracking events can be used in any logical way. For example:

For more information about dataflow triggers and the event framework, see Dataflow triggers overview.

Event record

Event records generated by SQL Server Change Tracking source have the following event-related record header attributes:
Record Header Attribute Description
sdc.event.type Event type. Uses the following type:
  • no-more-data - Generated when the source completes processing all data in the specified change tables.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.

The no-more-data event record includes no record fields.

Configuring a SQL Server Change Tracking source

About this task

Configure a SQL Server Change Tracking source to process records from change tables that have simple numeric primary keys.

Procedure

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the flow for error handling.
    • Stop Flow - Stops the flow.
  2. On the JDBC tab, configure the following properties:
    JDBC Property Description
    Connection

    6.4.0-0107 and later

    Connection that defines the information that is required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can specify the details in local properties. When you select a connection, the flow canvas hides properties that are defined in the connection.

    JDBC Connection String

    Connection string used to connect to the database.

    Data Time Zone Time zone to use to evaluate datetime-based offset column conditions.
    Fetch Size Maximum number of rows to fetch and store in memory on the Data Collector machine. Enter zero to use the default fetch size set in the database.

    Default is 1,000.

    Use Credentials Enables entering credentials. Use when you do not include credentials in the JDBC connection string.
    Queries per Second Maximum number of queries to run in a second across all partitions and tables. Use 0 for no limit.

    Default is 10.

    Max Batch Size (records) Maximum number of records to include in a batch.
    Max Clob Size (characters) Maximum number of characters to be read in a Clob field. Larger data is truncated.
    Max Blob Size (bytes)

    Maximum number of bytes to be read in a Blob field.

    Number of Retries on SQL Error Maximum number of times the source tries to execute the query after encountering a SQL error. After retrying this number of times, the source handles the error based on the error handling configured for the source.

    Use to handle transient network or connection issues that prevent the source from submitting a query.

    Default is 0.

    No-more-data Event Generation Delay (seconds) Number of seconds to delay generation of the no-more-data event after processing all rows. Use to allow time for additional data to arrive before generating the no-more-data event.
    Convert Timestamp To String Enables the source to write timestamps as string values rather than datetime values. Strings maintain the precision stored in the source system. For example, strings can maintain the precision of a high-precision SQL Server datetime2 field.

    When writing timestamps to Data Collector date or time data types that do not store nanoseconds, the source stores any nanoseconds from the timestamp in a field attribute.

    Additional JDBC Configuration Properties Additional JDBC configuration properties to use. To add properties, click Add and define the JDBC property name and value.

    Use the property names and values as expected by JDBC.

  3. On the Change Tracking tab, define one or more table configurations. Using simple or bulk edit mode, click the Add icon to define another table configuration.
    Change Tracking Property Description
    Schema Name Name of the schema to use for this table configuration.
    Table Name Pattern

    Pattern of the table names to read for this table configuration. Use the SQL LIKE syntax to define the pattern.

    Default is the percentage wildcard (%) which matches all tables in the schema.

    Table Exclusion Pattern Pattern of the table names to exclude from being read for this table configuration. Use a Java-based regular expression, or regex, to define the pattern.

    Leave empty if you do not need to exclude any tables.

    To view and configure this option, click Show Advanced Options.

    Initial Offset Offset value to use for this table configuration when the flow starts. When processing an offset, the source starts with the first value greater than the specified offset.

    Use -1 to opt out of an initial offset. With the initial offset set to -1, the source ignores existing data and begins processing with new, incoming changes.

    To view and configure this option, click Show Advanced Options.

  4. To enter JDBC credentials separately from the JDBC connection string, on the Credentials tab, configure the following properties:
    Credentials Property Description
    Username User name for the JDBC connection. The user account must have the correct permissions or privileges in the database.
    Password Password for the account.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  5. When you use JDBC versions older than 4.0, on the Legacy Drivers tab, optionally configure the following properties:
    Legacy Drivers Property Description
    JDBC Class Driver Name Class name for the JDBC driver. Required for JDBC versions older than version 4.0.
    Connection Health Test Query Optional query to test the health of a connection. Recommended only when the JDBC version is older than 4.0.
  6. On the Advanced tab, optionally configure the following properties:
    The defaults for these properties should work in most cases:
    Advanced Property Description
    Initial Table Order Strategy Initial order used to read the tables:
    • None - Reads the tables in the order that they are listed in the database.
    • Alphabetical - Reads the tables in alphabetical order.
    On Unknown Type Action to take when encountering an unsupported data type:
    • Stop Flow - Stops the flow after completing the processing of the previous records.
    • Convert to String - When possible, converts the data to string and continues processing.
    Maximum Pool Size Maximum number of connections to create.

    Default is 1.

    Minimum Idle Connections Minimum number of connections to create and maintain. To define a fixed connection pool, set to the same value as Maximum Pool Size.

    Default is 1.

    Connection Timeout (seconds) Maximum time to wait for a connection. Use a time constant in an expression to define the time increment.
    Default is 30 seconds, defined as follows:
    ${30 * SECONDS}
    Idle Timeout (seconds) Maximum time to allow a connection to idle. Use a time constant in an expression to define the time increment.

    Use 0 to avoid removing any idle connections.

    When the entered value is close to or more than the maximum lifetime for a connection, Data Collector ignores the idle timeout.

    Default is 10 minutes, defined as follows:
    ${10 * MINUTES}
    Max Connection Lifetime (seconds) Maximum lifetime for a connection. Use a time constant in an expression to define the time increment.

    Use 0 to set no maximum lifetime.

    When a maximum lifetime is set, the minimum valid value is 30 minutes.

    Default is 30 minutes, defined as follows:
    ${30 * MINUTES}
    Auto Commit Determines if auto-commit mode is enabled. In auto-commit mode, the database commits the data for each record.

    Default is disabled.

    Enforce Read-only Connection Creates read-only connections to avoid any type of write.

    Default is enabled. Disabling this property is not recommended.

    Transaction Isolation Transaction isolation level used to connect to the database.

    Default is the default transaction isolation level set for the database. To override the database default, select one of the following levels:

    • Read committed
    • Read uncommitted
    • Repeatable read
    • Serializable