Oracle CDC
The Oracle CDC origin processes change data capture (CDC) information stored in redo logs and accessed using Oracle LogMiner. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The Oracle CDC origin provides the same functionality as the original Oracle CDC Client origin in a more efficient package. It also provides functionality not available in the Oracle CDC Client origin. For more information, see Comparing Oracle CDC Origins.
The Oracle CDC origin can process changes for primary and standby databases. It processes data based on the system change number, in ascending order, and buffers changes in memory. The origin captures and adjusts to schema changes.
The origin includes CDC and CRUD information in record header attributes so generated records can be easily processed by CRUD-enabled destinations. For an overview of Data Collector changed data processing and a list of CRUD-enabled destinations, see Processing Changed Data.
You might use this origin to track the data that has changed in your database in order to run tasks using the changed data. With this origin, you can implement data integration processes based on the identification, capture, and delivery of the changes made to the database.
When you configure the origin, you specify change data capture details, such as the schemas and tables to read from, how to read the initial change, and the operations to include. You specify an initial change or time to start processing, transaction details, and uncommitted transaction handling.
You specify the record format to generate, basic or rich, and you can optionally include a wide range of information in the record or record header, such as the redo statement, null columns, and old and new values for the record. You can define resilience properties, such as how many retries to perform for a task and how long to wait between retries.
You can specify JDBC connection information and user credentials. You can also use a connection to configure the origin. If the schema was created in a pluggable database, you state the pluggable database name. You can also define custom properties that the driver requires or supports.
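For reference, Oracle thin driver JDBC URLs typically take one of the following forms; how you supply the host, port, and service details depends on the connection properties that you configure:
jdbc:oracle:thin:@//<host>:<port>/<service name>
jdbc:oracle:thin:@<host>:<port>:<SID>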
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Comparing Oracle CDC Origins
Data Collector provides two origins that use LogMiner to process Oracle change data. The Oracle CDC Client origin was the first origin. This new Oracle CDC origin includes key functionality available in the older origin, as well as improvements and additional new features.
If you are new to Data Collector or building a new pipeline that processes Oracle change data, use the newer Oracle CDC origin for a wider range of functionality.
If you already have pipelines that use the older Oracle CDC Client origin, you can continue using that origin. However, you might consider updating the pipeline to use the newer Oracle CDC origin for access to additional functionality.
- Similarities
- The newer Oracle CDC origin can generate the same records as those generated by the older Oracle CDC Client origin. However, the newer origin has streamlined and reorganized properties, so you configure the origin differently.
- Both origins require completing similar prerequisite tasks.
- Both origins place the operation type in the sdc.operation.type record header attribute and can generate additional attributes.
- You can configure both origins to perform multithreaded parsing.
- Both origins can generate events that can be used in dataflow trigger pipelines.
- Key differences
- The newer Oracle CDC origin can process changes from primary and standby databases. The older Oracle CDC Client origin only supports primary databases.
- The newer Oracle CDC origin receives raw values from LogMiner that
it then efficiently converts to the corresponding data values. As a
result, the newer origin does not support processing encrypted data.
When Transparent Data Encryption is enabled for columns or tables in
the Oracle database, that data remains encrypted when processed by
the newer origin.
The older Oracle CDC Client origin can process encrypted data because it receives the decrypted values from Oracle.
- The newer origin provides two record formats for processed data.
- The following options and properties from the older origin are not
included in the Oracle CDC origin because the newer origin uses the
most efficient option, rather than offering a choice between optimal
and sub-optimal options:
- Parser type - The older origin uses the ALL(*) parser by
default because it was the only parser available when the
origin was created. When the more performant PEG parser
became available, it was added to the older origin as an
advanced option.
The newer origin uses the more efficient PEG parser.
- Buffer type and location - The older origin provides several
buffering options.
The newer origin is more efficient than the older origin, so it always buffers data locally, in memory. However, you can configure the maximum buffer size and a buffer wait strategy.
- Dictionary source - The older origin lets Oracle retrieve
dictionary definitions from the online catalog or from redo
logs, both of which can be problematic.
The newer origin loads dictionary definitions from the Oracle catalog. While the pipeline runs, it tracks dictionary changes from redo logs. To support pipeline stops and starts, the origin stores the dictionary state in the offset so there is no dependency on dictionaries extracted to redo logs or on LogMiner transforming the original redo log data. This allows for consistent matching from column values to column data types across time.
- Schema and table specification - When specifying the schemas
and tables to monitor, the older origin supports SQL LIKE
syntax for inclusion patterns and Java regular expression
syntax for exclusion patterns.
The newer origin uses REGEXP_LIKE syntax for all schema and table patterns to provide a richer pattern specification model. This also avoids problems with characters that are commonly used in schema and table names, such as the underscore ( _ ), but that also function as SQL LIKE wildcards.
- Truncate operations - For Truncate operations, the older
origin only generated events.
The newer Oracle CDC origin also generates events for Truncate operations. However, when configured to use the rich record format, the origin can create records for Truncate operations as well. This enables you to understand when the truncation occurred in relation to the processed data.
Updating Oracle CDC Client Pipelines
The newer Oracle CDC origin provides the same functionality as the older Oracle CDC Client origin, with additional enhancements. As a result, you can update existing pipelines to use the new origin if you like. The older origin remains available, so updating existing pipelines is not required.
As you configure the replacement origin, you might reconsider how you want to process Oracle change data. However, you can configure the origin to generate the same records, so you don't have to update the pipeline when you replace the old origin.
Below are some configuration issues to consider when updating existing Oracle CDC Client pipelines to use the newer origin:
- The newer Oracle CDC origin supports processing changes from primary or standby
databases.
The older origin only supports primary databases.
To configure the Oracle CDC origin to use a primary database, on the Oracle tab, set the Database Role property to Primary.
- The newer Oracle CDC origin can generate two different record formats:
basic or rich.
To generate records like those generated by the Oracle CDC Client origin, on the Data tab, set the Record Format property to Basic. Then, keep the default values for the custom record options on the same tab.
- The newer Oracle CDC origin supports REGEXP_LIKE syntax for specifying schemas
and tables, while the older origin supports SQL LIKE syntax for inclusion
patterns and Java regular expression syntax for exclusion patterns. Update the schema and table patterns from the older origin to use the appropriate syntax. The following tips might help:
- Inclusion patterns - If you used SQL LIKE syntax to define inclusion
patterns, replace all occurrences of the percent character ( % )
with a dot and asterisk ( .* ).
Also, replace all underscores ( _ ) that represent a single character with a dot ( . ). For example, the SQL LIKE inclusion pattern SALES% becomes the regular expression ^SALES.*$.
- Exclusion patterns - If you used Java regular expressions syntax, you don’t need to make any changes.
- Java regular expressions - You can use Java regular expressions for both patterns. However, you might need to anchor them with a caret ( ^ ) at the start of the expression and a dollar sign ( $ ) at the end of the expression.
- By default, the newer Oracle CDC origin logs details about LogMiner sessions in
the Data Collector log, just like the older origin. However, the newer origin can also generate
events when new LogMiner sessions are created.
If you monitor Data Collector log entries to determine when the origin creates new LogMiner sessions, consider configuring the Oracle CDC origin to produce events instead, because Data Collector logs are not a reliable source for automated processes.
To produce events for new LogMiner sessions, on the General tab, select Produce Events. Then, on the Oracle tab, enable the Track LogMiner Sessions property.
Prerequisites
Before you use the Oracle CDC origin, complete the following prerequisite tasks:
- Configure the database archiving mode.
- Enable supplemental logging for the database or tables.
- Create a user account with the required roles and privileges.
- Install required libraries on Data Collector.
The prerequisite steps can differ based on the type of database. This documentation uses the following terms:
- CDB or multitenant database - A multitenant container database (CDB) that includes one or more pluggable databases (PDB).
- PDB or pluggable database - A database that is part of a CDB.
- Non-CDB database - A standalone database created on earlier versions of Oracle.
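If you are unsure which type of database you have, you can check whether it is a container database and list its pluggable databases. For example:
select cdb from v$database;
select name, open_mode from v$pdbs;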
Task 1. Configure the Database Archiving Mode
LogMiner provides access to database changes stored in redo log files that summarize database activity. The origin uses this information to generate records. LogMiner requires that the database be open and in ARCHIVELOG mode with archiving enabled.
When reading from a primary database, the database must also be writable. When reading from a standby database, you might archive logs at a higher frequency if you want to approximate real-time processing. For more information, see Primary and Standby Databases.
You configure the database archiving mode differently based on whether you use an Oracle Standard, Oracle RAC, or Oracle Exadata database:
- Oracle Standard databases
- Use the following steps to determine the status of the database and to
configure the database archiving mode:
- In a SQL shell, log into the database as a user with DBA privileges.
- Check the database logging mode.
For example:
select log_mode from v$database;
If the command returns ARCHIVELOG, you can go to "Task 2. Enable Supplemental Logging".
If the command returns NOARCHIVELOG, continue with the following steps.
- Shut down the database.
For example:
shutdown immediate;
- Start up and mount the database.
For example:
startup mount;
- Enable archiving, open the database, and make it writable.
For example:
alter database archivelog;
alter database open read write;
- Oracle RAC and Exadata databases
- Use the following steps to configure the database archiving mode and verify
the status of the database:
- In a Terminal session on one of the database nodes, stop and mount
the database.
For example:
srvctl stop database -d <database name>
srvctl start database -d <database name> -o mount
- In a SQL shell, log into the database as a user with DBA privileges
and configure the database.
For example:
alter database archivelog;
- In the Terminal session, restart the database.
For example:
srvctl stop database -d <database name>
srvctl start database -d <database name>
- In the SQL shell, check if the database is in logging mode.
For example:
select log_mode from v$database;
Task 2. Enable Supplemental Logging
To retrieve data from redo logs, LogMiner requires supplemental logging at a database level, table level, or a combination of both.
- Minimal supplemental logging
- You must enable minimal supplemental logging to use the Oracle CDC origin. Enable it in your CDB for a multitenant database or in your non-CDB database.
- Identification key logging
- Enable identification key logging based on your needs:
- Identification key logging is optional depending on the data that you want to process and how you plan to use the data.
- Primary key identification key logging is required to enable proper
primary key handling and change tracking.
Depending on the structure specification of each table, primary key logging provides primary key columns, first unique index columns, or all columns to the origin for record generation.
- Full supplemental identification key logging is recommended to
provide data from all columns, those with unchanged data as well as
the primary key and changed columns.
Enable full supplemental logging to ensure that the origin can generate records that include all fields, regardless of whether they have been changed.
For details on the data included in records based on the supplemental logging type, see Parse SQL Statements, Supported Operations, and Record Attributes.
- Determine the logging that you want to enable.
- Before enabling supplemental logging, check if it is enabled for the database.
For example, you might run the following command on your CDB, then the pluggable database:
select supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all from v$database;
If the command returns Yes or Implicit for all three columns, then minimal supplemental logging, identification key logging, and full supplemental logging are all enabled. You can skip to "Task 3. Create a User Account."
If the command returns Yes or Implicit for the first two columns only, then minimal supplemental logging and identification key logging are enabled. If this is what you want, you can skip to "Task 3. Create a User Account."
- Enable minimal supplemental logging.
For example, you can run the following command from your CDB or from your non-CDB database:
alter database add supplemental log data;
- Optionally, enable identification key logging, if needed.
When enabling identification key logging for a database, do so for the database that contains the schemas to be monitored.
To enable identification key logging for a pluggable database, you must also enable it for the CDB. When enabling identification key logging for individual tables, you don't need to enable it for the CDB. According to Oracle, best practice is to enable logging for individual tables rather than for the entire CDB. For more information, see the Oracle LogMiner documentation.
For example, you can run the following command to apply subsequent changes to just the pluggable database container:
alter session set container=<pdb>;
Enable primary key or full supplemental identification key logging to retrieve data from redo logs. You do not need to enable both:
- To enable primary key logging
- You can enable primary key identification key logging for individual tables or for all tables in the database, as shown in the example commands after these steps.
- To enable full supplemental logging
- You can enable full supplemental identification key logging for individual tables or for all tables in the database, as shown in the example commands after these steps.
- Submit the changes.
For example:
alter system archive log current;
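As referenced above, the following commands illustrate how you might enable primary key identification key logging for an individual table or for all tables in the database:
alter table <schema>.<table> add supplemental log data (primary key) columns;
alter database add supplemental log data (primary key) columns;
Similarly, you might enable full supplemental identification key logging for an individual table or for all tables in the database:
alter table <schema>.<table> add supplemental log data (all) columns;
alter database add supplemental log data (all) columns;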
Task 3. Create a User Account
Create a user account to use with the origin. You need the account to access the database through JDBC.
- 12c, 18c, 19c, or 21c CDB databases
- For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c CDB databases, create a common user account. Common user accounts are created in cdb$root and must use the naming convention c##<name>.
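The full privilege list for a common user is not repeated here. As a sketch, assuming the account needs the same privileges listed in the non-CDB example below granted across all containers, you might start with:
alter session set container=cdb$root;
create user c##<user name> identified by <password> container=all;
grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to c##<user name> container=all;
Then grant the remaining select privileges shown in the non-CDB example in the same way, adding container=all to each statement.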
- 12c, 18c, 19c, or 21c non-CDB databases
- For Oracle, Oracle RAC, and Oracle Exadata 12c, 18c, 19c, and 21c non-CDB
databases, create a user account with the necessary privileges:
- In a SQL shell, log into the database as a user with DBA privileges.
- Create the user account.
For example:
create user <user name> identified by <password>;
grant create session, alter session, set container, logmining, select any dictionary, select_catalog_role, execute_catalog_role to <user name>;
grant select on database_properties to <user name>;
grant select on all_objects to <user name>;
grant select on all_tables to <user name>;
grant select on all_tab_columns to <user name>;
grant select on all_tab_cols to <user name>;
grant select on all_constraints to <user name>;
grant select on all_cons_columns to <user name>;
grant select on v_$database to <user name>;
grant select on gv_$database to <user name>;
grant select on v_$instance to <user name>;
grant select on gv_$instance to <user name>;
grant select on v_$database_incarnation to <user name>;
grant select on gv_$database_incarnation to <user name>;
grant select on v_$log to <user name>;
grant select on gv_$log to <user name>;
grant select on v_$logfile to <user name>;
grant select on gv_$logfile to <user name>;
grant select on v_$archived_log to <user name>;
grant select on gv_$archived_log to <user name>;
grant select on v_$standby_log to <user name>;
grant select on gv_$standby_log to <user name>;
grant select on v_$logmnr_parameters to <user name>;
grant select on gv_$logmnr_parameters to <user name>;
grant select on v_$logmnr_logs to <user name>;
grant select on gv_$logmnr_logs to <user name>;
grant select on v_$logmnr_contents to <user name>;
grant select on gv_$logmnr_contents to <user name>;
grant select on v_$containers to <user name>;
grant select on <db>.<table> to <user name>;
Repeat the final command for each table that you want to monitor.
Task 4. Install Required Libraries on Data Collector
The Oracle CDC origin connects to Oracle through JDBC. You cannot access the database until you install the required Oracle JDBC libraries on Data Collector. Install the following libraries:
ojdbc8.jar
- Oracle JDBC driver. The latest driver version is recommended, though you can use any earlier version that your database supports. When Data Collector runs on Java 11 or higher, you can alternatively use ojdbc11.jar.
The drivers are also available in the ojdbc8-full.tar.gz and ojdbc11-full.tar.gz archives, respectively. Extract the library before installation. These drivers and archives are available on the Oracle driver downloads page.
orai18n.jar
- Oracle globalization library. Use the latest version of this library to ensure accurate processing of international character sets.
This library is available in the ojdbc8-full.tar.gz archive. When Data Collector runs on Java 11 or higher, you can alternatively use the ojdbc11-full.tar.gz archive. The archives are available on the Oracle driver downloads page. Extract the library before installation.
orai18n-mapping.jar
- Oracle globalization library. Use the latest version of this library to ensure accurate processing of international character sets.
This library is available within a JDBC Supplement Package (ZIP) through the Oracle Instant Client downloads page. Extract the library before installation.
Install the libraries as external libraries into the JDBC Oracle stage library, streamsets-datacollector-jdbc-oracle-lib, which includes the origin. For more information, see Install External Libraries in the Data Collector documentation.
Primary and Standby Databases
- Primary database
- You can use the Oracle CDC origin to process changes from a primary database. When processing changes from a primary database, LogMiner can process changes directly from online redo logs or from archived redo logs.
- Standby database
- You can use the Oracle CDC origin to process changes from a standby database when archived redo logs are regularly replicated. Standby databases are physical or logical replica databases managed by Oracle Data Guard or other available Oracle tools. When processing changes from a standby database, LogMiner can only process changes from archived redo logs.
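When you want archived logs generated more frequently to approximate real-time processing on a standby database, one approach, offered here as an example rather than a requirement of the origin, is to set a maximum log switch interval on the primary database:
alter system set archive_lag_target = 60 scope=both;
This forces a log switch at least every 60 seconds so that changes reach the archived redo logs, and therefore the standby, more quickly.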
Table Filters
When you configure the Oracle CDC origin, you specify table filters that define the schemas and tables with the change capture data that you want to process.
Filters can use REGEXP_LIKE syntax to define a set of tables within a schema or across multiple schemas. For more information about REGEXP_LIKE syntax, see the Oracle documentation.
For example, say you want to process changes from all tables in the sales schema with names that start with SALES, while excluding those that end with a dash (-) and a single-character suffix. You can use the following filter configuration to specify the tables to process:
- Schema: sales
- Table Name Pattern: ^SALES.*$
- Exclusion Pattern: ^SALES.*-.$
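Because the patterns use REGEXP_LIKE syntax, you can sanity-check them against the Oracle data dictionary before configuring the origin. This example query assumes the filter configuration above and access to the ALL_TABLES view:
select owner, table_name
from all_tables
where regexp_like(owner, '^SALES$', 'i')
  and regexp_like(table_name, '^SALES.*$')
  and not regexp_like(table_name, '^SALES.*-.$');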
Start Mode
You indicate where the Oracle CDC origin should start processing by configuring the Start Mode property and related settings. You can configure the origin to start at the current change or instant in time so that it processes all available changes. Or, you can specify a specific change or time to start processing.
The origin only uses the Start Mode and any specified initial change or time when you start the pipeline for the first time, or after you reset the origin. At other times, the origin uses its last saved offset to ensure that it processes all appropriate data.
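If you plan to specify an initial change or time, you can look up the current system change number (SCN), or convert a timestamp to an SCN, in Oracle first. The timestamp below is only a placeholder:
select current_scn from v$database;
select timestamp_to_scn(to_timestamp('2024-01-15 08:00:00', 'YYYY-MM-DD HH24:MI:SS')) from dual;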
Record Formats
- Basic
- Includes root-level record fields that represent the changes defined in the Oracle redo log, and record header attributes for additional metadata.
- Rich
- Has a record structure that allows the inclusion of the old values for changed data in the record, in addition to the new values. Also includes more metadata than the basic record format.
Custom Record Options
The Oracle CDC origin provides a wide range of properties that allow you to customize the data and metadata included in generated records. For example, you can include redo and undo statements, primary key and table definitions, and the precision and scale for numeric fields in record headers, in the record body, or omit the information from the record entirely.
When the origin parses redo statements and generates rich records, you can include information such as new and old field values, hexadecimal values, or raw values in record fields or omit the information from the record.
Define record customization on the Data tab of the origin. The default values for the properties generate records similar to those generated by the Oracle CDC Client origin.
Parse SQL Statements, Supported Operations, and Record Attributes
The Oracle CDC origin generates records for the operation types that you specify. However, the origin reads all operation types from redo logs to ensure proper transaction handling.
- Parsing SQL statements
- When the origin parses redo SQL statements, you can
configure it to create records for the following operations:
- INSERT
- DELETE
- UPDATE
- Not parsing SQL statements
- When the origin does not parse redo SQL statements, it writes each SQL statement to a field. When the origin generates basic records, it writes SQL statements to an oracle.cdc.query field. When generating rich records, it writes SQL statements to a jdbc.cdc.oracle.redoStatement field. The origin also generates field attributes that provide additional information about this field.
CRUD Operation Header Attributes
- sdc.operation.type
- The origin evaluates the operation type associated with each entry that it processes. When appropriate, it writes the operation type to the sdc.operation.type record header attribute.
- oracle.cdc.operation
- The origin also writes the CRUD operation type to an oracle.cdc.operation record header attribute. This attribute was implemented in an earlier release and is supported for backward compatibility.
CDC Header Attributes
Basic Record CDC Attribute | Rich Record CDC Attribute | Description |
---|---|---|
oracle.cdc.sequence.oracle | jdbc.cdc.oracle.ordinalOracle | Includes sequence numbers that indicate the order in which statements were processed within the transaction. The sequence number is generated by Oracle. Use to keep statements in order within a single transaction. Not used at this time. |
oracle.cdc.sequence.internal | jdbc.cdc.oracle.ordinalCollector | Includes sequence numbers equivalent to those in the attribute above, but created by the origin. The sequence starts with 0 each time the pipeline starts. You can use these values to keep statements in order within a single transaction. |
Other Header Attributes
- Primary keys
- When a table contains a primary key, the origin includes the following record header attribute:
jdbc.primaryKeySpecification
- Provides a JSON-formatted string that lists the columns that form the primary key in the table and the metadata for those columns.
For example, a table with a composite primary key contains the following attribute:
jdbc.primaryKeySpecification = {"<primary key column 1 name>": {"type": <type>, "datatype": "<data type>", "size": <size>, "precision": <precision>, "scale": <scale>, "signed": <Boolean>, "currency": <Boolean>}, ..., "<primary key column N name>": {"type": <type>, "datatype": "<data type>", "size": <size>, "precision": <precision>, "scale": <scale>, "signed": <Boolean>, "currency": <Boolean>}}
A table without a primary key contains the attribute with an empty value:
jdbc.primaryKeySpecification = {}
For an update operation on a table with a primary key, the origin includes the following record header attributes:
jdbc.primaryKey.before.<primary key column name> - Provides the old value for the specified primary key column.
jdbc.primaryKey.after.<primary key column name> - Provides the new value for the specified primary key column.
Note: The origin provides the new and old values of the primary key columns regardless of whether the value changes.
- Decimal fields
- When a record includes decimal fields, the origin includes the following record header attributes for each decimal field in the record:
jdbc.<fieldname>.precision
jdbc.<fieldname>.scale
You can use the record:attribute or record:attributeOrDefault functions to access the information in the attributes. For more information about working with record header attributes, see Working with Header Attributes.
Default Field Attributes
The Oracle CDC origin generates field attributes for columns converted to the Decimal or Datetime data types in Data Collector. The attributes provide additional information about each field.
- The Oracle Number data type is converted to the Data Collector Decimal data type, which does not store scale and precision.
- The Oracle Timestamp data type is converted to the Data Collector Datetime data type, which does not store nanoseconds.
Data Collector Data Type | Generated Field Attribute | Description |
---|---|---|
Decimal | precision | Provides the original precision for every number column. |
Decimal | scale | Provides the original scale for every number column. |
Datetime | nanoSeconds | Provides the original nanoseconds for every timestamp column. |
You can use the record:fieldAttribute or record:fieldAttributeOrDefault functions to access the information in the attributes. For more information about working with field attributes, see Field Attributes.
Annotation Field Attributes
You can configure the Oracle CDC origin to generate field attributes that provide additional information about each field in the record. These attributes can help you determine why a field contains a null value.
Annotation Attribute Name | Description |
---|---|
supported | Indicates if the original column type of the field is supported by the origin. |
presence | Indicates if the corresponding column for the field was included in the redo SQL statement. |
Multithreaded Parsing
By default, the Oracle CDC origin uses multiple threads to parse SQL statements. When performing multithreaded parsing, the origin uses multiple threads to generate records from committed SQL statements in a transaction. It does not perform multithreaded processing of the resulting records.
The Oracle CDC origin uses multiple threads for parsing based on the SQL Parser Threads property. When you start the pipeline, the origin creates the number of threads specified in the property. The origin connects to Oracle, creates a LogMiner session, and processes all transactions that contain changes for monitored tables.
When the origin processes a transaction, it reads and buffers all SQL statements in the transaction to an in-memory queue and waits for statements to be committed before processing them. Once committed, the SQL statements are parsed using all available threads and the original order of the SQL statements is retained.
The resulting records are passed to the rest of the pipeline. Note that enabling multithreaded parsing does not enable multithreaded processing – the pipeline uses a single thread for reading data.
Set the SQL Parser Threads property to 0 to use as many threads as there are available processing cores.
Query Fetch Size
The Oracle CDC origin provides a Query Fetch Size property that determines the maximum number of rows that the origin fetches at one time. When processing a large volume of data, using larger fetch sizes will improve performance by reducing the number of database hits.
As a best practice, set the query fetch size to as large a value as your execution environment allows. For example, set this property to at least 10,000 rows; when possible, 25,000 rows or higher might be optimal.
When setting high fetch sizes, make sure that Data Collector has sufficient resources to handle the load. For information about configuring the Data Collector heap size, see Java Heap Size in the Data Collector documentation.
Uncommitted Transaction Handling
You can configure how the Oracle CDC origin handles long-running, stale, or abandoned uncommitted transactions. These are transactions with some operations on monitored tables but whose commit or rollback takes longer than expected to occur.
- Transaction Maximum Duration - The maximum time one transaction is expected to
last, from transaction start to commit. If a commit or rollback is not received
within the specified duration, the transaction is marked for eviction.
A moderately large setting for this property is recommended to allow for long-running batch processes.
- Operations Maximum Gap - The maximum amount of time that can pass between two
consecutive operations within the same transaction. If the specified time passes
after the last operation without the arrival of a new operation, control, or
rollback, the transaction is marked for eviction.
A relatively small value for this property is recommended to ensure that zombie transactions do not consume memory.
Evicted Transaction Handling
You can configure how the Oracle CDC origin handles evicted transactions and the operations that arrive for a transaction after it is evicted. An evicted transaction is an uncommitted transaction that has expired based on the configuration of the Transaction Maximum Duration or Operations Maximum Gap properties.
- Transaction Maximum Retention - The maximum time that the origin keeps an
internal reference to the evicted transaction. After the time elapses, the
internal reference is removed.
If the origin receives additional operations for an evicted transaction before the internal reference is removed, the origin silently discards these operations. If the origin receives additional operations for an evicted transaction after the internal reference is removed, the origin passes the operations to the pipeline when a commit is received, treating them as if they are from a new transaction.
A moderate amount of time is recommended for this property to prevent operations from being incorrectly passed to the pipeline, without allowing unnecessary references to evicted transactions to consume memory.
- Evicted Transactions - Defines how the origin handles operations that arrive for
an evicted transaction before the internal reference is removed:
- Send Transaction Operations to Pipeline - Sends operations to the pipeline like
normal records. Use to treat the operations as if the transaction was committed.
This option can create inconsistencies in your data. Use with caution.
- Discard Transaction Operations - Discards operations. Use to treat the operations as if the transaction was rolled back.
- Send Transaction Operations to Error - Sends operations to the stage for error handling. Use to treat the operations like error records.
- Abort Pipeline - Stops the pipeline. Use to review the error and address underlying issues.
Event Generation
The Oracle CDC origin can generate events that you can use in an event stream when the pipeline starts, when a LogMiner session starts, and when changes occur to monitored tables. For example, you can use the events in the following ways:
- With the Email executor to send a custom email
after receiving an event.
For an example, see Sending Email During Pipeline Processing.
- With a destination to store event information.
For an example, see Preserving an Audit Trail of Events.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Records
Event records generated by the Oracle CDC origin include the following event-related record header attributes:
Record Header Attribute | Description |
---|---|
sdc.event.type | Event type. Uses one of the event types described below. The names of some event types differ depending on the record format that the origin uses. |
sdc.event.version | Integer that indicates the version of the event record type. |
sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The Oracle CDC origin can generate the following types of event records:
- initial-database-state
- After the pipeline starts and the origin queries the database, the origin generates an event that describes the state of the database.
- logminer-session-start
- When the origin starts a LogMiner session, the origin generates an event.
- table loaded, created, changed, and removed
- When a monitored table has been loaded, created, changed, or removed, the origin generates an event. The event records have the sdc.event.type attribute set to different values depending on the record format that the origin uses:
  - Table loaded - STARTUP for basic records and mined-table-loaded for rich records.
  - Table created - CREATE for basic records and mined-table-created for rich records.
  - Table changed - ALTER for basic records and mined-table-altered for rich records.
  - Table removed - DROP for basic records and mined-table-dropped for rich records.
- table truncated
- When a monitored table has been truncated, the origin can generate an event. To generate these events, in addition to enabling the origin to generate events, you must configure the Table Truncate Handling property on the Oracle tab appropriately.
Oracle Data Types
The Oracle CDC origin converts Oracle data types to Data Collector data types.
- Supported built-in data types
- The origin supports the following Oracle built-in data types:
Built-In Data Type | Data Collector Data Type |
---|---|
Binary_Double | Double |
Binary_Float | Float |
Char | String |
Date | Date |
Float | Float |
NChar, NVarchar | String |
Number | Varies depending on the precision and scale. For details, see the numeric conversion table. |
Timestamp | Datetime |
Timestamp with Local Timezone, Timestamp with Timezone | Zoned_Datetime |
Varchar, Varchar2 | String |
- Supported ANSI data types
- The origin supports the following ANSI data types:
ANSI Data Type | Data Collector Data Type |
---|---|
Char Varying, Character, Character Varying | String |
Decimal | Decimal |
Double Precision | Double |
Int, Integer, SmallInt | Varies depending on the precision and scale. For details, see the numeric conversion table. |
National Char, National Char Varying, National Character, National Character Varying | String |
NChar Varying | String |
Numeric | Varies depending on the precision and scale. For details, see the numeric conversion table. |
Real | Varies depending on the precision and scale. For details, see the numeric conversion table. |
- Numeric data type conversions
- Oracle numeric types are converted to Data Collector data types based on the precision and scale of the Oracle numeric type.
The following table describes how the types are converted:
Precision | Scale | Data Collector Data Type |
---|---|---|
Not specified | Not specified | Decimal |
Not specified | 0 | Long |
Not specified | Greater than 0 | Decimal |
Less than 3 | 0 | Byte |
3-4 | 0 | Short |
5-9 | 0 | Integer |
10 or greater | 0 | Long |
Any specified value | Greater than 0 | Decimal |
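As an illustration of the conversions above, the columns in the following hypothetical table would be converted as noted in the comments:
create table price_history (
  item_id     number(10),    -- precision 10, scale 0: Long
  quantity    number(5),     -- precision 5, scale 0: Integer
  status_code number(2),     -- precision less than 3, scale 0: Byte
  unit_price  number(8,2),   -- scale greater than 0: Decimal
  raw_amount  number         -- no precision or scale: Decimal
);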
Unsupported Data Types
You can configure how the origin handles unsupported data types at the record or field level.
The Unsupported Column Data Types property defines how the origin handles records that contain unsupported data types. The property provides the following options:
- Pass the record to the pipeline - Passes the record to the pipeline. You can configure how the origin handles the fields that contain unsupported data types using the Unsupported Columns property. You can configure the origin to remove empty fields using the Null Columns property.
- Discard the record - Drops the record from the pipeline.
- Send record to error - Passes records to the stage for error handling without the unsupported data type fields.
- Abort pipeline - Stops the pipeline with an error. Use to review errors and address underlying issues.
The Unsupported Columns property defines how the origin handles individual fields that contain unsupported data types. The property provides the following options:
- Keep Column with Raw Value - Includes the column as a record field with its original raw value.
- Keep Column with Null Value - Includes the column as a record field with an empty value.
- Drop Column - Omits the column from the record.
- Unsupported built-in data types
- The origin does not support the following built-in data types:
- Unsupported ANSI data type
- The origin does not support the following ANSI data type:
  - Long Varchar
- Unsupported data type categories
- The origin does not support the following categories of data types:
- Any
- Rowid
- Spatial
- Supplied
- User Defined
- XML Data
For example, this means that the following data types are not supported:
- DBUritype
- HTTPUritype
- Rowid
- SDO_Geometry
- SDO_Georaster
- SDO_Topo_Geometry
- URowid
- XDBUritype
- XMLType
Data Preview with Oracle CDC
Note the following differences when using data preview with the Oracle CDC origin:
- Preview generates a maximum of 10 records, ignoring the Preview Batch Size property when set to a higher value.
- Preview allows up to 5 seconds for the origin to receive records, starting after the origin connects to Oracle. As a result, when previewing a pipeline with an Oracle CDC origin, preview ignores the configured Preview Timeout property when set to a higher value.
For other origins, the preview timeout period begins when you start preview and includes the connection time to external systems.
Working with the Drift Synchronization Solution
If you use the Oracle CDC origin as part of a Drift Synchronization Solution for Hive pipeline, make sure to pass only records flagged for Insert to the Hive Metadata processor.
- Configure the Oracle CDC origin to process only Insert records.
- If you want to process additional record types in the pipeline, use a Stream Selector processor to route only Insert records to the Hive Metadata processor.
Configuring an Oracle CDC Origin
Configure an Oracle CDC origin to use LogMiner to process change data capture information from an Oracle database. Before you use the origin, complete the prerequisite tasks.