Data mining — Table source operator

The Table source operator uses SQL to select data from local DB2® database tables. It also uses the Java Database Connectivity (JDBC) interface standard to select data from remote database tables.

A Table source operator represents a connection to a relational table in a database management system and corresponds closely to an SQL SELECT statement. Its column expression and WHERE condition properties select and filter the data that is passed through a mining flow. The operator provides column information, including column names and data types, to the other operators in the mining flow. It can connect to a local DB2 database and make the table schemas and data available in the mining flow editor. It also supports connections to remote DB2 tables and to other databases through JDBC.
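As a rough illustration of how the column expression and WHERE condition properties relate to a single SELECT statement, the following sketch composes that statement from the two properties. The class `TableSourceQuery`, its `build` method, and the table and column names in the usage example are hypothetical; this is not the SQL that the product actually generates.

```java
import java.util.List;

// Hypothetical helper: models how a column-expression list and a WHERE
// condition could combine into one SELECT statement over a source table.
final class TableSourceQuery {
    private TableSourceQuery() {}

    static String build(String table, List<String> columnExpressions, String whereCondition) {
        // An empty select list is treated as "all columns", as with SELECT *.
        String selectList = columnExpressions.isEmpty()
                ? "*"
                : String.join(", ", columnExpressions);
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(selectList)
                .append(" FROM ")
                .append(table);
        // Append a WHERE clause only when a filter condition was specified.
        if (whereCondition != null && !whereCondition.trim().isEmpty()) {
            sql.append(" WHERE ").append(whereCondition);
        }
        return sql.toString();
    }
}
```

For example, `TableSourceQuery.build("BANK.CUSTOMERS", List.of("CUST_ID", "AGE"), "AGE >= 18")` yields `SELECT CUST_ID, AGE FROM BANK.CUSTOMERS WHERE AGE >= 18`.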

Remote tables are tables that do not exist in the SQL execution database, which is a DB2 database. The remote database might be on the same computer as the SQL execution database, or even in the same DB2 instance. Remote tables can also exist in databases on remote systems. In all cases, remote databases are accessed through JDBC. You use the Data Source Explorer in the Design Studio to create and activate database connections. You can also establish database connections by defining system resources in the application server (runtime environment).
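The following sketch shows, in plain JDBC, the kind of access described above: opening a connection to a remote DB2 database and reading a table's column names and data types from the standard `DatabaseMetaData.getColumns` catalog query. The class name `RemoteTableColumns`, the host, port, and database names are hypothetical; the sketch assumes the IBM Data Server Driver for JDBC is on the classpath and uses its type 4 URL format (`jdbc:db2://host:port/database`). It does not show how the Design Studio or the application server manage connections.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: reads column metadata of a remote table over JDBC.
final class RemoteTableColumns {
    private RemoteTableColumns() {}

    // Type 4 connection URL as used by the IBM Data Server Driver for JDBC.
    static String db2Url(String host, int port, String database) {
        return "jdbc:db2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection; needs the DB2 JDBC driver and a reachable server.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    // Returns "COLUMN_NAME TYPE_NAME" entries for one table. The schema and
    // table arguments are search patterns; DB2 stores unquoted names in upper case.
    static List<String> columns(Connection conn, String schema, String table) throws SQLException {
        List<String> result = new ArrayList<>();
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet rs = meta.getColumns(null, schema, table, "%")) {
            while (rs.next()) {
                result.add(rs.getString("COLUMN_NAME") + " " + rs.getString("TYPE_NAME"));
            }
        }
        return result;
    }
}
```

A typical (hypothetical) call sequence is `open(db2Url("dbhost.example.com", 50000, "SALESDB"), user, password)` followed by `columns(conn, "BANK", "CUSTOMERS")`.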

The Table source operator can access remote tables via JDBC directly or by using nicknames for tables in federated data sources.

The Table source operator supports SELECT list and WHERE condition properties. Using these properties can perform better than creating a Table source operator that is followed by a Where operator in the mining flow. When the source table is a remote table, the SELECT list and WHERE condition are evaluated in the remote database. This reduces the load on the SQL execution database and the amount of data that is transferred. In addition, the remote database can use its own indexes and statistics to evaluate the query, which can further improve performance.
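To make the data-transfer argument concrete, here is a toy model, not the product's implementation. The "remote" table is an in-memory list of row IDs, and the number of rows that leave it stands in for the data transferred to the SQL execution database. The class `PushdownModel` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model: compares rows transferred with and without evaluating the
// WHERE condition in the remote database.
final class PushdownModel {
    private PushdownModel() {}

    // Applies a WHERE-style condition to a list of rows.
    static List<Integer> where(List<Integer> rows, Predicate<Integer> condition) {
        List<Integer> kept = new ArrayList<>();
        for (Integer row : rows) {
            if (condition.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // With pushdown, the remote side filters first and only matching rows are
    // transferred. Without it, the whole table is transferred and a downstream
    // Where operator filters the rows locally afterwards.
    static int rowsTransferred(List<Integer> remoteRows, Predicate<Integer> condition, boolean pushdown) {
        return pushdown ? where(remoteRows, condition).size() : remoteRows.size();
    }
}
```

With 1,000 remote rows and a condition that matches every tenth row, pushdown transfers 100 rows instead of 1,000. The final result is the same either way; only the volume transferred and the filtering work left to the SQL execution database differ.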

To reduce the execution time of a mining flow on very large tables, you can optionally specify a sampling rate so that the flow processes a random subset of the data. The sampling method depends on the type of source:

For tables, the Table source operator uses DB2 TABLESAMPLE SYSTEM() REPEATABLE() to compute a repeatable sample.
For views, the Table source operator uses the DB2 rand() function to compute the sample.
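The two sampling methods can be sketched as SQL text. The class `SampleClause`, the `sampledSelect` method, and the example names are hypothetical, and the exact statement that the operator generates may differ. The sketch assumes that the sampling rate is a percentage greater than 0 and at most 100, that base tables use DB2 `TABLESAMPLE SYSTEM(p) REPEATABLE(seed)`, and that views keep each row when `RAND()`, which returns a value between 0 and 1, falls below `p / 100`.

```java
// Hypothetical helper: builds a sampled SELECT for a base table or a view.
final class SampleClause {
    private SampleClause() {}

    // percent: sampling rate in (0, 100]; seed: used only for base tables,
    // where REPEATABLE makes repeated runs return the same sample.
    static String sampledSelect(String source, boolean isView, double percent, int seed) {
        if (!(percent > 0 && percent <= 100)) {
            throw new IllegalArgumentException("sampling rate must be in (0, 100]: " + percent);
        }
        if (isView) {
            // Each row is kept with probability percent / 100.
            return "SELECT * FROM " + source + " WHERE RAND() < " + (percent / 100);
        }
        // TABLESAMPLE follows the table reference in the FROM clause.
        return "SELECT * FROM " + source
                + " TABLESAMPLE SYSTEM(" + percent + ") REPEATABLE(" + seed + ")";
    }
}
```

For example, a 10 percent sample of a hypothetical `BANK.CUSTOMERS` table becomes `SELECT * FROM BANK.CUSTOMERS TABLESAMPLE SYSTEM(10.0) REPEATABLE(42)`. DB2 may implement `SYSTEM` sampling at page rather than row granularity, so the sample size is approximate.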