© 2001 International Business Machines Corporation. All rights reserved.
The Virtual Table Interface (VTI) is an extension to the DataBlade API that can be used to develop "Primary Access Methods". A Primary Access Method can be used to allow users to access legacy data that is not stored in an Informix database as if it were an Informix table.
This paper will help you understand how VTI works, so that you can make a more informed decision as to whether the development of a new Primary Access Method is the best approach to the problem you are trying to solve.
An Overview of the Capabilities and Limitations of VTI
VTI (Virtual Table Interface) was developed to provide a consistent method for developing gateways to data that is not stored in Informix tables. The data might be in flat files, in competitors' databases, cached in memory, or out on the net; to the end user, it looks exactly as if it's stored in Informix tables. In fact, it's in "virtual" tables.
Informix partners are quick to recognize the benefits of this technology, but they should be aware that this is not one of the simpler IDS features to use. According to an internal ATG (Advanced Technology Group) memo, "[The Virtual Table Interface] is considered an advanced feature that will be used by a few highly skilled developers". The decision to undertake a VTI development projectshould not be taken lightly!
This section will describe the architecture of VTI, from the developer's point of view. It will list the individual modules that make up the package, and it will explain how they fit together. Without going into excruciating detail, the goal will be to explain what the server does when it encounters a request to perform an operation on a "virtual table", and what the developer needs to do to allow the server to access the external data that the "virtual table" represents.
The Virtual Table Interface consists of:
A group of "Purpose Functions" that the server calls to implement SQL calls.
A group of Descriptors that are used to pass data between function calls.
A group of API functions that extend the DataBlade API.
VTI also provides three new SQL statements:
CREATE PRIMARY ACCESS_METHOD <method name> ( AM_BEGINSCAN = <method's beginscan function>, AM_GETNEXT = <method's getnext function>, ... ); ALTER ACCESS_METHOD <method name> ADD|DROP|MODIFY <purpose category> = <routine name | quoted string> DROP ACCESS_METHOD <method name> RESTRICT |
and an extension to the syntax of the "CREATE TABLE" statement :
CREATE TABLE ... USING <access method name> |
How do the Modules Fit Together?
Let's start with purpose functions.
They look like SPL functions, they act like SPL functions, but you won't find them in the sysprocedures catalog. What are they?
Purpose functions can probably best be thought of as, well, "virtual" functions. VTI defines a group of them, and the server executes them in various combinations to carry out SQL operations on a virtual table. For example, to insert a row, the server will call am_open, am_insert, and am_close.
But if they're only "virtual" functions, how do they get executed?
The answer is simple: The server looks in the sysams catalog and executes the function that it finds there. For each access method registered in the database, sysams contains a mapping from the generic purpose function name to the FunctionID of the actual function that implements it for that access method. How did it get there? You, the developer, put it there when you created your access method:
CREATE PRIMARY ACCESS METHOD mymethod ( AM_CREATE = my_am_create, AM_BEGINSCAN = my_am_beginscan, ... ); |
"my_am_create" is your implementation of the AM_CREATE purpose function, "my_am_beginscan" is your implementation of the AM_BEGINSCAN function, and so on.
Unlike the generic purpose function names, yours are real functions. You wrote them using the DataBlade API and its VTI extensions, you compiled them into a shared object, and you registered them in the database with CREATE FUNCTION statements. When you registered each one with CREATE FUNCTION, the server assigned it a unique FunctionID, and when you associated it with a generic purpose function name using CREATE PRIMARY ACCESS_METHOD, the server put an entry in the sysams system catalog mapping its FunctionID to that purpose function name.
So now, when the server is asked to create a table using "mymethod", it will look up "mymethod" in sysams, find the FunctionID in the am_create column, and execute it. If you have done your work correctly, then the equivalent of a table will be created in your data store, and an entry will be made in your systables system catalog to identify it to the server.
The second part of the VTI package is a set of descriptors.
VTI defines 4 descriptors to hold information that gets passed to the C functions that you write to implement the purpose functions for your access method. (In fact, it defines many more that are used internally -- see miami.h for a complete list.)
When the server calls one of your purpose functions, it passes it a descriptor of the appropriate type, filled in with all the information your function will need to carry out its work. Note that two of the descriptors (the Scan Descriptor and the Table Descriptor) also contain userdata fields that you can use to store information specific to your implementation.
The descriptors that VTI defines are:
MI_AM_TABLE_DESC -- Table Descriptor
MI_AM_SCAN_DESC -- Scan Descriptor
MI_AM_QUAL_DESC -- Qualification Descriptor
MI_AM_TSTATSDESC -- Table Statistics Descriptor
VTI also uses the Row Descriptor (MI_ROW) defined by the DataBlade API.
Finally, VTI provides a collection of Access Method API Functions.
This is where the real work lies. The function that you provide for each of the generic purpose functions is called by the server and passed a filled-in instance of a descriptor of the appropriate type. Your job is to access your external data and, using information from the passed-in descriptor plus your knowledge of your data store, do whatever is appropriate to the purpose function that was called.
Let's look at one specific example: am_open:
Suppose you created your access method with this statement:
CREATE PRIMARY ACCESS METHOD mymethod ( AM_BEGINSCAN = my_begin_scan, AM_GETNEXT = my_get_next, AM_OPEN = my_open, ... ) |
This implies that you've provided several SQL functions, including one called "my_open". Each of these functions is based on a C language routine. For example:
CREATE FUNCTION my_open (ptr POINTER) RETURNING integer EXTERNAL NAME "$INFORMIXDIR/extend/mymethod/mymethod.bld(my_open)" LANGUAGE C END FUNCTION; |
Note that my_open()is defined to take an argument of type POINTER. The server will pass a reference to an MI_AM_TABLE_DESC to your function, but there is no SQL data type that corresponds to this structure. The POINTER data type is provided to allow opaque types to be passed to SQL functions under these circumstances.
What does your my_open() C function do? Much of that depends on the requirements of your external data store, but your work might include:
Verifying that the user has authority to open the table or file.
Initializing a userdata structure with information that subsequent calls will require. Hint: Be sure to allocate the memory for this structure using mi_dalloc(), with PER_ROUTINE duration so that it will persist across function calls.
Opening an external file and saving a file descriptor, or doing the equivalent of an mi_open() for an external relational database.
Obtaining a row descriptor for the table, which would allow subsequent function calls to process data from individual columns.
etc.
This work is done with a combination of VTI API calls, DataBlade API calls, and standard C.
A large percentage of the VTI API functions are accessor functions. For example, you can retrieve the name of the table to which an MI_AM_TABLE_DESC refers by making a call to mi_tab_name(). You can get the name of the table's owner with a call to mi_tab_owner().
See the Appendix for a list of VTI API functions by descriptor.
When you issue an SQL SELECT statement against a virtual table, VTI initiates a series of function calls. First, it calls your am_scancost function (if you provided one) to help the optimizer choose the best query execution plan. Next, it calls your am_open function. Finally, it initiates a scan. (Note: the VTI Programmer's Manual includes a detailed breakdown of which purpose functions get called for each SQL statement.)
The server initiates a scan by calling your access method's am_beginscan routine. It then makes one or more calls to your am_getnext routine. am_getnext is called repeatedly until the query has been satisfied, then your am_endscan function is called. All three of these functions take an MI_AM_SCAN_DESC as input.
When is a query satisfied? If the query doesn't have a WHERE clause, then the query is satisfied when all of the rows have been returned. If the query does include a WHERE clause, then the query is satisfied when all of the rows that match the WHERE clause have been returned.
For each row in your data source that qualifies, your am_getnext functionwill build a composite (row) type using mi_row_create(). It sets the retRow parameter that was passed in to the value of this row, sets the retrowid parameter to the id of the row, and returns with the value MI_ROWS. When you have passed back the last row, you return with the value MI_NO_MORE_RESULTS, and the server executes your am_endscan routine.
A qualification is a WHERE clause. VTI reformats the WHERE clause and stores it in a qualification descriptor (an MI_AM_QUAL_DESC structure). The qualification descriptor, in turn, is encapsulated in a scan descriptor (an MI_AM_SCAN_DESC structure), which is passed to your access method's scan routines.
In a typical implementation, the am_beginscan function calls mi_scan_quals() to extract the qualification descriptor from the scan descriptor. Your am_getnext() routine uses that information to qualify each row that it returns to the server.
A qualification might be a simple predicate, e.g., "where column_name = 1", or it could be a boolean expression containing an array of MI_AM_QUAL_DESC structures, each of which contains either a simple predicate or another array of boolean expressions. Your task is to break the qualification down into a series of simple predicates, and to assign a value of MI_VALUE_TRUE, MI_VALUE_FALSE,or MI_VALUE_NOT_EVALUATED to each.
When your access method has processed each of the simple predicates in a qualification, it makes a call to mi_eval_am_qual to finish evaluating the qualification. This will cause the server to evaluate any predicates that were set to MI_VALUE_NOT_EVALUATED and to assign a value of MI_VALUE_TRUE or MI_VALUE_FALSE to the statement as a whole. If MI_VALUE_TRUE, then the row satisfies the qualification, and you return it to the server, otherwise you skip to the next row.
This section is a random collection of bits that are either missing from the current docs, or that are important, but just aren't obvious.
The VTI Programmer's Manual describes the three new SQL commands that are provided with VTI (CREATE PRIMARY ACCESS_METHOD, ALTER ACCESS_METHOD, and DROP ACCESS_METHOD), but it doesn't provide any detail on the extensions that VTI provides for CREATE TABLE. This information is in the Informix Guide to SQL, but it's worth taking note of it here, too:
CREATE TABLE table_name
(
column_one integer,
column_two varchar(20),
...
)
USING my_access_method
(
identifier1 = "value 1",
identifier2 = "value 2",
...
);
|
The "identifier" block is optional, and can be used to store any parameters that your access method will use. They are stored in the MI_AM_TABLE_DESC as an mi_string, and can be retrieved with the mi_tab_am_param() API call.
A new data space has been defined for IDS version 9: Extspace. An extspace is a description of an external location. In practice, it is a way of associating a logical name with an arbitrary text string. The mi_tab_spacename() API call can be used to retrieve the name of the space in which a virtual table is stored.
To create an extspace:
onspaces -c -x <extspace name> -l <location> |
The "extspace name" is the logical name of the external space, and the "location" is a file path, network address, or other string that identifies the location of the space. Note that this string is limited to 265 bytes in length.
To drop an extspace:
onspaces -d <extspace name> |
The cac option can be used to display cached data associated with an access method:
onstat -g cac am <access method name> |
displays the cached data for the specified access method ("<access method name>"). If the "am" keyword is included, but the access method name is omitted, then the cached data associated with all access methods is displayed.
onstat -g dic <table name> |
will display both the cached data for a virtual table, and the configuration parameters that were specified when the table was created.
4. caching. (Note: This section is stolen directly from the Functional Spec)
The VTI developer has access to the following hooks for caching information across API calls.
Scan Level. The access method can cache information during a scan (am_beginscan -> am_endscan), by using the mi_scan_setuserdata and mi_scan_userdata routines on an MI_AM_SCAN_DESC. The server will not free this memory when the scan descriptor is destroyed. The access method can either free it at am_endscan time or let the server free it when the memory duration expires. A memory duration of PER_COMMAND is sufficient for the time a scan is open.
Open Table level. The access method can cache information during a table/index open, (am_open -> am_close), by using the mi_tab_setuserdata and mi_tab_userdata routines on an MI_AM_TABLE_DESC. The server will not free this memory when the table descriptor is destroyed. The access method can either free it at am_close time or let the server free it when the memory duration expires. A memory duration of PER_STATEMENT is sufficient for the time a table/index is open.
Named Memory. The access method can allocate memory of any duration and associate a name with it by calling the SAPI routine mi_named_alloc or mi_named_zalloc. Later, the access method can retrieve the address of that memory by calling mi_named_get with the same name. Eventually, the access method can free the memory by calling mi_named_free.
Module Initialization. When the server dynamically loads a shared object, it will call an initialization routine in that shared object (if one exists) just after loading the shared object. The routine name must be "module_init". It is not otherwise made known to the server (a CREATE FUNCTION statement is not needed). So if the access method needs to allocate very long duration memory (PER_SYSTEM), it would do so from the module_init routine. The access method needs to be careful how it uses that memory because the access method routines are executing in a multi-threaded environment. (Note: By default, any memory created within a module_init routine is allocated with PER_SYSTEM duration. --www)
Currently, VTI does not support 2-phase commit, so transaction management for external data is problematical.
Changes in transaction state can be monitored by registering a callback to check for a "type" value of MI_EVENT_END_STMT, MI_EVENT_END_XACT, or MI_EVENT_END_SESSION. This makes it relatively easy to manage simple transactions by caching any updates and instantiating them only when the transaction ends. (Or not instantiating them if the transaction rolls back.) See the DataBlade API Programmer's Manual for a complete discussion of State-Transition Events and Callbacks.
The difficulty comes when an error occurs during commit. If the transaction affects data from both the external data source and internal Informix tables, then it is possible for an update to occur on one side but not the other. There is no perfect way to handle this condition.
If the data associated with a virtual table is completely external to the database, i.e., if it is in an extspace, then it is not possible for the server to provide logging and recovery services for it. If you need this capability for your access method, then you'll need to implement it on your own.
However, if the data is maintained in an sbspace, then this is a possibility. The data must be stored in a smart blob that was created with the MI_LO_ATTR_LOG flag turned on. See the mi_lo_create, mi_lo_spec_init, and mi_lo_specset_flags routines in the DataBlade API Programmer's Manual for details.
As with logging and recovery, locking is beyond the control of the server if the data is stored in a location that is external to the server.
If the data is stored in an sbspace, however, the server does provide this service. It's important to realize that the granularity of the locking is at the smart blob level -- i.e., if any data in a smart blob is accessed, then the entire smart blob is locked for the duration of the transaction. So, if your external data is stored as one table per smart blob, then you would have, by default, table-level locking.
To enable parallelization for a VTI implementation, the following VTI UDRs must be parallelizable:
am_open
am_close
am_beginscan
am_endscan
am_getnext
am_rescan
am_getbyid
We will refer to these as the "basic set".
For a SELECT query involving VTI in parallel, just the above UDRs must be parallelizable.
For a DELETE query involving VTI in parallel, along with the above "basic set" UDRs, am_delete must be parallelizable.
Similarly for UPDATE, am_update must be parallelizable.
Similarly for INSERT, am_insert must be parallelizable. One way of utilizing INSERT query parallelism is to run either "insert ... select" or "select .. into TEMP .." statements in parallel.
See the VTI Programmer's Manual for more details. Additional information about parallelization for user defined routines in general can be found in the DataBlade API Programmer's Manual.
VTI Programmer's Manual
DataBlade API Programmer's Manual
Terms and acronyms used by this tech note include:
access method
A collection of server functions that IDS uses to access and manipulate an index or a table.
accessor function
An accessor is a function that allows your program to set or query the values stored in an opaque data type. They permit the provider of the opaque type to change its internal implementation and structure without affecting applications that may have been built on top of it.
descriptor
In VTI, a descriptor is an opaque data structure that is passed to one or more purpose functions. The VTI API provides accessor functions to set or retrieve information in each of the descriptors that it uses.
external table
Refers to any data that is stored outside the database and accessed by a gateway. Note that this could include data that is stored in an sbspace and manipulated by an access method.
gateway
An access method that allows data that is stored outside the database to be manipulated with SQL commands, just as if it were stored in database tables.
primary access method
A set of routines that perform operations on an external table. VTI is designed to be used in the development of primary access methods.
Any of a group of function names that the server calls to implement an SQL statement against a virtual table. It is the developer's responsibility to provide the corresponding functions for a specific gateway, and to map them to the generic names via the CREATE PRIMARY ACCESS_METHOD statement. See the VTI Programmer's Manual for a complete list and description of purpose functions.
pushing down
One of the goals of a gateway development project is to let the external data source do as much of the work of qualifying rows as possible. If a high percentage of non-qualifying rows are eliminated at the external data source, then fewer rows are returned to the server, which reduces network traffic, which results in a more efficient gateway. The process of deferring to the external data source for the purpose of evaluating qualifications is known as "pushing a qualification down"
A qualification is essentially a WHERE clause -- it's the logic that's used to selectively return rows for a query. In VTI, qualifications are encoded into an MI_AM_QUAL_DESC descriptor.
scan
A scan is the most common operation that a gateway will perform. It involves reading through the rows of a virtual table, matching each against the selection criteria in the WHERE clause, and returning qualifying rows.
secondary access method
A set of routines to implement an external index. The Virtual Table Interface (VII) was designed for developing secondary access methods, e.g. rtree.
simple predicate
An individual element in a WHERE clause, expressed as:
f(column, constant)
f(column, constant, OUT)
f(constant, column)
f(constant, column, OUT)
f(column)
f(column, OUT)
where "f" is a function name and "OUT" indicates that the function returns a value. Note that the expression "where x = 1" would be expressed as "where equal(x,1)" in the qualification descriptor.
Appendix: Descriptors and VTI API Functions
This appendix lists the VTI API functions that operate on each of the descriptors defined by the Virtual Table Interface. This is provided to answer the question: "I have a <table, scan, qualification, statistics> descriptor. What can I do with it?"
| mi_eval_am_qual | evaluate any parts of a qualification that have not yet been set to MI_VALUE_TRUE or MI_VALUE_FALSE by the access method. |
| mi_init_am_qual | re-initialize all parts of a qualification to MI_VALUE_NOT_EVALUATED |
| mi_qual_boolop | get the operator type in a complex qualification expression |
| mi_qual_column | retrieve the column number in a simple predicate (0-based) |
| mi_qual_commuteargs | determine if a simple predicate is of the form f(constant, column) instead of f(column, constant) |
| mi_qual_constant | retrieve the constant value in a simple predicate |
| mi_qual_constisknown | determine if a constant in asimple predicate is known (Note: This function is deprecated.) |
| mi_qual_constisnull | determine if the constant in a simple predicate is NULL |
| mi_qual_funcname | retrieve the function name in asimple predicate |
| mi_qual_handlenull | determine if the function for asimple predicate handles NULL values |
| mi_qual_issimple | determine if a qualification is a simple predicate |
| mi_qual_needoutput | determine if the SQL statement uses an output parameter for a function |
| mi_qual_negate | determine whether a simple predicate is to be negated (NOT operator) |
| mi_qual_nquals | retrieve the number of qualifications in an AND or OR qualification expression |
| mi_qual_qual | retrieve the nth qualification inan AND or OR qualification expression |
| mi_qual_setoutput | set the nth output parameter value from the function indicated by the mi_qual_funcid() function |
| mi_qual_setvalue | set a simple predicate result to MI_VALUE_TRUE, MI_VALUE_FALSE, orMI_VALUE_NOT_EVALUATED |
| mi_qual_value | get asimple predicate result |
| mi_scan_forupdate | determine if the scan is executing for an UPDATE or DELETE statement |
| mi_scan_isolevel | get the isolation level that the server expects for the table being scanned |
| mi_scan_locktype | retrieve the lock type that the server expects for the table being scanned |
| mi_scan_newquals | determine whether the qualification criteria are new. (typically used with am_rescan()) |
| mi_scan_nprojs | determine the number of columns involved in a scan. |
| mi_scan_projs | identify the columns involved in a scan. |
| mi_scan_quals | extract the MI_AM_QUAL_DESC from a scan descriptor |
| mi_scan_setuserdata | store user-defined data in the scan descriptor |
| mi_scan_table | extract the MI_AM_TABLE_DESC from the scan descriptor |
| mi_scan_userdata | retrieve the user-defined data from a scan descriptor |
| mi_tab_am_param | retrieve any user-defined configuration values from a table descriptor |
| mi_tab_id | retrieve the unique id of the table from a table descriptor |
| mi_tab_isolevel() | establish the isolation level for the table. |
| mi_tab_istable | determine whether the value encapsulated in an MI_AM_TABLE_DESC is a table |
| mi_tab_mode | retrieve the I/O mode of the table |
| mi_tab_name | get the table name from a table descriptor |
| mi_tab_numfrags | get the number of fragments in the table |
| mi_tab_owner | identify the owner of a table |
| mi_tab_partnum | get the unique storage number (tblspace number) of a table |
| mi_tab_rowdesc | get the row descriptor for a table |
| mi_tab_setuserdata | store user-defined data in a table descriptor |
| mi_tab_spaceloc | get the location information for a table stored in an extspace |
| mi_tab_spacename | get the name of the sbspace, dbspace,, or extspace where the table is stored |
| mi_tab_spacetype | get the type of space (sbspace, dbspace, or extspace) specified when a user creates a table fragment |
| mi_tab_userdata | retrieve the user-defined data from a table descriptor |
| mi_tstats_setnpages | set the number of pages in the table. this value is stored in the npused column of the systables system catalog |
| mi_tstats_setnrows | set the number of rows in the table. this value is stored in the nrows column of systables |
IBM, DB2, Informix, and WebSphere are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both.
Windows and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
IBM copyright and trademark information
- VTI Programmer's Manual
- DataBlade API Programmer's Manual
- INFORMIX-Universal Server:
Secondary Access Methods




