Level: Intermediate Yinghong Fei (yfei@us.ibm.com), Software Engineer, IBM
07 Feb 2008 External Database Access (EDA) has been created to allow U2 users the
flexibility of storing some or all of their data in external relational database
management systems (RDBMS). While EDA documentation describes all the EDA features and their usages, the purpose of this article is to explain how you can use EDA features to satisfy various requirements and to discuss some common issues that arise when converting and using EDA files.
Introduction
EDA allows you to store U2 data in an external RDBMS, such as IBM® DB2® or
Microsoft® SQL Server,
and still access it from U2 applications that use the non-first normal form (NF2) data model.
 | |
EDA is a separately licensed feature for UniData. It will be available for UniVerse at
V11. For more formation, email U2AskUs@us.ibm.com.
|
|
U2 provides a variety of ways to expose the U2 data to relational applications
through Open Database Connectivity (ODBC), JDBC,
Object Linking and Embedding Database (OLE DB), ADO.NET, and so on. Such relational applications access U2 data as if it were first normal form (1NF) data.
In addition, the Basic SQL Client Interface (BCI) allows U2 applications to access external RDBMS data using
Basic programs and ECL/TCL commands through the ODBC interface.
Unlike the two methods above, EDA provides a U2 user with a transparent way to store and access the data in an
external RDBMS. Once an EDA file is created, mapped, and converted, the file is used as a normal U2 file in
Basic programs and ECL/TCL commands. The U2 database engine does all the mapping,
command translation, and
optimization.
This article focuses on how the U2 engine handles the data model mapping, command translation, and
query and record I/O optimization.
EDA mapping
Data model
U2 databases support the NF2 data model, also called a multi-value data model, which
allows one or more values to be stored in each attribute. Most external RDBMSs
support the 1NF data model only, in which
each attribute can only contain one value.
In order to store NF2 data in a 1NF database, EDA needs to do the data model mapping. A group of multi-valued
attributes are mapped into a table in an external RDBMS. To preserve the order of multi-values, an index
to these multi-values also needs to be stored in the RDBMS.
Each U2 file record contains its record ID (@ID) and 0 or more single-valued,
multi-valued, and multi-subvalued
attributes. The following sections describe the mapping of these attributes to external RDBMS tables and columns.
Single-valued attribute mapping
A primary table is created in the external RDBMS to contain the record ID (@ID) and single-valued attributes.
The @ID column is the primary key of this table.
Multi-valued attribute mapping
Since in a 1NF database, each column of each row can only store one value, in order to store multiple values contained
in a multi-valued attribute, a separate 1NF table is created in the external RDBMS, whose primary key consists
of the record ID (@ID) and the index to the values in the multi-valued attribute
(MV-POS). The @ID column of this
table is a foreign key to the primary table. It connects multi-values stored in this table with the corresponding
row of the primary table, and the MV_POS column stores the index to the values in the multi-valued attribute.
Multi-valued association
If several multi-valued attributes are associated, then all of the well-defined attributes of the same association
can be defined in one RDBMS table. Non-associated, multi-valued attributes are independent and therefore must be
stored in separate tables. Note that when two or more such non-associated attributes are used in the same query,
each multi-valued attribute table is equal-joined with the primary table on the @ID column and a Cartesian
product is produced between the multi-valued attributes tables for each @ID. This can
negatively affect the performance. For more information, refer to the Performance considerations section.
Multi-subvalued attribute mapping
In U2, multi-subvalued attributes are always considered to have a child-parent
relationship with multi-valued attributes of the same association. Similar to the multi-valued attributes mapping, multi-subvalued (MS) attributes are stored
in a separate external RDBMS table, whose primary key consists of the record ID (@ID) and two columns containing the
index to multi-valued attributes (MV_POS) and the index to multi-subvalued attributes (MS_POS). The combination of
@ID and MV_POS columns is the foreign key to the multi-valued attributes table. It connects multi-subvalues stored in
this table with the corresponding row of the multi-valued attribute table, since multi-subvalued attributes are always
associated with multi-valued attributes of the same association.
Outer join
In order to retrieve U2 data stored in the external RDBMS, as described above, the EDA engine must join a primary table
with multi-valued and multi-subvalued attribute tables on @ID. When there are no values in a multi-valued attribute
of a particular record, there is no record stored in the multi-valued attributes table for this record ID. If
an inner-join is used to retrieve the data, no data for this record ID is returned. Therefore, the EDA engine
needs to use OUTER-JOIN between the primary table and the multi-valued attributes table and between the multi-valued
attributes table and the corresponding multi-subvalued attributes table.
The following is a modified STUDENT file in the UniData DEMO account, where the single-valued attribute ADVISOR is
replaced with a non-associated multi-valued attribute ADVISORS:
Listing 1. STUDENT dictionary
@ID......... TYP LOC... CONV MNAME......... FORMAT SM ASSOC.....
ID D 0 STUDENT 12R###-## S
-####
LNAME D 1 Last Name 15T S
FNAME D 2 First Name 10L S
MAJOR D 3 Major 4L S
MINOR D 4 Minor 4L S
ADVISORS D 5 Advisor 8L MV
SEMESTER D 6 Term 4L MV CGA
C_NBR D 7 Crs # 5L MS CGA
C_GRD D 8 GD 3L MS CGA
GPA1 V SUBR('GPA1',C MD3 GPA 5R S
OURSE_HOURS
,COURSE_GRD)
|
If all dictionary attributes are chosen to be explicitly mapped to EDA, then the
S-MV-MS mapping hierarchy, will look like Figure 1.
Figure 1. STUDENT table structure
Note: The STUDENT record may contain more than eight attributes. The remaining attributes that are not described in the
dictionary are stored in an extra column of the primary table, called RECORD_BLOB, whose purpose is to
store unmapped attribute values.
There are four tables created in the external RDBMS, as shown in listings 2, 3, 4,
and 5:
Listing 2. STUDENT tables on external RDBMS
STUDENT primary table
ID p_key
LNAME
FNAME
MAJOR
MINOR
RECORD_BLOB
|
Listing 3. STUDENT_ADVISORS tables on external RDBMS
STUDENT_ADVISORS MV table
ID p_key f_key (STUDENT)
MV_POS p_key
ADVISORS
|
Listing 4. STUDENT_CGA_MV tables on external RDBMS
STUDENT_CGA_MV MV table
ID p_key f_key (STUDENT)
MV_POS p_key
SEMESTER
|
Listing 5. STUDENT_CGA_MS tables on external RDBMS
STUDENT_CGA_MS MS table
ID p_key f_key (STUDENT_CGA_MV)
MV_POS p_key f_key (STUDENT_CGA_MV)
MS_POS p_key
C_NBR
C_GRD
|
Listing 6. U2 record
STUDENT..... Last Name...... First Name Major Minor Advisor. Term Crs # GD.
414-44-6545 Offenbach Karl CS PY Otis FA93 CS104 D
Eades MA101 C
FA100 C
SP94 CS105 B
MA102 C
PY100 C
|
A U2 record like that shown in Listing 6, is inserted into the corresponding
external RDBMS tables. as follows:
Listing 7. STUDENT table
ID LNAME FNAME MAJOR MINOR RECORD_CLOB
414446545 Offenbach Karl CS PY
|
Listing 8. STUDENT_ADVISORS table
ID MV_POS ADVISORS
414446545 1 Otis
414446545 2 Eades
|
Listing 9. STUDENT_CGA_MV table
ID MV_POS SEMESTER
414446545 1 FA93
414446545 2 SP94
|
Listing 10. STUDENT_CGA_MS table
ID MV_POS MS_POS C_NBR C_GRD
414446545 1 1 CS104 D
414446545 1 2 MA101 C
414446545 1 3 FA101 C
414446545 2 1 CS105 B
414446545 2 2 MA102 C
414446545 2 3 PY100 C
|
To get the data from the external RDBMS into U2, the following SQL SELECT statement
is sent to the RDBMS by the EDA engine:
Listing 11. SQL SELECT statement
SELECT STUDENT.ID,
STUDENT_ADVISORS.MV_POS,
STUDENT_CGA_MV.MV_POS,
STUDENT_CGA_MS.MS_POS,
STUDENT.LNAME,
STUDENT.FNAME,
STUDENT.MAJOR,
STUDENT,MINOR,
STUDENT_ADVISORS.ADVISORS,
STUDENT_CGA_MV.SEMESTER
STUDENT_CGA_MS.C_NBR,
STUDENT_CGA_MS.C_GRD
FROM STUDENT
LEFT OUTER JOIN
STUDENT_ADVISORS
ON (STUDENT.ID = STUDENT_ADVISORS.ID)
LEFT OUTER JOIN
STUDENT_CGA_MV
ON (STUDENT.ID = STUDENT_CGA_MV.ID)
LEFT OUTER JOIN
STUDENT_CGA_MS
ON (STUDENT_CGA_MV.ID = TUDENT_CGA_MS.ID
AND
STUDENT_CGA_MV.MVPOS = STUDENT_CGA_MS.MV_POS)
ORDER BY 1, 2, 3, 4;
|
The returned result set is shown in Listing 12.
Listing 12. Returned result set
414446545 1 1 1 Offenbach Karl CS PY Otis FA93 CS104 D
414446545 1 1 2 Offenbach Karl CS PY Otis FA93 MA101 C
414446545 1 1 3 Offenbach Karl CS PY Otis FA93 FA100 C
414446545 1 2 3 Offenbach Karl CS PY Otis SP94 FA100 C
414446545 1 2 2 Offenbach Karl CS PY Otis SP94 MA101 C
414446545 1 2 3 Offenbach Karl CS PY Otis SP94 FA100 C
414446545 2 1 1 Offenbach Karl CS PY Eades FA93 CS104 D
414446545 2 1 2 Offenbach Karl CS PY Eades FA93 MA101 C
414446545 2 1 3 Offenbach Karl CS PY Eades FA93 FA100 C
414446545 2 2 3 Offenbach Karl CS PY Eades SP94 FA100 C
414446545 2 2 2 Offenbach Karl CS PY Eades SP94 MA101 C
414446545 2 2 3 Offenbach Karl CS PY Eades SP94 FA100 C
|
This result set is nested by the EDA engine to construct the U2 record shown in Listing 6.
Strong data types and EDA non-conforming data
A U2 table consists of a data file and an associated dictionary. The dictionary provides formatting information for
the query tools, but is optional and not used by U2 to enforce either the data typing or data integrity. There may
be zero, one, or multiple definitions in the dictionary for a given U2 attribute.
EDA uses the conversion, formatting, and MV and MS information of a dictionary attribute to derive the corresponding external
RDBMS column data type. The formatting and conversion information determines the corresponding external RDBMS data
type. The MV and MS information determines the primary table, MV tables, and MS tables structure.
It is very important to create or choose a correct dictionary attribute for each U2 attribute intended to be mapped to
EDA. A well defined U2 attribute should have accurate formatting, conversion, and MV and MS information in the
dictionary.
All U2 data values are stored as character strings, even for numeric or date and time attributes. Attribute values have no
length limitation. To store U2 data in external RDBMS strongly data typed columns, the EDA engine needs to map
U2 data into the corresponding external RDBMS data types, and to handle the exceptions arising from the use of U2
data that is considered invalid for the specified external RDBMS data type or column length. If a U2 application
tries to insert invalid data, the corresponding insert operation on the external RDBMS
fails. Such cases are
called nonconforming data cases in EDA.
EDA provides an option to store nonconforming data in an external RDBMS table in a
CLOB format, so the data is not lost and all U2 operations, such as ECL commands and Basic statements, treat nonconforming data as normal,
thus allowing a U2 application to continue running properly.
Nonconforming data cannot be comprehended by the external database applications though because the data is stored in
a native U2 format, with possibly embedded value and sub-value marks. Users need to fix the invalid data in the
external table in order to make it available to native RDBMS applications.
An ECL command SELECT.NONCONFORMING returns IDs of all records that contain nonconforming data.
U2 attributes that are not explicitly mapped
It is not required to map all U2 file attributes to distinct external database columns. There are various reasons for
mapping only a subset of dictionary attributes. A U2 dictionary may only define
attributes that are used in a U2
query, while the missing dictionary locations are used by U2 Basic applications that do not require a dictionary.
Some attributes are intended to contain non-string type data, such as integer or date
and time, and indeed contain proper
values. However, other attributes of this kind may contain both integer and string data or a multi-valued attribute may
have a valid date as its first value and a string as its second value. Both cases are easily handled by U2
applications, but the latter case produces an exception when the data is stored in a strongly typed external
database.
EDA supports a selective mapping strategy. A user can begin the EDA mapping task by selecting attributes that are well
defined and contain all valid data first. This help with reducing the amount of nonconforming data.
After a U2 file has been converted to an EDA file, a user can still work on attributes that have not been explicitly
mapped by fixing the data, finding the correct data type, and mapping more attributes to EDA by using the remapping
feature of the EDA Schema Manager tool.
Choosing carefully which U2 attributes to be mapped to EDA makes the conversion
easier. You should first consider
those attributes whose data need to be exposed to external RDBMS applications and also those that help to
improve the performance, which is the topic of the next section.
Performance considerations
The EDA engine accesses the external database through a client-server interface, similar to ODBC. When compared to the
performance of the U2 engine native access to data stored in U2 files, EDA is slower
with reading, writing, and querying, since a client-server interface is slower than a
direct database access. The NF2-to-1NF mapping is also a key part
of this slowness. As shown in the example above, reading one record in U2 turns into a complex query on
the external RDBMS side with outer join operations on several tables returning a set of result rows. The more
non-associated multi-values a record has, the bigger the result set is, and the more associations are mapped in
the EDA schema, the more tables are joined in the query. That is why selecting U2 attributes for mapping to EDA
judiciously is important for performance considerations.
EDA also provides the WHOLE RECORD option to improve the performance of READ operations. The U2 Basic READ statement
requires the retrieval of the entire record, which may result in a large SQL query being
issued by the EDA engine,
which in turn negatively affects the performance. When the WHOLE RECORD option is turned on for a U2 file, in
addition to storing the mapped attributes data in the corresponding external table columns, the EDA engine stores
the entire U2 record in an extra column, called RECORD_BLOB. For a Basic READ, the EDA engine retrieves the contents
of the RECORD_BLOB column instead of executing a large outer join query.
As mentioned earlier, the EDA performance is also affected by the number of MV associations mapped to EDA. It is
important to only select the absolutely necessary attributes for mapping to EDA and define their SM dictionary
information correctly, so that the information, whether an attribute is single-valued,
multi-valued, or
multi-subvalued, is accurately represented in the dictionary. Such accuracy helps to improve performance.
EDA also provides a mechanism for mapping U2 virtual attributes. Mapping virtual attributes to EDA improves the
performance of U2 queries.
EDA implementation details
Map a virtual attribute as a generated field
A generated field is an EDA mechanism for mapping a virtual attribute by evaluating it in U2 and storing the values
in an extra column of the external RDBMS table.
In order to map a virtual attribute, drag it from the U2 File Dictionary pane to the EDA Map Schema pane and
chooses the appropriate type in the Attribute Details pane, as described in the EDA manual. Choosing the type of
DATA instructs the EDA Schema Manager tool to use the generated field mechanism to map
this virtual attribute.
Figure 2. EDA Schema Manager
GPA1 is a virtual-type attribute in the dictionary. The EDA Schema Manager tool
maps this attribute as a generated
field when the type of DATA is specified in the Attribute Details pane, as shown above. GPA1 is evaluated by the U2
engine, and the values are stored in the external RDBMS. U2 access can be optimized when GPA1 is used in a query,
especially if an index on the mapped column GPA1 is created on the external RDBMS side.
U2 index
If a U2 file has an index created on an attribute, it is advisable to map this attribute to EDA.
If an index exists on a data type attribute that has been selected for EDA mapping,
then the EDA Schema Manager tool automatically creates an index on the corresponding
column of the external database table during the conversion process.
If a virtual attribute is mapped as a generated field, then the EDA Schema Manager tool can create an index on the
corresponding column of the external database table during the conversion process, if the full mapping is selected.
In selective mapping mode, user can select the V-type attribute and map it to EDA as D-type column, and check the
index box.
Handle an empty string
U2 databases can store a zero length string, called an empty sting, in attributes of all data types. In an RDBMS, an
empty string cannot be stored in a numeric or a date/time column. EDA translates empty strings into NULL values
when storing data in an external RDBMS; translates NULL values to empty strings when returning data to the U2
engine. U2 applications do not need to handle empty string specifically.
WITH/WHEN condition
If a U2 query has a WHEN/WITH condition on a mapped attribute, the condition is
translated into a WHERE clause in the SQL SELECT statement shipped to the external RDBMS, which in turn improves the performance since the condition
is applied on the engine that owns the data and the amount of data passed back to U2 maybe significantly reduced.
The EDA engine translates U2 conditions according to the U2 condition semantics.
Empty strings in U2 conditions require a special handling. In U2, an empty string is smaller than any other value.
All comparison operations, such as <, <=, and <$gt;, include the empty string. This is different from SQL
conditions, where all operations involving the NULL value produce a false outcome, except the IS NULL clause.
Therefore, the EDA engine adds the IS NULL clause in order to retrieve NULL values and convert them to empty
strings.
For example, the returned result set is:
LIST CUSTOMER NAME WITH DUE_DATE < "01/01/07"
|
U2 returns the CUSTOMER record with the due date earlier than "01/01/07" and the due date that has no value.
The EDA engine translates this U2 query into:
SELECT NAME FROM CUSTOMER WHERE DUE_DATE < "01/01/07" OR DUE_DATE IS NULL;
|
Visibility and updatability of data on the external RDBMS side
As mentioned earlier, all U2 data is stored in external tables. However, only EDA-mapped fields are visible to external
RDBMS applications and tools. Only the columns containing data from the D-type
attributes are updatable by external database applications.
Those U2 attributes that have not been mapped are stored in their native U2 format in a RECORD_BLOB column of the
primary table. These attributes are not visible to external database applications, and cannot be updated
by such applications, but only by U2 applications.
U2 virtual attributes that are mapped to generated fields are visible to external RDBMS applications and can be
searched and indexed, but cannot be updated by such applications.
EDA files for which the WHOLE_RECORD option is turned on cannot be updated on the external database side because such updates result in a loss of data consistency due to different data being stored in the individual
columns and in RECORD_BLOB.
When an EDA file contains a primary table as well as an MV table and an MS table, you need to be
very careful during an INSERT or UPDATE from the external RDBMS side when providing the
MV_POS and MS_POS value, and follow the specific order. For instance, insert into the
primary table first, then insert into MV tables, then the MS tables.
The different ways to use EDA
In addition to the primary goal of EDA to convert and store U2 data in the external RDBMS tables and access it by both U2 and external database applications, there are other purposes for using EDA to simplify the task of accessing data
stored in external database tables.
Normalization of U2 data
EDA provides a very easy to use and effective conversion tool for converting NF2 data into the 1NF format. There
can be various reasons for needing to normalize U2 data.
Replicate a U2 file to an external RDBMS
If a user wants to make a copy of U2 data in an external database, then using the UniData replication might be a
good choice. Convert the U2 file into EDA in the subscriber account. The updates performed on the
original U2 file in the publisher account are then replicated to the EDA file in the
subscriber account. The external database tables are then updated by this replication process.
Users only need to map D-fields that they want to be visible on the external database side since the data is only
updated in the original U2 file and no updates are expected by external database applications. A U2 application
operates on the original U2 file and the external database application operates on the
external tables. There is no negative effect on the performance of either of these applications.
Import existing external database tables or views into EDA
BCI is the primary way to access existing external RDBMS tables from U2 applications. On the other hand, it is easier to use EDA than BCI since BCI requires the use of Basic extensions and a certain level of SQL knowledge. In order
to import existing external database tables or views into EDA, you need to define a U2 dictionary containing the
attributes matching the external table or view columns, run the EDA Schema Manager tool to generate the runtime mapping
file, and the data becomes accessible by U2 Basic applications and ECL commands.
Conclusion
This article discussed EDA mapping from NF2 to 1NF, the tips for getting
better EDA performance, details about EDA implementation, and several
creative ways that a U2 customer can use the EDA feature his or her business.
With EDA, U2 users can access the data stored in an external database much easier
than before. When a U2 file is converted into an EDA file, the U2 application
accesses the external data just like it would access U2 data normally, with no ODBC or
SQL, like a statement involved in a BP program.
Resources Learn
Get products and technologies
-
Build your next development project with
IBM
trial software, available for download directly from developerWorks.
Discuss
About the author  | |  | Yinghong Fei works for IBM U2 in Denver, Colo. She has more than 15 years of experience in developing U2 data servers. She covers several development areas, including data transfer from other PICK systems, internal memory usage, QUERY/SQL engine, query optimization, report formatting, UCI, the U2 ODBC server, and NF2/INF mapping
mechanism. Recently, she has focused on External Database Access for U2. |
Rate this page
|