While Informix Dynamic Server (
The purpose of this article is twofold. First to state these products are still being sold by
Red Brick Warehouse
Red Brick was originally founded by Ralph Kimball in 1987. It became a product company in 1990 and over the course of the next 15 years, produced significant technologies for data warehousing. Commonly used industry terms like Star Schema, Star Join, and Dimensional Modeling were first used at Red Brick.
Here are descriptions of some of the key technologies:
Star Join/Star Index
A dimensional model contains fact tables (which hold the quantitative data about a business – the facts being queried) and dimension tables (which hold data that describe the dimensional attributes). A dimension table is connected to a fact table by means of a foreign key reference.
All databases join tables in pairs in what’s known as a pairwise join. Red Brick recognized that they could improve performance by building a “pre-join” index consisting of foreign keys of dimension tables. The Starjoin algorithm allows multiple tables to be joined in a fast, single pass operation to yield fast query times. A key advantage for the Starjoin approach in Red Brick is that as you have more dimensions in a single query, the better is its performance as compared to other systems that deploy the pairwise join algorithm. Most database products in the industry have implemented Star Join in one form or another, but no company has implemented it using Star Index.
[Note that the Push-Down Hash Join feature being considered in
TargetIndex and TargetJoin
TargetIndex and Targetjoins make up Red Brick’s bitmap index technology. Bitmap index (also known as bit vectored index) maintains information by tracking each unique column value in a compact bit representation, with a pointer back to each row in the table with that specific value. Sybase-IQ is another vendor that has broadly touted this technology for many years.
TargetIndex exist on single columns that are weakly selective, i.e. where many rows would return based on that constraint. For example, GENDER column returns two values: M/F. Targetjoin is a join algorithm that uses target indexes to identify lists of candidate dimension rows; rows that exist on each list are then retrieved from the fact table.
[Note that the Multi-column index feature being considered for
Those familiar with the concept of Materialized Views or Summary Tables will recognize this feature as such. DB2 calls it MQT (Materialized Query Table). One of the most frequent operations performed in decision-support queries is the calculation of aggregate totals such as monthly and quarterly totals, revenue by product or customers, or other types of grouping analysis. In the absence of aggregates, these queries must read hundreds of thousands or million of rows in order to calculate and group the results, and it must do this each time the queries are run. Red Brick’s
The aggregate table is defined to
Versioning (Query Priority Concurrency)
Red Brick’s Versioning facility provides for real-time updates to a data warehouse by allowing data to flow into the warehouse without affecting query response times of the users. The versioning technique permits data modification transactions to occur on a separate version of the same data, while queries are in process. When the data is available for update, the versioned updates are then made the base data.
This facility may seem strange to those accustomed to working with OLTP databases as data in a database should always be kept up to date. In a
Table Management Utility (TMU)
Red Brick’s TMU is essentially a fast loader for the warehouse. What makes it different is that it is able to perform loading, do data conversion, perform referential integrity (RI) checks, build/update index all in a single pass, making it query ready without having separate steps to build indexes or check constraints. There were customers that bought Red Brick simply for its load speed.
The Red Brick TMU has different load modes, all to handle duplicates and new rows. They are: Append, Insert, Replace, Modify and Update. One innovative option unique to Red Brick is something called Auto Row Generation. When loading a data warehouse, it is inevitable that there will be “dirty” rows, i.e. missing dimension rows. Other database products will discard these offending rows. After the initial load, the offending rows must be examined, the appropriate dimension rows must be added, and the previously discarded rows must be reloaded, a very time-consuming process.
The Autorowgen feature gives the user the ability to automatically generate rows in the referenced (dimension) tables while the data is being loaded. With Autorowgen, when a row is automatically inserted into a dimension table, the primary key is populated with the missing value, and the rest of the columns are populated with the default value defined for them.
There are many more features in Red Brick designed for data warehousing that deserve attention. Even with the discussion of data warehouse appliances (in my previous entries), there is a need to enhance the core capabilities of the server itself.