Fred Ho's Blog
As a first time blogger, I'm wondering why anyone would want to see another blog. Even for the Informix loyalist, there are many other more knowledgeable and technically savvy people that have blogs on Informix products and other notable developments in the Informix world. So let me start by saying a few words about my past experiences. If that is in any way interesting, perhaps there are some common areas that warrant discussions.
I have been working in the computer industry and primarily in database development in one form or another for some undisclosed number of years. Places I've worked for include Burroughs (for which most people have never heard of), Sperry Univac, HP (for a very short time), Sun (yes, the Oracle one, also for a very short time), Tandem (14 years), Informix (Red Brick for about 3 years) before being acquired by IBM in 2001. Since then, I've managed teams on DWE, Cube Views, IDS Engineering, and in my current position, as manager of the Competitive Technologies and Enablement team.
My area of interest (not necessarily area of expertise) include DBMS, Fault-tolerant systems, Data Warehousing, Business Intelligence, Clusters and High Availability. Over the years, I have been fortunate to work with some real experts in the database area, including 3 of the original 7 authors of the System R paper, who all happenned to have started at IBM Research but were working at Tandem at the time. And at Red Brick, a company originally founded by Ralph Kimball, is where I really understood how data warehousing works.
Feel free to comment if you are interested in any of this. I will discuss more on what the Competitive Team works on in the next comment.
This is Brussels covered with snow from one of the floors at the Sheraton Hotel where the SW IOT Sales and Tech Sales Training was held during the week of 2/8/10-2/12/10. SW IOT is defined as the Southwest region of Europe comprising of countries like Spain, Portugal, France, Italy, Belgium, Netherlands, and Luxembourg. Subjects covered by this internal sales training include Infosphere, Data Management, Optim, and others. Representing Informix were Terri Gerber for Sales and I for Tech Sales. I presented sessions on the following: 1) Overall Informix business and recent references, 2) Preview of Panther features, 3) Informix Warehousing and ISAO, 4) Informix High Availability features and 5) Informix Embeddability.
The first day consisted of a number of general sessions where Alyse Passerali (WW DM Sales VP) talked about the overall state of business and key plays for DM in 2010. This is followed by general sessions on how to use "White Boarding" for Sales. White-boarding is a new process to get our Sales teams in engaging potential customers in an interactive dialogue to learn their business problems and demonstrate our solutions. I sat through each of the sales sessions that Terri Gerber conducted (Spain/Portugal, Benelux, France, Italy0 and answered questions whereever I could.
My technical sales training sessions were quite well attended, with some dedicated IT specialists from France, Italy, Belgium and Spain. Jose Manuel Gallud has always been a great help in the region. I met others including Mario Rocco Garasto, Stefan Van den Borre and Dr. Simone Moncini. I was able to use a number of public references, e.g. Cisco, Jalisco Mexico, DHL-Germany, China Mobile, Peapod, Hilton Hotels, and others.
One of the keynote session was regarding the new solution offered by Guardiam, a recent acquisition specializing in Data Governance. Their solutions work with any database as they do "sniffing" of SQL operations before they get to the database. Their solution usually involves a hardware component, but a software only solution also exists. I've confirmed with our own in-house security expert (Jonathan Leffler) that this is a solution we should be discussing with our Informix customers.
Besides the presentations, I had good interactions with many of the account teams resulting in possible opportunities in the region. If I do return to the region at their request, I hope it will have better weather. With the snow in Brussels, the airport was temporarily closed. Just walking around the hotel was difficult due to the snow and windy conditions. But there were some fantastic food. Of course, some authentic Belgium waffle and Belgium chocolate. On my way back to the U.S., I was delayed in Washington DC for 6 hours, not due to snow, but due to mechanical problems on the plane. They finally switched plane and instead of getting me back around 7:30pm, I got in at 1:30am. in SFO. Add another hour of driving to get home after that.
In case you're not one of the approximately 1 Billion people tuned into the World Cup soccer event, there was yet another huge controversy similar to the one back in 1966 between the same two teams. If you don't know what I'm referring to, it is the goal that England should have been given in their match with Germany this past weekend. You can see the actual YouTube segment with this link:
http://www.youtube.com/user/KingUndercover1, but the following picture shows where the ball is relative to the goal line.
Of course, the German press had a whole different view of the event, choosing to depict it as follows, which I do admit to be quite clever. (picture courtesy of Sandor Szabo)
I'm sure the English don't find this to be particularly humorous.
So what does this have to do with Informix? Well, if you recall that one of our best references in recent years has been the solution offered by Cairos/IMPIRE of Germany, who uses Informix to provide real-time sports statistics in the German soccer league (Bundesliga). That solution has been very successful and continues to be used today. However, at the same time that the solution is used for statistics, the idea of introducing the use of a RFID chip in the soccer ball also came up. The information captured by the RFID chip will definitively identify the location of the soccer ball. It would put an end to these types of disputes that has plagued the soccer world for so many years. We have the technology with Informix to solve this problem. An article from the RFID Journal points to additional testing that has to be done, but it is nevertheless available.
However, like some many things, it is not about the technology. FIFA (The world Football Association) is refusing to even consider instant-replay, let alone RFID chip in the ball. In fact, their reaction to the weekend fiasco has been to ban all instant replays in the stadium.
I plan to resume my normal blog of hopefully more Informix pertinent information in another week or so (which happens to coincide with the conclusion of the world cup).
fredho66 1200006VRJ Tags:  scan multi-index star warehousing informix panther 11.70 data join 3,916 Views
By now, you should have already seen that Informix 11.70 (Panther) has gone eGA.
Here's the link to the official announcement:
· Flexible Grid
· Application Development
There is a whole host of other collaterals describing all the features in Panther so I don’t want to duplicate the effort here. However, in line with my blog’s focus on Warehousing, I will highlight the significant Panther warehousing features in this blog entry.
For those of you familiar with the Informix product lines of 7.x, 8.x, and 9.x leading up to the current versions of 10.x and 11.x, one knows that the once touted convergence of XPS code line of 8.x with the IDS code line of 9.x (code name: Arrowhead) did not materialize after the IBM acquisition in 2001. I’m happy to report that as of 11.70, the Arrowhead project is now complete.
The significance of this convergence is that the warehousing capabilities of XPS, which have been very popular and well-received over the years, (and for which many key enterprise customers still depend on today), are now largely available in the IDS product. Customers can now expect the combination of key characteristics, i.e. availability, reliability, ease of admin, OLTP performance, to be equally applicable to warehousing workloads. While the XPS product will be maintained and supported indefinitely, customers now have a viable alternative to having a single enterprise RDBMS handle both workloads, either on a single server or in a cluster environment. If migration were to occur from XPS to IDS, it promises to be the easiest transition than any competitive products on the market due to the close proximity in architecture, utilities, administration, and capabilities between XPS and IDS.
There are two significant areas of increased capabilities in Panther for Warehousing. They are:
In this entry, I will first discuss advancements in Query Processing.
Up to now, IDS is not able to take advantage of using multiple indexes on a single table to speed query processing. Take for example that you have a CUSTOMER table containing millions; if not billions of rows (e.g. FACEBOOK’s Customer table should have over 500 million rows). Suppose further that you need to look for Male customers having a Masters degree with a HIGH income level that live in a certain zip code.
The SQL query would look like the following:
SELECT count (customer_id)
WHERE gender = “m”
AND income_category = “high”
AND education_level = “masters”
AND zip_code = “95032”;
IDS would currently process such “low-selectivity” predicates like other higher selectivity predicates by simply choosing the highest selectivity predicate, fetching each record (presumably using an index), then simply matching each record to see if the other predicates qualify. It is not difficult to see that many records will be retrieved and discarded before the resulting rows are returned to the user.
In Panther, the system can drastically reduce the number of rows fetched by deploying bit vector technology. Here, each row of the table is represented by a single bit in a bit vector. For each of the columns mentioned in the query, a separate index is deployed. For each predicate, e.g. zip code = “95032”, a number of bits, each representing a row that qualifies in the table is marked. Once the bit vectors are built for each predicate, these bit vectors can be combined using the AND operator where the resulting bit vector represents the intersection of all the conditions specified by the predicates. This single bit vector contains all the rows that need to be retrieved, presumably much fewer than using the previous method. The following diagram shows this method.
As a further optimization for faster retrieval, the Row IDs (RIDs) are sorted prior to the retrieval of the records from the table in order to facilitate sequential reads from the table. By “skipping” to only the records in the resulting bit vector in a sequential manner, known as SKIP SCAN, we have now further reduced the retrieval time needed.
The benefit of Multi-index scan goes beyond processing a single large table as you will see with the discussion on Star Join below.
Star Join is a join method designed to speed up queries in dimensionally modeled schema, called Star Schema. Dimensional Modeling (and the topic in the book) was invented by Ralph Kimball, also the founder of Red Brick Systems. Red Brick was acquired by Informix in 1998 and is part of the Informix product line. Just like XPS, Red Brick is actively supported and maintained by IBM and still used by hundreds of customers world-wide.
As the diagram below depicts, having multiple dimension tables surrounding a central Fact table gives the impression of a star, and thus its name.
I have already discussed the technologies behind Red Brick so for those interested, please refer to my earlier entries in this blog. Since its inception in the 90’s, other competitors have all implemented Star Joins in some way. Instead of materializing the Star Index, all other products have chosen to do Star Joins “on the fly”. XPS’ method is known as Push-Down Hash Join, which is now implemented in IDS as of the Panther release.
Again, without duplicating a lot of other materials written on this topic, the goal of having a Star Join is to reduce the number of records fetched from the Fact table (similar to the example in the previous example involving the Customer table), but also to minimize the number of records produced as intermediate result during pair-wise joins of multiple dimension tables.
For example, if a query using the Star Schema diagram depicted above involving dimension tables and the Fact table, a Left-Deep Tree involving the Fact table and all the dimensions would result, and would look like the following diagram.
Here F denotes the Fact Table and D1 denotes dimension table 1, D2 is dimension table 2, and so on. For the example, it assumes that the Fact table has 10 million rows with default selectivity of 1/1000, and each of the dimension tables have 10K rows with default selectivity of 1/10. While IDS does have the ability to do Hash Join, for which it is superior to standard Nested-Loop Join for warehousing queries, the intermediate results produced after each join is still prohibitively large as you can see from the example.
In Panther, the Star Join method is now included as part of the IDS’ optimizer costing algorithm. If conditions are met based on statistics of the tables and that foreign keys are defined within the Fact table corresponding to primary keys in the dimension tables, then the Star Join method will be chosen.
Instead of the Left Deep tree, a Star Join in IDS becomes a Right Deep Tree, taking advantage of Multi-Index Scan method described earlier to reduce the number rows being fetched from the Fact table, thus significantly reducing the intermediate result set for each of the Hash Joins performed with each dimension table, as shown in the diagram below.
Beta testing by customer/partners have shown significant improvement with such dimensional queries. Depending on the number of rows in each of the tables and the predicates involved, it is easy to see how performance can double, triple or more. The next blog entry will discuss advancements in time-cyclic data management.
There was a recent InformationWeek webcast sponsored by IBM on the topic of
Combining OLTP and Extreme Data Warehouse Performance in which Lester Knutsen of ADT and I discussed the Informix Warehouse Accelerator as part of the release for the Informix Ultimate Warehouse Edition. The webcast was attended by over 350 people and comments were overall quite positive. In this blog entry, I will attempt to answer the questions that were submitted. I consolidated some of the duplicate questions. Questions are answered in no particular order.
For a replay of the webcast, you can go to the following link:
For more information about the product, e.g. white paper and other collateral, you can use either of the following links:
Note that in the upcoming International Users’ Group conference held in
Q: What is MOLAP/ROLAP/HOLAP?
A: MOLAP stands for Multi-dimensional OLAP, ROLAP stands for Relational OLAP, and HOLAP stands for Hybrid-OLAP.
MOLAP is the most common of the three in building multi-dimensional “cubes” which is essentially summary tables at the BI tools layer. Cognos uses a product called PowerPlay for its MOLAP product, and Hyperion has a similar thing. Once these cubes are built, queries can be answered very quickly if the cubes have the same dimensions used in the query. It can take considerable time to build the cubes, usually measured in hours. ROLAP essentially drills through to the backend relational database for the queries. And HOLAP, which is offered by products such as Microsoft SQL Server, does either or both, depending on the query. There are lots of materials written up on how this is done. With Informix IWA, we eliminate the need to build these cubes.
Q: Can IWA be also used to accelerate OLTP queries, and not only OLAP ones?
A: IWA's sweet spot, as mentioned in the presentation, is for OLAP queries against data warehouses where there is at least some form of Star Schema set up. That is, there needs to be a Fact Table with surrounding dimension tables. For OLTP queries, chances are that the transactions are short and would not need the type of acceleration provided by IWA, where it is scanning potentially millions and billions of records. However, if the OLTP schema can be defined in such a way that a data mart can be offloaded to IWA, then the queries will be accelerated. In some situations, e.g. operational data store, both OLTP and warehouse workloads share the same schema. In that case, IWA can be used to accelerate the OLAP queries while co-existing with OLTP. This is the basis for the mixed-workload I was referring to during the webcast.
Q: Is there any formula to determine how much memory you need for in memory database?
A: The amount of memory needed for IWA is dependent on the amount of compression that can be done on the data. Since we use columnar technology, data can be compressed quite well, as data in columns are usually quite similar. We use a rule of thumb of 3:1, i.e. 300 GB of table data means about 100 GB of main memory. There is no need to account for index or temp table space as IWA does not need them. Allow for additional memory if the same system is being used for IDS and other processing on the same system.
Lester's testing showed that he went from 110GB of raw data to 24GB in memory, which is much better than a ratio of 3:1.
Q: Can get special build to get along with IDS11.5 for iWA?
A: IWA is part of Informix Ultimate Warehouse edition and needs to run on 11.70xC2.
Q: If query using temp table, does IWA works in the same way (fast)?
A: Yes, for BI tools can use temp tables, queries that were run to retrieve the result onto the temp tables can be accelerated. The temp table processing itself is done on the IDS side, not in IWA.
Q: What is the largest size database you can have with the in memory db appliance? How do you handle high availability?
A: For the 1st release of IWA, the configuration in a single IBM x-Series machine, it has a maximum main memory capacity of 1.5 TB. With a 3:1 compression ratio, it would mean that the current maximum limit is about 4.5 TB of raw data. X-series is releasing a new system this year that will double that capacity. Note also that IWA can run on any Linux/Intel box, not necessarily from IBM. In an upcoming release, we will support running IWA on a Blade Server, thus allowing maximum scalability to multiple nodes (or blades).
In terms of high availability, the set of disks on IWA stores an in-memory image of what is in-memory for recovery purposes. This set of disk (much smaller than those that are stored on the Informix side) is only used to provide High Availability, not the actual database itself.
Q: How does the performance compare to Oracle Exadata appliances?
A: We have not done direct comparison to Exadata, but one of our large customer prospect is running a Proof of Concept (PoC) comparing the two. We will have much better data points once that are completed. We do believe that our numbers will be very competitive, and also at a much lower cost as compared to Exadata.
Q: IWA is based on MPP architecture; what is added at hardware level to make it MPP?
A: Actually IWA is not based on an MPP architecture. It is based on SMP. In fact, the recommended configuration for the 1st release is a single Intel box (with multiple cpus/cores). However, the original IWA design allows it to be deployed on a Blade Server with multiple nodes. We intend to support that configuration in an upcoming release. As I mentioned in the talk, we don't need extra hardware to do I/O for us, just the Intel box for which IWA resides. The fact that we use columnar technology and SIMD parallelism to scatter our "cells" to the different cores within the system provides us with a very distributed platform similar to what a MPP system would do. In a way, you can say that we combine the best of SMP and MPP onto a single system.
Q: Does IWA support locales other than default?
A: It is correct that as the 1st release, only the default locale is supported as indicated in our manual.
We recognize this limitation and are working to add more locales for the upcoming fixpack. If there is interest for you company or an end customer, testing using the default locale will demonstrate the performance gain which can lead to successful implementation once we support the additional locales.
Q: Can you comment on costs? Of course it can vary a lot depending on what one want to do?, However can you comment on costs?
A: Cost is highly dependent on the amount of data that you plan to have in memory. As I mentioned during the talk, the rule of thumb is 3:1 so 300GB of raw data in your warehouse would require about 100GB of main memory. There is no need to account for index or temp space since IWA does not require them. The current eX5 system from IBM can contain up to 1.5TB of main memory at a very affordable price.
Q: What vendor/products is IBM positioning this Suite of products (IDS & IWA) against?
Q: How is IBM planning on Marketing/Selling this Suite of products (IDS & IWA) to the General public, the Non-Informix Community?
A: Actually, the two questions are sufficiently related that I think I can answer them together.
IWA is primarily targeted towards the Informix base. Obviously, the solution is generally applicable to non-Informix customers, but we believe there is sufficient need and interest within the customer base to generate sufficient demand. Our target is data warehouses that are less than 5 TB. As mentioned in the webcast, most analytics are done on warehouses of much smaller sizes regardless of how large the enterprise warehouse may be.
Q: Is there a problem\conflict if using CISCO\Riverbed (or any other) compression\acceleration with IWA
A: No, there is no conflict between Wide Area Network compression/acceleration with IWA. Once data comes over from the network to the IDS database, it is in uncompressed format. You may choose to apply IDS compression to store the data on IDS, but data in IWA uses completely different compression techniques and is not affected by other forms of compression.
Q: Is this solution is comparable to Oracle's Exadata?
A: I believe I addressed this somewhat during the call. The IWA solution from Informix is very much comparable to Oracle's Exadata approach, in that it tries to provide a substantial increase in performance for warehousing workloads. The architecture and approach is quite different because instead of utilizing another layer of hardware to perform I/O from the database, IWA eliminates I/O by using pure in-memory technologies, extreme compression and predicate evaluation with compressed data. This is how we can achieve an order of magnitude speedup at a fraction of the cost/admin required by standard relational systems.
Q: The data mart will still exist in the database and then loaded into memory?
A: Yes, database/datawarehouse resides on IDS, the data to be accelerated is loaded by IWA (by IWA). Then all subsequent queries that can be accelerated will be routed automatically to IWA and the answers will be returned to IDS. Applications that access the IDS database, e.g. for OLTP, will continue unchanged. Only the affected data mart (with a defined Fact Table) will be re-routed to IWA.
Q: Is the data loaded into IWA or meta-data gets loaded into IWA? If data is being loaded then is more storage needed?
A: Both IDS and IWA contain metadata for the "data mart" that's defined to be part of IWA. The data warehouse is stored on the IDS side, but not on IWA. It is only loaded into memory. So there is no requirement for additional storage on IWA. The disks shown on IWA are only for recovery purposes in case there is a failure to the IWA box. It is a memory image of what is in IWA.
Q: Putting entirely in memory? And what happens when there's a machine failure? No threads that flush to disk and persistence? Not smart.
Q: You said it was an 'all in memory only' solution, if true, what do you handle system failure, and disaster recovery planning.
A: These first two questions are referring to the same thing. The disks attached to IWA (not the ones on the IDS side that contains the database) are used for machine failure. It contains a memory image of what was loaded in memory. The amount of disk needed on IWA is limited to size of main memory configured, usually much smaller than the data base itself.
Q: So if I understand the diagram... The Warehouse accelerator is a separate box. All in memory solution. The timing listed in Fred's slide show time for queries. But how/when is the data loaded in to the box and what's the time to build the 'In Memory' warehouse that the query is run against?
A: The Beta customer (large shoe retailer) loaded 1 billion records of the Fact Table and two dimension tables (one with 3 million and another with about 330K rows) in about 30 minutes. The customer's comment was that since it used to take him 45 minutes to run a report, he gets his ROI within 2 queries. There are configuration parameters that will help with optimizing the load time.
Q: 512GB of memory? And what's the cost? What's reasonable? I thought the solution was all 'in memory' so why the 6 300GB 'fast' disk?
Q: Again, what's reasonable with respect to price.
A: Answer to both questions - for the recommended configuration using a System X server, the cost of the server including 512GB is around $50K to $60K, for which I believe is a very reasonable price. The software licenses for IWA depends on whether you are coming from the Informix Ultimate Edition ($160/PVU) or if you are buying a new license ($463/PVU). The 6 300GB disk is for recovery purposes as mentioned above.
Q: Will there be any Sessions, Tutorials or Labs for IWA and DW/BI offerings at the upcoming Informix conference in May?
A: There are several dedicated sessions for IWA at IIUG in May. We are also planning a live demo for accessing IWA in the Cloud. There will also be business partner sessions on this topic.
fredho66 1200006VRJ Tags:  genero replication warehousing informix innovation anniversary 2,989 Views
Today marks the 10th anniversary of Informix as part of IBM. While IBM has purchased many software companies since 2001, Informix represents the only other major DBMS (besides DB2) in IBM's Data Management portfolio. Informix has had a proud tradition of innovations, as seen in the attached diagram.
Starting with the Dynamic Scalable architecture which dates back to the early 1990's where multi-threading was inherent and features such as Data Fragmentation came as early as the late 1990's. Beyond that, the product included key replication features for Enterprise Replication (ER), High Availability Data Replication (HDR) leading up to the release of Flexible Grid in 11.70. That release also includes significant warehousing capabilities by completing the "Arrowhead" project that was started prior to the IBM acquisition. With the recent addition of the Informix Warehouse Accelerator, Informix now provides a highly performant, highly reliable database platform for OLTP and Business Analytics workloads with minimal administration.
We have also seen a resurgence of interest in the Extensibility features that have been part of the product since its acquisition of Illustra. The Time-Series capabilities has now found tremendous popularity in the Smart-Metering market resulting in wide acceptance within the Energy & Utilities industry segment. And with the thousands of customers running on 4GL, the recent release of Informix Genero is providing them with new ways to modernize their application environments.
Since the March announcement of Oracle dropping support for the Itanium chip from Intel, there has been much news coverage surrounding this topic. Latest one being that HP is now suing Oracle over this issue. (http://www.pcworld.com/businesscenter/article/230396/hp_sues_oracle_over_dropped_itanium_support.html)
The purpose of this blog entry is to preview that both IBM and HP will be coming out with formal statements stating IBM and more specifically Informix support for Itanium on HP-UX. Current Informix customers running on HP-UX Itanium should not have any doubts about continuing on this platform. HP has declared its HP-UX PA-RISC support will end in 2013 and is actively motivating its customer base on PA-RISC to migrate to HP Itanium. Informix has supported this migration. Besides the fact that Informix has had a long history of supporting the HP-UX operating system, there are specific features that will make this migration easier.
If you are an Oracle customer running on HP-UX Itanium today and wondering what you should do, there is a very simple choice. It will be more cost effective to switch data base than to switch hardware. You can switch to an Informix data base providing Enterprise class OLTP and Data Warehousing capabilities with a fraction of the data base administration needed, and at a much lower total cost. Best of all, you do not have to worry about the outcome the lawsuits between the two companies to decide what to do with your system.
First webcast is sponsored by InformationWeek titled "Now Everything You Have Ever Wanted in Business Intelligence is Possible".
This webcast is presented by Craig Humphreys from Preferred Partners and I from IBM.
Date: Tuesday, February 28, 2012
The new Informix Warehouse Accelerator technology from IBM has completely transformed business intelligence. Now, incredible solutions are not only possible, they are affordable and easily achieved.
This webcast will present examples of real world solutions that could not exist without this software-only technology. You'll hear how Informix Warehouse Accelerator leverages a columnar approach to accelerate warehouse queries up to 100 times or more, including:
Please register at https://www.techwebonlineevents.com/ars/eventregistration.do?mode=eventreg&F=1004077&K=CAA1EC
2nd Webcast is sponsored by Advanced Data Tools:
Advanced DataTools will demonstrate queries on IWA from a table with a billions rows.
To register, use the following link: https://advancedatatools.webex.com/mw0306ld/mywebex/default.do?service=1&siteurl=advancedatatools&nomenu=true&main_url=%2Fmc0805ld%2Fe.do%3Fsiteurl%3Dadvancedatatools%26AT%3DMI%26EventID%3D180784857%26UID%3D1058236562%26Host%3D90b801182d3448034a004a051a095e5f330708010546035e353b%26FrameSet%3D2
fredho66 1200006VRJ Tags:  hana warehouse in-memory database accelerator columnar informix 2,928 Views
IWA – A Year in Review
This week marks the 1-year anniversary for the release of the Informix Warehouse Accelerator (IWA). This blog entry takes a look at what we have accomplished and what we look forward to.
First of all, for those not yet familiar with the IWA product, it is an Accelerator designed specifically for data warehousing or analytics queries, that runs transparently behind an Informix Dynamic Server. Users submit their queries either through SQL directly or via their favorite BI tool, such as Cognos, Microstrategy or Pentaho. IDS then routes the query to the accelerator based on a defined data mart schema and returns the answer to the user as if the query was answered directly by IDS. The only difference is that the performance of such queries may be 10 times to 100 times faster than before. Queries that used to run in many minutes or even hours can be returned in mere seconds. This is all done without special tuning, adding of indexes or data cubes. It is ideally suited for Informix customers who may have standard OLTP transactions on IDS, while also needing to run analytic queries either on the same system or different systems. The Accelerator runs on commodity Intel hardware that has hundreds of gigs of DRAM at a very affordable price (e.g. less than $100K).
As the primary measure of “success” with any product, especially a new product within IBM/Information Management/Informix, the revenue received has been more than respectable, certainly measured in terms of millions. (I am not at liberty to reveal the exact numbers). We have a number of customers across different geographies and across different industries, including retail, insurance, hospitality, public sector and Energy & Utilities. More importantly, we have partners that have built their own offerings on top of the IWA product and have successfully closed business on their own. Informix has always been a partner-centric and partner-friendly product, and IWA is no different, especially for SMB customers where low cost is of utmost importance. Through a number of workshops and bootcamps around the world, we have pursued an aggressive schedule in enabling our partners and this will continue throughout 2012.
Everyone in the market these days seems to claim that they can do analytics fast. If you include Columnar database offerings, you have upwards of 10 to 20 products competing in this space. One that has received most publicity is HANA from SAP, which shares many of the features and characteristics of IWA. This is a good thing for us. By raising awareness of the capabilities of an in-memory database specifically designed for data warehousing in the industry, customers will further investigate the differences between IWA and HANA.
We already have competitive numbers (based on real customer PoC’s) against other vendors including Teradata, SQL Server and columnar database vendors such as Greenplum and Ingres VectorWise. I am happy to report that we stand up well against them both in terms of performance and in overall cost of ownership. So we welcome such comparisons.
Since our release a year ago, we have added some significant features. They are listed here with brief descriptions:
- Workload Analysis Tool
Takes the guesswork out of the data mart schema needed to deploy IWA on. You simply run your standard set of queries through the tool and it generates the schema in XML format which can then be used to load data into IWA.
- Data Currency
While an IDUG study shows that 90% of data warehouses are only updated once a day, customers who need more frequent refreshes of the entire warehouse can maintain a 2nd copy of data (in compressed form). By toggling between the two copies, one can achieve high data currency and still attain the great performance provided by IWA.
- Cluster Support
IWA can run on a standard SMP machine or on a cluster that is typical with a Blade Server configuration. Here, IWA has access to even more cores and memory across different nodes as opposed to a single node system. IWA efficiently partitions the fact table data while replicating the dimension tables for optimal performance. The number of coordinator and worker processes can be scaled up accordingly.
- Partition Refresh
In the upcoming release of 11.70xC5, we are adding two significant features to IWA. While the current version of IWA requires a complete reload of data when data is refreshed on the host (i.e. IDS), the partition refresh feature now allows users to selectively update one or more partitions. For example, if you have a terabyte size data warehouse already loaded in IWA and updates were made to 10 gigs of data in certain partitions, you can choose to only reload the 10 gigs of changed data. Thee load time for such a refresh is then greatly reduced. In other competitive systems, this is often referred to as Incremental Update or Trickle Feed.
- Loading IWA from Mach11 Secondaries
Taking advantage of the Mach11 capabilities in IDS, one can deploy IWA in a mixed-workload environment by dedicating the primary node for OLTP transactions and having a Mach11 secondary handle the analytic queries. The secondary can be a HDR, a RSS or a SDS node. This secondary node can now handle the loading of data into IWA and can also host the IDS optimizer which determines whether to route a particular query to IWA. The ability to run mixed-workload effectively in a single system is one of the “holy-grails” of database management.
Our goal is to continue to build a base around our Informix customers with analytics needs. We also want to reach beyond the current base into the so-called “white-space”. While there are other overlapping products within IBM’s Information Management portfolio, we believe that the state-of-the-art in-memory technology in IWA together with the strong OLTP capabilities within IDS, provides a unique offering at a price point that is second to none.
For more information, please send a comment to the blog or email me directly at email@example.com.
SQL OLAP Functions: A Primer
Informix is adding a set of SQL OLAP functions in an upcoming release. The following write-up is to introduce the concept of such functions which may be new to Informix users. It is not meant to be a reference manual that lists the complete syntax and optional parameters. These analytic functions are valuable to users because they simplify the expression of complex but commonly used business questions and are efficient without resorting to subqueries or other calculations.
OLAP is OnLine Analytical Processing (as opposed to OnLine Transaction Processing – OLTP). OLAP is generally described as part of Business Intelligence that allows users to perform business reporting for sales, marketing, inventory control, business processes, etc. It usually works with multi-dimensional databases requiring frequent roll-up, roll-down and slicing and dicing.
While OLAP is the overall encompassing term for the type of processing to be done, there are specific functions known as SQL OLAP functions that are supported by different database systems and/or BI tools such as Cognos, Business Object, etc.
SQL OLAP functions are not new. In fact, they adhere to and extend the ANSI SQL-99 standards, known as the “Elementary OLAP” package. Many other databases and BI tools have provided these functions for some number of years, but SQL OLAP functions have not been supported by the Informix Dynamic Server (IDS) until now.
The OLAP functions to be supported by Informix are as follows:
· Ranking RANK(), DENSE_RANK(), DENSERANK(), CUME_DIST(), PERCENT_RANK(), NTILE() · Numbering ROW_NUMBER(), ROWNUMBER() · Aggregate RATIO_TO_REPORT, RATIOTOREPORT · First/Last FIRST, LAST
· Ranking RANK(), DENSE_RANK(), DENSERANK(), CUME_DIST(),
· Numbering ROW_NUMBER(), ROWNUMBER()
· Aggregate RATIO_TO_REPORT, RATIOTOREPORT
· First/Last FIRST, LAST
Note that if the underlying database does not support these OLAP functions, as in the case with Informix currently, then the user must rely on the BI tool to perform these calculations. If the underlying database system supports such OLAP functions, users can choose to submit SQL directly to the database or to submit it through the BI tool. In order to take advantage of the underlying data base support of OLAP functions, the tool must recognize this fact and generates the appropriate SQL to take advantage of it. The difference in performance can be enormous, as in orders of magnitude differences. A new version of Cognos Enterprise BI will generate SQL to take advantage of the OLAP functions implemented in IDS in an upcoming release.
SQL OLAP Functions: Concepts
A window partition is a subset of rows returned by a query, as defined by one or more columns in a special OVER() clause. SQL OLAP functions are applied to these partitioned result sets within the scope of a single query expression.
Olap_function () over (partition by col1, col2…)
Non-analytic functions compute their calculations with respect to every row in the result set. However, an OLAP function in a query is applied with respect to the contents of its window partitions. An example with the RANK function is as follows:
Select date, store_name, sum(dollars) as sales_dols,
rank () over (partition by date order by sales_dols desc) as date_rank
from period, store, sales
where period.perkey = sales.perkey
and store.storekey = sales.storekey
and state = “CA”
group by date, store_name
order by date;
DATE STORE_NAME SALES_DOL DATE_RANK
2012-01-02 Beaches Brew 785.55 1
2012-01-02 Instant Coffee 457.75 5
2012-01-03 Instant Coffee 713.75 1
2012-01-04 Instant Coffee 1031.50 1
The query returns ordinal rankings for sales figures. The stores are ranked according to their sales totals for each value; as the date changes, the ranking values are reset. Multiple sets of ranked values (1 to n) are computed within a single query expression.
In practice, OLAP functions allow application developers to compose analytic business queries more easily and more efficiently. For example, moving averages and moving sums can be calculated over various intervals, aggregations and ranks can be reset as selected column values change; and complex ratios can be expressed in simple terms. Within the scope of a single query expression, several different OLAP functions can be defined, each with its own partitioning rules.
Types of OLAP Functions
OLAP functions supported by Informix fall into four categories:
The OVER Clause
The OLAP OVER clause differentiates OLAP functions from other analytic or reporting functions. The OVER clause has three distinct capabilities:
Depending on the type of OLAP function in question (ranking, aggregation, etc.), the OVER clause is subject to different usage rules. The following discussion explains the concepts of window partitions, ordering partitioned rows, and window frames in general terms.
OLAP Window Partitions
A window partition is a set of rows that are grouped together for the purpose of applying an OLAP function. The OLAP function is applied to every row, but is calculated with respect to the rows in the partition. If no partition is specified, the OLAP function is computed over the complete intermediate result set.
Ordering Rows within Partitions
The OLAP ORDER BY clause is separate and distinct from the ORDER BY clause that can be appended to any query expression. Typically, the ORDER BY clause for an OLAP function defines the expressions for sorting rows within window partitions; however, the ORDER BY clause can be used without a preceding PARTITION BY clause, in which case the sort specification ensures that the OLAP function is applied to a meaningful (and intended) ordering of the intermediate result set.
This specification is a prerequisite for the ranking family of OLAP functions; it is the ORDER BY clause, not an argument to the function itself, that identifies the measures(s) for the ranking values. In the case of OLAP aggregations, the ORDER BY clause is not required in general, but it is a prerequisite to defining a window frame. The partitioned rows must be sorted before the appropriate aggregate values can be computed for each frame.
In the following example, the window partition is defined by the “Dimension” column. The five rows that contain the value “A” comprise a window, as do the five rows for “B” and “C”. The Meaure column is the input data for an OLAP RANK function; the rows within each partition are ordered by the Measure values. When the RANK function is applied, it is calculated over each partition. The Measure values are ranked 1 through 5 within each partition.
The OLAP ROW_NUMBER function is also calculated in this example, without a PARRITION BY clause, to produce consecutive row numbers for the entire result set.
The SQL for this query would look like this:
Select row_number() over () as row,
rank () over (partition by dimension order by measure)
You can define window partitions according to the values in a single dimension or you can specify multiple dimensions. For example, you could partition the rows based on city and state values or month, quarter, and year values.
For non-ranking OLAP functions, such as aggregations, you can define a window frame. A window frame defines a set of rows within a window partition. When a window frame is defined, the OLAP function is computed with respect to the contents of this moving frame rather than the fixed contents of the whole partition.
The definition of a window frame can be row-based (ROWS specification) or value-based (RANGE specification).
Row-Based Window Frames
The reference point for all window frames in the current row. The SQL OLAP syntax provides mechanisms for defining a row-based window frame as any number of rows preceding and/or following the current row.
In the following example, rows 1 through 5 represent a partition; each row becomes the current row as the OLAP window frame slides forward. The frame in this case is defined as between current row and 2 following, so each frame includes a maximum of three rows and a minimum of one row. When the frame reaches the end of the partition, only the current row is included. The shaded areas indicate which rows are excluded from the frame at each step.
In other words, the window frame in this case imposes the following rules:
The following example applies these rules to a specific set of values, showing the OLAP AVG function that would be calculated for each row. The sliding calculations produce a moving average with an interval of three rows or fewer, depending on which row is the current one.
1 A 10 53.3
2 A 50 90.0
3 A 100 240
4 A 120 310
5 A 500 500
The SQL for this query would look like this:
Select row_number () over () as row,
avg (measure) over (partition by dimension)
order by measure
rows between current row and 2 following) as OLAP_AVG
The averages are computed as follows:
Similar calculations would be implemented for all subsequent partitions (B, C, etc.) in the result set.
Value-Based Window Frames
The previous example demonstrates a row-based window frame definition. The SQL OLAP syntax also supports another kind of window frame, whose limits are defined in terms of a value-based or range-based set of rows rather than a specific sequence of rows.
For example, a frame could be defined as the set of rows with Year values some number of years preceding or following the current row’s year. A specific example might be:
Order by year asc range between current row and 1 preceding
where 1 preceding means the current row’s Year value minus 1
This kind of range specification is inclusive. If the current row’s Year value is 2000, all rows in the window partition with Year values 2000 and 1999 qualify for the frame, regardless of the physical position of those rows in the partition. The rules for including and excluding value-based rows are quite different from the rules applied to row-based frames, which depend entirely on the physical sequence of rows.
Put in the context of an OLAP AVG calculation, the following partial result set further demonstrate the concept of a value-based window frame. The frame consists of rows that:
The SQL for this query would look like this:
select row_number() over () as row,
avg (measure) over (partition by dimension
order by year asc
range between current row and 1 preceding ) as olap_avg
The averages are computed as follows:
Most of the information provided here can be found in various manuals of database systems/BI tools that have implemented SQL OLAP standard. The examples used here are taken from the Red Brick Warehouse SQL Reference Guide with slight modifications.
IBM has obtained distribution rights to the following white paper published by Bloor Research (author: Philip Howard) titled "IBM Informix in hybrid workload environments". It discusses key features such as Flexible Grid, Time-Series support and Informix Warehouse Accelerator which makes Informix ideal for Analytic applications.
What is the significance of a moving a dot on a graph? If it is Gartner’s Magic Quadrant for Data Warehouse Database Management Systems which it published on Jan 31, 2013, then it is rather significant.
IBM has obtained the right to distribute Gartner’s report. However, we must reference the link and not send the document around. Here is the link:
To appreciate the movement of the dot, one must compare the Magic Quadrant from last year (Feb 2012) and this year’s (Jan 2013) [courtesy of Glen Zimmerman, IBM]. It shows that Gartner has recognized the fact that IBM’s “completeness of vision”, not only is it in the desired upper right hand quadrant, has moved further right in its “visionary” axis beyond that of Oracle. It also managed to stay ahead of SAP (with HANA), Microsoft and EMC (with Greenplum) in both “vision” and “ability to execute”.
As you will see in the report, which goes into fair amount of detail on each vendor, IBM’s portfolio for Data Warehousing includes, ISAS, Netezza, IDAA on System Z, the newly announced PureData System appliances and for the first time, the Informix Warehouse Edition. More importantly, it points out that the Informix offering is the only in-memory solution for data warehousing and analytics within IBM.
Since this is a blog for Informix technologies, I will expand more on the progress of the Informix Warehouse Accelerator product found in the Informix Warehouse Edition.
Since its release in April 2011, IWA has attracted significant market adoption now spanning some 17 countries around the world and across many diverse sectors such as retail, transportation and government. While the majority of IWA customers are current Informix customers using Informix for OLTP, the product has displaced non-IBM vendors previously used for the customers' analytical applications. Customers can now use Informix as an integrated, cost-effective platform for highly reliable transactional applications as well as for large scale data warehousing and analytical needs. We recognized the game-changing BLINK technology from IBM Research for its in-memory, columnar, highly parallel capabilities and have brought the product to market almost 2 years ago.
We are pleased that Gartner has taken notice of our product and that we played a role, as part of IBM, to “move the dot”.
For information about the upcoming Informix 12.10 release, please register to attend the webcast scheduled for March 5th at https://event.on24.com/eventRegistration/EventLobbyServlet?target=registration.jsp&eventid=571631&sessionid=1&key=6740A7076E0FE9C59EBFDC80C8487D34&sourcepage=register.
In-Memory Databases Go Mainstream
While much have been written about the subject, including several blog entries I have written here, it takes a database market share leader like Oracle to make it mainstream.
In the super-hyped Oracle OpenWorld event in San Francisco this week, the headline news is the announcement of Oracle 12c and the In-memory option. Several articles have already been published in the press, including:
1) Oracle Open World 2013: Oracle launches in-memory database search
2) Oracle's Ellison promises 'ungodly' database speed with new in-memory option
3) Oracle's Ellison Tries To Outmaneuver SAP Hana
Based solely on publicly available information from these articles, I will point out a few similarities as well as differences from our own offering, namely the Informix Warehouse Accelerator.
With the Informix/IWA combination, we store OLTP in its original row format in the Informix database, while in-memory processing is done using a columnar approach. Difference is that IWA never stores the columnar data on disk. HANA (from SAP), in contrast, uses only columnar storage for both OLTP/OLAP processing.
Regardless of tradeoffs between different approaches, the most important point is that customers should feel that in-memory database is no longer an esoteric technology to watch out for. It has arrived and it is now mainstream. Informix has a strong offering and it has already proven itself with customers.
Carlton Doe offers an updated 1-day Proof of Technology (PoT) on IWA.
Its objective is as follows:
In this Proof of Technology, participants will install and configure the accelerator server. They will learn how to analyze queries to determine what tables and columns should be accelerated for greater performance. Finally, using a combination of graphical and command line utilities, they will create, deploy and load marts of accelerated data. They will execute queries against both non accelerated and accelerated data to see the dramatic and exponentially better response times Informix Warehouse Accelerator provides.
For information about this PoT, contact Carlton at 972-561-6103 or firstname.lastname@example.org.
There are numerous Informix events around the world where IWA is discussed. An upcoming webcast (Oct 1) by one of our partners, Advanced Datatools can be found at:
Query Acceleration for Business using Informix Warehouse Accelerator
A draft IBM Redbooks publication (Planned Published date of November 2013)
IBM Informix Warehouse Accelerator (IWA) is a state-of-the-art in-memory database designed to exploit affordable innovations in memory and processor technology and trends in novel ways to boost query performance. It is a disruptive technology that changes how organizations provide analytics to its operational and historical data. Informix Warehouse Accelerator leverages columnar, in-memory approach to accelerate even the most complex warehouse and operational queries without application changes or tuning.
This book provides a comprehensive look at the technology and architecture behind the system. It contains information about the tools, data synchronization, and query processing capabilities of Informix Warehouse Accelerator, and steps to implement data analysis by using Informix Warehouse Accelerator within an organization.
Table of contents
Chapter 1. Introduction to Informix Warehouse Accelerator
In building a set of messages towards the upcoming IBM Insight (formerly IOD) conference, this blog entry is being written as part of that effort (in collaboration with IBM Marketing):
The Internet of Things is changing our world with the pervasiveness of connected sensors and devices, expanding the volume of captured data exponentially. This new computing model requires us to rethink our data management strategies in several ways. Since data is captured from numerous devices across multiple data types, succeeding in the IoT space requires the ability to securely combine structured, unstructured, Time-Series, spatial and other sensor data into a single source of intelligence. We need to make sense of this huge influx of data by aggregating and analyzing data locally, at the gateway level, to use it in meaningful ways. This can only be achieved by using an intelligent database with reliability, flexibility, performance and simplicity to handle real-time and context computing on the edge. Embedding an enterprise-level database, like Informix, coupled with the power of a warehouse accelerator, can optimize business results with speed of thought insight.
There are many ways sensor data can be used. Data can be used for real-time analytics to gain intelligence and respond to events as they happen. You can use historical analysis to look through record history for trends, analyze collections of sensors for correlation and formulate hints and suggestions based on usage and patterns. As tremendous growth opportunities emerge, so do challenges. Sensors collect and forward data for a single measurement, but consumers will have many devices and want a consolidated view. These devices generate a huge amount of data and limited space will create network and latency concerns. In addition, the variety and volume of data makes it hard to locate data and perform analytics that join different kinds of data together. Databases solve these problems when data is organized locally in a compact form that is easy to search and use. Simple SQL and JSON application development interfaces provide a flexible schema without requiring upfront definitions of their types. IBM Informix hybrid capabilities provide management of SQL and NoSQL/JSON data seamlessly, in one database.
Warehouse accelerators improve the performance of traditional data warehousing queries. In the “sensor analytics zone” within IBM’s IoT architecture, IBM Informix Warehouse Accelerator speeds complex queries for relational and Time-Series data. You can scale on a cluster and provision both IDS and IWA in the cloud to analyze much more data from different sensors and gateways. IWA can perform analytics of stored sensor data, up to 1000x faster than Time-Series alone. Informix is leading the way with unique capabilities to harness IoT data at the edge and I’ll be talking more about them at Insight 2014.