In our first article, we introduced the overall concepts of on demand business, business process management (BPM), business rules engines, and business intelligence. We investigated various business rules engines and discussed how to differentiate one from another. In this article, we will focus on the role of business intelligence in an intelligent and flexible BPM solution. In particular, we will discuss the application of on-line analytical processing (OLAP) technology for advanced analytics, considering both OLAP access and integration. This includes drilling down to the technical details of implementing a dynamic pricing model. Finally, we will introduce DB2® Cube Views, an extension to the OLAP paradigm which, when used with OLAP products, can be used to produce optimized OLAP solutions for enterprises.
How important is business intelligence for business process management? As our first article stated, business intelligence provides insight for corporate strategies on customers, sales, financial planning and competition. Business processes implemented by the business process management system need to be driven by actionable intelligence. This actionable intelligence goes beyond traditional concepts such as reporting and charting. Actionable intelligence is derived from advanced data analytics and can be used to drive strategic business operations if it is integrated with the business process engines.
In our example, Cristina White, the CIO of International Foods Market (IFM), takes on the challenge of transforming the IT environment to drive the transformation of IFM into a responsive on demand business. She understands the critical role of business intelligence in business process management, and, to understand how to deploy BI to support her goals, she asks Greg, the architect of IFM, to give her his thoughts. Greg, having a very extensive knowledge of business intelligence, points out three architectural drivers:
- Avoid data silos
- Avoid isolated decision making
- Reduce latency in the decision-making process
Real time data access and analysis
Let's take a closer look at each of those architectural drivers.
Data from diverse sources may take a long time to be consolidated into a consistent, unified global view that allows the business analyst to discover new business opportunities or detect anomalies. In many cases, the massive amounts of data from brick-and-mortar stores, web sites, call centers, inventory, and CRM systems are often treated as individual silos that do not communicate with each other. The data from these systems may be represented in different formats, stored in different databases and captured in diverse applications. Effort is required to cleanse, reconcile and integrate data from various silos around the enterprise.
In the traditional business environment, business planning or financial modeling used to be an offline activity. First, some statisticians would crunch big numbers from the enterprise data and come up with a stack of reports and analysis, then business analysts would interpret the analysis and develop a business model and plan. High ranking executives and officers decide the budget based on the analysis and their business intuition. As we see, the activities are exclusively disparate and disconnected. With any luck, the offline reporting and offline analysis might provide companies lucrative opportunities. Without luck, they might fail in their mission and not be able to provide accurate, secure and real-time business intelligence information.
Latency in the decision process
Many large companies would reap tremendous benefit if they could reduce the latency between business events and responses. A slow sale of one product in one region might be trivial and be overlooked. However, if it repetitively occurs in many stores, its impact might be significant and need investigation. Unfortunately, the business intelligence information needed to detect market shift and customer attrition is not available to the decision makers in time to trigger proactive action. As a result, companies are reactive in trying to mitigate events rather proactive to take advantages of events.
The evolution of business intelligence
Figure 1. The evolution of business intelligence
Isolated data, whether it is transactional or operational, demands integration. The integration process, whether it is manual or automated, has a profound impact on the data warehouse latency that constrains our ability to get real-time information. To understand the options available to address the need for real-time information, it’s worth looking at the evolution of business intelligence.
The Evolution of Business Intelligence can be illustrated through 4 models:
In the traditional model, transaction data is captured and propagated to a staging server through a batch process which is normally scheduled to run daily or weekly. Next, data from the staging server goes through an extract transformation load (ETL) process to be cleansed, transformed and reconciled with the data warehouse, again based on a scheduled batch process. The data warehouse then feeds the consolidated data into analytics applications with which business analysts generate the reports. These reports are delivered to the proper decision makers to make business decisions. The whole cycle normally take weeks to complete.
In the second model, the transaction data is directly integrated into the data warehouse by leveraging business process management (BPM) or enterprise application integration (EAI) process. Normally the transaction data is sent as a message to a queue that is polled by the EAI broker. Once the EAI broker receives the message, it can cleanse, transform, aggregate or consolidate data as a business object so that it conforms to the data warehouse specification and requirements. The business object is then populated into the data warehouse by leveraging database connectivity technologies. Once data is available in the data warehouse, analytics processing is applied to the data and business intelligence is derived. The business analysts apply their knowledge and expertise to collect useful information and present it to the decision makers.
In this model, the transaction system has embedded business intelligence capabilities. This model provides a powerful middle tier with both data integration and data presentation capabilities. A lot of ERP and supply chain vendors, such as SAP and PeopleSoft, are taking this approach. By doing so, their customers can easily transform data into information for effective decision making. This model is widely adopted in planning and forecasting applications which help to set the goals and objectives for an organization. The analytics, such as the scorecard, are generated dynamically to help align the activities and execution of the organization.
In this most advanced model, the transaction data is automatically captured and integrated into the data warehouse which feeds the data to business intelligence tools, such as OLAP or mining tools. The output of the business intelligence, in turn, is directly fed back to the front-line business decision makers in the form of recommended actions (dynamic price changes, for example). This forms a closed-loop and thus creates a zero latency environment. A zero latency environment allows the company to integrate analytics into day-to-day business operation and shorten the time between business decision and business action. To implement this system the closed-loop process should be automated. This can be achieved by building a data warehouse in real time, integrating a real time analysis engine, and leveraging a real time rules engine. Second, this automated closed-loop processing should occur in real time. The closed-loop environment dynamically adjusts business operations based on messages generated by a decision engine. These two criteria create the foundation for faster decision-making, faster time to market, and greater market opportunities.
IFM uses OLAP as the reporting interfaces to help analysts perform decision support and strategic planning by analyzing historical transaction data. OLAP provides a multidimensional view consisting of categorical attributes such as Products and Market and numeric attributes like Sales and Profit. The categorical attributes form dimensions and the numeric attributes form the measures of a multidimensional cube. Dimensions can contain hierarchies that specify aggregation levels. The measure attributes are aggregated to different levels of detail by applying mathematical functions such as sum, average, and variance to a combination of dimension attributes.
OLAP data can be explored by using different navigational operators like drill-down, roll-up, dicing, slicing and pivot. Typically, the user begins with an aggregate level, inspects the entries visually, selects sub-ranges of data to inspect further based on business hypothesis or financial strategies, drills down in more detail, inspects the entries again and either rolls-up to a higher level view or drills down to an even more atomic level.
Although OLAP is an excellent technology, the OLAP API has no universal standard. In relational databases, SQL (Structured Query Language) and ODBC (Open Database Connectivity) has been universally accepted and developed. In contrast, currently no single product in OLAP dominates markets strongly enough to win the platform hegemony. As a result, there are many factions of application programming interfaces (APIs), data definition languages (DDL), or data manipulation languages (DML) in the OLAP space. The problems are rooted in the fact that major leading companies are interested in competition rather than cooperation, while other vendors shun away from the battle by incorporating different API access to maintain and expand their market share. Below we briefly discuss the major players in the OLAP API field.
- Hyperion Essbase Report Writer API
The Hyperion Report Writer API, a text-based report description API provides the best-in-class financial reporting and consolidation functionality. Although it is one of the dominant APIs in the OLAP space, it is completely proprietary. It allows users to select data, modify the layout and format the result set.
- Microsoft MultiDimensional Expression language
Microsoft MultiDimensional Expression (MDX) language is the OLAP query language behind Microsoft OLEDB MD, a multidimensional database extension to Microsoft’s new OLEDB data-access layer. The format and structure of MDX is analogous to SQL. You use a FROM clause to specify a data source, a WHERE clause to filter out the data, and a SELECT clause to specify or compute the members in your result set.
Compared to SQL in relational tables, using MDX in multidimensional databases is more complicated and difficult than creating basic queries with SQL. SQL queries retrieve data in two-dimensional format; however, MDX retrieves multidimensional data, which allows business analysts to perform tasks such as roll-up and drill down, and slice and dice data according to business requirements. There are three levels of MDX usage: basic data retrieval in 1-2 dimensional results, MDX formulas, and complex data retrieval in n-dimensional structure of a cube.
- Others
- XML/A
XML for Analysis is derived from Microsoft’s OLE DB for OLAP and therefore shows many similarities to its forebear. The major difference here is that it’s based on Web services and is therefore platform independent and not tightly coupled to any particular client or server.
- MDAPI
MDAPI, proposed by the OLAP Council, is designed to support a heterogeneous computing environment, which distinguishes it from Microsoft's OLE DB for OLAP API. It enables client and server communication through standard technologies such as DCOM, CORBA, and Java. The APIs provide Java libraries and COM objects and can be integrated with programming languages like C++ and Java. Besides, application development tools such as Visual Basic and Java Script can access the API, too. Unfortunately, there is no real support for this API.
- JOLAP
JOLAP, a Java-based multi-vendor OLAP API initiative, is based on the Java® API for the J2EE platform. It enables the manipulation of OLAP data and metadata in a vendor-independent manner. It carries two major missions: universal OLAP query and managing OLAP data and metadata. The goal of Microsoft’s OLE DB for OLAP and XML for Analysis, and the OLAP Council’s obsolete MDAPI is to create a universal OLAP query API. JOLAP extends basic functions, with planned support for creation, storage, access and maintenance of data and metadata in OLAP servers and multidimensional databases.
- XML/A
With no universal standard, we are left with the need to utilize multiple interfaces for the foreseeable future. For the near-term, it is still very important to select OLAP reporting engine technologies that support the OLAP technology that is being employed in one's environment. Fortunately, most leading reporting engines do support multiple OLAP technologies.
The new paradigm to OLAP access and integration
Armed with Greg’s BI knowledge, Christina gathers the business process and business rules engine development team to discuss the integration process and evaluate several market products. After many sessions of brainstorming and evaluating pros and cons, they reach the following design principles. Not only do these principles make the integration process robust and resilient, but they also make the business intelligence available 24-by-7 around the world for different levels of decision stakeholders.
The First Principle: XML for data interchange
Since XML format is universally accepted as a common language for data exchange, the development team determined that OLAP data in XML format would be better than a non-XML proprietary format. Multidimensional data in XML format has many advantages from an integration perspective. XML technologies have advanced to such a point that the development team can directly leverage existing parsers, data validation tools and XSL transformations. Currently XML includes XML schemas and DTDs which provide the ability to validate, or verify, the structural content of a document. Validating a document helps to prevent errors when the data interacts with legacy systems which expect a particular structure. In addition, transforming XML data on the fly makes it possible to use a single source for multiple outputs, whether those outputs are integrated into different legacy application, databases or J2EE applications.
The Second Principle: J2EE-compliant technologies
The new OLAP application is expected to support standard J2EE application development. J2EE defines an entire development paradigm for building applications. In this paradigm, HTML pages are used for static content, JSP for dynamic content and servlets/EJB for application logic. Conforming to this paradigm enables integration across the BPI/EAI server, the business intelligence application, and operational applications, resulting in a more efficient and effective architecture.
The Third Principle: Analyze relational and multidimensional data all in one application.
The business intelligence system should be able to retrieve both relational and multidimensional data sources in a real-time fashion. Due to the expansion and merger of companies, IFM’s data now reside in disparate data sources, such as Financial, HR, CRM, ERP etc. Optimally, being able to access the real-time information from legacy relational databases as well as multidimensional databases in one application helps IFM to detect market change and respond in a timely fashion. Without that, data from the operational system and business intelligence remain silos.
Based on the above three principles, Christina asks Greg to dive into the market and propose a business intelligence solution which will meet the company’s requirements.
Scenario solution architecture overview (flexible dynamics pricing)
Referring to back to the information in article 1 and article 2 of this series, Christina has a requirement to transform their legacy pricing system into a new flexible pricing model that is based on current profit value derived from historical and analytical information in the data warehouse. Besides, this dynamic pricing model should be flexible enough to take into account new factors, such as supply chain, sales, customer responsiveness and competition. To create this kind of flexibility, Greg realizes that they can leverage business process management as the integration hub between business intelligence and the decision rule engine. When a market niche is discovered and creates a new requirement, they can quickly change the business process flow using WebSphere® Business Integrator (WBI) process design tools. This architecture can bring real-time analytics to the business process with the flexibility of providing diverse business decision rules. As a result, it allows IFM to anticipate fluctuations in price responsiveness and set the price accordingly. The final architecture for the dynamic pricing model is proposed as the following.
The store manager will log on to the portal application and enter order information. The portal front end then generates an XML order file which the JTEXT Connector parses into a PurchaseOrder business object. Furthermore, the connector invokes a map to transform the PurchaseOrder into a WebSphere Business Integration ORDER generic business object. Since the collaboration is subscribed to process the ORDER object, the collaboration then triggers its predefined business process to invoke getProfit% from the analysis engine and getMarkup from the rules engine. Afterwards, a map is applied to transform the WBI ORDER object into the application PurchaseOrder business object through the JTEXT Connector. The JTEXT Connector then writes out the result in XML format to the file system. The portlet then reads in the file and displays the result in a nice table format. Figure 2 represents the above architecture flows1.
Figure 2. Architecture flows
OLAP access technology quick overview
The data warehouse team consolidates data from different data sources (such as ERP, CRM and operational stores) and builds a star schema which can be leveraged in the DB2 OLAP server to define the cube. Leveraging the DB2 OLAP server’s functionality, the team builds a multidimensional cube which contains dimensions of product, time, store, population, customer and measure. Once the cube is available, there are three major tasks they need to tackle.
Accessing a cube, due to its many possible factions of API, is complex. To make it even worse is the API complexity itself. The real issue here is the complexity of navigating the cube to get to the cell of information that business analysts want.
Take IFM for example. There are many ways to slice and dice the cube to retrieve the profit information. They can use the average product profit by region for the last month, average product category profit at the corporate level for the last quarter or average product profit at the store level for the last month. Each of the above possible calculations results in different ways of slicing and dicing the cube, which produces the result sets in one-dimension, two-dimensions or n-dimensions.
OLAP API can retrieve the result in textual format, but to have flexible navigation of the cube, you need an OLAP reporting utility. An OLAP reporting utility allows users to drill up to see the aggregated values, drill down to see more detailed values, slice certain dimensions to compare the values, and dice the cube to investigate the most atomic information.
Figure 3 below shows an example of IFM OLAP report which covers profit percentage in three dimensions (product, market, and time). The business analyst can freely navigate through the cube to see all different possible intersection of the three dimensions, with freedom between seeing the data in tabular format or chart format. For some advanced OLAP reporting utilities, they might also have a banding system to indicate the performance and possible alerts which can trigger external actions.
Figure 3. Example IFM OLAP report
Business analysts have the business insight and expertise to navigate the cube to retrieve the critical information but don’t necessarily have the technical knowledge to understand the OLAP API. The traditional approach is that business analysts select a report they like and the IT specialists figure out the corresponding OLAP query. There is always the latency and miscommunication between business analysts and IT specialists, which becomes one of the obstacles to making the company more responsive and on-demand.
OLAP reporting utilities have advanced so much that they can provide a graphical query builder which allows business analysts to choose various dimensions and dynamically roll up, drill down, slice and dice the cube. Once business analysts are satisfied with the view, the query builder will generate the OLAP query. Business analysts can then save the query and mail it to the IT specialist who will use the query to enable business process to access business intelligence. IFM is taking this approach and the most applicable profit calculation level turns out to be the average product profit by region for the last month.
How to enable OLAP data in XML format?
The next step for the development team is to render the data in XML format in an OLAP reporting utility (in our scenario, Alphablox) so that the BPI/EAI application can access the business intelligence information and process it accordingly.
The example shown in Listing 1 demonstrates the process and the steps involved.
- Define an HTML page with a standard DataBlox. Lines 2-3 show how to include a DataBlox and define its data source.
- Use DataBlox properties or methods to specify its data source and query string. Lines 6-17 demonstrate how to set up the query by accepting two possible parameters from the user's input (productId and state) to find out the profit percentage of that product.
- Define both the application which contains the JSP files and data source which contains the cube.
- Invoke the application, being sure to add the render attribute to the application’s URL:
.../AppName.jsp?render=XML
Using our scenario as an example, we would invoke the above application by accessing the following URL:
http://businessIntelligence:9080/IntegratedAnalysis/profit_XMLDatablox.jsp? render=XML&productId=300-20&state=Utah |
The URL has the first parameter “render=XML” which triggers the result set to be rendered in XML and the dynamic parameters such as productId to specify the product and state to choose a specific state.
Listing 1. Rendering the data in XML format in an OLAP reporting utility
1: <%@ taglib uri="bloxtld" prefix="blox"%>
2: <blox:data id="profitData" visible="false"
3: dataSourceName="Advance"
4: />
5: <%
6: String productId=(String)request.getParameter("productId");
7: System.out.println("productId="+productId);
8: String state=(String)request.getParameter("state");
9: System.out.println("state="+state);
10: String queryString="<Sym <Column ( Year, Measures) 'Profit %'"+
11: "<Row (Product Market) "+
12: "{DECIMAL 5} {OUTALTNAMES} "+ "'"+productId+"'";
13: if (state!=null)
14: queryString=queryString+" '"+state+"' !";
15: profitData.setQuery(queryString);
16: profitData.connect();
17: profitData.refresh();
18: %>
19: <html>
20: <body>
21: <h2>XML Datablox view</h2>
22: <blox:display bloxRef="profitData"/>
23: </body>
24: </html>
|
After months of integrating and developing the dynamic pricing project, the team has conquered all the obstacles and resolved all the technical issues. The pilot run of the program is represented in Figure 4. The manager can log on to the portal server and use the order portlet which allows her to select a product and specify a quantity. Next the portlet dynamically accesses the profit percentage from the business intelligence tool and then propagates this data to the rules engine to figure out the markup percentage. Figure 4 below shows the profit percentage is 30%, 33%, and 26% and that the recommended markup percentage changes according to a dynamic pricing algorithm where the markup percentage might either increase when the profit margin is flat or decrease when the profit margin is rising.
Figure 4. Pilot run of BPM program
New Technologies: DB2 Cube Views 8.1
A new product, DB2 Cube Views, recently has created a lot of excitement among OLAP technologists. From IFM integration perspective, this product, an accelerator to an existing OLAP server, provides an optimized OLAP solution by enabling the database with OLAP capability. OLAP developers are interested in the following notable features:
Centralized metadata repository
The segregation between multidimensional metadata and data warehouse metadata has existed for many years. The cause is that multidimensional metadata has been captured and preserved in proprietary repositories of specific OLAP vendors. As a result, these repositories form individual silos which prevent the interchange of metadata. DBAs have suffered through trying to synchronize metadata between relational and multidimensional databases, and panicked over performing impact analysis on metadata change from data warehouse to multidimensional.
To alleviate this problem, DB2 Cube Views includes traditional metadata and multidimensional metadata in one single repository by extending its catalog to accommodate multidimensional objects. It can also import from and export to XML files that define the metadata in either relational or multidimensional databases. This capability allows end-to-end metadata flow and reduces the maintenance cost ripple of schema changes to BI tools. It especially reduces repetition and errors in managing multiple metadata repositories.
Centralized metadata repository
The segregation between multidimensional metadata and data warehouse metadata has existed for many years. The cause is that multidimensional metadata has been captured and preserved in proprietary repositories of specific OLAP vendors. As a result, these repositories form individual silos which prevent the interchange of metadata. DBAs have suffered through trying to synchronize metadata between relational and multidimensional databases, and panicked over performing impact analysis on metadata change from data warehouse to multidimensional.
To alleviate this problem, DB2 Cube Views includes traditional metadata and multidimensional metadata in one single repository by extending its catalog to accommodate multidimensional objects. It can also import from and export to XML files that define the metadata in either relational or multidimensional databases. This capability allows end-to-end metadata flow and reduces the maintenance cost ripple of schema changes to BI tools. It especially reduces repetition and errors in managing multiple metadata repositories.
Optimized OLAP-style data access
Materialized Query Tables (MQTs) are leveraged in DB2 to provide precomputed, aggregated, stored data which will be repeatedly accessed. By leveraging MQTs, costly table joins and computation-intensive calculations can be performed once, and later similar queries can use the existing data to return results much faster. Building MQTs correctly can bring a significant performance boost. An interesting feature in Cube Views is that the Optimization Advisor wizard can help to generate MQTs. By analyzing your metadata and your requirements, the Optimization Advisor will build the set of recommended summary tables, which users can optionally tailor to fit their needs. The picture below shows an example of how a Materialized Query Table is generated for one particular cube model in DB2 Cube Views.
Figure 5. Generating a materialized query table in DB2 Cube Views
The DB2 optimizer will rewrite incoming queries based on cost analysis. Rewriting queries to use MQTs improves performance when results can be returned from a pre-computed summary rather than performing the aggregation at run time. Since the DB2 optimizer is aware of the summaries, it can rewrite the queries to access the summary table (MQT) instead of performing brand new join and aggregate operations. The rewriting process is transparent to the users, so users can continue to write queries against base tables, but the DB2 optimizer will rewrite the queries to access the materialized query table to compute the result.
Open standard interface: Web services
As the industry is shifting from tightly coupled client-server architecture to loosely coupled service-oriented architecture, Web services emerge as the new kid on the block embracing existing technologies as well as new protocols. OLAP is catching the wave. OLAP Web Services allows clients to use XPath (XML Path Language) to query OLAP metadata and data.
How does XPath work? XPath is utilized as a query language for an OLAP cube. Web Services for DB2 Cube Views transforms the XPath Query into multipass SQL statements and retrieves results. While the XML engine in relational databases returns 2-dimensinal tabular data in XML documents, OLAP Web Services need to add multidimensional structure with aggregation hierarchies in the XML document.
OLAP Web Service, with the overhead of XML n-dimensional parsing, is not intended as a replacement for the existing OLAP API for roll-up, drill-down, slice-and-dice functionality since OLAP’s basic requirement is to retrieve results at the speed of thought. However, it can accelerate the speed for integration processing. Take our Markup Scenario for example, the developer can leverage the same skills (such as XML and XPath) to retrieve the data from either relational or multidimensional databases. They are not required to have deep knowledge of the OLAP API and the integration process becomes cost-effective and highly productive.
Looking ahead to article 4 in our series, we will continue to discuss how the latest technologies in data warehousing help companies to be more responsive and be more proactive. These new technologies encompass detecting anomalies, discovering hidden patterns and initiating dynamic scoring, as well as building sense-response systems. We will also demonstrate integration of these technologies by showing how IFM monitors their business information to detect deviations in the inventory system and how they take proactive measures.
1 Note that the order could have been sent as an MQSeries message and a WBI MQSeries connector could have been used in place of the JText connector.
John Medicke is the chief architect of the On Demand Solution Center in Research Triangle Park, NC. He has worked in industry solution development for last seven years across various industries including financial services, retail, health care, industrial, and government. He is the author of the book Integrated Solutions with DB2 as well as multiple articles in various journals. You can contact John at medicke@us.ibm.com.
Margie Mago works for IBM at its Research Triangle Park, North Carolina location. She is a developer and solution architect in the Software Group On-Demand Solution Center. She has participated in several retail sector solution integration projects since joining the group. You can contact Margie at mmago@us.ibm.com. .
Feng-Wei Chen works for IBM at its Research Triangle Park, North Carolina location. She is a software developer in the Software Group On-Demand Solution Center. She has participated in several solution integration projects. She has been involved in the design and architecture of solutions related to database systems and business intelligence for three years. You can contact Feng-Wei at chenf@us.ibm.com.
Comments (Undergoing maintenance)





