After a bit of a break for vacation, it's a good point to catch up on some items from the past month. One item I thought I'd call out is the recent release of a new IBM Redbook, "IBM Information Server: Integration and Governance for Emerging Data Warehouse Demands", which I helped to write.
I've commented recently in this blog on trends in Big Data and some of the associated aspects of Information Governance. Both of these trends are changing the way we traditionally look at and work with data warehouses, those centerpieces of many organizations' enterprise information architecture. The challenges we've seen organizations wrestling with include:
- Demands for more and faster access to data to quickly accommodate changing business requirements
- Demands to incorporate and integrate more types of data at greater volumes and faster speeds than ever before
- Demands to incorporate deeper analytical capabilities into the warehouse to predict customer churn, improve segmentation for marketing, etc.
- Demands to improve the governance and raise the confidence of users in the breadth and quality of data stored in the warehouse
In the Redbook, we talk about some of the recent additions to IBM's Information Server product line that help to meet these emerging challenges.
For instance, IBM InfoSphere Data Click is designed to help a business user perform self-service operations to select and load data from a data warehouse to a data mart without requiring experience in designing a target model. At the same time, there are governance and quality requirements around the data to ensure that only certain data can be accessed and copied and that the right quality of data is delivered. These aspects are built into the InfoSphere Data Click design.
The business user gets a two-click experience: selecting a prebuilt blueprint, then offloading the data to an environment where they can build and run the reports they need. For the IT staff and the data stewards, it's a configuration-based approach that gives business users the right tools for easy access without requiring anyone to create complex scripts or grant direct database access, since InfoSphere Data Click takes full advantage of the IBM Information Server processing and metadata functionality.
To handle the range of Big Data sources feeding a data warehouse (or to offload warehouse data to a Hadoop platform), IBM InfoSphere DataStage incorporates:
- A Big Data File Stage to load data to or extract data from Hadoop systems
- The capability to push down processing from an ETL flow design into Hadoop, taking advantage of the native processing power there
- Integration with IBM InfoSphere Streams for real-time, low-latency analytics processing
These additions allow for broad integration between Big Data and the traditional warehouse data.
From the governance perspective, IBM Information Server now supports information governance policies and rules within its business glossary, allowing data stewards to connect more of the information landscape together and tie it into the governance requirements of the organization. These capabilities naturally support the needs and questions of an information governance organization such as:
- What policies do we need to address?
- What governance rules are incorporated in the policy?
- What assets or data are governed by the policy?
- What quality validations are needed to enforce a governance rule?
By incorporating this type of information within a business glossary, users gain broader visibility into the overall governance requirements.
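To make the relationships behind those four questions a little more concrete, here is a minimal Python sketch of how policies, rules, governed assets, and quality validations might link together. This is purely illustrative: it is not the Information Server API or its actual metadata model, and every class name, field, and sample value below is a hypothetical of my own.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GovernanceRule:
    """Hypothetical rule: what it governs and how it is checked."""
    name: str
    governed_assets: List[str] = field(default_factory=list)
    quality_validations: List[str] = field(default_factory=list)


@dataclass
class GovernancePolicy:
    """Hypothetical policy: a named group of governance rules."""
    name: str
    rules: List[GovernanceRule] = field(default_factory=list)

    def rule_names(self) -> List[str]:
        # "What governance rules are incorporated in the policy?"
        return [rule.name for rule in self.rules]

    def governed_assets(self) -> Set[str]:
        # "What assets or data are governed by the policy?"
        return {asset for rule in self.rules for asset in rule.governed_assets}

    def validations_for(self, rule_name: str) -> List[str]:
        # "What quality validations are needed to enforce a governance rule?"
        for rule in self.rules:
            if rule.name == rule_name:
                return list(rule.quality_validations)
        return []


# Sample data, invented for illustration only.
privacy_policy = GovernancePolicy(
    name="Customer data privacy",
    rules=[
        GovernanceRule(
            name="Mask national identifiers",
            governed_assets=["WAREHOUSE.CUSTOMER.NATIONAL_ID"],
            quality_validations=["national_id_is_masked"],
        ),
        GovernanceRule(
            name="Validate contact details",
            governed_assets=["WAREHOUSE.CUSTOMER.EMAIL"],
            quality_validations=["email_format_is_valid", "email_is_not_null"],
        ),
    ],
)
```

With a structure like this, a data steward's questions become simple lookups, for example `privacy_policy.governed_assets()` or `privacy_policy.validations_for("Validate contact details")`.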
If you're looking at any of these aspects of integration with or governance over your data warehouse, have a look into some of the new capabilities we note in the Redbook.
As always, the postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.