Virtualize IBM Z data pattern

Access data across IBM Z® and other data sources, including joining data, without the need to copy and replicate that data. Deliver more current and accurate data with in-place data access to consuming applications, including analytics.

Overview

Data is an integral element of digital transformation for enterprises. New services need simplified access to IBM Z data for business operations that require reads and updates through APIs. Frequently, IBM Z data also needs to be combined with data from other sources.

But as organizations seek to use their data, they encounter challenges that result from diverse data sources, types, structures, environments, and platforms. Those challenges apply equally to data that is stored on IBM Z, which contains most operational data in large organizations. A common concern is that data on IBM Z is difficult to access and transform.

One approach is to move all data into a single data store, such as an operational data store (ODS) or a data lake, but doing so can create more challenges. The complexity of data copy processes results in data latency, poor data quality, increased cost, risks, and security challenges. With data virtualization, you can access data across many data sources without the need to copy and replicate data.

Solution and pattern for IBM Z®

The foundation for consuming IBM Z data through data virtualization across data sources is the implementation of the Enable modern access to IBM Z data pattern. That pattern supports access to real-time transactional data in IBM® Db2®, IMS, and other data sources. You can access Db2 and IMS through SQL, Java® Database Connectivity (JDBC), and REST API by using IBM® z/OS® Connect EE. IBM® Data Virtualization Manager for z/OS® can provide SQL access to all IBM Z data sources. For access through REST API, you can add z/OS Connect EE.

The term data virtualization is overloaded. The main adopted use case for Data Virtualization Manager for z/OS is the mapping of traditional IBM Z data sources such as VSAM, IMS, or Adabas into relational views for modern access through SQL or API. In contrast, the main use case for data virtualization in IBM Cloud Pak® for Data is to gain a single view of disparate data without data movement and to manage data with less complexity and risk of error.
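
The mapping idea described above can be illustrated with a minimal Python sketch. It is not how Data Virtualization Manager for z/OS is implemented or configured. It only shows the general technique of applying a field layout to fixed-length, EBCDIC-encoded records so that each record can be read as a relational row. The layout, field names, and sample data are all hypothetical.

```python
from typing import Iterator, NamedTuple, Tuple


class Field(NamedTuple):
    name: str    # column name exposed to consumers
    offset: int  # byte offset of the field inside the record
    length: int  # field length in bytes
    kind: str    # "int" or "text"


# Hypothetical layout, standing in for a copybook-style record definition.
CUSTOMER_LAYOUT = [
    Field("cust_id", 0, 6, "int"),
    Field("name", 6, 10, "text"),
    Field("balance", 16, 8, "int"),
]
RECORD_LENGTH = 24


def map_records(raw: bytes,
                layout=CUSTOMER_LAYOUT,
                record_length: int = RECORD_LENGTH) -> Iterator[Tuple]:
    """Yield one tuple per fixed-length record, decoded from EBCDIC (cp037)."""
    for start in range(0, len(raw), record_length):
        record = raw[start:start + record_length]
        row = []
        for field in layout:
            chunk = record[field.offset:field.offset + field.length]
            text = chunk.decode("cp037").strip()
            row.append(int(text) if field.kind == "int" else text)
        yield tuple(row)


def make_sample_data() -> bytes:
    """Build two made-up records in the hypothetical layout."""
    customers = [(1, "ALICE", 1250), (2, "BOB", 75)]
    text = "".join(f"{cid:06d}{name:<10}{bal:08d}" for cid, name, bal in customers)
    return text.encode("cp037")


if __name__ == "__main__":
    for row in map_records(make_sample_data()):
        print(row)
```

The records are decoded lazily, one at a time, so the consumer sees rows without a separate converted copy of the data set being created first.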

The foundation for accessing data across disparate data sources is the IBM Watson® Knowledge Catalog in IBM Cloud Pak for Data. It is more feasible and less costly to maintain metadata across different data sources than to constantly move terabytes of changing data. Watson Knowledge Catalog is a data catalog tool that powers the intelligent, self-service discovery of data structures, models, and more. You can access, curate, categorize, and share data, knowledge assets, and their relationships wherever they are, backed by active metadata and policy management. The cloud-based enterprise metadata repository also activates information for AI, machine learning (ML), and deep learning. As shown in the following diagram, in Watson Knowledge Catalog, you can discover, govern, and catalog the metadata of IBM Z data that is stored in Data Virtualization Manager for z/OS and Db2 for z/OS.

IBM data virtualization is designed as a peer-to-peer computational mesh, which offers a significant advantage over a traditional federation architecture. By using innovations in advanced parallel processing and optimization, the data virtualization engine can rapidly deliver query results from many data sources. Collaborative, highly parallel compute models provide superior query performance compared to federation, up to 430% faster against 100 TB data sets. IBM data virtualization has unmatched scaling of complex queries with joins and aggregates across dozens of live systems. IBM Z data can be accessed through SQL.
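
From the consumer's point of view, the result is a single SQL statement that joins data held in separate systems. The following Python sketch illustrates only that consumer view. It uses SQLite's ATTACH DATABASE as a stand-in for a virtualization layer, which is an assumption made for the sake of a runnable example and says nothing about how the IBM engine works internally. The two database files, the table names, and the data are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_source(path, ddl, insert_sql, rows):
    """Create one stand-alone data source (a separate SQLite file)."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()


def query_across_sources(accounts_path, events_path):
    """Join two independent sources in one SQL statement, without copying rows."""
    hub = sqlite3.connect(":memory:")  # holds no data of its own
    hub.execute("ATTACH DATABASE ? AS core", (accounts_path,))
    hub.execute("ATTACH DATABASE ? AS web", (events_path,))
    rows = hub.execute(
        """
        SELECT a.owner, a.balance, COUNT(e.account_id) AS logins
        FROM core.accounts AS a
        LEFT JOIN web.events AS e ON e.account_id = a.account_id
        GROUP BY a.account_id, a.owner, a.balance
        ORDER BY a.account_id
        """
    ).fetchall()
    hub.close()
    return rows


def demo():
    """Build two hypothetical sources and run the cross-source join."""
    with tempfile.TemporaryDirectory() as tmp:
        accounts_path = os.path.join(tmp, "core.db")
        events_path = os.path.join(tmp, "web.db")
        build_source(
            accounts_path,
            "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)",
            "INSERT INTO accounts VALUES (?, ?, ?)",
            [(1, "ALICE", 1250), (2, "BOB", 75)],
        )
        build_source(
            events_path,
            "CREATE TABLE events (account_id INTEGER, action TEXT)",
            "INSERT INTO events VALUES (?, ?)",
            [(1, "login"), (1, "login"), (2, "login")],
        )
        return query_across_sources(accounts_path, events_path)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The connection that runs the join stores no tables itself. Each source keeps its own data, and only the query result is materialized for the caller.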

Data virtualization can simplify development of consuming applications, including infusing AI into business applications. It also allows those applications to access current and accurate data at its source.

Advantages

Accessing IBM Z data in place provides several critical business benefits:

  • Reduces the risk of compromising data integrity
  • Reduces the cost that is involved in data movement
  • Increases data quality
  • Preserves the existing data management and recovery processes
  • Allows cloud applications to access the data in its underlying format and context

Considerations

Data virtualization in IBM Cloud Pak for Data is the foundation for rapid ML model development and deployment, infusing AI into business applications.

With a centralized view of data, including IBM Z data, within Watson Knowledge Catalog, you can build, test, and train ML models on the platform of your choice. You can then deploy AI models to Watson Machine Learning for z/OS to address more complex information needs within business services that run on z/OS.

Contributors

Maryela Weihrauch
Distinguished Engineer, WW Data and AI for IBM Z Technical Sales and Customer Success Leader IBM

Sueli Almeida
DB2 for z/OS Cloud Enablement IBM