Manage all of your data – any type, on any cloud, from any vendor – with a high-performance universal query engine and data fabric

3 minute read | May 10, 2021

Ninety-five percent of businesses operate in hybrid cloud environments, with about two-thirds using multiple cloud providers.[1] This ongoing sprawl of data across multiple data stores, locations, clouds and even vendors has driven many organizations to look for ways to streamline data management through a single source – a process called data store convergence. Previous attempts have relied on physical data movement and ETL processes alongside vast unstructured data repositories such as the data lake. Today, there is a better solution: a universal query engine used as part of a data fabric approach with embedded governance. It runs on an insight platform that provides flexible access to a host of data storage, governance, and analytics capabilities.

AutoSQL Query Engine Knowledge Graph

Earlier this year at the Think 2021 virtual conference, IBM introduced its latest advancement in streamlined data management: the AutoSQL capability. AutoSQL simplifies your data landscape by running a single distributed query across your disparate data sources. It is a universal query engine that is part of the intelligent data fabric capabilities of the next generation of IBM Cloud Pak® for Data. AutoSQL can query databases, data warehouses, data lakes, and streaming data without additional manual changes or data movement, and it executes distributed and virtualized queries 53% [2] faster than the industry standard. The universal query engine is combined with existing data virtualization capabilities in the data fabric to make this possible without moving data – allowing data to be queried easily across multiple clouds, both public and private. AutoSQL is also vendor agnostic, allowing queries to be executed on open file formats on any vendor’s cloud. Taken together, the effect is a reduction in effort and costs. Minimizing data movement, reducing the time spent adapting queries to specific sources, and accessing all data from a single platform avoid data movement penalties and allow personnel to make more effective use of their time.
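To make the idea of one query spanning several stores concrete, here is a minimal sketch in Python. It is not AutoSQL and does not use any IBM API. It only illustrates the general pattern, using SQLite's ATTACH DATABASE to treat two independent in-memory databases as if they were separate sources. The schema names (warehouse, lake), the tables, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create one connection that can see two independent stores."""
    # Autocommit mode, so ATTACH is never issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)

    # Two separate in-memory databases stand in for distinct data stores.
    # The names "warehouse" and "lake" are illustrative only.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS lake")

    conn.execute(
        "CREATE TABLE warehouse.customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

    conn.execute("CREATE TABLE lake.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO lake.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )
    return conn


def total_spend_by_customer(conn: sqlite3.Connection) -> list:
    """Run a single query that joins tables living in different stores."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM warehouse.customers AS c
        JOIN lake.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_connection()
    for name, total in total_spend_by_customer(connection):
        print(name, total)
```

The point of the sketch is that the join is written once against logical names, and no rows are copied from one store to the other before the query runs. A production query engine would additionally handle remote connections, query pushdown, and differing SQL dialects, none of which is shown here.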


These capabilities and benefits are bolstered by the embedded governance found within IBM’s intelligent data fabric as part of the next generation of Cloud Pak for Data. With a leading data platform, queries can be made across all data stores on data whose quality and validity have been verified. Consequently, insights can be trusted more easily and have a better chance of influencing action, thanks to the trust inherent in a common data foundation built on a data fabric. Moreover, given the importance of data for AI models in particular, better quality data can mean more efficient use of AI models. For instance, a model trained on data that closely reflects reality needs fewer corrective updates than one trained on data that was slightly off. Automated metadata tagging can also save time and improve accuracy so that data users have a better idea of which datasets to query. And automatic data masking features make regulatory compliance much easier, so that data users don’t have to worry about accidentally accessing data they shouldn’t. With data integration and data preparation listed among the top three technologies organizations would like to automate by the end of 2022 [3], these advancements can’t be implemented soon enough.
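As a rough illustration of what automatic data masking means in practice, the sketch below hides sensitive fields from users who lack clearance. It is a simplified assumption about how such a rule could work, not a description of how Cloud Pak for Data implements masking. The column names in SENSITIVE_COLUMNS, the function names, and the "keep the last four characters" rule are all hypothetical.

```python
# Hypothetical policy: which columns count as sensitive.
SENSITIVE_COLUMNS = {"email", "ssn"}


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        # Short values are hidden completely.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_masking(row: dict, can_view_sensitive: bool) -> dict:
    """Return a copy of the row, masking sensitive columns when required."""
    if can_view_sensitive:
        return dict(row)
    return {
        column: mask_value(str(value)) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    record = {"name": "Ada", "ssn": "123-45-6789"}
    print(apply_masking(record, can_view_sensitive=False))
    print(apply_masking(record, can_view_sensitive=True))
```

Because the rule is applied when the data is read rather than left to each user, an analyst without clearance receives the masked value automatically and cannot access the original by accident.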


Still, governance, the data fabric, and AutoSQL are just a few of the things possible with an insight platform like IBM Cloud Pak for Data. Its end-to-end, pre-integrated solutions cover the multiple facets of data collection, governance and analysis, so the patchwork solutions of the past can be replaced by an interconnected and flexible solution that is built to grow with a business’s needs. Maintaining multiple licenses or upgrades can be replaced with a single platform solution. Of course, that doesn’t mean a business is locked in. As discussed previously, open data formats and vendor flexibility help in this regard so that companies can integrate their existing warehouses, data lakes, and additional solutions on any cloud. Either way, a single view of data and the ability to provide data users with self-service capabilities remain. In addition, customers using IBM Cloud Pak for Data can look to the wide ecosystem of partners and their solutions to round out or augment the platform for their more tailored and specific needs.

The announcement of AutoSQL and the universal query engine on IBM Cloud Pak for Data is just one more advantage for businesses seeking a more integrated, yet flexible solution. The predicted reduction in manual effort and cost will be a welcome addition to the myriad cost-saving and automated features already available. Learn more about these advantages with IBM Cloud Pak for Data.

[1] https://www.bcg.com/publications/2021/navigating-multicloud-strategy

[2] Based on internal testing

[3] Source: Top Trends in Data and Analytics for 2021: Data Fabric Is the Foundation