Data virtualization, which has emerged alongside traditional solutions, aims to achieve the same end results as data lakes and data warehouses. Which is better: self-service virtualization or a comprehensive data warehouse? Is virtualization also a useful method for updating applications via the service interface?
Data virtualization is self-service for data users
Data virtualization is a new, dynamic way to search and utilize data from different data sources. The virtualized view provides the developer with data without having to move or copy it first. A well-known use of virtualization is the development of analytics, but it is also increasingly being used to provide data for applications. Virtualization gives users self-service access to the information sources of the company and its stakeholders.
Figure: Virtualization makes data directly available to users
Does data virtualization compete with data warehousing?
Data warehouses meet users’ anticipated needs well once data sources and data types have been established. However, constant changes sharply increase the cost of data warehousing. Virtualization reduces the need to connect new sources to data warehouses, and with it the need to move and transform data between sources and warehouses, work that is typically done with ETL (Extract, Transform, Load) tools. Allowing analysts to retrieve their data themselves through virtualization reduces their dependence on the IT experts who operate the ETL tools, and it also speeds up development. Additionally, virtualization reduces network and storage costs: removing unnecessary data frees up space on the data warehouse server and reduces the size of backup copies.
Data virtualization users see the virtualization layer as a database that they can query with various reporting, analytical and development tools. New data sources can be added without in-depth database knowledge. A well-designed virtualization tool integrates different database technologies, including common relational databases, NoSQL, Hadoop and standard file formats (e.g. CSV and Excel), by combining them into a single SQL view. It also identifies similar database schemas and presents them as a single schema (schema folding). For example, the same Sales table may exist in the databases of 10 source systems, yet it appears as a single table in the virtualized view.
Virtualization should not begin by replacing existing data warehouses that work well. A view that combines data from warehouses and lakes is an excellent example of virtualization in action.
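The idea of a single SQL view over several sources can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not how any particular virtualization product works: the source names, the Sales columns and the helper build_virtual_layer are all hypothetical, and attached in-memory databases stand in for real source systems.

```python
import sqlite3

# Hypothetical source systems, each holding its own Sales table with the same schema.
SOURCES = {
    "store_north": [("grill", 2, 199.0), ("charcoal", 5, 12.5)],
    "store_south": [("grill", 1, 189.0)],
    "webshop": [("grill cover", 3, 29.0)],
}


def build_virtual_layer(sources):
    """Attach every source as a separate database and expose one folded Sales view."""
    # Autocommit mode: SQLite refuses to ATTACH while a transaction is open.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    selects = []
    for name, rows in sources.items():
        # Source names are trusted constants here, so they are safe to interpolate.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
        conn.execute(
            f"CREATE TABLE {name}.sales (product TEXT, quantity INTEGER, unit_price REAL)"
        )
        conn.executemany(f"INSERT INTO {name}.sales VALUES (?, ?, ?)", rows)
        selects.append(
            f"SELECT '{name}' AS source_system, product, quantity, unit_price "
            f"FROM {name}.sales"
        )
    # A TEMP view may reference tables in attached databases; it stores no rows itself.
    conn.execute("CREATE TEMP VIEW sales AS " + " UNION ALL ".join(selects))
    return conn


if __name__ == "__main__":
    layer = build_virtual_layer(SOURCES)
    query = "SELECT source_system, product, quantity FROM sales ORDER BY source_system"
    for row in layer.execute(query):
        print(row)
```

The user queries only `sales`; the view resolves each row from its source database at query time, which is the behavior the schema folding example describes.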
Virtualized data as a service for applications
In addition to analytical use, virtualization is an asset in development when data needs to be moved from traditional database-driven systems into new applications for mobile users.
Service interfaces, such as the popular REST API, have been developed to provide a fast and flexible way to feed data to applications. Dedicated API gateway solutions have also been developed for API management. However, they do not have advanced tools for data manipulation. Data virtualization includes all the necessary tools, ranging from the technical conversion of data (SQL to REST API) to the aggregation and conversion of its contents and data security management.
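The technical conversion between SQL and a REST-style interface can be illustrated roughly as follows. This is a minimal sketch and not a real gateway or product API: the `customers` resource, its columns and the functions handle_get and demo_connection are hypothetical, and the HTTP server itself is left out so that only the URL-to-SQL translation is shown.

```python
import json
import sqlite3
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from REST resources to virtualized views and their columns.
RESOURCES = {"customers": {"view": "customers", "columns": ("id", "name", "city")}}


def handle_get(conn, url):
    """Translate a GET URL such as /customers?city=Oslo into SQL; return (status, JSON)."""
    parsed = urlparse(url)
    resource = RESOURCES.get(parsed.path.strip("/"))
    if resource is None:
        return 404, json.dumps({"error": "unknown resource"})
    # Keep the first value of each query parameter as an equality filter.
    filters = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    unknown = set(filters) - set(resource["columns"])
    if unknown:
        return 400, json.dumps({"error": f"unknown filter: {sorted(unknown)[0]}"})
    # Column and view names come from the whitelist above; values are bound parameters.
    sql = f"SELECT {', '.join(resource['columns'])} FROM {resource['view']}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    rows = conn.execute(sql, list(filters.values())).fetchall()
    return 200, json.dumps([dict(zip(resource["columns"], row)) for row in rows])


def demo_connection():
    """Create a small in-memory table that stands in for a virtualized view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Aino", "Helsinki"), (2, "Lars", "Oslo")],
    )
    return conn


if __name__ == "__main__":
    print(handle_get(demo_connection(), "/customers?city=Oslo"))
```

Restricting filters to whitelisted columns and binding values as parameters keeps the generated SQL safe from injection, which is one small part of the data security management mentioned above.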
The benefits of virtualization
Data virtualization delivers the benefits of self-service, such as speed of work and iterative learning to fuel innovation.
There is no need to copy data from a source to data warehouses or applications, which reduces the costs of the technical platform and development.
The security risk is reduced. Virtualization can also be used to analyze sensitive data, even when it is prohibited to transfer data from the source to corporate data warehouses or applications.
Data virtualization and security
Many corporate data sources contain sensitive data that cannot be exported as is to a data warehouse or an application. In these cases, virtualization is the solution to the problem. For example, if customers’ personal data is needed for socio-demographic analysis, the analysis can be pre-computed in the source database and only the result provided to virtualization users. Security classifications can also be used to specify that sensitive data can only be accessed by those who are authorized to do so. Other users either do not see the data at all or see it as a randomized string.
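Both ideas, pre-computing a result at the source and masking classified columns for unauthorized users, can be sketched in a few lines. Everything here is an assumption made for illustration: the column classification, the role name `privacy_officer` and the helper functions are hypothetical and do not describe any specific product's security model.

```python
import secrets
import string

# Hypothetical security classification: which columns are sensitive, who may see them.
SENSITIVE_COLUMNS = {"name", "ssn"}
AUTHORIZED_ROLES = {"privacy_officer"}


def random_mask(length):
    """Return a random alphanumeric string of the given length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def apply_security(row, role):
    """Return the row as the given role is allowed to see it."""
    if role in AUTHORIZED_ROLES:
        return dict(row)
    # Unauthorized roles get a randomized string in place of each sensitive value.
    return {
        column: random_mask(len(str(value))) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


def age_group_counts(rows):
    """Aggregate at the source so that only counts per age group leave it."""
    counts = {}
    for row in rows:
        decade = row["age"] // 10 * 10
        label = f"{decade}-{decade + 9}"
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    customer = {"name": "Aino Example", "ssn": "000000-0000", "age": 34, "city": "Espoo"}
    print(apply_security(customer, "analyst"))
    print(age_group_counts([customer, {"age": 38}, {"age": 51}]))
```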
Data catalogs to support virtualization
Businesses have core knowledge about customers, products, markets, and many other aspects that are essential for business operations. Virtualization users need to know where to find the required data and whether it is reliable.
A library is often used as an analogy for a data catalog. Like a library catalog, a data catalog allows users to find what they want by using a wide range of search criteria.
An advanced virtualization tool can leverage business vocabulary through an integrated data catalog solution. The catalog shows users in which database, and in which table and column, they can find customer-related information, for example. If the technical metadata regarding the location of a book were missing from the library system, users would have to ask the librarian for help. Similarly, categorizing the business vocabulary used to support virtualization helps users find the information they need without help from IT. Carefully designed management ensures user autonomy and satisfaction.
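A catalog that links business vocabulary to technical metadata can be pictured as a simple lookup structure. The sketch below is only a toy model: the entries, database names and the find_locations function are hypothetical, and a real catalog would hold far richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Links one business term to the place where the data physically lives."""

    term: str
    database: str
    table: str
    column: str
    tags: tuple = ()


# Hypothetical catalog content.
CATALOG = [
    CatalogEntry("Customer name", "crm_db", "customers", "full_name", ("customer", "pii")),
    CatalogEntry("Customer segment", "dwh", "dim_customer", "segment", ("customer",)),
    CatalogEntry("Net price", "erp_db", "purchase_lines", "net_price", ("purchasing",)),
]


def find_locations(keyword, catalog=CATALOG):
    """Return database.table.column for every entry whose term or tags match."""
    needle = keyword.lower()
    return [
        f"{entry.database}.{entry.table}.{entry.column}"
        for entry in catalog
        if needle in entry.term.lower() or needle in entry.tags
    ]


if __name__ == "__main__":
    print(find_locations("customer"))
```

Searching by a business term rather than by table name is what lets users locate data without asking IT, much like searching a library catalog by subject instead of shelf number.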
Data catalogs are constantly being developed with AI-supported logic in order to make it easier to find data. For example, machine learning models can be used to automate the mapping of metadata. Similarly, self-correcting mechanisms can be built to manage data quality. When bundled, narrow AI solutions gradually start to resemble human-like general AI. A conversation between AI and a user could sound like this:
AI: What do you want to do?
User: Put together a marketing campaign plan, a sales forecast, and a purchase proposal to our buyers regarding the spring sale of our company’s outdoor grills.
AI: Are there any new information sources or do we rely on the ones we have used before?
User: Run an analysis to see if there are new opportunities.
AI: I found a consumer study in Chinese on the internet and it seems to be a public study. It requires logging in. Additionally, I found a related paid service offered by a Canadian marketing research company.
User: Send me both links and I’ll tell you what to do with them.
User: Let’s include both sources of information. Here is my username for both services.
AI: Data has been virtualized and analyzed, and recommendations for action have been generated. Here are three proposals, which are optimized based on the total margin, and they include consumer target groups, a campaign program, a sales forecast by brand, and a list of potential suppliers with target net prices for purchase negotiations.
User: Send me proposals with standard descriptions.
AI: Sent. Would you like to thank me for all this hard work?
User: Oh yes, I forgot about your humanity algorithms. Thank you very much!
The dialogue above may sound like science fiction. However, there is already a solution on the market that bundles user-assisting tools with narrow AI, as shown below. In the next few years, increasingly comprehensive AI processes will be developed, but humans will still be needed for a long time to come up with new ideas and to weigh in on their company’s values, for example.
Figure: Features of the IBM Cloud Pak for Data solution for the development of analytics
Virtualization brings us one step closer to autonomous task analysis
Data virtualization is not a substitute for data warehouse development or even for data warehouses. However, it will bring new opportunities for users and substantially reduce IT tasks and costs. It also brings us one step closer to a world where AI autonomously performs analysis tasks from beginning to end, starting from data sources, and ending with a business recommendation.
In addition to analytics, data virtualization facilitates the development of applications. It can be likened to super glue, which attaches data from traditional database solutions to modern mobile and browser applications.