Take a second to think about all the ways data has changed in the last 20 years. In the hardware space, our mobile phones started out as large handhelds with pull-out antennas and limited processing power. Now they are advanced pieces of technology with a computational power 32,600 times faster than the computers we used to reach the moon. The transformation in our phones is analogous to the evolution of the modern data architecture for enterprises. As front-end consumer applications have evolved, the number of resources needed to collect, store, and analyze the information flowing from consumers has grown. The average company has 110 SaaS applications, providing connections to an average of 400 data sources. To scale with this expansion, companies like IBM have proposed a new architectural approach known as “data fabric” that provides a unified platform to connect this growing number of applications. A data fabric can be thought of as what the name implies — a “fabric that connects data from multiple locations, types, and sources to facilitate data growth and management. IBM delivers this flexible architecture through Cloud Pak® for Data, an enterprise insights platform that provides the flexibility for companies to scale across any infrastructure using the world’s leading open-source orchestrator, Red Hat.

I will outline the data fabric architectural approach through the lens of a basic stock trading platform. Consumer-oriented trading platforms have gained traction over the past couple years, enabling users to drive their own financial destiny. But to bring the individual investor this power, there must be a strong data architecture in place to connect the live price feeds and analytics to advanced backend systems. Data virtualization facilitates this movement by working behind the scenes to unify multiple disparate systems.

Data virtualization

Data virtualization integrates data sources across multiple locations (on-prem, cloud or hybrid) and returns a logical view without the need for data movement or replication. The real value of data virtualization is that it creates a centralized data platform without large data movement cost. In terms of our stock trading platform, we have customer data, financial trading data and account data in separate storage locations.

Figure 1

As evidenced in Figure 1, financial data is located in a PostgreSQL cloud environment, while personal customer data is on premise in the respective MongoDB and Informix environments. Using our advance virtualization engine, you can query each of these sources together and save half the cost of traditional information extraction methods.

Data cataloging

Once this data is ingested, it needs a mechanism to curate, categorize and facilitate its sharing throughout an organization. For example, our stock trading platform may have multiple teams of data scientists focused on core customer initiatives such as UI optimization algorithms or understanding order flow. A data catalog, such as the IBM Watson® Knowledge Catalog, can facilitate the relationship between these roles and reduce the prep necessary to complete these tasks. Data catalogs bridge the gap between raw and useable data, allowing for the application of business context, data policies and data protection rules to your virtual data. For example, if the lead data steward at my trading platform wishes to censor credit card numbers as they flow to different data projects, I can apply a data protection rule on credit card numbers as shown in Figure 2:

Figure 2

Now, you have credit card numbers censored throughout your environment, improving trust in your company while also enhancing your ability to meet different government regulations.

With this rule applied, data scientists who view customer information see redacted credit card numbers as shown in Figure 3:

Figure 3

Now if this table is needed in a Python project, data scientists can export that same core data for analysis without seeing any confidential information, as shown in Figure 4:

Figure 4

This is how a data fabric architecture enables our trading platform to virtualize sources and access data across multiple environments, then organize this data and safely collaborate with key data personnel. If you’re curious as to how this demo was made and would like to see how our final trading platform effectively analyzes data, sign up for my 15 Minute Friday Session on July 8th in the form below.

Was this article helpful?

More from Cloud

IBM Cloud Virtual Servers and Intel launch new custom cloud sandbox

4 min read - A new sandbox that use IBM Cloud Virtual Servers for VPC invites customers into a nonproduction environment to test the performance of 2nd Gen and 4th Gen Intel® Xeon® processors across various applications. Addressing performance concerns in a test environment Performance testing is crucial to understanding the efficiency of complex applications inside your cloud hosting environment. Yes, even in managed enterprise environments like IBM Cloud®. Although we can deliver the latest hardware and software across global data centers designed for…

10 industries that use distributed computing

6 min read - Distributed computing is a process that uses numerous computing resources in different operating locations to mimic the processes of a single computer. Distributed computing assembles different computers, servers and computer networks to accomplish computing tasks of widely varying sizes and purposes. Distributed computing even works in the cloud. And while it’s true that distributed cloud computing and cloud computing are essentially the same in theory, in practice, they differ in their global reach, with distributed cloud computing able to extend…

How a US bank modernized its mainframe applications with IBM Consulting and Microsoft Azure

9 min read - As organizations strive to stay ahead of the curve in today's fast-paced digital landscape, mainframe application modernization has emerged as a critical component of any digital transformation strategy. In this blog, we'll discuss the example of a US bank which embarked on a journey to modernize its mainframe applications. This strategic project has helped it to transform into a more modern, flexible and agile business. In looking at the ways in which it approached the problem, you’ll gain insights into…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters