Take a second to think about all the ways data has changed in the last 20 years. In the hardware space, our mobile phones started out as large handhelds with pull-out antennas and limited processing power. Now they are advanced pieces of technology that run 32,600 times faster than the computers we used to reach the moon.

The transformation in our phones is analogous to the evolution of the modern data architecture for enterprises. As front-end consumer applications have evolved, the number of resources needed to collect, store, and analyze the information flowing from consumers has grown. The average company has 110 SaaS applications, providing connections to an average of 400 data sources.

To scale with this expansion, companies like IBM have proposed a new architectural approach known as “data fabric” that provides a unified platform to connect this growing number of applications. A data fabric can be thought of as what the name implies: a “fabric” that connects data from multiple locations, types, and sources to facilitate data growth and management. IBM delivers this flexible architecture through Cloud Pak® for Data, an enterprise insights platform that provides the flexibility for companies to scale across any infrastructure using the world’s leading open-source orchestrator, Red Hat OpenShift.

I will outline the data fabric architectural approach through the lens of a basic stock trading platform. Consumer-oriented trading platforms have gained traction over the past couple of years, enabling users to drive their own financial destiny. But to bring the individual investor this power, there must be a strong data architecture in place to connect the live price feeds and analytics to advanced backend systems. Data virtualization makes this possible by working behind the scenes to unify multiple disparate systems.

Data virtualization

Data virtualization integrates data sources across multiple locations (on-premises, cloud or hybrid) and returns a logical view without the need for data movement or replication. The real value of data virtualization is that it creates a centralized data platform without the cost of moving large volumes of data. In our stock trading platform, customer data, financial trading data and account data sit in separate storage locations.

Figure 1

As shown in Figure 1, financial data is located in a PostgreSQL cloud environment, while personal customer data is on premises in the respective MongoDB and Informix environments. Using our advanced virtualization engine, you can query each of these sources together and save half the cost of traditional information extraction methods.
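To make the idea of a logical view concrete, here is a minimal sketch in Python. It is not the IBM virtualization engine or its API: it uses the standard library's sqlite3 module, with two attached in-memory databases standing in for the separate source systems. The schema names (trading, crm), table names, helper functions and sample rows are all hypothetical. A real virtualization layer would connect to remote sources such as PostgreSQL, MongoDB and Informix rather than to local stand-ins.

```python
import sqlite3


def build_virtual_layer() -> sqlite3.Connection:
    """Attach two stand-in sources and expose one logical view over both."""
    conn = sqlite3.connect(":memory:")

    # Assumption: each attached in-memory database plays the role of a
    # separate source system (for example, trading data vs. customer data).
    conn.execute("ATTACH DATABASE ':memory:' AS trading")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute(
        "CREATE TABLE trading.trades "
        "(customer_id INTEGER, symbol TEXT, quantity INTEGER)"
    )
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")

    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO trading.trades VALUES (?, ?, ?)",
        [(1, "IBM", 10), (2, "AAPL", 5), (1, "MSFT", 3)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )

    # The view stores only the join definition; no rows are copied into it.
    conn.execute(
        """
        CREATE TEMP VIEW customer_trades AS
        SELECT c.name, t.symbol, t.quantity
        FROM crm.customers AS c
        JOIN trading.trades AS t ON t.customer_id = c.customer_id
        """
    )
    return conn


def trades_for(conn: sqlite3.Connection, name: str) -> list:
    """Query the logical view as if it were a single table."""
    cursor = conn.execute(
        "SELECT symbol, quantity FROM customer_trades "
        "WHERE name = ? ORDER BY symbol",
        (name,),
    )
    return cursor.fetchall()
```

Calling trades_for(build_virtual_layer(), "Ada") reads from both stand-in sources through the one view. The consumer writes a single query and never needs to know where each table lives, which is the property the paragraph above describes.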

Data cataloging

Once this data is ingested, it needs a mechanism to curate, categorize and facilitate its sharing throughout an organization. For example, our stock trading platform may have multiple teams of data scientists focused on core customer initiatives such as UI optimization algorithms or understanding order flow. A data catalog, such as the IBM Watson® Knowledge Catalog, can facilitate the relationship between these roles and reduce the prep necessary to complete these tasks. Data catalogs bridge the gap between raw and usable data, allowing for the application of business context, data policies and data protection rules to your virtual data. For example, if the lead data steward at our trading platform wishes to censor credit card numbers as they flow to different data projects, they can apply a data protection rule on credit card numbers as shown in Figure 2:

Figure 2

Credit card numbers are now censored throughout your environment, which improves trust in your company and strengthens your ability to meet different government regulations.

With this rule applied, data scientists who view customer information see redacted credit card numbers as shown in Figure 3:

Figure 3
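The effect of such a rule can be sketched in a few lines of Python. This is an illustration of redaction in general, not how Watson Knowledge Catalog implements or configures its rules. The pattern, the mask character, the function names and the column name are all assumptions made for the example.

```python
import re

# Assumption: a card number is 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real classifiers are more sophisticated.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def redact_card_numbers(text: str, mask_char: str = "X") -> str:
    """Replace every digit of anything that looks like a card number."""

    def _mask(match: re.Match) -> str:
        # Keep separators so the value's shape stays recognizable.
        return re.sub(r"\d", mask_char, match.group(0))

    return CARD_PATTERN.sub(_mask, text)


def apply_protection_rule(rows: list, column: str) -> list:
    """Return copies of the rows with the given column redacted."""
    return [{**row, column: redact_card_numbers(row[column])} for row in rows]
```

For example, redact_card_numbers("4111 1111 1111 1111") yields "XXXX XXXX XXXX XXXX", while shorter digit runs such as an account ID are left untouched. Applying the rule once at the catalog level, rather than in each project, is what keeps the masking consistent for every downstream consumer.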

Now if this table is needed in a Python project, data scientists can export that same core data for analysis without seeing any confidential information, as shown in Figure 4:

Figure 4
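As a rough sketch of what that analysis might look like, the snippet below reads a masked export with Python's standard csv module and totals the shares held per customer. The column names, the masked format and the sample rows are hypothetical and are not taken from the demo. The point is only that the analysis never touches the redacted column.

```python
import csv
import io
from collections import defaultdict

# Hypothetical masked export: the credit_card column arrives already redacted.
MASKED_EXPORT = """customer_id,credit_card,symbol,quantity
1,XXXXXXXXXXXXXXXX,IBM,10
1,XXXXXXXXXXXXXXXX,AAPL,5
2,XXXXXXXXXXXXXXXX,IBM,7
"""


def shares_per_customer(csv_text: str) -> dict:
    """Sum the quantity column per customer, ignoring the masked column."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["customer_id"]] += int(row["quantity"])
    return dict(totals)
```

Here shares_per_customer(MASKED_EXPORT) returns the totals keyed by customer ID, and the confidential values are never needed to produce them.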

This is how a data fabric architecture enables our trading platform to virtualize sources and access data across multiple environments, then organize this data and safely collaborate with key data personnel. If you’re curious as to how this demo was made and would like to see how our final trading platform effectively analyzes data, sign up for my 15 Minute Friday Session on July 8th in the form below.
