Have your lake and swim in it too

Looking around the industry, it is clear that many organizations are focused on providing an enterprise-wide data access strategy, notably through the creation of a “data lake.” The data lake methodology focuses on copying all data from across the enterprise into a single, centralized repository.  The data lake is usually hosted on a scalable clustered environment, from which data scientists can then consume data as needed.

On the surface, this approach seems simple, effective and comprehensive. However, some requirements make data lakes less than ideal. For instance, when the information that the data represents changes more often than the scheduled extract-transform-load (ETL) operations that copy it into the lake, the lake's copy is perpetually out of date.
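To make the staleness concern concrete, here is a minimal sketch (all names and timings are hypothetical illustrations, not part of any product): the lake's copy of a record is stale whenever the source system changed after the most recent ETL run.

```python
from datetime import datetime, timedelta

def lake_copy_is_stale(last_etl_run: datetime, last_source_update: datetime) -> bool:
    """The lake's copy is stale whenever the source of record changed
    after the most recent ETL run copied it over."""
    return last_source_update > last_etl_run

# Hypothetical timings: a nightly ETL job versus a source updated hourly.
now = datetime(2019, 9, 1, 12, 0)
etl_ran = now - timedelta(hours=18)        # nightly ETL, last ran 18 hours ago
source_changed = now - timedelta(hours=1)  # source record updated an hour ago
```

With a daily ETL cadence and hourly updates, any consumer reading from the lake can be working with data that is nearly a full day behind the system of record.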

The IBM Z brand has long been known for housing critical applications and the associated data. This is typically structured data stored in database systems or as data sets (structured files in the z/OS environment). With a range of analytics and machine-learning software now enabled on z/OS and Linux on IBM Z, various use cases can be optimized locally on the IBM Z data. Better still, if the data on IBM Z is sensitive from a security or auditing perspective, it can now remain within the IBM Z environment.

What does this enable? The data lake no longer has to be a one-size-fits-all solution, giving data architects more options to offer their data science teams. Machine learning use cases that primarily consume IBM Z data can now be executed on data resident on that system, which means some data may never have to be moved into the data lake at all.

What are the implications? Your data scientists will need to think differently. When more than just the IBM Z data is required to build a model, they may have to run some of their analytics on another platform and then feed the output back as a feature for their IBM Z machine learning. For example, social media data can be processed in the cloud, and the sentiment from that analysis can then be used to augment the IBM Z data. This keeps your systems' data in place: access to it remains within the same auditing system, the physical encryption of the data on disk is unaffected, and existing permissions on that data continue to be enforced. That can be a huge benefit for data under scrutiny from a security perspective.
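The pattern above can be sketched in a few lines. In this hypothetical example (the field names, IDs and scores are all illustrative assumptions, not from any real system), sentiment scores are computed off-platform and only the derived feature travels back, to be joined with records that never leave the system of record:

```python
# Scores produced by an external (e.g. cloud) sentiment pipeline,
# keyed by customer ID. Only this small derived feature moves.
external_sentiment = {
    "cust-001": 0.82,   # positive sentiment
    "cust-002": -0.35,  # negative sentiment
}

# Records that stay resident on the system of record.
local_records = [
    {"customer_id": "cust-001", "balance": 1200.0, "tenure_years": 4},
    {"customer_id": "cust-002", "balance": 310.0,  "tenure_years": 1},
    {"customer_id": "cust-003", "balance": 870.0,  "tenure_years": 7},
]

def augment_with_sentiment(records, sentiment, default=0.0):
    """Join the externally derived sentiment feature onto local records.

    Customers with no external score fall back to a neutral default,
    so model training can proceed without the raw social media data
    ever touching this platform."""
    augmented = []
    for rec in records:
        enriched = dict(rec)  # copy; leave the original record untouched
        enriched["sentiment"] = sentiment.get(rec["customer_id"], default)
        augmented.append(enriched)
    return augmented

feature_rows = augment_with_sentiment(local_records, external_sentiment)
```

The resulting `feature_rows` can then be fed to whatever model-training step runs locally; the raw records, and the permissions and encryption that govern them, stay where they are.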

It is time to rethink the idea that everything needs to flow into a data lake. While there are many good use cases for data copying and aggregation, this is not a one-size-fits-all approach and there are legitimate cases where bringing the processing to the data can provide great benefits for an organization.

Learn more about the benefits you can derive from real-time data.
