Have your lake and swim in it too


Looking around the industry, it is clear that many organizations are focused on providing an enterprise-wide data access strategy, notably through the creation of a “data lake.” The data lake methodology focuses on copying all data from across the enterprise into a single, centralized repository. The data lake is usually hosted on a scalable clustered environment, from which data scientists can then consume data as needed.

On the surface, this approach seems simple, effective and comprehensive. However, some requirements can make a data lake less than ideal. One example is when the information the data represents changes more often than the scheduled extract-transform-load (ETL) operations run; between runs, the data lake holds a stale copy.

The IBM Z brand has long been known for housing critical applications and their associated data. This is typically structured data stored in database systems or as data sets (structured files in the z/OS environment). With a growing range of analytics and machine-learning software now available on z/OS and Linux on IBM Z, various use cases can be optimized by running them locally against the IBM Z data. And if the data on IBM Z is sensitive from a security or auditing perspective, it can now remain within the IBM Z environment.

What does this enable? The data lake no longer has to be a one-size-fits-all solution, which lets data architects offer more options to their data science team. Machine learning use cases that primarily consume IBM Z data can now be executed against the data where it resides on that system. This means that some data may not have to be moved into the data lake at all!

What are the implications? This will require your data scientists to think differently. If a model needs more than just the IBM Z data, they may have to do some of their analytics on another platform and then use the output as a feature for their IBM Z machine learning. For example, social media data can be processed in the cloud, and the resulting sentiment can then be used to augment the IBM Z data. This has the benefit of keeping your systems’ data in place: access to it remains within the same auditing system, the physical encryption of that data on disk is not affected, and the permissions on that data continue to be enforced. This can be a huge benefit for data that is under scrutiny from a security perspective.
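The feature-augmentation pattern described above can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not an IBM API: the `CustomerRecord` type, its fields, the `augment_with_sentiment` helper and the customer IDs are all hypothetical. The idea it shows is that only the externally computed sentiment scores are brought to the platform, while the local records are read in place and never copied out.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """Hypothetical record that stays resident on IBM Z."""
    customer_id: str
    balance: float
    late_payments: int


def augment_with_sentiment(records, sentiment_by_id, default=0.0):
    """Build model feature rows by joining local records with off-platform sentiment.

    Hypothetical helper: sentiment_by_id maps customer_id -> score computed
    elsewhere (e.g. in the cloud). Only these scores cross the platform boundary.
    """
    rows = []
    for r in records:
        rows.append({
            "customer_id": r.customer_id,
            "balance": r.balance,
            "late_payments": r.late_payments,
            # Customers with no external signal fall back to a neutral default score.
            "sentiment": sentiment_by_id.get(r.customer_id, default),
        })
    return rows


# Usage with made-up sample data.
records = [CustomerRecord("C001", 1200.0, 0), CustomerRecord("C002", 85.5, 3)]
sentiment = {"C001": 0.7}
rows = augment_with_sentiment(records, sentiment)
```

Defaulting missing scores to a neutral value is one possible design choice; a real pipeline might instead flag or exclude customers with no external signal.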

It is time to rethink the idea that everything needs to flow into a data lake. While there are many good use cases for data copying and aggregation, this is not a one-size-fits-all approach and there are legitimate cases where bringing the processing to the data can provide great benefits for an organization.

Learn more about the benefits you can derive from real-time data.

STSM, z Systems Software Design and Development
