Have your lake and swim in it too


Looking around the industry, it is clear that many organizations are focused on providing an enterprise-wide data access strategy, notably through the creation of a “data lake.” The data lake methodology focuses on copying all data from across the enterprise into a single, centralized repository. The data lake is usually hosted on a scalable, clustered environment from which data scientists can consume data as needed.

On the surface, this approach seems simple, effective and comprehensive. However, some requirements make data lakes less than ideal. For instance, when the information that the data represents changes more often than the scheduled extract-transform-load (ETL) operations run, the copy in the lake is out of date between loads.

The IBM Z brand has long been known for housing critical applications and their associated data. This is typically structured data stored in database systems or as data sets (structured files in the z/OS environment). Now that a range of analytics and machine-learning software is available on z/OS and Linux on IBM Z, various use cases can be optimized locally on the IBM Z data. Moreover, if the data on IBM Z is sensitive from a security or auditing perspective, that data can now remain within the IBM Z environment.

What does this enable? The data lake no longer has to be a one-size-fits-all solution, which gives data architects more options to offer their data science team. Machine learning use cases that primarily consume IBM Z data can now be executed on the data where it resides on that system. This means that some data may not have to be moved into the data lake at all!

What are the implications? This will require your data scientists to think differently. When a model needs more than just the IBM Z data, they may have to do some of their analytics on another platform and then use the output of that work as a feature for their IBM Z machine learning. For example, social media data can be processed in the cloud, and the sentiment from that analysis can then be used to augment the IBM Z data. This keeps your systems’ data in place, which means that access to it remains within the same auditing system, the physical encryption of that data on disk is not affected, and the permissions on that data continue to be enforced. This can be a huge benefit for data that is under scrutiny from a security perspective.
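A minimal sketch of that feature-augmentation pattern follows, written in Python purely for illustration. Everything in it is hypothetical: the `LocalRecord` schema, the `augment_with_sentiment` helper, and the assumption that sentiment scores arrive as a small per-customer table produced by some off-platform service. What it is meant to show is that only the derived score crosses platforms, while the raw records are read where they live.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LocalRecord:
    """A row that stays on the IBM Z system (hypothetical schema)."""
    customer_id: str
    balance: float
    txn_count_30d: int


def augment_with_sentiment(
    records: List[LocalRecord],
    sentiment_by_customer: Dict[str, float],
    neutral: float = 0.0,
) -> List[List[float]]:
    """Build model feature vectors from local records plus an external score.

    Only the small per-customer sentiment table is brought in from the other
    platform; the raw records are read in place. Customers with no score
    (for example, no social media activity) get the `neutral` value.
    """
    features = []
    for r in records:
        sentiment = sentiment_by_customer.get(r.customer_id, neutral)
        features.append([r.balance, float(r.txn_count_30d), sentiment])
    return features


if __name__ == "__main__":
    local = [
        LocalRecord("C001", 1500.0, 12),
        LocalRecord("C002", 320.5, 3),
    ]
    # Assumed output of an off-platform (e.g. cloud) sentiment analysis.
    external_scores = {"C001": 0.8}
    for row in augment_with_sentiment(local, external_scores):
        print(row)
```

Filling missing scores with a neutral default is one design choice among several; a real pipeline might instead drop those rows or add an explicit "score missing" indicator feature.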

It is time to rethink the idea that everything needs to flow into a data lake. While there are many good use cases for data copying and aggregation, this is not a one-size-fits-all approach and there are legitimate cases where bringing the processing to the data can provide great benefits for an organization.

Learn more about the benefits you can derive from real-time data.

STSM, z Systems Software Design and Development
