
Have your lake and swim in it too


Looking around the industry, it is clear that many organizations are focused on providing an enterprise-wide data access strategy, notably through the creation of a “data lake.” The data lake methodology focuses on copying all data from across the enterprise into a single, centralized repository.  The data lake is usually hosted on a scalable clustered environment, from which data scientists can then consume data as needed.

On the surface, this approach seems simple, effective and comprehensive. However, some requirements make a data lake less than ideal. One example is when the information that the data represents changes more often than the scheduled extract-transform-load (ETL) operations run, so the copy in the lake is stale between loads.

The IBM Z brand has long been known for housing critical applications and their associated data. This is typically structured data stored in database systems or as data sets (structured files in the z/OS environment). With a number of analytics and machine learning software packages now enabled on z/OS and Linux on IBM Z, various use cases can be optimized locally on the IBM Z data. In addition, if the data on IBM Z is sensitive from a security or auditing perspective, that data can now remain within the IBM Z environment.

What does this enable? The data lake no longer has to be a one-size-fits-all solution, which lets data architects offer more options to their data science teams. Machine learning use cases that primarily consume IBM Z data can now be executed on data that is resident on that system. This means that some data may not have to be moved into the data lake at all!

What are the implications? This approach will require your data scientists to think differently. When more than just the IBM Z data is required to build a model, they may have to do some of their analytics on another platform and then use the output as a feature for their IBM Z machine learning. For example, social media data can be processed in the cloud, and the sentiment from that analysis can then be used to augment the IBM Z data. This has the benefit of keeping your systems' data in place, which means that access to it remains within the same auditing system, the physical encryption of that data on disk is not affected, and the permissions on that data continue to be enforced. This can be a huge benefit for data that is under scrutiny from a security perspective.
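To make the social media example concrete, here is a minimal Python sketch of the pattern, not an IBM API. It assumes the cloud-side analysis produces one sentiment score per post, and that only a small per-customer average is carried over as a feature. The function names, field names (`customer_id`, `sentiment`, `balance`) and sample values are all hypothetical and chosen for illustration only.

```python
from statistics import mean


def aggregate_sentiment(posts):
    """Average the sentiment scores per customer.

    Stands in for the step that would run off-platform (hypothetical data shape).
    """
    by_customer = {}
    for post in posts:
        by_customer.setdefault(post["customer_id"], []).append(post["sentiment"])
    return {cid: mean(scores) for cid, scores in by_customer.items()}


def augment_records(records, sentiment_by_customer, default=0.0):
    """Attach the sentiment feature to records that stay where they are.

    Customers with no social media activity get the neutral default.
    """
    return [
        {**rec, "avg_sentiment": sentiment_by_customer.get(rec["customer_id"], default)}
        for rec in records
    ]


# Hypothetical output of the cloud-side sentiment analysis.
cloud_posts = [
    {"customer_id": "C1", "sentiment": 0.5},
    {"customer_id": "C1", "sentiment": -0.5},
    {"customer_id": "C2", "sentiment": 0.75},
]

# Hypothetical records that remain on the system of record.
local_records = [
    {"customer_id": "C1", "balance": 1200},
    {"customer_id": "C2", "balance": 300},
    {"customer_id": "C3", "balance": 50},
]

features = augment_records(local_records, aggregate_sentiment(cloud_posts))
```

The point of the sketch is the direction of travel: only the small aggregated feature moves toward the sensitive data, while the records themselves are never copied out.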

It is time to rethink the idea that everything needs to flow into a data lake. While there are many good use cases for data copying and aggregation, this is not a one-size-fits-all approach and there are legitimate cases where bringing the processing to the data can provide great benefits for an organization.

Learn more about the benefits you can derive from real-time data.

