November 19, 2020 By Alan Drummer 4 min read

According to projections from IDC, 80% of worldwide data will be unstructured by 2025.[1] Even though most enterprises already use data warehouses to analyze structured data, more are now turning to data lakes to leverage unstructured and semi-structured data in its native format. This includes the growing volume of streaming audio and video, social media, and clickstream, sensor and log data.

Data lakes bring organizations new opportunities by providing access to a greater variety and volume of data that can fuel more accurate analytic predictions and models. They also deliver rich, actionable insights by enabling organizations to find interesting relationships, trends, patterns and anomalies that wouldn’t be visible otherwise.

A Ventana Research study showed that among enterprises with a data lake:

  • 41% reported gains in competitive advantage
  • 37% noted lowered costs
  • 35% enjoyed improved customer experiences
  • 33% believed it helped them respond better to opportunities and threats
  • 28% revealed an increase in sales

Harness more data for better, data-driven decisions

Organizations in almost every industry are benefiting from data lakes. Let’s look at three industries where value has grown especially fast.

Financial services
Financial firms are using the ability to organize and accommodate unstructured data to include location, IoT, sensor, clickstream and social media data in their analytics. As a result, they’re able to deliver personalized insurance offerings, fight fraud more accurately, and gain 360-degree customer views. Read the report and watch the webinar to learn how they’re getting results like these:

  • 300% higher conversion rate
  • 30% fewer fraud incidents
  • Nearly USD 4 million reduction in expenditure
  • 90% quicker time-to-value for big data analytics

Read TechTarget’s assessment of these financial services accomplishments in the report: It’s a new era in advanced analytics and AI.

Watch the webinar on how global bank ING accelerated time to market for new products and shortened the selling process from years to months.

Healthcare
Healthcare firms depend on unstructured data such as doctors’ notes, X-rays, CT scans and research articles that empower front-line caregivers with real-time data and predictions. Results include:

  • 25% possible annual savings[2]
  • 31% reduction in 30-day readmission for certain patients[3]
  • USD 500,000 savings per year[4]

Read TechTarget’s Welcome to the healthcare revolution to learn more.

Communication service providers
Communication service providers are analyzing video feeds, clickstreams and third-party data to predict and prevent churn, optimize networks, and detect fraud in real time. Results include:

  • 350% more fraud incidents discovered[5]
  • Up to 20% revenue loss reduction[6]
  • 5% – 10% reduction in customer churn[7]

Read TechTarget’s Data Analytics and AI are empowering transformation report to learn more about these advancements.

What elements make a data lake successful?

Governance is essential for a data lake to succeed. Without it, a data lake can quickly turn into an unmanageable data swamp in which users can’t find, trust, or use the data they need.

A governed data lake contains clean, trusted data from structured and unstructured sources that can easily be found, accessed, managed and protected. It enables self-service access to help users find relevant information through simple search interfaces. Read the ebook “Governed data lakes for business insights” to explore the key building blocks of effectively delivering trusted data.

An enterprise data catalog is the foundation for data lake governance. A catalog organizes a data lake by automating data discovery, metadata generation, and the building of machine-learning-extracted business glossaries. It can also perform automated scanning and risk assessments of unstructured data, as well as track data lineage. Learn more about these capabilities when you read the ebook “Build a better data lake.”
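To make automated discovery and metadata generation more concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how any particular catalog product works: the function names `build_catalog` and `search_catalog`, the extension-to-category mapping, and the idea of treating a local directory as the data lake are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a coarse data category.
# A real catalog would inspect content, not just file names.
CATEGORY_BY_SUFFIX = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".log": "semi-structured",
    ".txt": "unstructured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
    ".png": "unstructured",
}


def build_catalog(root):
    """Walk a directory standing in for a data lake and record basic metadata per file."""
    root = Path(root)
    catalog = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        catalog.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "category": CATEGORY_BY_SUFFIX.get(path.suffix.lower(), "unknown"),
        })
    return catalog


def search_catalog(catalog, keyword):
    """Return catalog entries whose path contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [entry for entry in catalog if keyword in entry["path"].lower()]
```

Calling `build_catalog` on a directory and then `search_catalog(catalog, "orders")` mirrors, in a very small way, the self-service search a governed data lake is meant to offer. Business glossaries, risk assessment and lineage tracking are beyond what this sketch attempts.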

AI-led automated data integration can efficiently cleanse and deliver trusted data anywhere, at any scale and complexity, on and across multicloud and hybrid cloud environments.

In-flight data quality and active metadata and policy enforcements help ensure trusted delivery to data lakes. Read how to save on data movement and storage costs and boost business productivity using IBM DataStage.

Having a single source for hardware, software, services and multivendor solutions can make it easier to build, manage, govern and secure the data lake. IBM provides that source. Learn more about the IBM and Cloudera partnership, with an ecosystem of offerings designed for faster analytics results at scale. See how freedom from vendor lock-in, enhanced self-service and optimized integration are some of the drivers that “provide unprecedented flexibility, choice and value for clients” in the race to implement AI insights. Read the analyst report on total value of ownership.

Start leveraging semi-structured and unstructured data today

When properly governed with an enterprise data catalog, and unified with data virtualization so data can be queried from a single source, a data lake can transform the fast-growing volume of new data types from a burden into a benefit that helps fuel new and actionable insights.

Over the next few years, as semi-structured and unstructured data become as much as 80% of the world’s information, companies that can include these types of data as they look for relationships, trends, patterns, and anomalies will have a growing advantage over those that can’t. To get started, read this ebook on how to build a better data lake.

[1] Timothy King, 80 Percent of Your Data Will Be Unstructured in Five Years, Data Management Solutions Review, March 28, 2019. Accessed November 24, 2020.

[2] Sabyasachi Dash et al., Big data in healthcare: management, analysis and future prospects. SpringOpen, June 19, 2019. Accessed November 19, 2020.

[3] Marina Turea, Ultimate Guide to Big Data in Healthcare, Healthcare Weekly, September 26, 2020. Accessed November 19, 2020.

[4] Ibid.

[5] Communication Service Providers: Data Analytics and AI Are Empowering Transformation, TechTarget Custom Media. Accessed November 24, 2020.

[6] Cloudera, Reducing Revenue Loss from Fraud by up to 20 Percent, Accessed November 19, 2020.

[7] Ibid.
