In a prior blog, we pointed out that data warehouses, known for high-performance data processing for business intelligence, can quickly become expensive as new data and evolving workloads are added. We also made the case that query and reporting workloads, served by big data engines such as Presto, need to work alongside the Spark framework to support advanced analytics and complex enterprise data decision-making. To do so, Presto and Spark need to readily work with existing and modern data warehouse infrastructures. Now, let’s chat about why data warehouse optimization is a key value of a data lakehouse strategy.

Read our blog on solving today’s challenges with a lakehouse architecture

Value of data warehouse optimization

Since its introduction over a century ago, the gasoline-powered engine has remained largely unchanged. It’s simply been adapted over time to accommodate modern demands such as pollution controls, air conditioning and power steering.

Similarly, the relational database has been the foundation for data warehousing for as long as data warehousing has been around. Relational databases were adapted to accommodate the demands of new workloads, such as the data engineering tasks associated with structured and semi-structured data, and for building machine learning models.

Returning to the analogy, there have been significant changes to how we power cars. We now have gasoline-powered engines, battery electric vehicles (BEVs) and hybrid vehicles. An August 2021 Forbes article referenced a 2021 Department of Energy Argonne National Laboratory publication indicating that “hybrid electric vehicles (think: Prius) had the lowest total 15-year per-mile cost of driving in the Small SUV category,” beating BEVs.

Just as hybrid vehicles help their owners balance the initial purchase price and cost over time, enterprises are attempting to find a balance between high performance and cost-effectiveness for their data and analytics ecosystem. Essentially, they want to run the right workloads in the right environment without having to copy datasets excessively.

Optimizing your data lakehouse architecture

Fortunately, the IT landscape is changing thanks to a mix of cloud platforms, open source and traditional software vendors. The rise of cloud object storage has driven the cost of data storage down. Open-data file formats have evolved to support data sharing across multiple data engines, like Presto, Spark and others. Intelligent data caching is improving the performance of data lakehouse infrastructures.

All these innovations are being adopted by software vendors and accepted by their customers. So, what does this mean from a practical perspective? What can enterprises do differently from what they are already doing today? A few use case examples will help. Raw data often needs to be curated within a data warehouse before it can be used effectively. Semi-structured data needs to be reformatted and transformed before it can be loaded into tables. And ML processes consume an abundance of compute capacity to build models.
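As a minimal illustration of this kind of engineering task, the sketch below flattens semi-structured JSON records with nested and inconsistent fields into uniform rows ready to load into a table. It is a simplified, hypothetical stand-in for the transformation jobs described above (the field names are invented for the example); in practice, an engine such as Spark or Presto would perform this work at scale against open-format files in the lakehouse.

```python
import json

# Semi-structured records: nested objects, inconsistent keys (hypothetical data).
raw_records = [
    '{"id": 1, "customer": {"name": "Acme", "region": "EMEA"}, "amount": 1200}',
    '{"id": 2, "customer": {"name": "Globex"}, "amount": 800, "currency": "USD"}',
]

# Target table schema: every row gets the same columns; missing values become None.
COLUMNS = ["id", "customer_name", "customer_region", "amount", "currency"]

def flatten(record: str) -> dict:
    """Reshape one nested JSON record into a flat row matching COLUMNS."""
    doc = json.loads(record)
    customer = doc.get("customer", {})
    return {
        "id": doc.get("id"),
        "customer_name": customer.get("name"),
        "customer_region": customer.get("region"),
        "amount": doc.get("amount"),
        "currency": doc.get("currency"),
    }

rows = [flatten(r) for r in raw_records]
for row in rows:
    print([row[c] for c in COLUMNS])
```

The point of the transformation is that the output is rectangular: every row conforms to one schema, so it can be written to an open table format and queried by warehouse and lakehouse engines alike.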

Organizations running these workloads in their data warehouse environment today are paying a high run rate for engineering tasks that, by themselves, add no value or insight; only the outputs of the resulting data-driven models allow an organization to derive value. If organizations could execute these engineering tasks at a lower run rate in a data lakehouse, while making the transformed data available to both the lakehouse and the warehouse via open formats, they could deliver the same output value with lower-cost processing.

Benefits of optimizing across your data warehouse and data lakehouse

Optimizing workloads across a data warehouse and a data lakehouse by sharing data using open formats can reduce costs and complexity. This helps organizations drive a better return on their data strategy and analytics investments while also helping to deliver better data governance and security.

And just as a hybrid car allows car owners to get greater value from their car investment, optimizing workloads across a data warehouse and data lakehouse will allow organizations to get greater value from their data analytics ecosystem.

Discover how you can optimize your data warehouse to scale analytics and artificial intelligence (AI) workloads with a data lakehouse strategy.
