IBM today announced the coming launch of IBM watsonx.data, a data store built on an open lakehouse architecture, to help enterprises easily unify and govern their structured and unstructured data, wherever it resides, for high-performance AI and analytics. The solution is currently in a closed beta phase and is expected to be generally available in July 2023.

What is watsonx.data?

Watsonx.data will be core to IBM’s coming AI and Data platform, IBM watsonx, announced today at IBM Think. With watsonx, IBM will launch a centralized AI development studio that gives businesses access to proprietary IBM and open-source foundation models, watsonx.data to gather and clean their data, and a toolkit for governance of AI.

Watsonx.data will allow users to access their data through a single point of entry and run multiple fit-for-purpose query engines across IT environments. By optimizing workloads, an organization that augments its existing data warehouse with watsonx.data can reduce warehouse costs by up to 50 percent.[1] The solution also offers built-in governance, automation and integrations with an organization’s existing databases and tools to simplify setup and user experience.

Supporting the data management life cycle

According to IDC’s Global StorageSphere, enterprise data stored in data centers will grow at a compound annual growth rate of 30% from 2021 to 2026.[2] Increased data volumes bring more data silos, higher operational costs, and greater regulatory pressure, which can lead to greater scrutiny and demand for improved business outcomes from data, analytics and AI investments.
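
To put the growth rate in perspective, the short sketch below compounds a 30% annual rate over the forecast window. It assumes five compounding periods (2021 to 2026) and a constant rate each year; the function name is illustrative and does not come from IDC or IBM.

```python
def compound_growth_factor(annual_rate: float, years: int) -> float:
    """Return the total growth multiple after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


# Assumption: 2021 to 2026 is treated as five compounding periods at a constant 30%.
factor = compound_growth_factor(0.30, 5)
print(f"Stored data grows roughly {factor:.2f}x over the period")
```

Under these assumptions, the volume of stored enterprise data would be a little under four times its 2021 level by 2026.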

This proliferation of data spans every industry, and organizations have an opportunity to turn it into actionable insights that can inform revenue strategies and enhance operational efficiencies.

“The media and entertainment industry has undergone a significant digital transformation, with viewers consuming content across different devices and platforms,” said Vitaly Tsivin, EVP Business Intelligence at AMC Networks. “Watsonx.data could allow us to easily access and analyze our expansive, distributed data to help extract actionable insights and maximize our resource utilization to deliver superior user experiences for viewers of AMC Networks’ curated, high-quality content.”

Notably, watsonx.data runs both on-premises and across multicloud environments. The solution will help businesses harness their increasingly siloed data and apply advanced AI and analytics to derive actionable insights, all while supporting robust data governance and observability throughout the data management life cycle.

Strong partnerships for even stronger solutions

Watsonx.data is engineered to use Intel’s built-in accelerators on Intel’s new 4th Gen Xeon Scalable Processors, together with open-source query engines such as Presto and Spark and the Velox acceleration library, to deliver rapid and reliable data processing for high-performance SQL querying, reporting, business intelligence, and machine learning.

“We recognize the importance of watsonx.data and the development of the open-source components that it’s built upon,” said Das Kamhout, VP and Senior Principal Engineer of the Cloud and Enterprise Solutions Group at Intel. “We look forward to partnering with IBM to optimize the watsonx.data stack, achieving breakthrough performance through our joint technological contributions to the Presto open-source community.”

IBM and Intel have a long history of collaboration on data and AI products, including the optimization of IBM Db2 on Intel Xeon platforms, AI acceleration with IBM Watson NLP Library for Embed with oneAPI, and now watsonx.data.

Watsonx.data will allow users to modernize their data repositories with data warehouse-like capabilities, while benefiting from low-cost object storage and open data and table formats like Iceberg, to help them make data-driven decisions.

“Open data lakehouse architectures powered by the Apache Iceberg table format give organizations the flexibility to use fit-for-purpose analytical solutions to future-proof their data platforms for all workloads,” said Paul Codding, EVP of Product Management of Cloudera. “IBM and Cloudera customers will benefit from a truly open and interoperable hybrid data platform that fuels and accelerates the adoption of AI across an ever-increasing range of use cases and business processes.”

IBM and Cloudera have a long-standing strategic partnership that includes certified product integrations and joint sales and support models.

Watsonx.data will be available on premises and across multiple cloud providers, including IBM Cloud and Amazon Web Services (AWS). This builds on last year’s announcement that IBM is expanding its relationship with AWS to offer IBM software as a service on AWS. The solution will also be available in AWS Marketplace.

“Organizations are increasingly adopting data lakehouse solutions to support their growing data needs, especially as we see an industry-wide shift toward AI solutions,” said Soo Lee, Director Worldwide Strategic Alliances at AWS. “Making watsonx.data available as a service in AWS Marketplace further supports our customers’ increasing needs around hybrid cloud – giving them greater flexibility to run their business processes wherever they are, while providing choice of a wide range of AWS services and IBM cloud native software attuned to their unique requirements.”

The coming launch of watsonx.data will extend IBM’s market leadership in data and AI, most recently demonstrated by its evaluation as a leader in The Forrester Wave: Data Management for Analytics. Watsonx.data will integrate with existing IBM solutions such as StepZen, Databand.ai, IBM Watson Knowledge Catalog, IBM zSystems, IBM Watson Studio, and IBM Cognos Analytics with Watson. These integrations can enable watsonx.data users to implement various industry-leading data catalog, lineage, governance, and observability solutions across their data ecosystems.

Beyond launch, watsonx.data is expected to undergo continuous development, incorporating the latest performance enhancements to the Presto open-source query engine via Velox. Further development of watsonx.data will also incorporate IBM’s Storage Fusion technology to enhance data caching across remote sources as well as semantic automation capabilities built on IBM Research’s foundation models to automate data discovery, exploration, and enrichment through conversational user experiences.

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.

[1] When comparing published 2023 list prices normalized for VPC hours of watsonx.data to several major cloud data warehouse vendors. Savings may vary depending on configurations, workloads and vendors.

[2] IDC, Worldwide Global StorageSphere Forecast, 2022–2026: An Installed Base of 7.9ZB of Storage Capacity in 2021 Came at a Cost of $370 Billion — Is It Enough? (IDC Doc #US49051122, May 2022)
