Meet LDMs, the new (AI) sheriff in town


By Aili McConnon, Staff Writer, IBM

Thanks to ChatGPT and dozens of other chatbots built on foundation models, nearly everyone knows about large language models (LLMs). But what about large database models (LDMs)?

“LDMs are models tuned for deriving insight from large data sets and transaction flows rather than human language and text, which are the domain of LLMs and chatbots,” said Ric Lewis, IBM’s SVP of Infrastructure, at IBM 2025 Investor Day.

While LLMs are trained on publicly available data such as books, articles, Wikipedia and various other sources, their training materials do not typically include the vast amount of data within enterprises. In fact, only 1% of enterprise data is currently used in large language models.

LDMs, in contrast, are trained on transaction records, product information, client relationship data, training logs and employee records, among other sources of enterprise data. As a result, enterprises can use LDMs to uncover meaning in the untapped 99% of data found in their databases using conversational questions, in a process known as semantic search. Semantic search goes beyond matching keywords to understand the meaning and context behind a user’s search query.
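The difference between keyword matching and semantic search can be sketched in a few lines. In this illustration, hand-picked toy vectors stand in for the embeddings a real model would learn; the products and values are hypothetical:

```python
import math

# Toy embedding table standing in for vectors a trained model would learn.
# These values are illustrative only, not output from any actual LDM.
EMBEDDINGS = {
    "moisturizer": (0.9, 0.1, 0.2),
    "hydrating cream": (0.85, 0.15, 0.25),
    "car wax": (0.1, 0.9, 0.3),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def keyword_match(query, product):
    # Literal keyword search: hits only when the exact words overlap.
    return any(word in product.split() for word in query.split())

def semantic_match(query, product, threshold=0.95):
    # Semantic search: compares meaning via embedding similarity.
    return cosine(EMBEDDINGS[query], EMBEDDINGS[product]) >= threshold

print(keyword_match("moisturizer", "hydrating cream"))   # False: no shared words
print(semantic_match("moisturizer", "hydrating cream"))  # True: similar meaning
```

Keyword search misses the related product entirely, while the similarity comparison surfaces it, which is the behavior semantic search adds on top of a database.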

“LDMs represent an exciting new way to leverage the data embedded in business applications and transaction flows to extract new insights and new value for the enterprise,” says Lewis in an interview with IBM Think. “While LDMs are just emerging, we are optimistic about their potential to be utilized to inform agentic applications and help businesses drive improved results,” he explains, adding that these models are already being adopted to infuse AI into transactional processes.

3D design of balls rolling on a track

The latest AI News + Insights 


Discover expertly curated insights and news on AI, cloud and more in the weekly Think Newsletter. 

How do LDMs work?

Consider, for example, a retail business looking to identify customers whose average spending power and purchase history mirror those of a shopper named Claire, who recently came into the store and expressed strong interest in a new beauty product. Traditionally, the retailer’s data scientist would start by defining the pipeline—the processes for turning raw data into useful answers to the specific business question under consideration. Next, they would formulate the database query with very specific terms, such as “Find all the customers between the ages of 20 and 40 who live in New York and who have spent at least USD 1,000 on beauty products this past year.”
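That hand-written filter can be expressed directly in SQL. A minimal sketch using an in-memory SQLite table with a hypothetical `customers` schema (the table name, columns and data are all invented for illustration):

```python
import sqlite3

# A tiny in-memory stand-in for the retailer's customer table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    name TEXT, age INTEGER, city TEXT, beauty_spend REAL)""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("Ana", 29, "New York", 1500.0),
     ("Ben", 52, "New York", 2200.0),
     ("Dia", 34, "Boston", 900.0)],
)

# The explicit query from the text: every criterion must be spelled out.
rows = conn.execute(
    """SELECT name FROM customers
       WHERE age BETWEEN 20 AND 40
         AND city = 'New York'
         AND beauty_spend >= 1000""").fetchall()
print(rows)  # [('Ana',)]
```

Note that nothing here captures "customers like Claire": the analyst must translate that fuzzy question into rigid thresholds, which is exactly the step LDMs aim to remove.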

The data scientist would then extract the necessary data, load it onto another platform, and over a period of weeks or months, determine which customers resembled Claire in their database.

Given the extensive process involved in completing traditional database queries, “there’s a lot of data sitting on company mainframes that is not typically the target of generative AI and that enterprises are not getting any insights from,” Catherine Wu, the Program Director for Db2 at IBM’s Silicon Valley Lab, tells IBM Think.

Part of this comes down to the cost and security concerns associated with moving data to an external environment. “We have customers tell us that just moving data is 30-40% of their IT costs,” says Wu. “Also, once the data moves off their mainframe, they cannot track where it’s going, so that’s a big concern for customers.”

LDMs, in contrast, allow users to search databases and get answers much more quickly and easily, whether that database is on premises, in the cloud or a hybrid of the two. The retailer in the above example could simply ask the database, “List the top 100 clients like Claire,” and a short time later, anyone with basic SQL training could pull that information without moving data anywhere, Wu says. In 2022, IBM launched its first database product built on a large database model: SQL Data Insights (SQL DI), part of the Db2 database for z/OS on IBM Z mainframes, which power over 70% of the world’s financial transactions by value.
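Under the hood, a “top 100 clients like Claire” query amounts to ranking every customer by similarity to Claire’s learned representation. A simplified sketch, with hypothetical per-customer vectors standing in for what a trained model would produce:

```python
import math

# Hypothetical per-customer embeddings a trained LDM might produce
# (values are illustrative, not real model output).
CUSTOMER_VECTORS = {
    "Claire": (0.80, 0.20, 0.50),
    "Ana":    (0.78, 0.25, 0.48),
    "Ben":    (0.10, 0.90, 0.20),
    "Dia":    (0.75, 0.30, 0.55),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def top_k_like(name, k):
    # Rank every other customer by similarity to the named one,
    # mirroring a "top k clients like Claire" semantic query.
    target = CUSTOMER_VECTORS[name]
    others = [(c, cosine(target, vec))
              for c, vec in CUSTOMER_VECTORS.items() if c != name]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in others[:k]]

print(top_k_like("Claire", 2))  # ['Ana', 'Dia']
```

In a production system the ranking runs inside the database itself, so no data leaves the platform; the sketch only shows the similarity logic.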

Smaller, better and faster

As Kate Soule, Director of Technical Product Management for Granite, said on a recent episode of the Mixture of Experts podcast, LLMs can “often be overkill.”

“The training and tuning requirements of LDMs can be accomplished with a different infrastructure than LLMs,” says IBM’s Lewis. “You don’t need massive farms of GPUs to go after the problems that most businesses are trying to solve. Compared to all the data that might be used to train an LLM, an enterprise database of transactions is relatively small.” But, Lewis says, company-specific data can create “specific models to deliver a specific outcome more cost efficiently and often more effectively.”

With IBM’s SQL DI, each value within a database column, regardless of its data type, is translated into a text token. “Consequently, the model perceives each database record as an unordered bag of words in an English-like sentence, where each token maintains an equal relationship with the others, regardless of its position in the record,” says IBM Distinguished Engineer Akiko Hoshikawa. Next, SQL DI deduces the significant database values based on surrounding column values, both within and across table rows. With the model trained in this fashion, nearly anyone can run an AI query on relational data to detect and match semantically similar data directly within the database.


LDMs in insurance, retail and fraud detection

While many companies are exploring LDMs as a proof-of-concept, some enterprises across insurance and retail are already using these tools to speed up the process of extracting value from their databases.

Thomas Baumann, a Data Evangelist at Swiss Mobiliar, the oldest insurance company in Switzerland, uses IBM’s SQL DI in several areas of the company. Baumann started using SQL DI to help the company better tailor car insurance quotes to increase sales. When a salesperson was interacting with a potential new insurance policy holder, they could enter a quote, and the LDM would extract the most similar previous cases to determine the probability of the customer accepting it.

“Then, the user can change some of the parameters, like decreasing the deductibles or offering a more aggressive discount, and then they recalculate the new odds for the likelihood of success,” says Baumann in an interview with IBM Think. “The quotes are much more sophisticated and tailored to individual customers than they ever were before.”

When using IBM’s SQL DI for Swiss Mobiliar’s auto insurance product, the company trained the model with approximately 15 million records of automobile insurance quote data, with each record containing several dozen attributes, such as demographics, vehicle data and price. Baumann says that sales personnel found they could make more scientific quotes by checking the odds of various candidate quotes before selecting one.
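The workflow Baumann describes, estimating a quote's odds from the most similar past cases and recalculating after tweaking parameters, resembles a nearest-neighbor estimate. A minimal sketch with invented quote attributes (deductible, discount) and outcomes; a real deployment would use dozens of attributes and millions of records:

```python
# Hypothetical past quotes: (deductible, discount_pct) -> accepted?
history = [
    ((300, 10), True),
    ((300, 5),  True),
    ((500, 5),  False),
    ((1000, 0), False),
    ((300, 0),  False),
    ((500, 10), True),
]

def acceptance_odds(quote, k=3):
    # Find the k most similar past quotes and use their outcomes
    # as an estimate of how likely this quote is to be accepted.
    def dist(past):
        return sum((a - b) ** 2 for a, b in zip(quote, past)) ** 0.5
    nearest = sorted(history, key=lambda rec: dist(rec[0]))[:k]
    return sum(accepted for _, accepted in nearest) / k

# Tweak a parameter (e.g. a deeper discount) and recalculate the odds,
# mirroring how a salesperson iterates on a candidate quote.
print(acceptance_odds((300, 8)))   # higher: near accepted quotes
print(acceptance_odds((1000, 2)))  # lower: near rejected quotes
```

Calling the function again with adjusted parameters is the "recalculate the new odds" step Baumann describes: the estimate updates as the quote moves closer to historically accepted offers.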

As a result, they improved the closing rate on insurance sales by 7% over the course of six months—an improvement Baumann says would have taken approximately two years without using LDMs. Based on the success of this pilot, Swiss Mobiliar is now using LDMs for all its insurance products (with the exception of life insurance), from building insurance to household insurance.

“The two main benefits of SQL DI are that it’s very fast to move from an idea to pre-production,” says Baumann. “You also don’t need to move data from one platform to another.”

Beyond insurance, IBM’s SQL DI team is also working with several food retailers in the US and Europe who are interested in using LDMs to provide customers with more customized shopping experiences. A customer could, for instance, be holding one type of cereal in their hand and run a semantic query in the database to pull up alternate cereals that taste similar but offer a healthier nutrition profile. LDMs used to make suggestions are like “more sophisticated, personalized Amazon or Netflix recommendations,” says Hoshikawa.

Beyond customer-facing applications, companies are already deploying LDMs in many B2B areas, such as anomaly detection and real-time fraud detection. Any company that issues contracts, for instance, could use an LDM to quickly identify contracts that are out of the ordinary, says IBM’s Hoshikawa.

Meanwhile, LDMs can power more sophisticated real-time fraud detection as well. In addition to identifying transactions that don’t follow typical patterns, LDMs can query databases to identify records that include various attributes associated with suspicious behavior, such as companies missing Better Business Bureau reports or lacking physical addresses.
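The attribute-matching side of this can be illustrated with a simple rule-based stand-in (a real LDM would match these patterns semantically rather than through hand-written rules); the vendor records and marker list are hypothetical:

```python
# Hypothetical vendor records; a query can surface rows whose
# attributes match known markers of suspicious behavior.
vendors = [
    {"name": "Acme Co", "address": "12 Main St", "bbb_report": True},
    {"name": "Shell LLC", "address": None, "bbb_report": False},
    {"name": "Widget Inc", "address": "9 Oak Ave", "bbb_report": True},
]

SUSPICION_MARKERS = [
    lambda v: v["address"] is None,   # no physical address
    lambda v: not v["bbb_report"],    # no Better Business Bureau report
]

def suspicion_score(vendor):
    # Count how many risk markers a record matches.
    return sum(marker(vendor) for marker in SUSPICION_MARKERS)

flagged = [v["name"] for v in vendors if suspicion_score(v) >= 2]
print(flagged)  # ['Shell LLC']
```

The advantage of the LDM approach over such rules is that new suspicious patterns can be found by similarity to known fraud cases, without anyone enumerating the markers in advance.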

Lewis believes that LLMs and LDMs will be followed by many other specialized models. “We believe LDMs, just like LLMs, are a valuable tool to enable a wave of agentic applications and help drive improved outcomes,” he says. “But we do not expect them to always be used in isolation. In fact, we believe the ideal scenario is to incorporate LDMs into the enterprise data model and combine them with LLMs and other fit-for-purpose models to drive massively new value at scale for enterprises and for society.”

Similarly, Lewis does not expect one enterprise or organization to necessarily dominate. “Don’t assume it will be one company, or the company that has the most servers and the most GPUs, that is going to develop the Swiss Army Knife of models,” Lewis says. “I don't believe that. Just like I think we can gain the most insight by tapping into the specialized knowledge of subject matter experts across different fields, I believe the ability to combine LLMs, LDMs and future waves of purpose-built models will lead to genuinely brand-new insights, and results which are truly optimized.”
