
Production RAG for legal search: How Shorthills AI scaled data retrieval with IBM watsonx.data

By using Astra DB on IBM® watsonx.data® as the vector database and Langflow to speed up iteration on its RAG systems, Shorthills reports an improvement of more than 60% in recall and precision.

Published 28 January 2026

Modern artificial intelligence techniques such as semantic retrieval and retrieval-augmented generation (RAG) can reduce the time that law firms devote to hunting through long documents. They work by finding conceptually relevant sections, summarizing what matters and returning traceable citations so results can be validated.

Done well, this turns legal research from a manual, “open-ten-tabs” workflow into a guided, evidence-backed search experience without sacrificing the rigor legal teams require. In legal search, users don’t just need an answer; they need the right answer. That answer must be substantiated with the right authorities, the relevant exceptions and the exact passages they can cite—fast.

AI-powered insights for the legal industry

With this goal in mind, New Jersey-based Shorthills AI has developed a generative AI agent framework that uses RAG and knowledge graphs to deliver optimized, domain-specific chatbots. This framework provides AI-powered insights for the legal industry, where relevance, completeness and verifiable sourcing matter as much as raw speed.

IBM solutions help produce a roughly 4 times improvement in comprehensiveness, meaning how well results capture the details and aspects of the user’s query. They also deliver a 9 times improvement in diversity, meaning the ability to offer users multiple interpretations and angles rather than a single line of reasoning. This breadth is critical to preparing arguments and rebuttals in legal workflows.

Putting it to use: Large-scale legal document search that improves relevance, completeness and trust

Legal departments sifting through a dataset of hundreds of thousands of legal documents require reliable retrieval, multiple angles on an issue and the ability to trace results back to source documents. Getting 70% of the answer can pose significant risks, and hallucinations are unacceptable.

A key constraint for many legal customers is deployment. Some organizations might be unable to share sensitive data in legal content with hyperscalers due to regulatory constraints. These constraints might require a knowledge base to remain on premises, such as content that includes personally identifiable information (PII) or protected health information (PHI).

Technical architecture: A scalable pipeline for hybrid RAG/keyword search and graph retrieval

Shorthills’ legal AI system consists of two pipelines:

  1. Document and data ingestion and processing
  2. Querying and retrieval, built to support multiple retrieval modes—keyword, vector and graph—and to route queries to the right technique based on intent

1. Ingestion pipeline: From raw legal files to searchable structures

Files are imported into a data lake and then prepared so that retrieval works reliably at scale:

  • Chunking documents into smaller pieces, with an emphasis on meaningful chunking—not simply splitting by character and word count—helps to preserve context such as definitions, citations and argument structures.
  • Entity extraction powered by a large language model (LLM) identifies legal-relevant fields, such as judge name, case name, case type, document IDs and judgment type. Then an entity relationship diagram is constructed so that relationships can be used later during retrieval.
  • Embedding documents with an embedding model and storing the resulting vectors supports vector and hybrid retrieval at scale. Shorthills relies on Astra DB as the storage foundation for the searchable data.
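As a rough illustration of the "meaningful chunking" idea above, the sketch below packs whole paragraphs into chunks up to a size budget instead of cutting at a fixed character offset. The function name `chunk_by_paragraph` and the `max_chars` budget are assumptions made for this example; the article does not describe Shorthills' actual chunking rules, which likely use richer cues such as section headings and citation boundaries.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int = 1200) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: paragraphs are never split, so an oversized
    paragraph becomes its own chunk even if it exceeds max_chars.
    """
    # Treat one or more blank lines as a paragraph boundary.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        # +2 accounts for the blank-line separator added when joining.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            # The budget would be exceeded: close the current chunk first.
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraph boundaries intact is one simple way to avoid separating a definition from the sentence that cites it.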
Processing PDFs and data storage: New PDFs are read and split into smaller text chunks (for example, paragraphs or sections), and the IDs of processed files are saved back to a tracking CSV file. This approach creates structured, chunked text ready for analysis and updates the tracking record to prevent reprocessing.
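The tracking record described here could look something like the following sketch, which uses only the Python standard library. The file layout (a single `file_id` column) and the helper names are assumptions for illustration, not details taken from the Shorthills system.

```python
import csv
from pathlib import Path


def load_processed_ids(tracking_csv: Path) -> set[str]:
    """Return the IDs already recorded in the tracking CSV (empty if absent)."""
    if not tracking_csv.exists():
        return set()
    with tracking_csv.open(newline="", encoding="utf-8") as handle:
        return {row["file_id"] for row in csv.DictReader(handle)}


def mark_processed(tracking_csv: Path, file_id: str) -> None:
    """Append one processed file ID, writing the header on first use."""
    is_new = not tracking_csv.exists()
    with tracking_csv.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file_id"])
        if is_new:
            writer.writeheader()
        writer.writerow({"file_id": file_id})


def select_unprocessed(candidates: list[str], tracking_csv: Path) -> list[str]:
    """Keep only the candidate IDs that the tracking CSV has not seen."""
    seen = load_processed_ids(tracking_csv)
    return [file_id for file_id in candidates if file_id not in seen]
```

Calling `select_unprocessed` before each ingestion run and `mark_processed` after each file is chunked keeps reruns from repeating work.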
Chunking and entity relationship extraction: Chunks are sent to Amazon Bedrock to detect entities, such as people and organizations, and their relationships, such as "works at" and "located in". The raw LLM output (unstructured text) is temporarily cached in a local JSON file. This data is then parsed into clean, structured formats, with entities as labeled items and relationships as connections between them. Embeddings (numeric representations) for the chunks are also generated and stored in a vector database for future retrieval.
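To make the parsing step more concrete, here is a minimal sketch that turns raw model text into labeled entities and relationships. The pipe-delimited line format (`ENTITY | name | type` and `REL | source | relation | target`) is purely an assumption for this example; the article does not say what format the LLM output takes, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    label: str


@dataclass(frozen=True)
class Relationship:
    source: str
    relation: str
    target: str


def parse_llm_output(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse the ASSUMED pipe-delimited format; skip lines that do not match."""
    entities: list[Entity] = []
    relationships: list[Relationship] = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and parts[0] == "ENTITY":
            entities.append(Entity(name=parts[1], label=parts[2]))
        elif len(parts) == 4 and parts[0] == "REL":
            relationships.append(Relationship(parts[1], parts[2], parts[3]))
        # Anything else (model chatter, blank lines) is ignored.
    return entities, relationships
```

Skipping unrecognized lines rather than failing is a deliberate choice in this sketch, since free-form model output often includes extra commentary.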

2. Query pipeline: Routing, hybrid retrieval and reranking for relevance and latency

On the query side, Shorthills employs a pragmatic philosophy for its production environment: avoid reliance on a single universal search method.

  • Keyword search still matters for exact identifiers, such as document numbers or IDs.
  • Vector search improves semantic matching when users ask natural-language questions, such as recognizing that “display” and “screen” might refer to the same concept.
  • Graph search can capture linkages between documents, but at scale it can become slow and memory hungry—so it must be used selectively.
Search pipeline atop knowledge base: A user query routes through the API Gateway and becomes a hybrid search (keyword and vector) powered by Amazon Bedrock, with relevant data retrieved from watsonx. The data is then passed through a de-duplication and ranking layer and returned to the user as a refined, context-rich search result.
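One common way to merge keyword and vector results and remove duplicates is reciprocal rank fusion (RRF), sketched below. The article does not name the fusion method Shorthills uses, so RRF is a stand-in chosen for this example; the document IDs and the constant `k = 60` (a conventional default) are likewise assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        seen: set[str] = set()
        rank = 0
        for doc_id in ranking:
            if doc_id in seen:  # de-duplicate within a single result list
                continue
            seen.add(doc_id)
            rank += 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["doc-7", "doc-2", "doc-7", "doc-9"]  # toy keyword results
vector_hits = ["doc-2", "doc-5", "doc-7"]  # toy vector results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents found by only one retriever.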

To implement this approach, the system includes routers that send a query to keyword, vector or graph search depending on the user’s intent. Each option carries different time and cost tradeoffs.
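A very simple version of such a router might look like the sketch below, which picks a search mode from surface cues in the query. The ID pattern, the hint phrases and the names `SearchMode` and `route_query` are all assumptions for illustration; a production router would more likely rely on a trained classifier or an LLM to infer intent.

```python
import re
from enum import Enum


class SearchMode(Enum):
    KEYWORD = "keyword"
    VECTOR = "vector"
    GRAPH = "graph"


# Hypothetical cues: an ID-like token (two or more capital letters, an
# optional hyphen, then three or more digits) and a few phrases that hint
# at a question about relationships between documents.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-?\d{3,}\b")
GRAPH_HINTS = ("related to", "cited by", "connected to", "linked to")


def route_query(query: str) -> SearchMode:
    """Pick a retrieval mode from simple surface cues in the query."""
    if ID_PATTERN.search(query):
        return SearchMode.KEYWORD  # exact identifiers favor keyword search
    lowered = query.lower()
    if any(hint in lowered for hint in GRAPH_HINTS):
        return SearchMode.GRAPH  # costlier path, used only when hinted
    return SearchMode.VECTOR  # default for natural-language questions
```

Checking for identifiers first and falling back to vector search keeps the slower graph path reserved for queries that appear to need it.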

Other critical considerations included:

  • A scalable hybrid search implementation combines keyword and vector search with graph capabilities where appropriate. This approach improves the accuracy of search results and prevents systems from bogging down as the document store grows.
  • The system includes a reranking stage after retrieval to improve final relevance before results are returned to the user.

Finally, the experience extends beyond a chat-only interface. In this legal search use case, users can retrieve multiple data source types, including Word documents, images, PDFs and text files.

Benefits: Flexibility, security and scalability

A key reason Shorthills chose to build this AI assistant platform with IBM’s stack stemmed from the realities of enterprise deployment in the legal world:

  • On-premises support and data residency are essential for customers restricted from moving sensitive data to hyperscalers due to risk tolerance or regulatory requirements.
  • IBM watsonx.data unlocks the ability to scale search across large document volumes.
  • Enterprise-grade security and a scalable architecture enable AI-ready solutions and AI agents built with IBM watsonx.data and Langflow, a low-code, drag-and-drop environment for developing agents and RAG pipelines.

These processes had to operate within practical engineering parameters: LLMs are compute-hungry and on-premises scaling requires the right runtime and deployment tooling for efficiency.

In terms of measurable end-user outcomes, Shorthills reported:

  • Faster search time and more than 60% improvement in recall and precision
  • A roughly 4 times improvement in comprehensiveness and 9 times improvement in diversity
  • Better support for references and citations in returned results, which were previously missing
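For readers less familiar with the metrics, precision is the share of returned results that are relevant, and recall is the share of all relevant documents that were returned. The sketch below computes both for a single query using toy document IDs; the article does not describe Shorthills' evaluation protocol, so the cutoff `k` and the function name are assumptions.

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Return (precision@k, recall@k) for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data: four results returned, three documents are actually relevant.
precision, recall = precision_recall_at_k(
    retrieved=["a", "b", "c", "d"], relevant={"a", "c", "e"}, k=4
)
```

Here two of the four returned results are relevant, so precision is 2/4, and two of the three relevant documents were found, so recall is 2/3.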

The impact is concrete: improved recall and precision, higher comprehensiveness and diversity and high-quality citation support. This outcome translates to end users spending less time hunting for material and more time evaluating it.

A foundation built to scale

Shorthills’ core lesson was that production search for AI solutions is an iterative engineering exercise. Scaling from a handful of documents to thousands (and beyond) changes the problem. Ultimately, “search” becomes a journey of continuous improvement—evolving through keyword, vector, hybrid and graph search, with careful routing so latency and cost stay predictable.

By building on IBM watsonx.data and Langflow, Shorthills implemented an AI-driven search system for legal professionals that can operate at enterprise scale and handle enterprise constraints, including on-premises requirements. The system delivers measurable relevance gains and provides the citations and breadth of perspectives legal end users need to determine outcomes with confidence.

Similar customer needs and governance requirements exist across industries such as healthcare and financial services. The scalability of the underlying infrastructure allows similar solutions to be deployed across multiple industries worldwide.

As the next step, Shorthills sees this retrieval foundation extending into agent-based workflows. In these workflows, an agent can research, draft and package outputs for human review without having to rebuild the underlying data and retrieval stack each time.

Learn more about IBM watsonx.data

Paramdeep Singh

Cofounder and President

Shorthills AI

Chad Jennings

Global Head of Customer Voice and Product Experience

IBM