Does your organization have piles of digital documents (such as PDF files, patents and corporate documents) that are far too numerous for anyone to read and digest? Wouldn’t it be nice if you could query these document piles with questions like “List all the materials claimed by company X in the US patent office”? Deep search is an IBM Research® service that automatically analyzes enormous digital libraries and helps users discover previously unknown facts. It uses an AI-based approach to enable intelligent querying of document repositories. This capability has been shown to aid innovation across industries such as materials science, insurance and drug discovery.

IBM deep search service

How does deep search work? First, as shown in figure 1, machine learning models segment each digital document into its components (headings, introduction, references and so on) and convert it into a structured data representation such as HTML or JSON. These supervised learning models are customizable and highly accurate; they are trained on large data sets using modern neural network architectures.
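To make the idea of a "structured data representation" concrete, here is a minimal sketch of what a segmented document might look like once serialized to JSON. The schema (`Segment`, `label`, `text`, `page`) and the sample values are hypothetical and are not deep search's actual output format; in the real service the labels would be assigned by the machine learning models rather than written by hand.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    """One labeled component of a document (hypothetical schema)."""
    label: str  # e.g. "heading", "introduction", "references"
    text: str
    page: int


def to_json(doc_id: str, segments: list[Segment]) -> str:
    """Serialize a segmented document into a JSON string."""
    return json.dumps(
        {"id": doc_id, "segments": [asdict(s) for s in segments]},
        indent=2,
    )


# Illustrative segments; a real pipeline would produce these automatically.
segments = [
    Segment("heading", "Corrosion-resistant alloys", 1),
    Segment("introduction", "We describe a nickel-based alloy ...", 1),
    Segment("references", "[1] ...", 12),
]
print(to_json("patent-0001", segments))
```

Once every document shares a representation like this, downstream steps can address components by label (for example, only the claims or only the references) instead of treating each file as an undifferentiated blob of text.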

In the second step, deep search uses existing data sources (corporate databases, publicly available data sets and the like) to identify the concepts (such as alloys and materials) and relationships that are relevant to the knowledge discovery task. Finally, it builds a searchable, queryable knowledge base by linking the structured representations of the documents to the identified concepts and relationships.
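The following is a minimal sketch of how linking documents to concepts and relationships makes a question like “List all the materials claimed by company X” answerable. It assumes the extraction steps have already produced (subject, relation, object) triples; the relation names `assigned_to` and `claims`, the `KnowledgeGraph` class and the sample data are all hypothetical and are not part of deep search's API.

```python
from collections import defaultdict

# Hypothetical triples that concept and relationship extraction might yield.
triples = [
    ("patent-0001", "assigned_to", "Company X"),
    ("patent-0001", "claims", "nickel-based alloy"),
    ("patent-0002", "assigned_to", "Company Y"),
    ("patent-0002", "claims", "titanium alloy"),
    ("patent-0003", "assigned_to", "Company X"),
    ("patent-0003", "claims", "graphene composite"),
]


class KnowledgeGraph:
    """A tiny in-memory graph indexed in both directions."""

    def __init__(self, triples):
        self.out = defaultdict(set)  # (subject, relation) -> objects
        self.inc = defaultdict(set)  # (relation, object) -> subjects
        for s, r, o in triples:
            self.out[(s, r)].add(o)
            self.inc[(r, o)].add(s)

    def materials_claimed_by(self, company):
        """Find documents assigned to the company, then the materials they claim."""
        docs = self.inc.get(("assigned_to", company), set())
        return sorted({m for d in docs for m in self.out.get((d, "claims"), set())})


kg = KnowledgeGraph(triples)
print(kg.materials_claimed_by("Company X"))
```

The query is a two-hop traversal (company to documents, documents to materials), which is the kind of question that is hard to answer with keyword search alone but straightforward once documents and concepts are linked in a graph.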

Deep search in action

The document processing techniques coupled with the graph analytics provided by deep search can accelerate novel discoveries from document repositories across industries. The chemical company Nagase & Co has put deep search to extensive use in developing new compounds. ENI, an oil and gas company, is using the service for upstream exploration. Currently, deep search is also aiding drug discovery in COVID-19 research.

Knowledge discovery at scale

Beyond the knowledge engineering techniques described above, automatically analyzing a huge number of documents demands powerful storage, compute and network infrastructure. The deep search platform is currently available as a service through Red Hat® OpenShift® on IBM Cloud®. It can also be deployed on your premises in an OpenShift environment on either IBM Power Systems or Intel x86 servers. The software is built as a set of cloud-based microservices that scale with the number of documents and the available hardware resources, which suits large search applications. This hardware-software codesigned platform has demonstrated the ability to ingest as many as 100,000 pages per day per core.
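As a rough capacity-planning aid, the quoted figure can be turned into an ingestion-time estimate. This sketch assumes throughput scales linearly with the number of cores, which is a simplifying assumption rather than a measured property of the platform; and because 100,000 pages per day per core is a peak figure (about 1.16 pages per second per core), the result should be read as a best-case lower bound on elapsed time. The function name `days_to_ingest` is hypothetical.

```python
# Peak rate quoted for the platform; real workloads may be slower.
PAGES_PER_DAY_PER_CORE = 100_000


def days_to_ingest(total_pages: int, cores: int,
                   rate: int = PAGES_PER_DAY_PER_CORE) -> float:
    """Best-case ingestion time in days, assuming linear scaling with cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_pages / (rate * cores)


# 10 million pages on 16 cores: 10,000,000 / (100,000 * 16) = 6.25 days
print(days_to_ingest(10_000_000, 16))
```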

IBM Systems Lab Services can help your organization make better use of its document repositories with the deep search platform. Our experienced consultants can help you set up the OpenShift platform, work with your subject matter experts to build knowledge bases, and design queries that surface novel insights from your digital libraries.

