How to integrate custom-built annotators into your data pipeline

Key Points:

  • Use annotation models generated with Watson Knowledge Studio to create a custom Watson Discovery Service configuration pipeline.
  • With business logic built into the ground truth, annotations carry meaning pertaining to a business scenario.
  • Apply document-enhancing capabilities and extract information from industry-specific or scientific domains.


Clients are looking to IBM Watson® to satisfy a wide variety of cognitive use cases spanning multiple domains. Extracting insights from these often-complex domains of knowledge, however, can be a challenge given the intricate nature of industry-specific data. For example, we often work with clients trying to discover trends, relations, and patterns within a company’s financial reports. Analyzing these reports requires deep subject matter expertise, which can be hard to find and to scale across an organization. By deploying Watson Knowledge Studio models to Watson Discovery Service, users can apply the document-enhancing capabilities of Discovery to extract information from industry-specific or scientific domains. This lets them:

  • Generate a simpler enrichment structure within Discovery Service to enhance query results.
  • Perform targeted relationship extraction, in which relations are surfaced and combined to reveal trends, patterns, and actionable insight.

Let’s explore how it works, using an investment banker persona to address questions about risks and issues across financial companies.

Our user is Henry, an investment banking analyst tasked with compiling pitch books, which are marketing presentations filled with useful investment considerations. Henry fills his with considerations drawn from financial documents such as SEC 10K reports.

While analyzing these documents, Henry looks for insights based on three business considerations.

  1. Threat – Which factors create a specific risk, and is there a warning or indication of potential harm?
  2. Entity at Risk – Which part of the company is put at risk by the threat?
  3. Relationship – The relation between threat and entity at risk, and how the entity at risk is affected.
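
Henry’s three considerations map naturally onto a small set of types of the kind an annotation model is trained against. The sketch below is one illustrative way to write them down in Python. The class and field names are hypothetical and are not taken from Watson Knowledge Studio, and the example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    """A factor that creates a specific risk."""
    text: str


@dataclass(frozen=True)
class EntityAtRisk:
    """The part of the company exposed to a threat."""
    text: str


@dataclass(frozen=True)
class RiskRelationship:
    """Links a threat to the entity it puts at risk."""
    threat: Threat
    entity: EntityAtRisk
    effect: str  # how the entity at risk is affected

    def summary(self) -> str:
        return f"{self.threat.text} -> {self.effect} -> {self.entity.text}"


# Hypothetical example values, for illustration only.
relationship = RiskRelationship(
    threat=Threat("network disruptions"),
    entity=EntityAtRisk("ability to conduct business"),
    effect="material adverse effect",
)
print(relationship.summary())
```

Keeping the relationship as its own type, rather than a field on the threat, mirrors the third consideration: the link between threat and entity is something Henry wants to find and query in its own right.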

Because of their dense and lengthy nature, reading and annotating these documents is time-consuming and frustrating. Plus, documents are generated at a faster pace than Henry can review them.

Here’s where Watson Discovery Service comes into play. All the SEC 10K reports are collected and ingested into Discovery Service, and default enrichments are applied to extract meaningful insight. Now Henry can quickly and easily explore and discover the entities, keywords, and concepts present in the documents.

From this view, Henry finds relevant entities and keywords within risk reports, but can’t determine how they’re related. Enter Watson Knowledge Studio, where an annotation model is trained on a subset of the SEC 10K reports to expand and augment annotations. In this case, the annotation model followed Henry’s business logic, and when applied to a report, the result looked something like this:

“Information system failures, network disruptions and breaches in data security that could have a material adverse effect on our ability to conduct our business.”
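
One way to picture what the model adds to this sentence is as labeled character spans. The sketch below is an illustrative reading of the sentence, not the model’s actual output: the labels `THREAT`, `EFFECT`, and `ENTITY_AT_RISK`, and the choice of spans, are assumptions based on Henry’s three considerations.

```python
SENTENCE = (
    "Information system failures, network disruptions and breaches in data "
    "security that could have a material adverse effect on our ability to "
    "conduct our business."
)

# Hypothetical labels and spans, chosen by hand for illustration.
MENTIONS = [
    ("THREAT", "Information system failures"),
    ("THREAT", "network disruptions"),
    ("THREAT", "breaches in data security"),
    ("EFFECT", "material adverse effect"),
    ("ENTITY_AT_RISK", "our ability to conduct our business"),
]


def locate(sentence, mentions):
    """Return (label, start, end, text) for each mention found in the sentence."""
    spans = []
    for label, text in mentions:
        start = sentence.find(text)
        if start == -1:
            raise ValueError(f"mention not found: {text!r}")
        spans.append((label, start, start + len(text), text))
    return spans


spans = locate(SENTENCE, MENTIONS)
for label, start, end, text in spans:
    print(f"{label:<15} [{start:>3}, {end:>3})  {text}")
```

Read this way, each threat span can be paired with the entity-at-risk span through the effect that connects them, which is the relationship Henry could not see from entities and keywords alone.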

The model was deployed and integrated within the Discovery Service enrichment pipeline. Henry can now explore the SEC content and discover insight aligned to his business considerations. A simple query to the Discovery Service SEC collection shows the enhanced results.
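
As a rough sketch of these two steps, the snippet below builds an enrichment entry that points at a custom model, and a query string that filters on the relations it produces. Everything here is an assumption made for illustration: the model ID is a placeholder, the key names follow the general shape of a Discovery enrichment configuration and its `field:value` query syntax, and the relation type `affects` is hypothetical. Check the current Discovery documentation for the exact schema before relying on any of it.

```python
import json

# Placeholder: the ID assigned when the Watson Knowledge Studio model is deployed.
WKS_MODEL_ID = "<your-wks-model-id>"


def custom_enrichment(model_id, source_field="text", destination_field="enriched_text"):
    """Build one enrichment entry that applies a custom model to a document field.

    The key names below are assumptions for illustration, not a verified schema.
    """
    return {
        "source_field": source_field,
        "destination_field": destination_field,
        "enrichment": "natural_language_understanding",
        "options": {
            "features": {
                "entities": {"model": model_id},
                "relations": {"model": model_id},
            }
        },
    }


def relation_query(relation_type, destination_field="enriched_text"):
    """Build a field:value query string matching documents with a given relation type."""
    return f"{destination_field}.relations.type:{relation_type}"


config_fragment = {"enrichments": [custom_enrichment(WKS_MODEL_ID)]}
print(json.dumps(config_fragment, indent=2))
print(relation_query("affects"))
```

The same model ID is used for both entities and relations so that the relation arguments line up with the custom entity types rather than the default ones.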

As a bonus, Henry can also rapidly perform this analysis for other companies he’s interested in and complete his pitch book in record time.

Discovery Service makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data — including your own proprietary data, as well as public and third-party data. You can test the service with our free, 30-day Bluemix trial to see how it can help you extract value from your data.


Learn more about Watson Discovery Service and try it for free with our 30-day trial.

