
How to integrate custom-built annotators into your data pipeline


Key Points:

  • Use annotation models generated with Watson Knowledge Studio to create a custom Watson Discovery Service configuration pipeline.
  • With business logic built into the ground truth, annotations carry meaning pertaining to a business scenario.
  • Apply document-enhancing capabilities to extract information from industry-specific or scientific domains.



Clients are looking to IBM Watson® to satisfy a wide variety of cognitive use cases spanning multiple domains. Extracting insights from these often-complex domains of knowledge, however, can be a challenge given the intricate nature of industry-specific data. For example, we often work with clients trying to discover trends, relations, and patterns within a company’s financial reports. Analyzing these reports requires deep subject matter expertise, which can be hard to find and to scale across an organization. By deploying Watson Knowledge Studio models to Watson Discovery Service, users can apply the document-enhancing capabilities of Discovery to extract information from industry or scientific domains.

This approach offers two benefits:

  • A simpler enrichment structure within Discovery Service that enhances query results.
  • Targeted relationship extraction, where relations are surfaced and combined to generate trends, patterns, and actionable insight.

Using an investment banker persona to address questions around risks and issues across financial companies, let’s explore how it works.

Our user is Henry, an investment banking analyst tasked with compiling pitch books, which are marketing presentations filled with useful investment considerations. Henry fills his with investment considerations extrapolated from financial documents like SEC 10K reports.

While analyzing these documents, Henry looks for insights based on three business considerations.

  1. Threat – Which factors create a specific risk, and is there a warning or indication of potential harm?
  2. Entity at Risk – Which part of the company is put at risk by the threat?
  3. Relationship – What is the relation between the threat and the entity at risk, and how is the entity at risk affected?

Because these documents are dense and lengthy, reading and annotating them is time-consuming and frustrating. Plus, documents are generated at a faster pace than Henry can review them.

Here’s where Watson Discovery Service comes into play. All the SEC 10K reports are collected and ingested into Discovery Service, and default enrichments are applied to extract meaningful insight. Now Henry can quickly and easily explore and discover the entities, keywords, and concepts present in the documents.

From this view, Henry finds relevant entities and keywords within risk reports, but can’t determine how they’re related. Enter Watson Knowledge Studio, trained using a subset of the SEC 10K reports to expand and augment annotations. In this case, the annotation model followed Henry’s business logic, and when applied to the report looked something like this:

“Information system failures, network disruptions and breaches in data security that could have a material adverse effect on our ability to conduct our business.”
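To make the three business considerations concrete, here is a minimal Python sketch of how the annotations on that sentence might be represented. It is an illustration, not the output format of Watson Knowledge Studio: the type names `Threat` and `EntityAtRisk`, the relation name `adverselyAffects`, and the choice of which phrases are annotated are all assumptions made for this example.

```python
from dataclasses import dataclass

# The example sentence quoted above.
SENTENCE = (
    "Information system failures, network disruptions and breaches in data "
    "security that could have a material adverse effect on our ability to "
    "conduct our business."
)


@dataclass(frozen=True)
class Mention:
    text: str
    type: str  # hypothetical type names: "Threat" or "EntityAtRisk"


@dataclass(frozen=True)
class Relation:
    type: str  # hypothetical relation name
    source: Mention
    target: Mention


def annotate_example():
    """Return hand-written annotations for SENTENCE (no model is called)."""
    threats = [
        Mention(text, "Threat")
        for text in (
            "Information system failures",
            "network disruptions",
            "breaches in data security",
        )
    ]
    entity = Mention("our ability to conduct our business", "EntityAtRisk")
    # One relation per threat, each pointing at the same entity at risk.
    return [Relation("adverselyAffects", threat, entity) for threat in threats]


if __name__ == "__main__":
    for rel in annotate_example():
        print(f"{rel.source.text} --{rel.type}--> {rel.target.text}")
```

Each relation ties one threat to the entity it puts at risk, which is the link that the default entity and keyword enrichments alone do not provide.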

The model was deployed and integrated within the Discovery Service enrichment pipeline. Henry can now explore the SEC content and discover insight aligned to his business considerations. A simple query to the Discovery Service SEC collection shows the enhanced results:
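As a rough sketch of what an application could do with such results, the Python below groups threats by the entity they put at risk. The response shape is an assumption made for illustration: the field names `enriched_text`, `relations`, and `arguments`, and the type names `Threat` and `EntityAtRisk`, are hypothetical and may not match what your Discovery Service configuration actually returns. The sample input is hand-written from the example sentence, not real query output.

```python
from collections import defaultdict


def threats_by_entity(results):
    """Group threat texts under the entity-at-risk text they relate to.

    `results` is a list of document dicts in the assumed (hypothetical)
    shape described above. Documents without relations are skipped.
    """
    grouped = defaultdict(set)
    for doc in results:
        relations = doc.get("enriched_text", {}).get("relations", [])
        for rel in relations:
            # Map each argument's type to its text, e.g. {"Threat": "..."}.
            args = {a["type"]: a["text"] for a in rel.get("arguments", [])}
            if "Threat" in args and "EntityAtRisk" in args:
                grouped[args["EntityAtRisk"]].add(args["Threat"])
    return {entity: sorted(threats) for entity, threats in grouped.items()}


# Hand-written illustration data based on the example sentence.
SAMPLE_RESULTS = [
    {
        "enriched_text": {
            "relations": [
                {
                    "type": "adverselyAffects",
                    "arguments": [
                        {"type": "Threat", "text": "network disruptions"},
                        {"type": "EntityAtRisk",
                         "text": "our ability to conduct our business"},
                    ],
                },
                {
                    "type": "adverselyAffects",
                    "arguments": [
                        {"type": "Threat", "text": "breaches in data security"},
                        {"type": "EntityAtRisk",
                         "text": "our ability to conduct our business"},
                    ],
                },
            ]
        }
    },
    {"text": "A document with no custom enrichments."},
]

if __name__ == "__main__":
    for entity, threats in threats_by_entity(SAMPLE_RESULTS).items():
        print(entity, "<-", ", ".join(threats))
```

Grouping by entity at risk mirrors how Henry reads the reports: for each part of the business, which threats apply to it.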

As a bonus, Henry can rapidly perform the same analysis for other companies he’s interested in and complete his pitch book in record time.

Discovery Service makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data, including your own proprietary data as well as public and third-party data. You can test the service with our free, 30-day Bluemix trial to see how it can help you extract value from your data.



