How to integrate custom-built annotators into your data pipeline

Key Points:

  • Use annotation models generated with Watson Knowledge Studio to create a custom Watson Discovery Service configuration pipeline.
  • With business logic built into the ground truth, annotations carry meaning pertaining to a business scenario.
  • Apply document-enhancing capabilities and extract information from industry-specific or scientific domains.


Clients are looking to IBM Watson® to satisfy a wide variety of cognitive use cases spanning multiple domains. Extracting insights from these often-complex domains of knowledge, however, can be a challenge given the intricate nature of industry-specific data. For example, we often work with clients trying to discover trends, relations, and patterns within a company’s financial reports. Analyzing these reports requires deep subject matter expertise, which can be hard to find and to scale across an organization. By deploying Watson Knowledge Studio models to Watson Discovery Service, users can apply the document-enhancing capabilities of Discovery to extract information from industry-specific or scientific domains.

This approach has two main benefits:

  • A simpler enrichment structure within Discovery Service, which enhances query results.
  • Targeted relationship extraction, in which relations are surfaced and combined to reveal trends, patterns, and actionable insight.

Let’s explore how it works, using an investment banker persona to address questions about risks and issues across financial companies.

Our user is Henry, an investment banking analyst tasked with compiling pitch books, which are marketing presentations filled with useful investment considerations. Henry fills his with considerations extrapolated from financial documents such as SEC 10K reports.

While analyzing these documents, Henry looks for insights based on three business considerations.

  1. Threat – Which factors create a specific risk, and is there a warning or indication of potential harm?
  2. Entity at Risk – Which part of the company is put at risk by the threat?
  3. Relationship – How are the threat and the entity at risk related, and how is the entity at risk affected?

Because these documents are dense and lengthy, reading and annotating them is time-consuming and frustrating. Plus, new documents are generated faster than Henry can review them.

Here’s where Watson Discovery Service comes into play. All the SEC 10K reports are collected and ingested into Discovery Service, and default enrichments are applied to extract meaningful insight. Now Henry can quickly and easily explore and discover the entities, keywords, and concepts present in the documents.

From this view, Henry can find relevant entities and keywords within risk reports, but he can’t determine how they’re related. Enter Watson Knowledge Studio: a custom annotation model is trained on a subset of the SEC 10K reports to expand and augment the annotations. In this case, the annotation model followed Henry’s business logic, and when applied to a sentence from a report, it looked something like this:

“Information system failures, network disruptions and breaches in data security that could have a material adverse effect on our ability to conduct our business.”
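One way to picture the annotations on the sentence above is as typed mentions linked by relations. The following is a minimal sketch only: the type names `THREAT`, `ENTITY_AT_RISK`, and `adverselyAffects`, and the choice of which phrases are annotated, are assumptions made for illustration, not the type system of the actual model.

```python
from dataclasses import dataclass

SENTENCE = (
    "Information system failures, network disruptions and breaches in data "
    "security that could have a material adverse effect on our ability to "
    "conduct our business."
)


@dataclass(frozen=True)
class Mention:
    type: str   # hypothetical entity type name
    text: str   # the annotated phrase
    start: int  # character offset where the phrase begins
    end: int    # character offset just past the phrase


@dataclass(frozen=True)
class Relation:
    type: str        # hypothetical relation type name
    source: Mention  # the threat
    target: Mention  # the entity at risk


def mention(entity_type: str, text: str) -> Mention:
    """Locate a phrase in SENTENCE and wrap it as a typed mention."""
    start = SENTENCE.find(text)
    if start < 0:
        raise ValueError(f"phrase not found in sentence: {text!r}")
    return Mention(entity_type, text, start, start + len(text))


# Assumed annotation choices, following the three business considerations.
threats = [
    mention("THREAT", phrase)
    for phrase in (
        "Information system failures",
        "network disruptions",
        "breaches in data security",
    )
]
entity_at_risk = mention("ENTITY_AT_RISK", "our ability to conduct our business")
relations = [Relation("adverselyAffects", t, entity_at_risk) for t in threats]

for rel in relations:
    print(f"{rel.source.text} --{rel.type}--> {rel.target.text}")
```

Storing character offsets alongside the text keeps each mention traceable back to its position in the source document.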

The model was deployed and integrated within the Discovery Service enrichment pipeline. Henry can now explore the SEC content and discover insight aligned to his business considerations. A simple query to the Discovery Service SEC collection returns the enhanced results.
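The integration step described above might be expressed as an enrichment entry in a Discovery configuration. The sketch below only builds and prints such a fragment as JSON. It assumes the enrichment shape used by the Natural Language Understanding based enrichments in the Discovery v1 API; the field names, the `text` and `enriched_text` field choices, and the placeholder model ID are assumptions to verify against the documentation for your service version.

```python
import json


def build_enrichment(model_id: str,
                     source_field: str = "text",
                     destination_field: str = "enriched_text") -> dict:
    """Build one enrichment entry that applies a custom model.

    Assumed shape: both entity and relation extraction point at the same
    Watson Knowledge Studio model ID.
    """
    return {
        "source_field": source_field,
        "destination_field": destination_field,
        "enrichment": "natural_language_understanding",
        "options": {
            "features": {
                "entities": {"model": model_id},
                "relations": {"model": model_id},
            }
        },
    }


# "YOUR_WKS_MODEL_ID" is a placeholder, not a real model identifier.
config_fragment = {"enrichments": [build_enrichment("YOUR_WKS_MODEL_ID")]}
print(json.dumps(config_fragment, indent=2))
```

Keeping the model ID as a parameter makes it straightforward to point the same configuration at a retrained model later.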

As a bonus, Henry can also rapidly perform this analysis for other companies he’s interested in and complete his pitch book in record time.
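To illustrate how extracted relations could be combined across companies, here is a small sketch that counts how many companies report each threat and entity-at-risk pair. The records are invented for illustration and use a simplified, hypothetical shape; they are not real Discovery query output.

```python
from collections import Counter

# Made-up, simplified records: one per relation extracted from a report.
extracted = [
    {"company": "Company A", "threat": "network disruptions",
     "entity_at_risk": "business operations"},
    {"company": "Company A", "threat": "data security breaches",
     "entity_at_risk": "customer data"},
    {"company": "Company B", "threat": "network disruptions",
     "entity_at_risk": "business operations"},
    # Repeated mention within the same company; counted only once.
    {"company": "Company B", "threat": "network disruptions",
     "entity_at_risk": "business operations"},
]


def companies_per_relation(records):
    """Count the distinct companies reporting each (threat, entity) pair."""
    seen = set()
    counts = Counter()
    for record in records:
        key = (record["company"], record["threat"], record["entity_at_risk"])
        if key in seen:
            continue  # skip repeated mentions from the same company
        seen.add(key)
        counts[(record["threat"], record["entity_at_risk"])] += 1
    return counts


trends = companies_per_relation(extracted)
for (threat, entity), company_count in trends.most_common():
    print(f"{threat} -> {entity}: company count {company_count}")
```

Counting distinct companies rather than raw mentions keeps a single lengthy report from dominating the trend.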

Discovery Service makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data — including your own proprietary data, as well as public and third-party data. You can test the service with our free, 30-day Bluemix trial to see how it can help you extract value from your data.



Technical Program Manager - Watson Implementations
