What's New

Five new services expand IBM Watson capabilities to images, speech, and more


Republished from the Watson Blog


Since its launch, you’ve made the IBM Watson Developer Cloud one of IBM’s most vibrant and innovative communities on Bluemix. Today more than 5,000 partners, developers, data hobbyists, entrepreneurs, students and others have contributed to building 6,000+ apps infused with Watson’s cognitive computing capabilities.

A few months ago we released eight beta Watson services so that this community can test drive them, think of new ways to apply and tap into Watson’s capabilities, and harden each service as we prepare them for general availability. The services—which range from Language Identification and Machine Translation to Visualization Rendering and User Modeling—are being embedded into a new class of cognitive apps.

One example is Red Ant’s Sell Smart mobile app, a retail sales trainer that lets employees easily identify unique customer buying preferences by analyzing demographics, purchase history, wish lists, pricing and other product information. Another is eyeQ’s eyeQinsights, which helps retailers understand how consumers make purchasing decisions while standing in the store.

Today, we are excited to announce the arrival of five additional beta services in the Watson Developer Cloud. You can access the following free beta services on Bluemix now:

  • Speech to Text
  • Text to Speech
  • Visual Recognition
  • Concept Insights
  • Tradeoff Analytics

We’ve included an overview of each service below. Our team will continue to add more services in the Watson Developer Cloud as they become available. Stay tuned.

New services

Speech to Text

Speech to Text is a cloud-based, real-time service that uses low-latency speech recognition to convert speech into text for voice-controlled mobile applications, transcription services, and more. Transcriptions are continuously streamed back to the client and retroactively corrected as more speech is heard, helping the system learn.

The service is based on more than 50 years of speech research at IBM. It uses state-of-the-art algorithms based on convolutional neural networks or “deep learning”. Using these algorithms, the Watson team has published the best accuracy results (10.4% word error rate vs. 12.5% for the second best as of today) on the popular Switchboard Hub5-2000 benchmark, and provided technology that has been deployed on more than 500 million smartphones. This is the first time in 10 years that the IBM team is delivering speech technology broadly to developers. While the base algorithms are solid, the service will keep getting better as it gets more usage and training data.

Use Cases:

  • Enable voice control over apps, embedded devices or accessories
  • Provide transcription of meetings and conference calls in real-time
  • Critical building block for “Speech-to-Speech” translation
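The streaming behavior described above, where interim transcriptions are revised until the service commits to a final result, can be sketched from the client's side. The result format below is a hypothetical illustration, not the actual Watson API schema: each result carries a `transcript` and a `final` flag.

```python
# Minimal sketch of consuming a stream of interim and final transcription
# results. Interim hypotheses for the current utterance are replaced as
# the service revises them; final results are kept.

def assemble_transcript(results):
    """Combine streamed results into the current best transcript."""
    finals = []   # utterances the service has committed to
    interim = ""  # latest hypothesis, still subject to revision
    for result in results:
        if result["final"]:
            finals.append(result["transcript"])
            interim = ""  # the hypothesis was finalized
        else:
            interim = result["transcript"]  # retroactive correction
    return " ".join(finals + ([interim] if interim else []))

stream = [
    {"transcript": "hell", "final": False},
    {"transcript": "hello world", "final": False},  # revised hypothesis
    {"transcript": "hello world", "final": True},
    {"transcript": "how are", "final": False},
]
print(assemble_transcript(stream))  # hello world how are
```

A real client would receive these results over a socket; the point here is only that later messages supersede earlier interim ones rather than appending to them.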

Text to Speech

Text to Speech converts textual input into synthesized speech, offering a choice of three voices in English or Spanish, including the American English voice Watson used in the 2011 Jeopardy! match. Text to Speech generates audio output complete with appropriate cadence and intonation. Users can input any English or Spanish text to generate speech output, with potential applications ranging from assistance for the vision-impaired to reading-based education tools and mobile apps.

Use Cases:

  • Assistance for the vision-impaired, reading and language education
  • Enable the audio reading of texts and emails to drivers
  • Critical building block for “Speech-to-Speech” translation
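A synthesis request to a service like this boils down to a text payload plus a voice selection. The sketch below is purely illustrative: the endpoint path, parameter names, and voice identifiers are assumptions, not the documented Watson API.

```python
# Hypothetical request builder for a text-to-speech service that offers
# English and Spanish voices, as described above. All names are made up.

SUPPORTED = {"en": ["Michael", "Allison"], "es": ["Enrique"]}  # hypothetical voices

def build_request(text, language="en", voice=None):
    """Assemble a synthesis request, defaulting to the first voice for the language."""
    if language not in SUPPORTED:
        raise ValueError(f"unsupported language: {language}")
    voice = voice or SUPPORTED[language][0]
    if voice not in SUPPORTED[language]:
        raise ValueError(f"no {language} voice named {voice}")
    return {
        "path": "/v1/synthesize",   # assumed endpoint, for illustration only
        "text": text,
        "voice": f"{language}_{voice}",
        "accept": "audio/wav",
    }

req = build_request("Hola, mundo", language="es")
print(req["voice"])  # es_Enrique
```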

Visual Recognition

Visual Recognition analyzes the visual appearance of images or video frames to understand what is happening in a scene.

The Visual Recognition service includes an unmatched number of preset classifiers and trained labels (2,000+), a taxonomy that recognizes 150+ different sports, and can ingest 1,000+ batch images with the ability to recognize multiple labels in a picture. Like the Speech to Text service, Visual Recognition relies on deep learning. Convolutional neural networks are used as semantic classifiers that recognize many visual entities such as settings, objects, and events. Input JPEG images into the service and you will receive a set of labels and probability scores such as “soccer, 0.7” or “baseball, 0.3”.

Use Cases:

  • Organize and ingest large collections of digital images
  • Build semantic association between images from multiple users
  • Understand consumer shopping preferences based on image queries
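The label-and-score output described above is straightforward to consume. The JSON shape below is a guess for illustration (a list of label/score pairs per image), not the exact Watson response schema.

```python
# Sketch of filtering a classification response down to confident labels.

response = {
    "images": [{
        "labels": [
            {"label_name": "soccer", "label_score": 0.7},
            {"label_name": "baseball", "label_score": 0.3},
        ]
    }]
}

def top_labels(resp, threshold=0.5):
    """Return (label, score) pairs at or above the threshold, best first."""
    labels = resp["images"][0]["labels"]
    keep = [(l["label_name"], l["label_score"])
            for l in labels if l["label_score"] >= threshold]
    return sorted(keep, key=lambda pair: -pair[1])

print(top_labels(response))  # [('soccer', 0.7)]
```

Since a picture can carry multiple labels, an application would typically keep every label above a confidence threshold rather than just the single best one.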

Concept Insights

Concept Insights handles text conceptually, delivering a search capability that surfaces insights a traditional keyword search would miss.

Concept Insights links user-provided documents with a pre-existing graph of concepts based on Wikipedia (e.g. ‘The New York Times’, ‘Machine learning’, etc.). Two types of links are identified: explicit links when a document directly mentions a concept, and implicit links which connect the user’s documents to relevant concepts that are not directly mentioned. Users of this service can also search for documents that are relevant to a concept or collection of concepts by exploring the explicit and implicit links.

Use Cases:

  • Improve search queries with results that are more conceptually related
  • Locate sources of expertise across large or complex organizations
  • Deepen customer engagement by returning more relevant information
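The explicit-link idea above can be illustrated with a toy sketch: scan a document for direct mentions of known graph concepts. The real service also infers implicit links to concepts that are never mentioned, which this sketch does not attempt; the concept names are the examples from the text above.

```python
# Toy explicit-link detection: which known concepts does a document
# mention directly? (Implicit links would require semantic inference.)

CONCEPTS = ["The New York Times", "Machine learning"]

def explicit_links(document, concepts=CONCEPTS):
    """Return the concepts the document mentions directly (case-insensitive)."""
    text = document.lower()
    return [c for c in concepts if c.lower() in text]

doc = "Reporters at the New York Times now apply machine learning to archives."
print(explicit_links(doc))  # ['The New York Times', 'Machine learning']
```

The value of the service lies in going beyond this: a document about newsroom analytics could be linked to “Machine learning” even if that phrase never appears, which is what makes concept search richer than keyword search.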

Tradeoff Analytics

Tradeoff Analytics supports dynamic, real-time tradeoff decisions across static or changing parameters, all delivered in an interactive visual display. It improves decision-making by dynamically weighing multiple, often conflicting, goals: the service uses Pareto filtering techniques to identify the optimal alternatives across multiple criteria, then applies various analytical and visual approaches to help the decision maker explore the tradeoffs among them.

Tradeoff Analytics can be used to help make complex decisions like which mortgage to take, which treatment option to follow, or which car to purchase.

Use Cases:

  • Enable retailers and manufacturers to determine product mix
  • Allow consumers to compare and contrast competitive products or services
  • Help physicians select optimal treatment options based on multiple criteria
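The Pareto-filtering step mentioned above has a compact core: discard any alternative that another alternative beats on every objective. This is a minimal sketch, assuming all objectives are to be minimized; the car data is made up for illustration.

```python
# Pareto filtering: keep only the non-dominated alternatives.

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Return the options that no other option dominates."""
    return {name: objs for name, objs in options.items()
            if not any(dominates(other, objs)
                       for o, other in options.items() if o != name)}

cars = {              # (price in $1000s, fuel cost per month in $)
    "A": (20, 120),
    "B": (25, 90),
    "C": (27, 95),    # dominated by B: pricier and costlier to run
}
print(sorted(pareto_front(cars)))  # ['A', 'B']
```

Note that A and B both survive even though neither is best on both criteria; that remaining tradeoff between them is exactly what the service's visual exploration is meant to help resolve.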
