It’s been said that a picture is worth a thousand words. Great tonality, clarity, diction, and enunciation of spoken words can go a long way toward painting the most vivid and memorable of those pictures. Artificial intelligence has progressed to the point where it can now deliver all of these qualities convincingly.
I wanted to find out whether it was possible to have a female artificial intelligence voice portray the main character in my book, “Miraculous,” so convincingly that the listening audience would believe she is the actual character in the book.
How I used Watson APIs to bring my main character to life
After auditioning many different AI voices from a variety of companies, I discovered and settled on IBM Watson’s Text to Speech API, which synthesizes text to audio in a variety of languages, voices, and dialects. I chose the “Allison” voice, as she possesses a sweet, attractive tone that also fits the age range of Hailee Tupper, the main protagonist of my book.
To help her act out scenes from my book, I used the Text to Speech API’s “Expressiveness” feature, which extends SSML with an expressive element that you can use to indicate a speaking style of GoodNews, Apology, or Uncertainty (available only for the U.S. English Allison voice). Learn more about Expressive SSML, IBM Watson’s expressive speech capability.
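To make the idea concrete, here is a minimal sketch of what that markup looks like. The `express-as` element wraps a passage in one of the three expressive styles; the helper function and the sample sentence below are illustrative, not lines from the book.

```python
def wrap_expressive(text, style):
    """Wrap a passage in Watson's <express-as> SSML element.

    style is one of the three supported values: "GoodNews",
    "Apology", or "Uncertainty" (U.S. English Allison voice only).
    """
    return '<express-as type="{}">{}</express-as>'.format(style, text)

# Build a complete SSML document around one expressive passage.
# The sample sentence is a made-up example, not from "Miraculous".
ssml = '<speak version="1.0">{}</speak>'.format(
    wrap_expressive("I can hardly believe we made it!", "GoodNews")
)
print(ssml)
```

The resulting SSML string is what gets pasted into (or sent as) the text input of the Text to Speech service in place of the plain sentence.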
There are thousands upon thousands of word combinations in literature, and Watson’s Allison voice responds uniquely to each. When one or more of the three available expressive styles are applied, alone or in combination and spaced at different intervals, an expanded range of emotions becomes possible.
Fictional characters differ in how they speak: some are short-winded, others medium- or long-winded. This influences the number and frequency of the breaks and pauses that must be calculated and applied to sentences. The overall mood of a particular scene in a book, such as suspense, tranquility, or jubilance, can also influence how pauses are applied.
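One way to sketch this is with SSML’s standard `break` element, which inserts a pause of a given length between sentences. The mood-to-pause mapping below is my own illustrative assumption, not part of Watson’s API; the sentences are likewise made up for the example.

```python
# Illustrative pause lengths per scene mood (assumed values, in ms).
MOOD_PAUSE_MS = {
    "suspense": 800,     # long, tension-building pauses
    "tranquility": 500,  # relaxed, even pacing
    "jubilance": 200,    # quick, energetic delivery
}

def join_with_pauses(sentences, mood):
    """Join sentences into one SSML document, inserting a
    mood-appropriate <break> between each pair."""
    pause = '<break time="{}ms"/>'.format(MOOD_PAUSE_MS[mood])
    return '<speak version="1.0">{}</speak>'.format(pause.join(sentences))

ssml = join_with_pauses(
    ["The door creaked open.", "No one was there."], "suspense"
)
print(ssml)
```

In practice I tuned these pauses by ear, scene by scene, rather than from a fixed table like this one.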
Below is an example of what can be accomplished using the above technique.
Do you have to be a computer tech or coder to do a project like this? I don’t think so. I don’t, by any stretch of the imagination, come close to falling into either of those categories. What I will say, however, is that it takes patience, practice, and a creative drive. It’s like taking on the role of a story director. The process involves a lot of copying and pasting; the key is learning how and where to paste the markup into the text to get the desired effects.
For those who might be interested in doing a similar project, I’m willing to share my knowledge and expertise to help you achieve the highest quality results, perhaps through a free video.
I would just like to conclude by saying that working with IBM’s Watson has been a wonderful and fascinating experience. If it were possible, I would like to shake his hand.
To get started with using Watson’s Text to Speech API visit our developer guide page. For more information on how to access a copy of my audiobook, please click here.