With the release of the Watson Unity SDK in 2018, Amara Graham (Keller) and I set out to build a chess game that could be completely voice controlled.
In order to tackle this, we had three major tasks ahead of us, each of which had a clear solution.
First, our application needed to be able to understand and process speech. Enter Watson Speech to Text.
Next, we needed a conversational AI that could follow and understand a game of chess. Enter Watson Assistant (formerly Conversation).
Lastly, we would need to be able to respond to players with speech. You guessed it: enter Watson Text to Speech.
Using Watson Speech to Text, we were able to capture most of the audio and transcribe it accurately, even though we were mixing UK and US English accents. Pieces, colors and other elements were captured really well.
The board positions were a bit trickier, as they are not typical elements of language. We don’t say “D4” or “F8” very often in everyday speech. To deal with this, we used a custom language model, which is covered in depth in Amara Graham (Keller)’s article How to Build a Custom Language Model. We expanded this to load an entire JSON model, which we could add to over time as we gathered more detailed training data.
This caused a massive jump in the reliability of the Speech to Text system, showing how valuable it is to train Watson in your preferred domain language.
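To give a feel for what such a JSON model might contain, here is a minimal Python sketch that generates one custom-word entry per board square. It is not the project's actual model: the `words` / `sounds_like` / `display_as` layout follows the shape used by the Speech to Text customization interface as I understand it, and the pronunciations in `FILE_SOUNDS` are assumptions that would need tuning against real recordings.

```python
import json

# Assumed spoken forms for the file letters a-h; adjust for your speakers.
FILE_SOUNDS = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee",
    "e": "ee", "f": "eff", "g": "gee", "h": "aitch",
}

# Spoken forms for the ranks 1-8.
RANK_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four",
    5: "five", 6: "six", 7: "seven", 8: "eight",
}


def build_square_words():
    """Return a dict with one custom-word entry for each of the 64 squares."""
    words = []
    for file_letter, file_sound in FILE_SOUNDS.items():
        for rank, rank_word in RANK_WORDS.items():
            square = f"{file_letter.upper()}{rank}"
            words.append({
                "word": square,
                "sounds_like": [f"{file_sound} {rank_word}"],
                "display_as": square,
            })
    return {"words": words}


# Serialize so the model can be stored, extended and reloaded over time.
model_json = json.dumps(build_square_words(), indent=2)
```

Keeping the model as generated JSON means new entries (piece nicknames, for instance) can be appended as more training data comes in.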
Creating a conversational AI
Using Watson Assistant, we were able to construct a dialog tree that gave our AI the knowledge of how to move through a chess game. Below, you can see an example of an “intent”, i.e. how the AI will act when it believes you want to make a move.
As well as recognising the intent to move, the AI knows that it requires certain information and will attempt to pull it out. It requires a full PieceName (e.g. “Queen’s Rook”) or at least a PieceType (such as “Rook”), in which case we can infer which piece is meant from the board state. It also requires a place to move to. It will continue to prompt you until it has this information.
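The prompting behaviour described here is a form of slot filling. In the project this logic lives inside Watson Assistant's dialog, not in application code, so the following Python is only an illustrative sketch of the idea. `MoveRequest`, `next_prompt` and the prompt wording are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveRequest:
    """Slots the move intent needs (hypothetical structure)."""
    piece_name: Optional[str] = None   # e.g. "Queen's Rook"
    piece_type: Optional[str] = None   # e.g. "Rook"
    destination: Optional[str] = None  # e.g. "D4"


def next_prompt(request: MoveRequest) -> Optional[str]:
    """Return the next question to ask, or None once every slot is filled."""
    # Either a full piece name or a piece type is enough to identify the piece.
    if request.piece_name is None and request.piece_type is None:
        return "Which piece would you like to move?"
    if request.destination is None:
        return "Where would you like to move it?"
    return None
```

For example, `next_prompt(MoveRequest(piece_type="Rook"))` asks for the destination, and the function returns `None` once both the piece and the destination are known.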
Replying back with a voice
Using Watson Text to Speech was a breeze, and the only real issue we hit was with our own programming.
We had slight issues with Watson tripping over its own speech. For example, Speech to Text would hear Text to Speech, and the whole thing would end up talking to itself for a while.
The solution to this was relatively easy: we just kept the mic off while the Text to Speech was playing.
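The fix can be pictured as a small gate sitting between the microphone and the recognizer. This Python sketch is an assumption about how such a gate could look, not the Unity code we wrote. `MicGate` and its method names are hypothetical.

```python
class MicGate:
    """Drops microphone audio while synthesized speech is playing."""

    def __init__(self):
        self.listening = True

    def on_speech_started(self):
        # Call when Text to Speech playback begins.
        self.listening = False

    def on_speech_finished(self):
        # Call when Text to Speech playback ends.
        self.listening = True

    def accept_audio(self, chunk):
        """Return the chunk to forward to Speech to Text, or None to drop it."""
        return chunk if self.listening else None
```

In practice you may also want a short delay before reopening the mic, so that the tail of the playback is not picked up.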
Giving Watson some smarts
The focus of this was to try to make chess more accessible via a voice interface, not to build Deep Blue all over again. While we could have looked at Watson Machine Learning, we didn’t want to reinvent the wheel.
The simple solution was to convert our board to Forsyth-Edwards Notation (FEN) and ask a chess engine for the answer over the Universal Chess Interface (UCI).
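As a rough illustration of that conversion, the Python sketch below builds the piece-placement field of a FEN string and the UCI commands that would be sent to an engine. The board representation (a list of eight rows, rank 8 first, with `None` for empty squares) and the function names are assumptions for this example. Only the FEN layout and the `position fen` / `go movetime` commands come from the standards themselves.

```python
def board_to_fen_placement(board):
    """Build the FEN piece-placement field.

    `board` is a list of 8 rows, rank 8 first. Each cell is a piece letter
    (uppercase for White, lowercase for Black) or None for an empty square.
    """
    rows = []
    for rank in board:
        row, empty = "", 0
        for cell in rank:
            if cell is None:
                empty += 1          # count a run of empty squares
            else:
                if empty:
                    row += str(empty)
                    empty = 0
                row += cell
        if empty:
            row += str(empty)       # flush a trailing run of empty squares
        rows.append(row)
    return "/".join(rows)


def uci_commands(fen, movetime_ms=1000):
    """Return the UCI commands that ask an engine for a move in this position."""
    return [f"position fen {fen}", f"go movetime {movetime_ms}"]


start_board = (
    [list("rnbqkbnr"), list("pppppppp")]
    + [[None] * 8 for _ in range(4)]
    + [list("PPPPPPPP"), list("RNBQKBNR")]
)
# The remaining FEN fields: side to move, castling rights, en passant
# square, halfmove clock and fullmove number.
start_fen = board_to_fen_placement(start_board) + " w KQkq - 0 1"
```

A UCI engine answers the `go` command with a line such as `bestmove e2e4`, which can then be applied to the board and read back to the player.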