With the release of the Watson Unity SDK in 2018, Amara Graham (Keller) and I set out to build a chess game that could be completely voice controlled.
In order to tackle this, we had three major tasks ahead of us, each of which had a clear solution.
First, our application would need to understand speech and be able to process it. Enter Watson Speech to Text.
Next, we would need to create a conversational AI that could follow and understand a game of chess. Enter Watson Assistant (formerly Conversation).
Lastly, we would need to be able to respond to players with speech. You guessed it: enter Watson Text to Speech.
Using Watson Speech to Text, we were able to capture most of the audio and transcribe it accurately, even though we were mixing UK and US English accents. Pieces, colors and other elements were captured really well.
The board positions were a bit trickier, as they are not typical elements of language. We don’t say “D4” or “F8” very often in everyday speech. To deal with this, we used a custom voice model, which is covered in depth in Amara Graham (Keller)’s article How to Build a Custom Language Model. We expanded this to load an entire JSON model, which we could add to over time as we gathered more detailed training data.
This caused a massive jump in the reliability of the Speech to Text system, showing how valuable it is to train Watson in your preferred domain language.
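The JSON model itself is not shown here, so the following is only a rough sketch of what one covering the board squares could look like. It is written in Python for brevity, although the project used the Unity SDK. The field names `word`, `sounds_like` and `display_as` are assumed to follow the custom-words payload of the Speech to Text customization API and should be checked against the current documentation. The pronunciations and the `build_square_words` helper are illustrative, not what we shipped.

```python
import json

# Illustrative pronunciations for files and ranks (assumptions, not tuned values).
FILE_SOUNDS = {
    "A": "ay", "B": "bee", "C": "see", "D": "dee",
    "E": "ee", "F": "eff", "G": "gee", "H": "aitch",
}
RANK_SOUNDS = {
    1: "one", 2: "two", 3: "three", 4: "four",
    5: "five", 6: "six", 7: "seven", 8: "eight",
}


def build_square_words():
    """Build one custom-word entry per board square (hypothetical helper)."""
    words = []
    for file_letter, file_sound in FILE_SOUNDS.items():
        for rank, rank_sound in RANK_SOUNDS.items():
            square = f"{file_letter}{rank}"
            words.append({
                "word": square,                               # token to recognise
                "sounds_like": [f"{file_sound} {rank_sound}"],  # how it is spoken
                "display_as": square,                         # how it is transcribed
            })
    return {"words": words}


model = build_square_words()
# Serialise so the model can be stored in a file and extended over time.
model_json = json.dumps(model, indent=2)
```

Generating the entries rather than writing them by hand keeps all 64 squares consistent, and extra `sounds_like` variants can be appended as new training data shows how players actually say them.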
Creating a conversational AI
Using Watson Assistant, we were able to construct a dialog tree that gave our AI the knowledge of how to move through a chess game. Below, you can see an example of an “intent”, i.e. how the AI will act when it believes you want to make a move.
You can see that, as well as recognising the intent to move, it knows that it requires certain information, and will attempt to pull it out. It requires a full PieceName (e.g. “Queen’s Rook”) or at least a PieceType (such as “Rook”), in which case we can infer which piece is meant from the board state. It also requires a place to move to. It will continue to prompt you until it has this information.
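The prompting behaviour described above can be sketched in a few lines. This is a Python illustration of the logic only, not the Watson Assistant configuration: `PieceName` and `PieceType` come from the dialog described in this article, while the `Destination` slot name, the prompt wording and the `next_prompt` function are assumptions made for the example.

```python
from typing import Optional


def next_prompt(slots: dict) -> Optional[str]:
    """Return the next question to ask, or None once the move is fully specified.

    `slots` maps slot names to the values extracted so far.
    """
    # A full piece name or just a piece type is enough; the board state
    # can be used later to work out which piece was meant.
    has_piece = bool(slots.get("PieceName") or slots.get("PieceType"))
    if not has_piece:
        return "Which piece would you like to move?"
    # "Destination" is a hypothetical slot name for the target square.
    if not slots.get("Destination"):
        return "Where would you like to move it?"
    return None


# Example: the player said "move my rook" without naming a square.
prompt = next_prompt({"PieceType": "Rook"})
```

Calling the function again after each reply mirrors how the dialog keeps asking until nothing is missing.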
Replying back with a voice
Using Watson Text to Speech was a breeze, and the only real issue we hit was with our own programming.
We had slight issues with Watson tripping over its own speech. For example, Speech to Text would hear Text to Speech and the whole thing would end up talking to itself for a while.
The solution to this was relatively easy: we just kept the mic off while the Text to Speech was playing.
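As a minimal sketch of that idea, assuming the audio pipeline exposes callbacks for when playback starts and stops, a small gate can decide whether microphone audio is forwarded to Speech to Text. The `MicGate` class and its method names are hypothetical, and the example is in Python rather than the C# used in Unity.

```python
class MicGate:
    """Drops microphone audio while synthesized speech is playing."""

    def __init__(self) -> None:
        self.speaking = False

    def on_speech_started(self) -> None:
        # Call this when Text to Speech playback begins.
        self.speaking = True

    def on_speech_finished(self) -> None:
        # Call this when Text to Speech playback ends.
        self.speaking = False

    def accept_audio(self, chunk: bytes) -> bool:
        # Only forward audio to Speech to Text when we are not talking.
        return not self.speaking


gate = MicGate()
gate.on_speech_started()
blocked = gate.accept_audio(b"\x00\x01")   # audio captured during playback
gate.on_speech_finished()
allowed = gate.accept_audio(b"\x00\x01")   # audio captured afterwards
```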
Giving Watson some smarts
The focus of this was to try to make chess more accessible via a voice interface, not to build Deep Blue all over again. While we could have looked at Watson Machine Learning, we didn’t want to reinvent the wheel.
The simple solution was to convert our board to Forsyth-Edwards Notation (FEN), and ask a chess engine for a move over the Universal Chess Interface (UCI).
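To make that concrete, here is a small Python sketch of the two steps. The board representation (eight strings from rank 8 down to rank 1, with `.` for an empty square) and the function names are assumptions for the example. The FEN layout and the `position fen` and `go movetime` commands are standard FEN and UCI.

```python
from typing import List


def placement_to_fen(board: List[str]) -> str:
    """Encode the piece placement field of a FEN string.

    `board` holds 8 rows from rank 8 down to rank 1. Uppercase letters are
    white pieces, lowercase are black, and '.' is an empty square.
    """
    ranks = []
    for row in board:
        fen_row = ""
        empty = 0
        for cell in row:
            if cell == ".":
                empty += 1          # runs of empty squares become a digit
            else:
                if empty:
                    fen_row += str(empty)
                    empty = 0
                fen_row += cell
        if empty:
            fen_row += str(empty)
        ranks.append(fen_row)
    return "/".join(ranks)


def uci_commands(fen: str, movetime_ms: int = 1000) -> List[str]:
    """Commands to send to a UCI engine to request a move for this position."""
    return [f"position fen {fen}", f"go movetime {movetime_ms}"]


START = [
    "rnbqkbnr", "pppppppp", "........", "........",
    "........", "........", "PPPPPPPP", "RNBQKBNR",
]
# Remaining FEN fields: side to move, castling rights, en passant square,
# halfmove clock and fullmove number.
fen = placement_to_fen(START) + " w KQkq - 0 1"
commands = uci_commands(fen)
```

An engine that speaks UCI answers the `go` command with a line such as `bestmove e2e4`, which can then be turned back into a move on the board and read out with Text to Speech.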