Hand-made telepresence robot helps improve classroom communication
“Today’s lecture is about cognitive computing, and its impact on… blah, blah, blah…”
The lecturer begins to drone on. And, worse, the students and lecturer are not physically in the same classroom. But such is life in today’s era of remote, video conference-style higher education options.
From left: IBM researcher Akihiro Kosugi, Mocoro, IBM intern Shogo Nishiguchi at IBM Research-Tokyo
Without a way for the lecturer to monitor and respond to reactions, it is difficult to determine what information resonated with (or bored) the students. Oftentimes, students are in a lecture hall, staring at their instructor projected on a wide screen. And the lecturer has little control over zooming in or out, or changing what he or she sees of the students. The same situation applies to students who join the class from a remote location.
To solve this digital distance challenge, IBM researcher Akihiro Kosugi came up with the idea of a cognitive user interface, to act as an observation window, built into the computers connecting students and lecturer. The interface allows the lecturer to move a camera around the classroom to better interact with the students.
Then, Akihiro and IBM Research intern Shogo Nishiguchi took this idea of an autonomous virtual assistant with fluid interaction between student and lecturer another step forward, and built a telepresence agent that could attend courses with students and lecturer. The bot, called Mocoro, is designed to convey confusion, boredom, or other reactions through simple facial expressions and body language. To help the lecture flow smoothly, Mocoro does not interrupt. Instead, the lecturer must notice Mocoro’s expressions, such as turning pale or looking down, and ask it: “Is anything wrong?” Then, Mocoro might respond with something like: “I am sorry, but I could not understand what you just said. Would you repeat it for me?” This way, the interaction feels like a normal classroom discussion.
Shogo initially called the bot a “moderate conversational robot.” But to give it a name students and lecturers could relate to, he picked the first two letters of each word to create Mo-co-ro. Mocoro is a virtual avatar displayed on a small monitor that was then built into a hand-made wooden block – perfect for a table top.
The reason I named the agent Mocoro is simply because it sounds cute. Cuteness is one of the important factors for robots to be loved by everyone. – Shogo Nishiguchi
As a stand-alone bot, Mocoro combines speech-to-text, text-to-speech, morphological analysis and dialog scripting to understand what people are saying.
What motivated you to work on this telepresence agent research?
Shogo Nishiguchi (SN): The accessibility research team at IBM Research – Tokyo has been collaborating with The University of Tokyo on the Senior Cloud project, which includes a remote lecture project to create opportunities for the elderly to share their knowledge and expertise, from their home, with the students in a classroom. I wanted to build an intuitive interface to improve remote communication that went beyond only communicating through a display. There should be a better interface for people in remote locations to convey non-verbal information. That motivated me to create Mocoro.
Why make Mocoro an on-table robot instead of something integrated into another device, like a wearable?
Akihiro Kosugi (AK): Virtual reality devices, which can bring a more engaging experience, are emerging, but they still require people to wear something. We cannot ask senior lecturers – the main subjects of our research work – to wear a VR device during an entire class. Can you imagine wearing such devices for an hour or more?
We wanted to put the presence of the agent, Mocoro, in a physical form so that it can benefit users, without needing to wear additional devices, and still be of practical use.
How does Mocoro know when to provide visual cues?
SN: Mocoro captures the content of conversation using the Watson Speech-to-Text API available on IBM Bluemix. It counts the number of relevant keywords such as nouns or fillers in a certain period. This way, it can estimate the speed or smoothness of the speech. And if that exceeds the threshold – when someone speaks too quickly, for example – it looks down gradually, giving the lecturer a sad look.
We specifically made Mocoro very polite. It does not interrupt conversations between people, and it will not speak unless someone asks it a question. For example, Mocoro may turn pale and give a sad look, but a student or the lecturer must recognize the change and ask it a question before it will reply.
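The pacing cue described in this answer can be sketched in a few lines of Python. This is only an illustration, not Mocoro's actual implementation: the class name, window length, threshold and tilt values are all assumptions. It expects words that have already been transcribed and flagged as keywords, so the Watson Speech-to-Text call and the keyword detection itself are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SpeechPaceMonitor:
    """Hypothetical sketch: counts keywords in a sliding time window
    and lowers the robot's head gradually when speech gets too dense."""

    window_seconds: float = 10.0    # assumed length of the "certain period"
    threshold: int = 8              # assumed keyword count that triggers the cue
    step_degrees: float = 5.0       # assumed change in head tilt per update
    max_tilt_degrees: float = 30.0  # assumed maximum downward tilt
    tilt_degrees: float = 0.0       # 0 means looking straight ahead
    _timestamps: deque = field(default_factory=deque)

    def observe(self, timestamp: float, is_keyword: bool) -> float:
        """Record one transcribed word and return the updated head tilt."""
        if is_keyword:
            self._timestamps.append(timestamp)

        # Forget keywords that have fallen out of the time window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()

        if len(self._timestamps) > self.threshold:
            # Speech is too fast or dense: look down a little more.
            self.tilt_degrees = min(self.max_tilt_degrees,
                                    self.tilt_degrees + self.step_degrees)
        else:
            # Pace is comfortable again: raise the head gradually.
            self.tilt_degrees = max(0.0, self.tilt_degrees - self.step_degrees)
        return self.tilt_degrees


# Example: four keywords in quick succession with a threshold of three.
monitor = SpeechPaceMonitor(window_seconds=10.0, threshold=3)
for t in [0.0, 1.0, 2.0, 3.0]:
    tilt = monitor.observe(t, is_keyword=True)
print(tilt)  # 5.0
```

Changing the tilt in small steps, rather than jumping to a fixed pose, mirrors the "looks down gradually" behavior described above. Speaking is deliberately not part of this sketch, since Mocoro only replies when asked.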
How do students and lecturers interact with Mocoro?
SN: We’re starting an interactive experiment soon. I hope Mocoro can be a natural part of the interaction between lecturer and students. By placing Mocoro somewhere within their line of sight, while not disturbing the lecture, Mocoro can give a sense of comfort to the lecturer, who feels as if students are in the same room.
For example, I hope that through Mocoro, the lecturer may be able to naturally sense that the students may not understand a part of the lecture. The Mocoro in the classroom will demonstrate how confused the students might be, in a recognizable way. And I hope that Mocoro helps each student feel that he or she is not the only one who may not understand the lecture, and to feel comfortable asking questions.
What does Mocoro’s camera do?
SN: Mocoro is a fully autonomous robot which can behave and talk without human control. Or, it can be manually tele-operated. When Mocoro acts as an autonomous robot, the camera on its head detects what people are looking at, and where, and Mocoro moves its face to them. When Mocoro is used manually, the camera will record the remote location and send the video image in near real time to the lecturer’s screen.
You mentioned Watson Speech-to-Text; what other technologies does Mocoro use?
SN: Other than Akihiro’s carpentry skills, I used a morphological analyzer to catch conversation topic transitions and display them in a tag cloud. In addition, I used the IBM Watson Text-to-Speech API, WebRTC (Web Real-Time Communication), and a dialog script that I created to enable Mocoro’s conversational function.
This research is partially funded by the Japan Science and Technology Agency under the Strategic Promotion of Innovation Research and Development Program. What do you hope to achieve by the end of the grant?
AK: This work is part of the Senior Cloud project with The University of Tokyo. Its objective is to improve senior citizens’ active participation in society, including helping them share their experience, knowledge and skill through courses or other media.
Among a series of experimental research efforts, I have been involved in the telework and telepresence research effort to help lower the barrier, and improve the experience of remote communication among people with diverse backgrounds. We aim to put all of the project’s research accomplishments into a platform that helps everyone collaborate effectively regardless of time, location or physical challenges.
What is next for this work, and for Mocoro?
AK: I would like to bring remote, real-time communication experiences to more people, with capabilities such as cognitive ability assistance and efficient embodiment. That’s my dream as I continue to research human-computer interaction.