CEO of AI avatar start-up Soul Machines explains the power of pairing biologically-based models with AI

AI engineer Mark Sagar is best known for his ability to create life-like animations, winning Oscars for his technical efforts in Avatar and King Kong. Sagar first revealed his incredibly realistic and interactive BabyX avatar in 2014 and continues to refine the virtual baby by combining his vast knowledge of computer graphics, human physiology and artificial intelligence. Most recently, Sagar founded Soul Machines, a company dedicated to bringing his work on emotionally responsive avatars to businesses.

What compelled you to work on BabyX and teaching systems to learn?

I wanted to explore the essence of animation: Is it possible to create a digital creature which can appear life-like, emote and learn through experience in the way we do?

Our experience of the world depends on our actions, perceptions, emotions and our memories. The motivation for BabyX was to create a holistic biologically-inspired model of the driving forces, architectures and processes from which the behavior we so often take for granted emerges, as a means of exploring theories of our nature and to show how the various underlying systems interconnect.

What advances in your project excite you most?

We are currently working on the next version of BabyX, which is much more detailed and has a full body. It even has digital lungs, since breathing is an important component of simulating vocalizations. We want BabyX to be able to play with virtual objects and draw on the screen to creatively interact with the user. I am very excited about what the creative possibilities may be between humans and machines.

Your work has often stood at the intersection of creativity and science. Do you think machines can someday be taught to be innately creative?

Yes, I do. We want to start exploring what the motivating systems may be with the version of BabyX under development. I think creative play is key to exploration and discovery, so making an AI machine play and be naturally curious is key. Beauty is a really interesting question, but I think it comes down, at least partially, to harmony and clarity and resonance in stimuli which have or suggest biological value. For example, we seem to be innately rewarded to seek harmony in nature (it resonates with our evolutionarily refined pleasure circuits), so we seek beauty. So does this mean we need to bootstrap this at the core of our AI? Or is it something that emerges? But what a wonderful question to explore.


Looking ahead three to five years, what do you think interaction between humans and machines will look like?

Interacting with machines will make increasing use of our natural faculties. I think the way we interact with machines is dependent on the task. For example, it may be easier to type in a sum on a calculator than to dictate it, or to point to a location on a map rather than enter coordinates.

I think we will see increasing use of virtual assistants as a general interface and when dealing with more complex or unclear situations requiring dialogue and feedback.

We will see computers able to have reasonably natural chats and conversations in certain domains. I do think voices will include a degree of emotional tone, etc., in the response. However, these conversational agents will mainly use large statistical models based on conversational data rather than the machine having grounded understanding, so it will still be more pattern matching than understanding. Some generalizations across different topics may be possible with the machine recognizing larger patterns.

As for machines having proper grounded understanding, or context, where the machine has experience of the world that it can relate to directly or build metaphors from, we will see progress in this area in three to five years, but still at a basic level.

If you had to choose one area in AI outside of your current focus that you think is poised to have the biggest impact on our lives within the next few years, what would it be?

We have had an exponential rise in the amount of video posted online through social media, etc. The increased use of video analysis in conjunction with contextual analysis will end up being an extremely important learning resource for recognizing all kinds of aspects of behavior and situations. This will have wide-ranging social impact, from security to training to more general knowledge for machines.