Until now, conveying emotion effectively has been limited to human voices. To meet demand for compelling spoken content, DeepZen wanted to offer an alternative to costly, time-consuming studio recording.
DeepZen developed deep learning neural networks that recognize emotion in text and produce human-like speech, supported by IBM Watson Machine Learning Accelerator solutions that combine accelerated computing with AI software.
4 months saved on expected time-to-market, helping DeepZen capture first-mover advantage
Brings emotive speech to new markets thanks to lower cost and time overheads
Rapid growth for DeepZen enabled by IBM support in reaching potential clients
Business challenge story
Capturing the magic of the spoken word
Humans are unique in their ability to communicate emotion through speech. From television advertising to audiobooks, actors inspire empathy and seize people’s attention by instilling their words with feeling.
However, recording voices is a laborious and expensive process. Studio space and talented actors are in short supply. Producing a typical audiobook costs thousands of dollars and takes weeks. This has created an enormous unmet demand for spoken content among the large audience of people who are visually impaired, dyslexic or simply enjoy listening to recorded speech. In addition, more companies are using spoken content to strike a chord with customers on digital channels, further raising demand.
DeepZen was established to fill this gap in the market. The company set out to combine text-to-speech and natural language processing techniques to provide a cost- and time-effective alternative to studio recording.
Taylan Kamis, CEO and Co-founder of DeepZen, explains: “Our aim isn’t to put voice actors out of jobs, but rather to solve the capacity issues in the current market. We identify emotion in text automatically and use voice samples – for which we pay royalties to voice actors – combined with speech synthesis technology to produce convincing voice audio.
“To do this, we needed to create large and complex neural networks. These require extensive amounts of processing power to produce accurate results fast, so we needed the right technology platform to bring our vision to life.”
Unleashing AI innovation
DeepZen selected the IBM Watson Machine Learning Accelerator platform to support its solutions for AI production of audiobooks and voiceovers. The solution combines IBM Power® Systems AC922 servers featuring the latest IBM POWER9™ processors, popular deep learning frameworks and AI tools for efficient development, offering the company a comprehensive environment for AI innovation. Power Systems AC922 servers pair POWER9 CPUs with NVIDIA Tesla V100 GPUs connected over NVLink, together with NVIDIA's software stack and AI platform, to provide massive throughput capability for high performance computing, deep learning and artificial intelligence workloads.
Kerem Sozugecer, CTO at DeepZen, recalls: “We discovered the technology at an IBM PowerAI meetup held in London. At the event, we spoke to engineers from IBM and NVIDIA, who explained the unique capabilities of the platform and how it’s designed to accelerate creation of AI solutions.”
DeepZen joined the Startup with IBM program, then called the IBM Global Entrepreneur Program, which gave it access to technology credits that it used to try out resources in the IBM Cloud™. Soon, the company was ready to purchase the IBM Watson Machine Learning Accelerator platform, and engaged IBM Platinum Business Partner Meridian IT to host the solution in its data center.
Featuring Large Model Support, the IBM solution facilitates ultra-high performance even for the largest and most complex machine-learning tasks. DeepZen can switch between multiple deep learning frameworks, all optimized for use on the IBM POWER Architecture® and NVIDIA AI platform.
“IBM Watson Machine Learning Accelerator can handle very large models, giving us the freedom to experiment,” comments Sozugecer. “We can also switch between TensorFlow, Caffe and PyTorch frameworks or use them in parallel. Models optimized for all of these frameworks are readily available on NGC, NVIDIA's software hub, allowing us to instantly begin development at the highest throughput. This level of flexibility and performance is a real enabler for innovation.”
Beating competitors to the punch
With AI development tools and enormous processing power at its fingertips, DeepZen has raced ahead of its original development schedules. As a result, the company is grasping first-mover advantage in a market that holds vast potential.
“IBM Watson Machine Learning Accelerator accelerated our model training times significantly,” says Kamis. “Four months ahead of target, and we’re ready to go into production, allowing us to start capitalizing on the huge opportunity in front of us.”
By enabling production of audiobooks and voiceovers at a fraction of the cost and lead time of traditional methods, DeepZen will bring compelling spoken content within reach of new customers. Users can generate voices directly from text and edit them using the company’s tools, giving them unprecedented flexibility.
“We’re disrupting the audiobook and voiceover industry – without replacing the role of voice actors,” comments Kamis. “Supported by IBM technology, we can enable users to refine voice audio until it’s just right – or create multiple versions in different accents or languages, for example.”
Working with IBM, DeepZen is primed for global expansion. The company can easily scale up its IBM Watson Machine Learning Accelerator solutions, and is drawing on IBM resources to launch its revolutionary offering in the market.
Kamis concludes: “We’ve developed our solutions to integrate seamlessly with IBM Watson natural language processing, which opens the door to existing Watson clients. Even before we shared a single line of code with the IBM team, we could tell that they were as excited by our concept as we were, and the level of engagement with us hasn’t dropped since. Together, we’re creating human-like speech that’s convincing, inexpensive and rapid to produce.”
Deep Zen Ltd. (DeepZen) is creating sector-specific solutions that produce human-like quality of speech imbued with emotion. The company aims to help clients avoid the cost and time of recording sessions in studios, and to give them the flexibility to edit speech using intuitive tools. DeepZen is headquartered in London, UK.
Take the next step
To learn more about accelerated computing from IBM, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/it-infrastructure/power/accelerated-computing