More than just speech-to-text: Trint cracks open audio content for search

Trint CEO and co-founder Jeff Kofman worked with a team of developers to build the cloud-based transcription platform because, as he says, “I’ve lived the problem.” As a 30-year broadcast journalist and war correspondent (ABC, CBS, CBC), Kofman estimates that he’s manually transcribed thousands of hours of interviews. “I understood that there are a lot of people like me who would love to have transcription relief,” he says.

Using proprietary technology, Trint (transcribe + interview) goes beyond transcribing recorded speech to change how consumers use and deploy media. With money from the Knight Enterprise Fund and the Google Digital News Initiative and support from Cisco and BBC Worldwide Labs, Trint began beta testing with Thomson Reuters, NPR, and The Washington Post. The service launched last year after acquiring a waitlist of thousands.


How did you realize there was a market for automating speech-to-text?
The world is about recorded talk. Cisco’s latest assessment says that more than 80 percent of the internet is now audio and video; that means spoken word. We need a way to get at it without someone having to manually transcribe it. There’s a real need to make it automatically searchable. This started from my work as a journalist, but it became clear that almost anyone in the world who deals with communication of any sort — academics, lawyers, pretty much all of us — has a need for something like this.

Is automated speech-to-text technology the last mile for media?
It’s the second-last mile. Speech-to-text has been around for more than a generation. The fundamental science has not changed massively in recent years. But now computers have the speed and the ability to store vast datasets and to access them incredibly rapidly. And in the last ten years we have seen this staggering advance of speech-to-text from being this clunky futuristic thing that didn’t really deliver what it promised, to today where you have Siri and Alexa.

Throughout the arc of my career I have watched every aspect of media technology transform. I started in local TV in Toronto, where we used manual typewriters and teletypes that clattered away typing out on multi-ply carbon paper. I started shooting stories on three-quarter-inch videotape. It required two people to carry the equipment because it was so big. Almost everything about media has been transformed in the last thirty years: we can now condense video and send it over the internet; we can write on laptop computers. In the whole process we’ve gotten to a place we could never have imagined — except this one part of the workflow, how we extract the content of recorded talk. It has moved from cassette recorders to digital recorders, but you still have to hit play, stop, and type.

What is Trint’s fundamental innovation?
Trint created a platform where the heavy lifting is done by automation. Our editor marries the text to the audio automatically and that’s really the magic. It transcribes speech to text in minutes but also allows users to search, verify, correct, edit and export content. We don’t claim that Trint is perfect, but you can search it and find the moment you’re looking for — you can listen to it and verify, and if the automated speech-to-text makes a mistake you can correct it.

So, if you do an hour-long interview and you want to find a reference, you don’t need to scroll through and say, “Oh I think it was about 40 minutes in, no go back, go forward.” If you had a podcast and you wanted to find something, the only way to do that was to listen to it until you heard it. With Trint you can post a searchable audio document. And you can share the original with anyone you want.
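To make the idea of a searchable audio document concrete, here is a minimal sketch in Python. It is not Trint's implementation, which is proprietary; it only assumes that a transcript can be stored as a list of words, each tagged with the time at which it was spoken. The names `Word`, `find_phrase`, and `as_timestamp` and the sample data are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word tied to its position in the audio (hypothetical format)."""
    text: str     # the word as transcribed
    start: float  # seconds from the start of the recording


def find_phrase(words, phrase):
    """Return the start time in seconds of every occurrence of `phrase`."""
    target = [w.lower() for w in phrase.split()]
    if not target:
        return []
    # Compare case-insensitively and ignore punctuation attached to a word.
    normalized = [w.text.lower().strip(".,?!\"'") for w in words]
    hits = []
    for i in range(len(normalized) - len(target) + 1):
        if normalized[i:i + len(target)] == target:
            hits.append(words[i].start)
    return hits


def as_timestamp(seconds):
    """Format a time in seconds as MM:SS for display."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Invented sample data: a short stretch of an interview, about 40 minutes in.
transcript = [
    Word("We", 2400.0), Word("need", 2400.4), Word("transcription", 2400.9),
    Word("relief.", 2401.8), Word("Transcription", 2410.0),
    Word("relief", 2410.7), Word("matters.", 2411.3),
]

for t in find_phrase(transcript, "transcription relief"):
    print(as_timestamp(t))
```

Because each word carries its own timestamp, a text search returns a playback position rather than just a match, which is what lets a reader jump to the moment in the recording and verify it by ear.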

And yet, even automated, the transcription still has a rate of error. Is it possible to have perfect transcription?
The rate of accuracy depends on the quality of your audio. If you record this interview in a Starbucks with music on and a lot of chatter around and the cappuccino machine screaming, it won’t work. That’s our biggest challenge, explaining to people that we’re only as good as what you give us.

One of the things that we’re trying to do is develop a mobile recording app that will help people identify whether their audio is good enough or not. I don’t believe we’ll ever be able to transcribe really bad audio. If the audio doesn’t have enough clear content it’s just not going to come through.

You see this technology as transforming how we consume media such as podcasts or video. How?
We’re helping content producers work more efficiently and breaking a bottleneck in their workflow, but we’re ultimately opening up content to the world, to make it searchable in a way that’s not possible now. This is not simply transcription-to-print but an end-to-end publishing platform. That means you can take an interview like this, and make content that you can publish on social media or your website in seconds. That’s the last mile.

What then is the next frontier?
We have a series of innovations in our pipeline. The first one is instant captioning for audio and video. Right now if you want to caption video you need to transcribe it, put it into a file and into an edit program. Then a highly skilled editor has to assemble it. Our feature for captioning a video is so easy a 10-year-old or an 80-year-old could do it. It’s just one of a whole series of about twenty innovations on our roadmap. We expect to release at least half of them by the end of 2017. Trint is a word we invented. People ask me where I see the company in five years. My answer? In the dictionary.
