Extracting Text from Media to Facilitate Discovery

Originally published on the Clarify.io blog. View archived copy.

In my last post I talked about how faster and better discovery has led to faster invention. That’s an exciting idea, but today I’d like to consider the benefit of extracting text from media and throwing it into the realm of discoverability.

Sound recordings can contain words, music, or noises such as barking dogs or gunshots. Most of those can be represented as text:

“When we kiss my heart’s on fire,

Burning with strange desire.

And I know, each time I kiss you

That your heart’s on fire too.”

[chirping birds]

Those lyrics certainly don’t compare to the way Elvis sings Surrender, and “[chirping birds]” does a pretty inadequate job of representing these 11 hours of birdsongs, but they’re useful nonetheless.

The amount of audio and video being recorded is now doubling every year, and that rate is increasing. This means that more and more knowledge is being captured in media files. If we don’t turn those files into text, that knowledge remains largely inaccessible.

Here are two very concrete examples:

A few weeks ago I was showing the communications director of a large bank how easy it is to find things that people say in videos. I found all the mentions of “consciousness” in a library of TED videos in a few seconds. When I turned to ask him what he thought, his face was a mixture of disbelief and anger. Instead of a banal “Wow, that’s really cool”, I heard: “I spent five days of my life last month finding every mention of ‘bitcoin’ by our Chairman in the past year. I had to watch every single recorded video in which my team knew he appeared. And I’m sure we missed quite a few.”

He did the math and came to the conclusion that indexing all of their video content with Clarify’s technology would more than pay for itself the next time they had a project like that one. Just one.

One of my nephews is dyslexic. The effort he has to put into reading leaves very little room to concentrate on the thread of information extracted from the text. This is catastrophic, although less catastrophic than it used to be. He does most of his reading by listening to audio books. But he still faces a disadvantage that his classmates don’t even consider.

When he studies for tests, he has no easy way of searching for passages he wants to review. He has no way of skimming his books, using titles and text callouts as clues to find the content he’s interested in.

Luckily a few companies in the educational space are working with Clarify’s technology to address this very problem. Mobento is continuing to make inroads into secondary schools and universities because they make content searchable. If they combine what they’ve already done with external markers that allow the equivalent of skimming, they’ll make it possible for my nephew to absorb, discover, and review his school “books” just as effectively as his classmates.

In both of these situations, being able to treat media as text in order to make it both discoverable and randomly accessible is critical. Although we should consider the inadequacy of the textual representation, we shouldn’t dwell on it. This is an important step in our journey to make the world’s knowledge accessible to everyone.
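
To make “discoverable and randomly accessible” a little more concrete, here is a minimal sketch of the idea, not Clarify’s actual API or implementation. It assumes each media file already has a transcript broken into timestamped segments; the file names, the sample data, and the `build_index` and `find_mentions` helpers are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical transcripts: each media file maps to (start_seconds, text) segments.
TRANSCRIPTS = {
    "talk_a.mp4": [
        (0.0, "Today we explore consciousness"),
        (42.5, "Bitcoin is not a bank"),
    ],
    "talk_b.mp4": [
        (12.0, "What is consciousness, really?"),
    ],
}


def build_index(transcripts):
    """Map each lowercased word to the (file, start time) pairs where it is spoken."""
    index = defaultdict(list)
    for media_id, segments in transcripts.items():
        for start, text in segments:
            # Normalize words and use a set so a word repeated within one
            # segment is only recorded once for that segment.
            words = {w.strip(".,?!\"'") for w in text.lower().split()}
            for word in words:
                if word:
                    index[word].append((media_id, start))
    return index


def find_mentions(index, term):
    """Return every (file, start time) where the term appears, in sorted order."""
    return sorted(index.get(term.lower(), []))


index = build_index(TRANSCRIPTS)
print(find_mentions(index, "consciousness"))
```

Each result pairs a file with a time offset, so a player could jump straight to the moment the word is spoken instead of making someone watch the whole recording. A real system would also need speech recognition to produce the transcripts in the first place, which this sketch takes as given.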