Discovery and Descriptions Beyond Text

Originally published on the Clarify.io blog. View archived copy.

A few weeks ago I talked about how important it was to be able to make the content of audio and video files discoverable by turning them into searchable text.

Today I’d like to take a peek into the future, and consider what we might be able to do if we can extract some of the other dimensions of this media.

Not all data can be represented as text. Images, videos, and audio recordings contain lots of data that is, at best, poorly represented by words. Elvis singing Surrender and birds chirping are two audio examples.

Let’s consider the picture to the right:

Eurasian Hoopoe

Creative Commons 2.0 (by-nc-sa)

“This is a bird.”

“This is a hoopoe.”

“This is a hoopoe sitting on a branch.”

“This is a hoopoe sitting on a branch, bathed in late afternoon sunlight, facing the viewer and its head turned to its right.”

No words can adequately describe that image.

We know that reading the lyrics to Surrender won’t come close to conveying the emotions in Elvis’ voice, but we don’t need the King of Rock-and-Roll to teach us that neither intent nor context survives when speech is reduced to text.

In their book Weekend Language, Andy Craig and Dave Yewman provide a great thought experiment for understanding the inadequacy of a textual representation of a recorded sound. They ask us to consider two different recordings of the words “Good morning.”

Picture how “an annoyingly perky colleague who loves early morning meetings” might have said them, as opposed to “a red-faced boss who’s angry that you’re late for the meeting with clients.” The two recordings are completely different, and neither is adequately represented by the two words spoken.

So how do we extract emotion, intent, and context from recorded sound? How do we extract the thousands of words that can be used to describe a simple image? More importantly, how do we make that data discoverable?

I don’t know yet. This is something we spend a lot of time thinking about at Clarify. In the past 20 years our industry has done a great job of making textual data discoverable. It’s now time to think beyond text and make non-verbal discovery possible.