
Searching Audio and Video Metadata with Clarify

Originally published on the Clarify.io blog. View archived copy.

Searching for words in audio and video content is great, but often you also want to search and filter by metadata such as title, recording date, or other attributes. The Clarify API lets you associate arbitrary metadata with each audio or video bundle and then search and filter on it.

Suppose you have a video library that you want to index. You use the Clarify API to create a bundle for each video and set the media_url to point to the video file. All the spoken words will then be searchable through the API. However, you probably also have other data for each video, such as title, description, date, author, and tags. You can use the bundle metadata to make that data searchable too. You could do the same with phone calls logged in a CRM.

How to Use It

Clarify metadata is schema-less, which means you can start using it right away; there is nothing to configure. For each of your Clarify applications, the type of a field is determined the first time its field name is used. The field types are based on the core JSON data types (string, number, and boolean), along with a couple of extras: arrays of strings, and date/time as a string in ISO 8601 format, e.g. "2014-03-25T14:23:45.000Z".

Metadata is represented as a single-level JSON object containing zero or more fields, each assigned a value of one of the types listed above. For example:

{
  "title": "Searching Audio",
  "description": "This video shows how easy it is to search on spoken words.",
  "author": "John Doe",
  "date": "2014-03-25T14:23:45.000Z",
  "rating": 8,
  "public": true,
  "tags": [ "technology", "programming", "search" ]
}
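
As a rough illustration of the rules above, the following Python sketch classifies each value in a metadata object and rejects anything that is not one of the listed types (which also rules out nested objects). The type labels ("string_array", "datetime") and the function names are made up for this example, and the check is only a local approximation, not how the Clarify service itself validates metadata.

```python
import json
from datetime import datetime


def _is_iso_datetime(text):
    """Return True if text matches the ISO 8601 layout used in this post."""
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
        return True
    except ValueError:
        return False


def field_type(value):
    """Classify a metadata value (labels here are illustrative, not official)."""
    # bool must be tested before number because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "datetime" if _is_iso_datetime(value) else "string"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return "string_array"
    # Nested objects, mixed arrays, and null end up here.
    raise ValueError(f"unsupported metadata value: {value!r}")


metadata = {
    "title": "Searching Audio",
    "description": "This video shows how easy it is to search on spoken words.",
    "author": "John Doe",
    "date": "2014-03-25T14:23:45.000Z",
    "rating": 8,
    "public": True,
    "tags": ["technology", "programming", "search"],
}

# Raises ValueError if any field is not single-level or has an unknown type.
types = {name: field_type(value) for name, value in metadata.items()}

# The JSON text that would be sent as the bundle metadata.
payload = json.dumps(metadata)
```

Because a field's type is fixed the first time its name is used, a local check like this can help catch an accidental type change (say, sending "rating" as a string) before the metadata is uploaded.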

Search and Filter

You can search and/or filter on metadata. What is the difference between searching and filtering?

Search performs text searching of bundles. A search query is typically a simple space-separated string of words, usually typed in by an end user (think of a search engine like Google). Search queries are evaluated by first processing the words (for example, folding plurals or taking the stem form of a verb), then finding all bundles containing the words, locating where each word was found in the bundle, and scoring the results. You can control which metadata fields are searched with the API query_fields parameter.

Filters, on the other hand, can contain more complex query expressions and provide a precise way of finding, including, and excluding bundles. A filter query can contain a series of comparisons that together evaluate to a boolean result. Filters are typically computer-generated, often based on input from a user. Each bundle either passes the filter test and is allowed in the result set, or it fails and is omitted. A filter expression could be generated from user-interface controls such as date choosers and select popups, and could also include other comparisons based on things such as the authenticated user id or special flags.
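
To make the distinction concrete, here is a toy, in-memory Python sketch. It is not how Clarify implements either operation (there is no stemming, no audio, and the scoring is just a word count), and the function names and sample bundles are invented for illustration. The point is only that search produces scored matches, while a filter is a yes/no test on each bundle.

```python
def search(bundles, query, query_fields):
    """Return (bundle, score) pairs, best first; score is a simple word-hit count."""
    words = query.lower().split()
    results = []
    for bundle in bundles:
        # Only the fields named in query_fields are searched.
        text = " ".join(str(bundle.get(field, "")) for field in query_fields)
        tokens = text.lower().split()
        score = sum(tokens.count(word) for word in words)
        if score > 0:
            results.append((bundle, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def apply_filter(bundles, predicate):
    """Keep only the bundles for which the predicate is true."""
    return [bundle for bundle in bundles if predicate(bundle)]


# Illustrative data only.
bundles = [
    {"owner": "alice", "title": "Quarterly search review", "date": "2014-05-15T14:23:45.000Z"},
    {"owner": "bob", "title": "Search search search", "date": "2014-04-02T09:00:00.000Z"},
    {"owner": "alice", "title": "Budget planning", "date": "2014-05-20T10:00:00.000Z"},
]

# Search: ranked by relevance, every matching bundle is returned.
hits = search(bundles, "search", ["title"])

# Filter: each bundle simply passes or fails. ISO 8601 timestamps in the
# same format compare correctly as plain strings.
recent_alice = apply_filter(
    bundles,
    lambda b: b["owner"] == "alice" and b["date"] >= "2014-05-01T00:00:00.000Z",
)
```

In this sketch the search returns two bundles with different scores, while the filter returns the two bundles that satisfy both comparisons, with no notion of ranking.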

Some Examples

In the video library example described above, the metadata could look something like this:

{
  "title": "Searching Audio",
  "description": "This video shows how easy it is to search on spoken words.",
  "author": "John Doe",
  "date": "2014-05-15T14:23:45.000Z",
  "tags": [ "technology", "programming", "search" ]
}

A search with the query word "search" would find this bundle and return a response collection that includes the locations of the word "searching" in the title, "search" and "word" in the description, and "search" in the tags, as well as any occurrences of these words in the spoken audio.

For the CRM application example above, the metadata could be something like:

{
  "user_id": "USER_1234",
  "first_name": "John",
  "last_name": "Doe",
  "company": "Acme Inc.",
  "phone": "5551234567",
  "date": "2014-05-15T14:23:45.000Z"
}

For a search and filter, the parameters could be something like:

query = "doe"
query_fields = "insights.audio_words,first_name,last_name,company"
filter = "user_id == 'USER_1234' && date >= '2014-05-01T00:00:00.000Z'"

This search request would return all bundles that contain the word "doe" in the first_name, last_name, or company metadata fields or in the spoken audio, and that also have the metadata user_id field equal to USER_1234 and the metadata date field on or after May 1, 2014.
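
Since filters are usually generated by code, here is a minimal Python sketch that assembles the three parameters above from application values. The helper names are hypothetical, and the sketch assumes the parameters are sent as a URL query string; the endpoint, authentication, and the filter language's escaping rules are not covered in this post, so the code simply refuses values containing a single quote rather than guessing how to escape them.

```python
from urllib.parse import urlencode


def build_filter(user_id, since):
    """Build the filter expression shown above from a user id and a start date."""
    for value in (user_id, since):
        # Escaping rules are not described here, so reject quotes outright.
        if "'" in value:
            raise ValueError(f"value may not contain a single quote: {value!r}")
    return f"user_id == '{user_id}' && date >= '{since}'"


def build_search_params(query, query_fields, user_id, since):
    """Assemble the query, query_fields, and filter parameters."""
    return {
        "query": query,
        "query_fields": ",".join(query_fields),
        "filter": build_filter(user_id, since),
    }


params = build_search_params(
    query="doe",
    query_fields=["insights.audio_words", "first_name", "last_name", "company"],
    user_id="USER_1234",
    since="2014-05-01T00:00:00.000Z",
)

# URL-encoded form, assuming the parameters travel in the query string.
query_string = urlencode(params)
```

In a real application the user_id would typically come from the authenticated session and the date from a UI control, which keeps the end user's typed text confined to the query parameter.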

I hope this gives you some idea of the power of the metadata search and filter functionality. If you have any questions or comments, feel free to reach me at ivo@clarify.io or Tweet us at @Clarify_inc.