Buzz’s Blog: On Web 3.0 and the Semantic Web

Feb 17 2010   12:38AM GMT

The Parallel Worlds of Media Databases and Media Metadata

Roger King

Searching traditional business data: straightforward.

Managing advanced forms of media, such as images, sound, video, natural language text, and animated models, has been discussed a number of times in this blog.  Traditional information systems, such as relational databases, have been engineered largely to handle the sorts of data we have in business applications, primarily simple numeric and character string data.  To the SQL database programmer, the nice part is that the data speaks for itself.  If a field is called Name, and the value is Buzz King, the semantics of “Buzz King” are pretty obvious, and the value can be processed in a largely automatic fashion.  The same goes for a field called Age, with a value of “97”.
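To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module.  The table name and the second row are made up for illustration; only the Name and Age fields and the “Buzz King” and “97” values come from the example above.

```python
import sqlite3

# In-memory database; the "person" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Buzz King", 97), ("Ada Smith", 41)],  # second row is invented
)

# The column names carry the meaning, so the query needs no interpretation step.
rows = conn.execute(
    "SELECT name, age FROM person WHERE age > ? ORDER BY name", (65,)
).fetchall()
print(rows)
```

Nothing here has to guess what a value means.  The field name says it all, which is exactly what we lose with images, sound, and video.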

Searching advanced media: far, far more difficult.

But modern media is far more complex than this.  “Blob” data, like images, and continuous data, like sound, video, and natural language text, are very difficult to search and interpret automatically.  Two approaches have been taken to address this problem.

Tagging: the simple approach.

The first is tagging.  Descriptive terms, often taken from large, shared vocabularies, are attached to pieces of media.  These vocabularies can be very domain-specific, dedicated to areas like medicine, law, and engineering.
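As a rough sketch of what simple tagging amounts to, the Python below attaches terms from a shared vocabulary to media items and keeps an index from term to item.  The vocabulary, the file names, and the function names are all assumptions made up for this example, not part of any real tagging standard.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary; real ones are far larger and domain-specific.
VOCABULARY = {"tennis court", "runway", "aircraft", "athlete"}

def attach_tags(index, media_id, tags):
    """Attach vocabulary terms to a media item; reject terms outside the vocabulary."""
    unknown = set(tags) - VOCABULARY
    if unknown:
        raise ValueError(f"terms not in vocabulary: {sorted(unknown)}")
    for tag in tags:
        index[tag].add(media_id)

def search(index, tag):
    """Return the media items carrying a tag. Only the tags are examined."""
    return sorted(index.get(tag, set()))

index = defaultdict(set)
attach_tags(index, "img-001.jpg", ["tennis court", "athlete"])
attach_tags(index, "img-002.jpg", ["runway", "aircraft"])

print(search(index, "runway"))
```

Note that the search never opens the image files.  It is only as good as the terms somebody chose to attach.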

Intelligent processing software: the second approach.

The second technique is the automatic processing of pieces of media using image processing, natural language, and other highly intelligent software.  These applications are very sophisticated and understood only by experts.  They often demand a lot of processing time, which can make bulk processing impractical.  The results can also be haphazard.  Some pieces of media can be interpreted precisely, others not so precisely, and dramatic mistakes are frequent.  A tennis court might be mistaken for an airplane runway.  There’s a huge trust factor involved in cranking up image or sound processing software or natural language software.

Often, we can provide feedback so that these applications can learn, over time, the way we want media to be interpreted.  We can help the software learn the difference between a tennis player and a member of a ground crew on a small runway. All of this is hugely expensive, in terms of the cost of developing the software, and in terms of the physical resources needed to run the software.

A middle ground?  Not really.

So, is there some middle ground?  Something simple, yet more “intelligent”?  Yes, up to a point, and the answer is to take a sophisticated approach to what otherwise might be very simple tagging techniques.  However, the core problem with tagging remains: we search and process tags, not the actual data.  It is an indirect but fast process.  The goal is to come as close as we can to simulating the results of such things as image processing, but to do it with a simple, yet comprehensive and accurate, tag-based technology.

We’ve looked at some of the solutions that have been proposed.  They include Dublin Core, MODS, and MPEG-7.  The first is very simplistic.  The second is more sophisticated, in that the terminology used is broader and far more precise.  The third is very aggressive in that it supports the complex structuring of tag data elements.  
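To give a feel for the difference between flat and structured tagging, here is a small Python sketch.  The flat record uses element names from the Dublin Core element set.  The nested record is a made-up layout meant only to suggest the kind of structuring MPEG-7 allows; it is not the actual MPEG-7 schema, and all the values are invented.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

def validate_flat(record):
    """Accept a flat record only if every key is a Dublin Core element name."""
    unknown = set(record) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return record

# Flat description: one level of simple name/value pairs (values invented).
flat = validate_flat({
    "title": "Doubles match",
    "subject": "tennis",
    "format": "image/jpeg",
})

# Hypothetical structured description: tags nested inside tags.
structured = {
    "scene": {
        "label": "tennis court",
        "regions": [
            {"label": "athlete", "position": {"x": 120, "y": 80}},
        ],
    }
}

def depth(obj):
    """Nesting depth: a bare value is 0, each dict or list adds 1."""
    if isinstance(obj, dict):
        return 1 + max((depth(v) for v in obj.values()), default=0)
    if isinstance(obj, list):
        return 1 + max((depth(v) for v in obj), default=0)
    return 0

print(depth(flat), depth(structured))
```

The flat record can say what an image is about.  The structured one can start to say where things are inside it, which is the sort of thing we would otherwise need image processing to find out.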

So, what are we really doing?

In essence, we build a hierarchy of metadata and then instantiate it for every piece of media we want to catalogue and later search.  What we are doing is creating a parallel database, one where every piece of blob or continuous data is accompanied by a possibly very large tree of structured tagging information.  The parallel database has its own schema and an instance of it is created for every piece of media in the original media database.
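Here is one way this might look in Python, as a sketch under some loud assumptions: the schema, the path names, the file names, and the helper functions are all invented for illustration.  The point is only the shape of the thing: one schema, and one instance of it per piece of media.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MetaNode:
    """One node in a tree of structured tagging information."""
    name: str
    value: Optional[str] = None
    children: List["MetaNode"] = field(default_factory=list)

# Hypothetical schema of the parallel database: nested dicts, empty dict = leaf.
SCHEMA = {
    "media": {
        "technical": {"format": {}, "duration_s": {}},
        "content": {"scene": {}, "objects": {}},
    }
}

def instantiate(schema, values, prefix=""):
    """Build one metadata tree from the schema, filling nodes from `values`."""
    nodes = []
    for name, sub in schema.items():
        path = f"{prefix}/{name}" if prefix else name
        nodes.append(MetaNode(name, values.get(path), instantiate(sub, values, path)))
    return nodes

def lookup(node, path):
    """Return the value stored at a slash-separated path, or None."""
    parts = path.split("/")
    if parts[0] != node.name:
        return None
    for part in parts[1:]:
        node = next((c for c in node.children if c.name == part), None)
        if node is None:
            return None
    return node.value

# The original media database: opaque bytes, keyed by a made-up identifier.
media_db = {"clip-17.mp4": b"\x00\x01\x02", "img-001.jpg": b"\xff\xd8\xff"}

# The parallel database: one instance of the schema per piece of media.
parallel_db = {
    "clip-17.mp4": instantiate(SCHEMA, {
        "media/technical/format": "video/mp4",
        "media/technical/duration_s": "42",
        "media/content/scene": "runway",
    })[0],
    "img-001.jpg": instantiate(SCHEMA, {
        "media/technical/format": "image/jpeg",
        "media/content/scene": "tennis court",
    })[0],
}

print(lookup(parallel_db["img-001.jpg"], "media/content/scene"))
```

Every media item gets the same tree shape, whether or not every leaf is filled in.  The still image simply has no duration.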

The end result?  Instead of creating some sort of media-centric query language, like an SQL-for-video, we give up on trying to search the media database itself.  The query language remains largely ignorant of the nature of blob and continuous media.  We can continue to refine and expand the schema of the parallel database until search results are satisfactory.
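A small sketch of this, again using Python's sqlite3 module.  The table layout, path names, and values are assumptions made up for this example.  Notice that the queries touch only the metadata table, never the payload column holding the media itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (id TEXT PRIMARY KEY, payload BLOB);
    CREATE TABLE metadata (
        media_id TEXT REFERENCES media(id),
        path     TEXT,
        value    TEXT
    );
""")

# The media database: opaque bytes that SQL cannot interpret.
conn.executemany("INSERT INTO media VALUES (?, ?)", [
    ("clip-17.mp4", b"\x00\x01\x02"),
    ("img-001.jpg", b"\xff\xd8\xff"),
])

# The parallel database: structured tags stored as path/value rows.
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    ("clip-17.mp4", "content/scene", "runway"),
    ("img-001.jpg", "content/scene", "tennis court"),
])

def find(path, value):
    """Search the metadata only; the media payload is never read."""
    rows = conn.execute(
        "SELECT media_id FROM metadata WHERE path = ? AND value = ? "
        "ORDER BY media_id",
        (path, value),
    ).fetchall()
    return [r[0] for r in rows]

hits = find("content/scene", "tennis court")

# Refining the parallel schema: a new kind of tag, with no change to the media table.
conn.execute(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    ("img-001.jpg", "content/people/role", "tennis player"),
)
refined_hits = find("content/people/role", "tennis player")
print(hits, refined_hits)
```

Plain SQL does all the work, and it never needs to know what a JPEG or an MP4 is.  When searches come up short, we add more paths to the metadata rather than teaching the query language about media.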

More later…
