Posted by: Roger King
databases, Multimedia, namespaces
Two postings ago, we looked at SQL-based relational databases and why they are not well suited to managing advanced forms of media, like images, language, video, and sound. In the previous posting of this blog, we looked at the various semantic levels at which we can search advanced media-bases, such as video. We noted that we can’t really “search by meaning”, and that the best we have been able to do is find more sophisticated ways of simulating a truly intelligent search.
Making faux-intelligent searches more effective.
In this posting, we look at a key method for making these simulated intelligent searches more accurate: using human experts to train the database search facility. To do this, we need four main components: a media base for which we want to develop an effective search facility, a feedback cycle involving skilled experts, a body of media artifacts to use during the training process, and an initial search facility that we want to train. We don’t need the first item in order to engineer our first cut at a media search facility.
The initial search facility.
There is a wide class of techniques that are used to search advanced media, like images, sound, and video.
One approach is to base the search facility on a hierarchy of classifications. Consider a database of digital photographs. We might have two main categories: interior and exterior. These might form the first two branches of a hierarchy. Exterior shots might be subdivided into shots in sunlight on land, shots in sunlight on water, shots at night on land, and shots at night on water. These would be further divided, and clearly, the categories would be more sophisticated than the somewhat silly ones I am suggesting here.
Importantly, this hierarchy might be very broad and very deep, thus forming a huge inverted tree, with the top node being called photographs.
Also importantly, the words in this hierarchy are likely to come from a namespace shared by professional photographers.
It isn’t enough to simply have a nice, standardized hierarchy for classifying photographs. We need to be able to automatically place photographs in their proper categories in the hierarchy. Each one will be assigned a term that comes from the photographer namespace and appears on a leaf (a node with no branches below it) of our inverted tree. That term, and all the terms along the path leading from the top of the inverted tree down to that leaf, would apply to a given photograph.
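To make the idea of "a leaf term plus every term above it" concrete, here is a minimal sketch in Python. The nested dictionary, the category names, and the function terms_for_leaf are all hypothetical and for illustration only; a real photographer namespace would be far larger and deeper.

```python
# Hypothetical classification hierarchy; the category names are illustrative only.
# Each key is a term, and each value holds the branches below that term.
HIERARCHY = {
    "photographs": {
        "interior": {},
        "exterior": {
            "sunlight on land": {},
            "sunlight on water": {},
            "night on land": {},
            "night on water": {},
        },
    }
}


def terms_for_leaf(tree, leaf, path=()):
    """Return the terms from the top node down to `leaf`, or None if
    `leaf` is not a leaf (a node with no branches below it) in `tree`."""
    for term, children in tree.items():
        current = path + (term,)
        if term == leaf and not children:
            return list(current)
        found = terms_for_leaf(children, leaf, current)
        if found is not None:
            return found
    return None


print(terms_for_leaf(HIERARCHY, "night on water"))
# prints ['photographs', 'exterior', 'night on water']
```

A photograph assigned to the leaf "night on water" can then be found by a search on any of the three terms.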
How do we do this? We do it with image processing techniques, something we will discuss in a subsequent blog posting. For now, we’ll just say that there is a large body of existing software that can classify images, video, and sound, using a variety of heuristics. This software can judge the amount and nature of light in a scene and use it to decide if a photograph was taken indoors or out of doors, for example.
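As a deliberately crude illustration of a light-based heuristic, the Python sketch below guesses a setting from average brightness alone. Everything in it is an assumption made for illustration: the image is a plain grid of grayscale values from 0 to 255, the cutoff of 128 is arbitrary, and the function names are invented. Real classification software relies on much richer cues than this.

```python
def mean_brightness(pixels):
    """Average of all grayscale values (0-255) in a grid of pixel rows."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    return total / count


def guess_setting(pixels, threshold=128):
    """Guess 'exterior' for bright images and 'interior' for dim ones.

    The threshold is an arbitrary assumption, not a value taken from
    any real image-processing package.
    """
    return "exterior" if mean_brightness(pixels) >= threshold else "interior"


bright_photo = [[200, 220], [210, 230]]  # tiny stand-in for a sunlit scene
dim_photo = [[40, 60], [50, 30]]         # tiny stand-in for a dim room

print(guess_setting(bright_photo))  # prints exterior
print(guess_setting(dim_photo))     # prints interior
```

A fixed cutoff like this will misfile plenty of photographs (a dim exterior shot at dusk, for instance), which is exactly why expert feedback is needed.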
A body of training media artifacts.
This might be a subset of the media artifacts that we want to put into our database when it is deployed for use by non-experts. Or this might consist of a well-understood set of test media artifacts with which our experts are familiar and which is used specifically for training our system. (Again, in our example, these are digital photographs.)
The feedback loop.
This is often called “learning”, and it refers to the process of allowing experts to provide accuracy feedback on the results of search attempts. Essentially, the feedback loop provides a way for experts who are familiar with our photography namespace to reclassify a photo if the search facility has it associated with an inappropriate or non-optimal leaf in our tree.
During the training process, we let the system automatically classify the photos in our training set, but every one of them is carefully analyzed by our experts and reclassified as necessary. The search facility doesn’t simply accept the new classifications. It responds by altering (and perhaps extending) its method for deducing the proper classification of a given photo. We will look at this again, as well, in a subsequent posting of this blog.
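The difference between merely accepting a correction and altering the classification method can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: each photo is reduced to a single brightness number, the class BrightnessClassifier and its review method are invented for illustration, and the "method" being altered is just a cutoff that is refit to the midpoint between the average brightness of expert-labeled interior and exterior photos.

```python
from statistics import mean


class BrightnessClassifier:
    """Toy classifier whose decision rule is refit from expert feedback."""

    def __init__(self, threshold=128.0):
        self.threshold = threshold
        # Brightness values the experts have labeled so far, per category.
        self.reviewed = {"interior": [], "exterior": []}

    def classify(self, brightness):
        return "exterior" if brightness >= self.threshold else "interior"

    def review(self, brightness, expert_label):
        """Record the expert's label and refit the threshold.

        Returns True if the expert had to override the system's guess.
        """
        corrected = self.classify(brightness) != expert_label
        self.reviewed[expert_label].append(brightness)
        # Refit only once both categories have at least one example.
        if all(self.reviewed.values()):
            self.threshold = (
                mean(self.reviewed["interior"]) + mean(self.reviewed["exterior"])
            ) / 2
        return corrected


# Hypothetical training set: (brightness, label assigned by the expert).
training_set = [(90, "exterior"), (60, "interior"), (100, "exterior"), (40, "interior")]

clf = BrightnessClassifier()
corrections = sum(clf.review(b, label) for b, label in training_set)

print(corrections)       # prints 1
print(clf.threshold)     # prints 72.5
print(clf.classify(90))  # prints exterior
```

The first photo (brightness 90) is initially misfiled as interior. After the experts have reviewed the training set, the cutoff has moved from 128 to 72.5, so the same photo would now be classified correctly without any human involvement.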
The (always growing) database of media.
Once the search engine has been put into place, and once it has been trained (at least enough to use it in production mode), it’s time to load the entire body of media artifacts. Quite likely, the feedback loop will be left alive and the training process will continue indefinitely on a selective basis, depending on how well the search facility seems to be performing. But the larger body of media artifacts will be classified automatically from here on out. This is the only way to create a media search facility that scales to truly vast libraries of media artifacts.
More to come…