The problem of searching for media assets.
We’ve already looked at advanced media, in particular video, audio, and animation data, in previous blog postings, and at the subtle and complex nature of media asset semantics. We’ve seen that interpreting a piece of video, for example, is far, far more difficult than interpreting an integer or character field. Since the goal of the Semantic Web effort is to make the searching of the web highly automated, advanced media is becoming a huge and critical research and development focus for the builders of next-generation web applications.
Just how do we provide an environment where media assets can be searched in a mostly automatic fashion, so that a human does not have to painfully paw through hundreds or thousands (or millions) of video chunks to find the right one? We’ve looked at emerging technologies for marking up advanced media information, and for making it usable in a variety of web applications. We’ve also looked at the dramatic challenge that mega apps present to would-be users: the interfaces to these applications are truly massive, and on their own they cannot convey to the user the way in which they are meant to be used.
The problem of proprietary formats.
One specific, and very difficult problem, is the massive heterogeneity, not just of media formats, compression technologies, and container technologies, but of the applications themselves. If we are going to automate the searching of complex modeling, video, audio, and other media assets, we’re going to have to address a key question: since many media apps make use of their own proprietary data formats, how are we going to provide automated ways of searching media assets that are stored in these formats?
The problem of highly imperfect generic formats.
There are indeed many existing, as well as soon-to-emerge, standards for importing and exporting data between powerful media applications, but transformations in and out of these formats are often “lossy”, in that information is lost or changed. In fact, locating and downloading assets that are in supposedly generic form is often very frustrating, because these assets end up not performing well. They can be difficult to edit and reuse. 3D animation models regularly blow up when animators try to import them into animation applications and then manipulate them. A hawk may look like a hawk until you try to render it with its wings flapping, and suddenly it’s a blob of geometric garbage.
One possible direction.
So, what do we do about the fact that many media assets must be manipulated by the original applications that created them? How can we facilitate reuse? It’s extremely unrealistic to expect users to master perhaps dozens of video, audio, or animation applications. Filtering assets according to their file extensions is a good idea, and it is a well-established practice.
But what we really need is a globally-known site that either literally or conceptually centralizes the massive network of import/export relationships, along with information about the relative success of these mappings. Are they ever lossy? If so, can they be fixed? What series of applications might we want an asset to be imported/exported through so that in the end it is in a usable format, given the applications that the user owns and has mastered?
There is much to be done. Right now, searching for and reusing media assets is a painstaking, trial-and-error-prone process.
Metadata: making that ratio small.
Here’s something that’s very important: Much of the ongoing research and development that is loosely categorized as Semantic Web and Web 3.0 efforts is focused on a specific technical goal, one that has been at the core of information management technology since the mainframe era that was epitomized by the IBM 360 series. That goal is to leverage metadata as much as possible.
It’s our best weapon against the truly staggering amount of information on the Web. This includes traditional text-based and numeric data, as well as books, medical advice, photographs, entertainment and training videos, music and recorded books, investment information, educational materials, scientific materials, e-government information, etc., etc. How can we possibly organize information and then search it in a way that scales? The Web is far from a closed world. In traditional data processing environments like banking, insurance, and credit card processing, we could get our arms around all of the data, as vast as it may have seemed. But the world of information today is an open world, effectively infinite in size.
Very informally, if you look at the size of the metadata divided by the size of the data itself, the smaller that fraction the better. In traditional relational databases (built with database management systems such as Oracle, MS SQL Server, MySQL, PostgreSQL, or DB2), the extreme focus on minimizing this ratio has enabled the fast processing of extremely large volumes of data. The tradeoff is that the table definitions (the “schema”), which form the heart of the metadata, are very, very simplistic.
The old days: relational database schemas.
An insurance claim may be defined as a table with such columns as Subscriber_Name, Medical_Provider, etc., and thus may consist of little more than a series of simple character and numeric fields. But if we need to process fifty thousand of them tonight, we must be able to bring many such table rows into memory at once and quickly move through them. The database world was an extension of the paper world: a row in an insurance claim table was effectively an electronic successor to the traditional claim form.
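As a sketch of just how simple this schema really is, here is a minimal claims table built with Python’s standard sqlite3 module. The Claim_Amount column and the sample rows are invented for illustration; only the first two column names come from the example above.

```python
import sqlite3

# An in-memory database standing in for a traditional claims system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Claims (
        Subscriber_Name  TEXT,
        Medical_Provider TEXT,
        Claim_Amount     REAL
    )
""")

# The metadata (the schema above) is tiny and simplistic; each row is
# just a handful of character and numeric fields.
conn.executemany(
    "INSERT INTO Claims VALUES (?, ?, ?)",
    [("A. Jones", "City Clinic", 120.50),
     ("B. Smith", "Valley Hospital", 89.00)],
)

# Scanning huge numbers of such rows quickly is exactly what this
# minimal-metadata design is good at.
total = conn.execute("SELECT SUM(Claim_Amount) FROM Claims").fetchone()[0]
print(total)  # 209.5
```

The entire “meaning” of a claim lives in those three terse column names; that is the whole of the metadata.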
Today: a far more challenging problem.
But on the new Web, information can be far more complex in nature, making the metadata to data ratio far larger. We’ve looked at some of the emerging technology and technical trends for embedding metadata in advanced forms of data (and for processing that metadata); this data includes books, images, video, modeling and animation, and sound. This new generation of information formats makes up our personal health records and medical record images, industrial training materials, university “distance” courses, and the like. Each instance of these tends to be far more distinctive than an individual insurance claim form. And it takes a lot of metadata to properly convey their “meaning”.
What we’re struggling with right now is to succinctly specify the meaning of modern media assets and to automate searching based on this metadata. This is our only hope for keeping the ratio of metadata size to data size manageable.
We’ve looked at the emerging Semantic Web technology in the previous postings of this blog. The idea is to have a far, far smarter Web, one where the process of finding, interpreting, and making use of far-flung information can be largely automated. This is in sharp contrast with today’s Web, where these things have to be done in a painful, extremely time-consuming fashion.
So that is the key challenge. It has to do with searching the kinds of information that are important to us in our daily lives. This information, as it turns out, is very difficult to process automatically. Why is this?
The complexity of modern multimedia.
I teach a very basic 3D animation class to mostly computer science students. We use Maya, arguably the most popular 3D animation application, one that is used in the making of many animated features. The interesting thing about animation is that it is truly multimedia. It can give us a lot of insight into what we need the new Web to do for us.
That’s because the number and diversity of applications used for drawing, documenting, modeling, animating, motion capture, texturing, video rendering, video editing, video conversion and compression, and sound editing, even in small projects, can be very impressive. Correspondingly, the wide variety and complexity of media formats involved in an animation project can be overwhelming.
What happens in an animation project? The workflow might begin with vector storyboard drawings to break the story down into scenes. In a typical animation project, 3D models in a variety of proprietary formats are used. Models must be transformed as they are exported from one application and imported into the next. Multiple video renders of animated models are made, and they must be edited together, along with multiple sound files. Multiple video and audio formats might be used. 2D images are used for textures; photographs of butterfly wings can be used to make an animated butterfly very realistic, and a checkerboard image made with Photoshop can be used to make a linoleum floor. And along the way, a variety of note-taking, screen capture, and conferencing software might be used to facilitate group communication.
There is also a heavy focus on reuse in an animation project. Building every model, editing every texture, creating every environment and background, and recording every sound from scratch is frequently intractable. If existing assets cannot be tailored and reused, the project would be far too expensive and time consuming, and would demand too wide a variety of professionals to always be available. This raises the multimedia stakes, as assets of widely differing forms must be constantly reconfigured and used in concert in new ways.
But what’s the real problem? We aren’t all trying to produce complex animated videos. But very interestingly, in our everyday lives we essentially face the animator’s challenge when we try to find and use information on the Web. That’s because we’re often looking for things whose meaning, whose interpretation, demands focused human thought. We are looking not for business data, but for pieces of media, and the problem is that today, most of our searching has to be based on tags or brief textual descriptions that are associated with pieces of media, and not on the true meaning of the media itself.
The needs of the business world are not our needs.
The subjective nature of media assets is what lies at the heart of the problem facing us. Existing technology for searching the Web is based on keywords and very short pieces of text.
There is other technology, though, under active development, stuff that serves as the information storage backbone of most commercial websites. It’s the technology that has for decades been used in-house (not on the Web) by businesses when they process large databases. But this stuff was designed to handle traditional business data forms, like integers, character strings, real numbers, dates, timestamps, and full text.
There is more, though. All of the major database management systems, along with tools for building and searching advanced websites are being retrofitted (or in some cases, built from the ground up) to manage more than keywords and text, more than standard business data.
But up to now, the focus has not been on supporting the kinds of information you and I are most interested in. The focus has been on extending database and Web technology to support XML documents, as well as more complex data objects, like those inside a Java program, and other forms of data found inside programs. This includes arrays and lists and short pieces of textual data, like the names of diseases.
In other words, we’ve been busy extending our support of the business world, so they can store complex business data in databases and make that information processable over the Web. You and I have largely been left out.
Finally, we are attacking our needs.
But there are now many ongoing efforts to extend database and Web technology to make it useful to us. The new focus is on supporting blobs and continuous media like images, video, and audio. This is extremely hard to do.
Why? Because the strongest means by which we deduce the meaning of business data is by looking at its internal structure and the terms used to describe that structure. A relational table named Prescriptions, with character attributes Patient_Name, Doctor_Name, and Medication, and with a numeric attribute Dosage, is pretty easy to interpret.
But what do we do with a photograph, which is just a grid of pixels with no internal structure? Or a long series of images, along with a sound track, put together to form a piece of video?
The U.S. military has been pumping money into image processing for several decades, and so all is not lost. There is a vast body of mathematical research and software development that allows us to write programs that can find a particular face in a crowd and search satellite photos for airplane runways. But in general, we cannot at this time write a program that can process an arbitrary photo or video clip and tell us what it means. That means we can’t quickly search vast media databases for useful pieces of information.
The goal behind the Semantic Web effort is to build a new generation of websites whose information can be searched automatically, and where information from multiple sites can be automatically integrated. To do this with numeric and character based data is quite doable. But when it comes to multimedia, like images and sound and video and 3D models and engineering designs, well, we have a long way to go. The meaning – in other words, the semantics – of these forms of data are complex and subtle, and highly dependent upon an individual’s interpretation of that media.
So, we see that we have only just begun our journey to create the new Web.
Basic, long-standing, core concepts.
What emerged has certainly stood the test of time. Big time. Opinions differ widely on just what constitutes the core constructs. Different people have used different names for them, and, although the idea was to specify something formal, the definitions of these constructs were generally sloppy. But here is a reasonable specification, in its most rudimentary form:
There are objects (which might also be called entities, things, or concepts). Objects have unique names.
Objects are interrelated by attributes (which might also be called relationships or properties). Attributes are directional, and they have names.
In other words, things in the world can be represented as a simple directed graph. We could say that there are objects called Chickens that have an attribute called Are. The value of this might be an object called Birds. Birds might have an attribute called Lives-In, which links Birds to the object Barnyard. There might be an object called Mr. Fried, which has an attribute called Is, which connects Mr. Fried to the object Chickens.
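The barnyard example can be sketched as a simple directed graph in Python. The triple encoding below is just one plausible representation of the objects and attributes named above.

```python
# Each triple is (subject, attribute, object): a labeled, directed edge.
triples = [
    ("Chickens", "Are", "Birds"),
    ("Birds", "Lives-In", "Barnyard"),
    ("Mr. Fried", "Is", "Chickens"),
]

# Build an adjacency structure: object -> list of (attribute, target).
graph = {}
for subject, attribute, target in triples:
    graph.setdefault(subject, []).append((attribute, target))

# Follow a named attribute from a named object.
def follow(obj, attr):
    return [t for a, t in graph.get(obj, []) if a == attr]

print(follow("Chickens", "Are"))      # ['Birds']
print(follow("Birds", "Lives-In"))    # ['Barnyard']
```

Nothing more than names and labeled edges is needed to express these facts; that simplicity is the point.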
Many popular variations of this basic idea have emerged, and they tend to be of the following nature:
One idea is to make a sharp distinction between the notion of a subtype (or sub-kind or subset) and other attributes. So, our attribute Are might become a core concept itself, and we might name it Is-A. Chickens Is-A Birds, People Is-A Biped, etc. Other attributes like Lives-In would be considered inherently different from Is-A.
We could introduce another generalization. A general term for Lives-In and other similar attributes might be Has-A. In fact, we could stop using special names for attributes in general, and just use the terms Is-A and Has-A. We would then say that a Marriage Has-A Wife, as well as a Husband and a Date.
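This reduction can be sketched in Python, with every edge labeled only Is-A or Has-A. The facts are the ones named in the text.

```python
# Only two attribute names survive: Is-A for subtyping, Has-A for everything else.
facts = [
    ("Chickens",  "Is-A",  "Birds"),
    ("People",    "Is-A",  "Biped"),
    ("Marriages", "Has-A", "Wife"),
    ("Marriages", "Has-A", "Husband"),
    ("Marriages", "Has-A", "Date"),
]

def supertypes(concept):
    """Everything a concept Is-A."""
    return [o for s, a, o in facts if s == concept and a == "Is-A"]

def has_parts(concept):
    """Everything a concept Has-A."""
    return [o for s, a, o in facts if s == concept and a == "Has-A"]

print(supertypes("Chickens"))    # ['Birds']
print(has_parts("Marriages"))    # ['Wife', 'Husband', 'Date']
```

With just these two edge labels, the graph still captures both categorization and composition.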
These general ideas are actually old, and actually significantly predate computing. We have been struggling with the problem of describing real world objects (like Cows), real world concepts (like Marriages and Respects), and their interrelationships and categories since the emergence of the earliest philosophers. Aristotle distinguished between objects and their attributes, and carefully studied and described many animals and plants.
What does it all mean for the new Web?
So, what does all this mean to us, today, and what does it have to do with modern Web technology? Well, first of all, these concepts of objects and attributes have spread throughout all of computer science.
There have been some significant extensions, like distinguishing between an attribute that we might call a relationship, which interconnects complex objects or notions (like a driver owning a car), and attributes that interconnect complex objects and notions with atomic or simple things (like a car having a color or a driver having a name). Generally, these latter, simple kinds of attributes are now what we call attributes, and are considered inherently different from (and simpler than) relationships.
Another extension that has become a core concept in programming languages is something we might call an object identifier, which is a unique number or other identifier for individual objects; this allows us to carefully distinguish between two people who have the same mother, and two people who have mothers who just happen to have the same name.
Programming languages also introduced the concept of methods, or little programs that give life to objects. You might be able to ask a marriage object to tell us the names of the husband and wife.
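Both ideas, object identifiers and methods, can be sketched in a few lines of Python. The names here are hypothetical; the identifier is what lets us distinguish two distinct people who happen to share a name.

```python
import itertools

_next_id = itertools.count(1)  # a simple source of unique object identifiers

class Person:
    def __init__(self, name):
        # The object identifier distinguishes two people with the same name.
        self.oid = next(_next_id)
        self.name = name

class Marriage:
    def __init__(self, husband, wife):
        self.oid = next(_next_id)
        self.husband = husband
        self.wife = wife

    # A "method": a little program that gives life to the object.
    def spouses(self):
        return (self.husband.name, self.wife.name)

m = Marriage(Person("Pat"), Person("Chris"))
print(m.spouses())  # ('Pat', 'Chris')
```

Two calls to `Person("Pat")` yield objects with different identifiers, even though their names are equal; that is exactly the distinction the identifier exists to make.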
But basic concepts have not changed. There seems to be something natural and fundamental about them.
Building a new world out of old concepts.
And the Web? A revolution is happening today. We are developing languages that allow Web designers to embed machine-readable specifications in Web-resident information. This will largely automate the process of searching the Web, as well as the integration of information at multiple sites. This will in turn lead to the discovery of knowledge by putting together diverse information from across the Web. We have discussed these emerging technologies in the previous postings of this blog; they are heavily and deliberately built on top of ideas that date back to the 1950s, and in fact can trace their roots to ancient Greece.
The Semantic Web.
In this posting, we will focus on the Semantic Web, which is a global effort at radically improving our ability to search the Web.
Currently, to search the Web, we type keywords into a search engine like Google, which then searches its vast index of webpages for pages that contain these keywords. Because this sort of search is very low-level, and not at all tied to the true meaning or purpose of the information stored in webpages, searching is painfully iterative and interactive. A user must chase down countless URLs returned by a search engine to see if any of them are relevant. Quite frequently, they are not. And so, the user must refine the set of keywords and try again. It might take many attempts before a satisfactory result is obtained.
One of the primary goals of the Semantic Web is to automate the process of searching the Web. There are two stages to this. First, people who post information on the Web must capture knowledge about the meaning of their information; this knowledge is commonly called “metadata”. The metadata is then stored with the posted information.
The second stage happens when users search the Web. Rather than using the low level keyword search approach, the search is at least partly automated. The iterative process is sharply reduced by employing a smart search engine that knows how to find relevant information by searching for metadata that pertains precisely to whatever it is that the user is seeking.
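The contrast between the two approaches can be sketched in Python. The pages, URLs, and metadata fields below are invented purely for illustration.

```python
# A toy "Web" of three resources: raw text plus attached metadata.
pages = [
    {"url": "a.example/clip1", "text": "hawk flying over a field",
     "meta": {"topic": "wildlife", "media": "video"}},
    {"url": "b.example/clip2", "text": "stock chart of hawk industries",
     "meta": {"topic": "finance", "media": "video"}},
    {"url": "c.example/photo1", "text": "hawk perched on a fence",
     "meta": {"topic": "wildlife", "media": "image"}},
]

def keyword_search(word):
    """Today's approach: match raw text, relevant or not."""
    return [p["url"] for p in pages if word in p["text"]]

def metadata_search(**wanted):
    """The Semantic Web approach: match on declared meaning."""
    return [p["url"] for p in pages
            if all(p["meta"].get(k) == v for k, v in wanted.items())]

print(keyword_search("hawk"))                            # all three pages match
print(metadata_search(topic="wildlife", media="video"))  # only the relevant clip
```

The keyword search drags in the finance page just because the word “hawk” appears in it; the metadata search goes straight to what was actually meant.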
The bottom line.
The Semantic Web would ease the burden of searching for information, and would also uncover vast stores of “hidden data” that reside in databases accessible via webpages, but whose contents are currently invisible to search engines.
Ultimately, we would want the Web to be entirely searchable by software, without any humans guiding the process. This would be the true Semantic Web.
Namespaces and triples.
In past postings of this blog, we have discussed a handful of key approaches to implementing the Semantic Web. One idea is to tag information with standardized sets of terminology called “namespaces”.
We have also looked at the idea of embedding these tags in things called “triples”. In this posting, we look at this concept more closely and consider an existing language that allows people to specify these triples.
RDF and SPARQL.
The most well-known standard for specifying triples is RDF, which stands for the Resource Description Framework. SPARQL is a query language, heavily influenced by SQL, that can be used to search data that has been structured using RDF.
This is the first of a series of blog postings in which we will first look at RDF, and then at SPARQL. Then, we’ll consider the big issue: will RDF and SPARQL enable the development of the true Semantic Web?
So, what is RDF? At its highest level, RDF is used to describe anything that can be found on the Web. RDF has an XML syntax; in other words, RDF can be written as an XML document, using a set of predefined “element” and “attribute” tags. (XML and XML languages were discussed in an earlier posting of this blog, as was the relationship between XML and declarative languages.)
We might remember that on its own, XML is impotent. It is not in itself a programming language. It is simply a standard for taking a set of tags and using them as “elements” and “attributes” in declarative, data-intensive languages. A good example is SMIL, which is used to define multimedia presentations.
Here is a fragment in RDF, using its XML syntax. Note that XML languages use paired, nested tags, with an opening tag written as <tagname> and the matching closing tag as </tagname>.
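A minimal fragment of this kind looks as follows; the zx: namespace URL and the described resource are hypothetical stand-ins, while the rdf: namespace URL is the standard one published at w3.org.

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/terms/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
</rdf:RDF>
```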
This looks complicated, but it’s not. This simple example illustrates the power of RDF. It uses a set of standardized RDF-specific tags, and the second line of code tells us where these tags come from: the w3.org site, which contains a vast store of information about advanced web technology. In other words, we can go to w3.org to find the precise definition of RDF specific tags.
RDF is engineered to also use other sets of tags, in particular, domain-specific tags. In this example, these tags come from a (non-existent) URL called someurl.org. The tags themselves are prefixed with “zx:” in the rest of the code, so we know which tags are native RDF and which come from a domain-specific set of tags (called a namespace).
The XML “element” called Description is an RDF-specific tag that tells us we are giving the description of some resource on the Web, namely one at a (non-existent) website called awebsite.org.
The whole piece of code is one triple: it says that the topic of the resource at www.awebsite.org/index.html is funstuff. Here it is as a triple, with all the XML syntax and the namespace information removed:
www.awebsite.org/index.html <topic> funstuff.
Let’s overview this again. RDF is an XML language, so it uses the syntax of XML. One of the primary concepts in XML is that of an “element”, and Description is an XML element, one defined in the RDF standard. The piece of code begins with two namespace statements, one telling us which RDF specification we are using, and the second telling us that we will also be using some tags from another, domain-specific specification, which includes the tag “topic”. Then there is the guts of the triple, telling us that we are listing the topic of a Web-resident resource.
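As a sketch, Python’s standard-library XML parser can pull such a triple back out of an RDF fragment. The zx: namespace URL and the resource URL below are hypothetical stand-ins; the rdf: namespace URL is the standard one.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # the standard RDF namespace
ZX = "http://www.someurl.org/terms/"                 # hypothetical domain namespace

fragment = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="{rdf}" xmlns:zx="{zx}">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
  </rdf:Description>
</rdf:RDF>""".format(rdf=RDF, zx=ZX)

root = ET.fromstring(fragment)
desc = root.find(f"{{{RDF}}}Description")
subject = desc.get(f"{{{RDF}}}about")      # the resource being described
topic = desc.find(f"{{{ZX}}}topic").text   # the domain-specific attribute

print(subject, "<topic>", topic)
# http://www.awebsite.org/index.html <topic> funstuff
```

The namespace prefixes disappear during parsing: the parser expands rdf:Description and zx:topic into their full namespace URLs, which is exactly what lets software tell the RDF tags apart from the domain-specific ones.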
More on this in the next posting…