Text archives - Buzz’s Blog: On Web 3.0 and the Semantic Web

Buzz’s Blog: On Web 3.0 and the Semantic Web:

Text

Oct 3 2009   9:12PM GMT

Multimedia: The Problem of Subtle Semantics



Posted by: Roger “Buzz” King
3D animation, 3D modeling, advanced Web apps, automating Web searches, blob data, continuous data, databases, information, Multimedia, rich internet apps, Semantic Web, smart search engines, tagging, Text, Web 2.0, Web 3.0, web applications, Web development, Web development frameworks, XML

The challenge of the Semantic Web.

We’ve looked at the emerging Semantic Web technology in the previous postings of this blog. The idea is to have a far, far smarter Web, one where the process of finding and interpreting and making use of far flung information can be largely automated. This is in sharp contrast with today’s Web, where these things have to be done in a painful, extremely time-consuming fashion.

So that is the key challenge. It has to do with searching the kinds of information that are important to us in our daily lives. This information, as it turns out, is very difficult to process automatically. Why is this?

The complexity of modern multimedia.

I teach a very basic 3D animation class to mostly computer science students. We use Maya, arguably the most popular 3D animation application, one that is used in the making of many animated features. The interesting thing about animation is that it is truly multimedia. It can give us a lot of insight into what we need the new Web to do for us.

That’s because the number and diversity of applications that are used for drawing, documenting, modeling, animating, motion capture, texturing, video rendering, video editing, video conversion and compression, sound editing, in even small projects, can be very impressive. Correspondingly, the wide variety and complexity of media formats involved in an animation project can be overwhelming.

What happens in an animation project? The workflow might begin with vector storyboard drawings to break the story down into scenes. In a typical animation project, 3D models in a variety of proprietary formats are used. Models must be transformed as they are exported from one application and imported into the next. Multiple video renders of animated models are made, and they must be edited together, along with multiple sound files. Multiple video and audio formats might be used. 2D images are used for textures; photographs of butterfly wings can be used to make an animated butterfly very realistic, and a checkerboard image made with Photoshop can be used to make a Linoleum floor. And along the way, a variety of note taking, screen capture, and conferencing software might be used to facilitate group communication.

There is also a heavy focus on reuse in an animation project. Building every model, editing every texture, creating every environment and background, recording every sound from scratch is frequently intractable. If existing assets cannot be tailored and reused, the project would be far too expensive and time consuming, and would demand too wide a variety of professionals to always be available. This raises the multimedia stakes, as assets of widely differing forms must be constantly reconfigured and used in concert in new ways.

But what’s the real problem? We aren’t all trying to produce complex animated videos. But very interestingly, in our everyday lives we essentially face the animator’s challenge when we try to find and use information on the Web. That’s because we’re often looking for things whose meaning, whose interpretation, demands focused human thought. We are looking not for business data, but for pieces of media, and the problem is that today, most of our searching has to be based on tags or brief textual descriptions that are associated with pieces of media, and not on the true meaning of the media itself.

The needs of the business world are not our needs.

It’s the subjective nature of media assets - this is what is at the heart of the problem facing us. Existing technology for searching the web is based on keywords and very short pieces of text.

There is other technology, though, under active development, stuff that serves as the information storage backbone of most commercial websites. It’s the technology that has for decades been used in-house (not on the Web) by businesses when they process large databases. But this stuff was designed to handle traditional business data forms, like integers, character strings, real numbers, dates, timestamps, and full text.

There is more, though. All of the major database management systems, along with tools for building and searching advanced websites are being retrofitted (or in some cases, built from the ground up) to manage more than keywords and text, more than standard business data.

But up to now, the focus has not been on supporting the kinds of information you and I are most interested in. The focus has been on extending database and Web technology to support xml documents, as well as more complex data objects, like those inside a Java program, as well as other forms of data found inside programs. This includes arrays and lists and short pieces of textual data, like the names of diseases.

In other words, we’ve been busy extending our support of the business world, so they can store complex business data in databases and make that information processable over the Web. You and I have largely been left out.

Finally, we are attacking our needs.

But there now many ongoing efforts to extend database and Web technology to make it useful to us. The new focus is on supporting blob and continuous media like images, video, and audio. This is extremely hard to do.

Why? Because the strongest means by which we deduce the meeting of business data is by looking at its internal structure and the terms that are used to describe that structure. A relational table named Prescriptions, with a character attributes Patient Name, Doctor’s Name, and Medication, and with a numeric attribute Dosage, is pretty easy to interpret.

But what do we do with a photograph, which is just a grid of pixels with no internal structure? Or a long series of images, along with a sound track, put together to form a piece of video?

The U.S. military has been pumping money into image processing for several decades, and so all is not lost. There is a vast body of mathematical research and software development that allows us to write programs that can find a particular face in a crowd and search satellite photos for airplane runways. But in general, we cannot at this time write a program that can process an arbitrary photo or video clip and tell us what it means. That means we can’t quickly search vast media database for useful pieces of information.

The goal behind the Semantic Web effort is to build a new generation of websites whose information can be searched automatically, and where information from multiple sites can be automatically integrated. To do this with numeric and character based data is quite doable. But when it comes to multimedia, like images and sound and video and 3D models and engineering designs, well, we have a long way to go. The meaning - in other words, the semantics - of these forms of data are complex and subtle, and highly dependent upon an individual’s interpretation of that media.

So, we see that we have only just begun our journey to create the new Web.

Mar 5 2009   5:05AM GMT

Multimedia, what is it? Why do we care?



Posted by: Roger “Buzz” King
Web 2.0, Web 3.0, Web development, Multimedia, SMIL, Text, the Semantic Web, Video, XML

This is the fourth in a series of blogs on the Semantic Web and Web 2.0/3.0.

To get us going here, just what is “multimedia”? At one level, it simply refers to applications that manipulate, store, and/or present multiple kinds of media, such as text, video, relational data, sound, animation, etc. More pragmatically, it refers to the introduction of blob and continuous forms of data into applications that traditionally manage simple data, like character strings and numbers. In its most aggressive form, multimedia refers to the sophisticated integration of traditional, blob, and continuous data into integrated data forms that convey their own semantics.

A quick note: Blob data is data that is stored in a semantics-less fashion, usually as simple binary or character data. This could be almost any sort of data, such as images, video, sound, or natural language - but the key element is that the language or system being used to manipulate it doesn’t have an appropriate, specific data type. Blob data is often large, of variable size, and usually requires a sophisticated, outside application to interpret and present it. It is the default, catch-all way to store advanced forms of data in relational database management systems.

Another quick note: Continuous data is data that has a temporal aspect or can be broken down into segments that have their own identity. The visual part of video can be broken into clips; in fact, it can be broken all the way down to individual pixel-based images. Sound can be cut into pieces. James Joyce’s Ulysses is a big piece of continuous textual data. Like blob data, we typically need an outside appliation to interpret it. (Even the most complex application, a human, generally has trouble doing this with Ulysses.)

Back to the Semantic Web, Web 2.0/3.0, and multimedia.

In a previous blog, we tried to define these two terms and explain why they are very different concepts. The Semantic Web is an attempt to automate the searching of the web and the integration of data collected on the web; the idea is to greatly ease the painful interactive nature of using a search engine like Google. Web 2.0/3.0 (and no, there is no sharp distinction between the two) are largely about performance, of making web applications as responsive as possible, potentially as responsive as desktop applications.

But they share one common goal: effectively managing advanced forms of media. From the Semantic Web perspective, how can we search things like sound, images, video, and natural language in a semantically-meaningful way? We use sophisticated tags and image/sound processing to do this, but it is only a small step toward a solution.

From a Web 2.0/3.0 perspective, how can we deliver up such forms of media in way that is highly responsive? Video streaming on the web is a huge challenge, for example. Or, how can we interact with video in a responsive way, in such things as games and digital libraries?

The web, in fact, is inately multimedia: we take images, icons, links, text, video, sound, and various user controls like buttons and menus, and put them together in highly sophisticated ways. And behind these web pages, databases often sit, populating dynamic pages with information in response to user requests - this data is virtually invisible to search engines. This is what makes the Semantic Web in particular such an incredible challenge. How can we ever hope to search the web automatically?

There are modest advancements that have been made. One example is something called the Synchronized Multimedia Integration Language (SMIL, pronounced like the facial phenomena). It is an XML extension that supports basic constructs to glue multiple forms of media together in two dimensions and in temporal sequences. Using XML elements and attributes (the basic constructs of XML), we can create multimedia presentations in a precise, unambiguous way.

SMIL presentations can be processed automatically. This is very significant.

So, multimedia: it’s at the core of both the Semantic Web and Web 2.0/3.0. It is one of the basic motivations for their existence.