Buzz’s Blog: On Web 3.0 and the Semantic Web


May 2, 2010  7:27 PM

The Challenge of Complex Media in a Relational World, Part 3



Posted by: Roger King
databases, Multimedia, namespaces

Two postings ago, we looked at SQL-based relational databases and why they are not well suited to managing advanced forms of media, like images, language, video, and sound. In the previous posting of this blog, we looked at the various semantic levels at which we can search advanced media bases, such as video. We noted that we can’t really “search by meaning”, and that the best we have been able to do is find more sophisticated ways of simulating a truly intelligent search.

Making faux-intelligent searches more effective.

In this posting, we look at a key method for making these simulated intelligent searches more accurate, and that is by using human experts to train the database search facility. To do this, we need four main components: a media base for which we want to develop an effective search facility, a feedback cycle involving skilled experts, a body of media artifacts to use during the training process, and an initial search facility that we want to train. We don’t need the first item in order to engineer our first cut at a media search facility.

The initial search facility.

There is a wide class of techniques that are used to search advanced media, like images, sound, and video.

One approach is to base the search facility on a hierarchy of classifications. Consider a database of digital photographs. We might have two main categories: interior and exterior. These might form the first two branches of a hierarchy. Exterior shots might be subdivided into shots in sunlight on land, shots in sunlight on water, shots at night on land, and shots at night on water. These would be further divided, and clearly, the categories would be more sophisticated than the somewhat silly ones I am suggesting here.
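Here is a minimal sketch of what such a hierarchy might look like in code, using the toy categories above; a real taxonomy drawn from a professional namespace would be far broader and deeper.

    # A tiny classification hierarchy. Each key is a node; leaves are empty dicts.
    HIERARCHY = {
        "photographs": {
            "interior": {},
            "exterior": {
                "sunlight on land": {},
                "sunlight on water": {},
                "night on land": {},
                "night on water": {},
            },
        },
    }

    def path_to_leaf(tree, leaf, path=()):
        """Return the chain of terms from the root down to the given leaf."""
        for term, subtree in tree.items():
            here = path + (term,)
            if term == leaf:
                return here
            found = path_to_leaf(subtree, leaf, here)
            if found:
                return found
        return None

    # A photo classified under one leaf inherits every term on the path to it.
    print(path_to_leaf(HIERARCHY, "night on water"))
    # ('photographs', 'exterior', 'night on water')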

Importantly, this hierarchy might be very broad and very deep, thus forming a huge inverted tree, with the top node being called photographs.

Also importantly, the words in this hierarchy are likely to come from a namespace shared by professional photographers.

It isn’t enough to simply have a nice, standardized hierarchy for classifying photographs. We need to be able to automatically place photographs in their proper categories in the hierarchy. Each one will be assigned a term that comes from the photographer namespace and appears on a leaf (a node with no branches below it) of our inverted tree. That term, along with all the terms on the path leading from the root down to that leaf, would apply to a given photograph.

How do we do this? We do it with image processing techniques, something we will discuss in a subsequent blog posting. For now, we’ll just say that there is a large body of existing software that can classify images and video and sound, using a variety of heuristics. This software can judge the amount and nature of light in a scene and use it to decide if a photograph was taken indoors or out of doors, for example.
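To make that concrete, here is a toy illustration of that kind of heuristic, written with the Pillow imaging library; the brightness threshold is an arbitrary number I made up, and real classifiers would also weigh color temperature, histograms, edges, EXIF data, and much more.

    # A toy heuristic: guess indoor vs. outdoor from overall brightness alone.
    from PIL import Image   # assumes the Pillow library is installed

    def guess_setting(path, threshold=110):
        gray = Image.open(path).convert("L")          # 8-bit grayscale
        pixels = list(gray.getdata())
        mean_brightness = sum(pixels) / len(pixels)   # 0 (dark) .. 255 (bright)
        return "exterior" if mean_brightness > threshold else "interior"

    # guess_setting("vacation_042.jpg")  ->  "exterior" or "interior"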

A body of training media artifacts.

This might be a subset of the media artifacts that we want to put into our database when it is deployed for use by non-experts. Or it might consist of a well understood set of test media artifacts with which our experts are familiar and which is used specifically for training our system. (Again, in our example, these are digital photographs.)

The feedback loop.

This is often called “learning”, and it refers to the process of allowing experts to provide accuracy feedback on the results of search attempts. Essentially, the feedback loop provides a way for experts who are familiar with our photography namespace to reclassify a photo if the search facility has it associated with an inappropriate or non-optimal leaf in our tree.

During the training process, we let the system automatically classify the photos in our training set, but every one of them is carefully analyzed by our experts and reclassified as necessary. The search facility doesn’t simply accept the new classifications. It responds by altering (and perhaps extending) its method for deducing the proper classification of a given photo. We will look at this again, as well, in a subsequent posting of this blog.
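As a rough sketch of that idea, continuing the toy brightness classifier from the earlier example: the experts supply the correct labels, and the system nudges its internal model (here, just a single threshold) whenever it was wrong. All of the names and the adjustment step are invented for illustration; a real search facility would adjust a far richer internal model.

    # A schematic expert-in-the-loop training pass over the training set.
    def train(photo_paths, expert_label, threshold=110, step=2):
        for path in photo_paths:
            predicted = guess_setting(path, threshold)
            correct = expert_label(path)        # the expert reclassifies the photo
            if predicted == correct:
                continue
            # Nudge the threshold in the direction that would have fixed this photo.
            if correct == "exterior":
                threshold -= step
            else:
                threshold += step
        return threshold

    # threshold = train(training_photos, ask_expert)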

The (always growing) database of media.

Once the search engine has been put into place, and once it has been trained (at least enough to use it in production mode), it’s time to load the entire body of media artifacts. Quite likely, the feedback loop will be left alive and the training process will continue indefinitely on a selective basis, depending on how well the search facility seems to be performing. But the larger body of media artifacts will be classified automatically from here on out. This is the only way to create a media search facility that scales to truly vast libraries of media artifacts.

More to come…

April 20, 2010  7:13 PM

The Challenge of Complex Media in a Relational World, Part 2



Posted by: Roger King
continuous data, databases, namespaces, Semantic Web, SQL, tagging

In the previous posting of this blog, we looked at SQL-based relational databases and why they are not well suited to managing advanced forms of media, like images, language, video, and sound.

Searching by semantics.

Here, we look closely at one specific issue related to managing complex media: How to categorize and search advanced forms of media by their “meaning” or “semantics”. This is extraordinarily difficult, and in fact, in general, it is impossible. This is why we usually rely on relatively low-level heuristics and can only simulate search-by-semantics in simplistic ways.

Consider a library of soundless video clips. Let’s assume there are many thousands of them, and they vary in length from seconds to hours. First of all, the only clips we can afford to download and actually view in real time are the ones that are only seconds or minutes in length, and we can do this only if we are somehow able to limit the search space to a small handful of candidates. Keep in mind that a video can consist of twenty to forty images per second.

So what do we do?

Searching previews.

We could search tiny samples of our video clips, perhaps taken from the beginning, the middle, and the end of each clip, but this doesn’t actually work well, either. We need something automated, something that can scale.

Searching tags.

The dominant technique is to extract information concerning low level attributes of the video clips (such as their format and pixel count) automatically, and then have experts add more tagging information by using widely adopted, formal namespaces. We might use a geography namespace to mark clips as having rivers and mountains in them.

These two forms of tagging information might be encoded together using the very popular MPEG-7 language. This creates a very indirect way of searching video clips. We don’t actually search them. We search the hierarchically constructed MPEG-7 tag sets that describe the videos. This at least allows us to use SQL in a reasonably straightforward way to do the searching.
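Here is a minimal sketch of that indirect, tag-based search, run through SQLite; the table layout and the terms are invented for illustration (a real system would store structured MPEG-7 descriptions, not a flat tag table).

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE clips (clip_id INTEGER PRIMARY KEY, filename TEXT,
                            format TEXT, width INTEGER, height INTEGER);
        CREATE TABLE tags  (clip_id INTEGER, namespace TEXT, term TEXT);
    """)
    db.execute("INSERT INTO clips VALUES (1, 'river_hike.mp4', 'H.264', 1280, 720)")
    db.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                   [(1, "geography", "river"), (1, "geography", "mountain")])

    # Find clips tagged 'river' in the geography namespace. We never touch the
    # video data itself, only its description.
    rows = db.execute("""
        SELECT c.filename FROM clips c JOIN tags t ON c.clip_id = t.clip_id
        WHERE t.namespace = 'geography' AND t.term = 'river'
    """).fetchall()
    print(rows)   # [('river_hike.mp4',)]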

Searching for specific images.

There is very good technology for processing images for fixed pixel-based subcomponents like individual faces. We can also search for video clips that have any faces at all in them.

In general, it’s easier to search for things made by people because they tend to be more angular and regular in shape. These include specific buildings and types of aircraft.

Searching for colors and shapes.

We can also search for more abstract subcomponents of images, like polygons, circles, and the like. Despite the fact that video images are pixel-based (or “raster”), there is good technology for isolating the lines that form the boundaries of subcomponents.

And we can look for colors and compare the relative location and dominance of various colors, finding, say, images where 63% of the pixels are a particular shade of orange.
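As a small sketch of a color-dominance test (again using the Pillow library): what fraction of an image’s pixels fall near a target RGB color? The tolerance, the target color, and the 63% figure are all arbitrary choices here.

    from PIL import Image

    def color_fraction(path, target, tolerance=30):
        img = Image.open(path).convert("RGB")
        pixels = list(img.getdata())
        close = sum(1 for (r, g, b) in pixels
                    if abs(r - target[0]) <= tolerance
                    and abs(g - target[1]) <= tolerance
                    and abs(b - target[2]) <= tolerance)
        return close / len(pixels)

    # color_fraction("frame_0412.png", (255, 140, 0)) > 0.63   ->  "mostly orange"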

Searching for change over time.

We can also search for pattern changes in the series of images that make up a video clip.

But none of this has much to do with the real meaning or semantics of images and the video clips they form. Taking this next step is a huge challenge.

Semantics.

How can we look for a setting sun or a ball moving across a tennis court, without knowing the details of the sunset or the particular tennis court in advance?

We can use the colors and shapes approach to look for a big orange ball descending below a possibly-jagged horizontal line. We could look for a small, white or yellow spherical object moving across a big green rectangle.

One way to raise the bar a bit is to use domain-specific knowledge about the images being processed. It’s a whole lot easier to spot that tennis court if we know that’s what we’re looking for. Then we can fill our searching software with lots of detailed information about the various sorts of tennis courts. We can also more easily isolate the tennis court in a larger image if we know it’s there somewhere. This gives us an extra edge, so we can perhaps find the court, even if it turns out to be brown and not green, or if the surrounding terrain is almost the same color as the court.

We of course never get away from searching by heuristics that only simulate the process of determining the true meaning of a series of images. We can never truly search by semantics.

But we can do something else: we can get humans into the loop and train our software to do a better job. We’ll look at this next.


April 13, 2010  5:57 PM

The Challenge of Complex Media in a Relational World, Part 1



Posted by: Roger King
blob data, continuous data, databases, Multimedia, MySQL, Oracle, PostgreSQL, SQL, SQL Server, tagging, Video

Relational databases: the dominant technology.

Relational database management systems, such as MySQL, Oracle, MS SQL Server, DB2, and PostgreSQL, support the relational model. A database is broken up into tables, and each table consists of rows. Each row is a series of values. A row in a table called Insured Drivers in a motor vehicle database might consist of:

Fred, 2010 Toyota Prius, State Farm Insurance, 1112233444.

1112233444 might be a unique identifier that the government assigns to each driver. This would be the “primary key” for the table Insured Drivers. The point is that human names are not at all unique, and so in relational databases, we introduce artificial keys in order to disambiguate queries. We still need the value Fred in the row because we want to know how to address him with a letter or email.
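Here is a sketch of what that table might look like in SQL, run here through SQLite from Python; the column names and types are my own guesses at what the example implies.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE insured_drivers (
            driver_name TEXT,
            vehicle     TEXT,
            insurer     TEXT,
            driver_id   TEXT PRIMARY KEY   -- the government-assigned identifier
        )
    """)
    db.execute("INSERT INTO insured_drivers VALUES "
               "('Fred', '2010 Toyota Prius', 'State Farm Insurance', '1112233444')")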

Problems with relational databases.

There are a few critical points to note with this approach. First, such a simple way of representing data allows the database to quickly deliver large sets of rows from this table to the memory of a computer, so that they can be effectively searched in bulk. We might want to know the names of all people who drive a Toyota Prius and are insured by State Farm, for example.
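Continuing the sketch above, that bulk search is exactly the kind of thing SQL does well:

    # Names of everyone who drives a Toyota Prius and is insured by State Farm.
    for (name,) in db.execute("""
            SELECT driver_name FROM insured_drivers
            WHERE vehicle LIKE '%Toyota Prius%'
              AND insurer = 'State Farm Insurance'
            """):
        print(name)     # Fred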

Second, we might like to be able to put more complex items in a row. We might want to have another value in a row, one that gives a driver’s address. But an address has a few parts to it, and is not itself a simple value like a name or a car model or the name of an insurance company.

It is important to also note, however, that relational databases do indeed support the creation of more complex values, such as an address. But the more complex values we put in rows in tables, the harder it is to read in a large number of rows at once.

In fact, we could create a value that represents a very complex object, one that refers to rows in other tables. For example, we might want to replace the value Fred with a reference to a row in another table called Licensed Drivers, because there is a lot we might want to know about Fred, other than just his name. But then it would become very difficult to read in lots of rows of a single table quickly.

It might be that if we follow a link to another table that describes drivers, these rows might themselves have links in them, thus allowing a value in a row to effectively be a full object, much as we would build one in Java or C++. And in general, these links between tables could be chained together, and extend arbitrarily far. Do we chase all of these linked references down for every row of Insured Drivers, or do we not follow any of these links so we can read in a large number of rows, and then worry later about getting more information on each driver?
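Continuing the same invented schema, the “reference to a row in another table” idea is just a foreign key, and following it means paying for a join; each extra link we chase adds exactly the cost described above. The data values here are made up.

    db.executescript("""
        CREATE TABLE licensed_drivers (
            driver_id TEXT PRIMARY KEY,
            name      TEXT,
            address   TEXT,
            birthdate TEXT
        );
    """)
    db.execute("INSERT INTO licensed_drivers VALUES "
               "('1112233444', 'Fred', '12 Elm St, Boulder CO', '1975-06-01')")

    # Follow the driver_id reference to pick up richer information about Fred.
    rows = db.execute("""
        SELECT i.vehicle, l.address
        FROM insured_drivers i JOIN licensed_drivers l ON i.driver_id = l.driver_id
    """).fetchall()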

Importantly, relational databases are still very much the dominant database technology in use in businesses and other organizations, as well as on the Web. We need to keep in mind that we have already aggressively extended them by supporting values that have internal structure (like addresses) and with the ability to create complex objects (like drivers). How far do we go in extending them?

Where we stand today.

Indeed, the extensions we have already made to relational databases have created a serious optimization problem.

But it’s worse than that. Here’s something else to consider. Relational databases were born into a world where flat business data was pretty much the only game in town. However, relational databases are being asked to manage far more sophisticated forms of data, like photos and video clips and voice tracks. There are a couple of problems that crop up. First, a row with a video clip as a field could be huge. We might only be able to read in a single row at a time and this could make searching an entire table intractable. Worse, how do we even search for rows that contain certain pieces of video? How can we search for all video clips that show Fred getting into a car accident?

Where to go from here.

In previous postings of this blog we have looked at media databases, and in particular, at techniques that can be used to tag complex forms of blob and continuous media (like photos and video clips). What’s important to note, though, is that there is a major dilemma right now in the world of database software. Can we continue to shoehorn more and more complex forms of data into relational databases, or do we need to throw in the towel and start over?

More on this next time…


April 2, 2010  10:41 PM

More on ‘the Moron App’ called Mozy: What can go wrong with a web app



Posted by: Roger King
advanced Web apps, Web 2.0, web applications

In my last blog posting I detailed my frustrations with Mozy, the offsite backup utility. Rather than being a great example of what modern web apps can do, it appears to have server-based bugs, widely varying temporal behavior, and unstable client-server communication properties.

In short, it does not work – and I would strongly advise against buying it.

I reported in my last blog posting that my (paid) account on two Windows 7 machines was now working, after I had spent many hours playing with it. But I have to take that back now. It no longer works.

I have another (paid) account for two Snow Leopard machines, and it is a horrific failure on those.

Here are some thoughts on it, from the perspective of what can go wrong with a web app.

Server-based bugs.

This can be very frustrating – bugs that appear to be on the server side of the application, making it impossible for the user to fix things from their end. I have tried hard to experiment with every possible setting or piece of data that I can control from my end. This has been an exhaustive and exhausting process. I have done this sort of tinkering with many applications.

But unlike with the vast majority of web apps I have run, there seems to be nothing I can do to make Mozy work.

Widely-varying temporal behavior.

When you start Mozy up, it begins a long process of scanning the client machine’s drive and organizing the files it wants to upload. I assume it is also processing the files of mine that it has upstream. It gives some feedback as it moves from stage to stage, but for me, at least, it simply spins forever. It seems to be working, but many days can go by, and it’s up to you to decide when to give up and stop it. And only very rarely does it actually upload any files.

Not knowing how much time to give it, and all the while letting it eat up cycles and memory, is more than frustrating. This application does more harm than good when it comes to managing your computer.

Unstable client-server communication.

The Mozy client-side program will sometimes, after minutes or hours or days of churning away, suddenly return an error message that says it could not communicate with the Mozy server.

Irrelevant help advice.

This is not a problem specific to web apps, but I figured I would mention it anyway.

I spent a lot of time pawing through the Mozy documentation, as well as Googling to see what other users are experiencing. It seems clear that I am not alone in my evaluation of Mozy. But I found nothing that appears to be useful for helping to get this app running properly.

A sucker.

I feel like a real chump for buying two paid accounts (each supposedly backing up two machines). Because I am an academic (professor of Computer Science), I will continue to monkey with it. That’s just how I am.

But the Mozy folks should be ashamed of themselves. A backup utility that is at best highly unpredictable is worthless.


March 28, 2010  8:57 PM

Mozy sure likes to mosey along: a Web 2.0 app with mixed results



Posted by: Roger King
Rich Web Apps, Web 2.0, Web 3.0, web applications, web services

A popular class of web apps.

One of the more popular offsite backup services is Mozy. I have two accounts, with each one supporting two computers. On one account I have two Windows 7 machines, and on the other, two Snow Leopard machines. This sort of application is becoming the method of choice for protecting files from loss.

But Mozy is an interesting example of technology that isn’t quite there yet.

A trait of new technology in general.

We do this a lot, excitedly jumping into emerging technology, embracing it, and putting up with its “tinker-with-me-nonstop” funkiness.

The Mozy backup service: the idea.

The idea behind Mozy is that you create backup sets and a schedule of when you want your offsite backup folders to be updated. You can, in principle, have your documents, mail, and other important files backed up late at night when your machine is idle.

The Mozy backup service: the reality.

But right now, I am on a Snow Leopard machine on which Mozy has been “Scanning for files” for many, many hours. It says that 0% of the files have been prepared and 0% have been transferred. Reinstalling, including getting rid of all of Mozy’s support files and starting from scratch, does not fix the problem. Neither does rebooting or cursing.

Mozy likes to behave differently on my other Mac. It seems to scan okay, but then hangs up when it is time to start uploading. Right now, it’s been hung for several hours. In perhaps another several hours, it will stop and do one of two things: give me an error message that says that the connection with the Mozy server was cut, or tell me that my files have been backed up – which of course will not be true.

Yes, Mozy will actually tell me that the last backup failed, and at the same time tell me that all my files are backed up. When I go to the Mozy server to check things out, it turns out that my files have not been backed up for weeks.

On my two Windows machines, Mozy has yet another behavioral pattern. It actually works, as advertised. But this is only because I have spent many hours with it. And it is still not at a state where it can be left alone to do its thing, and I mess with it almost every day.

Welcome to the new Web.

The situation with Mozy and my machines is representative of how many new, Web 2.0 applications perform. To get things going and to keep them going, you have to either have an infinite amount of futzing time or you’d better be a programmer. Or both.

Mozy does indeed give specific error messages. They are numbered and if you look in their Help file, there are very understandable explanations of what the numbers mean. It’s just that the error messages returned don’t seem to have anything to do with what is happening.

There is a log file, too, with lots written in it. But the stuff in there seems to have nothing to do with what’s going wrong, either.

So that’s the lesson of the emerging Web 2.0 world. You are going to be more intimately involved with the web apps you use than you would imagine in advance, especially given the hype and promises made by vendors.

More on this soon…


March 24, 2010  1:52 AM

Text display for folks with kaleidoscope vision, part 3.



Posted by: Roger King
3D animation, 3D modeling

People with limited vision.

One of the things I teach is 3D animation (using Autodesk Maya). I also happen to have had cornea transplant surgery in both of my eyes, as a result of a degenerative disease that caused my corneas to thin gradually, thereby losing their structural integrity. The corneas, by the way, are the clear outer surface of the eye.

This is the third blog posting in a series relating to the use of modeling and animation software to develop technology to assist folks with limited vision caused by deformed corneas.

Diseased corneas.

Today, we look at using animation software to simulate the treatment of eyes diseased by keratoconus, the disease that affected my eyes. Keratoconus causes the corneas to thin and lose their structural integrity. Folds and bulges develop. Eventually, all you can see are multiple distorted, fragmented, and overlapping versions of everything you look at.

A simple treatment.

As it turns out, a short or medium term treatment for the condition, which works until the corneas become so thin that they absolutely must be replaced, is very simple: scleral contact lenses can be placed on the eyes, forming a sort of false cornea over each eye. I’ve been told that the same treatment can be applied to folks with astigmatism or cornea damage from a botched Lasik surgery.

As near as I can tell, this is why it works: As ambient light passes into the eye, it of course goes through the transparent outer surface of the eye, the cornea. If the slope of the cornea varies, the angle of refraction of light as it passes through the cornea will vary. But if a large contact lens, one that covers the sclera or white part of the eye, is sitting on top of the cornea, it holds a layer of fluid over the cornea. This means that light passes through the contact lens, through the fluid, and then through the cornea. Since the layer of fluid is close in density to the cornea, the angle of refraction remains more consistent over the cornea as a whole. Thus, a smoothing effect takes over. The light is refracted more uniformly before it passes into the eyeball.
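A back-of-the-envelope calculation with Snell’s law makes the smoothing effect plausible. The refractive indices below are approximate textbook values, and the ten-degree “bump” is an arbitrary stand-in for a patch of cornea whose slope is off from where it should be.

    import math

    def deviation(n_outside, n_cornea, tilt_deg):
        """Degrees of unwanted bending caused by a corneal patch tilted off nominal."""
        t = math.radians(tilt_deg)
        refracted = math.asin(math.sin(t) * n_outside / n_cornea)
        return math.degrees(t - refracted)

    # Bare cornea: light enters it directly from air (n ~ 1.00, cornea n ~ 1.38).
    print(deviation(1.00, 1.38, 10))   # roughly 2.8 degrees of stray bending

    # With a scleral lens, the irregular surface meets fluid (n ~ 1.34), not air.
    print(deviation(1.34, 1.38, 10))   # roughly 0.3 degrees -- far less distortion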

Modeling the disease and the treatment with 3D modeling.

What does this have to do with 3D animation? If you have keratoconus or an astigmatism or a cornea damaged by bad laser surgery, this is a very cheap, non-surgical, and effective treatment. (An astigmatism is not caused by a cornea defect, however, and I don’t fully understand why the scleral lens treatment works.) And, as it turns out, the optics of the treatment can be easily simulated with a 3D animation application, and this presents the possibility for refining the treatment using simulation.

How would this be done? The effects of an irregular cornea can be modeled by a sphere whose surface has been altered using deformation primitives. The effects of keratoconus lead to a distinctive shape of the cornea. Since the pressure of the eye pushes hardest at the center of the cornea, the middle of the cornea is pushed out into a cone shape – thus the name of the disease, which means “cone shaped cornea”. The lack of structural integrity also causes irregularities in the slope of the cornea in general. Ophthalmologists routinely take topographical maps of corneas and these maps could be used to create a simulated cornea within an animation application. (I use Maya in my research.)
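As a toy sketch of the idea (NumPy rather than Maya, with all numbers invented): start from a smooth spherical cap, then perturb it with a central cone-shaped bulge plus slope irregularities. In the real workflow the irregularities would come from an ophthalmologist’s topographic map rather than random noise.

    import numpy as np

    n = 64
    u = np.linspace(-1, 1, n)
    U, V = np.meshgrid(u, u)                    # the plane facing the eye
    R = np.sqrt(U**2 + V**2)

    healthy = np.sqrt(np.clip(1.0 - R**2, 0, None))    # smooth spherical cap (height)
    cone_bulge = 0.15 * np.exp(-(R / 0.35)**2)         # central cone-like protrusion
    irregularity = 0.02 * np.random.default_rng(0).standard_normal((n, n))

    keratoconic = healthy + cone_bulge + irregularity
    # 'keratoconic' is a height field that a modeling or animation package could
    # use as the refracting surface in a simulation of the diseased eye.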

The effects of a scleral lens laid over the cornea can also be simulated. Animators routinely model transparent surfaces and fluids, and can very carefully control the effects of lights going through surfaces and fluids. This means a program like Maya can model both the disease and the treatment. This is very exciting.

More on this next time…


March 12, 2010  4:49 AM

Text display for folks with kaleidoscope vision, part 2.



Posted by: Roger King
3D animation, 3D modeling, Maya

One of the things I teach is 3D animation (using Autodesk Maya). I also happen to have had cornea transplant surgery in both of my eyes, as a result of a degenerative disease that caused my corneas to thin gradually, thereby losing their structural integrity. The corneas, by the way, are the clear outer surface of the eye.

This is the second blog posting in a series relating to the use of modeling and animation software to develop technology to assist folks with limited vision caused by deformed corneas.

Bad corneas.

The problem with my corneas was that they were not smoothly curved. Why? Because my corneas had grown thin, and their slope varied somewhat erratically. In other words, my corneas were not smooth like a basketball. There were bumps and folds throughout their surfaces. Thus, light passing through them wasn’t properly focused, the way it would be if it passed through perfectly spherical corneas.

Since my corneas didn’t do their job of pre-focusing light, my lenses could not do their jobs by completing the focusing of light. So, the world was choppy and filled with multiple, overlapping images. It was like looking through a kaleidoscope. The disease is called keratoconus.

An idea from the world of 3D animation.

In my research at the University of Colorado, I have been looking into the following approach. The overall idea is to create an inverted, deformed view that compensates for the distortion caused by a person’s irregularly shaped cornea.

The first step is to use Maya to create a translucent hemisphere with varying slopes, thus simulating the effects of keratoconus. The hemisphere would be created by inputting a map of the varying slope of the user’s cornea. These maps can be quickly and cheaply made by using equipment commonly found in the offices of ophthalmologists.

The second step involves automatically creating a compensating view, by making use of deformer primitives in Maya. The way in which the deformer primitives are used to create the compensating view would be calculated from the topographical map of the diseased cornea. The view would at least partially undo the deformations caused by the diseased cornea.
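Here is a rough sketch of the compensating-view idea, in NumPy rather than Maya deformers: if we can estimate how the diseased cornea displaces each point of the viewed material, we pre-shift the material in the opposite direction so the two distortions roughly cancel. The displacement field here is a placeholder; in the actual scheme it would be derived from the patient’s corneal topography.

    import numpy as np

    def compensating_view(image, dx, dy):
        """Pre-warp 'image' by the inverse of the estimated per-pixel distortion."""
        h, w = image.shape
        ys, xs = np.mgrid[0:h, 0:w]
        src_y = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
        src_x = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
        return image[src_y, src_x]

    # image:   a grayscale bitmap of the text or web page being viewed
    # dx, dy:  the distortion estimated from the corneal map (same shape as image)
    # pre_warped = compensating_view(image, dx, dy)
    # Viewed through the bad cornea, 'pre_warped' should look closer to normal.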

In the third step, the individual with a deformed cornea could look through this software-generated view while looking at written text, webpages, images, and the like. This would be done by moving the material to be viewed into a Maya scene, and then rendering it from the perspective of someone looking through this compensating view. The user would look at this distorted version of the visual material, which for the user with the bad cornea, would look better than the original.

Another approach.

Rather than trying to compute the shape of a compensating, deformed view and then building this with Maya, a different approach involves allowing a user with deformed corneas to interactively deform their view by using Maya deformers.

First, whatever material is being viewed would be read into the animator’s design window in Maya.

Second, the user could manipulate the view until the material being viewed becomes viewable. This distorted view would then be used to create a deformation template that could be applied to any material that the user needs to view.

Again, in the third step, the material would be read into a Maya scene, deformed, and then rendered for the user to view.

And another approach.

There is a serious problem that complicates these two approaches, and that is that a weak cornea is, by its very nature, unstable, and so the deformities caused by it shift constantly.

So, another way to attack the problem involves simulating the effect of placing a large “scleral” lens that straddles most of the eye. We will look at this idea in the next posting of this blog…


February 25, 2010  10:25 PM

Text display for folks with kaleidoscope vision, part 1.



Posted by: Roger King
3D animation, 3D modeling, limited vision, Maya, rich internet apps, Rich Web Apps, Text, Web 2.0, Web development

I did a series of two blog postings several weeks ago on accommodating people with limited vision. It was motivated by the fact that I have had cornea transplants and cataract surgery, and have spent many years with limited vision.

One of my motivations for looking at vision problems has to do with developing technology that can help people with limited vision make full use of the Web.

Distorted vision.

This time, I’d like to look at an issue that is specific to people with vision problems that cause angular distortion, as opposed to vision that is very unfocused or opaque. This was my problem. Before I had cornea transplants, my vision was perfectly clear, if somewhat unfocused; the dominant symptom was that the world around me was fragmented into overlapping, broken images. People who have had laser surgery to fix their nearsightedness, but where too much of the cornea was shaved off, can have similar symptoms.

Why are the corneas so critical to vision? The corneas, which are the clear outer surface of the eye, prefocus light for the lens. If the corneas don’t do their job right, the lenses cannot do their job.

It’s called Keratoconus.

My corneas thinned as I aged, until they were so thin they lost their structural integrity. If you look directly into the eyes of a person, you could consider the top-to-bottom axis to be “u” and the side-to-side axis to be “v”. The front-to-back axis could be considered “w”. Imagine examining someone’s corneas, in particular, looking up and down the u axis, and right and left across the v axis. If that person has normal corneas, the corneas have a smooth, spherical slope into the w axis. My disease caused the slope of my corneas to vary significantly at various points across both the u and v axes. This caused light going through my corneas to be refracted at widely different angles. This created a sort of kaleidoscope effect.

The challenge for someone with kaleidoscope vision is to extract an accurate mental image from a shattered view of the world. This disease, by the way, is called Keratoconus, which comes from Greek roots meaning “cone-shaped cornea”. The name comes from one of the primary symptoms used to make a diagnosis: super-thin corneas tend to get pushed outward by the center of the eyeball, turning the cornea from a basketball to a football (or cone) shape.

The parallel between Keratoconus and 3D model deformation.

I teach an introduction to 3D animation class, and years ago, I noticed that some of the “deformer” effects available in Autodesk Maya (the gold standard in 3D animation) could be used to simulate the distortions caused by my eye disease. As part of my research at my university, I’ve been experimenting with using deformation effects available in Maya to compensate for the distortion caused by Keratoconus.

A couple of examples.

Below are two sets of images. The two with red characters on a white background contain the integer 7, showing a common effect of keratoconus: multiple, overlapping images. The two with yellow characters on black backgrounds are the word “cow”, showing another common sort of distortion; in particular, one shows a horizontal distortion, and the other, a vertical distortion.



In the next posting of this blog, I will describe some of the deformation effects in Maya, and how they can be used to actually compensate for the problem caused by Keratoconus.


February 17, 2010  12:38 AM

The Parallel Worlds of Media Databases and Media Metadata



Posted by: Roger King
3D modeling, blob data, continuous data, databases, MODS, Multimedia, namespaces, SQL, tagging, Text, the Metadata Object Description Schema, Video

Searching traditional business data: straight-forward.

Managing advanced forms of media, such as images, sound, video, natural language text, and animated models has been discussed a number of times in this blog in the past.  Traditional information systems, such as relational databases, have been engineered largely to handle the sorts of data we have in business applications, primarily simple numeric and character string data.  To the SQL database programmer, the nice part is that the data speaks for itself.  If a field is called Name, and the value is Buzz King, the semantics of “Buzz King” is pretty obvious, and it can be processed in a largely automatic fashion.  The same goes for a field called Age, with a value of “97”.

Searching advanced media: far, far more difficult.

But modern media is far more complex than this.  “Blob” data like images, and continuous data, like sound, video, and natural language text, are very difficult to search and interpret automatically.  There are two approaches that have been taken to resolve this dilemma.

Tagging: the simple approach.

The first is tagging.  Descriptive terms, often taken from large, shared vocabularies, are attached to pieces of media.  These vocabularies can be very domain-specific, dedicated to areas like medicine, law, and engineering.

Intelligent processing software: the second approach.

The second technique is the automatic processing of pieces of media using image processing, natural language, and other highly intelligent software.  These applications are very sophisticated and understood only by experts.  And, these applications often demand a lot of processing time, and this makes bulk processing impossible. It’s also true that the results can be haphazard.  Some pieces of media can be interpreted precisely, others not so precisely – and dramatic mistakes are frequent.  A tennis court might be mistaken for an airplane runway.  There’s a huge trust factor involved in cranking up image or sound processing software or natural language software.  

Often, we can provide feedback so that these applications can learn, over time, the way we want media to be interpreted.  We can help the software learn the difference between a tennis player and a member of a ground crew on a small runway. All of this is hugely expensive, in terms of the cost of developing the software, and in terms of the physical resources needed to run the software.

A middle ground?  Not really.

So, is there some middle ground?  Something simple, yet more “intelligent”?  Yes, and the answer is to take a sophisticated approach to what otherwise might be very simple tagging techniques.  However, the core problem with tagging remains: we search and process tags – and not the actual data.  It is an indirect, but fast process.  The goal is to come as close as we can to simulating the results of such things as image processing, but to do it with a simple, yet comprehensive, accurate tag-based technology.

We’ve looked at some of the solutions that have been proposed.  They include Dublin Core, MODS, and MPEG-7.  The first is very simplistic.  The second is more sophisticated, in that the terminology used is broader and far more precise.  The third is very aggressive in that it supports the complex structuring of tag data elements.  

So, what are we really doing?

In essence, we build a hierarchy of metadata and then instantiate it for every piece of media we want to catalogue and later search.  What we are doing is creating a parallel database, one where every piece of blob or continuous data is accompanied by a possibly very large tree of structured tagging information.  The parallel database has its own schema and an instance of it is created for every piece of media in the original media database.
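Here is a minimal sketch of the parallel-database idea, run through SQLite; the schema and the tag terms are invented for illustration (this is not MODS or MPEG-7, just the shape of the approach).

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE media (media_id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE metadata (
            node_id   INTEGER PRIMARY KEY,
            media_id  INTEGER REFERENCES media(media_id),
            parent_id INTEGER REFERENCES metadata(node_id),  -- NULL at the root
            element   TEXT,
            value     TEXT
        );
    """)
    db.execute("INSERT INTO media VALUES (1, 'clips/match_point.mp4')")
    db.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)", [
        (1, 1, None, "scene",    None),
        (2, 1, 1,    "setting",  "tennis court"),
        (3, 1, 1,    "lighting", "daylight"),
    ])

    # Queries run against the metadata tree, never against the video bytes.
    rows = db.execute("""
        SELECT m.uri FROM media m JOIN metadata d ON m.media_id = d.media_id
        WHERE d.element = 'setting' AND d.value = 'tennis court'
    """).fetchall()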

The end result?  Instead of creating some sort of media-centric query language, like an SQL-for-video, we give up on trying to search the media database itself.  The query language remains largely ignorant of the nature of blob and continuous media.  We can continue to refine and expand the schema of the parallel database until search results are satisfactory.

More later…


February 6, 2010  3:17 AM

The five What? dimensions of mega apps.



Posted by: Roger King
3D animation, advanced Web apps, Multimedia, Semantic Web, Web 2.0, Web 3.0, Web development

This blog is (for the most part) dedicated to advanced web research and development, in particular Web 2.0/3.0 and Semantic Web efforts. Please peruse earlier postings for lots of material on both topics.

One thing we have discussed is the “mega apps” used by video, audio, and animation professionals.

These applications are highly sophisticated in their capabilities and vast in the size of their interfaces, and because of these two things, it is very, very difficult to learn to use them by experimentation. You have to be trained, and professionals spend years learning to use them properly and creatively. These applications can be viewed as the ultimate challenge for the next generation of web application development. Can media and animation applications be deployed as web applications that would actually perform well enough? It would be a great relief to many artists and media professionals to not have to install and maintain their mega apps on their desktop machines.

But now, let’s look at what makes these applications so intimidating. Consider five questions that relate to complex applications.

  1. What does the application do?

How long it takes to answer this question and how much specialized terminology is sprinkled throughout the answer are very telling. Can you say what a 3D application does? How would you explain it to a non-animation professional?

  2. What does the GUI look like to the user?

The depth of this answer – in a literal sense – is also very telling. Mega apps have deeply layered interfaces, and at any given time, only a fraction of their capabilities are visible. Learning to peel back the layers and master these applications can be extremely frustrating and typically takes professional training itself.

  3. What algorithms are used internally by the application?

Animation, special effects, video editing, and other media applications do, in a number of difficult-to-learn steps, things that could take days or weeks or months to do manually. Figuring out how the application gets its job done and how that relates to the tasks a media professional wishes to carry out – that is what separates gifted artists from gifted artists who also have a gift for software.

  4. What is the process for using the application?

A typical mega application is so complex that there is no way that its two-dimensional interface on a single bit mapped display can walk the user through the suggested processes for using it. And, there is a creative element that is discovered in real time by the artist, anyway.

  5. What is the application’s role in media workflow?

In many media environments, like animation, a wide variety of mega apps must be used in an intricate, iterative, and ever-changing workflow. New apps come along all the time, old ones are extended and reengineered, and specialized products that hook up various mega apps into cohesive workflows are becoming a huge business in themselves.

So, what’s the lesson?

If you want to get a feeling for just how high the bar has been set for the next generation of web apps, think about your favorite desktop mega app (or the downsized, but still sophisticated version or competing product you might be using), and think about these five questions. Now think about the web applications you actually use. Note taking applications? List making applications? Mailers? Messaging apps? See how far we are from making the new web truly compete with the desktop world? Are we really going to switch to net computers that contain only highly limited operating systems and access applications only over the web?

Hmm.

