I teach an introductory 3D animation course at my university. The primary application we use is Autodesk Maya, the gold standard of 3D modeling and animation applications. Maya is a vast application that can take many years to learn and that requires a lot of skill to use effectively. So trying to learn Maya in a semester isn’t realistic.
As it turns out, the do-it-in-a-semester approach is even more challenging, because creating a complete animated video, something my students are required to do, involves other complex applications. At a minimum, scenes rendered into video with Maya have to be edited in a video editing application and synchronized with a soundtrack.
The complexities of workflow in animation projects are far more extensive than this, and we will look at this in a future posting of this blog.
Today, we focus on the favored technology for providing application-to-application workflow: plugins. Basically, the world of animation and media apps is exploding with them. Here’s what the landscape looks like.
Making applications fragile: plugins.
Since they might focus much of their career on a single application, animation, photography, video, and audio professionals can panic at the thought of any substantive change in the application that drives their daily work. One major reason is workflow. Even minor updates to an application can break plugins, because vendors who sell plugins that let one application import from and export to another often run far behind the version schedules of the two applications. This is true even when the plugin vendor sells one of the two applications. Plugins also tend to lag in bug fixes, as plugin vendors struggle to understand the internal workings of applications they do not sell.
The one-of-a-kind nature of app-to-app communication.
Mastering each plugin can be a huge investment in itself, because complex forms of data are typically being passed between one app and another. Interfaces between applications tend to be one-of-a-kind, and only partly documented. Plugins can also demand that users work out complex configurations.
Moving 3D models between applications, or moving scenes between a modeling application and a renderer, can lead to nasty results if not done very painstakingly. Often, some of the information that the user knows must be configured doesn’t seem to appear in the menus or palettes of the plugin. Plugins for two different renderers may also conflict with each other.
The various kinds of plugins.
Plugins also come in many different flavors, and they interact with applications in very different ways. Here are a handful of the common sorts:
1. external app-to-app connections: import/export of minimally-compatible standards.
These are very common, and typically don’t require that any external plugin be installed. They require that information (such as models or scenes or complete videos) be exported, stored in the file system of the computer, and then separately imported into the second application. Many 3D modeling applications do this by using the OBJ format. But this sort of import/export tends to be highly error prone, as various pieces of information get dropped or damaged.
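To see why OBJ round-trips drop information, it helps to look at how little the format actually carries: it is a plain-text list of vertices and faces, with no standard slots for rigging, animation, lights, or render settings. Here is a minimal, illustrative sketch of reading OBJ geometry (the file contents are invented for the example):

```python
# Minimal sketch of reading geometry from OBJ-format text.
# OBJ carries vertices, UVs, normals, and faces -- but no rigging, animation,
# lights, or render settings, which is why round-trips through OBJ lose data.

OBJ_TEXT = """\
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""

def parse_obj(text):
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":    # vertex position: x y z
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":  # face: 1-based vertex indices (v or v/vt/vn)
            faces.append(tuple(int(p.split("/")[0]) for p in parts[1:]))
    return vertices, faces

verts, faces = parse_obj(OBJ_TEXT)
print(len(verts), faces)  # 3 [(1, 2, 3)]
```

Everything an animated scene contains beyond this simple geometry has to be reconstructed, by hand, on the importing side.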
2. app-to-app connections: richer standards for saving files.
There are a number of popular and emerging standards that are much richer, however, such as the FBX format (from Autodesk, the folks who sell Maya), which can accurately store a large amount of information about an animated scene. Sometimes a plugin must be installed on one app or the other, but many cutting edge applications come right out of the box supporting richer data standards.
Another example is the Collada XML exchange format that can be used to insert models into game engines so they can be used in interactive games.
3. Bringing one app into the GUI of another app.
These plugins can greatly ease the hassle of moving data around. The drawing program Google SketchUp can have third-party plugins installed that allow it to render to external applications. This sort of plugin creates a menu or palette in one or the other application. (I happen to use the IMSI renderer with SketchUp.)
Plugins can, in this way, effectively extend the capabilities of one application with some or all of the capabilities of a second app. The 3D application Vue, which is used for building environments and settings (like deserts or oceans), can be used from within Maya by using a plugin supplied by the Vue folks.
4. Plugins that allow the re-synching of assets from an outside application.
Poser, the popular application for modeling humans and critters, can be used to import models into Maya; not only is the Maya GUI extended with Poser options, but Poser can be told periodically (from within Maya) to re-sync models.
5. Extracting focused assets from one application and importing them into another.
Sometimes the focus of a plugin is to extract highly specialized pieces of information that don’t make up any standard piece of media, like models or images or video. For example, Smart Sound provides a plugin for their application SonicFire so that sound markers from Apple’s Final Cut video editor can be used within SonicFire to time a soundtrack to a video created with Final Cut.
6. The hardwired connection between Adobe’s Soundbooth and Premiere Pro.
These perhaps hardly qualify as plugins at all, but Adobe has created a standardized format for representing audio so that their sound editor Soundbooth and their video editor Premiere Pro can work together fluidly. Users end up adopting the Adobe format as a primary audio format within their overall workflow.
We will look more at media workflow…
I took an informal survey of people I know, asking them what software they use to collect, organize, and share ideas – and what I got back was a surprisingly broad set of applications. I tend to know Mac folks, and so the list below is rather Mac-heavy.
I collected this list because that’s what the new world of web and desktop apps is all about: IDEAS.
Notebooks as a note taking paradigm. These applications use the metaphor of a physical notebook, often including covers, tables of contents, sections, and after-notes. Check out Notebook and NoteShare. This second product makes it very easy to share (for reading and writing) notebooks over the Web.
Buckets of notes as a note taking paradigm. These applications allow a user to break notes up into folders and subfolders. Check out Yojimbo and SOHO. Evernote is a combined web and desktop app.
Word processing apps. MS Word and Apple Pages, of course.
Networks as a note taking paradigm. VooDooPad claims to do this, but I can’t figure it out.
Organization enforcer apps. These applications embody specific organizational philosophies. One example is OmniFocus. It supports the Getting Things Done paradigm, which I don’t actually understand. There is a web app that does the same thing: Nozbe.
Outlines, task lists, and stickies apps. These simple idea applications are of course extremely popular. Take a look at rememberthemilk and zenbe lists. They are both web based. I hate stickies; they’re a nice way of simulating the mess you get with pieces of paper everywhere. A less annoying variation on this is Edgies; your notes end up stuck to the side of your display and pop out when you click on them.
Email, messaging, and video conferencing. These applications enable dynamic, real-time communication, and are often used to coordinate team activities. Sometimes, you can record interchanges for later viewing and posting. Gmail provides a nice combination of email, messaging, and video messaging capabilities. Skype is a free conferencing app. Commercial ones are WebEx and Adobe Connect; these are subscription services – and they are not cheap.
Photo collections, slideshows, & videos. A lot of photo management apps support the posting of slideshows on the Web. Adobe Lightroom and Apple Aperture are good ones. YouTube is of course widely popular.
Multimedia presentations. These sorts of applications are only now emerging. One powerful (but relatively low-level and still fairly academic) approach to multimedia presentation development is the XML language SMIL; it allows you to program sophisticated presentations involving audio, video, images, text, and animation, and it provides powerful controls for interrelating them in 2-space and over time. Of course, there is PowerPoint and Apple Keynote; they are arguably multimedia, but multimedia is an afterthought with them, and so they aren’t really designed to present truly integrated, multimedia presentations.
Blogs and RSS feeds. Check out this highly informative blog. (It’s mine and you are about to loop.)
Social websites. Yep, Facebook.
Stories, scripts, and storyboards. The Montage movie script writing app and the StoryMill writing app are sold by Mariner Software. Another popular writer’s app is Scrivener. Probably the most popular movie script writing software is Final Draft. Toon Boom sells a very fancy storyboarding application.
Diaries & journals. Two popular ones are Mariner Software’s MacJournal and Journler.
Spreadsheets & simple database systems. We all know about spreadsheet apps, but there is a simple desktop database app from Apple, called Bento.
Mind-mapping. Most folks know about MindManager. Another one is Curio. There is also one with a pretentious name: TheBrain.
Wikis, forums, and content management sites. You can download free server-based apps to install on your server from bitnami, and there are websites that provide free services, in case you don’t want to spend countless hours figuring out how to build a wiki or forum site, or (Heaven help you) a Drupal or Joomla content management system. Google Sites is very good.
Screen & audio capture. I like Camtasia for screen and audio capture. But there are others, like Screenium. Good audio capture apps are WireTap and the free Audacity. There are lots of higher-end audio apps, like Peak and GarageBand. You don’t need me to give you a URL for GarageBand.
Finally, I note that there are many applications, including some of the ones above, that provide multiple ways of managing ideas.
Managing Animation Assets.
One of the most challenging applications of next-generation web technology is the support of sites that provide media assets that are used in 3D animation. Animation applications are used in television and movie productions, training and informational videos, product design and CAD, magazine ads, and on websites. Animated projects can be extremely expensive to develop, demanding highly skilled and experienced artists who are familiar with complex applications for modeling, animating, rendering, video and sound editing, storyboarding, and compositing.
This has created a rapidly-growing, high-dollar market for animation assets. There are a handful of sites that sell extensive libraries of animation assets, in particular 3D models of characters, buildings, vehicles, weapons, animals, plants, cityscapes, and the like. These models are used by a wide class of individuals, including professional animators, architects who want to place their designs in attractive surroundings, medical and scientific writers, and hobbyists.
These assets can vary from free to costing several hundred dollars. They can be rudimentary, or extraordinarily detailed and lifelike.
But it can be very problematic to search these sites. Why?
Lack of tagging standards.
There are standards for tagging image and video data, publications, and many other web-based resources. These include MPEG-7, the Dublin Core, and the Metadata Object Description Schema, which have been discussed previously in this blog. But when it comes to complex forms of information, such as animation assets, it is a free-for-all, and the searching process is manual, highly iterative, and painstaking, even if you already know what sites are likely to have content you are interested in.
Complexity of evaluating an asset.
Another problem is that it is time consuming to evaluate even a single asset once it has been identified. Models are highly complex, and they have to be read into an animation application to be judged. Often, you can’t even download a model without buying it.
Interdependency among collections of assets.
Assets like animation models typically must be used in combination with other models and elements of animated scenes. There can be many conflicts between assets, such as highly varying vertex and line densities, differing artistic styles, and mismatched materials and textures. Putting together a reasonably matched set of assets can be extraordinarily time-consuming.
Complex and error-prone import/export processes.
There are dozens of model formats in common use by modeling and animation applications, along with a handful of standards with names like OBJ, FBX, and Collada. Translating between the many proprietary and standardized formats can be very error-prone (with information being lost or changed), and the process often demands that the animator have access to applications that he or she doesn’t own. These applications can easily cost thousands of dollars and take years to master, so a given animator is likely to actively use only a small set of them, often only one.
The challenge of Web 3.0.
Some people define Web 3.0 as the successor to Web 2.0 technology, which is meant to produce web applications that approach desktop applications in their interactive performance. Web 3.0, some say, would extend this technology to web apps that make use of advanced media in their interfaces and/or provide access to large media bases. Perhaps the biggest challenge facing Web 3.0 developers would be to attack the problem of animation assets, in particular, tagging, organizing, interrelating, searching, evaluating, and transforming them.
More on this in future entries of this blog…
In the last posting, we looked at the various ways in which folks (like me) with limited vision who are using the Web can help themselves and be helped by others. Today, we look at another issue, one that at first glance, seems to not be directly related to the Web: electronic book readers. However, portable devices are becoming extensions of the Web and the Internet; as more material becomes distributed as electronic downloads, this issue is actually very relevant indeed.
There is a controversy going on over Amazon’s Kindle reader, and whether it should be used by colleges and universities as a cheaper way to distribute textbooks. The issue has to do with the difficulty that blind students and students with limited vision have when using the Kindle. Objections to its use might be pressuring Amazon to support better text-reading technology, adjustable/large fonts, and audible menus.
I do have to say that the look and feel of readers like the Kindle does provide many folks with limited vision a far more accessible reading source than traditional computer screens.
But these devices could make life far easier for folks with limited vision. Here are some thoughts:
The limitations of paper.
There are several problems with paper. A fixed font size is one.
So is the tendency for (especially cheap paperback) books to have lines of text packed tightly together, which makes it hard to track lines across the page. And people who have distorted vision because of diseased corneas and other conditions find that lines of text lie on top of each other; when lines are separated by sufficient blank space, and when fonts are reasonably large, distorted vision becomes far less of a problem.
Another problem has to do with the cheap, low contrast paper that is used in paperbacks and professional books. Almost ironically, slick, expensive paper can be a problem too, because its reflective nature increases distortion.
We all have (or will have) limited vision.
In truth, electronic book readers can benefit a large chunk of the reading population. If you live long enough, you are quite likely to develop some sort of vision problem. Ever noticed that ophthalmologist and optometrist offices are often filled with older folks? Down the road a ways, that could be you with nearsighted vision or an astigmatism that can’t be fully corrected with glasses or contacts, or with developing cataracts or macular degeneration.
In fact, somewhere around the age of forty, most of us start needing reading glasses, and the fixed focal length of aging, stiff lenses starts making the process of reading somewhat less fun.
Cheap books are often hard to open flat, making magnifying glasses harder to use, as well.
Gracefully adaptive is the answer.
Many of us with limited vision routinely change font sizes and styles on the electronic documents that we must read. We monkey with browser settings.
Supporting highly flexible settings, rather than just a few alternative font sizes, would make electronic book readers far more adaptable and usable. Being able to increase the space between lines and to change font styles would be a great help.
Being able to turn audio reading on and off fluidly and being able to speed audio reading up for skimming would be great, too.
Also, being able to render images with varying quality would help individuals make that trade-off between visibility and rendering time. Many of us would gladly wait a few seconds to have a clear, sharp image pop up.
The bottom line.
Good/bad vision is a spectrum, with only very young and healthy people being at the far “good” end. Electronic book readers offer an incredible opportunity to make reading far easier for countless people. All we have to do is engineer adaptability into the devices.
I’ve been a little remiss lately in posting to this blog. I try to do it every week, but I’ve recently had one in a long series of eye surgeries, and I have been having trouble reading printed material and computer displays. This blog is about advanced Web technology, and for this posting, I’d like to look at the plight of folks with limited vision who are trying to be part of the web world.
Over the past few years, I have had cornea transplants. The cornea is the clear outer surface of the eye; it pre-focuses light for the lens. I have also had cataract surgery, which replaces diseased natural lenses with plastic ones. I am also super-super-nearsighted and have extreme astigmatism in both eyes.
Limited vision and the Web.
Here are some of the things that have made a difference or hold promise for the future:
The basic approach: lower screen resolutions.
This is what a lot of us already do. I use a 30 inch display and set the pixel density far lower than its maximum. I crank up the brightness. This is something that an individual can do on their own, assuming they can afford an expensive display.
The application approach: enlarging fonts on GUIs and in viewing windows.
This is something else I do. It’s hard to enlarge fonts in the GUIs of applications (like Microsoft Word), but many applications allow you to enlarge fonts inside the main work window, without enlarging fonts in the document (or other artifact) that is being created.
The browser approach: browser plugins for enlarging areas of webpages.
There are a couple of these out. I have a sister who is a science writer, and she pointed me toward http://lab.arc90.com/experiments/readability/ ( I have since learned that there are others). This plugin works with Firefox. It extracts text from webpages and blows it up.
The webpage standards approach:
screen readers for text, graphics, and markups,
the use of high contrast colors and special style sheets,
alternative viewing pages and search pages.
Much of this is actively underway, but it has been very slow to be deployed in the real world. This approach calls for a lot of cooperation between web developers, web standards folks, and society in general to make a significant investment in developing effective technology, in particular screen readers. These readers allow people with limited vision to hear written text, and descriptions of graphic images and internal pieces of webpages. Alternative pages for use by folks with limited vision might not be cost effective for a business to construct, but there is a precedent: many companies (like Amazon) have alternative pages for extra-small devices, and ironically, these can be easier for folks like me to use because they get rid of many of the noisy images, boxes, lines, and unnecessary text.
Online volunteers: the social approach.
This is something that is very promising. If volunteers can make themselves available via phone or Skype when a person with limited vision needs help, they can quickly pull up the same webpages and walk the person through the process of using them. Over the years, I have had my wife and kids do this for me, often when we are not even in the same building.
So, what’s the bottom line?
Well, it’s very hard for people with limited vision to use the Web, and the growing use of video and images and animations in webpages (something that is arriving hand-in-hand with Web 2.0/3.0) is making it all the worse.
… Consider volunteering to be available on the phone or on Skype for a friend or relative or coworker who might need occasional help. And if you are a Web developer or operate a Web-based business or organization, consider the PR coup that would come with being at the forefront of making your website more accessible!
This blog is dedicated to the Semantic Web and Web 2.0/3.0 technology. In this posting, we consider privacy and the Semantic Web.
The Traveler had it easy.
There is a series of three science fiction novels by John Twelve Hawks. They concern a “Traveler” who battles the “Vast Machine”, which is a global grid of security cameras, governmental and corporate databases, and computers that collect information on people, track them, and manipulate society. They are very popular novels.
But these books are not all that imaginative.
Why not? If and when the Semantic Web ever emerges (please see previous postings of this blog), there will be a lot more than security camera footage and passive database systems out there. In his books, Twelve Hawks describes programmers working for the Vast Machine who pull information out of databases and plant information in databases, and who somehow locate and integrate information from many sources. It’s not clear how they do it.
The problem is tractability. Extracting the meaning of data (its “semantics”) is extremely difficult, and given today’s Web, it is a highly manual, painstaking, and ultimately intractable problem. Twelve Hawks’ Vast Machine isn’t all that much of a threat.
Consider, however, the emerging Semantic Web.
The whole idea of the Semantic Web, on the other hand, is to make databases proactive, to let them announce their content by using globally accepted standards. In this blog, we have looked at one proposed standard, called RDF, which is based on “triples” that interrelate information, and a Web-hopping query language called SPARQL that can concatenate triples that define information at diverse, independently-created websites – thus inferring new information. We’ve looked at the beginnings of this technology as it is taking form on the Web.
In other words, it might not be long at all before the least of our problems would be dastardly hackers who break into databases and pluck information – because the finding, integrating, and interpreting of data from highly divergent sources will become, in large part, automatic.
It will make the intractable quite tractable.
Okay, I confess…
It is not as simple as that, of course, and I am grossly overstating the danger. Presumably, private databases belonging to corporations and governments will not be loaded up with this sort of semantic metadata and placed on the open Web. And the sorts of inferences that can be made by unifying metadata from multiple sites will be fairly low-level, leaving a lot of difficult work for any Vast Machine that wants to manipulate our every move and thought.
But it is true that the potential for misuse will increase sharply. There will indeed be many isolated instances where innocently posted information from two or more sites will be automatically linked together because of uniformly-specified metadata. If one triple at one site has data marked up as “People OWN Kinds-of-StampCollections”, and another site says that “Kinds-of-StampCollections HAVE Certain-Values”, a thief who knows little about philately might learn that Bob owns stamps from the Southern Confederacy, and that stamps from the Southern Confederacy are worth hundreds of thousands of dollars…
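The stamp-collection scenario is, at bottom, a simple join over triples published at different sites. Here is a toy sketch of that inference in plain Python (a stand-in for what a SPARQL engine would do over real RDF; the triples are invented for the example):

```python
# Toy sketch of joining RDF-style triples published independently at two
# sites. A real system would use RDF and SPARQL; this just shows the join.

site_a = [("Bob", "OWNS", "ConfederateStampCollection")]
site_b = [("ConfederateStampCollection", "HAS_VALUE",
           "hundreds of thousands of dollars")]

def join(triples_1, triples_2):
    """Chain any triple whose object matches another triple's subject."""
    inferred = []
    for s1, p1, o1 in triples_1:
        for s2, p2, o2 in triples_2:
            if o1 == s2:
                inferred.append((s1, f"{p1}->{p2}", o2))
    return inferred

print(join(site_a, site_b))
# [('Bob', 'OWNS->HAS_VALUE', 'hundreds of thousands of dollars')]
```

Neither site published anything sensitive on its own; the new fact only appears when uniformly-specified metadata makes the join automatic.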
Just a thought for the next sci-fi writer.
Multimedia in computer science departments.
I teach in a computer science department, and in the previous posting of this blog, I argued that universities and colleges have been very slow to introduce basic animation skills into their curricula. In this posting, I argue that the same is true for basic media management skills, and that this is also a critical area of study for computer science students.
It’s part of a broader expansion of the discipline.
Well, for starters, the bounds of computer science have shifted and expanded greatly. It’s no longer just about techniques for building operating systems and compilers, formal specifications of algorithms and their running costs, and the like. Many of the old problems now have fairly settled and widely used solutions. We are increasingly focused on the development of web-resident information systems, the automation of web searches, the development of web services, network and database security, medical information systems, the modeling of complex 3D objects in engineering and entertainment, and the like – things that have been discussed in previous postings of this blog.
Another key area is the management of video, images, audio, animation, documents, and other advanced forms of media. These topics have also been discussed in this blog in the past.
Academics and the ignorance of real world tools.
It’s not that we academics don’t know that this is a critical area. The problem is that computer science faculty typically know little or nothing about large, commercial applications for creating, manipulating, and storing media, or about the emerging standards for formatting and tagging media.
But more significantly, there is a stuffy, longstanding belief on the part of computer science academics that teaching such practical things would turn our departments into trade schools, and that we teach “principles” and “formalisms”, and that we prepare students for the next fifty years, not the next five years.
Universities just have a lot of trouble evolving. We are big machines with tremendous inertia.
The necessary skills.
So what do students need to know? I admit that there is a broader question here. What is the right compromise between abstract, longstanding concepts and hands-on experience with real world tools? But surely, nobody thinks we should essentially ignore the enormous software technology base that is out there?
We cannot continue to turn out students who are only mildly aware of the vast sea of desktop and web applications for managing media; of database management tools for storing and searching media; of techniques for processing full text and natural language, compressing and cleaning audio and video, and editing sound and video; and of standards for formatting images, sound, video, and 2D/3D models.
You have to have some idea of what technology is out there if you are going to build the next version of that technology!
But maybe we’re doing too little too late.
Besides claiming that this knowledge is not “academic”, computer science departments claim that students pick this kind of stuff up on their own. This used to be a ridiculous claim. But in truth, energetic computing students do indeed pick this stuff up now, at least to some degree. This is really an indictment of academic computing: the Web has become a vast formal and informal learning ground, and it is eclipsing computer science departments to a large degree.
Where to go from here.
So, what’s the real point?
We need to train a new generation of faculty members, radically evolve our curriculums, build computing labs that are equipped with advanced media applications and storage managers that professors actually understand, and above all else, reevaluate our position in the learning world. Students are turning away from us and toward a vast array of video, textual, and audio learning tools that have exploded onto the Web.
What’s missing in computer science curriculums?
I teach in a computer science department, and one thing is painfully true: universities and colleges have been very slow to introduce basic animation skills into their curriculums. This is a big problem.
2D and 3D graphics and animation are popping up everywhere, and more and more, programmers discover they have to be part time artists. This is particularly true for developers of Web 2.0/3.0 apps. Web app developers find themselves using graphics tools to build both user interface controls and to create animated models. And, as applications that can convert 3D models and animations into lightweight renderings become more efficient and more powerful, Web app developers are having to add 3D tools to their quiver of arrows.
2D animation engines include Adobe Flash, Microsoft Silverlight, and HTML 5. The drag-and-drop application that generates Flash animation, Adobe Flash Builder (recently renamed from Flex Builder), along with Microsoft Blend, which generates Silverlight code, gives the programmer a way to develop animation without having to work with an artist’s interface. But in truth, drag-and-drop tools only get the programmer so far. In order to refine, extend, and debug interfaces, the programmer has to master the two XML languages used by Flash Builder and Blend (MXML and XAML, respectively) to define interface components and 3D models. The programmer also needs to be comfortable with the two languages that these XML specifications compile down to (ActionScript and C#), and that means learning their extensive animation capabilities.
And it’s not just small scale modeling and animation.
Sophisticated 2D and 3D animation is also confronting the young programmer. Game, feature film, animation short, training video, and TV show development are rapid growth areas. Interestingly, while powerful GUI-based applications like Autodesk Maya, Autodesk 3DS Max, Toon Boom Animate, Vue, and Poser are used largely by non-programmers, there is a critical niche for the programmer-animator. Scripting languages are used to perform many basic modeling and refinement tasks. Not to mention the fact that someone has to build these huge animation apps, and these folks, well, they’re programmers. The point is that it’s hard to build an application that creates things you do not understand.
The emergence of canned content and cheap animation apps.
There is also an explosion of applications that provide canned animation capabilities, and there are a growing number of websites that sell animation content. This makes it feasible for programmers to create basic animations for websites and desktop applications, without the need for full-blown animation artists. DAZ3d.com and contentparadise.com are two highly popular content sites. And sophisticated animation projects can be developed with applications that are cheap (or free). These include DAZ Studio, Blender, and Carrara.
The bigger picture: the boundaries between disciplines are breaking down.
Perhaps the most compelling reason for universities and colleges to start treating animation as a first class academic citizen is that the nature of computing itself is rapidly undergoing an expansion. Computer science graduates are finding jobs in the financial, communication, genetic engineering, mechanical and electrical engineering, alternative fuels, architecture, advertising, business, and medical industries – and all of these professional disciplines have substantive animation components. It’s the age of merging fields, with borders collapsing, and computing skills becoming necessary in almost all walks of life. As non-technical types must be able to do basic programming and software configuration tasks, programmers are learning that they need a non-programming area of expertise in order to stay competitive – and tossing animation skills into the pot is a sure plus.
What “declarative” really means
In programming languages, we use the word “declarative” to refer to a language that does not force a programmer to specify more sequencing information than is strictly necessary. The idea is for the program to tell the computer what needs to be done, and not precisely how to do it. Instead of an algorithm, we provide a static specification of what the result will look like. In an imperative (or non-declarative) language, the programmer might specify that an array is to be read from position 0 to position 99, and that at each position in the array, the value at that position is to be increased by 1. In a declarative language, the programmer might be able to simply state that every entry in the array is to be incremented by 1.
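The contrast can be sketched in a few lines of Python (an illustrative sketch only; the variable names are mine):

```python
# Imperative style: spell out the sequencing -- visit positions
# 0 through 99 one at a time and mutate each element in place.
values = list(range(100))
for i in range(len(values)):
    values[i] = values[i] + 1

# Declarative style: state *what* the result is -- every entry
# incremented by 1 -- and leave the iteration details to the language.
values2 = [v + 1 for v in list(range(100))]

print(values == values2)  # both approaches yield the same result
```

The declarative version describes the result; the imperative version prescribes the steps.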
But what does the word “declarative” really mean, English-wise? Well, it refers to the process of making a declaration, of making a formal statement about something.
Searching web-based media assets: today’s tools
What does this have to do with the Semantic Web and/or Web 3.0? That’s what this blog is dedicated to: next generation web technology.
A major growth area for the web will be applications that manage complex forms of media, and the automatic searching of blob and continuous media – images, video, sound, animation, 3D models, and mixed-mode media – will present a major challenge. Simply put, our best technology for making advanced forms of media searchable is tagging, and this low-level tool doesn’t come close to allowing us to search according to the true meaning of media assets. Searching for blob and continuous media is still painstaking and manual.
So, how could we make things like video and 3D models more searchable? How could we improve the search process? Two important technologies offer significant help. The first is more sophisticated, high-level, content-rich tagging protocols, such as MPEG-7. The second is image processing, which is actually a highly developed area, since the U.S. government has poured many millions of dollars into it over the past several decades. It’s also true that language processing tools have been used to parse and interpret textual descriptions of media, but this sort of freeform analysis is difficult to make accurate and predictable, given the extreme complexity and ambiguity of natural language. People write “stories” with language, and a long piece of text has to be read from beginning to end in order to understand it.
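To make the limits of plain tagging concrete, here is a minimal Python sketch of keyword-tag search (the asset names and tag vocabulary are invented for illustration). It can only match the literal tags an author happened to attach, never the true meaning of the media:

```python
# Hypothetical asset catalog: each media file carries a flat list of
# author-supplied tags. All names here are invented for illustration.
assets = {
    "hawk_flight.mb":   ["bird", "hawk", "3d-model", "animated"],
    "eagle_soar.mp4":   ["eagle", "video", "nature"],
    "city_flyover.mov": ["aerial", "video", "city"],
}

def search_by_tags(catalog, query_tags):
    """Return the names of assets whose tag list contains every queried tag."""
    return [name for name, tags in catalog.items()
            if all(q in tags for q in query_tags)]

# A query like "bird of prey in flight" must be reduced to literal tags;
# an asset tagged "eagle" but never "bird" is silently missed.
print(search_by_tags(assets, ["bird"]))  # finds only the hawk model
```

The eagle video is semantically a perfect match for “bird,” but flat tagging has no way to know that – which is exactly the gap that richer protocols like MPEG-7 and image processing aim to close.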
What about using notes?
But perhaps the future lies in a sort of compromise technology, one in which tagging information created with tools like MPEG-7, combined with image processing and/or natural language processing, is used to cut the search space from many thousands of media artifacts to something that can be processed interactively by humans. This is in contrast to downloading potentially huge files and viewing them in real time. Even downloading small video and audio clips and low-pixel-count preview images can overwhelm the average interactive web user, and these often don’t give an accurate vision of what the full pieces of media contain.
The answer might lie in highly organized “notes”, written with note-taking applications. These applications provide quick, compact, and highly visual ways for people to document their thoughts. They range from lists to outlines to hierarchically structured blocks of text to diagrammatic “mind-maps”. Note-taking applications often support video, images, and sound; in a way that might seem ironic, an individual could create mini multimedia artifacts to facilitate the searching of large multimedia assets. But in truth, this could be a very powerful technique – because one of the primary attributes of most note-taking applications is that they provide “at-a-glance” semantics. In other words, used right, a note or a list or a mind-map is captured in a single screen image. And when users build more complex notes, these applications typically encourage very top-down structures: notebooks have tables of contents; hierarchical notes have root nodes; mind-maps are expandable.
And above all else, there is something about the note-taking philosophy that encourages compactness. In other words, notes are, in a sense, declarative. A note makes a quick, firm statement.
This blog is dedicated to the discussion of emerging web technologies. Today, we look at the rapidly growing world of media applications and their impact on the Semantic Web.
The problem of searching for media assets.
We’ve already looked at advanced media, in particular video, audio, and animation data, in previous blog postings. In particular, we’ve looked at the subtle and complex nature of media asset semantics. We’ve seen that interpreting a piece of video, for example, is far, far more difficult than interpreting an integer or character field. Since the goal of the Semantic Web effort is to make the searching of the web highly automated, advanced media is becoming a huge and critical research and development focus for the builders of next-generation web development applications.
Just how do we provide an environment where media assets can be searched in a mostly automatic fashion, so that a human does not have to painfully paw through hundreds or thousands (or millions) of video chunks to find the right one? We’ve looked at emerging technologies for marking up advanced media information, and for making it usable in a variety of web applications. We’ve also looked at the dramatic challenge presented by mega apps to would-be users; the interfaces to these applications are truly massive and cannot convey to the user the way in which they are meant to be used.
The problem of proprietary formats.
One specific and very difficult problem is the massive heterogeneity, not just of media formats, compression technologies, and container technologies, but of the applications themselves. If we are going to automate the searching of complex modeling, video, audio, and other media assets, we’re going to have to address a key question: since many media apps make use of their own proprietary data formats, how are we going to provide automated ways of searching media assets that are stored in these formats?
The problem of highly imperfect generic formats.
There are indeed many existing, as well as soon-to-emerge, standards for importing and exporting data between powerful media applications, but transformations in and out of these formats are often “lossy”, in that information is lost or changed. In fact, locating and downloading assets that are in supposedly-generic form is often very frustrating, because these assets end up not performing well. They can be difficult to edit and reuse. 3D animation models regularly blow up when animators try to import them into animation applications and then manipulate them. A hawk may look like a hawk until you try to render it with its wings flapping, and suddenly it’s a blob of geometric garbage.
One possible direction.
So, what do we do about the fact that many media assets must be manipulated by the original applications that created them? How can we facilitate reuse? It’s extremely unrealistic to expect users to master perhaps dozens of video or audio or animation applications. Filtering assets according to their file extensions is a good idea, and it is a well-established practice.
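That kind of extension filtering can be sketched in a few lines of Python (the file names and the set of “usable” extensions are assumptions standing in for a real inventory of the user’s installed applications):

```python
from pathlib import Path

# Hypothetical list of downloaded asset files; all names are invented.
downloads = ["hawk.obj", "walkcycle.bvh", "score.wav", "hawk.mb", "notes.txt"]

# Extensions the user's installed applications can actually open --
# an assumed stand-in for a real application inventory.
usable_extensions = {".obj", ".mb", ".wav"}

# Keep only the assets this user can realistically open and edit.
usable = [f for f in downloads if Path(f).suffix in usable_extensions]
print(usable)  # ['hawk.obj', 'score.wav', 'hawk.mb']
```

It is crude – an extension says nothing about whether the asset will survive editing – but it cheaply removes assets the user could never open at all.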
But what we really need is a globally-known site that either literally or conceptually centralizes the massive network of import/export relationships, along with information about the relative success of these mappings. Are they ever lossy? If so, can they be fixed? What series of applications might we want an asset to be imported/exported through so that in the end it is in a usable format, given the applications that the user owns and has mastered?
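At its simplest, such a site could model the import/export landscape as a directed graph whose edges are flagged as lossless or lossy; a breadth-first search then answers the “what series of applications” question. Everything below – the format names, the edges, and the lossiness flags – is invented purely to illustrate the idea:

```python
from collections import deque

# Hypothetical import/export graph: format -> {target_format: is_lossless}.
# All format names and edge flags are invented for illustration.
conversions = {
    "AppA-native": {"FBX": True, "OBJ": False},
    "FBX":         {"AppB-native": True},
    "OBJ":         {"AppB-native": False},
    "AppB-native": {},
}

def lossless_path(graph, source, target):
    """Breadth-first search restricted to edges flagged as lossless."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt, lossless in graph.get(path[-1], {}).items():
            if lossless and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no all-lossless route exists

print(lossless_path(conversions, "AppA-native", "AppB-native"))
# ['AppA-native', 'FBX', 'AppB-native']
```

In this toy graph the direct OBJ route is lossy, so the search routes the asset through FBX instead – exactly the sort of answer a centralized registry of conversion quality could provide.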
There is much to be done. Right now, searching for and reusing media assets is a painstaking, trial-and-error-prone process.