Buzz’s Blog: On Web 3.0 and the Semantic Web

Jan 22 2011   11:16PM GMT

Visualizing media-rich data: Part 2

Roger King

So, what makes data difficult to visualize?  In other words, if our goal is to provide an interactive, highly visual way of searching, using, and updating data, what are the big challenges?

The last blog posting.

In the last posting of this blog, we looked at visualizing data as charts and graphs.

We noted that even when all we want to do is visualize data as relational tables, this can be a huge challenge if the tables have a large number of tuples. If scrolling through tables is intractable, we might want to use a query language to focus our search.  But we also noted that writing SQL queries in a visual notation is difficult if the queries themselves are anything but very simple.

And, we noted that 3D graphics and animation are being used to search data that has a natural 3D-plus-time nature to it, such as maneuvering through a factory assembly line.

In this posting, let’s consider two other challenges facing developers of data visualization technology.

When the data is naturally visual, but when the average size of a data item is huge.

Searching a video or audio database can be extremely time-consuming.  The problem is that what we really want to do is search data by its true, innate meaning.  And what is the meaning of a piece of video?  Well, in the worst case, you have to watch all of it and then decide on your own just what it means.  We can’t watch an arbitrary number of long video segments.

There are ways to attack this problem, and they take the form of various tagging mechanisms.

In essence, we build a parallel database that contains descriptive data about the video (or audio) database.  We then use a query language to search this parallel database, and when we find a set of tags that seem to identify what we want, we go ahead and download the video or audio pieces that are associated with these tags and examine them.  A very popular (and very sophisticated) facility for tagging media is called MPEG-7.

It’s also true that image processing techniques can be used to process video data directly, without a human having to view it interactively.  Often both tagging and image processing are used.

So, we might search for pieces of video that are 1) in color, 2) shot between 1960 and 1963, and 3) contain images of President Kennedy.  That last one, item 3, might be determined either by looking at tags or by using image processing.
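That three-part search can be sketched concretely. Below is a minimal illustration of the "parallel database" idea: the descriptive tags live in an ordinary relational table, we query the tags with SQL, and only the matching media files would then be downloaded and examined. The table name, columns, and file names are all hypothetical, invented for the sketch; they are not part of MPEG-7 or any real tagging standard.

```python
import sqlite3

# Hypothetical tag table: one row of descriptive metadata per (large) video file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_tags (
        video_file  TEXT,     -- pointer to the actual media object, stored elsewhere
        in_color    INTEGER,  -- 1 = color, 0 = black and white
        year_shot   INTEGER,
        subject     TEXT      -- filled in by human tagging or image processing
    )""")
conn.executemany(
    "INSERT INTO video_tags VALUES (?, ?, ?, ?)",
    [("clip_001.mp4", 1, 1961, "President Kennedy"),
     ("clip_002.mp4", 0, 1962, "President Kennedy"),
     ("clip_003.mp4", 1, 1958, "Press conference")])

# The three criteria from the text: in color, shot 1960-1963, shows Kennedy.
matches = conn.execute(
    """SELECT video_file FROM video_tags
       WHERE in_color = 1
         AND year_shot BETWEEN 1960 AND 1963
         AND subject LIKE '%Kennedy%'""").fetchall()
print(matches)  # only clip_001.mp4 satisfies all three conditions
```

The point of the sketch is the economics: the query touches only small rows of metadata, so the expensive step of actually watching video happens only for the handful of clips that survive the filter.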

When the data is abstract in nature and there is no obvious visualization.

Interestingly, it is something at the other end of the visualization spectrum that can prove even more challenging.  What about data that isn’t innately visual, like financial data?  This is particularly problematic when the data has many dimensions.

I teach database systems at the university level, and it is interesting to note that when textbooks cover data warehousing, which is a way of aggregating and searching data, the fictional applications used in the textbook diagrams have three dimensions. So, we might build a warehouse of financial data and lay it out in chunks that can be searched by Product_Number, Sales_Region, and Date.  This example can be laid out very nicely in a diagram, where these three data dimensions form a cube, and searches correspond to isolating useful sub-cubes.

But suppose our data has to also be searchable by Customer_ID, Shipment_Cost, and Warranty_ID?  Now, it’s kind of hard to draw a simple, elegant diagram.  This gives us some insight as to why it is difficult to visualize complex data.  Real world data can have dozens or hundreds of dimensions.
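The cube metaphor, and where it breaks down, can be made concrete with a toy sketch. Here each warehouse cell is addressed by one coordinate per dimension; with three dimensions the cells map cleanly onto a drawable cube, while each added dimension just lengthens the key with no obvious visual counterpart. All names and figures below are made up for illustration.

```python
# Toy warehouse "cube": (Product_Number, Sales_Region, Date) -> total sales.
sales = {
    (101, "West", "2010-Q4"): 25000,
    (101, "East", "2010-Q4"): 18000,
    (205, "West", "2011-Q1"): 31000,
}

# "Isolating a sub-cube": fix one dimension, range over the others.
west_sales = {k: v for k, v in sales.items() if k[1] == "West"}
print(len(west_sales))  # two cells survive the slice

# Adding Customer_ID, Shipment_Cost, and Warranty_ID turns the key into a
# six-tuple. The data model copes effortlessly -- but a 2D diagram no longer can.
six_dim_key = (101, "West", "2010-Q4", "CUST-9", 42.50, "W-17")
```

The slice operation generalizes to any number of dimensions; it is only the picture, not the data structure, that stops working past three.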

The larger question.

In my class the other day, we were discussing data visualization.  A student asked a very good question: Is it even reasonable to be trying to do this?  Is it a good idea to visualize multi-dimensional, abstract data?  Are we trying to fly over the Atlantic on a motorcycle?

We don’t actually know the answer to this yet.  There are things we could be trying to use, like particle dynamics and visual metaphors.

More, next time…

1  Comment on this Post

  • MindXchng
    I think that the differentiation between nominal, numeric data on one hand and continuous data such as video/audio data on the other needs a bifurcation at a much higher conceptual level. What I mean is that we can, for example, try to find the difference between humans and dolphins via one-to-one comparison, but we can also recognize that in biological nomenclature these two species diverge at a much higher classification level. I think the same goes for data. Quantized data such as numbers and continuous data such as audio and video need very different levels of treatment, because audio and video don't belong to the same data type as numbers unless the audio/video is decomposed at a sufficient level to match the numeric datatype. Then the question is: what category does audio/video (referred to as the AV datatype) belong to? Do we have a data classification nomenclature with a placeholder for such a continuous datatype? Or are we going to decompose it (via tags in parallel databases) so that we know what the AV data means in the numeric-datatype world? In the decomposition approach, using fast processing power and image processing, video/audio can be decomposed at a sufficiently detailed level to allow dissecting the AV data and extracting a meaningful numeric equivalent. With multiple dimensions added to AV data, the amount of memory needed to hold and subsequently process this data can be huge. But with faster parallel processors and image processing, the AV data with its dimensions can be stored in a multi-dimensional matrix. My opinion is that with mathematical processing applied to AV data, we are trying to attach mathematical meaning to the data; that is, we are trying to quantize the continuous data so that it is suitable for processing by routines that have historically acted only on numeric data.
    If that is the case, then each decomposed component of the continuous data needs to be processed separately, and then an aggregate function such as artificial intelligence (AI) needs to be used to correlate the individual results, interpret the aggregate data, and attach that meaning (the AI output) to the continuous data. This is just one approach to handling AV data; I am sure there are other heuristic approaches too (as mentioned in the article: particle dynamics, visual metaphors...), but I think the different approaches need to be evaluated to find which one lends itself best to an AV data classification that fulfills these goals: 1) extensible, 2) uniform in approach, 3) borrows from the best of the solutions we have in related disciplines, and 4) has the potential to yield a faster and thus smaller (in terms of time required for metadata handling) solution to the problem of interpreting AV data. Thanks.
