Posted by: Roger King
So, what makes data difficult to visualize? In other words, if our goal is to provide an interactive, highly visual way of searching, using, and updating data, what are the big challenges?
The last blog posting.
In the last posting of this blog, we looked at visualizing data as charts and graphs.
We noted that even when all we want to do is visualize data as relational tables, this can be a huge challenge if the tables have a large number of tuples. If scrolling through tables is intractable, we might want to use a query language to focus our search. But we also noted that writing SQL queries in a visual notation is difficult if the queries themselves are anything but very simple.
And, we noted that 3D graphics and animation are being used to search data that has a natural 3D-plus-time nature to it, for example by maneuvering through a factory assembly line.
In this posting, let’s consider two other challenges facing developers of data visualization technology.
When the data is naturally visual, but the average size of a data item is huge.
Searching a video or audio database can be extremely time-consuming. The problem is that what we really want to do is search data by its true, innate meaning. And what is the meaning of a piece of video? Well, in the worst case, you have to watch all of it and then decide on your own just what it means. We can’t watch an arbitrary number of long video segments.
There are ways to attack this problem, and they take the form of various tagging mechanisms.
In essence, we build a parallel database that contains descriptive data about the video (or audio) database. We then use a query language to search this parallel database, and when we find a set of tags that seem to identify what we want, we go ahead and download the video or audio pieces that are associated with these tags and examine them. A very popular (and very sophisticated) facility for tagging media is called MPEG-7.
It’s also true that image processing techniques can be used to process video data directly, without a human having to view it interactively. Often both tagging and image processing are used.
So, we might search for pieces of video that are 1) in color, 2) shot between 1960 and 1963, and 3) contain images of President Kennedy. That last one, item 3, might be determined either by looking at tags or by using image processing.
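To make this concrete, here is a minimal sketch of that three-part search run against a tiny parallel tag database. Everything in it is invented for illustration: the table names (clip, clip_tag), the column names, and the sample rows are assumptions, and it uses plain SQLite rather than real MPEG-7 descriptors.

```python
import sqlite3


def build_tag_db():
    """Create an in-memory tag database holding a few made-up clips."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clip (
            clip_id   INTEGER PRIMARY KEY,
            is_color  INTEGER,   -- 1 = color, 0 = black and white
            year_shot INTEGER
        );
        CREATE TABLE clip_tag (clip_id INTEGER, tag TEXT);
        """
    )
    # Hypothetical descriptive data; the video itself lives elsewhere.
    conn.executemany(
        "INSERT INTO clip VALUES (?, ?, ?)",
        [(1, 1, 1961), (2, 0, 1962), (3, 1, 1975), (4, 1, 1963)],
    )
    conn.executemany(
        "INSERT INTO clip_tag VALUES (?, ?)",
        [
            (1, "President Kennedy"),
            (2, "President Kennedy"),
            (3, "President Kennedy"),
            (4, "moon rocket"),
        ],
    )
    conn.commit()
    return conn


def find_clips(conn, tag, first_year, last_year):
    """Return ids of color clips shot in the year range that carry the tag."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.clip_id
        FROM clip AS c
        JOIN clip_tag AS t ON t.clip_id = c.clip_id
        WHERE c.is_color = 1
          AND c.year_shot BETWEEN ? AND ?
          AND t.tag = ?
        ORDER BY c.clip_id
        """,
        (first_year, last_year, tag),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_tag_db()
    # Only the matching clips would then be downloaded and viewed.
    print(find_clips(conn, "President Kennedy", 1960, 1963))
```

The point of the design is that the query only ever touches the small descriptive tables. The large video files are fetched after the search has narrowed things down.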
When the data is abstract in nature and there is no obvious visualization.
Interestingly, it is something at the other end of the visualization spectrum that can prove even more challenging. What about data that isn’t innately visual, like financial data? This is particularly problematic when the data has many dimensions.
I teach database systems at the university level, and it is interesting to note that when textbooks cover data warehousing, which is a way of aggregating and searching data, the fictional applications used in the textbook diagrams have three dimensions. So, we might build a warehouse of financial data and lay it out in chunks that can be searched by Product_Number, Sales_Region, and Date. This example can be laid out very nicely in a diagram, where these three data dimensions form a cube, and searches correspond to isolating useful sub-cubes.
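The textbook cube can be sketched in a few lines. This is only an illustration under assumed data: the fact rows, the sales amounts, and the function names build_cube and sub_cube are all made up here, and a real warehouse would use a dedicated engine rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical fact rows: (Product_Number, Sales_Region, Date, sales amount).
FACTS = [
    ("P1", "East", "2024-01", 100.0),
    ("P1", "West", "2024-01", 50.0),
    ("P2", "East", "2024-01", 70.0),
    ("P1", "East", "2024-02", 30.0),
    ("P1", "East", "2024-01", 20.0),
]


def build_cube(facts):
    """Aggregate sales into cells keyed by (product, region, date)."""
    cube = defaultdict(float)
    for product, region, date, amount in facts:
        cube[(product, region, date)] += amount
    return dict(cube)


def sub_cube(cube, products=None, regions=None, dates=None):
    """Keep the cells whose coordinates are in the given sets.

    Passing None for a dimension means "any value along that dimension".
    """
    wanted = (products, regions, dates)
    return {
        key: total
        for key, total in cube.items()
        if all(
            allowed is None or coord in allowed
            for coord, allowed in zip(key, wanted)
        )
    }


if __name__ == "__main__":
    cube = build_cube(FACTS)
    # Isolate the sub-cube for product P1 in the East region, all dates.
    for cell, total in sorted(sub_cube(cube, {"P1"}, {"East"}).items()):
        print(cell, total)
```

Each key in the dictionary is one cell of the cube, and a search is just a restriction on one or more of the three coordinates.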
But suppose our data also has to be searchable by Customer_ID, Shipment_Cost, and Warranty_ID? Now, it’s kind of hard to draw a simple, elegant diagram. This gives us some insight as to why it is difficult to visualize complex data. Real-world data can have dozens or hundreds of dimensions.
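One rough way to see how quickly this gets out of hand is to ask how many separate three-dimensional cube diagrams it would take to show every combination of three dimensions. The sketch below does just that arithmetic. The function name is made up, and counting triples is only one possible measure of the difficulty, chosen here for illustration.

```python
from math import comb


def three_dimensional_views(num_dimensions):
    """Count the distinct ways to pick 3 dimensions to draw as a cube."""
    return comb(num_dimensions, 3)


if __name__ == "__main__":
    for n in (3, 6, 12, 100):
        print(n, "dimensions ->", three_dimensional_views(n), "cube diagrams")
```

With three dimensions there is exactly one cube to draw. With six there are already 20, and with a hundred there are 161,700.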
The larger question.
In my class the other day, we were discussing data visualization. A student asked a very good question: Is it even reasonable to be trying to do this? Is it a good idea to visualize multi-dimensional, abstract data? Are we trying to fly over the Atlantic on a motorcycle?
We don’t actually know the answer to this yet. There are techniques we could be trying, like particle dynamics and visual metaphors.
More, next time…