Posted by: Roger King
There are many research and development areas in computing that come and go in cycles. At the top of a cycle, an old challenge is attacked with new supporting technology and new minds – and a significant advance is made. At the low end of a cycle, only very incremental advances are made and it seems that there might never again be any significant advances.
Data visualization is one of those perennially challenging areas.
Researchers and developers revisit this domain periodically, and occasionally, significant advances are made. As is the case with many long-standing technical challenges, a handful of problems remain largely unsolved.
What is data visualization and what are the long-standing problems?
Creating a pie chart or bar graph is a form of data visualization.
In general, aggregate visualizations of common business data are among the most powerful and well-developed data visualization techniques. Many applications will create charts, tables, and graphs with a few simple commands.
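To make the "few simple commands" point concrete, here is a minimal sketch in Python that aggregates some business data and draws it as a text bar chart. The sales records, the function names, and the chart width are all invented for illustration; a real charting application would do the same two steps (aggregate, then draw) with far more polish.

```python
from collections import defaultdict

# Hypothetical sales records as (region, amount); invented for illustration.
sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 60.0),
    ("East", 40.0),
    ("South", 20.0),
]

def total_by_region(records):
    """Aggregate step: sum the amounts for each region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

def text_bar_chart(totals, width=30):
    """Draw step: one row of '#' per label, scaled to the largest total."""
    largest = max(totals.values())
    lines = []
    for label, value in sorted(totals.items()):
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<6} {bar} {value:.0f}")
    return "\n".join(lines)

totals = total_by_region(sales)
print(text_bar_chart(totals))
```

The sketch works because each record collapses to one label and one number. That is exactly the property that non-numeric and many-dimensional data lack.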
But what about visualizing non-numeric data or data with many dimensions? It’s hard to graph data with more than three dimensions (usually named x, y, and z).
Laying out relational tables visually is a form of data visualization.
Viewing diagrams of relational tables is a form of data visualization – but only if it provides visualization of the data in tables and not simply the structure of the tables themselves.
By laying out the attributes (like Customer Address and Name), the domains of attributes (like character strings and integers), Keys (like Customer Number), and other properties of database tables, a database user can quickly master the structure of a database.
But viewing relational data itself remains a big challenge.
Scale has a lot to do with this problem. There are often far, far more rows in tables than can be effectively searched visually.
Searching blob data (such as images) and continuous data (such as video) is a big challenge.
Relational tables often have attribute values that are difficult to search visually because they are very large and their content is difficult to quantify. This includes data in the form of images, video, and audio. In general, searching media databases visually remains a major challenge.
Current technology relies heavily on using tags as surrogates for the actual data itself. We search for videos that are in .mov format, are less than two minutes long, show the inside of Winchester Cathedral, and can be used without paying royalties, but we do all this without actually viewing any video until we have narrowed our search space down to a handful of candidates.
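The tag-as-surrogate approach can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `VideoRecord` fields, the tag vocabulary, and the tiny catalog are hypothetical, not the schema of any real media database.

```python
from dataclasses import dataclass, field

@dataclass
class VideoRecord:
    """Hypothetical metadata stored alongside a video; the video itself is never read."""
    title: str
    file_format: str
    duration_seconds: int
    royalty_free: bool
    tags: set = field(default_factory=set)

# Invented catalog entries, for illustration only.
catalog = [
    VideoRecord("Nave walkthrough", "mov", 95, True,
                {"winchester cathedral", "interior"}),
    VideoRecord("Cathedral exterior", "mov", 80, True,
                {"winchester cathedral", "exterior"}),
    VideoRecord("Choir stalls", "mp4", 60, True,
                {"winchester cathedral", "interior"}),
    VideoRecord("Full tour", "mov", 600, False,
                {"winchester cathedral", "interior"}),
]

def search_by_tags(records, file_format, max_seconds, required_tags):
    """Narrow the search space using metadata alone."""
    return [
        r for r in records
        if r.file_format == file_format
        and r.duration_seconds < max_seconds
        and r.royalty_free
        and required_tags <= r.tags  # every required tag must be present
    ]

candidates = search_by_tags(
    catalog, "mov", 120, {"winchester cathedral", "interior"}
)
print([r.title for r in candidates])
```

Only the handful of records in `candidates` would then be opened and viewed. The search is only as good as the tags, which is why it is a surrogate and not a true visual search.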
Graphical languages for writing SQL queries remain a huge challenge.
Even when data is made up entirely of traditional business data like character strings and integers, graphical querying remains a huge problem. Again, this is because of scale. How do you graphically write queries that manipulate several tables and contain complex expressions?
Query by Example, which dates back to the 1970s, still stands out as one of the biggest advances in this area.
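The core idea of Query by Example is that the user fills in a blank copy of a table with sample values and marks the columns to display, instead of writing a query in text. The Python sketch below is a heavily simplified illustration of that idea, not the actual Query by Example syntax: the `PRINT` marker, the `query_by_example` function, and the customer rows are all hypothetical, and it handles only equality conditions on a single table.

```python
# Hypothetical marker meaning "show this column in the result".
PRINT = object()

# Invented customer table, for illustration only.
customers = [
    {"number": 1, "name": "Ada", "city": "Boulder"},
    {"number": 2, "name": "Ben", "city": "Denver"},
    {"number": 3, "name": "Cy", "city": "Boulder"},
]

def query_by_example(rows, example):
    """Return the PRINT columns of every row that matches the example row.

    In `example`, a column mapped to PRINT is displayed; a column mapped
    to any other value must equal that value for the row to match.
    """
    shown = [col for col, want in example.items() if want is PRINT]
    results = []
    for row in rows:
        matches = all(
            row[col] == want
            for col, want in example.items()
            if want is not PRINT
        )
        if matches:
            results.append({col: row[col] for col in shown})
    return results

# "Show me the names of customers whose city looks like this example."
print(query_by_example(customers, {"name": PRINT, "city": "Boulder"}))
```

Filling in one row of one table is easy to picture. The sketch deliberately leaves out joins across several tables and complex expressions, which is where graphical querying gets hard.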
Viewing 3D data based on real world environments is a significant, more recent development.
There has been an explosion in 3D graphical and animation applications. It is now quite tractable to create realistic “immersive” environments for viewing such things as factories, warehouses, and business processes.
Importantly, this process has been made tractable with modern modeling and animation applications. In particular, there are simple modeling applications (like Google SketchUp) and lots of canned content (such as the models that can be downloaded and used with Google SketchUp).
There are also a lot of canned movement tools, which can be used to walk a character through an environment or move a product down an assembly line. These are far less developed, but iClone is worthy of consideration. (See reallusion.com.)
We can be a lot more than simply passive with our 3D worlds, too. Tools for driving these applications graphically have been developed. This relies on real-time rendering in response to user input and borrows heavily from gaming technology and simulated worlds (such as those found in secondlife.com).
But what happens when the data is abstract and/or has many dimensions, and therefore does not lend itself to obvious, natural visualizations?
More next time…