Custom Application Development:

Jan 31 2009   10:12PM GMT

Data Normalization - Know Your Data



Posted by: Joe Coley
Database Design, custom application development, Database reporting, Business Intelligence, BI, database normalization

A post on these ITKnowledgeExchange blogs that recently caught my eye was one by Stephen Harris entitled “Data Challenges Can be Solved With Business Intelligence”. It is a rather lengthy post touching on several points about data challenges and BI. What I immediately latched onto was what he refers to as a motto: “Thou shalt know thy data”.

While I have never phrased my firm belief in knowing your data the way he does, I certainly agree that knowing your data is an absolute must. Furthermore, the cleansing, auditing, securing, managing and refreshing of data that he refers to are essential ingredients of any meaningful reporting, never mind the special requirements of an effective BI implementation.

Once again I find myself “downsizing” the information and ideas I read about to fit the needs of the businesses I serve, the small ones. I’ve blogged recently about reporting requirements in these economic times, and certainly “…having information about your business at your fingertips…” is critical, not just a “nice to have”.

Reporting, BI and data “cleanliness” all depend to some extent on the normalization of the data, and I can’t imagine trying to normalize a database without knowing the data in it. If you would like a quick introduction to the topic of normalization, I found “Introduction to Data Normalization: A Database “Best” Practice” to be an excellent place to start.
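For readers who have not seen normalization in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and are not taken from the article mentioned above. It takes a flat list of orders in which the customer's details are repeated on every row and splits it into a customers table and an orders table, so each customer is stored once.

```python
import sqlite3

# Hypothetical flat rows: the customer's name and phone repeat on every order.
flat_orders = [
    (1, "Acme Hardware", "555-0100", "2009-01-05", 120.00),
    (2, "Acme Hardware", "555-0100", "2009-01-12", 75.50),
    (3, "Bolt Supply", "555-0199", "2009-01-20", 310.25),
]


def build_normalized(rows):
    """Load flat order rows into separate customers and orders tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE,
            phone       TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            order_date  TEXT NOT NULL,
            amount      REAL NOT NULL
        );
    """)
    for order_id, name, phone, order_date, amount in rows:
        # Store each customer once; a repeated name is skipped by the UNIQUE rule.
        conn.execute(
            "INSERT OR IGNORE INTO customers (name, phone) VALUES (?, ?)",
            (name, phone),
        )
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM customers WHERE name = ?", (name,)
        ).fetchone()
        # The order row now carries only a reference to the customer.
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)",
            (order_id, customer_id, order_date, amount),
        )
    conn.commit()
    return conn


conn = build_normalized(flat_orders)
customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customer_count, "customers,", order_count, "orders")
```

Notice that the sketch assumes a customer is identified by name. That is exactly the kind of decision you can only make if you know your data; in a real customer list two different businesses may well share a name.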

As with so many areas of development, a multitude of tradeoffs come into play in the design of a database. It is absolutely critical that the developer know and understand the data pieces (fields) and how they relate. Just as critical is that the developer understand the reporting requirements and other characteristics of the data, the database itself, the network and hardware platform, and “how” the data will be queried. Many speed issues are actually caused by a database that has been normalized to such an extent that producing the required reports in an acceptable time span takes many extra steps to prep the data for the desired presentation sequence.
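To make the tradeoff concrete, here is a small sketch, again with Python's built-in sqlite3 module and hypothetical table names and data. The first report joins the normalized tables every time it runs. The second reads from a pre-built summary table (sales_summary is a name I made up for illustration), which is one common way to trade some redundancy for faster reporting. It is only an illustration of the idea, not a recommendation for any particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme Hardware'), (2, 'Bolt Supply');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 75.5), (3, 2, 310.25);
""")

# Report against the normalized tables: the join runs on every request.
JOIN_REPORT = """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
"""
live_report = conn.execute(JOIN_REPORT).fetchall()

# Denormalized alternative: run the join once and store the result.
# The summary must be rebuilt whenever orders change, which is the price paid.
conn.execute("CREATE TABLE sales_summary AS " + JOIN_REPORT)
cached_report = conn.execute(
    "SELECT name, total FROM sales_summary ORDER BY name"
).fetchall()

print(live_report)
print(cached_report)
```

With three orders the difference is invisible, of course. The point is only that the two approaches give the same answer, and that which one is appropriate depends on the reporting requirements and how often the underlying data changes.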

The more up close and personal a developer is with the data, the greater the opportunity to evaluate the data quality. After a number of fields or tables have been added to or removed from the database, it is good practice to review the design again to determine whether further changes should be made to normalize the database. In my experience, such changes are often warranted.
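One simple way to get up close with the data is to profile it. The sketch below is a rough illustration in plain Python, with made-up rows and hypothetical function names (profile and near_duplicates are mine, not from any library). It counts missing and distinct values per field and flags names that differ only in capitalization or stray spaces, which are typical signs that some cleansing is needed.

```python
from collections import Counter

# Hypothetical customer rows with the kinds of problems profiling turns up.
rows = [
    {"name": "Acme Hardware", "phone": "555-0100", "state": "OH"},
    {"name": "acme hardware ", "phone": "", "state": "OH"},
    {"name": "Bolt Supply", "phone": "555-0199", "state": None},
]


def profile(rows):
    """Count missing and distinct values for each field."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[column] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def near_duplicates(rows, column):
    """List values that collide once case and surrounding spaces are ignored."""
    counts = Counter(row[column].strip().lower() for row in rows if row[column])
    return sorted(value for value, n in counts.items() if n > 1)


quality = profile(rows)
duplicates = near_duplicates(rows, "name")
print(quality)
print(duplicates)
```

A check like this will not tell you what the data should be, only where to look. Deciding whether two similar names are really the same customer still takes someone who knows the business.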