Custom Application Development

Jan 31 2009   10:12PM GMT

Data Normalization – Know Your Data

SJC SJC Profile: SJC

A post here in these ITKnowledgeExchange blogs that recently caught my eye was this one written by Stephen Harris entitled “Data Challenges Can be Solved With Business Intelligence“.  It is a rather lengthy post touching on several points about data challenges and BI.  What I immediately latched onto in his post was what he refers to as a motto – “Thou shalt know thy data“.

While I have never phrased my firm belief in knowing your data in the way he does, I certainly agree that knowing your data is an absolute must.  Furthermore, his reference to cleansing, auditing, securing, managing and refreshing data is also an essential ingredient toward any meaningful reporting – never mind the special requirements for an effective BI implementation.

Once again I find myself “down sizing” information and ideas I read about to the needs of the businesses which I service, the small ones.  I’ve blogged recently about reporting requirements in these economic times, and certainly “…having information about your business at your fingertips…” is critical, not just a “nice to have”.

Reporting, BI and data “cleanliness” all depend to some extent upon the normalization of the data.  I can’t imagine trying to normalize a database without knowing your data.  If you would like a quick introduction to the topic of normalization I found “Introduction to Data Normalization: A Database “Best” Practice” to be an excellent place to start. 

As with so many areas in development there are multitudes of tradeoffs which come into play with the design of a database.  It is absolutely critical that the developer know and understand the data pieces (fields) and how they relate, but just as critical is that the developer understand the reporting requirements and other characteristics of the data, the database itself, the network and hardware platform, and “how” data will be queried.  Many speed issues can actually be caused by a database which has been normalized to such an extent that in order to provide the reporting required in an acceptable time span many extra steps are required to prep the data for the presentation sequence desired. 

The more up close and personal a developer is with the data the greater the opportunity there is to evaluate the data quality.  After there have been a number of changes in the form of additions and subtractions to fields or tables in the database it is a good practice to review the design again to determine if there are changes that should be made to further normalize the database.  My experience indicates that often changes are desired.

1  Comment on this Post

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when other members comment.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
  • GarretMott
    "After there have been a number of changes in the form of additions and subtractions to fields or tables in the database it is a good practice to review the design again to determine if there are changes that should be made to further normalize the database. " I couldn't agree more! Seeing systems that have been "tweaked" for a number of years, things can be a mess. In particular, the practice of reusing an existing field for new data can cause problems. Remit# holding the room occupancy does *not* make reporting easy! Disk space is cheap - add a new field. Then, when it's housecleaning time, delete the one that's no longer used. Normailzation can be carried way too far. Cost info is data I often see mis-reported. Normalizing a db may make you think you should use the cost info in the related Inventory table (instead of storing it in the OrderItem table). Problem is, that gets you the current cost, not the cost of the item when it was sold. Shipping Addresses are another example. Store them in the OrderHeader table - that way it will reflect where the year old order was really shipped, and not show what the address has been changed to. All good advice, Joe.
    0 pointsBadges:
    report

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to: