Database Design archives - Custom Application Development

Apr 28 2009   2:02PM GMT

Data Deduplication in the Database



Posted by: Joe Coley
deduplication, Development, Database Design, database normalization

Data deduplication essentially refers to the elimination of redundant data (from Wikipedia).  As the term is commonly used, deduplication refers to removing duplicate data on servers and perhaps on shares throughout the domain.  I suspect that anybody who has been around IT for long understands that, as time goes on, it is not uncommon to find multiple versions (as well as exact duplicates) of files throughout an enterprise of any size.  This phenomenon adds considerably to the time required for backups and can cause network slowdowns as well as user confusion, none of which are desirable of course.
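
To make the idea of duplicate files concrete, here is a minimal sketch in Python that groups files by a hash of their contents.  The function name `find_duplicate_files` is my own invention for illustration, and reading each file fully into memory is an assumption that only suits small files; commercial deduplication products work quite differently (often at the block level).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root):
    """Group files under `root` whose contents are byte-for-byte identical.

    Hypothetical helper for illustration only: it hashes whole files in
    memory, which is fine for a small share but not for large files.
    """
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only the hashes that more than one file shares.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicate_files` on a directory returns a list of groups, each group being the paths that hold the same content.  Note that this only catches exact duplicates; multiple *versions* of a file have different contents and would need a different approach.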

I believe database normalization runs parallel to deduplication in that one goal of normalization is the elimination of redundant data.  Many of the same benefits of deduplication can be realized when a database is normalized, such as faster transmission (less data) and less storage space required.  Normalization has become very much a part of my most recent project - upgrading a 20-year-old database application to a modern database using relational technology.  The project is no trivial task - but I’m having fun with it!  :-)
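
As a rough illustration of how normalization removes redundant data, the sketch below uses Python's built-in `sqlite3` module.  The tables, columns and sample rows (`orders_flat`, `customers`, `orders`, "Acme Hardware" and so on) are all made up for the example and have nothing to do with the actual project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical flat table: customer details repeat on every order row.
cur.execute(
    "CREATE TABLE orders_flat ("
    "order_id INTEGER PRIMARY KEY, customer_name TEXT, "
    "customer_city TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Acme Hardware", "Dayton", 120.0),
        (2, "Acme Hardware", "Dayton", 75.5),
        (3, "Bell Bakery", "Toledo", 42.0),
    ],
)

# Normalized design: each customer is stored once, orders point to it by key.
cur.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, "
    "UNIQUE (name, city))"
)
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), amount REAL)"
)
cur.execute(
    "INSERT INTO customers (name, city) "
    "SELECT DISTINCT customer_name, customer_city FROM orders_flat"
)
cur.execute(
    "INSERT INTO orders (order_id, customer_id, amount) "
    "SELECT f.order_id, c.customer_id, f.amount "
    "FROM orders_flat f "
    "JOIN customers c ON c.name = f.customer_name AND c.city = f.customer_city"
)
conn.commit()

customer_count = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the move, the three orders are still there, but the customer name and city are stored once per customer rather than once per order.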

Jan 31 2009   10:12PM GMT

Data Normalization - Know Your Data



Posted by: Joe Coley
Database Design, custom application development, Database reporting, Business Intelligence, BI, database normalization

A post in these ITKnowledgeExchange blogs that recently caught my eye was one written by Stephen Harris entitled “Data Challenges Can be Solved With Business Intelligence”.  It is a rather lengthy post touching on several points about data challenges and BI.  What I immediately latched onto was what he refers to as a motto - “Thou shalt know thy data”.

While I have never phrased my firm belief in knowing your data the way he does, I certainly agree that knowing your data is an absolute must.  Furthermore, the cleansing, auditing, securing, managing and refreshing of data that he refers to are also essential ingredients of any meaningful reporting - never mind the special requirements for an effective BI implementation.

Once again I find myself “downsizing” the information and ideas I read about to the needs of the businesses I service, the small ones.  I’ve blogged recently about reporting requirements in these economic times, and certainly “…having information about your business at your fingertips…” is critical, not just a “nice to have”.

Reporting, BI and data “cleanliness” all depend to some extent upon the normalization of the data.  I can’t imagine trying to normalize a database without knowing the data in it.  If you would like a quick introduction to the topic of normalization, I found “Introduction to Data Normalization: A Database ‘Best’ Practice” to be an excellent place to start.

As with so many areas in development, there are multitudes of tradeoffs which come into play in the design of a database.  It is absolutely critical that the developer know and understand the data pieces (fields) and how they relate.  Just as critical is that the developer understand the reporting requirements, the other characteristics of the data, the database itself, the network and hardware platform, and how the data will be queried.  Many speed issues can actually be caused by a database that has been normalized to such an extent that many extra steps are needed to prep the data for the desired presentation sequence before the required reports can be delivered in an acceptable time span.
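
To show the kind of tradeoff I mean, here is a small sketch, again using Python's `sqlite3` module with made-up tables and data.  A report against the normalized tables has to join and aggregate on every run; one possible alternative (an assumption on my part, not a recommendation for every case) is to prepare a summary table ahead of time, here called `sales_by_customer`, which then has to be refreshed whenever the underlying data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme Hardware'), (2, 'Bell Bakery');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 75.5), (3, 2, 42.0);
    """
)

# Report against the normalized tables: join and aggregate on every run.
REPORT_SQL = """
    SELECT c.name, COUNT(o.order_id) AS order_count, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
"""
live_report = conn.execute(REPORT_SQL).fetchall()

# Hypothetical prepared reporting table: the join is paid once, up front.
# It goes stale as orders change, so it must be rebuilt on some schedule.
conn.execute("CREATE TABLE sales_by_customer AS " + REPORT_SQL)
prepared_report = conn.execute(
    "SELECT name, order_count, total FROM sales_by_customer ORDER BY name"
).fetchall()
```

Both queries return the same rows here; whether the prepared table is worth the extra maintenance depends on the data volume, the platform and how often the report is run.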

The more up close and personal a developer is with the data, the greater the opportunity to evaluate the data quality.  After a number of fields or tables have been added to or removed from the database, it is good practice to review the design again to determine whether changes should be made to further normalize it.  In my experience, changes are often warranted.