Custom Application Development

Apr 28 2009   2:02PM GMT

Data Deduplication in the Database

SJC

Data deduplication essentially refers to the elimination of redundant data (from Wikipedia). As the term is commonly used, deduplication refers to removing duplicated data on servers and perhaps on shares throughout the domain. I suspect anyone who has been around IT for long understands that, as time goes on, it is not uncommon to find multiple versions (as well as duplicate copies) of files throughout an enterprise of any size. This phenomenon adds considerably to the time required for backups and can cause network slowdowns as well as user confusion, none of which is desirable, of course.
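As a rough illustration of the file-level case, duplicate copies can be spotted by grouping files on a hash of their contents. This is only a minimal sketch: the function name `find_duplicate_files` is hypothetical, and real deduplication products typically work at the block level and do far more than this.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(root):
    """Group files under `root` by the SHA-256 digest of their contents.

    Returns a sorted list of path lists, one per group of identical files.
    Hypothetical helper for illustration only.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    # Keep only digests shared by more than one file.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)
```

Calling `find_duplicate_files` on a share's root directory returns the groups of byte-identical files. Files that are merely different versions of the same document will not match, which is part of why the "multiple versions" problem is harder than the "duplicate copies" one.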

I believe database normalization runs parallel to deduplication, in that one goal of normalization can be the elimination of redundant data. Many of the same benefits of deduplication can be realized when a database is normalized, such as faster transmission (less data) and less storage space required. Normalization has become very much a part of my most recent project: upgrading a 20-year-old database application to a modern database using relational technology. The project is no trivial task, but I'm having fun with it! 🙂
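To make the parallel concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables, columns, and sample rows are hypothetical and are not taken from the project described above. A flat set of order rows that repeats each customer's name and city is split into a `customers` table and an `orders` table, so each customer is stored only once.

```python
import sqlite3

# Hypothetical flat rows, as a legacy application might store them:
# the customer's name and city are repeated on every order.
flat_orders = [
    (1, "Acme Corp", "Denver", "widget"),
    (2, "Acme Corp", "Denver", "gasket"),
    (3, "Bolt Ltd", "Austin", "widget"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL,
        UNIQUE (name, city)
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL
    );
""")

for order_id, name, city, item in flat_orders:
    # Store each distinct customer once; the UNIQUE constraint makes
    # repeated inserts of the same customer a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, city) VALUES (?, ?)",
        (name, city),
    )
    (customer_id,) = conn.execute(
        "SELECT id FROM customers WHERE name = ? AND city = ?",
        (name, city),
    ).fetchone()
    conn.execute(
        "INSERT INTO orders (id, customer_id, item) VALUES (?, ?, ?)",
        (order_id, customer_id, item),
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# A join rebuilds the original flat view, so no information is lost.
rebuilt = conn.execute("""
    SELECT o.id, c.name, c.city, o.item
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
```

The repeated customer details collapse into one row per customer, while the join shows that the original rows can still be reconstructed on demand. With three sample rows the savings are trivial, but the same idea applies to a table with many orders per customer.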
