<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
	>
<channel>
	<title>Comments on: Compression, dedupe and the law</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/storage-soup/compression-dedupe-and-the-law/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/storage-soup/compression-dedupe-and-the-law/</link>
	<description>A SearchStorage.com blog.</description>
	<lastBuildDate>Wed, 15 May 2013 20:05:19 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Chirag</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/compression-dedupe-and-the-law/#comment-7168</link>
		<dc:creator>Chirag</dc:creator>
		<pubDate>Fri, 08 Aug 2008 14:22:33 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/06/19/compression-dedupe-and-the-law/#comment-7168</guid>
		<description><![CDATA[Hi all, I have written a simple, very basic tutorial on dedupe. Please let me know how I can improve it and whether I should add more references. Thanks.]]></description>
		<content:encoded><![CDATA[<p>Hi all, I have written a simple, very basic tutorial on dedupe. Please let me know how I can improve it and whether I should add more references. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carter George</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/compression-dedupe-and-the-law/#comment-7166</link>
		<dc:creator>Carter George</dc:creator>
		<pubDate>Fri, 20 Jun 2008 20:02:06 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/06/19/compression-dedupe-and-the-law/#comment-7166</guid>
		<description><![CDATA[Tony,
Good post! Let me add a couple of comments.

I totally agree with you that in-band compression is scary: it affects the performance of both writes and reads, and it does not give you enough choice about what to compress or when. Fortunately, in-band compression is not the only option these days. Files can be compressed in the background, after they are saved. There's no reason you have to compress a whole disk, share, or directory, either. Modern tools can use policies to decide what to compress (e.g., all files with the extension .mp3, or all files that have not been modified for 60 days) and how aggressively to compress it (for maximum space savings, or for the fastest decompression?).

Dedupe is just one form of compression, and it is by no means the most effective one for online storage (the files on your hard disk or NAS share). Dedupe is good for backups, because repetitive backups create duplicates. You won't find as many dupes in your online set of files as you will in 30 or 365 days' worth of backups. Different compression techniques are called for if you are trying to reduce the size of an online data set.

Further, almost every file that is driving today's storage growth is already compressed. Microsoft Office 2007 compresses every file on save, and PDF, JPEG (as you point out), all video formats, and most other common file types already include some form of compression applied by the application itself when it saves its files. Typically, that compression is some variant of a common generic algorithm, Lempel-Ziv for example. So if the native format of a file is already compressed, it's hard to argue that additional lossless compression would alter its legal validity.

For compliance purposes, compression has to be bit-for-bit lossless, and that can be verified by computing and storing a cryptographic checksum of each file before compression. That way, you can always compare the checksum of the decompressed file against the original checksum to confirm that it is bit-for-bit identical. That level of rigor makes sense for corporate memorandums; it might not be as important for your JPEGs of last summer's family vacation.]]></description>
		<content:encoded><![CDATA[<p>Tony,<br />
Good post! Let me add a couple of comments.</p>
<p>I totally agree with you that in-band compression is scary: it affects the performance of both writes and reads, and it does not give you enough choice about what to compress or when. Fortunately, in-band compression is not the only option these days. Files can be compressed in the background, after they are saved. There&#8217;s no reason you have to compress a whole disk, share, or directory, either. Modern tools can use policies to decide what to compress (e.g., all files with the extension .mp3, or all files that have not been modified for 60 days) and how aggressively to compress it (for maximum space savings, or for the fastest decompression?).</p>
<p>Dedupe is just one form of compression, and it is by no means the most effective one for online storage (the files on your hard disk or NAS share). Dedupe is good for backups, because repetitive backups create duplicates. You won&#8217;t find as many dupes in your online set of files as you will in 30 or 365 days&#8217; worth of backups. Different compression techniques are called for if you are trying to reduce the size of an online data set.</p>
<p>Further, almost every file that is driving today&#8217;s storage growth is already compressed. Microsoft Office 2007 compresses every file on save, and PDF, JPEG (as you point out), all video formats, and most other common file types already include some form of compression applied by the application itself when it saves its files. Typically, that compression is some variant of a common generic algorithm, Lempel-Ziv for example. So if the native format of a file is already compressed, it&#8217;s hard to argue that additional lossless compression would alter its legal validity.</p>
<p>For compliance purposes, compression has to be bit-for-bit lossless, and that can be verified by computing and storing a cryptographic checksum of each file before compression. That way, you can always compare the checksum of the decompressed file against the original checksum to confirm that it is bit-for-bit identical. That level of rigor makes sense for corporate memorandums; it might not be as important for your JPEGs of last summer&#8217;s family vacation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Storagezilla</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/compression-dedupe-and-the-law/#comment-7165</link>
		<dc:creator>Storagezilla</dc:creator>
		<pubDate>Thu, 19 Jun 2008 22:38:20 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/06/19/compression-dedupe-and-the-law/#comment-7165</guid>
		<description><![CDATA[Block-based de-duplication is just one of a set of de-duplication technologies. Object-based de-duplication (Single Instance Storage) is another, and it doesn't modify the existing data; it just places pointers where the other copies of that data would otherwise reside.

While in some cases it has become the archive, your backup shouldn't *be* your archive. If someone goes down the route of invalidating compression because it modifies the data, that makes everything written to a tape drive with onboard compression (or encryption) enabled suspect. The same goes if you're using backup software to encrypt or compress data, or using backup software at all, since it injects its own metadata into the mix as the backup is written out in the backup application's format.

Ultimately, all of these technologies (except encryption, but including thin provisioning) are *capacity optimisation* technologies. There is something different to choose depending on your needs.

I have an MS launch T-shirt somewhere that says "We came, we saw, we DoubleSpaced" ;)]]></description>
		<content:encoded><![CDATA[<p>Block-based de-duplication is just one of a set of de-duplication technologies. Object-based de-duplication (Single Instance Storage) is another, and it doesn&#8217;t modify the existing data; it just places pointers where the other copies of that data would otherwise reside.</p>
<p>While in some cases it has become the archive, your backup shouldn&#8217;t *be* your archive. If someone goes down the route of invalidating compression because it modifies the data, that makes everything written to a tape drive with onboard compression (or encryption) enabled suspect. The same goes if you&#8217;re using backup software to encrypt or compress data, or using backup software at all, since it injects its own metadata into the mix as the backup is written out in the backup application&#8217;s format.</p>
<p>Ultimately, all of these technologies (except encryption, but including thin provisioning) are *capacity optimisation* technologies. There is something different to choose depending on your needs.</p>
<p>I have an MS launch T-shirt somewhere that says &#8220;We came, we saw, we DoubleSpaced&#8221; <img src='http://itknowledgeexchange.techtarget.com/storage-soup/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>
