<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: VendorFights: Data Deduplication Edition</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/</link>
	<description>A SearchStorage.com blog.</description>
	<pubDate>Fri, 27 Nov 2009 10:21:33 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Scottjw</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7446</link>
		<dc:creator>Scottjw</dc:creator>
		<pubDate>Thu, 28 May 2009 14:26:41 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7446</guid>
		<description>Storash;

By default the hash index is limited to 3/16 of the maximum memory available, and is located on the C: drive. Both of these defaults can be changed. You may want to increase the memory limit in rare cases where there are millions of files to be hashed, or change the default storage location to suit your process. Note too that the index only takes the space it needs: if you only have a few hundred thousand files, it will be smaller than if you had a million.

How specifically that affects client application performance depends on total memory size. In a normal server it is not an issue as 3/16 of memory is not much, and it is only occupied for a very short period of time. If you have a server that is severely memory constrained, it may be an issue.</description>
		<content:encoded><![CDATA[<p>Storash;</p>
<p>By default the hash index is limited to 3/16 of the maximum memory available, and is located on the C: drive. Both of these defaults can be changed. You may want to increase the memory limit in rare cases where there are millions of files to be hashed, or change the default storage location to suit your process. Note too that the index only takes the space it needs: if you only have a few hundred thousand files, it will be smaller than if you had a million.</p>
<p>How specifically that affects client application performance depends on total memory size. In a normal server it is not an issue as 3/16 of memory is not much, and it is only occupied for a very short period of time. If you have a server that is severely memory constrained, it may be an issue.</p>
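<p>A rough sketch of the sizing arithmetic above, assuming the 3/16 default applies to total physical memory. The function name and the 8 GiB figure are hypothetical, chosen only for illustration, and this is not the product's actual configuration interface:</p>

```python
from fractions import Fraction

# Default share of memory quoted in the comment above.
DEFAULT_FRACTION = Fraction(3, 16)
GIB = 1024 ** 3


def hash_index_cap_bytes(total_memory_bytes: int,
                         fraction: Fraction = DEFAULT_FRACTION) -> int:
    """Upper bound on hash index size for a given amount of memory (hypothetical helper)."""
    return int(total_memory_bytes * fraction)


# Assumed example: a server with 8 GiB of memory.
cap = hash_index_cap_bytes(8 * GIB)
print(f"index cap: {cap / GIB:.2f} GiB")  # index cap: 1.50 GiB
```

<p>On a severely memory-constrained machine the same fraction leaves proportionally less headroom, which is the case flagged above as a possible issue.</p>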
]]></content:encoded>
	</item>
	<item>
		<title>By: Storash</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7445</link>
		<dc:creator>Storash</dc:creator>
		<pubDate>Thu, 28 May 2009 04:47:46 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7445</guid>
		<description>Scott,

With the clients continuously comparing hashes against entries in a hash table, how is client performance affected as the local hash table grows in size? It tends to consume more memory, leaving less memory for apps to run. How is this handled?

Ashwin</description>
		<content:encoded><![CDATA[<p>Scott,</p>
<p>With the clients continuously comparing hashes against entries in a hash table, how is client performance affected as the local hash table grows in size? It tends to consume more memory, leaving less memory for apps to run. How is this handled?</p>
<p>Ashwin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Clifford</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7123</link>
		<dc:creator>Paul Clifford</dc:creator>
		<pubDate>Sun, 18 May 2008 15:23:59 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7123</guid>
		<description>Why do we de-dupe? Simply because intuitively we know that a lot of the data we manage is multiple copies, and we are already storing enough stuff. It is growing too fast, costing too much, and continuing to spiral out of control. So de-dupe makes sense - but, wait a minute. Somewhere between 70 and 80% of purchased disk is wasted. Now, I am not saying it isn't used, but it is wasted: the traditional approach of "carving" out a LUN 10 times larger than the data we have creates all this white space. We create this canister that is seldom filled up, on very expensive disks.

The problem is traditional storage - building RAID sets and then stacking them in a box. New GUIs and features don't fix the underlying architectural problem - expensive filing cabinets. This worked in an older time when 100GB of data was a "boatload". It is not working today.

So think about it: only about 20% of the space on all the disks I buy contains data. Then, out of all the data I have, only about 20% of it is even accessed! So I buy all these 15K FC disks for what amounts to only about 5% of active data. This is ludicrous. But folks, this is the world of traditional storage. This is why we use de-dupe to try and reclaim some of this precious space.

How about changing the architecture?  How about designing systems with technology that meets performance and storage requirements?  How about designing systems to match IOPS requirements, and store everything else on SATA?  

Utilizing innovative technologies like "Real" Thin Provisioning (only really available from 3PAR and Compellent) makes a huge leap forward.  Automated ILM then manages the data so the right stuff is in the right place.  

After we have restructured the architecture, then let's de-dupe. We can bolt it on after we gain control, rather than leading with it.

Paul Clifford
Davenport Group
www.davenportgroup.com</description>
		<content:encoded><![CDATA[<p>Why do we de-dupe? Simply because intuitively we know that a lot of the data we manage is multiple copies, and we are already storing enough stuff. It is growing too fast, costing too much, and continuing to spiral out of control. So de-dupe makes sense - but, wait a minute. Somewhere between 70 and 80% of purchased disk is wasted. Now, I am not saying it isn&#8217;t used, but it is wasted: the traditional approach of &#8220;carving&#8221; out a LUN 10 times larger than the data we have creates all this white space. We create this canister that is seldom filled up, on very expensive disks.</p>
<p>The problem is traditional storage - building RAID sets and then stacking them in a box. New GUIs and features don&#8217;t fix the underlying architectural problem - expensive filing cabinets. This worked in an older time when 100GB of data was a &#8220;boatload&#8221;. It is not working today.</p>
<p>So think about it: only about 20% of the space on all the disks I buy contains data. Then, out of all the data I have, only about 20% of it is even accessed! So I buy all these 15K FC disks for what amounts to only about 5% of active data. This is ludicrous. But folks, this is the world of traditional storage. This is why we use de-dupe to try and reclaim some of this precious space.</p>
<p>How about changing the architecture?  How about designing systems with technology that meets performance and storage requirements?  How about designing systems to match IOPS requirements, and store everything else on SATA?  </p>
<p>Utilizing innovative technologies like &#8220;Real&#8221; Thin Provisioning (only really available from 3PAR and Compellent) makes a huge leap forward.  Automated ILM then manages the data so the right stuff is in the right place.  </p>
<p>After we have restructured the architecture, then let&#8217;s de-dupe. We can bolt it on after we gain control, rather than leading with it.</p>
<p>Paul Clifford<br />
Davenport Group<br />
<a href="http://www.davenportgroup.com" target="_blank">www.davenportgroup.com</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Toigo</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7122</link>
		<dc:creator>Jon Toigo</dc:creator>
		<pubDate>Sat, 17 May 2008 22:35:40 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7122</guid>
		<description>Beth, thanks for the reference to my blog questionnaire about de-dupe at DrunkenData.  We have about 10 vendor responses now and I was told at Storage Decisions to expect others from IBM/Diligent, FalconStor and Data Domain.

Missing is any response from EMC.  I have been told via email by an insider that they probably see no value in posting any responses since "there is no upside in it for them."  Curious that.</description>
		<content:encoded><![CDATA[<p>Beth, thanks for the reference to my blog questionnaire about de-dupe at DrunkenData.  We have about 10 vendor responses now and I was told at Storage Decisions to expect others from IBM/Diligent, FalconStor and Data Domain.</p>
<p>Missing is any response from EMC.  I have been told via email by an insider that they probably see no value in posting any responses since &#8220;there is no upside in it for them.&#8221;  Curious that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott W</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7121</link>
		<dc:creator>Scott W</dc:creator>
		<pubDate>Thu, 15 May 2008 20:16:01 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7121</guid>
		<description>Final piece of follow-up here. First, we are absolutely not shelving Avamar. Not in any way, shape or form. I suppose the comment that we are refining the market would be more accurately represented as "we refined" the market about 18 months ago. That is not to say that there aren't overzealous salespeople or partners, or that it wasn't pitched as a panacea for backup by one of them. But it isn't. It does have places where it is a better fit than others. But it will be a core part of EMC backup strategy. Secondly, I have started to put up some basic information about Avamar on my blog; feel free to read and question. Client and server descriptions are first, use cases come next, with a focus on VMware. That may come after EMC World, but it will come.</description>
		<content:encoded><![CDATA[<p>Final piece of follow-up here. First, we are absolutely not shelving Avamar. Not in any way, shape or form. I suppose the comment that we are refining the market would be more accurately represented as &#8220;we refined&#8221; the market about 18 months ago. That is not to say that there aren&#8217;t overzealous salespeople or partners, or that it wasn&#8217;t pitched as a panacea for backup by one of them. But it isn&#8217;t. It does have places where it is a better fit than others. But it will be a core part of EMC backup strategy. Secondly, I have started to put up some basic information about Avamar on my blog; feel free to read and question. Client and server descriptions are first, use cases come next, with a focus on VMware. That may come after EMC World, but it will come.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Storage Dork</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7120</link>
		<dc:creator>Storage Dork</dc:creator>
		<pubDate>Thu, 15 May 2008 13:38:04 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7120</guid>
		<description>"Isn’t EMC shelving Avamar?"
I've done some research.  They are keeping Avamar but are refining the market they want to address with it.</description>
		<content:encoded><![CDATA[<p>&#8220;Isn’t EMC shelving Avamar?&#8221;<br />
I&#8217;ve done some research.  They are keeping Avamar but are refining the market they want to address with it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Who needs data deduplication? &#124; SanGod.Com</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7119</link>
		<dc:creator>Who needs data deduplication? &#124; SanGod.Com</dc:creator>
		<pubDate>Thu, 15 May 2008 01:57:14 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7119</guid>
		<description>[...] Data De-Duplication on SearchStorage.com [...]</description>
		<content:encoded><![CDATA[<p>[...] Data De-Duplication on <a href="http://SearchStorage.com" target="_blank">SearchStorage.com</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Storage Dork</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7118</link>
		<dc:creator>Storage Dork</dc:creator>
		<pubDate>Wed, 14 May 2008 19:37:52 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7118</guid>
		<description>Isn't EMC shelving Avamar?  I know I've heard this somewhere out there.  Is there any truth to this?</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t EMC shelving Avamar?  I know I&#8217;ve heard this somewhere out there.  Is there any truth to this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Beth Pariseau</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7117</link>
		<dc:creator>Beth Pariseau</dc:creator>
		<pubDate>Wed, 14 May 2008 18:01:41 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7117</guid>
		<description>love the inflation adjustment, Jesse. i think you're right...and it's also very dependent on the environment, as we've seen in this discussion. 

still, i've at least gotten DD to cop to a single-stream transfer rate of about 150 MBps in the past. was hoping to have some kind of genuinely comparative number like that w/r/t avamar. to at least have specific, numerical, competing *claims* would be a start...:)

EMC'ers always act shocked when i bring up that scalability thing about avamar, but they've got to know that persistent viewpoint is out there. i've tried to dig into it in the past, and haven't yet personally spoken with any avamar users with 1000s of servers protecting hundreds of TB as was mentioned above. i've spoken with one who had an overall environment that could be described that way, but had a more limited amount of avamar in production. 

tho really, at this point, all of the above might be moot in the face of the 'rip and replace' factor when it comes to EMC's ability to use / sell Avamar's software...</description>
		<content:encoded><![CDATA[<p>love the inflation adjustment, Jesse. i think you&#8217;re right&#8230;and it&#8217;s also very dependent on the environment, as we&#8217;ve seen in this discussion. </p>
<p>still, i&#8217;ve at least gotten DD to cop to a single-stream transfer rate of about 150 MBps in the past. was hoping to have some kind of genuinely comparative number like that w/r/t avamar. to at least have specific, numerical, competing *claims* would be a start&#8230;:)</p>
<p>EMC&#8217;ers always act shocked when i bring up that scalability thing about avamar, but they&#8217;ve got to know that persistent viewpoint is out there. i&#8217;ve tried to dig into it in the past, and haven&#8217;t yet personally spoken with any avamar users with 1000s of servers protecting hundreds of TB as was mentioned above. i&#8217;ve spoken with one who had an overall environment that could be described that way, but had a more limited amount of avamar in production. </p>
<p>tho really, at this point, all of the above might be moot in the face of the &#8216;rip and replace&#8217; factor when it comes to EMC&#8217;s ability to use / sell Avamar&#8217;s software&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse</title>
		<link>http://itknowledgeexchange.techtarget.com/storage-soup/vendorfights-data-deduplication-edition/#comment-7116</link>
		<dc:creator>Jesse</dc:creator>
		<pubDate>Wed, 14 May 2008 14:45:44 +0000</pubDate>
		<guid isPermaLink="false">http://storage.blogs.techtarget.com/2008/05/12/vendorfights-data-deduplication-edition/#comment-7116</guid>
		<description>Performance evaluations are always subjective, especially when they are done by the company that made the product. Testing is done in a pristine environment without much regard for the real world, yet the results are used to justify introducing these products into the real world.

Any product can be set up to perform well in a lab setting that may or may not be indicative of "Production" situations.

I always take such performance claims with a grain of salt. I've worked in R&#38;D, specifically in Qualification and Test, and I remember times when the same test run back to back produced slightly skewed results. Not enough to invalidate the test, but definitely enough to make you wonder what else is going on in the background of the software.

I'm not a big fan of compression (or de-duplication as they're calling it now). I remember having a sales guy come in to pitch Avamar to me at one of my last jobs, and the fact that they wanted to sell me a product for tens of thousands of dollars to save a few bucks on my tape footprint was just staggering.

De-Duplication is essentially the same process used in zip/tar operations. You're taking repetitive blocks of data within a stream and replacing them with a kind of count/pointer system. All of it takes cycles. If it is done on the host it takes CPU cycles; if it's done in-line via an appliance it increases latency.

Just my $2.12 (adjusted for inflation)</description>
		<content:encoded><![CDATA[<p>Performance evaluations are always subjective, especially when they are done by the company that made the product. Testing is done in a pristine environment without much regard for the real world, yet the results are used to justify introducing these products into the real world.</p>
<p>Any product can be set up to perform well in a lab setting that may or may not be indicative of &#8220;Production&#8221; situations.</p>
<p>I always take such performance claims with a grain of salt. I&#8217;ve worked in R&amp;D, specifically in Qualification and Test, and I remember times when the same test run back to back produced slightly skewed results. Not enough to invalidate the test, but definitely enough to make you wonder what else is going on in the background of the software.</p>
<p>I&#8217;m not a big fan of compression (or de-duplication as they&#8217;re calling it now). I remember having a sales guy come in to pitch Avamar to me at one of my last jobs, and the fact that they wanted to sell me a product for tens of thousands of dollars to save a few bucks on my tape footprint was just staggering.</p>
<p>De-Duplication is essentially the same process used in zip/tar operations. You&#8217;re taking repetitive blocks of data within a stream and replacing them with a kind of count/pointer system. All of it takes cycles. If it is done on the host it takes CPU cycles; if it&#8217;s done in-line via an appliance it increases latency.</p>
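<p>To make the count/pointer idea concrete, here is a minimal sketch of block-level de-duplication. It assumes fixed-size blocks and SHA-256 digests as the pointers; the function names and the tiny block size are hypothetical, and real products may chunk and index data quite differently:</p>

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and keep each unique block once."""
    store = {}   # digest -> block bytes (one copy per unique block)
    recipe = []  # ordered digests acting as pointers to rebuild the stream
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store only the first occurrence
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original stream by following the pointers in order."""
    return b"".join(store[digest] for digest in recipe)


data = b"ABCDABCDABCDWXYZ"
store, recipe = dedupe(data)
print(len(store), len(recipe))          # 2 4
print(restore(store, recipe) == data)   # True
```

<p>In this toy stream the three identical ABCD blocks are stored once, so two unique blocks plus a four-entry recipe rebuild all 16 bytes. Hashing every block is where the cycles mentioned above are spent.</p>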
<p>Just my $2.12 (adjusted for inflation)</p>
]]></content:encoded>
	</item>
</channel>
</rss>