Posted by: Beth Pariseau
data backup, data deduplication
The major story with Sepaton’s launch of a new midrange data deduplication system and support for EMC Corp.’s NetWorker last week was the shifting competition in the data deduplication market: Data Domain, which made a name for itself catering to midmarket customers, is pushing upmarket now that it’s part of EMC, and Sepaton, which has previously emphasized the enterprise, is gunning for Data Domain’s midmarket turf with the new S2100-MS2.
Along with the new configuration, though, Sepaton also made some modifications to its data deduplication algorithms that have not been as widely discussed in the industry. I followed up with some Sepaton executives to get more details on these updates, and thought they'd be worth throwing into the mix here.
First, a couple of refreshers on Sepaton’s approach to data deduplication. It uses delta differencing to identify duplicates between two sets of backup data, along with content-aware integration with the major data backup applications – Hewlett-Packard Co. (HP)’s Data Protector, Symantec Corp.’s NetBackup, IBM’s Tivoli Storage Manager (TSM) and EMC Corp.’s NetWorker so far. This allows Sepaton’s data deduplication engine to identify objects within the backup stream that may be redundant, like Oracle and Word documents. Sepaton’s data deduplication also occurs post-process, after data has been ingested into the system, rather than inline. It uses forward referencing, which keeps the latest copy of data intact and eliminates duplicates from previous versions, as opposed to eliminating duplicate data from the newest version upon ingestion.
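To make the forward-referencing idea concrete, here is a minimal sketch in Python. It is purely illustrative, not Sepaton's actual DeltaStor implementation: all names, the fixed block size, and the pointer format are assumptions. The newest backup stays intact, and matching blocks in the older backup are replaced with references into the new copy.

```python
# Hypothetical sketch of post-process, forward-referenced deduplication.
# The newest backup is kept whole; duplicate blocks in the OLDER backup
# are replaced with pointers into the new copy.

BLOCK_SIZE = 4  # toy block size for illustration only


def split_blocks(data, size=BLOCK_SIZE):
    """Chop a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def forward_reference(old_backup, new_backup):
    """Re-express the old backup against the new one: duplicates become
    ("ref", position-in-new-backup), unique blocks stay as ("data", bytes)."""
    index = {}  # block content -> first position in the new backup
    for pos, block in enumerate(split_blocks(new_backup)):
        index.setdefault(block, pos)

    result = []
    for block in split_blocks(old_backup):
        if block in index:
            result.append(("ref", index[block]))   # duplicate: point forward
        else:
            result.append(("data", block))          # unique to the old version
    return result


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXXXXDDDD"  # one region changed between backups
print(forward_reference(old, new))
# → [('ref', 0), ('ref', 1), ('data', b'CCCC'), ('ref', 3)]
```

Note how the direction of the pointers matters: the latest restore reads straight through contiguous data, while only older, less frequently restored versions pay the cost of chasing references.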
There are two tradeoffs to doing delta differencing the way Sepaton does it. First, some applications, particularly those that make large insertions into a database table as records are modified, like SAP, don’t lend themselves well to object comparisons. Second, it’s hard to get the most data reduction out of a delta comparison between incremental backups, which by definition don’t contain many duplicate objects.
With version 5.3 of its DeltaStor data deduplication software, Sepaton has updated its algorithms to better support incrementals using a new metadata “scraper” to give the data deduplication engine “hints” about what blocks within incremental backups can be deduplicated. These “hints” were also developed based on field-collected customer data and new heuristics added to the DeltaStor algorithm, according to executive vice president of engineering Fidelma Russo.
Additional application types such as SAP are now supported with this release. That support involved adding a new process to Sepaton’s dedupe that lets it compare incremental backups to one another more quickly, by generating a lightweight “fingerprint” that suggests which portions of an incremental backup might contain duplicate blocks.
This sounds somewhat like the hashing approach used by other data deduplication vendors, including Data Domain. But rather than performing a hash on all data coming into the system as the primary means of locating duplicates, Sepaton uses this “fingerprinting” process to give the delta differencer “hints” about where duplicates might be located. “Its only goal in life is to sort through incrementals for the delta differencer — inline deduplication products use hashing to compare across all data — our process identifies only the probability of common data,” said Dennis Rowland, director of advanced technology for Sepaton. Rowland said Sepaton has a patent pending on its new process.
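The distinction Rowland draws can be sketched in a few lines of Python. This is an illustrative stand-in, not Sepaton's patented process: the sampling rate, hash choice, and function names are all assumptions. A cheap "fingerprint" hashes only a sample of each block to flag probable duplicates, and only those candidate pairs get the expensive byte-for-byte delta comparison, whereas an inline hash-based deduper would hash every block in full.

```python
# Illustrative "hint then verify" scheme: a lightweight fingerprint
# nominates probable duplicates; the delta differencer confirms them.
import hashlib


def fingerprint(block, sample_every=8):
    """Cheap hint: hash only every 8th byte of the block, not the whole thing."""
    return hashlib.md5(block[::sample_every]).hexdigest()[:8]


def hint_candidates(prev_blocks, incr_blocks):
    """Yield (prev_index, incr_index) pairs that are PROBABLY duplicates.
    False positives are possible; false negatives are not, for blocks
    whose sampled bytes match."""
    prev_index = {}
    for i, block in enumerate(prev_blocks):
        prev_index.setdefault(fingerprint(block), []).append(i)
    for j, block in enumerate(incr_blocks):
        for i in prev_index.get(fingerprint(block), []):
            yield (i, j)


def confirmed_duplicates(prev_blocks, incr_blocks):
    """Run the full comparison only where the fingerprint hinted a match."""
    return [(i, j)
            for i, j in hint_candidates(prev_blocks, incr_blocks)
            if prev_blocks[i] == incr_blocks[j]]


prev = [b"x" * 64, b"y" * 64]          # blocks from the previous backup
incr = [b"y" * 64, b"z" * 64]          # blocks from the incremental
print(confirmed_duplicates(prev, incr))
# → [(1, 0)]
```

The point of the design is in the cost profile: because the fingerprint only asserts a probability of common data, it can be far cheaper than full inline hashing, and the byte-level verification step catches any fingerprint collisions.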
Backup expert W. Curtis Preston said he doesn’t think this latest update of Sepaton’s and its ramifications — a new alternative to deduplicating SAP, for example, which is a notoriously heavy application when it comes to generating backup data — have been well understood by the market yet. “I think it’s a very important release for [Sepaton],” he said.