Posted by: Graeme Elliott
There is currently a lot of discussion on the web about the pros and cons of inline versus post-processing deduplication, most of it focused on how fast each approach can ingest data. There are a number of other factors that need to be taken into account, especially when using the post-processing method.
While I believe that an inline VTL will never be able to ingest data as fast as a post-processing VTL, the ingest rate is not relevant for most small to medium sites, as they will never push their VTLs hard enough for this to be an issue. Keep in mind that the following considerations may differ between post-processing VTL vendors and that there are other factors to consider, but in general:
1. Since post-processing requires a transient storage area to hold the data prior to its deduplication, consideration needs to be given to the size of this transient area. If the storage in this area is depleted, it is no longer possible to write to the VTL. At a minimum, one day’s worth of backups needs to fit in this transient area.
2. If your backup software of choice needs a virtual tape that is in the process of being deduplicated, deduplication of that tape will stop until the tape is no longer in use. This increases the time taken for deduplication and will require a larger transient area.
3. If replication is used, virtual tapes will not be replicated until the deduplication process for that virtual tape has finished. If combined with the second point, virtual tapes may take many hours to get to the destination site after they have been created.
4. From what I have seen in the VTL space, the post-processing method tends to lend itself better to a more scalable model. Inline appliances tend to be built in a single frame of hardware, whereas with post-processing you simply add another node to the deduplication cluster as you need more processing power.
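To make the first two points more concrete, here is a rough sketch of how the transient area fills up when deduplication cannot keep pace with the nightly backups. It is a back-of-the-envelope model rather than any vendor's sizing method: the function name, the daily drain rate and all the figures are assumptions chosen for illustration, and it assumes each night's backup lands in full before deduplication drains the area.

```python
def peak_transient_usage_tb(daily_backups_tb, dedup_tb_per_day):
    """Estimate the peak amount of not-yet-deduplicated data (in TB)
    that the transient area has to hold.

    daily_backups_tb  -- hypothetical list of backup sizes, one per day
    dedup_tb_per_day  -- hypothetical amount the deduplication process
                         can clear from the transient area per day
    """
    backlog_tb = 0.0
    peak_tb = 0.0
    for ingested_tb in daily_backups_tb:
        # The whole backup is written to the transient area first.
        backlog_tb += ingested_tb
        peak_tb = max(peak_tb, backlog_tb)
        # Deduplication then drains what it can before the next backup.
        backlog_tb = max(0.0, backlog_tb - dedup_tb_per_day)
    return peak_tb


# Deduplication keeps up: one day's worth of backups is enough.
keeps_up = peak_transient_usage_tb([10.0, 10.0, 10.0], dedup_tb_per_day=10.0)

# Deduplication is slowed (for example by tapes being in use): the backlog grows.
falls_behind = peak_transient_usage_tb([10.0, 10.0, 10.0], dedup_tb_per_day=6.0)

print(keeps_up, falls_behind)
```

With these made-up numbers the first case peaks at 10 TB, which matches the "one day's worth" minimum, while the second peaks at 18 TB after only three days. The slower the deduplication, for whatever reason, the more headroom the transient area needs.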
In summary, inline deduplication requires less storage and so has a smaller footprint in your data center, but it is slower at ingesting data and less scalable than the post-processing deduplication offerings.