Target de-duplication can be either inline de-duplication or post-process de-duplication.
Inline de-duplication happens in near real time: data is evaluated and de-duplicated as it arrives at the target, before it is written to the storage device. Post-process de-duplication, by contrast, writes the transmitted data to storage in full first and then de-duplicates it at scheduled time frames. Whatever the kind of target de-duplication used, the data gets piped through the de-dupe hardware or software (as the case may be), which remains in synchronization with the storage disk and evaluates incoming data against the data already existing in storage to identify and eliminate duplicates.
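The difference between the two modes can be sketched with a simple hash-based duplicate check. This is an illustrative toy, not a real de-dupe engine: the block values, the use of SHA-256 fingerprints, and the dictionary "store" are all assumptions made for the example.

```python
import hashlib

def inline_dedupe(incoming_blocks, store):
    """Inline: each block is fingerprinted and checked against the
    store before it is written, so duplicates never reach the disk."""
    for block in incoming_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:          # only unique blocks are stored
            store[digest] = block

def post_process_dedupe(raw_disk):
    """Post-process: blocks were written to disk as-is; at a scheduled
    time the disk is scanned and duplicate blocks are collapsed."""
    store = {}
    for block in raw_disk:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep first copy, drop repeats
    return store

blocks = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]

inline_store = {}
inline_dedupe(blocks, inline_store)      # duplicates filtered on arrival
print(len(inline_store))                 # 3 unique blocks stored

post_store = post_process_dedupe(blocks) # disk briefly held all 5 blocks
print(len(post_store))                   # 3 after the scheduled pass
```

Both paths end with the same three unique blocks; the difference is that the post-process path needs enough disk to hold all five blocks until the scheduled pass runs, which is exactly the capacity trade-off discussed below.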
Post-process de-duplication is advantageous to enterprises that have installed proprietary software at the source. The source software at the enterprise end does not have to be modified or redesigned to accommodate the demands of the de-duplication hardware or software, so the source systems can push data into transmission without worrying about compatibility issues. Nor does de-duplication hardware or software have to be installed at every connection node: the de-dupe device is centrally located, and data from all nodes is automatically piped through it on the network. Finally, all de-duplication load is taken off the client systems, freeing CPU power for the business uses of the enterprise. From the client's perspective, target de-duplication is also faster than source de-duplication: the data is pushed straight into the network, and the de-dupe process operates at the storage end to match data and eliminate duplicates.
Post-process de-duplication has several disadvantages, too. It is extremely bandwidth-intensive, and if the data volume in the enterprise is growing geometrically, target de-duplication may not really be the way to go. Large storage disk arrays may have to be pressed into service to hold the transmitted data until the scheduled post-process de-duplication can commence, and this has cost implications. In such cases, redesigning the proprietary source software to accommodate de-duplication and installing de-dupe hardware at every connecting node may prove more cost-effective than using target de-dupe technologies. Source de-duplication is further beneficial if the cloud service provider partnering with the enterprise determines charges on the basis of bandwidth usage.
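The bandwidth argument can be made concrete with a back-of-the-envelope comparison. All of the figures here (data volume, duplicate ratio, per-GB transfer price) are hypothetical assumptions chosen purely to illustrate the trade-off:

```python
# Hypothetical figures to illustrate the bandwidth-cost argument.
total_data_gb = 1000        # data the enterprise needs to move to the target
duplicate_ratio = 0.70      # assume 70% of the data is duplicate
price_per_gb = 0.05         # assumed provider charge per GB transferred

# Target (post-process) de-dupe: everything crosses the network first,
# and duplicates are only eliminated at the storage end.
target_transfer_gb = total_data_gb

# Source de-dupe: duplicates are eliminated before transmission,
# so only the unique portion crosses the network.
source_transfer_gb = total_data_gb * (1 - duplicate_ratio)

print(target_transfer_gb * price_per_gb)   # cost with target de-dupe
print(source_transfer_gb * price_per_gb)   # cost with source de-dupe
```

Under these assumed numbers, target de-dupe pays to transfer the full 1000 GB while source de-dupe transfers only 300 GB, which is why bandwidth-based pricing tilts the decision toward de-duplicating at the source.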
Enterprises will have to determine the kind of de-duplication process that suits them best. Data volumes, bandwidth availability, bandwidth costs and a host of other factors will have to be taken into consideration while determining the “best fit” for the enterprise. It is not an easy exercise.