De-duplication is the process of identifying duplicate data in files and folders and removing it from the backup set(s). Inline de-duplication algorithms run before the backup data is sent, while post-process de-duplication algorithms take up the same task after the data has been uploaded. Inline de-duplication therefore generally happens at the client end of the backup process, and post-process de-duplication happens at the server end, or the provider’s end, of the backup process.
Inline de-duplication and post-process de-duplication algorithms are similar in construction, but backup specialists see many advantages in inline de-duplication over post-process de-duplication. Inline de-duplication requires two to four times less disk array I/O than post-process de-duplication. Since data is de-duplicated first and only then stored, the backup process is faster, simpler and safer, and there are fewer performance management complexities to grapple with.
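The difference in disk I/O can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: the function names `inline_io` and `post_process_io` are hypothetical, each segment counts as one I/O operation, and the post-process pass is assumed to read every stored segment back once. It is not meant to reproduce the two-to-four-times figure quoted above.

```python
import hashlib


def fingerprint(segment: bytes) -> bytes:
    """Return a SHA-256 digest used to recognise identical segments."""
    return hashlib.sha256(segment).digest()


def inline_io(segments):
    """De-duplicate first, then store: only unique segments are written."""
    seen = set()
    writes = 0
    for segment in segments:
        fp = fingerprint(segment)
        if fp not in seen:
            seen.add(fp)
            writes += 1
    return {"writes": writes, "reads": 0}


def post_process_io(segments):
    """Store everything first, then read it back to look for duplicates."""
    writes = len(segments)  # every segment is written, duplicates included
    reads = len(segments)   # assumed: the later pass reads each segment once
    return {"writes": writes, "reads": reads}


segments = [b"a", b"b", b"a", b"a"]
print(inline_io(segments))        # {'writes': 2, 'reads': 0}
print(post_process_io(segments))  # {'writes': 4, 'reads': 4}
```

In this model the inline path touches the disk only for unique segments, while the post-process path pays for every write and then again for the read-back pass.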
Identifying duplicates in a backup set is a compute-intensive activity. The algorithms are designed to make efficient use of CPU, memory, and disk resources so that solutions remain cost-effective for customers. The systems are generally policy-based, and users may be able to measure the speed of de-duplication so that backups do not overrun their schedules.
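Once the de-duplication speed has been measured, checking it against a schedule is simple arithmetic. The following is a minimal sketch; the function names and the figures in the usage lines (12 TB of data, an 8-hour window) are hypothetical and chosen only for illustration.

```python
def backup_duration_hours(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Estimate how long a backup takes at a measured throughput."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hour


def fits_window(data_tb: float, throughput_tb_per_hour: float,
                window_hours: float) -> bool:
    """Return True if the backup is expected to finish inside the window."""
    return backup_duration_hours(data_tb, throughput_tb_per_hour) <= window_hours


# Hypothetical figures: 12 TB to protect, an 8-hour overnight window.
print(fits_window(12, 2.0, 8))  # True: about 6 hours needed
print(fits_window(12, 1.0, 8))  # False: about 12 hours needed
```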
Modern de-duplication products offer inline de-duplication that is fast, simple and safe. Improvements in CPU performance over the last two decades have resulted in the development of efficient inline de-duplication controllers that can support 2.7 TB/hour of backup throughput. Many algorithms identify duplicate segments in RAM, inline, without disk I/O overhead. They are programmed to store related segments and metadata intelligently so that disk I/O is optimized.
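One common way to spot duplicate segments in memory is to keep an index of segment fingerprints. The sketch below is an illustration, not a description of any particular product: it assumes fixed-size segments and SHA-256 fingerprints, and the names `dedupe_segments` and `restore` are hypothetical.

```python
import hashlib


def dedupe_segments(data: bytes, segment_size: int = 4096):
    """Split data into fixed-size segments and keep one copy of each."""
    index = {}   # fingerprint -> position in `store`, held in RAM
    store = []   # unique segments in first-seen order
    recipe = []  # positions in `store` needed to rebuild the stream
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fp = hashlib.sha256(segment).digest()
        if fp not in index:
            # First time this content is seen: store it once.
            index[fp] = len(store)
            store.append(segment)
        recipe.append(index[fp])
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from unique segments and the recipe."""
    return b"".join(store[i] for i in recipe)


data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe_segments(data)
print(len(store), recipe)             # 2 [0, 1, 0]
print(restore(store, recipe) == data) # True
```

The duplicate check is a dictionary lookup, so no segment has to be read back from disk to decide whether it has been seen before.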
Inline de-duplication systems send only de-duplicated data to the backup server, so replication processes can be started without delay. Data integrity is also protected, as there are no post-backup processes that can corrupt or damage data that has already been sent to the backup repository.
Slow inline de-duplication means that the backup will be slower and de-dupe-based replication will be slower still. The restore point at the disaster recovery site will lag behind. Since client-side agents are used to send de-duplicated backup data online, the process may become client-intensive, with lengthening backup windows. A slow client may become unusable while a backup is running. Inline de-duplication is therefore generally recommended where small amounts of data are to be backed up.