The secret to saving disk capacity and improving disaster recovery in the Cloud is data deduplication and compression. Data deduplication removes duplicate copies of data, such as repeated files and folders, while data compression encodes a data stream in fewer bits by removing redundancy. The two technologies are often used in conjunction with each other to save disk space in offline or online storage. Most Cloud backup and recovery vendors package data deduplication and compression as part of their data storage and recovery software.

How do these two technologies work? Data deduplication can be triggered either while the data stream is being fed into the backup repository or after the data is already residing there. The former is known as source data deduplication (also called pre-process deduplication) and the latter as destination data deduplication (also called post-process deduplication). When deduplication is applied at the source, the backup process may slow down, because the deduplication algorithm examines every piece of information identified for backup and compares it with the data already seeded into the backup in order to isolate and remove duplicates. When deduplication is done at the destination, the backup itself zips through, but large amounts of disk space must be allocated and remain consumed until the deduplication process completes. In either case, deduplication helps the organization store only unique pieces of information and saves the costs associated with storage space and bandwidth.
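The comparison step described above can be sketched in a few lines of Python. This is only a minimal illustration, not any vendor's method: it assumes fixed-size chunks and SHA-256 fingerprints, and the names `DedupStore`, `split_chunks`, and `CHUNK_SIZE` are hypothetical. Production systems typically use much larger or variable-size chunks and a persistent index.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical, deliberately tiny so the demo is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the incoming stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy backup repository that keeps each unique chunk only once."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # fingerprint -> chunk contents

    def backup(self, data: bytes) -> List[str]:
        """Store the data and return a recipe: the ordered list of fingerprints."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # compare with data already in the backup
                self.chunks[digest] = chunk  # only unique chunks consume space
            recipe.append(digest)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)


store = DedupStore()
data = b"ABCDABCDABCDWXYZ"          # "ABCD" repeats three times
recipe = store.backup(data)
unique_chunks = len(store.chunks)   # two unique chunks are stored for four references
restored = store.restore(recipe)
```

Running the fingerprint check before data leaves the client corresponds to source deduplication; running the same loop over data that has already landed in the repository corresponds to destination deduplication.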

Compression, like deduplication, can be set to work at the source or at the destination. It can be lossless or lossy. Lossless compression identifies and removes statistical redundancy without any loss of information, so the original data can be reconstructed exactly. Lossy compression identifies information judged unnecessary and discards it, so the original cannot be fully restored. Both processes reduce the size of data files and are consequently designated as data compression technologies. Images can be compressed with image compression technology, which is often used together with snapshot technology.
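The difference between the two kinds of compression can be illustrated with a short Python sketch. The lossless part uses the standard-library `zlib` module. The lossy part is a deliberately crude assumption made for illustration (it simply drops the low four bits of each 8-bit sample) and is not how real image or video codecs work. The function names are hypothetical.

```python
import zlib


def compress_lossless(data: bytes, level: int = 6) -> bytes:
    """Remove statistical redundancy; the result can be decoded exactly."""
    return zlib.compress(data, level)


def decompress_lossless(blob: bytes) -> bytes:
    """Reverse compress_lossless and return the original bytes."""
    return zlib.decompress(blob)


def quantize_lossy(samples: bytes) -> bytes:
    """Discard the low 4 bits of every sample (illustrative assumption only).

    The discarded detail cannot be recovered afterwards.
    """
    return bytes(b & 0xF0 for b in samples)


# Lossless round trip: highly repetitive data shrinks and comes back intact.
original = b"backup " * 1000
packed = compress_lossless(original)
restored = decompress_lossless(packed)

# Lossy step: each sample stays within 15 of its original value,
# but the exact original values are gone for good.
samples = bytes((i * 37) % 256 for i in range(4096))
approx = quantize_lossy(samples)
max_error = max(s - a for s, a in zip(samples, approx))
```

Here `restored` equals `original` byte for byte, while `approx` only approximates `samples`. That is why lossy methods suit media such as images and video but not general backup data.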

Both of the above processes, deduplication and compression, have to be reversed when data is recovered to the same or a different location on the network. This is a trade-off that should be examined and evaluated in the context of the cost savings: extra computing costs are involved in decompressing the data and in reassembling (rehydrating) deduplicated data. Moreover, compression is subject to space-time trade-offs and may require the use of expensive hardware if video decompression is a requirement.
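To make the recovery cost concrete, the following Python sketch stores unique chunks in compressed form and then reverses both steps on restore. It is a simplified illustration under stated assumptions (fixed-size chunks, SHA-256 fingerprints, `zlib` compression, an in-memory dictionary as the repository). The `backup` and `restore` functions are hypothetical and not taken from any product.

```python
import hashlib
import zlib
from typing import Dict, List


def backup(data: bytes, store: Dict[str, bytes], chunk_size: int = 8) -> List[str]:
    """Deduplicate, then compress: each unique chunk is stored once, compressed."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reverse both steps: look up every chunk reference and decompress it."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


repository: Dict[str, bytes] = {}
payload = b"12345678" * 3 + b"abcdefgh"
recipe = backup(payload, repository)
recovered = restore(recipe, repository)
```

Note that `restore` does one lookup and one decompression per chunk reference, including repeated ones. That per-reference work is the extra computing cost paid at recovery time in exchange for the space saved at backup time.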

