The storage efficiencies delivered by deduplication technology have been discussed threadbare. The cost benefits derived from dedupe have also been enthusiastically acclaimed. The technology has been harnessed for both primary and secondary storage, and the benefits of each adoption have been thrashed out. But what about data integrity after dedupe? Does the rush to adopt the technology take long-term data integrity and performance risks into consideration? At what point will the dedupe technology break down? When will it fail to deliver the expected results? It cannot be expected to perform consistently across all data volumes!
Imperfectly designed dedupe mechanisms can impact system performance and create data instability. Dedupe works by segmenting data into blocks and generating a hash code for each segment, which is then compared with the hash values already stored. Segments whose hash codes match a stored value are discarded as duplicates. This is also where the integrity risk lies: if two different segments ever produced the same hash code, one of them would be wrongly discarded. As data volumes increase, hash comparisons can take longer and performance can degrade.
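As a rough illustration of the mechanism just described, the sketch below splits data into fixed-size blocks, hashes each block with SHA-256, and stores a block only if its hash has not been seen before. The block size, the choice of SHA-256, and the names `DedupeStore`, `write` and `read` are assumptions made for this example. They are not details of any vendor's engine.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real engines may segment differently


class DedupeStore:
    """Toy in-memory block store that keeps one copy of each unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # hash digest -> block bytes (stands in for the hash table)

    def write(self, data):
        """Store data and return the list of block hashes needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # First time this hash is seen: keep the block.
                self.blocks[digest] = block
            # Duplicate or not, the file only records a reference to the block.
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its list of block hashes."""
        return b"".join(self.blocks[digest] for digest in recipe)


store = DedupeStore()
payload = b"A" * 8192 + b"B" * 4096  # three blocks, two of them identical
recipe = store.write(payload)
restored = store.read(recipe)
print(len(recipe), len(store.blocks))  # prints "3 2"
```

Three block references are recorded but only two blocks are stored, which is the saving dedupe promises. Note that this sketch trusts the hash completely: a block is treated as a duplicate on a hash match alone, without a byte-for-byte comparison.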
Cloud service providers keep dedupe lookups fast by storing the hash table in DRAM. But the table can quickly outgrow the DRAM and spill onto the hard drive, where the added latency degrades performance. One way of overcoming the problem is to limit the overall size of the hash table. Another is to reduce the size of each entry in the hash table. Smaller tables reduce the possibility of overflow. Alternatively, the amount of DRAM can be increased, but this has cost implications. A few vendors feel that it would be more effective to enhance the efficiency of the lookup engine. The engine should be intelligent enough to keep frequently used sections of the hash table in memory, using the RAM as a cache for the index.
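The idea of using RAM as a cache for the index can be sketched in a few lines. The example below assumes a simple least-recently-used (LRU) eviction policy, which is only one of many possible ways to decide what stays in memory. The class `CachedHashIndex`, its methods, and the dictionary standing in for the on-disk index are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class CachedHashIndex:
    """Keeps recently used index entries in RAM; the full index lives in a slower store."""

    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of entries held in the RAM cache
        self.cache = OrderedDict()  # digest -> location, least recently used first
        self.backing = {}           # stand-in for the full index on the hard drive
        self.hits = 0
        self.misses = 0

    def add(self, digest, location):
        """Record a new entry in the full index and in the RAM cache."""
        self.backing[digest] = location
        self._remember(digest, location)

    def lookup(self, digest):
        """Return the stored location for a digest, or None if it is unknown."""
        if digest in self.cache:
            self.hits += 1
            self.cache.move_to_end(digest)  # mark as most recently used
            return self.cache[digest]
        self.misses += 1  # in a real system this would cost a disk read
        location = self.backing.get(digest)
        if location is not None:
            self._remember(digest, location)
        return location

    def _remember(self, digest, location):
        self.cache[digest] = location
        self.cache.move_to_end(digest)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry


index = CachedHashIndex(capacity=2)
index.add("h1", 0)
index.add("h2", 1)
index.add("h3", 2)   # cache is full, so "h1" is evicted from RAM
index.lookup("h3")   # hit: still in RAM
index.lookup("h1")   # miss: fetched from the backing store, "h2" is evicted
index.lookup("h1")   # hit: now back in RAM
print(index.hits, index.misses)  # prints "2 1"
```

The fewer lookups that fall through to the backing store, the less the hard drive's latency shows up in dedupe performance. How well this works in practice depends on whether the workload really does reuse the same sections of the index.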
As a result, deduplication performance is a matter of concern. Cloud users must evaluate the deduplication engine on offer and make sure that the vendor puts in all the hard work necessary to ensure optimum performance of the engine. We at Securstore (powered by Asigra) spend huge amounts of time and resources on ensuring that customers are not disadvantaged by poor-quality deduplication. Asigra fine-tunes its solutions to meet the demands of the data center and builds up capacity for handling increasing data volumes. The heuristic algorithm it uses automatically assigns the sections that must occupy the RAM and the sections that must be relegated to the hard drive. The data correlation techniques used reflect the years of experience behind the use of the technology. Securstore is very proud to be working with Asigra’s award-winning solutions.