Customers who elect to go with agentless backup must be judicious in selecting what they plan to back up. Enterprises with large volumes of data and multiple systems to back up may find that the amount of data involved is large and unwieldy. Selective backup, or data pruning, reduces the load on the WAN and on system resources and speeds the transmission and upload of data to remote servers. De-duplication and compression technologies, used in conjunction with data pruning, can further reduce the amount of data that needs to be uploaded.
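As a rough illustration of how de-duplication and compression shrink an upload, the following minimal sketch compares raw size against the size actually sent. The function name `estimate_upload_bytes` is hypothetical, and SHA-256 hashing with zlib compression is an assumption chosen for the example rather than a description of any particular backup product.

```python
import hashlib
import zlib


def estimate_upload_bytes(blobs):
    """Return (raw_bytes, upload_bytes) for an iterable of bytes objects.

    Identical blobs are counted once (de-duplication by SHA-256 digest),
    and each unique blob is compressed with zlib before it is counted.
    """
    seen = set()
    raw_bytes = 0
    upload_bytes = 0
    for blob in blobs:
        raw_bytes += len(blob)
        digest = hashlib.sha256(blob).digest()
        if digest in seen:
            continue  # duplicate content: nothing new to upload
        seen.add(digest)
        upload_bytes += len(zlib.compress(blob))
    return raw_bytes, upload_bytes


if __name__ == "__main__":
    # Illustrative data only: two identical log files and one report.
    sample = [b"log line\n" * 1000, b"log line\n" * 1000, b"report" * 500]
    raw, upload = estimate_upload_bytes(sample)
    print(f"raw={raw} bytes, upload={upload} bytes")
```

The savings on real data depend heavily on how repetitive and how compressible it is, so a sketch like this is best run against a representative sample rather than trusted as a general ratio.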
The importance of data pruning should not be underestimated. The first step in pruning data is setting the criteria for identifying and selecting data. Data pruning improves the quality of data and is the right strategy for any data migration project.
Data pruning could involve:
1. Pre-migration data profiling and data scoping;
2. Identifying whether data is acceptable for migration;
3. Removing all data that cannot be justified;
4. Leveraging data migration tools for matching, standardizing, transforming and cleaning data;
5. Assigning value to data sets by removing duplicated data.
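The selection steps above can be sketched as a simple filter over file metadata gathered during profiling. Everything named here (`FileRecord`, `EXCLUDED_SUFFIXES`, `MAX_AGE_DAYS`, `is_acceptable`, `prune`) is hypothetical, and the criteria are placeholder assumptions; real criteria come from the enterprise's own retention and migration policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    last_modified: date
    checksum: str  # content hash computed during profiling


# Assumed selection criteria, for illustration only.
EXCLUDED_SUFFIXES = (".tmp", ".bak", ".log")
MAX_AGE_DAYS = 365 * 3


def is_acceptable(record, today):
    """Keep only data that can be justified: not scratch files, not stale."""
    if record.path.lower().endswith(EXCLUDED_SUFFIXES):
        return False
    return (today - record.last_modified).days <= MAX_AGE_DAYS


def prune(records, today):
    """Apply the criteria, then drop records whose content was already selected."""
    selected = []
    seen_checksums = set()
    for record in records:
        if not is_acceptable(record, today):
            continue
        if record.checksum in seen_checksums:
            continue  # duplicated data: keep the first copy only
        seen_checksums.add(record.checksum)
        selected.append(record)
    return selected
```

Keeping the criteria in one small function makes them easy to review and adjust before any data is actually moved.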
One significant data management solution that has been successfully implemented in the cloud is tiered selection and backup. Tiered backup solutions divide the enterprise systems into tiers. Data from each tier is sampled before tier-specific selection policies are implemented. Bandwidth requirements for each tier can then be estimated and summed to give the total bandwidth requirement of the enterprise.
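A per-tier bandwidth estimate of this kind might be sketched as follows. The `Tier` fields, the daily change ratio, and the backup window are all assumptions introduced for illustration; the text does not specify how the estimate is made. Summing the tiers also assumes they back up concurrently, which gives a conservative upper figure.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    total_bytes: int             # data held by systems in this tier
    sampled_select_ratio: float  # fraction of sampled data the tier policy selects
    daily_change_ratio: float    # assumed fraction of selected data changing per day
    backup_window_hours: float   # assumed time allowed for the daily upload


def tier_bandwidth_mbps(tier):
    """Estimated megabits per second needed to upload one day's changes in the window."""
    daily_bytes = tier.total_bytes * tier.sampled_select_ratio * tier.daily_change_ratio
    seconds = tier.backup_window_hours * 3600
    return daily_bytes * 8 / seconds / 1_000_000


def total_bandwidth_mbps(tiers):
    """Sum of per-tier estimates, assuming all tiers upload at the same time."""
    return sum(tier_bandwidth_mbps(t) for t in tiers)
```

If tiers are scheduled in separate windows instead, the largest single-tier figure is a closer estimate of the peak requirement than the sum.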
Data pruning gains importance in the context of reducing storage costs. Most online backup vendors use the Backup PrOxy Server (BOS) model or the Televaulting model in determining the pricing for storage. While BOS determines how much data the software can handle, Televaulting evaluates the amount of data that is actually stored on the server. Under either model, data pruning reduces the amount of data that is uploaded to the server and thereby reduces the cost of transmission and/or storage.
Accurate data pruning can be a time-consuming and resource-intensive activity, but it ultimately ensures data quality and reduces the volume and cost of data storage.