There has been a lot of brouhaha about the information explosion, content management and all the associated terminology. Much of the discussion seems to skim over an important aspect of storage: the search algorithms! Content management is great, but it is of no use if there is no technology for discovering a specific piece of information buried in the volumes of content that have been uploaded to the online server.

Cloud search algorithms demand a new approach to information tagging and reporting. The “dumb disk at the end of a network” approach will defeat the purpose of the backup. But, when you get down to evaluating the issues, the complexity jumps out at you!

  • Not all information in the backup can be treated the same. There are many information types, and each may need its own handling. Search algorithms for specific information types must be carefully architected and deployed.
  • Backed up information is not static. The information is constantly changing as content is updated, revised, added to and incrementally uploaded to the repository.
  • Information access and tagging must be rapid. Data must be tagged even as it is streaming into the server for search algorithms to function effectively.
  • The security and performance of search algorithms may require a sophisticated cloud-as-a-tier management protocol.
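
To make the points about information types and tagging on arrival concrete, here is a minimal sketch in Python. It is an illustration only: the extension-to-type table, the tag_stream name and the record fields are all assumptions made for this example, not part of any vendor's product.

```python
from typing import Iterable, Iterator

# Hypothetical mapping from file extension to information type.
# A real service would use richer detection than the file name.
TYPE_BY_EXTENSION = {
    ".txt": "text",
    ".csv": "tabular",
    ".jpg": "image",
}


def tag_stream(items: Iterable[tuple[str, bytes]]) -> Iterator[dict]:
    """Yield one tag record per item as soon as that item arrives."""
    for name, payload in items:
        dot = name.rfind(".")
        extension = name[dot:].lower() if dot != -1 else ""
        yield {
            "name": name,
            "type": TYPE_BY_EXTENSION.get(extension, "unknown"),
            "size": len(payload),
        }
```

Because tag_stream is a generator, each record is produced while the upload is still in progress, so a type-specific search routine could pick it up without waiting for the whole backup to finish.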

Most cloud-based search algorithms are software based. A number of heuristic and metaheuristic search methodologies may be adopted, depending on the nature of the data being uploaded into the backup. Heuristic search algorithms rely on rules of thumb: practical shortcuts that find a good-enough answer quickly rather than guaranteeing the best one.

An algorithm may also treat the data as a structured or unstructured space, which it can probe iteratively for significant pieces of information. The algorithm may identify the significant information as a “goal state” and may perform a “depth first” or “breadth first” search for it.
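
The difference between the two strategies is easiest to see side by side. The sketch below is a generic textbook version in Python, assuming the data space can be represented as a dictionary mapping each node to its children. The function names and the is_goal callback are illustrative choices, not a description of any particular provider's implementation.

```python
from collections import deque
from typing import Callable, Hashable, Optional

Graph = dict[Hashable, list[Hashable]]


def breadth_first(graph: Graph, start: Hashable,
                  is_goal: Callable[[Hashable], bool]) -> Optional[Hashable]:
    """Explore level by level; return the first goal node found, or None."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if is_goal(node):
            return node
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None


def depth_first(graph: Graph, start: Hashable,
                is_goal: Callable[[Hashable], bool]) -> Optional[Hashable]:
    """Follow one branch to its end before backtracking."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if is_goal(node):
            return node
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return None
```

With a small folder tree such as {"root": ["docs", "report.jpg"], "docs": ["report.txt"]} and a goal of "name starts with report", the breadth-first version finds the shallow report.jpg first, while the depth-first version dives into docs and finds report.txt first.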

Whatever the methodology used, the search algorithm ends by placing all the information gathered into an index. Every time data in the repository is changed or added to, the search algorithm updates the index. When users query for specific data, the query is first run against the index that has been created. If the information is not in the index, the search returns a negative result.
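
The index cycle described here (build, update on change, query, report a miss) can be sketched as a small inverted index in Python. This is a simplified illustration: the InvertedIndex class, its method names and the word-splitting rule are assumptions made for the example, and a production service would add ranking, persistence and access control.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of documents that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> document ids
        self._terms_by_doc = {}            # document id -> its terms

    @staticmethod
    def _tokenize(text):
        # Simplistic rule: lowercase runs of letters and digits.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remove(self, doc_id):
        """Drop a document and any terms that only it contained."""
        for term in self._terms_by_doc.pop(doc_id, set()):
            self._postings[term].discard(doc_id)
            if not self._postings[term]:
                del self._postings[term]

    def upsert(self, doc_id, text):
        """Add a new document, or re-index one whose content changed."""
        self.remove(doc_id)
        terms = self._tokenize(text)
        self._terms_by_doc[doc_id] = terms
        for term in terms:
            self._postings[term].add(doc_id)

    def search(self, term):
        """Return matching document ids; an empty set is the negative."""
        return set(self._postings.get(term.lower(), set()))
```

Calling upsert again for a document that has been revised replaces its old terms, so a query for a word that was edited out comes back empty, which is the negative result described above.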

So, how does one determine whether the cloud service provider is giving you a good search algorithm? You could begin by asking what kind of search algorithm the vendor deploys, and then examining whether that search algorithm will serve your purpose!