We hear so much about “unstructured” and “structured” data, but rarely pause to think about what the terms really mean or how the nature of the data affects the way it is backed up, stored or recovered, whether within the enterprise or in the cloud.
While the term “structure” is not formally defined, it can be understood by implication. Structure is a kind of discernible pattern that is useful for data processing. The patterns make the data model generally predictable. Structured data fits neatly into relational tables and can be stored in databases or semantically tagged in documents.
Unstructured information refers to information that does not have a pre-defined data model or does not fit into relational tables. It is typically text-heavy and can contain varied data such as dates, numbers or facts. Traditional computing systems find such data irregular, ambiguous and difficult to process.
So how does one store unstructured data in the cloud or anywhere for that matter? What kind of software can process and store this data in meaningful ways for quick recovery?
Unstructured data may include books, journals, documents, metadata, health records, audio, video, log files and unstructured text such as the body of an email message, web page or word processor document. Software systems that deal with such data elements categorize them with reference to the packaging. The data element is a “book” or an “email message” first, and that forms the fundamental structure of the data element. Data mining and text analytics techniques are then used to find patterns within the package and interpret the information. The content is then tagged with metadata or parts of speech for structuring purposes. The Unstructured Information Management Architecture (UIMA) standard may be used to process the unstructured information and extract meaning or create structured data out of the unstructured data elements. A number of commercial solutions are also available for analyzing and understanding unstructured data generated by business applications.
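To make the idea concrete, here is a minimal sketch in Python of how a system might treat an email message as the “package” and then pull simple patterns out of its body. The sample message, the `structure_email` function and the two regular expressions are all assumptions made for illustration. They stand in for far richer text analytics and are not how UIMA or any commercial product works.

```python
import re
from email import message_from_string

# Hypothetical sample message, invented for this illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Backup window

The nightly backup moved to 2024-03-15 and should take 45 minutes.
Restore tests follow on 2024-03-18.
"""

# Deliberately simple patterns; real text analytics goes much further.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
NUMBER_PATTERN = re.compile(r"\b\d+\b")


def structure_email(raw):
    """Turn a raw email into a small structured record."""
    # Step 1: the packaging. The element is an "email message" first,
    # so the standard headers already provide some structure.
    message = message_from_string(raw)
    body = message.get_payload()

    # Step 2: look for patterns inside the unstructured body text.
    dates = DATE_PATTERN.findall(body)
    # Blank out the dates so their digits are not counted again as numbers.
    remainder = DATE_PATTERN.sub(" ", body)
    numbers = [int(n) for n in NUMBER_PATTERN.findall(remainder)]

    # Step 3: emit a record that would fit into a relational table.
    return {
        "package": "email message",
        "sender": message["From"],
        "subject": message["Subject"],
        "dates": dates,
        "numbers": numbers,
    }


record = structure_email(RAW_EMAIL)
print(record)
```

The resulting dictionary has a predictable shape, so it could be stored as a row in a database even though the original body text could not.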
A number of cloud storage solutions have emerged to handle unstructured data. Cloud storage addresses the size and scale of unstructured data growth by providing scalable file systems with newer object-based definitions that can manage very large numbers of objects efficiently. This is combined with low-cost storage devices and servers, a centralized pool of hardware that is ready to use, and a management console that centralizes administration.
Intelligent object stores manage and monitor data movement, space management and security functions. Objects are distinguished and moved discretely, with awareness of the data or metadata associated with them. The store can create and delete objects, write and read bytes from individual objects, and determine and retrieve the attributes or metadata of objects. Each of these solutions handles data coherency, reliability, durability, replication and recovery in its own distinctive way.
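The object operations described above can be sketched as a toy, in-memory object store in Python. The class name `ObjectStore`, its method names and the attribute keys are hypothetical and do not correspond to any particular cloud product's API. A real service would add persistence, replication, security and much more.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: its raw bytes plus the metadata that travels with it."""
    data: bytearray = field(default_factory=bytearray)
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Illustrative in-memory store keyed by object id (not durable)."""

    def __init__(self):
        self._objects = {}

    def create(self, object_id, metadata=None):
        if object_id in self._objects:
            raise ValueError(f"object {object_id!r} already exists")
        self._objects[object_id] = StoredObject(metadata=dict(metadata or {}))

    def delete(self, object_id):
        del self._objects[object_id]

    def write(self, object_id, offset, payload):
        """Write bytes at an offset, zero-filling any gap before it."""
        obj = self._objects[object_id]
        end = offset + len(payload)
        if end > len(obj.data):
            obj.data.extend(b"\x00" * (end - len(obj.data)))
        obj.data[offset:end] = payload

    def read(self, object_id, offset, length):
        obj = self._objects[object_id]
        return bytes(obj.data[offset:offset + length])

    def attributes(self, object_id):
        """Return the object's metadata together with its current size."""
        obj = self._objects[object_id]
        return {**obj.metadata, "size": len(obj.data)}


store = ObjectStore()
store.create("report-001", {"content_type": "text/plain"})
store.write("report-001", 0, b"hello world")
print(store.read("report-001", 6, 5))
print(store.attributes("report-001"))
```

Keeping the metadata alongside the bytes is what lets a store make decisions about an object, such as where to move it or how to secure it, without parsing its contents.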