Data organization is a widely discussed topic today, in all kinds of forums and by all kinds of entities. It is clear that data organization is no longer a choice but a necessity. Organizations are acquiring data from multiple sources, in varying formats and data types. All this data needs to be organized, structured, and managed if it is to yield value and benefit the organization. Here are a few tips that may prove useful to those grappling with data organization for the first time.

Raw data, generated from a variety of sources, often arrives unstructured or inconsistently structured. It has to be “normalized”. Normalization is the first step in data organization and is the process of organizing data into fields (columns) within tables. This is done with a view to reducing redundancy and unwanted dependencies between fields, which protects data integrity and can simplify updates. The end result is that each piece of data is isolated in an independent construct, so that an addition, deletion, or modification can be confined to a single table and then cascaded to other tables through pre-defined relationships between tables. The level to which a database is normalized has a bearing on the way the database performs: a highly normalized design keeps updates small, but queries may need to join more tables.
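To make the idea concrete, here is a minimal sketch of normalization using Python's built-in sqlite3 module. The flat records, the table names (department, employee) and the column names are hypothetical and chosen only for illustration; a real schema would depend on the organization's own data. The sketch splits repeated department details into their own table, so that a change of location is made in one row and is seen by every related employee through the relationship between the tables.

```python
import sqlite3

# Hypothetical flat records: the department location repeats on every row.
flat_rows = [
    ("Alice", "Sales", "Boston"),
    ("Bob", "Sales", "Boston"),
    ("Carol", "Support", "Denver"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each department is stored once; employees refer to it by key.
conn.execute(
    "CREATE TABLE department ("
    " dept_id INTEGER PRIMARY KEY,"
    " name TEXT UNIQUE NOT NULL,"
    " location TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)

for emp_name, dept_name, location in flat_rows:
    # Insert the department only if it is not already present.
    conn.execute(
        "INSERT OR IGNORE INTO department (name, location) VALUES (?, ?)",
        (dept_name, location),
    )
    dept_id = conn.execute(
        "SELECT dept_id FROM department WHERE name = ?", (dept_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO employee (name, dept_id) VALUES (?, ?)", (emp_name, dept_id)
    )

# The modification is confined to a single row of a single table.
conn.execute("UPDATE department SET location = 'Chicago' WHERE name = 'Sales'")

department_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
locations = conn.execute(
    "SELECT e.name, d.location FROM employee e"
    " JOIN department d ON d.dept_id = e.dept_id ORDER BY e.name"
).fetchall()
print(department_count, locations)
```

In the flat form, the same change would have required touching every Sales row, with the risk of leaving some rows inconsistent.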

Normalization alone is not enough. It has to be followed up with the identification of unique business keys in every table. The implication is that the row and column identifiers in each table must make sense to the business user. Unique keys (such as the primary key or an alternate, secondary key) are basically row identifiers. If the unique key or row identifier in a table relates to a business concept, business users will find the data meaningful. For instance, a unique primary key for an employee table can be the Employee ID number. The same logic applies to column identifiers: column names should reflect the business terms they represent. A number of tools are available to map these relationships and to help optimize performance.
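As a small illustration of a business key, the sketch below (again using Python's sqlite3 module) declares a hypothetical employee_id column as the primary key of an employee table. The ID format and the names are assumptions made for the example. Because the key must be unique, the database rejects a second row that reuses the same Employee ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# employee_id is the business key: it identifies the row and is meaningful to users.
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT PRIMARY KEY NOT NULL,"
    " full_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO employee VALUES ('E1001', 'Alice Example')")

# A second row with the same Employee ID violates the uniqueness of the key.
try:
    conn.execute("INSERT INTO employee VALUES ('E1001', 'Bob Example')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A business user can look a row up directly by the identifier they already know.
row = conn.execute(
    "SELECT full_name FROM employee WHERE employee_id = ?", ("E1001",)
).fetchone()
print(duplicate_rejected, row)
```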

The next important step in data organization is the “decision” about null values. Data organizers must decide upfront whether a column in a table can accept a null value. The decision will form the basis for defining constraints for the column and will have an impact on applications that access the information. For example, the column “Date of Birth” in an employee table should not accept a null value, whereas the “number of dependents” column can, for instance when the figure is not known.
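The null-value decision translates directly into column constraints. The following sketch, using Python's sqlite3 module with hypothetical column names, marks date_of_birth as NOT NULL and leaves number_of_dependents nullable. It is one possible way to express the example above, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id TEXT PRIMARY KEY NOT NULL,"
    " date_of_birth TEXT NOT NULL,"       # a value is mandatory
    " number_of_dependents INTEGER)"      # NULL is allowed, meaning 'not known'
)

# Accepted: the number of dependents is unknown and stored as NULL.
conn.execute("INSERT INTO employee VALUES ('E1001', '1990-04-12', NULL)")

# Rejected: a missing date of birth violates the NOT NULL constraint.
try:
    conn.execute("INSERT INTO employee VALUES ('E1002', NULL, 2)")
    null_dob_rejected = False
except sqlite3.IntegrityError:
    null_dob_rejected = True

stored_dependents = conn.execute(
    "SELECT number_of_dependents FROM employee WHERE employee_id = 'E1001'"
).fetchone()[0]
print(null_dob_rejected, stored_dependents)
```

Applications that read the table then need to handle the nullable column explicitly; in Python the stored NULL comes back as None rather than as zero.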

Data organization does not stop here. These steps are a good beginning, but only a beginning: one that will lead to a much deeper understanding of the data available to the organization and will be the foundation for any data mining and business intelligence exercises that may be undertaken in the future.