Posted by: Randy Kerns
There is a prevalent problem in Information Technology today – too much data.
Most of this data takes the form of files and is called unstructured data. Unstructured data continues to grow at rates that average around 60% per year, according to most of our IT clients.
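To put the 60% figure in perspective, here is a minimal sketch of the compounding arithmetic. The 100 TB starting capacity is a hypothetical number chosen only for illustration, and the function names are made up for this example; only the 60% growth rate comes from the text above.

```python
import math


def projected_size(initial_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after `years` of compound growth at `annual_growth` (0.60 = 60%)."""
    return initial_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting capacity, not from the article
    growth = 0.60     # average annual growth rate cited in the article

    # Print the projected capacity for each of the next five years.
    for year in range(6):
        size = projected_size(start_tb, growth, year)
        print(f"Year {year}: {size:,.1f} TB")

    print(f"Doubling time: {doubling_time_years(growth):.2f} years")
```

At a steady 60% per year, capacity doubles in roughly a year and a half and grows a little more than tenfold in five years, assuming nothing is ever deleted.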
Structured data is generally thought of as information in databases, and this type of data is growing much more slowly than unstructured data. Unstructured data is produced both inside IT and by external sources, which include sensor data, video, and social media. This growth is alarming because there are so many sources and because the information feeds data analytics that typically originate outside of IT.
The big issue is what to do with all the data that is being created. The data is stored while it is needed: during processing for applications or analytics, and while it may be required for reference, further processing, or the inevitable “re-run” in some cases. But what is to be done with the data later? Later in this case means when the probability of access drops to the point that the data is unlikely to be accessed again. There are also cases when the processing (or the project) is complete and the data is to be “put on the shelf,” much as we would in closing the books on some operation. Does the data still have value as new applications or potential uses develop? Will there be a legal case that requires the data to be produced?
The default decision for most operations is to save everything forever. This decision is usually made because there is no policy around the data. IT operations do not set the policies for data deletion. Because the different types of data have different value and the value changes over time, the business owners or data owners must set the policy. IT professionals generally understand the value but usually are not empowered to make those policy decisions. Sometimes the legal staff sets the policy, which absolves IT of the responsibility, but that may not be the best option. In a few companies, a blanket policy is used to delete data after a specific amount of time. This may not withstand a legal challenge in some liability cases.
Saving all the data has compounding cost issues. It requires buying more storage, adding products to migrate data to less expensive storage, and increasing operational expenses for managing the information, power, cooling, and space. Moving the data to a cloud storage location has some economic benefit, but that may be short-sighted. The charges for data that does not go away continue to compound. Storing data outside the immediate concern of IT staff takes away from the imperative to make a decision about what to do with it.
Besides the costs of storing and managing the data, there is the danger of legal liability for keeping data for a long time. The potential for an adverse settlement based on old data is real and has proven extremely costly. A greater impact on IT operations comes from the discovery and legal hold that a case requires. Discovery requires searching through all the data, including backups, for the requested information, and legal hold means almost nothing can be deleted – no recycling of backups. This causes even more operational expense.
Not establishing a deletion policy that can pass a legal challenge is a failing of a company and results in additional expense and liability. IT may be the first responders on the retain-forever policy, but it is a company issue.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).