The Business-Technology Weave

Nov 28 2012   1:59PM GMT

Content: The proliferation of unstructured data

David Scott

We know that business success will increasingly depend on the most efficient use of its information assets.  These assets must be centrally managed for security purposes, and for quick dissemination to where they’re needed.  To do this, we need a special structure around data – all data – to manage it.  But in the vast majority of organizations, information is scattered throughout a variety of resources and vessels.  There is no cohesive, managed plan of access according to actual work, leverage, or liability to be had by the content of the individual information asset.  The data is “unstructured” – that is, there is no ready “handle” by which to identify data according to its relevancy, its level of importance, or its possible liability.

We have things locked up on servers and workstations, filed in various cabinets, and piled on desks.  Content is parsed, fragmented, and dispersed among electronic documents, e-mail, images, and hardcopy – all being managed, if at all, within discrete “silos” of various “systems.”  Under these circumstances, not only is the content difficult to leverage, it is also difficult to ensure that the most current version of information is shared across the organization.  It is even difficult to guarantee that the organization’s various disciplines are presenting compatible information to the outside.  Yet The Gartner Group reports that 80% of all information generated by business today consists of unstructured data.  Compounding this situation is the estimate that the average employee spends 50% of his or her time looking for things.

What exactly is unstructured data?  Unstructured data is anything that lies outside of a centrally managed, and accessible, “repository.”  (A repository can have special meaning, and we’ll revisit it shortly).  Unstructured data is generally in the form of documents, spreadsheets, presentations, e-mails, and any other electronic form for which no automated central control can be exercised. 

A piece of unstructured electronic content (such as a document) does not represent a “record.”  It does not belong to a community of other records, sharing similar structure and collective maintenance, through a common base of data:  a database.  Therefore, it’s not that we’re refusing to leverage these unstructured items (through the lever of common, related, or timely content); it’s that we’ve never had that lever with this unstructured material.
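To make the contrast concrete, here is a minimal sketch of what giving content a “record” and a “handle” might look like.  Everything in it is an assumption for illustration: the ContentRecord fields, the 1-to-5 importance scale, and the sample entries are hypothetical and do not describe any particular content management product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContentRecord:
    """One managed item in a hypothetical central repository."""
    title: str
    owner: str
    location: str      # where the file or hardcopy actually lives
    importance: int    # illustrative scale: 1 (low) to 5 (critical)
    liability: bool    # True if the content carries legal or policy risk
    topics: set[str] = field(default_factory=set)
    updated: date = field(default_factory=date.today)


def find_by_topic(records, topic):
    """Return the records tagged with a topic, most important first."""
    matches = [r for r in records if topic in r.topics]
    return sorted(matches, key=lambda r: r.importance, reverse=True)


def liability_review(records):
    """Return the records flagged as carrying possible liability."""
    return [r for r in records if r.liability]


# Hypothetical sample entries, invented purely for illustration.
repository = [
    ContentRecord("Q3 pricing memo", "sales", "fileserver", 4, False, {"pricing"}),
    ContentRecord("Customer complaint e-mail", "support", "mailbox", 3, True,
                  {"pricing", "legal"}),
    ContentRecord("Office party photos", "hr", "workstation", 1, False, {"social"}),
]

for record in find_by_topic(repository, "pricing"):
    print(record.title, "-", record.location)
```

Once every item shares the same fields, questions such as “what do we hold on pricing?” or “which items need a liability review?” become simple lookups.  The same memo sitting untagged on a workstation offers no such lever.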

Unstructured data also exists in the form of hardcopy data.  When hardcopy has no centrally managed measures for leveraging content, we know that reinforcing subject matter can be scattered throughout all manner of departments and disciplines, residing in filing cabinets, on shelves, on desktops, and in desk drawers.  Frequently it’s stacked on the floor, or hidden away in boxes.  Industry analysts estimate that Fortune 500 companies lose $12 billion each year because they cannot manage and take full advantage of unstructured content.

Regardless of your size – if you don’t know something exists, you can’t use it; if you can’t find it, you can’t use it; if you’ve lost it, you must recreate it; and if it takes time to find it, you’ve lessened your efficiency in using it.  This kind of inefficiency and duplicated effort is plain unaffordable. 

Unstructured Data’s Other Liability:  Consider too that it is impossible for a central authority to state with ringing certainty that your organization is not harboring inappropriate content.  Are you certain that employees aren’t downloading obscene or illegal material?  Can you certify that people aren’t passing around defamatory information through e-mail?  Can you be certain that employees are not using company e-mail accounts to post to Internet sites and blogs (web logs, message boards) that support positions or advocacy that is contrary to your organization’s positions?  Can you be certain that employees aren’t using these same company accounts as reference when ordering products and services of questionable repute?  All of these things leave an audit trail, outside the scope of your control in the absence of content management.  These things can impact your organization’s good name and can bring harm to your business. 

What about hardcopy?  How can you know that laxity and carelessness aren’t contributing to loss of this material?  Anyone can print out sensitive information, take it to an offsite meeting, and leave it behind.  How do you know whether staff is complying with your Acceptable Use Policy for this content and associated resources?  Without some kind of structure for review and report on data, you can be certain of nothing.  This is the gross inefficiency and liability that springs from unstructured data, the associated lack of control, and the uncertainties it brings.  Uncertainties will be increasingly unaffordable to business, as we shall see…
