It’s a popular time of the year for people like myself who publish any form of content to either reflect on the year that was or make predictions about the year that’s to be. Confidentially, those are typically easy pieces to write, and I’m generally happy to take advantage of such opportunities. However, I’m ending 2010 preoccupied with my latest concern, which has me a bit on edge, so I’m using my last post of the year to vent.
In the past few weeks I’ve participated in several conversations that focused on both cloud computing and data warehouses. As I’ve stated previously, I have some very real concerns about security in this ever-growing, amorphous collection of computing resources commonly referred to as “The Cloud.” Allow a one-time science fiction fan a little leeway, but I keep conjuring up images of “The Blob” whenever I hear that phrase. It’s sort of like the dimensions of our universe: no one is really sure where it begins or where (or if) it ends. So how do you lock it down and apply the necessary controls to sensitive data? Honestly, companies have struggled for years to properly classify their data and build appropriate controls to protect what needs protecting, and that was when the data was stored in clearly identifiable repositories and servers. Now they’re moving the same information into an architecture that can change dynamically and is harder to segment, because segmenting it defeats the very purpose of its design. How can you properly secure and monitor a moving target? Based on my experience, I’m thinking you can’t.
As for data warehouses: does anyone really know how these things are being used? After a recent call with a client, I had reason to question a few of my associates who either work with data warehouses or are familiar with how companies are using them, and quite frankly, I’m stunned. It seems that it’s quite common for data warehouse architects to reach out and grab data from whatever systems they happen to come across without even having a legitimate reason. One of my contacts told me that the project lead at his company is fond of throwing around the CEO’s name when met with resistance, as if not sharing data from your application’s database will create a blind spot and result in the company making a poorly formed decision. I clearly remember the original purpose of a centralized repository: to consolidate related information so that management could obtain a broader perspective on their business. It was never intended to duplicate every bit and byte so that information existed in multiple locations, and it was supposed to be driven by the business, not IT. But apparently it’s now quite common for the data warehouse team to participate in the change management process to determine whether enhanced or newly implemented applications should be plugged into their repository. What if a table with sensitive data is properly secured in its source system but is now being shared with the data warehouse? Is the copy properly secured? Who has access to the warehouse?
So what happens when you start using a cloud computing architecture to host your data warehouse? You can’t provide the same enhanced level of protection to all your data because there’s a very real cost associated with that. And if you can’t properly predict where the data is going to be stored (either in the cloud or in a separate repository such as a data warehouse), how do you even know where to begin?
Perhaps, considering how much audit and assessment work this is likely to generate over the next few years, I should be more grateful than concerned. But I’m happier when things are done right to begin with and all I have to do is prove it.
Anyway, Happy New Year to all!