The other day I met a CIO friend who wanted to discuss a tricky situation he had landed in. He worked in an industry that was being widely projected as one of those that would benefit most from investments in Big Data. His CEO wanted him to build a data warehouse to rival those of their global competitors, at least one of which was prominently talked about as the poster boy of Big Data analytics. He was thus being pushed to invest at a time when the rest of his IT budget was being squeezed.
Having a keen understanding of technology, his company and the industry, he was a non-believer in the Big Data story; according to him, the insights behind the hype were not commensurate with the investments made in the overall projects. And there had been nothing new since the first story broke of a company finding a use case that conventional technologies could not have delivered. He was also well known for his many past successes with data warehouses and Business Intelligence.
By definition, Big Data was all about data sets so big that earlier technologies could not tie them together within acceptable time and budget. Volume, Variety and Velocity defined Big Data; (Business) Value was added later. The availability of abundant compute resources and the ability to store large volumes of data had made solving some problems easier, faster and cheaper; that is not necessarily a success of the capitalized Big Data. It is just that larger data sets were being analyzed than in the past.
The question that needed an answer was whether he should give in and invest as directed by his CEO, or help the business with a scalable data warehouse that would deliver immediate value. Is it possible to start small with Big Data (an oxymoron if there ever was one) and then work with the business to find the needle (if they wanted to find a needle or a pin) in the haystack? After all, Big Data is expected to throw up unknown possibilities through random correlations that human minds are not able to pick out.
Big Data works on “found” data, i.e. data that you already have, and on complex algorithms that can provide some statistical probabilities. Analysts predict the value that different industries can gain from investments; no one is talking about the real value derived. Governments have been making investments with the same zeal as large enterprises. The providers and consultants are happy to make hay, not just while the sun shines but until they accidentally discover a needle in the haystack and make a case study out of it, putting pressure on the rest of the gold diggers.
What about the data that you don’t have? Can you draw negative inferences from Big Data? For that you have to know what you don’t have! Can what you have tell you what you don’t? The answer to that is still to be found; the data available in a Big Data repository cannot indicate what is missing. The concept of “found” data presupposes that the available data set is the whole universe from which correlations are to be drawn. And that is where many Big Data implementations are unable to deliver any meaningful insights.
Then there is veracity (the 5th V): the information in a Big Data store can throw up many false positives, which have been the bane of many projects. Unlike in conventional data warehouses, the data will never be clean, and the velocity will keep challenging you to move with agility. Letting go of the clean-and-complete-data mindset is the beginning of what Big Data may enable. From there, getting to Value is a long journey with no near-term goals; if you hit something, consider yourself lucky and celebrate.
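To make the false-positive point concrete, here is a minimal sketch in Python. It is purely illustrative and not drawn from any real project: the function names, the column and row counts, and the 0.6 threshold are all assumptions of mine. It fills a table with random, unrelated columns and counts how many pairs still look strongly correlated by chance alone.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(n_columns, n_rows, threshold, seed=0):
    """Count column pairs whose |correlation| meets the threshold.

    Every column is independent random noise, so any "strong" pair
    found here is a false positive by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_rows)]
        for _ in range(n_columns)
    ]
    strong = 0
    total = 0
    for a, b in itertools.combinations(columns, 2):
        total += 1
        if abs(pearson(a, b)) >= threshold:
            strong += 1
    return strong, total


if __name__ == "__main__":
    # Hypothetical sizes: many variables, few observations per variable.
    strong, total = count_chance_correlations(
        n_columns=200, n_rows=20, threshold=0.6
    )
    print(f"{strong} of {total} unrelated pairs look strongly correlated")
```

The more columns you throw in relative to the number of rows, the more such accidental "insights" turn up, which is why a correlation surfaced from a haystack still needs a business explanation before anyone celebrates.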
My suggestion to my friend was to get started in the way he believed would let him deliver what the business wanted. Forget the discussion on technology and focus on what matters: insights driven by data. If he can get traction with some CXOs based on the results, no one will care whether they came from Big Data or Small Data. The business leader in him understood, while the technologist wanted to fight; for his sake, I hope the business guy prevails.