At times the era of big data has taken on the flavor of the old West – the kind depicted in a movie like The Treasure of the Sierra Madre. While it was seldom an outright confrontation, there’s little question that conscientious data stewards were usurped in some organizations by developers who slightly resembled the freewheeling bandits who didn’t “need no badges” as they went about their business in John Huston’s film.
We are a few years into this, and now there are signs that a bit of taming is going on in the Hadoop ecosphere. The recent approaches for bringing Hadoop-style data processing into wider production seem to bespeak a change. The General Data Protection Regulation (GDPR) is in part a driver of that change.
At last month’s DataWorks Summit in San Jose, California, we spoke with Constellation Research analyst Doug Henschen, who agreed that the shift of enterprises to include large-scale open-source distributed data processing in their analytics arsenal is now tempered by increased interest in data governance.
“What you see is companies re-platforming – that’s the buzz,” Henschen said in this episode of the Talking Data podcast. “Companies understand that they need a sort of next-generation information architecture.”
We have seen attempts before to add tooling to data lakes to tag and curate data, but now the push may be more fevered, and GDPR may be the impetus.
“The push for GDPR has gotten people thinking more and more about the governance aspects of that,” Henschen said. “As they are re-platforming they have an increased eye toward data governance, data lineage, access control, security — all of these good things that we have long required but haven’t necessarily nailed.”
Catch up with the big data doings in this edition of Talking Data. – Jack Vaughan