by Rob Barry
The Hadoop World conference attracted a diverse crowd this year, with speakers from IBM, Facebook, Intel, Amazon, the telecom industry and others. With a growing set of discussion topics and a wider range of sectors represented, it appears the Hadoop architecture for handling large data sets has tapped a mainstream artery.
“A year ago, if you looked at what people were doing with Hadoop, it was primarily focused on the Web space,” said Christophe Bisciglia, founder of Cloudera. “This year we’re seeing it across many verticals.”
Apache Hadoop began as a system for processing large volumes of data without depending on standard relational databases. Inspired by Google’s MapReduce and Google File System (GFS) papers, Hadoop was initially popular with advanced Web companies like Amazon and Yahoo.
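The MapReduce model those papers describe is simple at its core: a map step emits key-value pairs, the framework shuffles and groups the pairs by key, and a reduce step aggregates each group. The classic word-count example, sketched here as a single-process Python illustration (real Hadoop jobs are typically written in Java against the Hadoop API, with the shuffle handled by the framework across many machines):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key; here, sum the counts.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "big deal"]))  # {'big': 2, 'data': 1, 'deal': 1}
```

The appeal of the model is that the map and reduce steps are independent per key, so the framework can spread them across a cluster without the programmer writing any distribution logic.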
As the tooling has matured, many have come to consider Hadoop a powerful, scalable batch data-processing system rather than simply a distributed file system. That shift is largely due to the efforts of early adopters.
Bisciglia said that while Cloudera has worked to simplify Hadoop’s packaging, configuration and deployment, Yahoo has been driving the scalability, API stability and security. Facebook has also contributed.
All this has worked to boost enterprise interest in the open source framework. The financial and telecommunications industries have taken notice.
“When you’re working with very large volumes of data, you have to make very important decisions about what to keep, and a lot of this involves the cost and size of your data warehouse,” said Bisciglia. “Hadoop enables enterprises to store and consume all of their data.”
The major benefit of Hadoop is that the storage and processing systems are married: computation moves to the nodes that hold the data, which keeps processing fast. It also helps that the framework is free. Of course, it may not be a technology for enterprises that prefer to keep their storage and processing environments separate.
Hooked into Amazon Web Services, Hadoop can deliver a lot of power on-demand. This could make the technology rather attractive to startups.
“Two [developers] in a garage can now process 50 terabytes of data for pennies on the hour,” Bisciglia said. “As more startups have started to do this, the traditional slower-to-act enterprises are starting to follow.”