Posted by: Brian Gracely
This past week I had the opportunity to attend the GigaOm Structure:Data conference in NYC. Many industry conferences are sponsored by a single vendor or have an agenda dictated by a specific technology. This show did an excellent job of bringing together a broad mix of technologies, vendors, customers and thought leaders. While the hype of the conference was “Big Data”, the technology and its deployability are still in the early stages for all but the top 1-2% of the industry. GigaOm published a summary of the event, and there was broad media coverage. Going back through my notes, I found the following thoughts most worthy of follow-up.
- Big Data is Difficult.
- Data Huggers are the New Server Huggers - Company after company I spoke with highlighted that existing organizational structures are the #1 obstacle to a successful Big Data strategy. Organizations love their data. Organizations don’t love sharing their data with other groups, even within the same business.
- Forget the Economy, Big Data is the 1% Club - While Business Intelligence and Data Warehousing have been around for quite a while and are deployed at many companies, the number of companies able to leverage the newer technologies (Hadoop, NoSQL databases, R, etc.) to unlock business insight in real-time is still extremely small.
- Big Data != Fast Data - It is becoming clear that there is a big difference between Big Data and Fast Data, both in technologies and in use-cases.
- Hadoop is the Foundation, but beyond that… - While the Hadoop market is competitive, with Apache Hadoop, Cloudera, Hortonworks, IBM, MapR, Oracle, Pivotal and SAP all trying to sell a Hadoop-centric product, the real wars will be fought over the tools, frameworks and extensions that are layered on top of Hadoop.
- “Telemetry” will make its way into your vocabulary – Whether it’s called the “Internet of Everything”, “Sensor Data” or something else, you will begin to hear a massive push around how telemetry data collected from people and machines will drive real-time Fast Data applications and unlock new markets.
- Connecting to the legacy is key – Many companies are focused not only on integrating legacy datastores into Hadoop-based “Data Lakes” or “Data Reservoirs”, but also on bringing existing SQL tools and skills into a Hadoop environment. The SQL effort is an attempt to work around the shortage of Data Scientists and extend Big Data to more generalist business users.
- Data Scientists are in massive demand – This has been highlighted before, but there is still a massive shortage in our industry. Not only is there demand for people to analyze the data, but there is also massive demand for people who can set up and run Hadoop environments and integrate legacy systems with Hadoop.
- Huge Opportunities for Big Data On-Demand – While many Cloud Service Providers offer various types of on-demand IaaS resources or on-demand Database services, the opportunity to offer on-demand environments for experimenting with Big Data or Fast Data use-cases is massive. Because setup is still complicated, there are huge opportunities for Cloud SPs to expand their offerings into turn-key services, in various sizes, that accelerate the time to analysis and action.
- Bandwidth is Still a Problem – While Big Data might be a big deal, it still hasn’t overcome that pesky little physics issue – the speed of light. It will be interesting to watch how the location of data (on-premise vs. in public clouds) shapes the industry over the next 3-5 years.
- Get familiar with Open-Source Frameworks - Whether you’re deploying with Puppet or Chef, coordinating resources with Zookeeper, or developing tools that leverage Pig or Hive, it’s time to start familiarizing yourself with open-source frameworks and community-based knowledge sharing. Big Data (or Fast Data) is attempting to solve challenges that are beyond any single organization, so using the tools and frameworks of the community will improve your chances of success.
- Your Data is Your Next Product/Market – It was interesting to hear how many side conversations involved companies that possess massive amounts of industry-specific data and are now looking to unlock it (and sell it) to other industries. For example, intelligent weather data could be extremely valuable to dozens of companies (finance, insurance, farming, transportation, grocery stores, airlines, etc.) that could make better decisions from data that was never previously available to them.
- Big Brother Knows About You – You’re welcome to keep fooling yourself into believing that you have some level of privacy or information security. Think again. Every device you interact with, every transaction you make and every location you visit is being tracked, correlated, analyzed and acted upon by someone.