I know that “big data” has become one of the darlings of the storage industry, as evidenced by the number of times this term is used in online technical media articles. Storage Switzerland, the firm I work for, has added its voice to the chorus but hopefully has provided some clarification. A piece we did called “What is Big Data?” is the first in a series of articles on the topic that attempts to define this overused term and go into why it was created in the first place. What I’d like to do in this blog is synopsize that information and discuss what big data means to VARs.
The industry coined the term big data, and keeps talking about it, because it represents a problem. Basically, the sizes of the data sets being analyzed and the speed with which users need results from that analysis have exceeded the capabilities of traditional IT infrastructures, especially storage.
There’s also a cultural aspect to big data. It seems that people (especially those outside of IT) assume that business value, market insight or even predictions of the future are available to all, if you can just analyze enough data. The movie Moneyball and several television shows, like “Person of Interest” and “Numbers,” are tapping into this sentiment. There’s even one, “Touch,” that puts the analysis engine inside a person’s head. Of course, all this serves to stoke the fires of big data expectations and enlarge the problem.
The term big data was originally applied (for the most part) to analytics applications. At Storage Switzerland, we call this (not surprisingly) big data analytics, referring to data mining, online transaction analysis, historical trending – basically, any problem that can be solved by comparing and cross-referencing structured information. These applications involve big databases and are driving the popularity of Hadoop, the open-source MapReduce framework.
Big data archive is the other main application, one that involves large (often enormous) files and typically large numbers of them. The signature use case is in the media and entertainment vertical, where users process these large files, often in sequential workflows, to produce motion pictures and special effects, and then save everything. Again, storage is often the bottleneck, since companies need these large files as quickly as possible to support our 24-hour information cycle. In other cases, users need to perform their processing step as soon as the previous user is finished, and that user may be located in another city or country, which taxes networks as well as storage systems.
For VARs, big data is yet another buzzword that vendors are attaching to their products in an attempt to create a perceived need in the mind of IT users. As usual, VARs must be ready to clarify the concept for those users, who by this time have been told that everything is big data and every vendor has a solution for it. When you do run into a real big data application, it can be a big engagement with a big PO. Several technologies that I’ve covered in this blog over the past several months offer viable big data solutions. One is the subject of this white paper on building a storage infrastructure for big data archives. Stay tuned to future posts for more big data-related solutions and more ideas about how to leverage all the attention the topic is getting.
Follow me on Twitter: EricSSwiss