Cheap Computing

Feb 4 2013   9:13PM GMT

Cut Big Data Down to Small Data and Save Big Bucks

Robin "Roblimo" Miller

A visualization created by IBM of Wikipedia edits. At multiple terabytes in size, the text and images of Wikipedia are a classic example of big data.

Big Data was one of the buzziest buzzwords of 2012, and it is buzzing only a little less in 2013. An awful lot of money is being flung at Big Data. That's nice for companies that have an unlimited supply of cash, which yours probably does not. Let's face it: Big Data is useful if you're trying to determine what happened to the universe 17 milliseconds after the Big Bang. Big Data is useful in a business sense for Wal-Mart, which handles over one million customer transactions per hour. Big Data is also fine for the U.S. government, which owns six of the world's 10 biggest supercomputers. Now let's look at your business. How many supercomputers do you own? Probably none. So instead of seeing how Big your Data can be, you are probably better off cutting your data down to size. Let's look at some ways of doing that, and at how you'll save a Big Bunch of Bucks in the process.

It’s easy to say, “storage has gotten cheap.” True. Your cost to store data, per gigabyte, is lower than it has ever been, and it keeps dropping. But just having data around is meaningless. Sorting it, correlating it, and analyzing it is still costly, especially when it comes to human time at the end of the process, which is still needed to spot non-obvious correlations between data points that machines cannot yet find. The human doing that correlation might be you, and don’t you have enough work to do already without adding to the load?

Freelance CTO Mark A. Herschberg says, “In many cases a company can gain insights from ‘small data.’ Instead of culling through gigs of data, take a sample. This can be as simple as picking data from a half dozen days throughout the year or picking every 1000th row of data. If you can get a sample of even a few thousand or tens of thousands of rows of data, you can drop it into Excel and do some basic data analysis on them. This can be done with just a few SQL statements and a few hours of time.”
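Herschberg’s every-1000th-row idea can be sketched in a few lines of Python. This is a minimal illustration, not his actual code; the file path and CSV layout are assumptions, and in practice you might do the sampling directly in SQL as he suggests.

```python
import csv

def sample_rows(path, every=1000):
    """Keep every Nth data row of a CSV file -- a quick way to cut
    'big data' down to a sample small enough for a spreadsheet."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first line is assumed to be a header
        sampled = [row for i, row in enumerate(reader) if i % every == 0]
    return header, sampled
```

The sampled rows can then be written back out to a small CSV and opened in Excel or any spreadsheet for the basic analysis he describes.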

Anne Rozinat, co-founder of Fluxicon, suggests learning about Process Mining, and has kindly given us two links to videos (Video One; Video Two) that explain the concept better than we can here in a few words. The point here is that process mining, done correctly, only requires generic PCs or servers that you already have, not specialized machines or costly supercomputer time. This technique is absolutely worth checking out if you are analyzing processes rather than discrete event or item characteristics.

Josh Farkas, founder of Cubicle Ninjas, says, “I feel that the shift from big data to small data in consumable chunks has already occurred, and that small businesses may have the upper hand.

“The enterprise has long had the data advantage, but the birth of small business software as a service was a game-changer. Many small businesses manage this data better because of the more robust web functionality available today.

“Touch interfaces have actually helped this trend too. Interfaces are expected to be clear, intuitive, and a pleasure to interact with. Because of this, big data can be refined more elegantly to just your most pressing bits.”

David Smith, VP of Marketing and Community for Revolution Analytics, contributes these suggestions:

    • Start with flat files: There are definite performance and administration benefits to working with a database, Hadoop, or other data management systems, but you’d be surprised at the insights you can find in a simple comma-separated data file exported from your web server or CRM system. These flat files can still be very large (millions or even billions of rows), but that doesn’t mean they’re unmanageable.
    • Use open source tools: There are many excellent tools that work with flat ASCII files, available free of charge. From simple tools like “grep” to more powerful data-processing languages, many of your data-processing needs can be met with open-source solutions.
    • Predictive modeling with R: If you’re looking to predict the future rather than just summarize the past, the R language is also open-source and includes every predictive modeling algorithm you’d ever need.
    • Get some help: While these approaches are powerful and inexpensive, they’re also not for the faint-of-heart: think coding, not pointing-and-clicking. You’ll need someone with the necessary expertise to help, but fortunately there are now many graduate programs in Data Science that provide new jobseekers with skills in data management, open source tools, and predictive analytics.


“On the downside,” David notes, “while these ad-hoc approaches will give you useful information about your business at a snapshot in time, they don’t necessary lend themselves to automated production systems that can provide a real-time dashboard for your business. But even these simple steps can give SMBs enough guidance to improve their bottom lines and upgrade to production-ready Big Data systems.”
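To make David Smith’s flat-file suggestion concrete, here is a short Python sketch of the kind of one-off summary he describes. The web-server export and its “status” column are assumptions for illustration; a `grep` or `sort | uniq -c` pipeline would do the same job.

```python
import csv
from collections import Counter

def top_values(path, column, n=5):
    """Tally the most common values in one column of a flat CSV file --
    the quick, ad-hoc summary described above, no database required."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts.most_common(n)
```

Pointed at a web-server log exported as CSV, `top_values(path, "status")` would show at a glance whether errors are creeping up, which is exactly the sort of snapshot-in-time insight Smith cautions is useful but not a substitute for a production dashboard.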

And finally, we have this statement from Rachel Delacour, CEO and cofounder of BIME:

  • No doubt, Big Data has become an overused term to lump together too many technologies, tools and terabytes that defy easy categorization or simply evoke images of the information deluge. But 2013 will make one thing abundantly clear: Every single business has to cope with this new data-driven world in which data is a currency, an asset and a liability. It all depends on how you tackle it and turn it into something immediately valuable. Instead of falling for analysts and vendors who tout the really big thing, more organizations will start thinking small data in 2013, particularly if they are SMBs. Here’s why: Big Data starts with a single consumer, a single sensor, a single transaction or a single click. It all flows from there, and the sooner a company picks up on those tremors, the better its read on the earthquake that may be coming. So the best way to think about Big Data is to think about its sources. Don’t become paralyzed by the size of the entire datasphere; instead, focus on three simple questions: What types of data do I have? What new types of data can I access thanks to the Net? And what new questions can I keep asking to improve my business? Asking them is almost free. Tools in the cloud let every small business explore data and get answers, fast and affordably.

Forty-six experts responded to my online query, “Can cutting ‘big data’ down to ‘small data’ save money?” They all agreed that Big Data can easily get out of hand, and that there are many ways to cut big data down to a size that is affordable and useful for even the smallest business. Today we’ve looked at a small selection of the answers we got. Please stay tuned to Cheap Computing. We’ll run more responses to our “Big Data” question in the near future, and we’ll also be asking our experts many more questions about how you can save money on your home and business IT needs.
