Enterprise IT Watch Blog

Aug 4 2014   10:11AM GMT

Algorithms are not magic

Michael Tidmarsh

Big Data
Big Data analytics


By James Kobielus (@jameskobielus)

People have invested the word “algorithm” with some sort of mystic power. In the popular mind, that word seems to stand for the secret sauce–or evil spirit–that animates big data.

Attributing the power of big-data analytics to some magical resource called “algorithms” isn’t terribly enlightening. It takes much more than algorithms–which are as diverse, malleable, and promiscuous as molecules–to extract meaningful insights from big data.

More than mere algorithms, what you need are data scientists who get the data in shape for statistical analysis and exploratory visualization. As I noted in this blog from last year, every step of the data scientist’s working method involves selecting from diverse options: analytic problems, subject populations, sources, samples, model versions, predictive variables, visualizations, and so on.

And, oh yes, of course: the right algorithms. Stepping through the standard methodology, as defined in the cited blog, is a sort of meta-algorithmic discipline at the heart of professional data science. If a data scientist makes the wrong choice at any step–including, but not limited to, selecting the algorithm(s)–they may never find the underlying correlations they seek. Worse yet, they may “find” spurious correlations and thereby inadvertently deceive themselves and others regarding what’s actually going on in their problem space. There is no foolproof mental algorithm to steer statistical analysts in the right direction as they seek the baseline causal factors in any domain.
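To make the point about spurious correlations concrete, here is a minimal sketch using only Python's standard library. It screens many candidate variables of pure random noise against a target that is also pure noise, and reports the strongest correlation it "finds." The function names, sample sizes, and seed are illustrative assumptions, not anything drawn from the cited blogs.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_chance_correlation(n_rows=20, n_candidates=500, seed=0):
    """Screen random candidate variables against a random target.

    Every series is independent Gaussian noise, so any correlation
    reported here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(candidate, target)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"Strongest correlation found in pure noise: {r:+.2f}")
```

With only 20 observations and 500 candidates, the "best" variable will typically look impressively correlated even though nothing real connects it to the target. Screening many predictors on a small sample is one of the wrong choices that no algorithm will catch for you.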

If you’re unfamiliar with statistical modeling best practices, you may think that the choice of algorithm is simple: just go with something that everybody talks about called “regression algorithms.” But you would be wrong. Depending on what you’re trying to accomplish, you may need other types of essential data-science algorithms (e.g., clustering and segmentation). And, as Vincent Granville states in this recent blog, even if you focus only on regression, there are hundreds of algorithms to choose from. You can blend them in countless permutations. You might even develop your own, if you have an especially astute mathematical mind.
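As a rough illustration of why the choice is not simple, the sketch below pits two very different regression approaches against each other on held-out data: ordinary least-squares on a single predictor, and a k-nearest-neighbors average. The two synthetic data sets, the function names, and the parameters (k, noise level, sample size) are all assumptions made up for this example; they are not taken from Granville's post.

```python
import math
import random


def fit_linear(xs, ys):
    """Ordinary least squares for one predictor; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbors regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a predict function on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(true_fn, seed=0, n=200, noise=0.1):
    """Fit both models on half the data and score them on the other half."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
    half = n // 2
    train_x, train_y = xs[:half], ys[:half]
    test_x, test_y = xs[half:], ys[half:]
    return {
        "linear": mse(fit_linear(train_x, train_y), test_x, test_y),
        "knn": mse(fit_knn(train_x, train_y), test_x, test_y),
    }


if __name__ == "__main__":
    print("straight-line data:", compare(lambda x: 2 * x + 1))
    print("sine-wave data:    ", compare(math.sin))
```

On the straight-line data the linear model should come out ahead, while on the sine-wave data the nearest-neighbors model should. Neither is "the" regression algorithm; which one fits depends on the shape of the data in front of you.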

The most enlightening aspect of Granville’s discussion is how he characterizes the statistical modeling scenarios for which each type of algorithm is best suited. For a working data scientist, the trade-offs and optimal blending of diverse algorithmic approaches must be revisited in every new modeling exercise.

It’s clear that no one uber-algorithm will ever be suitable for illuminating the infinite range of statistical patterns that might inhere within real-world data.
