Enterprise IT Watch Blog

Mar 2 2015   1:06PM GMT

Spark and its growing pains

Michael Tidmarsh



By James Kobielus (@jameskobielus)

Fending off industry hype requires that we stay focused on the maturity, or lack thereof, of any new technology. Just because pundits, developers, and venture capitalists are currently jazzed by this or that new tech doesn’t mean the bubble is robust enough to withstand full-bore commercialization. Enthusiasm withers fast when people start to realize that the quick riches they expected from promising new technologies may never materialize.

Let’s hope that commercialized Apache Spark offerings start to live up to the incessant hype that touts Spark as the evolutionary advance beyond Hadoop. It’s a promising technology, but, as we’ve seen with Hadoop, development into an enterprise-grade big-data platform takes years, requires substantial investments across the ecosystem, and may not happen unless the new approach does something better and/or cheaper than the alternatives. Since the beginning of this decade, the Hadoop industry has steadily addressed those challenges and developed into a substantial and robust big-data platform.

Spark isn’t quite there yet, so we should give it time to come into its own. Almost a year ago, I started to toss my thoughts on Spark into the general pool of big-data punditry. I devoted my first Spark-centric post to a fairly detailed overview of what Spark is, does, and supports. A few months later, I looked at Spark’s evolving role in the hybridized ecosystem of big-data platforms. A few weeks ago, I commented on the arbitrariness of Spark’s inclusion in the Apache Hadoop project’s core scope. On that last point, Spark’s focus on real-time, streaming, in-memory, and graph-centric machine-learning applications makes it quite distinct from “traditional” Hadoop, though both leverage HDFS as a storage subsystem.

Just as Hadoop’s issues have occasionally eclipsed its strengths in the minds of enterprise IT professionals, Spark’s immaturity is coming into clearer focus. For Hadoop professionals, this recent article reads like déjà vu. It cites the following growing pains with Spark on the road to becoming a robust enterprise-grade platform:

  • Lack of long-time, broad, or deep experience with Spark within the IT and big-data professions
  • Lack of detailed documentation on Spark that includes in-depth guidance on the toughest technical issues and advanced application scenarios
  • Lack of comprehensive tools for managing, monitoring, securing, tuning, optimizing, and recovering Spark jobs and clusters
  • Lack of Spark integration with a wide range of middleware and databases
  • Lack of a broad range of commercial Spark solutions and technical support resources
  • Lack of broad API coverage for Spark that includes languages beyond the core of Scala

All of this sounds very much like the Hadoop market two to three years ago. Few industry observers doubt that the Spark industry will address each of these issues as the market matures. But first, Spark must gain mainstream adoption at a reasonably brisk pace if it is to reach the level of maturity that Hadoop has now.
