Open Source Insider

Nov 12 2015   9:09AM GMT

Hadoop is not the only fruit

Adrian Bridgwater


It is true, Hadoop is a key focal point for many of us when we talk about big data — and indeed, open source big data projects.


However, it’s important to think outside the Hadoop box for a number of reasons.

Outside the Hadoop loop

By its very nature Hadoop is open source, so many of its developers and other contributors will naturally revel in the openness of the entire open code surface and work on other projects as well. These ‘tangential’ projects (many of them substantial) are typically complementary to Hadoop.

Where projects in fact compete with Hadoop, that’s also a good thing as it keeps the overall drive for efficiency and functional excellence as sharp as it should be.

Why open source is so good

We might suggest that there is a core reason why open source is so well suited to big data. If we accept that Hadoop is hard and that the actual implementation of big data analytics is still in its relative infancy, then we can see how the open customisability of open software structures could be better suited to big data projects as they now grow.


Looking outside the Hadooposphere, the Enterprise Apps Today website brings together a much needed selection pack cum Obligatory List Article of some of the other open source big data tools out there.

Lumify is an open source data integration, analytics, and visualisation platform built to help you understand the world of data.

Lumify’s features include the ability to analyse relationships and automatically discover paths between entities. It can also overlay data as layers on a map for a geographical view of the data model.

Talend Open Studio for Big Data provides simple graphical tools and wizards to generate native code that helps you leverage the full power of Hadoop.

HPCC Systems Big Data — as detailed at the above link, “Is a platform for manipulating, transforming, querying and data warehousing your Big Data and is an alternative to Hadoop. It uses the Thor data refinery, Roxie data query/delivery engine and Enterprise Control Language (ECL) as an alternative to Apache Pig. (ECL is claimed to be 4.45 times faster than Pig on average.)”

You can read Paul Rubens’ piece at the above link for more clarification on the other tools available in this space.
