Microservices Matters

Sep 6 2012   6:14PM GMT

Can stream-based data processing make Hadoop run faster?

Jack Vaughan

The Apache Hadoop distributed data processing framework has clear benefits and is gaining traction, but it also has drawbacks. Some organizations find that getting started with Hadoop means rethinking their software architecture and acquiring new data skills.

For some, a problem with Hadoop’s batch-processing model is that it assumes there will be downtime between bursts of data acquisition in which to run the batch. That assumption holds for many businesses that operate locally and handle a large number of transactions during the day but very few (if any) at night. If that nightly window is large enough to process the data accumulated during the previous day, everything goes smoothly. For some businesses, though, the window of downtime is small or nonexistent, and even with Hadoop’s high-powered processing, they take in more data each day than they can process in 24 hours.
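To make the batch-window arithmetic concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical, chosen only for illustration; nothing here comes from a real Hadoop deployment. It simply tracks how much unprocessed data is left over when each day's intake is compared against what the batch window can handle.

```python
def backlog_after(days, daily_ingest_gb, throughput_gb_per_hour, window_hours):
    """Return the unprocessed backlog (in GB) at the end of each day.

    All parameters are illustrative assumptions, not measured values.
    """
    # How much data one batch window can work through.
    capacity_gb = throughput_gb_per_hour * window_hours
    backlog_gb = 0.0
    history = []
    for _ in range(days):
        # New data arrives, then the batch clears as much as the window allows.
        backlog_gb = max(0.0, backlog_gb + daily_ingest_gb - capacity_gb)
        history.append(backlog_gb)
    return history


# A 12-hour window keeps up with 100 GB/day at 10 GB/hour ...
print(backlog_after(3, 100, 10, 12))
# ... but an 8-hour window falls 20 GB further behind every day.
print(backlog_after(3, 100, 10, 8))
```

Once daily intake exceeds the window's capacity, the backlog grows without bound, which is the situation described above.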

For organizations with small windows of acceptable downtime, an approach that adds components of stream-based data processing may help, writes GigaSpaces CTO Nati Shalom in a recent blog post on making Hadoop faster. By continuously processing incoming data into useful packets and removing static data that does not need to be processed (or reprocessed), enterprise organizations can significantly accelerate their big data batch processes. – James Denman
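As a rough illustration of the general idea, and not of the specific design described in Shalom's post, the Python sketch below folds each record into a running total as it arrives and skips records that have not changed since they were last seen. The class, its methods, and the sample record IDs are all hypothetical.

```python
from collections import defaultdict


class IncrementalAggregator:
    """Hypothetical stream-side pre-aggregator (illustrative only)."""

    def __init__(self):
        self.totals = defaultdict(float)  # running total per key
        self._seen = {}                   # record_id -> (key, value) already counted

    def ingest(self, record_id, key, value):
        """Fold one record into the totals; return True if any work was done."""
        previous = self._seen.get(record_id)
        if previous == (key, value):
            return False  # static record: already counted, nothing to redo
        if previous is not None:
            # The record changed, so back out its old contribution first.
            old_key, old_value = previous
            self.totals[old_key] -= old_value
        self.totals[key] += value
        self._seen[record_id] = (key, value)
        return True


agg = IncrementalAggregator()
agg.ingest("t1", "store-a", 10.0)
agg.ingest("t2", "store-a", 5.0)
agg.ingest("t1", "store-a", 10.0)  # unchanged, so it is skipped
print(agg.totals["store-a"])
```

Because the totals are kept current as data arrives, a later batch job would only need to handle whatever the stream layer could not, rather than rescanning the full day's data.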
