Enterprise IT Watch Blog

Oct 3 2014   3:14PM GMT

Converging in-motion and in-memory analytics

Michael Tidmarsh


By James Kobielus (@jameskobielus)

Speed isn’t always a virtue. Faster data is not necessarily better data. If the data whizzes by faster than you can extract value from it, it’s a waste.

Stream computing is much more than low-latency middleware. It adds value in several ways: it supports high-throughput filtering and analysis across disparate data streams, delivers real-time updates to consuming applications, enables rich queries over high-velocity data, and provides continuous updates of pre-processed intelligence to downstream repositories, ranging from small databases to big-data clusters.
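To make the filtering and pre-processing idea concrete, here is a minimal sketch in plain Python. It is not how InfoSphere Streams or any other product works; the event fields (`ts`, `kind`), the function names, and the 10-second window are all assumptions chosen for illustration. It filters a stream and rolls it up into per-window counts that a downstream repository could store in place of the raw events.

```python
from collections import Counter

def filter_stream(events, predicate):
    """Pass through only the events that satisfy the predicate."""
    for event in events:
        if predicate(event):
            yield event

def tumbling_counts(events, window_seconds):
    """Yield (window_start, Counter) for each completed window.

    Assumes events arrive in timestamp order (an illustrative simplification).
    """
    current_start = None
    counts = Counter()
    for event in events:
        start = event["ts"] - event["ts"] % window_seconds
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[event["kind"]] += 1
    if current_start is not None:
        yield current_start, counts

# Hypothetical event stream; a real source would be unbounded.
events = [
    {"ts": 0, "kind": "error"},
    {"ts": 3, "kind": "info"},
    {"ts": 7, "kind": "error"},
    {"ts": 12, "kind": "error"},
]
errors = filter_stream(events, lambda e: e["kind"] == "error")
summaries = list(tumbling_counts(errors, window_seconds=10))
```

Because both stages are generators, each event is handled once as it arrives, and only the small per-window summaries need to reach the downstream store.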

In all of these ways, stream computing is a central component of any comprehensive big-data infrastructure. This recent article does a good job explaining how stream computing platforms, such as IBM InfoSphere Streams, can complement Hadoop, enterprise data warehouses (EDWs), in-memory databases, and other big-data platforms that are optimized for data that spans the latency spectrum from “at-rest” to “in-motion.”

What I found especially interesting was the discussion of “live data marts” that are refreshed by stream computing. Author Kai Wähner describes the concept as one of “provid[ing] end-user, ad-hoc continuous query access to this streaming data that’s aggregated in memory….A live analytics front ends slices, dices, and aggregates data dynamically in response to business users’ actions, and all in real time.”
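A toy version of that idea, under my own assumptions, might look like the following. The `LiveDataMart` class, its `ingest` and `query` methods, and the `region`/`product`/`amount` fields are hypothetical names, not anything taken from Wähner's article or a vendor API. The point is only that the aggregate lives in memory, is updated per event, and can be sliced and diced on demand.

```python
from collections import defaultdict

class LiveDataMart:
    """Hypothetical in-memory aggregate refreshed one event at a time."""

    def __init__(self, dimensions, measure):
        self.dimensions = dimensions
        self.measure = measure
        self.totals = defaultdict(float)

    def ingest(self, event):
        """Fold a single streaming event into the running totals."""
        key = tuple(event[d] for d in self.dimensions)
        self.totals[key] += event[self.measure]

    def query(self, group_by, **filters):
        """Ad-hoc slice (filters) and dice (group_by) over the current state."""
        result = defaultdict(float)
        for key, total in self.totals.items():
            row = dict(zip(self.dimensions, key))
            if all(row[d] == v for d, v in filters.items()):
                result[tuple(row[d] for d in group_by)] += total
        return dict(result)

mart = LiveDataMart(dimensions=("region", "product"), measure="amount")
for event in [
    {"region": "EU", "product": "A", "amount": 10.0},
    {"region": "EU", "product": "B", "amount": 5.0},
    {"region": "US", "product": "A", "amount": 7.0},
]:
    mart.ingest(event)

by_region = mart.query(group_by=("region",))
eu_by_product = mart.query(group_by=("product",), region="EU")
```

Each call to `query` reads whatever state the stream has produced so far, which is what separates a continuously refreshed mart from a batch-loaded one.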

What’s useful about this “live data mart” concept is that it blurs the increasingly arbitrary distinction between “in-motion” and “in-memory,” on the one hand; “in-motion” and “at-rest” on the other; and also (if it were possible to have a third hand) “in-motion” and “in-process.” The purpose of stream computing is to drive speedier results through delivery of live intelligence into live business processes. Ideally, every “at-rest” big-data repository, be it an EDW, Hadoop, or whatever, can and should host live data in order to drive live decisions.

Live data marts should live on a converged infrastructure of stream computing, complex event processing, and various real-time-optimized big-data platforms, including the EDW. I’m happy that Wähner picked up on the notion that stream processing can figure into an EDW modernization strategy. I prefer to call this the “live EDW”:

  • Using stream computing to filter and reduce EDW storage costs
  • Leveraging the structured, unstructured, and streaming data sources required for deep analytics that are hubbed on the EDW
  • Combining streaming and other unstructured data sources with existing EDW investments
  • Delivering improved business insights from the EDW to operations for real-time decision-making

Essentially, the “live EDW” would aggregate at least one streaming source with other higher-latency (at-rest or batch) sources into a conformed, continually refreshed in-memory data structure that drives real-time business processes.
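As a rough illustration of what such a conformed structure could look like, the sketch below merges at-rest customer records with a stream of purchase events into one in-memory view. Everything here is assumed for the example: the `LiveEDWView` class, the `customer_id`, `segment`, `amount`, and `ts` fields, and the choice to key the view by customer.

```python
class LiveEDWView:
    """Hypothetical conformed in-memory view keyed by customer_id."""

    def __init__(self, at_rest_rows):
        # Seed the view from at-rest data, e.g. a batch-loaded warehouse table.
        self.view = {
            row["customer_id"]: {
                "segment": row["segment"],
                "live_spend": 0.0,
                "last_ts": None,
            }
            for row in at_rest_rows
        }

    def apply(self, event):
        """Refresh the view with one streaming event and return the updated row."""
        row = self.view.setdefault(
            event["customer_id"],
            # Customers seen only in the stream get a placeholder segment.
            {"segment": "unknown", "live_spend": 0.0, "last_ts": None},
        )
        row["live_spend"] += event["amount"]
        row["last_ts"] = event["ts"]
        return row

edw = LiveEDWView([
    {"customer_id": 1, "segment": "gold"},
    {"customer_id": 2, "segment": "basic"},
])
for event in [
    {"customer_id": 1, "amount": 20.0, "ts": 100},
    {"customer_id": 3, "amount": 5.0, "ts": 101},
    {"customer_id": 1, "amount": 2.5, "ts": 105},
]:
    edw.apply(event)
```

A business process could read `edw.view` at any moment and see warehouse attributes and live activity side by side, without waiting for the next batch load.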
