While adoption of Spark continues to grow, this year’s Spark Summit highlighted some of the ways in which the big data processing engine is still a work in progress. In particular, its stream analytics processing engine continues to struggle with a few hiccups that can limit its utility. That said, presenters still by and large believe Spark is the best option for streaming analytics, as other tools are even further behind in maturity.
The Summit also featured comments from Doug Cutting, the originator of Hadoop, on where the open source data processing industry is headed and what might be next. Take a listen to this edition of Talking Data to hear more about these and other topics from the 2016 Spark Summit.
The development of AI had been stalled out for years, but all of a sudden there’s been a huge surge in interest. Why now? The answer is big data. It turns out big data was the missing knowledge bank that was needed to make machines truly intelligent. But now that enterprises have stashed huge volumes of data, they are starting to unleash learning algorithms on it, creating large-scale learning opportunities.
Kafka and Spark Streaming are emerging as central parts of new real-time data architectures. In this podcast, the Talking Data crew hooks up with Jay Kreps and Doug Cutting, of Confluent and Cloudera, respectively, who outline some of big data streaming’s pertinent facts.
The recent PBS film The Human Face of Big Data stirred plenty of reaction on social media and in blogs. In this edition of Talking Data, we take a look at what the show got right and what it might have missed.
The documentary was certainly a high-level overview of big data geared mostly toward a popular audience. With that in mind, it did do a good job of introducing some positive examples of big data and analytics. But while the show was not uncritical, particularly around the areas of privacy and security, it missed some important opportunities to discuss the potential downside of big data, mainly as it relates to distributing the benefits of technology throughout society.
Take a listen to this podcast to hear more about how people are reacting to the documentary.
Batch processing came, went and returned. Now it may be leaving again, MapR’s Jack Norris tells Talking Data podcaster Jack Vaughan in our latest episode. According to Norris, senior vice president of data and applications, we will see more convergence of real-time and batch architectures as Apache Spark joins Hadoop and event streaming is matched with big data storage. Norris spoke about this and other pressing data topics in the podcast.
In this episode of the Talking Data podcast, Ed Burns and I discuss use cases for machine learning. Vibrant application areas include insurance risk analysis, credit scoring, recommendation engines and digital ad placement. While machine learning does seem to undergird a lot of modern big data analytics work, implementations still remain largely the province of the advanced data scientist. Machine learning methods have deep roots in statistics and artificial intelligence, and how quickly these methods can go mainstream remains a matter of conjecture. Check out the podcast, and stay tuned.
There’s no doubt that IoT analytics has become one of the most hyped technologies in 2016, but behind the hype, there may be a glimmer of promise. In this edition of Talking Data, we try to look beyond all the excitement to see signs that the promise of IoT analytics is for real. We assess various ideas, such as analytics at the edge, smart cities and data privacy and security, and how they are likely to play out when businesses start to analyze IoT data.
In the year ahead, businesses are looking for new ways to analyze data and new tools to help them, according to Goutham Belliappa, an analyst with consulting firm Capgemini.
In particular, Belliappa is looking at ways businesses plan to monetize their data, analyze data from Internet of things-connected devices and make better use of cognitive computing. Take a listen to this podcast to learn more about why these trends are expected to be hot in 2016.
The Oxford Dictionaries crew may have their fun selecting a word of the year. Heck, this year they picked an emoji. You could say the glyphs have it. For the Talking Data podcast crew, no review of 2015 would be complete without a look at Apache Spark, which was a word with legs for sure. Spark is the Hadoop ecosystem component that continues to ascend in Google Trends and elsewhere. Catch this podcast, for auld lang co-syne.
Sometimes machine learning can seem to be a million miles away from our everyday reality. But here is a story that hits home, even if you are not a marine researcher out on the frothy waves at this particular moment. It is a story about a contest sponsored by MathWorks to find algorithms that can help identify individual right whales, members of a species that has seen some desperate days.
Today’s Talking Data podcast guest, Kristen Khan of NOAA, said she and her colleagues have watched the growth of machine learning projects that identify images. They have wondered: If such analytical algorithms could trim time from what is now a very labor-intensive whale identification task, could that free up staff for more proactive efforts to save the whales? Listen here.