Pentaho has had a busy week — the firm has spent its first week under full public scrutiny as the new Pentaho, a Hitachi Data Systems company, and has staged its second annual PentahoWorld customer, partner, user and developer event.
As part of the shenanigans, Pentaho announced that customers including Halliburton Landmark, IMS and KDS are using its platform to “reimagine established industries” (as the PR spin doctors would say) to blend, integrate and orchestrate machine-generated big data and deliver analytics embedded at the point of impact.
“Big data and the Internet of Things are disrupting entire markets, with machine data merging the virtual world with the physical world. We’ve really only just scratched the surface of how IoT will reshape sectors of the economy,” said Quentin Gallivan, CEO of Pentaho.
According to McKinsey Global Institute’s “The Internet of Things: Mapping the Value Beyond the Hype,” the IoT market could have an estimated total economic impact of $3.9 trillion to $11.1 trillion per year by 2025.
With this market opportunity also come IT roadblocks. McKinsey notes that the lack of open standards and agile platforms may slow the adoption process across the enterprise.
“IoT applications can get very complex very quickly due to the extensive breadth and diversity of data sources and analytics involved, as well as the challenge with standards,” said Vernon Turner, SVP of Enterprise Systems and IDC Fellow for The Internet of Things.
“For companies and developers looking to unlock the value of IoT, the focus will be on technology vendors that provide an open and agile platform,” added Turner.
Halliburton Landmark — Oil & Gas
“Oil and gas is an old industry with a new take on technology. At Landmark, a Halliburton business line, we’ve embarked on an enterprise-wide deployment of Pentaho across our multiple industry platform offerings to improve collaboration between oil and gas companies and the broad supply chain,” said Kumar Shanmugavel, Product Manager, Halliburton Landmark. “By expanding our advanced analytics capabilities to include monitoring machine sensor data, our deployment has improved pump safety and prevents spills by predicting failure rates, resulting in 60 to 80 percent less cost and 2x to 4x faster development.”
Intelligent Mechatronic Systems (IMS) — Automotive
“As a leader in the connected car industry, IMS is creating revolutionary, award-winning technology that enables drivers to be safer, smarter and greener,” said Christopher Dell, Senior Director, Product Development and Management, IMS. “Pentaho enables us to derive greater meaning from the big data collected from our connected car programs, increasing our competitive advantage and enabling us to offer customers the most comprehensive end-to-end connected car solutions on the market. For example, IMS is currently utilizing high performance analytics to drive better outcomes for both insurers and drivers in usage-based insurance programs. We are also looking to leverage this technology to grow our other connected car programs and services, such as road-usage charging and fleet management offerings, as well as expanding to new opportunities in the related IoT market.”
Kirchhoff Datensysteme Software (KDS) — Manufacturing
“Plastic compounding is a complex and highly specialised process. The industry is an order-driven small batch process in which products can be manufactured in different plants; it’s therefore essential to identify the optimal production path,” said Oliver McKenzie, Managing Director, Kirchhoff Datensysteme Software. “Poly.MIS was built by industry experts on the Pentaho business analytics platform and helps plastic compounders diagnose the causes of poor throughput times and serves as a basis for continuous production path optimisation, laying the foundation for a smart factory.”
How is Pentaho doing under its new uber-parent Hitachi Data Systems (HDS)?
Very well, thank you for asking, said the EMEA chief and the comms lead in an informal session prior to this big-data-analytics-driven conference.
From trains to televisions
Hitachi, it appears, has a vested interest in the Internet of Things (and so, therefore, in Pentaho’s data capabilities) as a company that produces everything from televisions to trains.
All these devices have connectivity these days, so data forms the lifeblood in what Hitachi likes to call ‘The Internet of Things that matter’.
(Ed — but ALL devices need love too right?)
NOTE: The event has a heavy Developers By Developers track — a hands-on symposium with plenty of coding activity, people getting their hands dirty on interface connectivity… you know the kind of thing.
Pentaho CEO Quentin Gallivan
CEO Gallivan took the stage to claim that the “unstructured element of data is doubling every three months”… much of this coming from the Internet of Things, of course.
The growth areas for the Internet of Things (and the big data analytics that will support it) are in key areas such as predictive maintenance for industrial equipment and smart cities.
The difference, from Pentaho’s perspective, is that analytics has to start happening INSIDE the application itself so that it can start impacting application behaviour at the point of impact.
More technical sections of this keynote were delivered by Chris Dziekan, chief product officer and EVP at Pentaho.
It appears that much of the effort going into working with big data is focused on which data tool mechanics we use…
… auto-modelling (and in-line modelling) in the firm’s PDI product can start to build the data model in a more automated fashion. This type of analytical modelling also allows users to engage in the model editing process, i.e. a data developer could start to input metadata to help define the schema emerging from a data lake as it comes out of the water.
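The general idea behind this kind of auto-modelling can be sketched in a few lines of Python. Everything below (the `infer_schema` helper, the sensor field names) is illustrative only and is not Pentaho’s actual PDI implementation — the point is simply that a first-cut schema can be derived from semi-structured records as they come out of the lake:

```python
# Hypothetical sketch of schema inference over semi-structured records.
# Field names and the helper are illustrative, not a real PDI API.
from collections import OrderedDict

def infer_schema(records):
    """Derive a field -> type-name mapping from a list of dicts."""
    schema = OrderedDict()
    for record in records:
        for field, value in record.items():
            type_name = type(value).__name__
            if field in schema and schema[field] != type_name:
                # Records disagree on the type, so widen it
                schema[field] = "mixed"
            else:
                schema.setdefault(field, type_name)
    return schema

records = [
    {"sensor_id": "p-101", "pressure": 42.7, "status": "ok"},
    {"sensor_id": "p-102", "pressure": 39, "status": "ok"},
]

print(infer_schema(records))
# pressure is float in one record and int in another, so it widens to "mixed"
```

A data developer could then review and correct this inferred schema by hand, which is exactly the human-in-the-loop model editing described above.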
As we also know, big data configuration can be a hard thing to do. Pentaho has been working with its latest release to help create pathways to built-in testing and troubleshooting.
Talking about the operations of his firm under its new parent Hitachi, Dziekan said “Hitachi allows us to stretch into new places and scale in new ways, without touching the Pentaho agenda.”
… this blog will expand and link to other stories from PentahoWorld 2015.
The ONOS community and The Linux Foundation have now partnered in an attempt to impact on the future of networking.
ONOS is the open source SDN (software defined networking) Network Function Virtualisation (NFV) operating system for service provider networks – it is architected (logically, well… you’d be surprised if it wasn’t) for high performance, scale and availability.
The aim of this move is to help ONOS realise its full potential as an open source SDN and NFV project for service providers.
The new collaborative project will build open source technology for service providers to monetise SDN/NFV, while helping vendors and service providers invent new business models.
The partnership will focus on creating disruptive SDN solutions featuring open source software platforms, white boxes, a range of network control and management applications and the ability to create and deploy innovative services.
“Service providers are increasingly adopting open source software to build their networks and today are making open source and collaboration a strategic part of their business and an investment in the future,” said Jim Zemlin, executive director of the Linux Foundation.
“The Linux Foundation recognises the impact the ONOS project can have on service provider networks and will help advance ONOS to achieve its potential. The partnership combines the best of the two organizations’ capabilities in support of a strategic vision to transform service provider infrastructure with open source SDN and NFV.”
ON.Lab and the ONOS project will continue with their respective boards. ONOS’s mission remains the same: to accelerate the adoption of SDN and NFV in mission-critical networks based on open source platforms and solutions.
This time last year the Computer Weekly Open Source Insider blog reported on the inaugural PentahoWorld 2014 conference and exhibition.
As many will already know, Pentaho is an open source data integration and analytics company with a special penchant for data-driven intelligence, data warehouse refinery controls and data streamlining i.e. data goes on a journey, so let’s be aware of that element.
Data on a journey
As I have written elsewhere, sometimes data is stored, sometimes data is analysed in greater depth and sometimes it is just passed along to the next node in the distributed data supply chain. Hitachi’s move to snag Pentaho is something of an affirmation of the need for these ‘data machining processes’.
It has been a busy 12 months for the firm — getting bought by Hitachi Data Systems (HDS) doesn’t happen without a couple of bumps, but (on the face of it so far) it appears that a) the users are being well looked after and b) the Pentaho name is being held intact inside the parent firm as an HDS brand.
The promise from Pentaho is as follows — attendees can learn more about the acquisition by HDS and the positive impact for users.
“Attend the Social Innovation breakout track presented by HDS to learn about the new solutions that will drive value in a world dominated by the Internet of Things,” says Pentaho.
Onward to PentahoWorld 2015 and we see the firm staging event #2 once again in its home state of Florida with roughly the same kind of audience composition:
• 75% developers
• 25% business decision makers for data and analytics
Extra developer love
The only thing is… this year there’s extra developer love with a new track called Developers By Developers.
This is a chance for programmers to have technical ‘how-to’ questions answered by the actual team that developed the products.
“Learn first hand from key Pentaho developers about customization, development best-practices and pro-tips, the latest techniques and sources for blending, next-generation plug-in development and more (can you think of anything more? Let us help!),” asks the firm.
Attendees can also expect some clarity on the Pentaho roadmap and what we can expect in Pentaho 6.0 and beyond.
Chief product officer Chris Dziekan will present the firm’s three-year roadmap that supports big industry trends and the Pentaho vision will be laid out by Quentin Gallivan, the firm’s chief executive officer.
Pentaho says customers across industries – automotive, aviation, maritime, oil & gas and telecommunications – are using its platform to perform data-driven software application creation (these customers include Halliburton Landmark, IMS and KDS).
The company will also use the event to talk about a new report titled “Delivering Governed Data For Analytics At Scale” — selected “findings” include:
◦ 52% of firms blend together 50 or more distinct data sources to enable analytics capabilities.
◦ 34% blend 100 or more data sources, and 12% blend 1,000 or more.
◦ More than 60% of survey respondents rated data quality, as well as security and privacy, as very important aspects of data governance — these concerns are paramount.
◦ Different types of data require different levels of governance. Data professionals recognize that all data is not created equal.
You want one more fact about PentahoWorld 2015? The hotel has a ‘lazy river’ again… oh, okay, I’m sold.
Basho Riak TS arrives this month, but what is it?
Well, first of all, what is Basho?
Basho is a ‘data platform’ (it’s software, of course) that provides the services to support multiple database models optimised for key value, time series and large objects.
So, what is Basho Riak?
Riak is a distributed NoSQL database.
… and what is Basho Riak TS?
Riak TS is a distributed NoSQL database and key/value store optimised for fast reads and writes of time series data.
Doh! TS = time series, get it?
NOTE: Time series data (and indeed time series applications) is sometimes also called ‘time stamp’ data and is simply data that has been time coded… as we now build out the Internet of Things (with all its environment sensors etc.), the ability to know when data was created becomes arguably even more crucial than in the past.
Back to the news… this product then is a distributed NoSQL database architected to aggregate and analyse massive amounts of sequenced, unstructured data generated from the Internet of Things (IoT) and other time series data sources.
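To make the concept concrete, here is a toy illustration (in plain Python, not the Riak TS API) of the core trick behind a time series key/value store: keying each reading by (series, timestamp) and keeping the keys sorted, so that reads over a time window become cheap range scans. The class and field names are made up for the example:

```python
# Illustrative only: a tiny in-memory time series store, NOT Riak TS.
# The idea is that (series, timestamp) keys kept in sorted order make
# time-window queries a simple range scan.
import bisect

class TinyTimeSeriesStore:
    def __init__(self):
        self._keys = []    # sorted (series, timestamp) keys
        self._values = {}  # key -> reading

    def write(self, series, timestamp, reading):
        key = (series, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = reading

    def range_query(self, series, start, end):
        """Return readings for `series` with start <= timestamp < end."""
        lo = bisect.bisect_left(self._keys, (series, start))
        hi = bisect.bisect_left(self._keys, (series, end))
        return [self._values[k] for k in self._keys[lo:hi]]

store = TinyTimeSeriesStore()
store.write("sensor-1", 1000, {"temp": 21.5})
store.write("sensor-1", 1060, {"temp": 21.9})
store.write("sensor-2", 1000, {"temp": 18.2})

print(store.range_query("sensor-1", 1000, 1100))
# [{'temp': 21.5}, {'temp': 21.9}]
```

A production system like Riak TS distributes these sorted ranges across a cluster, which is where the “fast reads and writes at scale” claim comes from.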
According to Accenture, the IoT will add $14.2 trillion to the global economy by 2030, enabling companies to capture new growth and boost revenue.
As more and more enterprise applications collect IoT data, specifically time series data from sensors, they need fast, reliable and scalable read and write performance (so says the company) — so to accomplish this, the data must be stored, queried and analysed together.
Unlike traditional databases, Riak TS is built to store and retrieve time series data with enhanced read and write performance says the firm — Basho insists that Riak TS can be operationalized (they mean deployed) at lower costs than traditional relational databases and is easy to manage at scale.
The PR quote parade
“At The Weather Company, we manage 20 terabytes of new data a day, including real-time forecasting data from over 130,000 sources. The sheer volume of time series data requires databases that can efficiently and reliably store and query time series data. Riak TS delivers on this need and allows us to perform the associated queries and transactions on time series data, while maintaining high availability and scale,” said Bryson Koehler, executive vice president and CIO, The Weather Company.
“The rise of unstructured data presents a significant opportunity for innovation. As a result, companies are demanding database solutions that are operationally easy and specifically optimized to handle this type of data. Built on the same core Riak foundation, we now provide a solution specifically optimized for storing and retrieving unstructured data, making us the only NoSQL player that has specialized offerings for key value, large object and time series data. With Riak TS, customers can more easily scale and execute on Internet of Things use cases and more,” said Adam Wray, CEO, Basho.
Couchbase Server 4.0 is designed to give software application development pros a route to building more apps on Couchbase.
What is Couchbase?
Couchbase is an open-source distributed NoSQL document-oriented database that is specifically optimised for interactive applications — the play here is: the power of SQL with the flexibility of JSON, in one place.
(Ed — aren’t, like, all applications, kind of interactive applications?)
When Couchbase says ‘interactive applications’, it is referring to document access, index and query power in terms of read and write data access.
The new release introduces Couchbase’s own SQL-compatible query language for this NoSQL system, potentially then expanding the total deployment areas for the platform.
According to the firm’s website, users can, “Sort, filter, transform, group, and combine data with N1QL (“nickel”) — a declarative query language that extends SQL for JSON — by leveraging language and framework integration and fluent APIs, or writing query statements.”
“Build and extend applications with greater agility by separating how data is queried from how it is modeled. This powerful abstraction enables applications to model data one way, but query it in many ways — including those that may not yet be anticipated.”
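That “model one way, query many ways” idea is easiest to see with an example. The snippet below does not use the Couchbase SDK — the bucket contents and field names are invented for illustration — but it shows in plain Python what a N1QL statement along the lines of `SELECT name FROM bucket WHERE type = "user" AND age > 30` is doing over JSON documents:

```python
# Plain-Python illustration of a N1QL-style query over JSON documents.
# This does not use the Couchbase SDK; the bucket and field names are
# made up for the example.
import json

bucket = [
    json.loads('{"type": "user",  "name": "ana", "age": 34}'),
    json.loads('{"type": "user",  "name": "ben", "age": 27}'),
    json.loads('{"type": "order", "total": 99.0, "user": "ana"}'),
]

# Roughly: SELECT name FROM bucket WHERE type = "user" AND age > 30
result = [doc["name"] for doc in bucket
          if doc.get("type") == "user" and doc.get("age", 0) > 30]

print(result)  # ['ana']
```

The point of a declarative layer like N1QL is that the server plans and indexes this kind of filter for you, rather than every application hand-rolling the loop.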
The firm’s products and engineering ‘veep’ Ravi Mayuram says that with N1QL, and what he calls “foundational improvements” like Global Secondary Indexes, Multi-Dimensional Scaling and Cross Datacenter Replication, the firm now has a new breadth of functionality with which to deploy a single distributed database under the majority of web, mobile and IoT applications.
“N1QL (Nickel) helps developers build enterprise-class applications with less code and greater agility. N1QL is an efficient and complete declarative query language that makes it easy for developers familiar with SQL to build applications on top of a JSON data model that can be extended on demand,” said the company, in a press statement.
Couchbase Server 4.0 with N1QL also enables standard SQL-based reporting and data visualisation tools to access data stored within Couchbase.
ODBC and JDBC connectivity is provided via Simba drivers that work with both the standard SQL-92 and N1QL dialects, allowing the most widely adopted BI and data visualisation tools — including Microsoft Excel, Tableau, Looker, Qlik and more — to access data stored in Couchbase.
Sometimes the best news is hidden… and isn’t always news.
You’d have to be looking hard to find this… but deep inside a PDF white paper written by KPMG, the firm has justified its reasons for using, adopting, developing and subsequently releasing open source software.
KAVE man like big data
The KPMG Analytics and Visualization Environment (KAVE) is an open source big data offering.
The software itself is described as a modular big data platform that can be tailored to each user’s needs.
According to the firm, “Through complete use of your own data, you generate value; for people, for society, for customers, for businesses and governments. The first step lies in overcoming obstacles common to many organisations, to easily unlock the value of your data, external data, and to develop new applications.”
Validation for open source
KPMG has validated and justified its reasons for choosing open source components and says it selected software on the following basis:
- In environments where there are no ‘sufficiently advanced’ closed-source competitors
- Where there exists a choice of licenses for use by commercial and non-commercial organisations
- Where the software in question exhibits what we might label as ‘class-leading’ performance or it can be said to be a class-defining solution with a history of excellence
- Where there exists dynamic and actively good support in terms of a vibrant user community and/or an open source contribution community
- Where there is full horizontal scalability for immediate use in full-blown enterprise environments
Mendix is one of those companies that says things like — we drive digital innovation by empowering customers to bring new digital products to market.
It’s generically non-specifically hard to swallow, right?
What the firm actually does is produce a rapid application development platform for software engineering.
The firm’s new Mendix 6 version is remarkable for its support for offline functionality in mobile applications — it also has a model API and open source platform SDK offering enhanced import & export of application models.
Derek Roos, Mendix’s CEO and co-founder says that Mendix is the first in the industry to offer out-of-the-box offline mobile support across platforms and devices through a model-driven rapid mobile app development approach.
“Without any code, rapid developers can build mobile applications that make use of static resource storage, read access to data, and data entry caching to maintain consistency of user experience and performance even when disconnected or offline,” he said.
The Mendix 6 Model API and open source Platform SDK also promise to eliminate vendor lock-in — with the new model exchange functionality, application models can be easily exported for documentation purposes or to port applications to other platforms, increasing transparency and eliminating lock-in concerns.
Model import capabilities support automated cloud migration of legacy applications, allowing (so says Mendix) teams to accelerate application modernisation at massive scale.
Also here, new APIs allow static analysis on application models to check for inconsistencies, ensure quality standards and improve maintainability.
Conferences come and go, but Apache: Big Data Europe and its sister event ApacheCon Core Europe 2015 is kind of special… as it’s a pure thoroughbred user conference.
Compatibility & quality paramount
As already reported on the Computer Weekly Open Source Insider blog, this event is dedicated to the vendor-neutral work of the Apache Foundation and its focus on products that power petabytes of data, teraflops of operations and billions of objects.
Keynote: open data advancement
The keynote panel at the event itself was called the ‘ODPi — Advancing Open Data for the Enterprise Panel’ and it featured – Anjul Bhambri, IBM; Konstantin Boudnik, WANdisco; Owen O’Malley, Hortonworks; Roman Shaposhnik, Pivotal — it was moderated by C. Craig Ross, director of developer programs, The Linux Foundation.
A reference implementation specification has been developed to produce a more formalised documented approach to using (and developing with) the technologies involved here — as usual, you can expect to be able to find these technologies available on GitHub.
The panel acknowledged that although there has always been a lot of opportunity with Apache Hadoop, there has also been a lot of fragmentation — so this event is a good opportunity for users to find out which elements of the technology work well together.
Let’s get to the core
Customers need to figure out which distribution of Hadoop to go with, and the community needs to make that decision easier — this way, those people involved with the project can focus on putting their efforts into the Hadoop core and making the technology itself better.
At this point you might be wondering what ODPi is…
ODPi is a Linux Foundation Collaborative Project — one of the foundation’s independently funded software projects that harness the power of collaborative development.
“ODPi is a useful downstream project for the community to work on a common reference platform and set of technologies around Hadoop,” said Jim Zemlin, executive director at The Linux Foundation. “We’ve seen this model work with open source technologies experiencing rapid growth and know it can increase adoption and open up opportunities for innovation on top of an already strong Hadoop community.”
The panel confirmed that ODPi is in no way supposed to be a replacement for Apache and that the project itself is fully connected to the wider upstream projects here.
NOTE: ODPi uses an open governance model that is led by a community of developers who will form a Technical Steering Committee (TSC) based on expertise and value of contribution.
When asked what the project’s goals are in the coming months, the panel confirmed that it is looking to finesse the finer details in the current release of the software.
This is clearly an important summit for the Apache team and a good opportunity for them to get together and agree on priorities (and key facilitating and steering technologies) as we now go forward — thinking about COMPATIBILITY will be a key part of this process i.e. the team will be looking to make the maximum amount of other software work on Apache Hadoop without unnecessary alterations.
In the future… speakers and audience members alike said they would like to see RFPs (requests for proposals) include a question to assess whether particular technologies are ODPi-compliant — it’s a good goal to aim for.
In terms of barriers to entry for Hadoop in the enterprise, the panel recognised that some obstacles will exist in some firms — but what matters is that we make technical resources available (with technical compliance) to the people that want to use these technologies today, without imposing draconian usage rules on development teams.
Governance will always be strong
Other elements of the discussion here focused on the importance of open governance — there’s a huge difference between a project that has been developed under the auspices of Apache as opposed to one that has just been “dumped” on GitHub.
Whether that’s a cruel term (dump) or not, the point is well made… The Apache Foundation continues to uphold its principles and for that we must applaud it.
MapR Technologies is a firm that provides a distribution of Apache Hadoop that integrates storage and database functions.
The firm has announced the addition of native JSON support to MapR-DB, its NoSQL database.
This, the firm claims, represents the first in-Hadoop document database for developers to create applications that use continuous analytics on real-time data.
“With these major enhancements, developers benefit from the advantages of a document database combined with the scale, reliability and integrated analytics of enterprise-grade Hadoop and Spark,” said the firm, in a press statement.
The MapR Distribution including Hadoop is architected to serve as a single platform for running analytics and operational applications.
MapR-DB enables continuous analytics on real-time data, while reducing cluster sprawl, eliminating data silos and lowering the TCO of data management.
The native JSON support in MapR-DB is said to let developers quickly stand up more business applications on more data types and sources.
MapR-DB supports the Open JSON Application Interface (OJAI), which is designed to be a general-purpose JSON access layer across databases, file systems and message streams — producing a more flexible and unified interface for working with big data.
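The appeal of a document database for this kind of data is reaching into nested JSON fields directly, rather than flattening them into relational columns first. The sketch below is a stand-in illustration in Python — OJAI itself is a Java API, and the helper and document here are invented for the example:

```python
# Illustrative only: reading nested JSON document fields by dotted path,
# in the style of a document-database access layer. This helper is a
# stand-in, not the real OJAI API.

def get_path(document, path, default=None):
    """Fetch a nested value from a JSON-like dict via a dotted path."""
    node = document
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

doc = {
    "_id": "order-001",
    "customer": {"name": "Acme", "address": {"city": "Berlin"}},
    "items": 3,
}

print(get_path(doc, "customer.address.city"))  # Berlin
print(get_path(doc, "customer.phone"))         # None (field absent)
```

Because documents with different shapes can live side by side, missing fields simply fall back to a default — which is what makes schema evolution painless compared with altering relational tables.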
“MapR continues to build on the innovative data platform at the core of its Hadoop distribution,” said Nik Rouda, senior analyst, Enterprise Strategy Group. “The addition of a document database capability (JSON) neatly extends the powerful NoSQL MapR-DB to seamlessly cover more types of unstructured business data. This makes it faster and easier to build big data applications, without the burden of shuffling data around first.”
A developer preview package of MapR-DB with native JSON support is available immediately.