VMware will offer a commercial version of Project Serengeti, its open source initiative to run Hadoop workloads in virtual infrastructures.
The new Big Data Extensions plug into vSphere and allow administrators to deploy, monitor and manage Hadoop clusters on VMs directly from vCenter. The extensions are also designed to improve the performance of Hadoop, the popular open source big data analytics platform.
“We’re making Hadoop a first-class citizen on vSphere,” said Fausto Ibarra, a senior director of product management at VMware. “It’ll be just like any other workload.”
Hadoop and other big data platforms typically require dedicated hardware, which can be cost-prohibitive for smaller organizations and raises reliability concerns. VMware released Project Serengeti last year to address these problems, and the Big Data Extensions further that cause by adding full enterprise support.
In preparation for today’s public beta release of the Big Data Extensions, VMware earlier contributed code to the Hadoop community that optimizes Hadoop’s placement of data when running on virtual infrastructure, Ibarra said. The vendor also worked with the makers of the leading Hadoop distributions to share virtualization best practices.
The Big Data Extensions support the following Hadoop distributions:
- Apache Hadoop 1.2
- Cloudera 3 Update 6
- Cloudera 4.2
- Hortonworks Data Platform 1.3
- MapR 2.1.3
- Pivotal HD 1.0
The Big Data Extensions will be generally available by the end of the year. VMware also announced that Pivotal HD, the Hadoop distribution from its parent company EMC, has received VMware Ready certification.