Posted by: Dave Raffo
Violin Memory CTO of software Jonathan Goldick sees solid state playing a key role in storage for Big Data, and he’s not talking about scale-out NAS for large data stores.
“We’re focused on the analytics end of Big Data – getting Hadoop and NoSQL into reliable infrastructures while getting them to scale out horizontally,” he said. “Scale-out NAS is a different part of the market.”
Violin said today that its 3000 Series flash Memory Arrays have been certified to work with IBM’s SAN Volume Controller (SVC) storage virtualization system. Goldick pointed to this combination as one way that Violin technology can help optimize Big Data analytics. The vendors say SVC’s FlashCopy, Easy Tier, live migration and replication data management capabilities work with Violin arrays.
Goldick said running Violin’s flash arrays alongside disk-based storage systems speeds the Hadoop “shuffle phase” and provides more IOPS without having to add spindles. SVC brings the management features that Violin’s arrays lack.
“Hadoop is well-optimized for SATA drives, but there’s always a phase when it’s doing random I/O called the ‘shuffle phase,’ and you’re stalled waiting for disks to catch up,” said Goldick, who came to Violin from LSI to set the startup’s data management strategy. “We’re looking at a hybrid storage model for Big Data. You’ve heard of top-of-the-rack switches; we look at Violin as the middle-of-the-rack array. It gives you fault tolerance and the high performance you need to make Big Data applications run at real-time speeds.”
He said Hadoop holds data in both transient and persistent data stores. It’s the persistent data – which is becoming more prevalent in Hadoop architectures – where flash can help. “So you think of Hadoop not just as analytics but as a storage platform,” he said. “That’s where IBM SVC bridges a gap for us. When data is transient you don’t need data management services as much. When you start keeping the data there, it becomes a persistent data store of petabytes of information. You need data management features that enterprise users have come to expect – things like snapshotting, metro-clustering, fault tolerance over distance.”
Violin’s 3000 Series is also certified on EMC’s VPLEX federated storage system. EMC is talking about Big Data more than any other storage vendor, with its Isilon clustered NAS as well as its Greenplum analytics systems. EMC president Pat Gelsinger said last week that Big Data technologies will be the focus of EMC’s acquisitions over the coming months.
If Goldick is correct, we’ll be hearing a lot more about Big Data analytics in storage.
“Last year Big Data was about getting it to work,” he said. “This year it’s about optimizing performance for a rack. People don’t want to run thousands of servers if they can get the efficiency from a rack.”
There are other ways of using SSDs to speed analytics – inside arrays, or as PCIe cards in storage systems or servers. Violin’s Big Data success will depend on how its arrays perform against a crowded field of competitors.