Storage Soup

Oct 26 2011   8:45PM GMT

Violin tunes up for Big Data analytics

Dave Raffo

Violin Memory's CTO of software, Jonathan Goldick, sees solid state playing a key role in storage for Big Data, and he's not talking about scale-out NAS for large data stores.

Goldick says analytics for Hadoop and NoSQL databases can run better on solid-state drives (SSDs) in the storage rack than on shared-nothing server configurations.

“We’re focused on the analytics end of Big Data – getting Hadoop and NoSQL into reliable infrastructures while getting them to scale out horizontally,” he said. “Scale-out NAS is a different part of the market.”

Today, Violin said its 3000 Series flash Memory Arrays have been certified to work with IBM's SAN Volume Controller (SVC) storage virtualization system. Goldick pointed to this combination as one way Violin technology can help optimize Big Data analytics. The vendors say SVC's FlashCopy, Easy Tier, live migration and replication data management capabilities work with Violin arrays.

Goldick said running Violin’s SSDs with storage systems speeds the Hadoop “shuffle phase” and provides more IOPS without having to add spindles. SVC brings the management features that Violin’s array lacks.

“Hadoop is well-optimized for SATA drives, but there’s always a phase when it’s doing random I/O called the ‘shuffle phase,’ and you’re stalled waiting for disks to catch up,” said Goldick, who came to Violin from LSI to set the startup’s data management strategy. “We’re looking at a hybrid storage model for Big Data. You’ve heard of top-of-the-rack switches, we look at Violin as the middle-of-the-rack array. It gives you fault tolerance and the high performance you need to make Big Data applications run at real-time speeds.”
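For readers who haven't seen where that shuffle sits, below is a minimal word-count MapReduce job in Java. It is only a sketch with hypothetical class names, not anything from Violin or IBM. The point is structural: every pair emitted by map() is sorted, spilled to local disk and merged before reduce() ever sees it, and that sort-and-merge traffic is the random I/O Goldick says stalls on SATA spindles.

    // Illustrative word-count job (hypothetical class names) showing where the
    // shuffle phase sits: between map() and reduce(), the framework sorts and
    // merges intermediate pairs -- the random-I/O step described above.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      public static class TokenMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Map side: mostly sequential reads of input splits (SATA-friendly).
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE); // intermediate pairs spill to local disk
            }
          }
        }
      }

      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          // Reduce side: pairs arrive only after the shuffle/sort phase, which
          // fetches and merges many small spill files from the mappers.
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged as a jar and launched with the standard "hadoop jar" command, the job's map-side spills and reduce-side merges in between are the disk traffic a middle-of-the-rack flash array would be asked to absorb.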

He said Hadoop holds data in transient data stores and persistent data stores. It’s the persistent data – which is becoming more prevalent in Hadoop architectures – where flash can help. “So you think of Hadoop not just as analytics but as a storage platform,” he said. “That’s where IBM SVC bridges a gap for us. When data is transient you don’t need data management services as much. When you start keeping the data there, it becomes a persistent data store of petabytes of information. You need data management features that enterprise users have come to expect – things like snapshotting, metro-clustering, fault tolerance over distance.”

Violin's 3000 series is also certified on EMC's VPLEX federated storage system. EMC is talking about Big Data more than any other storage vendor, with its Isilon clustered NAS as well as its Greenplum analytics systems. EMC president Pat Gelsinger said last week that Big Data technologies will be the focus of EMC's acquisitions over the coming months.

If Goldick is correct, we’ll be hearing a lot more about Big Data analytics in storage.

“Last year Big Data was about getting it to work,” he said. “This year it’s about optimizing performance for a rack. People don’t want to run thousands of servers if they can get the efficiency from a rack.”

There are other ways of using SSDs to speed analytics, such as flash inside disk arrays or PCIe cards in storage systems and servers. Violin's Big Data success will depend on how its performance stacks up against a crowded field of competitors.
