The “Google-like” software storage system that Berlin-based Quobyte introduced last year is getting an update.
With its new Quobyte 1.3 release, the German startup added space-saving erasure coding to protect file data, boosted the performance of block storage, enhanced the product’s management capabilities, and extended support from Linux to Windows.
The Quobyte software runs on commodity server hardware, uses a highly scalable POSIX-compliant parallel file system, and supports file, block and object storage. CTO and co-founder Felix Hupfeld compared the system to technology in use at Google, where he and co-founder Björn Kolbeck once worked as engineers in storage infrastructure.
Hupfeld said “Google-like storage” works with all workloads and cluster sizes and runs on any infrastructure, with only a few people needed for maintenance because the system is highly automated and fault tolerant.
Quobyte’s newly added erasure coding allows applications to directly write erasure-coded files, making the system useful for archival and primary storage, according to Hupfeld.
“We make erasure coding a primary storage access method,” he said. “We’re not recoding data in the sense that we write everything replicated and then recode.”
Hupfeld said modern CPUs are fast enough to render the resource impact of erasure coding irrelevant. He said there’s also no significant performance impact with file-based sequential workloads, such as media assets and engineering and scientific data.
“Where erasure coding is very efficient is when you write a file from beginning to end and don’t do in-place updates, like a virtual machine,” he said.
Hupfeld said performance becomes an issue with erasure coding for random I/O with block storage. He wrote in a blog post, “For a random write, the coding engine needs to read all data of the coding group first, recompute the coding parts, and then write out the modified original data along with the coding data.”
“For virtual machines, it would be a complete disaster if you used erasure coding because you’re recomputing data all the time,” he said in an interview.
Hupfeld advises against the use of erasure coding for virtual machines (VMs) and databases. He said customers should use replication with those block-based workloads.
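The write penalty Hupfeld describes can be illustrated with a toy model. The sketch below is not Quobyte's implementation: it assumes a single XOR parity block instead of an 8 + 3 code, and the `CodingGroup` class and its read/write counters are hypothetical names introduced only for illustration. It shows that a full sequential write needs no reads, while an in-place update has to read the whole coding group before it can rewrite the parity.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class CodingGroup:
    """Toy coding group: k data blocks plus one XOR parity block (assumption)."""

    def __init__(self, k, block_size):
        self.data = [bytes(block_size) for _ in range(k)]
        self.parity = bytes(block_size)
        self.reads = 0   # blocks read from storage
        self.writes = 0  # blocks written to storage

    def write_full_stripe(self, blocks):
        # Sequential write: every data block is already in hand, so the
        # parity is computed without reading anything back.
        self.data = list(blocks)
        self.parity = xor_parity(self.data)
        self.writes += len(self.data) + 1

    def random_write(self, index, new_block):
        # In-place update, as described in the blog post: read all data of
        # the coding group, recompute the coding part, then write out the
        # modified data block along with the new parity.
        self.reads += len(self.data)
        self.data[index] = new_block
        self.parity = xor_parity(self.data)
        self.writes += 2

    def recover(self, lost_index):
        # Rebuild one lost data block from the survivors and the parity.
        survivors = [b for i, b in enumerate(self.data) if i != lost_index]
        return xor_parity(survivors + [self.parity])


group = CodingGroup(k=8, block_size=4)
group.write_full_stripe([bytes([i] * 4) for i in range(8)])
reads_after_sequential = group.reads      # no reads needed so far
group.random_write(3, b"\xff" * 4)
print(reads_after_sequential, group.reads, group.writes)
```

In this model a single small update costs eight block reads on top of its two writes, which is why replication is the safer choice for VM images and databases.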
Quobyte deployments range in capacity from less than 50 TB to petabytes. Customers include a German cloud service provider, an online video recording company, a container service provider and a U.S.-based university, according to Hupfeld. He estimated the average capacity at 200 TB and said users tend to look for alternatives to NetApp and Isilon at about 100 TB.
The Quobyte system makes three copies of data for full fault tolerance, so a customer with 100 TB of data would need 300 TB of storage. Using erasure coding with file data, the storage requirement could drop to 140 TB or 150 TB, depending on the encoding the customer chooses, Hupfeld said.
Quobyte’s “standard 8 + 3” erasure coding – or eight “data parts” and three “redundancy parts” – would enable the system to tolerate a failure of three storage drives.
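The capacity figures above follow from simple overhead arithmetic. The sketch below is a worked example, not vendor tooling; the function names are hypothetical, and it assumes raw capacity scales as (data parts + redundancy parts) / data parts.

```python
def raw_capacity_replication(usable_tb, copies=3):
    """Raw storage needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure(usable_tb, data_parts, redundancy_parts):
    """Raw storage needed for a data_parts + redundancy_parts encoding."""
    return usable_tb * (data_parts + redundancy_parts) / data_parts


replicated_tb = raw_capacity_replication(100)   # three full copies
erasure_tb = raw_capacity_erasure(100, 8, 3)    # the 8 + 3 encoding
print(replicated_tb, erasure_tb)                # 300 137.5
```

With the 8 + 3 encoding, 100 TB of data needs 137.5 TB of raw capacity, in line with the roughly 140 TB Hupfeld cites. Other encodings shift the figure up or down.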
“The good thing about erasure coding is it’s not just more efficient, it’s also more fault tolerant,” Hupfeld said. “You can lose more hard drives without losing data. And that just makes it an even better candidate for archival data.”
Another file-centric enhancement with the 1.3 release is fully parallelized metadata operations. Quobyte rewrote one of the core parts of the database system to take advantage of modern multicore CPUs, Hupfeld said.
Quobyte also extended the product’s management capabilities with support for cross-interface access control lists (ACLs), integrated multi-tenancy, and hierarchical quota support for organizations with large-scale systems.
For block storage, Quobyte optimized the entire I/O path to improve performance and reduce latency to less than a millisecond when the system runs “on good hardware,” Hupfeld said.
The Quobyte software was in limited availability last year and became generally available in January. Hupfeld said the major focus in future product releases will be even more performance improvements.
“If you have more performance, the less hardware you need, the less power you need, and so on,” Hupfeld said. “Performance is very important.”
Since the product’s launch, Quobyte has added support for major container platforms, including Docker and Mesos. Hupfeld said a Quobyte volume driver would be available with the Kubernetes 1.4 release.
“What people are sometimes doing is attaching block storage devices to containers, but then this always gives these very tight couplings between containers and the data,” he said. “With Quobyte as a file system, you can give applications access to specific data like you used to in non-container environments.”
Nutanix, which has OEM deals with server vendors Dell and Lenovo, is now selling its software on Cisco UCS servers through channel partners.
Nutanix today revealed it has independently validated Cisco UCS C-Series servers to run Nutanix hyper-converged software. Nutanix has forged a meet-in-the-channel agreement with Cisco resellers to sell Nutanix Prism and Acropolis software on Cisco UCS C220 and C240 rack-mount servers.
Cisco has its own HyperFlex hyper-converged system as well as partnerships with VMware, Simplivity and StorMagic that allow their hyper-converged software to be sold with UCS servers. Cisco is not actively involved in the Nutanix arrangement, which is between Nutanix and Cisco channel partners.
“This is strictly a Nutanix initiative that will benefit Cisco UCS customers,” said Greg Smith, Nutanix director of technical marketing. “The testing of our software was a Nutanix-driven initiative with support from several large Cisco partners who have deep expertise with UCS. We have worked with Cisco in the past and we currently work with them to make sure our joint deployments fully support Cisco networking.”
Nutanix will not sell its software directly to UCS customers. All deals will go through channel partners who will do all the integration work. Nutanix supports Cisco’s Application Centric Infrastructure (ACI) architecture for deploying applications.
“We know there is demand to use UCS for hyper-converged services, and early efforts to use UCS for hyper-convergence have driven that demand to Nutanix,” Smith said.
Nutanix named Sirius Computer Solutions, HCL Technologies, and SVA among the partners who will sell its software with UCS servers.
Dell has sold its XC Series based on Nutanix software since 2014. The future of that relationship had been questioned after Dell said it would acquire EMC, which sells its own hyper-converged appliances and owns hyper-converged software vendor VMware. But Dell and Nutanix in June announced a multi-year extension of their OEM deal.
Lenovo this year began selling its Converged HX Series appliances running Nutanix Prism and Acropolis.
Nutanix now makes its software available on three of the four major server platforms. The missing vendor is Hewlett Packard Enterprise, which sells hyper-converged products based on its own software.
“We have been on a journey to evolve our product from a single point product to a platform,” Smith said. “We want our software to be able to run on a variety of hardware configurations anywhere in the data center.”
Converged data protection startup Rubrik turned to the cloud with a software release that allows customers to use Amazon AWS and Microsoft Azure as well as Rubrik’s appliances to store data.
Rubrik launched in April 2015 with 2U appliances integrated with software that performs backup, deduplication, compression and version management. The new software release, Firefly, supports physical workloads such as Microsoft SQL and Linux, and is available in a software-only version for remote and branch offices and public clouds.
Rubrik also closed a $61 million Series C funding round, bringing its total funding to $112 million.
Firefly’s capabilities include search and analytics, archiving and copy data management.
Cohesity, which launched around the same time as Rubrik with a similar product, added cloud support last April. Rubrik CEO Bipul Sinha said Rubrik had its eye on the public cloud from the start. It referred to its appliance as a “cloud time machine” from the start, but the original version had only limited support for AWS and none for Azure.
“We started Rubrik with a focus on backup and recovery for VMware,” Sinha said. “But from day one we had a vision for cloud data management – backup, DR, orchestration, compliance, governance and more applications in the cloud.”
Firefly will be available as software only for remote offices. “Selling a full appliance for five to 15 VMs is not cost effective for customers,” Sinha said. “We are selling software-only for them, and they can replicate back into the data center to a Rubrik cluster, or to Amazon or Azure.”
Firefly provides a globally indexed namespace for data and uses zero-data cloning for instant access to data. If on-premise data is lost, customers can bring data back from the public cloud. Rubrik includes a single policy engine for automated orchestration, data permissions for data in the cloud and compliance reporting.
Khosla Ventures led the funding round, with previous investors Lightspeed Venture Partners and Greylock Partners participating. Sinha said the funding will be invested in sales and marketing and to support early customers.
“Our business is growing rapidly,” he said, explaining how Rubrik grabbed a large funding score at a time when funding is hard to come by.
SANTA CLARA, California — Solid-state drives have been much faster than hard disk drives from the start, and now they’re dwarfing HDDs in capacity too.
At Flash Memory Summit this week, Seagate demonstrated a 60 TB 3.5-inch SAS drive, and Samsung said it would have a 32 TB 2.5-inch SAS drive out in 2017 and 100-plus TB SSDs by 2020.
The largest-capacity enterprise drive out now is Samsung’s 16 TB drive, which recently began showing up in NetApp all-flash arrays and Hewlett Packard Enterprise 3PAR arrays.
Samsung’s large drives are based on its 512 Gb V-NAND chip. The vendor stacks 512 Gb V-NAND chips in 16 layers to create a 1 TB package, and 32 of those 1 TB packages are combined into the 32 TB SSDs. Samsung points out its 32 TB drive will enable greater density than Seagate’s 60 TB SSD because 24 2.5-inch drives can fit into the same space as 12 3.5-inch SSDs.
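Samsung's density argument can be checked with a few lines of arithmetic. The sketch below uses only the drive counts and capacities quoted above. The function name is hypothetical, and the package calculation assumes 16 stacked 512 Gb dies with 8 bits per byte.

```python
def enclosure_capacity_tb(drive_tb, drive_count):
    """Total capacity of an enclosure filled with identical drives."""
    return drive_tb * drive_count


# 24 2.5-inch drives fit in the same space as 12 3.5-inch drives.
samsung_total_tb = enclosure_capacity_tb(32, 24)
seagate_total_tb = enclosure_capacity_tb(60, 12)

# One package: 512 Gb per die x 16 layers, converted from gigabits to gigabytes.
package_gb = 512 * 16 // 8

print(samsung_total_tb, seagate_total_tb, package_gb)  # 768 720 1024
```

On these numbers, the same space holds 768 TB of the smaller drives versus 720 TB of the larger ones.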
Seagate senior director of product management Kent Smith said he expects the 60 TB drive to be available within a year. He said the drive will enable active-active archives. “Take a social media site with a lot of photos that people need to access quickly,” he said. “People hate waiting. This is for when you need lots of capacity but you need it to respond quickly.”
SSDs are already making 15,000 RPM HDDs scarce and relegating 10,000 RPM drives to servers. With the larger drives, SSDs can also move into traditional capacity workloads.
“Flash for bulk data becomes attractive in places where data center space is limited,” said DeepStorage consultant Howard Marks.
HDD giant Seagate is trying to show it is serious about SSDs. Its main spinning disk rival Western Digital has invested heavily in flash, including its $17 billion acquisition of SanDisk completed earlier this year. Seagate has been more active on server-side flash — it also launched new Nytro NVMe cards at FMS – but has been slow to embrace enterprise SSDs.
“It’s a surprise to me that Seagate hasn’t taken its dominance in hard drives and moved that to SSDs,” Objective Analysis analyst Jim Handy said during a flash market update at FMS.
Samsung also had more products to talk about than big SSDs. The vendor said it expects to release an ultra-low-latency Z-SSD and launch a 1 TB ball grid array (BGA) in 2017. Ultra-thin BGAs are for notebooks and tablets, but the Z-SSD will be used for enterprise systems running applications such as real-time analysis. Samsung senior SSD product manager Ryan Smith said the first Z-SSD product will be 1 TB, with larger capacities planned.
One area in which Samsung is in no rush to be first is quad-level cell (QLC) SSDs that store 4 bits per NAND cell. While other vendors said they would have QLC in 2017 or 2018, Samsung’s Smith said he sees no reason to hurry past triple-level cell (TLC) flash.
“We feel strongly that TLC is the right strategy,” he said. “What do you gain from QLC? We decided what we’re currently offering is the best choice.”
Cloudian and Amazon Web Services are now offering a service that lets customers use the Cloudian HyperStore hybrid storage offering, which stores data locally but leverages S3 object storage.
AWS cloud storage will manage the usage and billing for the customers.
It targets applications and data that customers want to keep on-premises and operate in a hybrid cloud mode, said Paul Turner, Cloudian’s chief marketing officer. That kind of data is stored behind the organization’s firewall using the S3-compatible HyperStore software.
“What is different here is you can procure it from the Amazon marketplace. What we have done is implemented a service where you can go (to the AWS cloud storage) marketplace and sign up for the S3 service and do it locally,” Turner said.
“It’s in the customer data center and as the storage is consumed, you pay as you go and all the billing is done through Amazon S3,” he said. “It’s an OPEX spend which is unusual because up until now customer data center solutions are a CAPEX spend.”
The service is a hybrid cloud storage offering, so customers can also use HyperStore to tier data into the public cloud, either in S3 or Amazon Glacier. HyperStore is an S3-compatible object storage product. The AWS cloud storage and HyperStore service is currently available in regions across the United States and EMEA.
“As we go forward we will roll it out in other regions,” Turner said.
The cost is three cents per gigabyte, based on the average usage.
“Customers have been asking for this and one thing Amazon does really well is respond to customers,” Turner said. “They will build what is needed.”
Hubstor is fine-tuning the Microsoft Azure-based cloud archive platform it launched in July. The Ontario, Canada, startup introduced CoolSearch, which it bills as searchable Microsoft public cloud-integrated deep storage for enterprises that must retain inactive data indefinitely.
Hubstor’s standard self-service active archive lets users access and share archived data stored in Microsoft public cloud storage. Hubstor’s role-based access controls are integrated with Microsoft Active Directory for user authentication.
Rather than knowledge workers generally, CoolSearch is aimed at privileged user groups that control access permissions. The idea is to enable corporate legal or security teams to quickly spin up high-volume, low-cost searches of unstructured data related to compliance, defensible data deletion or e-discovery.
The CoolSearch data-aware archive is an isolated tenant that resides in Hubstor’s Azure cloud or in a customer’s Microsoft public cloud account. CEO Geoff Bourgeois touts CoolSearch as an alternative to legacy approaches to searching discoverable storage.
“We’re responding to demand from organizations that don’t care about end user access. They just need searchable, fully managed cool storage for investigations, compliance, and litigation activity,” Bourgeois said.
After a query is run, CoolSearch deploys the results in Microsoft Blob Storage, which is Microsoft’s public cloud storage for infrequently accessed data. Hubstor scales down a CoolSearch search cluster once indexing is finished. As with its dedicated cloud archive service, Hubstor CoolSearch is available as a monthly subscription, with pricing based on consumption of Microsoft public cloud resources.
Hubstor provided a pricing chart based on a 100 TB CoolSearch cluster with triple redundancy, 25 TB of content indexing and 3% egress. Depending on the search cluster and its activity level, the vendor claims search indexing costs range from 5 cents to 9 cents per GB. The Microsoft public cloud CoolSearch tenant can be switched to an inactive state to reduce costs when it’s not in use.
The CoolSearch managed service includes automatic data mapping to orphan users. PST splitting and optional deep processing aid discovery of stored Microsoft Outlook PST files. Policy-based index scoping controls which data gets ingested in a full-content indexed search.
CoolSearch discovery searches accept keywords, wildcards, proximity, Boolean, boosting, grouping, fuzziness, and regular expressions. Search restrictions include location, tags, active or orphan users, groups or data owners. Options include full-content search or configured metadata fields. Full-text searches use hit highlighting, paging, sorting and relevancy to rank results. CoolSearch also allows customized metadata searches.
NVMe and PCIe solid-state drives (SSDs) may be a hot topic at this week’s Flash Memory Summit, but the SCSI Trade Association is trying to remind everyone that new serial-attached SCSI (SAS) technology is on the way.
Rick Kutcipal, president of the SCSI Trade Association and product planner at Broadcom, said he expects the upcoming “24 Gigabits per second” (Gbps) SAS device-connect technology – which actually has a maximum bandwidth of 19.2 Gbps – to see its first use with SSDs.
“The biggest advantages will be in solid-state memory,” Kutcipal said.
He said the SCSI Trade Association hopes to hold its first plugfest for so-called “24 Gbps” SAS in mid-2017. He expects host bus adapters (HBAs), RAID cards, and expanders to support the new SAS technology in 2018, with server OEM products to follow in 2019.
Kutcipal claimed the 19.2 Gbps bandwidth would have a 21.5% per-lane performance advantage over non-volatile memory express (NVMe) running on top of PCI Express (PCIe) 4.0. The maximum bandwidth for single-lane PCIe 4.0 is 15.8 Gbps, he said.
SAS typically uses one lane to the drive, and enterprise NVMe SSDs typically use four-lane PCIe, Kutcipal acknowledged. Four-lane PCIe would obviously be faster than single-lane SAS.
But Kutcipal said, “The lanes are not free. [They’re] actually very expensive, so the comparison has to be per lane. SAS can go x2 or x4 [lanes] to the drive.”
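The per-lane comparison Kutcipal makes can be reproduced from the two bandwidth figures he cites. The sketch below is only a worked calculation; the constant and function names are hypothetical. It also shows why the typical four-lane NVMe drive still outruns a single-lane SAS drive.

```python
SAS_24G_LANE_GBPS = 19.2   # maximum bandwidth Kutcipal cites for "24 Gbps" SAS
PCIE4_LANE_GBPS = 15.8     # maximum bandwidth he cites for one PCIe 4.0 lane


def advantage_percent(faster, slower):
    """How much faster `faster` is than `slower`, as a percentage."""
    return (faster / slower - 1) * 100


def link_bandwidth_gbps(per_lane_gbps, lanes):
    """Aggregate bandwidth of a multi-lane link."""
    return per_lane_gbps * lanes


per_lane_advantage = advantage_percent(SAS_24G_LANE_GBPS, PCIE4_LANE_GBPS)
nvme_x4 = link_bandwidth_gbps(PCIE4_LANE_GBPS, 4)   # typical enterprise NVMe SSD
sas_x1 = link_bandwidth_gbps(SAS_24G_LANE_GBPS, 1)  # typical SAS drive
sas_x4 = link_bandwidth_gbps(SAS_24G_LANE_GBPS, 4)  # SAS widened to four lanes

print(round(per_lane_advantage, 1))  # 21.5
print(nvme_x4, sas_x1, sas_x4)
```

Per lane, SAS comes out about 21.5% ahead. At typical lane counts, the four-lane PCIe link carries more than three times the bandwidth of a single SAS lane.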
SAS uses the small computer system interface (SCSI) command set to transfer data between a host and a target storage device. SCSI was developed 30 years ago when hard disk drives (HDDs) and tape were the primary enterprise storage media. Manufacturers have continued to use serial-attached SCSI as a drive-connect with faster SSDs.
The SCSI Trade Association’s efforts to promote a new SCSI Express (SCSIe) interface to run SCSI commands over PCIe have largely fallen flat in comparison to the momentum behind NVMe with PCIe-based SSDs.
The NVM Express industry consortium developed NVMe as a lower-latency alternative to SCSI. NVMe streamlines the register interface and command set for use with faster PCIe-based SSDs and post-flash technologies, such as Intel-Micron’s 3D XPoint.
“SAS is inherently scalable, and NVMe is not,” Kutcipal said. “NVMe will scale to tens of devices, and it’s pretty arduous scaling, while SAS can go to thousands of devices. And there are arrays out there today that are thousands of devices.”
Kutcipal said NVMe cannot solve PCIe’s scaling challenges.
“The limitation in the scalability of NVMe as a device connect is really inherent in PCIe, not in NVMe,” he said. “That’s a big fundamental limitation of NVMe. It relies on PCI Express as its transport in the device connect world.”
SAS can serve as a device/drive connect as well as a storage networking technology. But Kutcipal said the dominant role for SAS is connecting a host bus adapter (HBA) or RAID card to an SSD or hard disk drive (HDD). SAS has distance limitations for storage networking, limiting its use to SANs inside the data center, he said.
The upcoming SAS specification has two parts: the SAS-4 physical layer and the SAS Protocol Layer (SPL)-4. The SPL-4 specification is expected to be complete and ready for use later this year, according to Kutcipal. He said SAS-4 would lag SPL-4 by a quarter.
In addition to the speed bump, new features on the way with next-generation SAS include Forward Error Correction, to ensure data integrity, and continuous adaptation, to enable the SAS transmitter to operate optimally, even if the temperature or operating voltage changes, Kutcipal said.
Pivot3 more than doubled its revenue in the first half of 2016 over 2015, which its CEO attributes to customers buying its hyper-converged appliances as a platform rather than for single applications.
Pivot3 CEO Ron Nash said Pivot3’s revenue increased by 103% over the past six months as it added more than 400 customers. That includes customers Pivot3 added through technology it acquired when it merged with flash storage vendor NexGen Storage in January. But Nash said revenue from NexGen made up less than 10% of Pivot3’s revenue in the quarter. The bulk of the growth came from customers expanding their hyper-converged workloads. Nash said until the last six months or so, almost every Pivot3 system was used for a single application. But customers are now adding other apps to their hyper-converged appliances, and new customers are buying hyper-converged for more than one app from the start.
“Once customers start using it, they say ‘This platform stays up, it’s easy to operate and has a small footprint,’ and then they start loading more applications on it,” Nash said. “That’s the big change we’re seeing. Enough people have tried hyper-converged for a single app, and are now starting to buy it as a platform.”
He said 28% of Pivot3’s new sales in the first half of 2016 were for multiple applications from the start. The average spend of customers with multiple use cases is more than 500% higher than that of customers with a single data center application use case. He pointed to a customer in the public transit industry with 6 PB of data on 250 nodes.
The most common applications Pivot3 customers run are virtual desktops, backup, video surveillance and databases. Nash said the integration of NexGen’s quality of service with Pivot3’s hyper-converged appliances should prove particularly useful for multiple applications.
Despite the spike in sales, Nash said Pivot3 still rarely competes head-to-head with other hyper-converged products. He said three-quarters of Pivot3’s deals are against traditional server and storage products. The two best known hyper-converged products – Nutanix’s NX appliances and VMware Virtual SAN (VSAN) software — don’t show up in many competitive deals but do have an impact on Pivot3 by creating market awareness.
“Nutanix is out there spending tons of money educating the market on hyper-converged infrastructure, which is fantastic for us,” Nash said. “I hope they keep advertising.”
As for VMware, Nash said he suspects it has a lot more VSAN customers than actual sales. “VMware doesn’t quote revenue, they quote customer numbers,” he said. “That’s what you say when you’re giving it away.”
Pivot3 also added Bill Stover as chief financial officer. Stover spent 18 years at Micron Technology, serving as vice president of finance and CFO of the public company. Nash said Stover’s background with a public company will help Pivot3 — still a private firm – grow into a more mature company.
The Pokémon Go craze – mainly its augmented reality capability and server crashes – contains lessons for storage administrators.
Pokémon Go demonstrates how next-generation applications can drive cloud adoption as well as the pitfalls of handling that rapid adoption, according to Varun Chhabra, director of product marketing for EMC’s Advanced Software Division.
“A lot of the applications we use today already use geo-location,” Chhabra said. “What is interesting about Pokémon Go is the scale of usage when combined with geo-location tracking and data. That makes it especially challenging. Tens of millions of people are playing it, and the numbers are still going up.”
Chhabra said while Pokémon Go developer Niantic has not disclosed its back end or storage infrastructure for the game that is attracting millions of users, it has clearly mastered the use of location-based applications. At the same time, it has been plagued by server crashes – delaying the launch of the game in Japan – and security issues that suggest it is growing too fast for its own infrastructure to keep up.
“When we talk about cloud-native apps, the assumption is, everything will work out OK if you have the infrastructure,” he said. “But you still need to manage data, manage the scale of users and figure out where the bottlenecks are.”
There is speculation that Niantic is using NoSQL or PostgreSQL as its back-end database and Google Apps for its Platform-as-a-Service (PaaS) layer. But it has suffered server crashes that cannot be traced to any public cloud problems.
“It seems like they’re using the public cloud today, but even then they’ve had a fair share of outages even when there have been no outages in the public cloud,” Chhabra said. “So you can still have challenges with the public cloud. It’s how you write the application, and how you’re handling access for an avalanche of data.”
Chhabra said commercial enterprise application developers can copy Pokémon Go’s success. For instance, retail stores can create apps to show shoppers in a store where a specific item is located. Or real estate agencies can develop an app with pop-ups showing which houses are for sale, where they are located, and their specs. These applications would tap into data that already exists.
“It should be easy to do, now that people are more comfortable holding up their screens without being embarrassed,” Chhabra said. “It’s more about creating an immersive user experience.”
He pointed to existing storage technologies such as object storage and data lakes that use analytics as tools that can be used in creating these immersive applications. But the development process is different from what IT organizations are used to.
“You can’t throw the same approach at building an application for a geo-location mobile app than you do for traditional apps,” Chhabra said. “A lot of customers we talk to are talking about building apps from the ground up and learning how to use microservices.
“What is your storage platform doing for you natively to relieve the burden on developers? We’ve seen way too many examples of applications that don’t scale, and they crash the servers. Most businesses don’t expect to scale apps this fast, but they still have to test. Pokémon gets a pass, but most businesses don’t.”
EMC has contributed an open source Apache Mesos container volume driver that supports any network-attached block storage system equipped with a Docker plugin, including storage of EMC competitors.
The EMC container plugin integration for Docker is a joint project of the Apache Software Foundation and EMC code, part of EMC’s Emerging Technologies Division. It builds on previous EMC container initiatives. The Docker Volume Driver Isolator module exposes native Docker functionality through a command line interface. It is part of the Apache Mesos distribution released in July.
“We’re making it possible for the community to do multi-tiered persistent storage within Docker, which up to now has been a struggle,” said Josh Bernstein, a vice president at EMC code.
Mesos orchestrates deployment of containers on premises or in cloud storage. The Apache Mesos cluster manager presents abstracted data center compute, memory and storage in an aggregated resource pool. Mesos resides in the kernel to isolate resources as applications are shared across a distributed framework.
Mesos lets users create a persistent volume to run a specific task from reserved disk. The volume persists on a node independently of the task’s sandbox and is returned to the orchestration framework when the task is complete. If necessary, new or related tasks launch a container that consumes resources from the previous task. Docker recommends Apache Mesos as an orchestration layer to implement large clusters of storage containers.
EMC’s container module communicates directly with Docker volume plugins, allowing developers to request a persistent volume from any block storage running under Mesos. Mesos then passes the file request to EMC, which searches available storage to identify the volume and deliver it to the destined container host.
“Before this feature, while users could use persistent volumes for running stateful services, there were some limitations. First, the users were not able to easily use non-local storage volumes. Second, data migrations for local persistent volumes had to be manually handled by operators. The newly added Docker volume isolator addresses these limitations,” according to an Apache Software Foundation blog post published July 27.
Enterprise adoption of Docker is picking up, although several hurdles remain before containers are as ubiquitous as virtual machines. The Apache Mesos integration foreshadows EMC’s open source libStorage project. LibStorage provides an extensible abstraction and provisioning framework, presented as a common package for heterogeneous storage platforms and container runtimes.