Hybrid cloud has had a boost recently with the emergence of file/object environments that allow customers to operate a single namespace between on-premises and public cloud locations.
One of the pioneers here is Cloudian, which offers its HyperStore object storage-based environment with file-level access via NFS and/or CIFS through HyperFile. That capability was first introduced last December via a partnership with Milan-based Infinity Storage, and has now been cemented by Cloudian’s acquisition of the Italian firm.
But, how exactly can file and object co-exist? After all, file systems bar simultaneous user access via file locking while object storage has no such mechanism.
Talking this week to Michael Tso, CEO, and Caterina Falchi, new on-board VP of file technologies at Cloudian, it was interesting to delve into how the two sides – file and object – relate to each other in Cloudian, and the limits that places on possible workloads.
There’s no doubt that what Cloudian offers is an exciting development that allows customers to operate with file or object access in a single namespace between on-premises locations and public cloud services from Amazon, Google and Microsoft. It’s part of an emerging class of products that also includes those from the likes of Scality, WekaIO, Qumulo and Elastifile.
The fundamentals of Cloudian are that data is kept as objects. “The ultimate version of the truth is object,” said Tso. And S3 is the protocol by which data stored as objects is accessed.
Now there is file access via NFS and CIFS, but data is converted to object format behind the scenes. File locking exists in NFS and CIFS, but once data is in object format it can, in theory, be altered by more than one party at a time.
How will this be handled? Tso and Falchi say global file locking is on the roadmap, but for now, “There’s file locking from the file side,” says Tso. “But it’s not easy from the object side. That’s because we don’t want to change S3 standards, which do not contain any locking mechanism. It’s something we’re still debating whether we need to do,” he added.
“We’ve not had any major issues,” says Tso. “People manage access at the application level. The only time it would be a problem is if there was some incidental change in the flow, where you don’t expect someone to come in from a different interface.”
So, like Google Drive or Dropbox, if more than one person accesses a file at the same time, different versions are created.
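The versioning behaviour described above can be sketched with a toy model. This is purely illustrative – it is not Cloudian’s implementation, just the general S3-style pattern in which the store has no lock, so two simultaneous writers never conflict; each write simply creates a new version of the key:

```python
# Toy sketch of S3-style object versioning (illustrative only, not
# Cloudian's implementation): there is no locking, so concurrent
# writers don't block each other -- each PUT creates a new version.
import uuid

class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of (version_id, data)

    def put(self, key, data):
        vid = uuid.uuid4().hex
        self.versions.setdefault(key, []).append((vid, data))
        return vid

    def get(self, key, version_id=None):
        history = self.versions[key]
        if version_id is None:
            return history[-1][1]  # latest version wins by default
        return dict(history)[version_id]

bucket = VersionedBucket()
v1 = bucket.put("report.docx", b"edit from NFS client")
v2 = bucket.put("report.docx", b"edit from S3 client")  # nothing stops this

assert bucket.get("report.docx") == b"edit from S3 client"   # last writer wins
assert bucket.get("report.docx", v1) == b"edit from NFS client"  # old edit survives
```

Both edits survive as separate versions, which is exactly why this model suits archives and IoT data better than live collaborative editing.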
From that, said Tso, use cases that are beyond the pale are, “Remote and branch office stuff, where people are collaborating, several people working on the same document making multiple edits at the same time.”
But, he said, Cloudian will work for Internet of Things data, file sharing, and media archives, and looks to customers that want to move, “from tape or Isilon [Dell EMC’s scale-out NAS product]”.
This year’s storage news so far has provided a firm impression of the increasing prominence of the cloud, and in particular of attempts to harness the public cloud and private datacentre in hybrid operations.
Now, recent IDC figures provide some evidence for a strong trend towards the cloud forming an important part of IT operations, as the figures below show.
In 2017, IT infrastructure spending for deployment in cloud environments hit $46.5 billion, a year-on-year growth rate of 20.9%.
Public cloud attracted the bulk of that (65.3%) and grew the fastest, with an annual rate of 26.2%. Meanwhile, spend on traditional, non-cloud IT was expected to have declined by 2.6% in 2017. It still formed the majority (57.2%) of user spending but was down from 62.4% in 2016.
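A quick back-of-envelope calculation on the IDC figures quoted above makes the scale concrete. The derived numbers below are my own arithmetic, not figures published by IDC:

```python
# Arithmetic on the IDC figures above (derived values are mine, not
# IDC's): public cloud's slice of 2017 cloud infrastructure spend,
# and the 2016 base implied by 20.9% year-on-year growth.
cloud_2017 = 46.5        # $bn, all cloud IT infrastructure spend, 2017
public_share = 0.653     # public cloud's share of that spend

public_2017 = cloud_2017 * public_share
cloud_2016 = cloud_2017 / 1.209   # back out the 20.9% growth rate

print(f"Public cloud, 2017: ${public_2017:.1f}bn")       # ~$30.4bn
print(f"Implied 2016 cloud spend: ${cloud_2016:.1f}bn")  # ~$38.5bn
```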
This comes on top of recent news that has centred on the efforts of vendors to provide a unified storage environment across the hybrid cloud, between on-premises and public cloud operations.
These have included: Cloudian’s upgrade to its HyperStore object and NAS storage software to allow hybrid operations to multiple cloud providers; Qumulo’s launch in Europe of its hybrid cloud NAS, effectively a parallel file system that mirrors the likes of Dell EMC’s Isilon but between cloud and on-site locations; and Microsoft’s purchase of Avere, a storage software maker whose products included hybrid cloud storage functionality.
Such products solve a bunch of problems for hybrid cloud storage. It has long been possible to work between private and public cloud environments, but getting data into and back from the cloud hasn’t always been so easy. And data portability between clouds has long been an issue.
Until now, it just wasn’t possible to handle data on a common file system or scheme (put that way because object storage doesn’t use a file system, as such) in the way the type of products now emerging allow.
These allow seamless operations between on-site and public clouds that mean the latter can be easily used for burst workloads or as a tier behind on-site performance layers.
That seems to me to be a significant landmark and we should expect to see further developments along these lines.
Sure, there will likely be a question mark over the fundamental resilience, availability and latency aspects of the use of cloud. After all, connection loss is only a misplaced JCB shovel away, but the appearance of near-true unified hybrid storage environments is a great step forward.
Microsoft’s acquisition of Avere for an undisclosed sum, announced at the beginning of the year, marks the swallowing of an always-interesting storage player and a significant move for Microsoft and its cloud strategy.
The move is a clear play to boost Microsoft’s hybrid cloud capabilities, and aims to meet the need of businesses for whom the cloud in its pure form still can’t cut it for their workloads, on grounds of availability or performance.
Avere’s products have always been about improving performance across multiple locations.
It started in 2008 with its NAS acceleration boxes – the FXT products, dubbed Edge Filers – that boosted access to core NAS clusters. Then Avere added the vFXT virtual versions of these and added cloud capability and tiering, within the cloud (using Amazon’s various classes of storage) and between on-site and cloud locations, including with a single namespace in its C2N hybrid cloud NAS-object storage appliance.
Such capabilities look likely to be added to the Azure stable at Microsoft and would offer customers a rich set of hybrid cloud possibilities, with tiering in the cloud and between on- and off-site locations.
The pull towards hybrid cloud is that, increasingly, organisations want data portability between on-site and cloud, both to deal with availability issues and to be able to burst for performance reasons.
What also stands out is that this is the first time I can recall a company like Microsoft – in the guise of cloud provider – acquiring a storage vendor.
The cloud is surely the future, with compute and storage increasingly provided as a service in the medium- to long-term, despite current concerns over availability, security etc.
Will this acquisition be the first of many in which storage is reconfigured as a hybrid function between datacentre and cloud?
Hitachi Data Systems is no more.
It has been rolled into a new division, Hitachi Vantara. That is, HDS, with its largely enterprise-focussed data storage products, has been joined with the Internet of Things-focussed Hitachi Insight and the analytics arm, Pentaho.
The premise for the move is that we are on the verge of a world in which data from machines will become increasingly important. So, potentially large amounts and varying types of data will need to be stored. And there is no question that to get the most from that data there will be a pressing need to make some sense of it via analytics.
That’s more or less the explanation of Steve Lewis, CTO of Hitachi Vantara, who said: “The reality for a lot of companies – and the message hasn’t changed – is that they are required to manage data in increasingly efficient ways. There will be more and more machine-to-machine data being generated and the questions will be, how do we store it, how long do we keep it, what intelligence can we gain from it?”
Hitachi Vantara would appear to be in a prime position to profit from an IoT future. It’s a small part of a vast conglomerate built mostly on manufacturing businesses whose products range from electronics to power stations via defence, automotive, medical and construction equipment, but also includes financial services.
That background should provide almost unique opportunities to develop data storage for a world of machine data and intelligence gathering therefrom.
Will there be any impacts on the datacentre and storage in particular?
Lewis said: “Storage will continue on the same trend with the growth of data volumes and the need for different levels of performance.”
“But, for example, where companies used fileshare as a dumping ground and didn’t know what they had, increasingly organisations need to know what data they hold, the value of it and make more use of metadata. ‘Metadata is the new data’, is something we’re hearing more and more.”
Lewis cited the example of the Met Police’s rollout of 20,000 body-worn cameras and the effects – with several GB of data per 30 minutes of video – on their networks (“never designed for video content”) and on storage volumes, but also the need to store that data for long periods (100 years in the case of the police), be able to find it, make sense of it and delete it when required.
“So, it’s all less about initial purchase price and more about the cost of retention for its lifetime,” said Lewis.
Clearly, Hitachi Vantara aims to profit from these types of need and plans, said Lewis, to “develop its own IoT framework and operating environment”.
It should be in a good position to do this. Time will tell.
Data storage has many fundamentals, but a key one is the idea that what we store should be or form part of a single, reliable copy.
We know this is not practically achieved more widely and that multiple versions of files proliferate across corporate storage systems, via emails, the internet etc.
But, in some use cases it is absolutely essential that there is a single version of the truth, for financial transactions or in areas such as health records.
There are also good economic reasons to want to keep single copies of data. It’s simply cheaper than unnecessarily holding multiple iterations of files.
Enter Blockchain, which provides a self-verifying, tamper-proof chain (sharded, encrypted and distributed) of data that can be viewed and shared – openly or via permissions – and so provides a single version of the truth that is distributed between all users.
It therefore has key qualities sought in storage. The Blockchain itself is in fact stored data, though practically it’s not a great idea to chain together more than small blocks, because the whole chain would become too unwieldy.
So, it’s not storage as we know it. However, some startups have started to apply Blockchain to storage.
These services allow “farmers” to offer spare hard drive capacity in return for cryptocurrency. The actual data is sharded and encrypted in redundant copies and protected by public/private key cryptography.
The Blockchain stores information such as shard location, and verifies that the farmer still has that shard and that it is unmodified.
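The shard-audit idea can be sketched in a few lines. This is a minimal illustration of the general pattern – salted hash commitments checked against fresh challenges – not the actual protocol of Storj, Sia or any other network:

```python
# Minimal sketch of shard integrity auditing in the Storj/Sia style
# (illustrative pattern only, not either network's real protocol):
# the chain records an expected answer per (shard, nonce) audit, and
# a farmer proves it still holds the unmodified shard by hashing it
# together with the fresh challenge nonce.
import hashlib
import os

def shard(data: bytes, size: int):
    """Split data into fixed-size shards (last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def challenge_response(shard_bytes: bytes, nonce: bytes) -> str:
    return hashlib.sha256(nonce + shard_bytes).hexdigest()

data = b"patient record " * 100
shards = shard(data, 256)

# "On-chain" audit records for this round's challenge nonce.
nonce = os.urandom(16)
expected = [challenge_response(s, nonce) for s in shards]

# An honest farmer passes the audit; a tampered shard fails it.
assert challenge_response(shards[0], nonce) == expected[0]
tampered = b"X" + shards[0][1:]
assert challenge_response(tampered, nonce) != expected[0]
```

Because the nonce changes each round, a farmer can’t pass by storing only the hash and discarding the data, which is the point of putting the audit trail on the chain rather than the data itself.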
Storj and Sia offer storage at something like one tenth the cost of Amazon S3 because there are no datacentres to maintain.
Meanwhile, Blockchain has been brought into use to manage the storage of health records.
Like Storj and Sia, the data isn’t actually held on the chain, but is referenced and protected by it. Already, Estonia’s entire health records repository is kept this way.
There are other limited or private use cases too, such as backup product maker Acronis, which uses Blockchain to verify data held in its Acronis Storage cloud.
All this points in the direction of a potentially useful secondary/nearline storage use case based on Blockchain. An organisation could make use of unused storage capacity around the datacentre in the manner of Storj or Sia and so achieve much better utilisation.
There may be products out there already that do this, and I’m sure their representatives will let me know, if they exist.
Meanwhile, there are much grander future scenarios based on Blockchain in development such as BigChainDB’s Inter-Planetary Database, that aims to be the database for the “emerging decentralized world computer”.
Somewhere down the line the Storj/Sia model could be universally applied to public cloud storage, but for now – given concerns over bandwidth and security in the public cloud – distributed private cloud based on Blockchain management would be a reasonable target.
There was a time not too long ago when backup software pretty much only handled one scenario, ie backing up physical servers.
Then came virtualisation. And for quite some time the long-standing backup software providers – Symantec, EMC, IBM, Commvault et al – did not support it, while newcomers like Veeam arose to specialise in protecting virtual machines and gave the incumbents a shove.
Then we had the rise of the cloud, and initially in backup products this was an option as a target.
But as the cloud provided a potential off-site repository in which to protect data it also became the site for business applications as a service.
That meant the cloud became a backup source.
There is some data protection capability in the likes of Office 365 but this doesn’t always fulfil an organisation’s needs.
There’s the risk of losing access to data via a network outage, and there are compliance needs that might require, for example, an e-discovery process. Or there’s simply the need to make sure data is kept in several locations.
So, companies like Veeam now allow a variety of backup options for software services like Office 365.
You can, for example, use Veeam to bring data back from the cloud source to the on-prem datacentre as a target. That way you can run processes such as e-discovery that would be difficult or impossible in the cloud application provider’s environment.
Or you can backup from cloud source to cloud target. This could be to a service provider’s cloud services, or to a repository built by the customer in Azure, AWS etc. Either option might enable advanced search and cataloguing to be made easier, or might simply provide a secondary location.
With the possibility of backup of physical and virtual machines in the datacentre and the cloud and then spin-up to recover from any of these locations, full interoperability between environments is on the horizon.
For now the limits are not those of the backup product – assuming it has full physical-to-virtual interoperability – but those of the specific scenario. A very powerful dedicated physical server running high-performance transactional processing for a bank, for example, could likely not be failed over to the cloud.
But nevertheless, the trends in backup indicate a future where the site of compute and storage can slide seamlessly between cloud and on-prem locations.
Veritas dates back to the early 1980s but disappeared for 10 years to become part of Symantec, with its NetBackup and Backup Exec leading the way in the data protection market.
But, in 2014 Veritas was burped out by Symantec into an environment where its leadership in backup could no longer be taken for granted.
The virtualisation revolution had transformed the likes of Veeam into mainstream contenders, while newcomers such as Rubrik and Cohesity, and reinvented rivals such as ArcServe, started snapping at its heels.
And while the virtualisation transformation had largely done its work, other long waves started to break.
These were: the drive towards big data and analytics, which is also being driven by the upsurge of machine/remote data; a greater need for compliance, driven in particular by regulations such as Europe’s GDPR; and the emergence of mobile and the cloud as platforms operating in concert/hybrid with in-house/datacentre IT environments.
Such changes appear to have driven Veritas to focus on “broad enterprise data management capabilities”, according to EMEA head of technology, Peter Grimmond.
According to Grimmond, Veritas’s thinking centres on four aims, namely: data protection, ie backup and archiving; data access and availability, ie ensuring the workload can get to the data; gaining insight from the organisation’s data; and monetising that data if possible.
Its product set fits with those general aims, with data protection and availability products (NetBackup, Backup Exec, plus hardware backup appliances); software-defined storage products (file and object storage); and tools to help with information governance (data mapping and e-discovery tools, for example).
Compliance and the kind of data classification tasks that arise from it are strong drivers for Veritas right now.
“We are particularly focussed on unstructured data and how that can pile up around the organisation,” said Grimmond. “And whether that is a risk or of value to the organisation.”
That’s of particular use in, for example, any kind of e-discovery process, and as part of regulatory requirements such as Europe’s GDPR. This gives the customer the “right to be forgotten” following a transaction, which means organisations may need to locate personal data and do what is necessary with it.
Veritas has also built in intelligence to its storage products. Its object storage software product – announced recently at its Vision event – for example, incorporates its data classification engine so that data is logged, classified and indexed as it is written.
This functionality has in mind, for example, Internet of Things and point-of-sale scenarios, said Grimmond.
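The classify-on-write idea is straightforward to sketch. The code below is a hypothetical illustration of the general pattern – scanning objects for personal-data signatures at ingest and keeping the tags in a metadata index – not Veritas’s classification engine, and the patterns are deliberately simplistic:

```python
# Illustrative classify-on-write sketch (hypothetical, not Veritas's
# engine): each object is scanned for personal-data patterns as it is
# ingested, and the tags are stored in a metadata index alongside it.
import re

PATTERNS = {
    "email": re.compile(rb"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_postcode": re.compile(rb"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),
}

class ClassifyingStore:
    def __init__(self):
        self.objects, self.index = {}, {}

    def put(self, key: str, data: bytes):
        self.objects[key] = data
        tags = [name for name, rx in PATTERNS.items() if rx.search(data)]
        self.index[key] = tags
        return tags

store = ClassifyingStore()
tags = store.put("pos-receipt-001", b"customer: jo@example.com, total 9.99")
assert "email" in tags

# A later "right to be forgotten" query can use the index rather
# than rescanning every object in the store.
personal = [k for k, t in store.index.items() if t]
assert personal == ["pos-receipt-001"]
```

Indexing at write time is what makes a GDPR locate-and-delete request a metadata query rather than a full scan of the estate.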
DR in the cloud is already available, with options that range from simply using cloud backup and recovering from it to customer infrastructure on-prem or in the cloud, through to full-featured DRaaS offerings.
“We’re looking at how to leverage the public cloud to do rapid recovery,” said Alon Yaffe, product management VP at Barracuda.
“The way cloud disaster recovery exists, the industry is asking customers to go with a certain vendor for everything,” said Yaffe. “But, there will be advantages for customers to make use of the public cloud and do it on their own.”
If they do that though, how does Barracuda benefit? After all, it makes its living providing services and products in this sphere.
According to Yaffe it will be by offering the intelligence to help orchestrate disaster recovery that customers put together using public cloud services. The company will be working on that in the next couple of years, said Yaffe, with the aim of providing orchestration for on- and off-site DR functionality.
That presumably means the type of orchestration that can allocate and provision data and storage – within the bounds of RTOs and RPOs – and make it available following an outage, in a fully access-controlled fashion so that customers can build DIY disaster recovery infrastructure from a mix of public cloud and on-prem equipment.
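The kind of RTO/RPO-bounded check such an orchestrator might run can be sketched simply. Everything here – the plan names, fields and thresholds – is hypothetical illustration, not Barracuda’s design:

```python
# Hypothetical sketch of an RTO/RPO feasibility check a DR
# orchestrator might run (names and thresholds are illustrative,
# not Barracuda's): a plan is viable if its backup interval fits
# the RPO and its estimated restore time fits the RTO.
from dataclasses import dataclass

@dataclass
class DrPlan:
    name: str
    backup_interval_min: int   # worst-case data-loss window (drives RPO)
    restore_time_min: int      # estimated time to restore service (drives RTO)

def meets_objectives(plan: DrPlan, rpo_min: int, rto_min: int) -> bool:
    return (plan.backup_interval_min <= rpo_min
            and plan.restore_time_min <= rto_min)

plans = [
    DrPlan("nightly-to-onprem", backup_interval_min=1440, restore_time_min=120),
    DrPlan("15min-to-azure",    backup_interval_min=15,   restore_time_min=45),
]

# The business demands at most 1h of data loss and 2h of downtime.
viable = [p.name for p in plans if meets_objectives(p, rpo_min=60, rto_min=120)]
assert viable == ["15min-to-azure"]
```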
It’s an adaptation of the idea of cloud orchestration to the sphere of DR and should be a valuable addition to the datacentre.
Backup appliance maker Rubrik plans to add analytics to its products, including in the cloud.
Talking to ComputerWeekly.com this week CEO Bipul Sinha would not give details, but did say the company plans to add analytics, and not restricted to those that report on backup operations but more widely using metadata captured in backup and archive operations.
“To date what Rubrik has done has been to manage data backup, recovery, archiving. Going forward we’re looking at more analytics and reporting, doing more with the content stored,” he said.
Sinha said he felt Rubrik had won customer trust with its scale-out appliance offerings and that now the company “wanted to give more intelligence” and that its analytics would enable customers to “interrogate data to gain useful business information”.
The Rubrik CEO also said: “There’s a definite trend to making one single platform on premises and across the cloud” and said that any analytics functionality offered by the company would span the two.
“Competing legacy companies have not innovated so it’s breaking new ground,” he added.
That’s not strictly true, as Druva claims e-discovery and data trail discovery functionality with its inSync product.
And backup behemoth Veritas recently added functionality that uses machine learning to ID sensitive and personal data to help with GDPR compliance.
To date though, the extent of analytics functionality in backup products has been limited, and some question to what extent backup and analytics can be merged, so we’ll have to wait and see what Rubrik comes out with.
Rubrik provides flash-equipped backup appliances that can scale out and which support most physical and virtual platforms, including the Nutanix AHV hypervisor.
NVMe offers huge possibilities for flash storage to work at its full potential, at tens or hundreds of times what is possible now.
But it’s early days, and there is no universally accepted architecture to allow the PCIe-based protocol for flash to be used in shared storage.
Several different contenders are shaping up, however. We’ll take a look at them, but first a recap of NVMe, its benefits and current obstacles.
Presently, most flash-equipped storage products rely on methods based on SCSI to connect storage media. SCSI is a protocol designed in the spinning disk era and built for the speeds of HDDs.
NVMe, by contrast, was written for flash, allows vast increases in the number of I/O queues and the depth of those queues and enables flash to operate at orders of magnitude greater performance.
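The queue arithmetic shows just how wide that gap is. The figures below are the protocol specification maximums (a single AHCI/SATA queue of depth 32 versus NVMe’s 65,535 queues of 65,535 commands each), not what any particular drive ships with:

```python
# Back-of-envelope on why NVMe's queue model matters. These are the
# spec maximums (AHCI/SATA: one queue, depth 32; NVMe: up to 65,535
# queues of depth 65,535), not real-world drive configurations.
ahci_queues, ahci_depth = 1, 32
nvme_queues, nvme_depth = 65_535, 65_535

ahci_outstanding = ahci_queues * ahci_depth
nvme_outstanding = nvme_queues * nvme_depth

assert ahci_outstanding == 32
assert nvme_outstanding == 4_294_836_225  # ~4.3 billion in-flight commands

print(f"NVMe allows roughly {nvme_outstanding // ahci_outstanding:,}x "
      f"more outstanding commands than AHCI")
```

The per-core queue pairs also mean no cross-CPU lock contention on the submission path, which is where much of the real-world latency win comes from.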
But NVMe currently is also roadblocked as a shared storage medium.
You can use it to its full potential as add-in flash in the server or storage controller, but when you try to make it work as part of a shared storage setup with a controller, you start to bleed I/O performance.
That’s because – consider the I/O path here from drive to host – the functions of the controller are vital to shared storage. At a basic level the controller is responsible for translating protocols and physical addressing, with the associated tasks of configuration and provisioning of capacity, plus the basics of RAID data protection.
On top of this, most enterprise storage products also provide more advanced functionality such as replication, snapshots, encryption and data reduction.
NVMe can operate at lightning speeds when data passes through untouched. But put it in shared storage and attempt to add even basic controller functionality, and it all slows down.
Some vendors, for example Pure in its FlashArray//X, have said to hell with that for now and put NVMe into their arrays with no change to the overall I/O path. They gain something like 3x or 4x over existing flash drives.
So, how is it proposed to overcome the NVMe/controller bottleneck?
On the one hand we can wait for CPU performance to catch up with NVMe’s potential speeds, but that could take some time.
On the other hand, some – Zstor, for example – have decided not to chase controller functionality, with multiple NVMe drives offered as DAS, with NVMf through to hosts.
A different approach has been taken by E8 and Datrium, with processing required for basic storage functionality offloaded to application server CPUs.
Apeiron similarly offloads to the host, but to server HBAs, where it sits alongside application functionality.
But elsewhere, controller functionality is seen as highly desirable, and ways of providing it seem to focus on distributing controller processing between multiple CPUs.
Kaminario’s CTO Tom O’Neill has identified the key issue as the inability of storage controllers to scale beyond pairs – or, even if they nominally can, their tendency to become pairs of pairs as they scale. For O’Neill the key to unlocking NVMe will come when vendors can offer scale-out clusters of controllers that bring enough processing power to bear.
Meanwhile, hyper-converged infrastructure (HCI) products have been built around clusters of scaled-out servers and storage. Excelero has built its NVMesh around this principle, and some kind of convergence with HCI could be a route to providing NVMe with what it needs.
So, with hyper-converged as a rising star of the storage market already, could it come to the rescue for NVMe?