In October, Cloudian’s CEO Michael Tso told ComputerWeekly.com that object storage is set to go mainstream, that it would be “the next NFS”, referring to the long-established file access storage protocol.
Then last week Molly Presley of scale-out NAS maker Qumulo said customers are coming back to scale-out NAS after object storage had made some gains.
That, she said, was because the products coming from relative newcomers in the market – so-called “modern file systems” – are “built to cope”.
“We are seeing customers that had moved to object storage coming over to scale-out NAS,” said Presley.
Object vs scale-out NAS numbers
Both sides can wheel out numbers to back up their views. IBM, while promoting its object storage products, cited IDC figures from 2016 that predicted object storage capacity growing at an annual rate of 30.7% to reach 293.7 exabytes in 2020.
Meanwhile, Qumulo points to a more recent IDC report that doubles earlier scale-out NAS growth forecasts – where IDC had previously predicted 10% annual increases, it now projects an $11 billion market by 2022.
Object storage has been a steadily emerging technology for some years now. Its key advantage is its ability to scale massively, a quality that results from doing away with the tree-like file systems used in NAS (and indirectly in SAN) storage. When these scale to very large numbers of files they can start to slow down, and that has been a complaint against some leading but “legacy” scale-out NAS systems.
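A toy sketch of why the flat approach scales differently: a file system resolves a path by walking a directory tree level by level, while an object store does a single lookup against one flat namespace of keys. The in-memory Python structures here are purely illustrative, not how any real system is implemented.

```python
# File-system style: nested directories, resolved one level at a time.
fs = {"projects": {"2020": {"report.txt": b"contents"}}}

def fs_read(root: dict, path: str) -> bytes:
    node = root
    for part in path.split("/"):   # one lookup per directory level
        node = node[part]
    return node

# Object-storage style: a single flat mapping from full key to data.
# "projects/2020/" is just part of the key name, not a real hierarchy.
object_store = {"projects/2020/report.txt": b"contents"}

assert fs_read(fs, "projects/2020/report.txt") == object_store["projects/2020/report.txt"]
```

The flat key lookup stays a single operation however many objects exist, which is one reason object stores scale so far before hitting the metadata bottlenecks that trouble very large file trees.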
These sorts of issues helped object storage get a clear foothold in a market where previously those looking to deploy storage for large-scale file or unstructured data workloads went to scale-out NAS. Object storage’s flat namespace seemed to have done for scale-out NAS, especially when tied in with object’s affinity for the cloud in an increasingly cloud-oriented era.
But there have always been drawbacks to object storage too. These lie in limitations when working with existing application infrastructures – which are more often than not written to work with file systems, including file locking, which object storage lacks – but also in terms of performance. Object storage tends not to be the fastest, and its consistency model is usually “eventually consistent”.
The scale-out comeback
And now there’s a bounce-back from scale-out NAS, according to Presley and the advocates of “modern file systems”.
That’s based on the fact that newcomers such as Qumulo but also including WekaIO and Elastifile are built for the cloud as well-as on-premises deployment.
But it is also, as Presley put it, because “all your data is available in a single tier. Applications are built to talk to file systems. You don’t have to re-write things to talk S3, for example. Modern file systems can do what object storage does, but with performance.”
So, things are getting interesting as the necessity of dealing with large amounts of unstructured data grows in importance. Scale-out NAS looked down-and-out, but now customers potentially have some serious, contemporary choices to make between object and file.
Product development in storage often feels like a bolting together of existing categories of product. You take one advance from here and mate it with another from over there. Getting the timing right so that offerings can combine an optimum amount of new functionality – and to meet the emerging needs of the market – is a key part of the art of productising technologies.
So, IBM’s recent announcement of its Elastic Storage 3000 seems to have landed well in those terms, nudging its nose ahead in a class of AI/HPC-focussed (and often NVMe-powered) scale-out storage systems.
It brings together super-fast NVMe-based flash storage with IBM’s well-established Spectrum Scale parallel file system, with a bit of containerisation thrown in for good measure.
The target market is artificial intelligence/analytics and high performance computing (HPC).
Elastic Storage 3000 is based on IBM’s FlashSystem 9100 NVMe storage array, which is block-based storage.
The innovation in Elastic Storage 3000 comes with the use of NVMe and the addition of IBM’s Spectrum Scale parallel file system (formerly GPFS) to create a massively performant scale-out NAS product aimed at leading edge contemporary use cases based around unstructured data.
IBM has also provided for Spectrum Scale to be deployed via containers to allow for relatively rapid deployment. The company says it has been able to roll out Elastic Storage 3000 in less than three hours.
The IBM product joins a number of others that play in a similar space, over some of which it has advantages brought about simply by the timing of its development and release.
A near competitor is HPE’s combination of Apollo hardware with WekaIO’s Matrix parallel file system, which also leverages the high performance of NVMe flash storage media.
Meanwhile, another relative newcomer in distributed file systems, Qumulo, has its P-Series product, also NVMe-powered when on-prem, but with more contemporary nice-to-have (or must-have) functionality: the ability to operate in the cloud and as a hybrid file system too.
Pure Storage also aims at AI/analytics use cases with its FlashBlade arrays, which are available as all-NVMe.
Finally, Dell EMC Isilon – a pioneer of the scale-out NAS space – seems to be lagging in terms of adding NVMe flash to its products. It also lacks support for parallel I/O for POSIX-compliant clients, which Spectrum Scale offers.
Object storage, based on Amazon’s S3 – pretty much a de facto standard now – is set to go mainstream.
That’s the view of Cloudian’s co-founder and CEO Michael Tso.
He says it’ll be, “The next NFS”, and is set for widespread enterprise adoption in private cloud deployments.
Why is Tso so confident? He points to validation for Cloudian’s HyperStore S3 software-defined object storage from the likes of Seagate, VMware and Pure Storage.
Cloudian and Seagate joined forces this summer to launch the massively dense exabyte-scale HyperStore Xtreme, with Cloudian storage software and Seagate 16TB drives.
According to Tso – who says Seagate now ships more drives for private cloud deployment than for public cloud – the pendulum is swinging from public cloud back towards on-premises deployments.
For that, Seagate wanted to build its own array/chassis as a vehicle for its 16TB drives and looked to Cloudian to supply the storage intelligence on top.
Tso says, “When you get to 16TB and are looking forward to 20TB drives they are so tightly packed, so sensitive to airflow, for example, it takes array makers several months to develop chassis that can make use of them.”
“So, it’s essential from the point-of-view of Seagate that they move to shipping entire systems. They have to shrink the time-to-money, to ramp up the volume shipment of their drives with their own chassis. It means a lot of money sooner. And they care about object storage as a predominant way of storing data going forward.”
“They said, ‘We want to be first to market with high density systems and with your software, to be the first shipping 16TB drive systems.’ They want to show what their products can do much more quickly.”
“It brings the market schedule in by three to six months which results in $billions for Seagate.”
So, for Seagate, HyperStore Xtreme is a showcase, a quick route to market, and for Tso the selection of Cloudian is a validation of its product and of S3 object storage for private cloud deployments.
Meanwhile, Tso also points to VMware, with which Cloudian struck a partnership this summer with the launch of Cloudian Object Storage for vCloud Director, which provides S3-compatible storage for the VMware cloud.
Tso says VMware chose Cloudian for object storage collaboration over Dell-EMC object storage products.
“They need a truly software defined system that can run in the cloud and on-prem under a single control panel,” he says. “A native S3 solution – it’s on a path of going from on-prem to public cloud to multi cloud.”
“Where data storage is, is where the VMs will move to,” says Tso.
“Everyone says, ‘I want S3’ and the industry leaders are coming to us,” he adds.
Finally, Tso also points to September’s announcement which saw Cloudian’s HyperStore S3 object storage software integrated with Pure’s CloudSnap to allow data movement between FlashArray and Cloudian.
Commvault’s acquisition of Hedvig also looks like a massive and potentially expensive gamble for the company, which posted falling revenues and losses in its recent first quarter results.
That’s par for the course though. It is companies like Commvault that need to put down big stakes to make headway in the current ‘game state’ in storage and data protection.
So, what is that game state? Look at the backup market players and their share a few years ago and it pretty much solely consisted of the suppliers that had been in the market seemingly forever: Veritas/Symantec, Dell/EMC, IBM, HP and Commvault.
That all started to change with the advent of server virtualisation and the need to back up VMs and their data. It couldn’t be done the old way (one agent per physical server) and new companies emerged, the brightest of which was Veeam, which is now on revenues of more than $1 billion in a total market of $10 billion-plus.
That was the first wave. The next assault on the ‘traditional’ backup market has come on the back of hyper-converged infrastructure (itself an outgrowth of server virtualisation).
That has seen the rise of vendors that have put cloud-friendly backup capability into clusterable node appliances, such as Rubrik and Cohesity.
Compared to these, traditional backup software products are seen as old and clunky, costly and hard to set up. For sure, these appliance-based newcomers are minnows in the market so far but there’s a definite sense that this is the way things are going.
Meanwhile, there is also a movement towards treating backup, and the data it collects, as a broad data platform from which analytics, for example, can derive extra value – a direction Veritas has been moving in.
So, what about Commvault’s move for Hedvig? It seems calculated to take Commvault in the direction of its competitors. It’s not a way to grab Hedvig customers, because there aren’t many of them.
It appears a gamble, but then all such moves are. In making it work, Commvault will have tough tasks to tackle in integrating a very different software stack (based around block, file and object storage) into its own. On top of those technical tasks will be the need to revamp its product lines, presumably to bring them in line with the ways the market is changing. All that costs money and corporate mindshare.
Customers and investors will be keeping a sharp eye on how this progresses.
Looking at the promise from StorOne to offer its software-defined storage for $799 a month – that’s about £640 – for 15TB of file, block or object storage capacity, I thought I’d compare it with the cost of cloud storage and buying an entry-level storage array from one of the big five.
The StorOne offer comes out looking like a good one, superficially at least.
Its S1 software-defined storage platform handles SAN, NAS and S3 object storage, with the lure of remarkable levels of capacity utilisation. In addition, after 60 months of the leasing-style arrangement, ownership is transferred to the customer.
So, how does £640 a month compare to the cost of cloud storage?
With Amazon, 15TB of standard access S3 object storage would cost you £360 per month. The infrequent access variant of that would be £196 per month. On top of this you’d have to add access charges, with Amazon billing you per request to browse, search, put, get, copy etc, and by the GB to return data.
But then, not many people are going to use object storage alone anyway, are they?
Meanwhile, 15TB of Amazon block storage would cost £1,740 per month for general purpose SSD, or £795 for “throughput optimised” spinning disk.
Oddly, Amazon file storage costs the most, with 15TB of its standard Elastic File System (EFS) costing £4,950 per month.
Having said all that, Amazon’s cost structure is nowhere near that simple in practice. You will actually be charged by the second for block and file storage and there are also many more levels of service to choose from than I’ve quoted here.
The actual nature of the workload and the ability to move data between tiers would make a huge difference to the bill in practice so it’s very difficult to give a monthly cost of cloud storage to compare against the StorOne offer.
But, what about buying your own storage array? A similar capacity storage array with flash drives from, say, NetApp or HPE, would hit you for about £25,000 including VAT.
If that hardware reached end-of-life over three years that’d work out at £694 a month.
On the surface of it, that looks to be in a similar ballpark to StorOne’s £640 per month. But StorOne is software-defined storage, so you’re going to have to shell out for a hardware platform anyway, and that’ll probably double the cost.
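Pulling the figures above together makes the comparison easier to eyeball. These are only the article’s quoted GBP estimates – real cloud pricing adds per-request and per-GB transfer charges, so this is a sketch, not a quote.

```python
# Monthly cost of ~15TB by option, using the figures quoted in the text.
monthly_costs_gbp = {
    "StorOne S1 (software only)": 640,
    "AWS S3 standard": 360,
    "AWS S3 infrequent access": 196,
    "AWS block, general purpose SSD": 1740,
    "AWS block, throughput-optimised HDD": 795,
    "AWS file storage": 4950,
}

# Entry-level flash array: ~£25,000 up front, amortised over three years.
array_price_gbp = 25_000
monthly_costs_gbp["Entry-level flash array (3-yr amortised)"] = round(array_price_gbp / 36)

for option, cost in sorted(monthly_costs_gbp.items(), key=lambda kv: kv[1]):
    print(f"{option:40s} £{cost:>5}/month")
```

The amortisation line is where the £694-a-month figure in the text comes from: £25,000 spread over 36 months.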
So, at the end of the day it looks like going with an entry-level storage array wins out.
Unless you know better?
On the face of it, cloud storage should be an ideal candidate for an “excess capacity”, “sharing economy” business model.
Otherwise termed “uberisation”, this is the model behind apps and services that seek to marry excess capacity in cars, homes and so on to those looking to use it, generally at a cut price.
Decentralised storage pioneer Storj seeks to be the Airbnb of storage, and has just entered public alpha with its version 3 for potential customers that want to test it in their own environments.
With more than $30 million raised and public launch planned for January 2019, Storj promises storage without the need for a third party provider and at “SLAs comparable to traditional datacentres.”
Its target customers are those that use long-term archiving and S3-compatible object storage.
Essentially, Storj enables users to make excess, under-utilised storage capacity available to others. Data is encrypted, sharded and distributed to those that have allowed their spare capacity to be used – so-called “farmers” – each of whom holds only a tiny fraction of a file, in encrypted form.
Users pay for storage in Storj tokens and the farmers get a percentage. It is planned that the network will scale to exabyte levels and, while decentralised, storage capacity itself will not be based on blockchain to avoid a massive and constantly-growing ledger that sucks up bandwidth.
S3 compatibility will mean that those currently using cloud storage will be able to switch to Storj simply by changing a few lines of code, its most recent white paper, published last week, claims.
Data protection comprises client-level encryption with erasure coding in case data is lost by failure or, more likely, the lack of availability of a storage node.
Again, bandwidth is cited as the driver here, with erasure coding – the ability to rebuild lost data from distributed parity data – being far more economical of capacity and throughput than retention of multiple copies. Storj hopes for six-nines levels of availability from its implementation.
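To illustrate the erasure-coding idea in miniature – with the caveat that real systems such as Storj use Reed-Solomon codes, which tolerate multiple simultaneous losses, whereas the single-parity XOR shown here can rebuild exactly one missing shard:

```python
def make_parity(shards: list) -> bytes:
    """Compute a parity shard as the byte-wise XOR of all data shards."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving: list, parity: bytes) -> bytes:
    """Recover one missing shard: XOR of the survivors and the parity."""
    return make_parity(surviving + [parity])

data = [b"sharded ", b"and dist", b"ributed!"]   # three equal-size shards
parity = make_parity(data)

# A node holding shard 1 goes offline; rebuild it from the rest.
recovered = rebuild([data[0], data[2]], parity)
assert recovered == data[1]
```

The bandwidth argument follows from this: storing one parity shard costs a fraction of the capacity (and rebuild traffic) of keeping full replicas of every file.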
The Storj network will comprise three components. Uplinks describe the software any user will use to interact with the network, while storage nodes are as described, the place where capacity resides. Meanwhile, satellites orchestrate connections between the two.
It’s an interesting development and possibly one that may suit organisations with cloud storage needs that aren’t super-quick. Storj has explicitly set out to address key likely concerns with such a modus operandi, namely trust (security, and weeding out bad actors) as well as issues of trustlessness, ie ensuring that the inability to rely on all parties is not critical to success.
Making sure these issues are solidly dealt with as well as those of cost and speed of access will determine whether uberisation succeeds or not in data storage.
IBM’s proposed $34 billion purchase of Red Hat looks like a good idea for both sides. But is the proposed coming together borne of inspiration, or desperation?
In other words, were the two companies a bit like the last ones to find a partner at the dance? The dance in this case is cloud, and more specifically hybrid and multi-cloud.
IBM once had an identity as the mainframe seller you could never get fired for choosing.
Nowadays, big, on-prem compute and storage hardware and the software and services that surround them are still core to its portfolio, but are joined by a push to the cloud (and a mix of software offerings: Analytics, AI etc).
It is the fourth largest cloud player, but a very long way behind the big three in terms of market share: AWS, Microsoft Azure and Google Cloud Platform.
IBM SoftLayer offers cloud compute with file, object and block storage, with forays into containers, AI, blockchain and analytics.
Meanwhile, Red Hat has made its living since formation in 1993 as a purveyor of commercial distributions of open source software.
Its offerings centre on its Red Hat Enterprise Linux operating system, the OpenStack private cloud environment, OpenShift container management, and the JBoss and Ansible application platforms.
Pressure on IBM and Red Hat to seek what the other has comes from the rise of the cloud and especially the big three.
AWS, Azure and Google are increasingly able to provide mature compute and storage services for enterprise customers with pushes made towards analytics, IoT, containers etc.
This threatens IBM, not least in its own cloud business SoftLayer, but more widely too as IT increasingly moves to hybrid and multi-cloud models. IBM’s revenues declined for about five years until this year.
And there are threats too to Red Hat, which can see the cloud big three increasingly able to provide what it does but at less cost and with the convenience of the cloud.
So, it’s very easy to see a symmetry in the union of these two.
A well-integrated Red Hat could provide the cloud-era IP that IBM needs to keep up, allowing it to move more gracefully towards hybrid and multi-cloud offerings across its portfolio.
It could also help IBM upgrade its internal culture to a younger, more sparky one, with Red Hat far more attractive to developers.
But it’s also possible to see deep uneasiness as they come together.
IBM has stated that Red Hat will keep its independence, which includes partnerships with the main cloud providers. But will the freedom Red Hat gives to its developers survive?
The real measure of success will be IBM’s rankings against the big cloud providers and in its ongoing revenues. Superficially, what Red Hat has in terms of products can help this.
IBM is, however, likely to want to restrict development efforts along commercial channels. How it does that, without killing the creative heart of Red Hat, is key.
The headline grabbers in storage are usually the quick – flash, NVMe etc – or the harbingers of the next generation, such as the cloud. But in some ways these are the extremes, the outliers. In between, the vast bulk of the world’s business data resides on spinning disk hard drives and tape in long-term archives.
Of these, tape storage is set to see regular increases in capacity, with several other key advantages that make it likely to persist well into the future as the medium of choice for infrequently accessed data.
That was the key message in a recent article by Mark Lantz, manager of advanced tape technologies at IBM’s Zurich research facility.
In terms of density, he said, we could soon see tape cartridges that hold hundreds of terabytes of data. That prospect came into view with the announcement last year by IBM of a new record in areal density for tape, based on nano-scale advances in tape and tape head technology.
That development could put 330TB on a standard tape cartridge, enough capacity for the contents of a bookshelf that would run from one end of Japan to the other, according to Lantz.
That’s not available yet, though, and maximum tape cartridge capacities currently run to about 15TB uncompressed (30TB compressed), while spinning disk HDDs can go to 60TB and flash drives to 30TB.
Nevertheless, for putting a lot of data in one place, Lantz is probably right that tape can claim top spot, with the ability to build tape libraries that run to several hundred petabytes.
Tape also consumes no power when not in use, has failure rates several orders of magnitude lower than spinning disk, and the inherent offline “gap” between tape and the wider network provides a barrier to unauthorised access.
On top of all this, tape costs something like one-sixth as much as disk.
Of course, the kind of data you put on pricey flash is not the same as would go on magnetic tape; access times for tape are measured in seconds compared to milliseconds or fractions thereof for disk.
So, tape is best suited to infrequently-accessed data, and it seems to be the medium of choice for some of the biggest players in the cloud. Microsoft admitted in a roundabout way not too long ago that its biggest repositories of cold data are held on tape. Meanwhile, the last we knew, Amazon’s Glacier long term storage also relied on it.
And if tape fits the bill, then you can be pretty sure it’s a future-proofed medium, in the sense that tape capacities are set to scale – by about 33% a year, according to IBM – in a way that HDD and flash technology no longer can, due to the limits of working with the media at ever smaller scales.
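A quick bit of compounding shows what that rate implies. Assuming today’s roughly 15TB cartridges and a steady 33% annual gain (both figures from the text; real roadmaps move in generational steps, not smooth curves), the 330TB lab demonstration is about a decade out:

```python
import math

# Years for a 15TB cartridge to reach the 330TB demonstrated areal density,
# at ~33% annual capacity growth: solve 15 * 1.33**n = 330 for n.
current_tb, target_tb, growth = 15, 330, 1.33
years = math.log(target_tb / current_tb) / math.log(growth)
print(f"~{years:.0f} years")   # roughly 11 years at that rate
```

That back-of-envelope horizon is consistent with tape makers’ habit of publishing decade-long capacity roadmaps.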
Many of these are quite convincing arguments, but there’s one key trend in contemporary IT that tape cannot fit with.
That’s the drive towards analytics, or at least not the kind of online analytics that seems to be on trend right now.
Many storage and backup vendors are increasingly working analytics into their offer, and this mirrors the trend of digital transformation, in which organisations are going fully digital with the aim, at least in part, of gaining value from existing data.
That’s entirely possible as long as data resides on constantly-available media. But tape, with its long access times, precludes this.
Sure, you can pull batches of data from tape repositories and then run the numbers, but this isn’t analytics as current trends understand it.
So, tape is ideal to store very large amounts of data that you don’t need to access very often, but its use cases seem to be gradually narrowing as the need to apply intelligence to archives increases.
Tape doesn’t seem likely to die for some time yet, but it may be the case that, as advances are made elsewhere in IT, its field of operations will shrink.
Scality – best known for its RING object storage product – is in the middle of efforts to achieve “multi-cloud” operations, in which customers can operate within and between public cloud and on-premises environments.
According to Scality CEO Jerome Lecat, the $60m will go towards “engineering efforts”. He added: “The idea is to give freedom in a multi-cloud world. To be able to manage multiple clouds seamlessly, with metadata search and the ability to move and replicate across clouds.”
It has gone some way to achieving this with its Zenko “multi-cloud controller”, although as yet that’s in beta with one firm, Bloomberg, and GA planned for later this year.
If it means anything, “seamless multi-cloud operations” must mean the ability to operate in hybrid cloud fashion, with something like the ability to drag and drop files or objects between locations – between private cloud and public, and between public clouds – much as a user in an organisation can between drives and locations on a LAN.
I asked Lecat whether Scality aims to make Zenko drag-and-drop. Unfortunately, he couldn’t give any definite answers here.
“The target is still the sysadmin kind of person, not the end user,” he said, implying that drag-and-drop simplicity is not needed for that target market, although that type of interface is in use commonly in other environments.
He went on to say, “Included in the latest RING software is the ability to visualise S3 buckets – it’s not quite drag-and-drop but you can see files in S3 buckets. But, in multi-cloud it is not drag-and-drop. It is a picklist, but you can pick the destination cloud and Zenko does the rest.”
Keen to get to what Scality was aiming for in its engineering efforts, I asked what needs to be done engineering-wise to get to seamless multi-cloud operations.
Lecat said: “Honestly, I’m going to pass on this question. There are problems to be solved but I don’t want to give them more visibility. It’s not easy to build this.”
He went on to outline what Scality had achieved:
“What we have achieved is to provide a single namespace and validated four clouds across which data can be stored: Google, AWS, Azure and the Scality Ring [private cloud]. Also, we store in the native format of the cloud, for example, S3 in Amazon, Blobs in Azure, which is super-important to take advantage of value-added features in those clouds. And you can search and, for example, delete anything according to metadata attributes across those clouds.”
It’s an impressive list of achievements so far. But, it’s a case of watch-this-space to see whether the company can go further to achieve real seamless object storage ops between multiple public and private clouds.
A decade ago, storage journalists were quite keen on a then-new technology: MAID, the Massive Array of Idle Disks. These were basically disk-based backup target devices with lots of drives that could be spun down when not in use, and so were suited to infrequently-used data.
The key attraction was access times quicker than tape, while avoiding some or most of the cost of powering and cooling lots of hard drives. A company called Copan was a pioneer of this, and lots of mainstream and lesser-known storage box makers got on board for a bit.
By the turn of the decade, Copan had been swallowed up by SGI and MAID seemed to run into the sand. A range of explanations were proffered, ranging from the unsuitability of HDDs to being powered down and up, to the simple economics of still not being as cheap as tape.
It’s called Pelican and it weighs not far off a tonne and a half. Packed with 1,152 10TB drives in a non-standard 52U rack it can store up to 11.5PB.
The idleness of the drives is enforced by dual controllers (which also house its object storage scheme) that schedule and orchestrate spin-up, spin-down, rebuilds and so on, under the key operating principle that no more than 8% of drives can ever be spinning – which is what keeps Pelican within its cooling parameters.
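The headline figures are easy to sanity-check from the drive count and the 8% spinning constraint quoted above:

```python
# Back-of-envelope check of the Pelican numbers quoted in the text.
TOTAL_DRIVES = 1152
DRIVE_TB = 10
MAX_SPINNING_FRACTION = 0.08   # no more than 8% of drives spinning at once

total_pb = TOTAL_DRIVES * DRIVE_TB / 1000
max_active = int(TOTAL_DRIVES * MAX_SPINNING_FRACTION)

print(f"raw capacity: {total_pb:.2f} PB")              # 11.52 PB, matching ~11.5PB
print(f"drives allowed to spin at once: {max_active}") # 92 of 1,152
```

So at any moment the controllers are juggling reads, writes and rebuilds across at most 92 live drives while more than a thousand sit idle.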
Pelican is being developed and rolled out by Microsoft for its Azure cloud datacentres, and is explicitly only for those that are “not big enough” for tape infrastructure (Azure currently uses IBM TS3500 libraries), according to Microsoft Azure CTO Mark Russinovich.
Implicit in that, it seems, is an acknowledgement that even today’s MAID, with 10TB hard drives and massive density, still doesn’t compete with tape in all scenarios. If it did, Microsoft would roll it out to all its Azure datacentres and we’d be set to see it hit the wider market.
So, for now, tape may not be dead. And can rest easy for the time being.