On the face of it, cloud storage should be an ideal candidate for an “excess capacity”, “sharing economy” business model.
Otherwise termed “Uberization”, we’ve seen the rise of apps and services that seek to marry excess capacity in cars, homes etc to those looking to use it, and generally at a cut price.
Decentralised storage pioneer Storj seeks to be the Airbnb of storage, and has just entered public alpha with its version 3 for potential customers that want to test it in their own environments.
With more than $30 million raised and public launch planned for January 2019, Storj promises storage without the need for a third party provider and at “SLAs comparable to traditional datacentres.”
Its target customers are those that use long-term archiving and S3-compatible object storage.
Essentially, Storj enables users to make under-utilised storage capacity available to others. Data is encrypted, sharded and distributed to those that have allowed their excess capacity to be used, so-called “farmers”, who each hold only a tiny fraction of a file, and in encrypted form.
Users pay for storage in Storj tokens and the farmers get a percentage. It is planned that the network will scale to exabyte levels and, while decentralised, storage capacity itself will not be based on blockchain to avoid a massive and constantly-growing ledger that sucks up bandwidth.
S3 compatibility will mean that those currently using cloud storage will be able to switch to Storj simply by changing a few lines of code, according to its most recent white paper, published last week.
Data protection comprises client-level encryption with erasure coding in case data is lost by failure or, more likely, the lack of availability of a storage node.
Again, bandwidth is cited as the driver here, with erasure coding – the ability to rebuild lost data from distributed parity data – being far more economical of capacity and throughput than retention of multiple copies. Storj hopes for six-nines levels of availability from its implementation.
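To make the erasure coding idea concrete, here is a minimal sketch using single XOR parity, the simplest possible scheme. It is an illustration only: the function names are hypothetical, it can repair just one lost shard, and a production network would be expected to use a stronger code (Reed-Solomon is the usual choice) that tolerates many missing pieces.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def rebuild(shards: list[bytes | None]) -> list[bytes]:
    """Recover at most one missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards


def expansion_factor(k: int, n: int) -> float:
    """Raw capacity used per byte stored when any k of n pieces rebuild the data."""
    return n / k


# Usage: four data shards plus one parity shard, then lose one node.
pieces = encode(b"archive me", k=4)
pieces[2] = None  # a storage node goes offline
restored = b"".join(rebuild(pieces)[:4]).rstrip(b"\0")  # strip the zero padding
```

In this toy layout the raw capacity used is 5/4, or 1.25 times the data size, against 3 times for three full copies, although the two are not equally resilient: the parity scheme here survives one loss, three copies survive two.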
The Storj network will comprise three components. Uplinks describe the software any user will use to interact with the network, while storage nodes are as described, the place where capacity resides. Meanwhile, satellites orchestrate connections between the two.
It’s an interesting development and possibly one that may suit organisations whose cloud storage needs don’t call for super-quick access. Storj has explicitly set out to address the key likely concerns with such a modus operandi, namely trust (security, and weeding out bad actors) as well as trustlessness, ie ensuring that success does not depend on being able to rely on every party.
Making sure these issues are solidly dealt with as well as those of cost and speed of access will determine whether uberisation succeeds or not in data storage.
IBM’s proposed $34 billion purchase of Red Hat looks like a good idea for both sides. But is the proposed coming together born of inspiration, or desperation?
In other words, were the two companies a bit like the last ones to find a partner at the dance? The dance in this case is cloud, and more specifically hybrid and multi-cloud.
IBM once had an identity as the mainframe seller you could never get fired for choosing.
Nowadays, big, on-prem compute and storage hardware and the software and services that surround them are still core to its portfolio, but are joined by a push to the cloud (and a mix of software offerings: Analytics, AI etc).
It is the fourth largest cloud player, but a very long way behind the big three in terms of market share: AWS, Microsoft Azure and Google Cloud Platform.
IBM Softlayer offers cloud compute with file, object and block storage with forays into containers, AI, blockchain and analytics.
Meanwhile, Red Hat has made its living since formation in 1993 as a purveyor of commercial distributions of open source software.
Its offerings centre on its Red Hat Enterprise Linux operating system, the OpenStack private cloud environment, OpenShift container management, and the JBoss and Ansible application platforms.
Pressure on IBM and Red Hat to seek what the other has comes from the rise of the cloud and especially the big three.
AWS, Azure and Google are increasingly able to provide mature compute and storage services for enterprise customers with pushes made towards analytics, IoT, containers etc.
This threatens IBM, not least in its own cloud business Softlayer, but more widely too as IT increasingly moves to hybrid and multi-cloud models. IBM’s revenues had been in decline for about five years until this year.
And there are threats too to Red Hat, which can see the cloud big three increasingly able to provide what it does but at less cost and with the convenience of the cloud.
So, it’s very easy to see a symmetry in the union of these two.
A well-integrated Red Hat could provide the cloud-era IP that IBM needs to keep up, allowing it to move more gracefully towards hybrid and multi-cloud offerings across its portfolio.
It could also help IBM upgrade its internal culture to a younger, more sparky one, with Red Hat far more attractive to developers.
But it’s also possible to see deep uneasiness as they come together.
IBM has stated that Red Hat will keep its independence, which includes partnerships with the main cloud providers. But will the freedom Red Hat gives to its developers survive?
The real measure of success will be IBM’s rankings against the big cloud providers and in its ongoing revenues. Superficially, what Red Hat has in terms of products can help this.
IBM is, however, likely to want to restrict development efforts along commercial channels. How it does that, without killing the creative heart of Red Hat, is key.
The headline grabbers in storage are usually the quick – flash, NVMe etc – or the harbingers of the next generation, such as the cloud. But in some ways these are the extremes, the outliers. In between, the vast bulk of the world’s business data resides on spinning disk hard drives and tape in long-term archives.
Of these, tape storage is set to see regular increases in capacity, with several other key advantages that make it likely to persist well into the future as the medium of choice for infrequently accessed data.
That was the key message in a recent article by Mark Lantz, manager of advanced tape technologies at IBM’s Zurich research facility.
In terms of its density, he said, we could soon see tape cartridges that run to hundreds of petabytes of data. That prospect came into view with the announcement last year by IBM of a new record in areal density for tape, based on nano-scale advances in tape and tape head technology.
That development could put 330TB on a standard tape cartridge, enough capacity for the contents of a bookshelf that would run from one end of Japan to the other, according to Lantz.
That’s not available yet, though, and maximum tape cartridge capacities currently run to about 15TB uncompressed (30TB compressed), while spinning disk HDDs can go to 60TB and flash drives to 30TB.
Nevertheless, for putting a lot of data in one place, Lantz is probably right that tape can claim top spot, with the ability to build tape libraries that run to several hundred petabytes.
Tape also consumes no power when not in use, has failure rates several orders of magnitude lower than spinning disk, and the inherent offline “gap” between tape and the wider network provides a barrier to unauthorised access.
On top of all this, tape costs something like one sixth as much as disk.
Of course, the kind of data you put on pricey flash is not the same as would go on magnetic tape; access times for tape are measured in seconds compared to milliseconds or fractions thereof for disk.
So, tape is best suited to infrequently-accessed data, and it seems to be the medium of choice for some of the biggest players in the cloud. Microsoft admitted in a roundabout way not too long ago that its biggest repositories of cold data are held on tape. Meanwhile, the last we knew, Amazon’s Glacier long-term storage also relied on it.
And if tape fits the bill then you can be pretty sure it’s a future-proofed medium in the sense that tape capacities are set to scale – by about 33% a year according to IBM – in a way that HDD and flash technology no longer can, due to the limits on working in the media at ever smaller scale.
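As a rough check on that growth rate, the arithmetic can be sketched as follows. It assumes the 33% figure compounds annually and simply takes the 15TB and 330TB capacities quoted above as start and end points; the function name is illustrative.

```python
import math


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years of compound growth needed to go from start_tb to target_tb."""
    return math.log(target_tb / start_tb) / math.log(1 + annual_growth)


# From today's 15TB cartridge to the 330TB demonstrated in the lab.
years = years_to_reach(15, 330, 0.33)
print(f"{years:.1f} years")
```

On those assumptions it would take a little under 11 years of steady 33% growth for shipping cartridges to reach the capacity IBM has demonstrated.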
A lot of these are quite convincing arguments, but there’s one key trend of contemporary IT that tape cannot fit with.
That’s the drive towards analytics, or at least the kind of online analytics that seems to be on trend right now.
Many storage and backup vendors are increasingly working analytics into their offer, and this mirrors the trend of digital transformation in which organisations are going fully digital, with the aim, at least in part, to gain value from existing data.
That’s entirely possible as long as data resides on constantly-available media. But tape, with its long access times, precludes this.
Sure, you can access batches of data at a time from tape repositories and then run the numbers, but this isn’t analytics as it forms part of current trends.
So, tape is ideal to store very large amounts of data that you don’t need to access very often, but its use cases seem to be gradually narrowing as the need to apply intelligence to archives increases.
Tape doesn’t seem likely to die for some time yet, but it may be the case that, as advances are made elsewhere in IT, its field of operations will shrink.
Scality – most well-known for its RING object storage product – is in the middle of efforts to achieve “multi-cloud” operations, in which customers can operate within and between public cloud and on-premises environments.
According to Scality CEO Jerome Lecat, the $60m will go towards “engineering efforts”. He added: “The idea is to give freedom in a multi-cloud world. To be able to manage multiple clouds seamlessly with metadata search and the ability to move and replicate across clouds.”
It has gone some way to achieving this with its Zenko “multi-cloud controller”, although as yet that’s in beta with one firm, Bloomberg, and GA planned for later this year.
If it means anything, “seamless multi-cloud operations” must mean the ability to operate in hybrid cloud fashion, with something like the ability to drag and drop files/objects between locations, between private cloud and public, and between public clouds, in effect as a user in an organisation can do between drives and locations on a LAN.
I asked Lecat whether Scality aims to make Zenko drag-and-drop. Unfortunately, he couldn’t give any definite answers here.
“The target is still the sysadmin kind of person, not the end user,” he said, implying that drag-and-drop simplicity is not needed for that target market, although that type of interface is commonly in use in other environments.
He went on to say, “Included in the latest RING software is the ability to visualise S3 buckets – it’s not quite drag-and-drop but you can see files in S3 buckets. But, in multi-cloud it is not drag-and-drop. It is a picklist, but you can pick the destination cloud and Zenko does the rest.”
Keen to get to what Scality was aiming for in its engineering efforts, I asked what needs to be done engineering-wise to get to seamless multi-cloud operations.
Lecat said: “Honestly, I’m going to pass on this question. There are problems to be solved but I don’t want to give them more visibility. It’s not easy to build this.”
He went on to outline what Scality had achieved:
“What we have achieved is to provide a single namespace and validated four clouds across which data can be stored: Google, AWS, Azure and the Scality Ring [private cloud]. Also, we store in the native format of the cloud, for example, S3 in Amazon, Blobs in Azure, which is super-important to take advantage of value-added features in those clouds. And you can search and, for example, delete anything according to metadata attributes across those clouds.”
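What a single namespace with metadata search across several clouds might look like from the caller's side can be sketched in a few lines. This is a toy in-memory model with invented class and method names, not Zenko's actual API; the backend labels simply echo the four targets Lecat lists.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    backend: str  # e.g. "aws", "azure", "google", "ring"
    metadata: dict[str, str] = field(default_factory=dict)


class Namespace:
    """One flat key space, whichever backend actually holds each object."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, obj: StoredObject) -> None:
        self._objects[obj.key] = obj

    def search(self, **attrs: str) -> list[StoredObject]:
        """Return objects whose metadata matches every given attribute."""
        return [
            o for o in self._objects.values()
            if all(o.metadata.get(k) == v for k, v in attrs.items())
        ]

    def delete_where(self, **attrs: str) -> int:
        """Delete all matching objects, on any backend; return how many."""
        matches = self.search(**attrs)
        for o in matches:
            del self._objects[o.key]
        return len(matches)


ns = Namespace()
ns.put(StoredObject("a.mov", "aws", {"project": "demo"}))
ns.put(StoredObject("b.mov", "azure", {"project": "demo"}))
ns.put(StoredObject("c.mov", "ring", {"project": "live"}))
removed = ns.delete_where(project="demo")
```

The point of the sketch is that the search and delete calls never mention a cloud: the caller works with metadata attributes and the namespace resolves where each object lives.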
It’s an impressive list of achievements so far. But, it’s a case of watch-this-space to see whether the company can go further to achieve real seamless object storage ops between multiple public and private clouds.
A decade ago storage journalists were quite keen on a technology that was new at the time. That was MAID – Massive Array of Idle Disks – basically disk-based backup target devices with lots of drives that could be spun down when not in use and so were suited to infrequently-used data.
The key attraction was access times quicker than tape, while avoiding some or most of the cost of powering and cooling lots of hard drives. A US company called Copan was a pioneer of this, but lots of mainstream and lesser-known storage box makers got on board for a bit.
By the turn of the decade Copan had been swallowed up by SGI and MAID seemed to run into the sand. A range of explanations was proffered, from the unsuitability of HDDs for repeated powering down and up to the simple economics of still not being as cheap as tape.
It’s called Pelican and it weighs not far off a tonne and a half. Packed with 1,152 10TB drives in a non-standard 52U rack it can store up to 11.5PB.
The idleness of drives therein is enforced by the dual controllers (that also contain its object storage file scheme) that schedule and orchestrate spin up, spin down, rebuilds etc and the key operating principle that no more than 8% of drives can ever be spinning, which is what keeps Pelican within its cooling parameters.
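The figures quoted here can be checked with some quick arithmetic. The sketch below uses only the numbers in the text (1,152 drives, 10TB each, an 8% ceiling) and treats 1PB as 1,000TB; the variable and function names are illustrative and say nothing about how Pelican's controllers are actually implemented.

```python
DRIVES = 1152
DRIVE_TB = 10
MAX_SPINNING_FRACTION = 0.08

# Raw capacity of the rack in petabytes (1PB = 1,000TB).
raw_capacity_pb = DRIVES * DRIVE_TB / 1000

# Whole drives allowed to spin at once, rounded down to stay under the ceiling.
max_spinning = int(DRIVES * MAX_SPINNING_FRACTION)


def can_spin_up(currently_spinning: int, requested: int) -> bool:
    """True if spinning up `requested` more drives keeps the rack within budget."""
    return currently_spinning + requested <= max_spinning
```

That works out at roughly 11.5PB raw and a budget of 92 spinning drives, so any request that would take the rack past 92 has to wait for other drives to spin down.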
Pelican is being developed and rolled out by Microsoft for its Azure cloud datacentres, and is explicitly only for those that are “not big enough” for tape infrastructure (Azure currently uses IBM TS3500 libraries), according to Russinovich.
Implicit in that, it seems, is an acknowledgement that even today’s MAID, with 10TB hard drives and massive density, still doesn’t compete with tape in all scenarios. If it did, Microsoft would roll it out to all its Azure datacentres and we’d be set to see it hit the wider market.
So, for now, tape is not dead, and can rest easy for the time being.
Tim Berners-Lee is rightly famous as the originator of HTTP, a fundamental of the World Wide Web as we know it.
But according to some, HTTP is old hat. It has helped create a web full of dead links, that is increasingly centralised, open to control by governments etc and prone to failure.
HTTP depends on IP, a device-specific method of addressing.
IPFS, on the other hand, relies on content addressing. In other words, each stored item has its own unique identifier, an immutable hash created for it alone.
This allows data to be stored anywhere, and those that request it can access it from the nearest location, or from many locations.
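A toy sketch of content addressing follows. It assumes SHA-256 hex digests as identifiers and an in-memory dictionary as the store; IPFS itself wraps its hashes in a richer content identifier (CID) format, so treat the class and method names as illustrative only.

```python
import hashlib


class ContentStore:
    """Stores blobs under the hash of their own content."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the data, not from where it is stored.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any holder's copy can be verified by re-hashing it.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"hello, web")
```

Because the address is computed from the bytes themselves, identical content always gets the same address wherever it is held, and a requester can check that what it received is what it asked for.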
In fact, IPFS has characteristics somewhat similar to BitTorrent, the peer-to-peer protocol beloved of illegal movie downloaders and used by the likes of The Pirate Bay.
Instead of downloading one file from one place, a so-called Torrent swarm allows the user to download many shards of a file from many locations simultaneously.
In IPFS, as in a Torrent swarm, this is organised by a DHT, a distributed hash table.
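How a DHT decides which peers are responsible for a given hash can be sketched in a few lines. The example assumes a Kademlia-style XOR distance metric, which is the approach IPFS and BitTorrent's mainline DHT are generally described as using; the peer names and helper functions are made up for illustration, and real implementations route requests hop by hop rather than scanning a full peer list.

```python
import hashlib


def to_id(raw: bytes) -> int:
    """Map arbitrary bytes into a 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def closest_peers(key: bytes, peers: list[str], count: int = 2) -> list[str]:
    """Return the `count` peers whose IDs are nearest the key by XOR distance."""
    key_id = to_id(key)
    return sorted(peers, key=lambda p: to_id(p.encode()) ^ key_id)[:count]


peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
holders = closest_peers(b"some content hash", peers)
```

Every participant that runs the same calculation arrives at the same answer, which is what lets the network find content without a central index.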
And, with blockchain technology, you can record the hashes of data held – but not the data itself – with an immutable timestamp that also allows searching.
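The idea of recording hashes, but not the data, on a tamper-evident ledger can be illustrated with a simple hash chain. This is a deliberately minimal sketch: there is no consensus, mining or networking, the field names are invented, and timestamps are passed in by the caller to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("prev", "data_hash", "timestamp")


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_entry(prev_hash: str, data: bytes, timestamp: int) -> dict:
    """Record the hash of the data (not the data) linked to the previous entry."""
    body = {
        "prev": prev_hash,
        "data_hash": hashlib.sha256(data).hexdigest(),
        "timestamp": timestamp,
    }
    return {**body, "entry_hash": _digest(body)}


def verify(chain: list[dict]) -> bool:
    """Check every entry's own hash and its link to the one before it."""
    prev = GENESIS
    for entry in chain:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True


chain = [make_entry(GENESIS, b"file one", 1)]
chain.append(make_entry(chain[-1]["entry_hash"], b"file two", 2))
```

Altering any recorded timestamp or data hash changes that entry's own hash and breaks the link from the next entry, which is what makes the record effectively immutable once others hold copies of it.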
It’s early days yet. There are some efforts ongoing to marry blockchain technology to distributed storage in a payment model, such as Storj and Sia.
And IPFS’s related project the Interplanetary Database, which sought to create an internet-scale blockchain database, gave up the ghost earlier this year.
But the idea of being able to distribute storage, to hold data anywhere and access it from anywhere, in a system uncontrolled by large entities, is surely an attractive one?
Well, perhaps for many, but the potential is possibly there to undermine the current economic models of big storage, in the datacentre and even, as some now call it, the “legacy cloud”.
There’s a drive towards hybrid cloud evident at the moment, from hyper-converged products that offer cloud capability to single namespace file and object systems that allow seamless working between private datacentre and public cloud service.
While these mostly address midrange and enterprise customers, there are also hybrid cloud products for smaller-scale users that want on- and off-site interoperability.
One of these, FileShadow, last week added on-site Drobo NAS capability to its existing aggregation of multiple file-sync-and-share cloud-based services via its own file system.
The company aims its product at individuals and SMEs that have several cloud-based file share services and want to unify access to them. It has now added the ability to extend visibility to Drobo NAS products too, bringing hybrid cloud to smaller users.
FileShadow is a journaled file system that uses webhooks to access user data in cloud file-sync-and-share services, currently Box, Dropbox, OneDrive, Google Drive and Adobe Creative Cloud. To these is now added access to Drobo, via agents on the NAS device.
FileShadow reads, encrypts, catalogues and indexes user data and stores it on the IBM Cloud Object Storage service. Its layering of services on indexed files allows, for example, images to be searched for by subject via metadata rather than just file name.
FileShadow also allows files such as PDFs to undergo optical character recognition, so that scanned documents can be searched.
Will FileShadow add more cloud services?
Tyrone Pike, president and CEO of FileShadow, said: “We’re covering about 96% of the market so far, but we hope to add some more, like Apple iCloud and some AWS services that are only available to Prime storage services.”
Hybrid cloud has had a boost recently with the emergence of file/object environments that allow customers to operate a single namespace between on-premises and public cloud locations.
One of the pioneers here is Cloudian, which offers its HyperStore object storage-based environment with file-level access via NFS and/or CIFS, in HyperFile. The latter capability was first introduced last December in a partnership with Milan-based Infinity Storage and that has now been cemented by the acquisition by Cloudian of the Italian firm.
But, how exactly can file and object co-exist? After all, file systems bar simultaneous user access via file locking while object storage has no such mechanism.
Talking this week to Michael Tso, CEO, and Caterina Falchi, new on-board VP of file technologies at Cloudian, it was interesting to delve into how the two sides – file and object – relate to each other in Cloudian, and the limits that places on possible workloads.
There’s no doubt that what Cloudian offers is an exciting development that allows customers to operate with file or object access with a single namespace between on-premises locations and public cloud services from Amazon, Google and Microsoft. It’s part of an emerging class of products that also includes those from the likes of Scality, WekaIO, Qumulo and Elastifile.
The fundamentals of Cloudian are that data is kept as objects. “The ultimate version of the truth is object,” said Tso. And S3 is the protocol by which data stored as objects is communicated with.
Now there is file access via NFS and CIFS, but data is converted to object format behind this. File locking exists in NFS and CIFS but once data is in object format it can, in theory, be altered by more than one party at a time.
How will this be handled? Tso and Falchi say global file locking is on the roadmap, but for now, “There’s file locking from the file side,” says Tso. “But it’s not easy from the object side. That’s because we don’t want to change S3 standards that do not contain any locking mechanism.” He added: “It’s something we [are] still debating if we need to do.”
“We’ve not had any major issues,” says Tso. “People manage access at the application level. The only time it would be a problem if there was some incidental change in the flow, where you don’t expect someone to come in from a different interface.”
So, as with Google Drive or Dropbox, if more than one person accesses a file at the same time then different versions are created.
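The behaviour described, where simultaneous writers produce separate versions rather than being blocked by a lock, can be sketched as follows. This is a toy in-memory model with invented names, not Cloudian's or S3's actual API.

```python
from __future__ import annotations


class VersionedBucket:
    """Keeps every write to a key as a new version instead of locking."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def put(self, key: str, body: bytes) -> int:
        """Store a new version; never blocks and never overwrites."""
        versions = self._versions.setdefault(key, [])
        versions.append(body)
        return len(versions)  # 1-based version number

    def get(self, key: str, version: int | None = None) -> bytes:
        """Return the latest version, or a specific one if requested."""
        versions = self._versions[key]  # raises KeyError for unknown keys
        return versions[-1] if version is None else versions[version - 1]


bucket = VersionedBucket()
v1 = bucket.put("report.doc", b"edit made via NFS")
v2 = bucket.put("report.doc", b"edit made via S3")
```

Neither writer is refused, and both edits survive, but nothing reconciles them: the latest write is what a plain read returns, which is why this model suits archives and file sharing better than several people editing one document at once.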
From that, said Tso, use cases that are beyond the pale are, “Remote and branch office stuff, where people are collaborating, several people working on the same document making multiple edits at the same time.”
But, he said, Cloudian will work for Internet of Things data, file sharing, and media archives, and looks to customers that want to move, “from tape or Isilon [Dell EMC’s scale-out NAS product]”.
This year’s storage news so far has provided a firm impression of the increasing prominence of the cloud, and in particular of attempts to harness the public cloud and private datacentre in hybrid operations.
Now, recent IDC figures provide some evidence for a strong trend towards the cloud forming an important part of IT operations, as the table below shows.
In 2017, IT infrastructure spending for deployment in cloud environments hit $46.5 billion, a year-on-year growth rate of 20.9%.
Public cloud attracted the bulk of that (65.3%) and grew the fastest, with an annual rate of 26.2%. Meanwhile, spend on traditional, non-cloud IT was expected to have declined by 2.6% in 2017. It still formed the majority (57.2%) of user spending but was down from 62.4% in 2016.
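For readers who want the implied dollar amounts, a back-of-envelope calculation from the percentages quoted follows. It uses only the figures in the text; the derived values are approximations and not numbers published by IDC, and the variable names are illustrative.

```python
cloud_spend_2017_bn = 46.5   # total cloud infrastructure spend, $bn
cloud_growth = 0.209         # year-on-year growth of that total
public_share = 0.653         # public cloud's share of the total

# Public cloud portion of 2017 cloud spend: about $30.4bn.
public_cloud_bn = cloud_spend_2017_bn * public_share

# Remainder of cloud spend (assumed here to be private cloud): about $16.1bn.
other_cloud_bn = cloud_spend_2017_bn - public_cloud_bn

# Implied 2016 cloud spend, working back from the growth rate: about $38.5bn.
cloud_spend_2016_bn = cloud_spend_2017_bn / (1 + cloud_growth)
```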
This comes on top of recent news that has centred on the efforts of vendors to provide a unified storage environment across the hybrid cloud, between on-premises and public cloud operations.
These have included: Cloudian’s upgrade to its HyperStore object and NAS storage software to allow hybrid operations to multiple cloud providers; Qumulo’s launch in Europe of its hybrid cloud NAS, effectively a parallel file system that mirrors the likes of Dell EMC’s Isilon but between cloud and on-site locations; and Microsoft’s purchase of Avere, a storage software maker that included hybrid cloud storage functionality.
Such products solve a bunch of problems for hybrid cloud storage. It has long been possible to work between private and public cloud environments, but getting data into and back from the cloud hasn’t always been so easy. And data portability between clouds has long been an issue.
It just wasn’t possible to handle data on a common file system or scheme (so put because object storage doesn’t use a file system, as such) as it is now with the type of products emerging.
These allow seamless operations between on-site and public clouds that mean the latter can be easily used for burst workloads or as a tier behind on-site performance layers.
That seems to me to be a significant landmark and we should expect to see further developments along these lines.
Sure, there will likely be a question mark over the fundamental resilience, availability and latency aspects of the use of cloud. After all, connection loss is only a misplaced JCB shovel away, but the appearance of near-true unified hybrid storage environments is a great step forward.
Microsoft’s acquisition of Avere for an undisclosed sum, announced at the beginning of the year, marks the swallowing of an always-interesting storage player and a significant move for Microsoft and its cloud strategy.
The move is a clear play to boost Microsoft’s hybrid cloud capabilities, and aims to meet the need of businesses for whom the cloud in its pure form still can’t cut it for their workloads, on grounds of availability or performance.
Avere’s products have always had something to do with improving performance across multiple locations.
It started in 2008 with its NAS acceleration boxes – the FXT products, dubbed Edge Filers – that boosted access to core NAS clusters. Then Avere added the vFXT virtual versions of these and added cloud capability and tiering, within the cloud (using Amazon’s various classes of storage) and between on-site and cloud locations, including with a single namespace in its C2N hybrid cloud NAS-object storage appliance.
Such capabilities look likely to be added to the Azure stable at Microsoft and would offer customers a rich set of hybrid cloud possibilities, with tiering in the cloud and between on- and off-site locations.
The pull towards hybrid cloud is that increasingly organisations want data portability between on-site and cloud to be able to deal with availability issues as well as being able to burst for performance reasons.
What also stands out is that this is the first time I can recall a company like Microsoft – in the guise of cloud provider – acquiring a storage vendor.
The cloud is surely the future, with compute and storage increasingly provided as a service in the medium- to long-term, despite current concerns over availability, security etc.
Will this acquisition be the first of many in which storage is reconfigured as a hybrid function between datacentre and cloud?