IBM researchers are developing a cognitive storage system designed to automatically differentiate high- and low-value data and determine what information to keep, where to store it and how long to retain it.
Zurich-based IBM Research scientists Giovanni Cherubini, Jens Jelitto, and Vinodh Venkatesan introduced the concept of cognitive storage in a recently published paper in the IEEE’s Computer journal. The researchers consider cognitive storage a way to reduce costs to store big data.
The IBM Research team drew inspiration for cognitive storage from its collaborative work with the Netherlands Institute for Radio Astronomy (known as ASTRON) on a global project to build a new class of ultra-sensitive radio telescopes, called the Square Kilometre Array (SKA).
The SKA won’t be operational for at least five years. Once active, the system will generate petabytes of data on a daily basis through the collection of radio waves from the Big Bang more than 13 billion years ago, according to IBM. The system could reap significant storage savings if it could filter out useless instrument noise and other irrelevant data.
“Can we not teach computers what is important and what is not to the users of the system, so that it automatically learns to classify the data and uses this classification to optimize storage?” Venkatesan said.
Cherubini said the cognitive system draws a distinction between data value and data popularity. Data value is based on classes defined by random variables fed by users, and it can vary over time, he said. Popularity deals with frequency of data access.
“We like to keep these two aspects separate, and they are both important. They both play a role in which tier we store the data and with how much redundancy,” Cherubini said.
The cognitive storage system consists of computing/analytics units responsible for real-time filtering and classification operations and a multi-tier storage unit that handles tasks such as data protection levels and redundancy.
Venkatesan said the analytics engine adds metadata and identifies features necessary to classify a piece of information as important or unimportant. He said the system would learn user preferences and patterns and have the sophistication to detect context. In addition to real-time processing, the system also has off-line processing units to monitor and reassess the relevance of the data over time and perform deeper analysis.
The information goes from the learning system into a “selector” to determine the storage device and redundancy level based on factors such as the relevance class, frequency of data access and historical treatment of other data of the same class, according to Venkatesan. The cognitive system would have different types of storage, such as flash and tape, to keep the data.
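The selector step described above can be illustrated with a minimal sketch. This is not IBM's implementation: the function name, the three-level relevance scale, the access-rate thresholds and the intermediate "disk" tier are all assumptions made for illustration, and the sketch ignores the historical treatment of other data in the same class. For simplicity it lets popularity alone pick the tier and value alone pick the redundancy, whereas the researchers say both factors play a role in each decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    tier: str    # storage medium, e.g. "flash" or "tape"
    copies: int  # redundancy level, expressed as number of copies kept


def select_placement(relevance_class: int, accesses_per_day: float) -> Placement:
    """Choose a tier and redundancy level for one piece of data.

    relevance_class: 0 = low value, 1 = medium, 2 = high (hypothetical scale).
    accesses_per_day: observed access frequency (hypothetical unit).
    """
    if relevance_class not in (0, 1, 2):
        raise ValueError("relevance_class must be 0, 1 or 2")

    # Popularity drives the tier: frequently read data goes to faster media.
    # The thresholds are arbitrary placeholders.
    if accesses_per_day >= 10:
        tier = "flash"
    elif accesses_per_day >= 1:
        tier = "disk"
    else:
        tier = "tape"

    # Value drives redundancy: more important data gets more copies.
    copies = 1 + relevance_class
    return Placement(tier, copies)
```

Under these assumptions, a high-value file read 50 times a day would land on flash with three copies, while a low-value file that is almost never read would go to tape with a single copy.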
IBM researchers tested the cognitive storage system on 1.77 million files spanning seven users. They split the server data by user and let each one define different classes of files considered important. They categorized the data into three classes based on metadata such as user ID, group ID, file size, file permissions, file creation time/date, file extension and directories.
Cherubini said the IBM Research team developed software for the initial testing using the information bottleneck algorithm. He said they’re currently building the predictive caching element, “the first building block” for the cognitive system, which he said should be ready for beta testing by year’s end.
“Beyond that, it’s harder to make predictions,” Cherubini said. “If everything goes well, I think we should be able to have the full system developed at least for the first beta tests within two years.”
IBM researchers said early testing has fared well for data value prediction accuracy with the contained data set. But additional research is necessary to address challenges such as identifying standard principles to define data value and assessing the value of encrypted data.
Although the cognitive storage system is designed to classify and manage enormous amounts of data, the researchers said the benefits could extend to IT organizations. Venkatesan said the potential exists for a service-based offering.
“We think that this has a lot of potential for application in enterprises because that’s where the value of data becomes of highest importance,” Cherubini said.
The IBM Research team is looking for additional organizations to share data and ideas and collaborate on the cognitive storage system.
Drobo this week unveiled DroboAccess to enable mobile file sharing on its NAS boxes.
DroboAccess lets customers access and share files stored on Drobo NAS from any device or location. The capability is available on the Drobo 5N for small businesses and the Drobo B810n for larger configurations.
The new software capability, which is part of the myDrobo suite of applications, allows customers to access and share files on their Drobo with end-to-end security. The mobile file sharing capability also allows users to share files or folders that can be designated read-only or read/write with password options.
“Three out of 10 of our customers are asking for this,” said Rod Harrison, CTO at Drobo.
Harrison said Drobo is using Pagekite as a partner to provide a secure tunnel for the data. Data is encrypted on the Drobo device before it is transferred. DroboAccess is an extension of the company’s myDrobo service platform that encrypts data end-to-end.
“This is something that can be complex getting it all to work for yourself. Your cable wire will have a firewall and you have to figure out the right ports and you have to worry about security,” Harrison said.
DroboAccess currently is available for free on 5N and B810n on the Drobo dashboard. The iOS and Android applications are available for 99 cents on the App Store and Google Play.
When Drobo and Connected Data merged in 2013, there were plans to combine Connected Data’s file-sharing Transporter technology with Drobo hardware, but Drobo was spun out in 2015 to a separate group.
Veeam Software moved further beyond pure virtual machine backup this week by unveiling Veeam Availability Orchestrator, a multi-node hypervisor orchestration engine for disaster recovery.
The Veeam Availability Orchestrator, which will be available in the second half of this year, is an add-on to the Veeam Availability Suite and Veeam Backup and Recovery for VMware and Microsoft Hyper-V hypervisors.
The orchestrator software helps manage Veeam backups and replication via a disaster recovery plan that can be tested and automatically documented.
Doug Hazelman, Veeam’s vice president of product strategy, said the orchestration tool helps customers manage cross-replication across locations in the enterprise. Customers can set up policies, test against those policies for disaster recovery and get automated documentation for compliance requirements.
The software will be licensed separately from the Veeam Availability Suite. Veeam has not set pricing yet.
“It will be price per VM rather than our standard per socket,” Hazelman said. “You do have to have the Veeam Availability Suite installed and replication set up. The orchestrator has a separate interface to define the policies. Veeam Availability will hold rules in the event that a failover happens.”
Hazelman said the orchestrator is for enterprises looking to automate DR processes.
In February, Veeam announced it will add a fully functional physical server backup product this year. The company has focused on virtual machine backup and had resisted supporting physical server backup. But Veeam customer requests as the vendor moves into the enterprise with its Veeam Availability Suite have prompted the change. With the Availability Suite, the company has emphasized what CEO Ratmir Timashev calls protection for the “modern data center” rather than only protecting virtual machines.
Fibre Channel (FC) took back the port count lead over Ethernet in external storage systems in 2015, after Ethernet had gone ahead in 2014, according to Dell’Oro Group’s latest numbers.
Chris DePuy, a vice president at Dell’Oro, noted that his market research showed FC ports gained share over Ethernet despite declining in overall ports shipped. The Dell’Oro forecast projects that FC ports shipped on external storage systems will grow 3.7% this year, while the number of Ethernet ports on external storage systems will decline by close to 1%.
“Fibre Channel’s going to take share,” DePuy said, “but it’s a very modest growth market in terms of ports attached to external storage systems.”
Here are Dell’Oro’s figures for the past three years along with projected 2016 totals:
Note: The table does not include SAS or InfiniBand ports. 2016 numbers are projected.
DePuy attributed the FC share growth to enterprises moving to higher bandwidth 16 Gbps FC and adopting all-flash or hybrid arrays, which combine solid-state drives (SSDs) and hard disk drives (HDDs).
“For companies that have been using Fibre Channel historically, they’re very likely to continue to use Fibre Channel,” DePuy said. He said it would cost more if they switched to something else to upgrade their network.
Dell’Oro’s 2016 projections show the total number of networked ports on external storage systems will grow 1.5% this year, after at least three consecutive years of decline. But DePuy said he expects the reversal to be temporary. He pointed out that the 2016 projected total of 3.683 million networked ports on external storage systems is still lower than the 3.949 million ports in 2013. His projected numbers show both FC and Ethernet with fewer ports in 2016 than in 2013.
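The port totals quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the three numbers in the text. The 2015 total it derives is implied by the rounded 1.5% growth rate, not a figure reported by Dell’Oro, and the variable names are illustrative.

```python
# Networked ports on external storage systems, in millions (from the text).
ports_2013 = 3.949
ports_2016_projected = 3.683
growth_2016 = 0.015  # projected 1.5% growth over 2015

# 2015 total implied by the projected growth rate (derived, not reported).
implied_ports_2015 = ports_2016_projected / (1 + growth_2016)

# Decline from 2013 to the projected 2016 total.
decline_vs_2013 = (ports_2013 - ports_2016_projected) / ports_2013

print(f"Implied 2015 total: {implied_ports_2015:.3f} million ports")
print(f"Projected 2016 total vs. 2013: {decline_vs_2013:.1%} lower")
```

Even with the projected uptick, the 2016 total would remain roughly 7% below the 2013 level, which is consistent with DePuy’s view that the reversal is temporary.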
“The real trend is away from networked storage,” DePuy said.
DePuy cited the use of servers with internal storage, including hyper-converged infrastructure, and direct-attached storage (DAS) as areas of growth. He defines DAS as the storage enclosure, or JBOD device, attached to a server.
“It has a lot to do with what’s going on with cloud service providers. They are purchasing a very different architecture than enterprises historically have used,” he said. “They use software-defined storage on top of standardized servers. And that is very similar to the hyper-converged products that branded vendors are selling now to enterprises.”
Here are DePuy’s rounded estimates for the number of external storage units:
Source: Dell’Oro Group
“We’re seeing a shift towards direct-attached and away from networked – in other words, the kind that use Fibre Channel and Ethernet,” DePuy said.
He noted that the external storage unit statistics specifically exclude servers with internal storage and hyper-converged infrastructure – an area for which Dell’Oro has not publicly released data.
DePuy said the number of adapter cards or built-in Ethernet in servers significantly outstrips the total of all ports on external devices. Ethernet ports outnumber FC ports by 2.5 to 1 when DePuy considers networked ports associated with internal storage systems and server ports associated with direct-attached external storage units.
Despite the growth challenges for networked external storage, vendors have been able to maintain their revenue roughly flat over the past few years through the introduction of products such as software-defined storage, hybrid cloud and hyper-converged systems, DePuy said.
“The big picture here is that enterprises looking at storage have an awful lot of choices that they have to make,” DePuy said. “And it’s getting more complex, not less complex, than it has been in the recent past.”
Nutanix’s amended S-1 filing includes results from the last two quarters that weren’t included in its original filing in December. The good news is, revenue climbed significantly over those six months. The bad news is, so did losses.
For the quarter that ended Jan. 31, Nutanix reported $102.7 million in revenue compared to $56.8 million the previous year. That was Nutanix’s best revenue quarter ever, but it lost $33.2 million for the quarter compared to $28.3 million the previous year.
Nutanix is on track to rack up more revenue and more losses than in its last fiscal year, which ended July 31, 2015. The vendor has $190.4 million in revenue and a loss of $71.8 million for the first two quarters of this fiscal year, compared to $241.2 million in revenue and a loss of $126.1 million for the entire last fiscal year.
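A quick calculation shows why the half-year figures point to a bigger year on both counts. The sketch below uses only the numbers quoted above. The "run rate" is a naive doubling of the first-half results, an assumption made for illustration rather than a forecast from the filing.

```python
# All figures in millions of US dollars, as quoted from the filing.
h1_revenue, h1_loss = 190.4, 71.8          # first two quarters of this fiscal year
q2_revenue, q2_loss = 102.7, 33.2          # quarter that ended Jan. 31
fy2015_revenue, fy2015_loss = 241.2, 126.1  # fiscal year that ended July 31, 2015

# First-quarter results implied by subtracting the second quarter.
q1_revenue = h1_revenue - q2_revenue
q1_loss = h1_loss - q2_loss

# Naive full-year run rate: assume the second half simply repeats the first.
run_rate_revenue = 2 * h1_revenue
run_rate_loss = 2 * h1_loss

print(f"Implied Q1: revenue ${q1_revenue:.1f}M, loss ${q1_loss:.1f}M")
print(f"Run rate:   revenue ${run_rate_revenue:.1f}M vs. ${fy2015_revenue}M last year")
print(f"Run rate:   loss    ${run_rate_loss:.1f}M vs. ${fy2015_loss}M last year")
```

On that simple assumption, both revenue and losses would finish above last year’s totals.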
The new filing also shows Nutanix CEO Dheeraj Pandey forfeited $17.5 million worth of restricted stock in March. Those shares go into the equity pool for employees and other investors to split without diluting the number of total shares.
Investors expected Nutanix to complete its IPO by now, but the IPO market has cooled and no technology company has gone public in 2016. The new filing shows Nutanix is still a candidate to become the first.
Still months away from closing its $67 billion acquisition of EMC, Dell today said it would immediately start reselling several EMC federation converged products and updated others it already sells.
Dell revealed the deals as part of a “doubling-down” on hyper-convergence and a push into selling EMC’s VCE products. And yes, Dell also moved forward with its Nutanix OEM deal by upgrading its XC Series of hyper-converged products using Nutanix software.
Dell XC Series appliances will now use the latest Intel “Broadwell” processors, which are Xeon E5-2600 v4 chips. The XC Series is also now certified for SAP NetWeaver. Travis Vigil, Dell executive director for product management for Dell storage, said Dell will offer XC Series appliances with Nutanix Acropolis, VMware or Microsoft Hyper-V hypervisors.
Acropolis competes with the VMware hypervisors that will become part of the Dell family when the EMC deal closes. Nutanix also competes with VMware’s Virtual SAN (VSAN) hyper-converged software. Still, Vigil said Dell will continue with both platforms.
“We are 100 percent committed to our XC Series,” he said. “We have had tremendous success with that product.”
Vigil said Dell has hundreds of XC customers since its Nutanix OEM deal started in 2014.
Dell has also upgraded its VMware VSAN Ready Nodes that compete with Nutanix appliances. Dell Ready Nodes now include an all-flash option using Dell PowerEdge R730xe servers, as well as Broadwell chips and factory-installed VSAN.
Vigil said Dell will continue to sell its current EVO:RAIL hyper-converged systems, but there will be no upgrades because VMware is phasing out its EVO:RAIL OEM program in favor of building more Ready Node partnerships. Dell and VMware will offer a transition program for EVO:RAIL customers who want to move off that platform to other VSAN products.
Dell will also resell VxRail Appliances and VxRack Systems from VCE, EMC’s converged infrastructure division. VxRail is a VSAN-based hyper-converged product that EMC launched in February. VxRack Node and VxRack System 1000 Flex use EMC’s ScaleIO block-based storage software for hyper-converged infrastructure ranging from hundreds to thousands of nodes. Customers can also run VxRack on PowerEdge servers through the Dell Reference Architecture for EMC Converged Infrastructure.
Dell is also adding VSAN to the Dell Hybrid Cloud Platform for VMware reference architecture program. That reference architecture includes Dell Active System Manager, VMware vCenter, vRealize and VSAN for customers looking to build private and public clouds.
“We’re doubling down on providing the best portfolio for hyper-converged infrastructure,” Vigil said. “Dell identified hyper-converged infrastructure [as a growth market] early and will hopefully expand on our lead with this announcement.”
Gartner Research distinguished analyst Dave Russell said Dell was covering three bases with its converged platforms. “They have three camps – their EMC-based portfolio, their VMware-based portfolio and their Nutanix-based portfolio,” he said. “This is a case of, ‘If you have a preference, we want to satisfy that preference for you.’”
Red Hat Ceph Storage software is now officially tested, optimized and certified to run on SanDisk’s InfiniFlash storage system thanks to a strategic partnership between the two vendors.
The alliance – announced this week – is Red Hat’s first partnership involving an all-flash array. Ross Turk, director of storage product marketing at Red Hat, said running Ceph software-defined storage on performance-optimized hardware has been a hot topic of discussion at events and conferences.
Turk said via an email that he could foresee the combination of Ceph on a high-performance, low-latency all-flash array “broadening the use cases” for Ceph to workloads such as analytics, high-speed messaging and video streaming. The original sweet spots for Ceph tended to center on capacity-optimized workloads such as active archives and rich media as well as OpenStack block storage.
“The applications running on top of OpenStack can always take advantage of a higher performing storage foundation,” Turk said.
Turk said engineers tuned over 70 individual parameters in Red Hat Ceph Storage – the vendor’s supported distribution of open source Ceph software – for optimal IOPS, latency and bandwidth characteristics with SanDisk’s InfiniFlash. The vendors are currently working on reference architectures that document recommended configurations.
Although the Red Hat-SanDisk alliance is a joint engineering/development and go-to-market effort, customers purchase the products separately – either directly from SanDisk and Red Hat or through their respective channel partners, according to Turk.
Support operates similarly. Turk said customers get support for Red Hat Ceph Storage from Red Hat and support for InfiniFlash from SanDisk. But he added that Red Hat’s global support and services team is trained on InfiniFlash, and SanDisk’s support team is trained on Red Hat Ceph Storage.
“Each of us is prepared to refer customers when appropriate,” said Turk.
SanDisk had been contributing to the open source Ceph project for more than two and a half years, according to Gary Lyng, senior director of marketing and strategy of data center solutions at SanDisk. He noted that SanDisk and Red Hat already have active joint customers.
“We believe that, with Red Hat’s proven leadership with open source technologies, the adoption of Ceph as a mainstream platform in the enterprise and cloud is possible,” Lyng wrote in an email.
He said the SanDisk-Red Hat alliance underscores a number of key areas of momentum in the storage industry, including the adoption of flash for massive-capacity workloads, software-defined storage, scale-out platforms and flexible, information-centric infrastructures.
Lyng said, although this week’s announcement focused only on Red Hat Ceph Storage, “additional collaboration is natural” given their tight relationship. Turk said the vendors are exploring potential uses for SanDisk technologies and Red Hat Gluster Storage, his company’s supported distribution of the open source Gluster distributed file system.
Red Hat’s partnerships also extend to manufacturers of servers, processors, networking gear, hard disks, flash drives and controllers. Vendors include Fujitsu, Intel, Mellanox and Super Micro. Red Hat has worked with Intel to optimize Ceph on flash, according to Turk.
Lyng said SanDisk has been formally building out a technology partner ecosystem for its data center portfolio since last fall. Vendors with which SanDisk has collaborated include IBM, Nexenta and Tegile. Western Digital agreed last October to acquire SanDisk for $19 billion.
ScaleIO is block storage that can run on commodity hardware, although EMC began packaging it on hardware nodes in late 2015. It is designed to add enterprise storage features to direct attached storage, allowing for easy upgrades. The software has multi-tenant support for building cloud storage.
David Noy, EMC’s VP of product management for emerging technologies, said ScaleIO is gaining traction with three kinds of customers: service providers building out public clouds to compete with Amazon, large enterprises building private clouds with Amazon-like features and large financial services firms looking to build block storage systems on commodity hardware.
“The appeal of ScaleIO is the ability to plop in a commodity server with some drives to add capacity to your block storage,” Noy said.
Features added in ScaleIO 2 are designed to take advantage of the hardware nodes that EMC sells it on, as well as fit the types of customers using it the most.
They include security enhancements such as IPv6 support, Secure Sockets Layer (SSL) connections between components and the ability to integrate it with Active Directory and LDAP. It also added in-flight checksum read flash capabilities, phone-home support and a maintenance mode that mirrors I/O coming in during maintenance, copies that I/O to another temporary location and moves the data back when the node returns online.
EMC also expanded ScaleIO support for next-generation applications such as containers, CoreOS and OpenStack.
For companies that don’t want ScaleIO shipped on a hardware node, it’s also available on a trial basis as a free download.
NetApp today rolled out an upgraded version of its SANtricity software for its E and EF Series of high performance arrays, with the focus on making Splunk and other data analytics applications run faster.
NetApp acquired the E Series platform from LSI in 2011, and added the EF all-flash version in 2013. The latest SANtricity release is designed to accelerate performance for high IOPS and low latency applications.
“E-Series is our main product line for these third-platform applications,” said Lee Caswell, NetApp vice president of solution and services marketing. “It gives you consistently low-latency response times.”
NetApp claims the latest version of SANtricity can:
· Increase Splunk search performance by 69% versus commodity servers with internal disks
· Drive 500% better Hadoop performance during data rebuild with Dynamic Disk Pools versus commodity servers with RAID
· Reconstruct a 400GB solid-state drive in 15 minutes for NoSQL database with commodity servers and direct attached storage
· Encrypt data at rest with less than one percent performance impact vs. the 70% impact from commodity servers with internal disk drives
· Build one architecture for hot, warm, cold and frozen tiers instead of different storage architectures for each tier.
NetApp is also partnering with Arrow on pre-configured E Series bundles for enterprise Splunk.
While SANtricity isn’t for flash-only storage, the EF Series seems the better fit when talking about high IOPS, low latency workloads. This release comes in the wake of two new flash systems from competitors that also target analytics – EMC’s DSSD D5 and Pure Storage’s FlashBlade.
“Flash enables a new level of performance for enterprise storage for big data applications,” Caswell said.
Factory revenue for backup appliances soared to $1.05 billion in the fourth quarter and $3.35 billion for 2015 – up 2.5% over 2014 – according to statistics released last week by International Data Corp. (IDC).
IDC Research Manager Liz Conner said the fourth-quarter figure marked only the second time that revenue from what IDC calls the Purpose-Built Backup Appliances (PBBA) market has hit $1 billion for a quarter. The $1.05 billion represented a 4.1% increase over fourth-quarter revenue for 2014.
Conner said via a prepared statement that backup appliance vendors have been adapting to industry trends by “putting greater emphasis on backup and deduplication software, meeting recovery objectives, [offering] the ability to tier to the cloud” and improving ease of use.
Worldwide PBBA capacity grew to 1,160 petabytes (PB) in the fourth quarter, a spike of 25.6% compared to the same period in 2014, according to IDC. Annual capacity rose to 3.3 exabytes, a 23.1% increase over 2014.
EMC continued to dominate the backup appliance market, generating more than one-third of its total annual revenue in the fourth quarter. Its fourth-quarter revenue of $707.9 million represented 67.7% market share and 10.6% growth over Q4 in 2014. For 2015, EMC’s revenue exceeded $2 billion, as the company captured 61.4% market share.
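The EMC figures above are consistent with the market totals IDC reported, as a short check shows. The sketch uses only numbers from the text. The derived values are approximations because the published shares and totals are rounded, and the variable names are illustrative.

```python
# Revenue in millions of US dollars, shares as fractions (from the text).
emc_q4_revenue = 707.9
emc_q4_share = 0.677
emc_2015_share = 0.614
market_2015_revenue = 3350.0  # $3.35 billion for the full year

# Fourth-quarter market size implied by EMC's revenue and share.
implied_q4_market = emc_q4_revenue / emc_q4_share

# EMC's full-year revenue implied by its annual share of the market.
implied_emc_2015_revenue = emc_2015_share * market_2015_revenue

# Portion of EMC's annual revenue that arrived in the fourth quarter.
q4_fraction_of_year = emc_q4_revenue / implied_emc_2015_revenue

print(f"Implied Q4 market: ${implied_q4_market:.0f}M")
print(f"Implied EMC 2015 revenue: ${implied_emc_2015_revenue:.0f}M")
print(f"Q4 as a fraction of EMC's year: {q4_fraction_of_year:.1%}")
```

The implied quarterly market of about $1.05 billion, annual EMC revenue just above $2 billion and a fourth-quarter contribution slightly over one-third all line up with the statements in the text.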
Symantec was a distant second with $479 million in annual revenue and 14.3% market share in 2015. Symantec closed out the year with $125.1 million in fourth-quarter revenue – an 8.1% increase over the same quarter in 2014. In January, Symantec completed the sale of its Veritas division, which includes the NetBackup and Backup Exec products, to The Carlyle Group.
Rounding out the top five in annual revenue were IBM ($165.7 million), Hewlett Packard Enterprise (HPE, with $150.3 million) and Barracuda ($88.1 million). IBM’s revenue was down 19.5% compared to 2014, but HPE (11.7%) and Barracuda (32.6%) saw substantial growth.
Behind EMC and Symantec in the fourth quarter of 2015 were IBM ($41.6 million), HPE ($40.7 million) and Dell ($23.6 million). Dell’s revenue grew 16.5% in Q4 2015 versus Q4 2014.
For its PBBA market sizing, IDC includes products that integrate the data movement engine (backup application) with the appliance as well as products that serve only as a target for incoming backup application data.