The capacity utilization for storage is one area where storage vendors have made a lot of improvements. Advanced features such as storage pooling, thin provisioning, and storage virtualization have introduced greater efficiencies for using storage capacity.
Still, trying to understand capacity utilization can be confusing. The utilization must be examined at a larger scale than a single storage system. Storage virtualization can span systems. Thin provisioning overcommits capacity across systems with the ability to drive up utilization rates. The larger the pool, the more flexibility is allowed for a system in allocating storage resources.
Data reduction (compression and/or deduplication) usually allows more data to be stored in a given amount of storage. Data reduction effectiveness varies based on the data type and the implementation by the vendor. Data reduction represents a potential increase in usable capacity. Guidelines or guarantees from the vendor can be used to gauge that potential, and actual measurements are usually available from the management interfaces on the storage systems when data reduction is in use.
In the discussion about storage capacity utilization, it is useful to understand the basic definitions and update them to current terminology for the technology in use. The following are some of the more basic terms and explanations.
Used capacity – the space that holds stored data accessible from hosts.
Usable capacity – storage space within a storage system or across pooled systems that can be configured for volumes (LUNs) or filesystems. This is the capacity minus the storage system overhead. The overhead includes data protection such as RAID devices, allocated chunks in storage pools, and segments for forward error correction using codes such as erasure codes. Filesystems also reserve space for operational processes, which is not included in the usable capacity calculation.
Allocated but unused capacity – allocated storage space in a volume or filesystem with no data stored. This space is not available for applications or file systems, although it can be used later for data.
Effective capacity – the usable capacity multiplied by the expected effectiveness of data reduction.
Raw capacity – the aggregate of the capacity of the storage devices (hard disk drive, solid-state devices, flash modules).
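The relationships among these terms can be sketched in a few lines of Python. This is only an illustration: the class name, the field names, the sample numbers, and the choice to express utilization as used capacity divided by usable capacity are assumptions made for the example, not definitions from any vendor.

```python
from dataclasses import dataclass


@dataclass
class CapacityReport:
    """Hypothetical capacity figures for one system or pool, in TB."""
    raw_tb: float           # aggregate capacity of all storage devices
    overhead_tb: float      # RAID/erasure-code protection, pool chunks, snapshot reserve
    allocated_tb: float     # space assigned to volumes or filesystems
    used_tb: float          # data actually stored and accessible from hosts
    reduction_ratio: float  # expected data reduction, e.g. 2.0 for 2:1 (an assumption)

    @property
    def usable_tb(self) -> float:
        # Usable capacity: raw capacity minus storage system overhead.
        return self.raw_tb - self.overhead_tb

    @property
    def effective_tb(self) -> float:
        # Effective capacity: usable capacity times expected data reduction.
        return self.usable_tb * self.reduction_ratio

    @property
    def allocated_unused_tb(self) -> float:
        # Allocated but unused: assigned to a volume, holding no data yet.
        return self.allocated_tb - self.used_tb

    @property
    def utilization(self) -> float:
        # Assumed definition: fraction of usable capacity holding data.
        return self.used_tb / self.usable_tb


# Sample numbers chosen only to make the arithmetic easy to follow.
report = CapacityReport(raw_tb=100.0, overhead_tb=20.0,
                        allocated_tb=60.0, used_tb=40.0,
                        reduction_ratio=2.0)
print(f"usable={report.usable_tb} TB, effective={report.effective_tb} TB, "
      f"allocated-unused={report.allocated_unused_tb} TB, "
      f"utilization={report.utilization:.0%}")
```

With these sample figures, 100 TB raw yields 80 TB usable, and half of the usable capacity holds data.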
Storage system data protection also has special considerations.
Snapshots – there are two primary types of implementations: Redirect-On-Write and Copy-On-Write. Redirect-On-Write is used with more recent storage pooling implementations such as all solid-state storage systems, where available space from the storage pool is used for the change data. With thin provisioning, the recommendation is to not exceed 90% utilization including snapshots and used capacity. Copy-On-Write implementations usually depend on pre-allocated capacity to contain a copy of the original data when a change is made. The pre-allocated space is included in the storage system overhead and reduces the usable capacity.
Replicated copies for disaster recovery / business continuance – these are volumes or filesystems, typically at remote sites, that represent a copy of the original active data. For capacity utilization calculation, the space is treated the same as any of the primary volumes – replication just means you need that much more capacity. The effect of low capacity utilization is multiplied with replication.
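As a rough sketch of the two considerations above, the following Python checks a thin-provisioned pool against the 90% guideline (counting snapshot space with used capacity) and shows how replication multiplies both the capacity required and the capacity left idle. The function names and sample figures are hypothetical and chosen only for illustration.

```python
def pool_utilization(used_tb: float, snapshot_tb: float, pool_tb: float) -> float:
    """Fraction of the pool consumed by used capacity plus snapshot data."""
    return (used_tb + snapshot_tb) / pool_tb


def exceeds_guideline(used_tb: float, snapshot_tb: float, pool_tb: float,
                      limit: float = 0.90) -> bool:
    """True when the pool is past the recommended utilization limit."""
    return pool_utilization(used_tb, snapshot_tb, pool_tb) > limit


def capacity_with_replicas(primary_tb: float, replica_count: int) -> float:
    """Total capacity needed for the primary plus each replicated copy."""
    return primary_tb * (1 + replica_count)


def idle_capacity_with_replicas(provisioned_tb: float, used_tb: float,
                                replica_count: int) -> float:
    """Unused capacity is duplicated at every replica site as well."""
    return (provisioned_tb - used_tb) * (1 + replica_count)


# Hypothetical pool: 100 TB, with 80 TB used and 15 TB of snapshot data.
print(exceeds_guideline(used_tb=80, snapshot_tb=15, pool_tb=100))
# Hypothetical volume set: 50 TB provisioned, 20 TB used, one remote replica.
print(capacity_with_replicas(50, 1), idle_capacity_with_replicas(50, 20, 1))
```

In the second example, 30 TB of idle space on the primary becomes 60 TB of idle space once the replica is counted, which is the multiplication effect described above.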
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
The most prominent storage feature made available yesterday with the 10th release of OpenStack cloud software — known as Juno — gives users the ability to control how and where they want to store, replicate and access data across object storage clusters.
The new “storage policies” capability applies to the OpenStack Object Storage project, which is better known by its code name, Swift. The latest OpenStack Swift release also includes updated support for the OpenStack Keystone identity service and CPU-lowering data handling improvements, but the feature drawing the most attention is storage policies.
“They’re the biggest thing that’s happened to Swift since it was open sourced as part of OpenStack four years ago,” said John Dickinson, the project technical lead for OpenStack Swift and director of technology at SwiftStack Inc., which sells a commercially supported version of the open source Swift software.
Dickinson said, by using storage policies, a company with a Swift-based server cluster located in the United States and in Europe could choose to store some data only in one geographic region. Or, a user with flash- and disk-based storage could set up tiers based on storage policies and offer different service-level agreements or chargeback/billing options.
Storage policies also enable users to decide the number of data replicas they want across a Swift cluster. For instance, an enterprise might choose to replicate some data only in two locations and other data across four data centers in different geographies.
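In Swift, a storage policy is selected per container when the container is created, using the `X-Storage-Policy` request header. The sketch below only builds such a request with Python's standard library and does not send it. The endpoint URL, account, container name, token, and the policy name `eu-only` are all hypothetical; real policy names are whatever the cluster operator has defined.

```python
import urllib.request


def build_container_request(storage_url: str, container: str,
                            policy: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that creates a container under a policy."""
    return urllib.request.Request(
        url=f"{storage_url}/{container}",
        method="PUT",
        headers={
            "X-Auth-Token": token,       # placeholder credential
            "X-Storage-Policy": policy,  # policy name defined by the operator
        },
    )


# All values below are placeholders for illustration.
request = build_container_request(
    storage_url="https://swift.example.com/v1/AUTH_demo",
    container="eu-customer-data",
    policy="eu-only",
    token="example-token",
)
print(request.get_method(), request.full_url)
```

Objects written to that container would then follow the placement and replica count the operator configured for the named policy.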
“You can very specifically customize your Swift cluster for your use case – which, in my opinion, is really the whole purpose of cloud,” Dickinson said.
In addition to the immediate benefits, storage policies will also pave the way for an important feature in the 11th version of OpenStack, known by its project code name, Kilo. Dickinson said storage policies are the “critical foundation” allowing the community to build erasure code support in Swift. The community hopes to finish its work on erasure codes by year’s end, and at the latest, by the time of next spring’s Kilo release, according to Dickinson.
Another key storage capability targeted for OpenStack’s Kilo release is encryption of data at rest by Swift, but Dickinson said the feature is still in the design phase at the moment.
Of course, Swift isn’t the only storage option in OpenStack. The OpenStack Block Storage project, known as Cinder, will focus on core internals in the Kilo release, according to John Griffith, the project’s technical lead and a software engineer at SolidFire Inc.
“There’s a good deal of housekeeping that needs to be done, not only general architecture and stability improvements, but also we would like to focus on things like rolling upgrades and project interactions,” Griffith said via an email.
In the meantime, this week’s OpenStack Juno release added new features such as support for volume replication, volume pools, consistency groups and snapshots of consistency groups to OpenStack Cinder block storage.
File storage remains a work in progress for the OpenStack community. The OpenStack Foundation’s press release listed the Manila shared file system among several projects in the incubation phase, “expected to land in late 2015 and beyond.”
At least one Symantec backup product will no longer be in the lineup by the time the vendor splits apart its security and backup businesses a little more than a year from now.
While many in the storage world were discussing the new information management company that would come from the Symantec split, Symantec last week disclosed plans to stop selling Backup Exec on an integrated appliance.
As of Jan. 5, Symantec will discontinue the Backup Exec 3600. It will sell Backup Exec the old-fashioned way – it will provide the software and let other vendors provide the hardware.
While integrated appliances for Symantec’s enterprise NetBackup software have been successful – it recently expanded the NetBackup appliance line – that has not been the case with the SMB-focused Backup Exec.
In a blog on the Symantec website announcing the move, senior director of global product marketing Drew Meyer wrote:
“Providing our partners with Backup Exec software that they can bundle with hardware and services best meets the needs of our small and mid-sized business customers looking for a combined offering.”
Meyer cited Fujitsu, which sells an Eternus BE50 appliance with Backup Exec in Japan and Europe. He also wrote the recent release of Backup Exec 2014 shows that Symantec is committed to the software, which ran into problems when the 2012 version came out.
Symantec’s new information management company will offer maintenance renewals for the Backup Exec 3600 through January of 2018 and support will continue until January of 2020.
Competitors are more than happy to relieve Backup Exec customers of their appliances. Zetta.net and Unitrends this week came forward with programs to tempt Backup Exec customers to switch.
Zetta said Backup Exec customers can sign up for Zetta’s cloud backup and DR service free for six months, and it will give up to 20 percent discounts on annual contracts. This is similar to a migration program Zetta ran for BackupExec.cloud customers after Symantec shut down that service earlier this year.
Unitrends said Backup Exec 3600 customers can trade their appliances for one of its integrated appliances for only the cost of support. The Unitrends Recovery-713, Recovery-813 and Recovery-822 are the available models. Backup Exec customers must sign three-year or five-year support contracts for their free appliances.
Object storage vendor Scality has scored a reseller deal with Hewlett-Packard, which the private company’s CEO said will greatly expand its global reach.
Scality and HP have worked together closely in the field, and a lot of Scality’s Ring software runs on HP ProLiant servers.
“We’ve been working with all the server vendors since the beginning,” Scality CEO Jerome Lecat said. “HP has been the most proactive in coming up with a server that fits our industry.”
HP sells Scality software on the ProLiant SL4540 and DL360p Gen 8 servers.
Lecat said Scality has more than 40 PB of customer data deployed on HP servers. Scality-HP customers include DailyMotion, Time Warner Cable and European television station RTL2, he said.
Lecat said the deal is crucial for Scality because “we’re still a relatively small company, and we do not have thousands of sales people around the globe like HP does.”
The deal is not exclusive. HP sells its own StoreAll product with object storage, and it also works closely with Cleversafe. There is no formal reseller deal with Cleversafe, but it is featured alongside Scality on HP’s object storage software for ProLiant web page.
Lecat said Cleversafe’s dsNet object storage is more suited for long-term archives while Scality Ring is for active applications such as email and video archiving.
“We don’t see ourselves as an object storage company,” Lecat said. “Object storage companies only focus on archiving. Our ambitions are larger than that. We have a lot of media companies running video on demand, consumer web mail and other applications. We’re not just deep and cheap archiving.”
Druva is taking its enterprise endpoint backup software and moving it into backup for small businesses and remote and branch office backup.
The company this week launched Druva Phoenix, a centralized management backup and archive product targeting companies that have tight budgets, limited local IT staff or none at all. The software is based on Druva’s inSync enterprise endpoint backup and nCube architecture. Phoenix is agent-based software with global deduplication that is done at the source level.
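To illustrate what source-level global deduplication means in general terms, here is a minimal Python sketch. It is not Druva's implementation: the fixed-size chunking, the SHA-256 fingerprints, the tiny chunk size, and all class and function names are assumptions made for the example. The agent fingerprints each chunk and sends only chunks the shared store has not seen, so data already backed up from any source is never transmitted again.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to trace


class DedupStore:
    """Stands in for the shared (global) chunk store in the cloud."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def has(self, fingerprint: str) -> bool:
        return fingerprint in self.chunks

    def put(self, fingerprint: str, data: bytes) -> None:
        self.chunks[fingerprint] = data


def backup(data: bytes, store: DedupStore, chunk_size: int = CHUNK_SIZE):
    """Agent side: return the chunk recipe and the number of bytes sent."""
    recipe, bytes_sent = [], 0
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(piece).hexdigest()
        if not store.has(fingerprint):  # only new chunks leave the source
            store.put(fingerprint, piece)
            bytes_sent += len(piece)
        recipe.append(fingerprint)
    return recipe, bytes_sent


def restore(recipe, store: DedupStore) -> bytes:
    """Rebuild the original data from its ordered list of fingerprints."""
    return b"".join(store.chunks[fingerprint] for fingerprint in recipe)


store = DedupStore()
recipe_a, sent_a = backup(b"AAAABBBBAAAACCCC", store)  # first server
recipe_b, sent_b = backup(b"BBBBCCCCDDDD", store)      # second server
print(sent_a, sent_b)
```

The first backup sends three unique chunks out of four, and the second sends only the one chunk the store has not already received from the first server.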
Druva Phoenix is offered as an alternative to traditional server backup that requires secondary storage, tape and archiving.
“This is a pure play software as a service cloud product,” said Jaspreet Singh, Druva’s CEO and founder. “The core to solving backup to the cloud is building a scalable deduplication in the cloud. In the last five and a half years, we built endpoint backup for the cloud. In the last 18 months, we were looking for what we can solve next. The remote office looked interesting.
“We thought we could remove a few processes by introducing Phoenix,” he said. “We are extending from endpoint to remote offices. It’s a very natural extension for us.”
Phoenix has a software-based cache accelerator for backup and restores, which resides on the server in the remote or branch office. The rest of the data is moved into the Amazon cloud.
“Because there is not much metadata, it can scale fairly well,” Singh said.
Singh said without deduplication, the amount of data stored in the cloud becomes exorbitant. For instance, 1 TB of data can multiply to 719 TB after it is retained for seven years if dailies, incrementals and full backups are done.
“One data reduction price-point is based on the source data,” Singh said.
Jason Buffington, senior analyst at Enterprise Strategy Group, said ROBO servers are the next “battleground” for cloud-based backup where it makes sense. For the remote office, he said the decision to back up to the cloud depends on whether IT wants to control ROBO backups or just manage the data repositories.
Druva’s endpoint software lends itself to small business and ROBO backup and archiving because the software was designed with administrative oversight capabilities, Buffington said. The software also comes with three-year, seven-year and infinite retention policies.
“No one would keep endpoint data for an infinite amount of time,” Buffington said. “But it should be a requirement for server-based protection.”
The term access method is frequently used to identify types of I/O in open systems. Many who use it probably don’t understand the historical context for what has been known as an access method for over 50 years. In open systems, the types of I/O are for block data, file data, and object data. Access methods represent how the types of data are stored on devices.
The term access method comes from the mainframe world and denotes a number of well known (at least to those who have worked with mainframes) means to store or access information. Access methods are really software routines accessed by application programs using software commands that are inline calls to system functions. You can call these Application Program Interfaces (APIs). The closest equivalent function in open systems would be a device driver.
There are many types of access methods and most deal with how data is organized, usually in the form of records, which are typically fixed length blocks of data in a dataset.
Some of the familiar access methods for storage in the mainframe world include:
- BSAM – Basic Sequential Access Method
- QSAM – Queued Sequential Access Method
- BDAM – Basic Direct Access Method
- BPAM – Basic Partitioned Access Method
- ISAM – Indexed Sequential Access Method
- VSAM – Virtual Storage Access Method
- OAM – Object Access Method
An example of doing I/O in an application in QSAM would be to set up buffers in memory for queued I/O (multiple records in a block) and then do a GET or PUT. Interestingly, the basic I/O for S3 object access is GET and PUT.
Open systems access methods are termed:
- Block – individual blocks of data are read or written from/to storage
- File – a stream of bytes that represent a file with associated file metadata is written or read within the organization of a hierarchical tree structure.
- Object – data segments and user or system-defined metadata are stored in a flat namespace with access through object ID resolution.
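The three open systems access types can be contrasted with small in-memory models. The Python below is a sketch for illustration only; the class names, method names, and sample data are hypothetical and do not correspond to any particular protocol or product.

```python
class BlockDevice:
    """Block access: fixed-size blocks read or written by block address."""

    def __init__(self, block_count: int, block_size: int = 512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(block_count)]

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[lba] = data

    def read_block(self, lba: int) -> bytes:
        return self.blocks[lba]


class FileStore:
    """File access: a byte stream plus metadata, located by path in a tree."""

    def __init__(self):
        self.root = {}  # nested dicts model directories

    def write(self, path: str, data: bytes) -> None:
        *dirs, name = path.strip("/").split("/")
        node = self.root
        for directory in dirs:
            node = node.setdefault(directory, {})
        node[name] = (data, {"size": len(data)})

    def read(self, path: str) -> bytes:
        *dirs, name = path.strip("/").split("/")
        node = self.root
        for directory in dirs:
            node = node[directory]
        data, _metadata = node[name]
        return data


class ObjectStore:
    """Object access: data and metadata in a flat namespace, found by ID."""

    def __init__(self):
        self.objects = {}  # object ID -> (data, metadata)

    def put(self, object_id: str, data: bytes, metadata=None) -> None:
        self.objects[object_id] = (data, dict(metadata or {}))

    def get(self, object_id: str):
        return self.objects[object_id]


disk = BlockDevice(block_count=8, block_size=4)
disk.write_block(3, b"abcd")

files = FileStore()
files.write("/projects/q3/report.txt", b"hello")

objects = ObjectStore()
objects.put("9f2c", b"hello", {"content-type": "text/plain"})
print(disk.read_block(3), files.read("/projects/q3/report.txt"), objects.get("9f2c"))
```

The block model knows nothing about what the bytes mean, the file model locates data by walking a hierarchy, and the object model resolves a single ID and returns the metadata along with the data.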
The open systems access methods don’t map directly to those in the mainframe world, but you can understand them if you know the mainframe methods. The term access method in open systems isn’t wrong; it just means something slightly different. Translating between the two helps clarify the meaning.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
Symantec today confirmed it is splitting off its information management business from the security business. The security company will keep the Symantec name, while the Information Management company has no name yet. The split is scheduled to complete by the end of 2015.
The Information Management arm will be a storage vendor, with products in backup and recovery, archiving, eDiscovery, storage management, and information availability solutions. John Gannon, who retired as Quantum COO in 2005 and also led HP’s personal computing division, becomes general manager of the new storage company.
Michael Brown, named Symantec’s permanent CEO last month, will continue to run Symantec.
“We’re confident this is the right thing from a strategy standpoint,” Brown said.
Brown said Symantec’s leadership team decided it was too difficult to remain a market leader in security and data management, and that led to the breakup decision. The security and storage companies came together in 2005 when Symantec acquired Veritas for $13.5 billion, but there have been intermittent rumors for years that the backup business would be spun off or sold.
The security part of the business has been the bigger piece of Symantec, with $4.2 billion of revenue in fiscal year 2014 compared to $2.5 billion for information management.
So, does anyone think they should call the new information management company Veritas?
EMC has issued two responses to the letter that investor Elliott Management made public Wednesday calling for the vendor to spin off VMware and/or explore a merger with other large companies.
EMC first released a direct response to the Elliott letter, saying little except to repeat claims that EMC is exploring options but believes its strategy is sound.
An indirect response did a better job of making EMC’s case for keeping its federation of EMC, VMware, RSA, and Pivotal together. That response came today in the form of a release touting its Federation Software-Defined Data Center Solution.
The solution is little more than a combination of products from EMC’s companies with extras such as a self-service portal and scripts to tie them together. But the concept shows how the parts of the EMC Federation work together, testing the products at the federation’s engineering lab on the VMware campus, and putting pieces together to solve distinct data center problems.
Is it a coincidence that the data center solution release came one day after Elliott’s letter to CEO Joe Tucci and the EMC board questioning the value of EMC keeping everything under one umbrella? Bharat Badrinath, EMC’s Senior Director of Global Solutions Marketing, isn’t saying.
“That’s something Joe and the board will determine,” he said of the spinout and merger issue.
Badrinath’s job is pushing products, not mergers. EMC’s solution announcement also provided this list of EMC Federation products brought together as part of the software-defined data center solution:
- Management and Orchestration: VMware vCloud Automation Center, VMware vCenter Operations Management, VMware IT Business Management, EMC Storage Resource Manager
- Hypervisor: VMware vSphere, the industry’s most widely deployed virtualization platform
- Networking: VMware NSX, the network virtualization and security platform for the software-defined data center. VMware NSX brings virtualization to existing networks and transforms network operations and economics
- Storage: Designed for EMC ViPR & EMC Storage, EMC Storage Resource Manager, VMware Virtual SAN.
- Hybrid Cloud Deployment Models: Connectivity to VMware vCloud Air
- Choice of Hardware: Built on converged infrastructure and can be deployed on a variety of hardware including VCE Vblock and VSPEX.
- PaaS: Delivering Platform-as-a-Service with Pivotal CF
- Documented Reference Architectures
The point EMC wants to make is these products from different parts of the federation are intertwined and cannot be broken apart without harm.
“We have four strategically aligned companies which are working together at times, but there are also times when they are independent and operate on their own,” Badrinath said. “Customers can pick products developed independently or together. It’s all about us being better together or bringing the best of the best within the four businesses.”
Other solutions that will follow include Platform-as-a-Service, End-User Computing, Virtualized Data Lake and Security Analytics. Badrinath said they all should be available by early 2015.
Badrinath said the testing for the software-defined data center portion of the program took 40,000 person-hours of engineering across federation companies. He also emphasized that EMC and VMware continue to work with outside partners, even if those partners such as Microsoft or other storage vendors compete with federation companies at times.
While the federation’s software-defined data center initiative has been going on for months, the release sounds as if it were put together to counter specific complaints from Elliott. The letter, signed by Elliott portfolio manager Jesse Cohn, said the EMC storage company and VMware “hinder one another” because they compete in areas, and the relationship prevents them from developing other critical relationships. Cohn said EMC’s stock is underperforming, the company is undervalued, and EMC and VMware would both be better off apart.
“As time passes, this untenable situation is going to get worse,” he wrote to EMC.
While launching the latest version of Red Hat Storage Server yesterday, the vendor provided little insight into the long-term positioning of its storage software portfolio and the chances that it might combine its Gluster-based Storage Server and Inktank Ceph Enterprise product lines.
Ranga Rangachari, vice president and general manager of storage and big data at Red Hat, said the company hopes to “get back to our customers and partners in the very near future with a consolidated vision of where this journey is going.” He addressed the topic in response to a question during the company’s Webcast entitled “Advancing software-defined storage,” which he said customers view as the ability to take advantage of industry-standard x86 servers with the intelligence resting in the software.
Rangachari noted simply that Red Hat’s acquisition of Inktank Storage Inc. this year brought object- and block-based storage to the table and complemented the file system capabilities the company gained through its 2011 acquisition of Gluster Inc.
Gluster had sold a supported version of the open source GlusterFS distributed file system in much the same way that Inktank sold a supported version of open source Ceph. Any innovative software development work rests with their respective open source project communities.
“The Gluster and the Ceph communities continue to thrive independently and thrive really well,” said Rangachari, claiming that Gluster and Ceph combined for almost two million downloads during the last nine months. “The innovation that’s going on on both those projects will continue to happen unabated.”
Red Hat put out new versions of each of the commercially supported products this year. Storage Server 3, launched yesterday, is based on open source GlusterFS 3.6 and adds support for snapshots, multi-petabyte scale-out capacity, flash drives and Hadoop-based data analytics. Inktank Ceph Enterprise 1.2, released in July, was based on open source Ceph’s Firefly release and added erasure coding, cache tiering and updated tools to manage and monitor the distributed object storage cluster.
The Ceph open source project claims to be a unified system providing object, block and file system storage. Ceph’s file system runs on top of the same object storage system that provides object storage and block device interfaces, according to the project’s Web site.
“It’s fair to say that file is probably the least well evolved of those three,” said Simon Robinson, a research vice president in storage at New York-based 451 Research LLC. “The file capability is very immature. It’s not enterprise-grade.”
But, as the Ceph technology improves, Red Hat will need to confront the question of whether to continue to focus on Gluster and Ceph, said Robinson.
“I think Red Hat’s bet buying Gluster was, ‘Hey, look at all this unstructured data. Look how quickly it’s growing. We need a play here.’ Three years ago, that play was NAS. Today it looks slightly different,” said Robinson. “When we think about the growth of unstructured data, it’s actually object that is seen as the future architecture rather than NAS.”
He cited Amazon and Microsoft Azure as proof points of the object model working at scale. “It’s just a case of how does that percolate down into the enterprise. It will take time,” he said.
Robinson said he doesn’t think it makes sense for Red Hat to physically merge Gluster and Ceph. He predicted that if Red Hat Storage does catch on, its success will be through Ceph – “the darling of the storage startup world” – tied to the broader success of the open source OpenStack cloud technology platform. Ceph has already started to gain momentum among cloud service providers, he said.
“Everybody’s playing with OpenStack, and if you’re playing with OpenStack, you’ve probably heard of Ceph. And Ceph has the interest of the broader storage community,” said Robinson. “Other big players are really interested in making Ceph a success. That works for Red Hat’s advantage.”
Henry Baltazar, a senior analyst at Cambridge, Massachusetts-based Forrester Research Inc., said he sees no problem with Red Hat having Gluster-based file and Ceph-based block and object storage options at this point, since the company doesn’t have much market share.
“They’re going to have two platforms in the foreseeable future. Those aren’t going to merge,” predicted Baltazar. “Gluster is definitely the file storage type. There are ways they could use it that can complement Ceph. It still remains to be seen where it will wind up 10 years from now.”
EMC is aiming its new RecoverPoint for Virtual Machines at cloud DR, in partnership with cloud security vendor CloudLink Technologies.
RecoverPoint for Virtual Machines is a hypervisor-based version of EMC’s RecoverPoint replication software. It will be generally available Nov. 17. EMC will also integrate the product with CloudLink SecureVSA, which provides encryption for data at rest and data in motion.
The combined products can allow service providers to build DR as a Service (DRaaS), and enterprises can use them to replicate data to private or public clouds for DR.
RecoverPoint for Virtual Machines is a software-only product. Unlike previous versions of RecoverPoint, it is storage-agnostic so it doesn’t require EMC arrays to run. It works with any VMware-certified storage. It is not hypervisor-agnostic yet, though. It supports VMware vSphere today with support for Microsoft Hyper-V and KVM hypervisors on the roadmap.
It is EMC’s first replication software that works on the individual VM level. Instead of replicating storage LUNs as other RecoverPoint versions do, RecoverPoint for Virtual Machines splits and replicates writes for VMware vSphere VMs. It requires splitter code on each ESX (versions 5.1 and above) node running protected VMs, and at least one virtual appliance at each site. Customers can replicate VMs regardless of hardware running at either end.
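The general idea of splitting writes at the VM level can be sketched as follows. This is a conceptual illustration only and not EMC's implementation: the class names, the in-memory dictionaries standing in for disks, and the journal-then-apply behavior at the remote site are all assumptions made for the example.

```python
class ReplicaAppliance:
    """Stands in for the virtual appliance at the recovery site."""

    def __init__(self):
        self.journal = []  # writes received but not yet applied
        self.replica = {}  # vm name -> {offset: data}

    def receive(self, vm: str, offset: int, data: bytes) -> None:
        self.journal.append((vm, offset, data))

    def apply_journal(self) -> None:
        # Apply writes in arrival order so the replica matches the source.
        for vm, offset, data in self.journal:
            self.replica.setdefault(vm, {})[offset] = data
        self.journal.clear()


class WriteSplitter:
    """Stands in for splitter code on a host running protected VMs."""

    def __init__(self, appliance: ReplicaAppliance, protected_vms):
        self.appliance = appliance
        self.protected_vms = set(protected_vms)
        self.primary = {}  # vm name -> {offset: data}

    def write(self, vm: str, offset: int, data: bytes) -> None:
        # Every write lands on primary storage as usual...
        self.primary.setdefault(vm, {})[offset] = data
        # ...and writes from protected VMs are also copied to the appliance.
        if vm in self.protected_vms:
            self.appliance.receive(vm, offset, data)


appliance = ReplicaAppliance()
splitter = WriteSplitter(appliance, protected_vms=["db-vm"])
splitter.write("db-vm", 0, b"ledger")
splitter.write("test-vm", 0, b"scratch")  # not protected, so not replicated
appliance.apply_journal()
print(appliance.replica)
```

Because the split happens per VM rather than per LUN, an unprotected VM sharing the same storage generates no replication traffic.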
CloudLink SecureVSA adds security. It allows customers to store and manage encryption keys on-premises.
“One of the big inhibitors of going to a public cloud is security,” said Jean Banko, director of product marketing for EMC’s data protection division. “That’s why we partnered with CloudLink.”