Caringo is strengthening its hand for cloud storage with three new software products built on its CAStor object-based storage software.
The Elastic Content Protection (ECP), CloudScaler and Indexer are separately licensed products that can be used independently or in combination to build private and public clouds. ECP uses erasure codes to distribute data across locations, CloudScaler enables multi-tenancy and Indexer is a real-time indexing engine.
CAStor was originally developed as archiving software. Caringo CEO Mark Goros said customers already use CAStor for storage clouds, but features such as erasure codes and multi-tenancy make it better tailored for private clouds in large enterprises.
“We’ve had object storage software since 2006,” Goros said. “This is version six. That means it’s just coming of age, it’s at its peak prowess. Now we’re adding elastic content and erasure code protection.”
Dell uses Caringo software with its DX object storage platform, and Goros said he expects Dell will resell the new Caringo cloud services, too.
Caringo claims ECP can protect exabyte-scale storage by using erasure coding to divide objects and store slices in different places to allow data recovery if slices are lost. Other object storage products use erasure codes, including Cleversafe, AmpliData, Scality, EMC Atmos and DataDirect Networks Web Object Scaler (WOS). Some of these use the Reed-Solomon error correction code while others enhance Reed-Solomon.
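To make the idea of slicing an object so that lost slices can be rebuilt more concrete, here is a minimal Python sketch. It uses a single XOR parity slice, which is far simpler than Reed-Solomon and is not Caringo's ECP algorithm; it survives the loss of only one slice. The function names `encode` and `decode` are hypothetical and exist only for this illustration.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal slices (zero-padded) plus one XOR parity slice."""
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def decode(slices: list, length: int) -> bytes:
    """Rebuild the object when at most one slice (data or parity) is None."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost slice")
    if missing:
        survivors = [s for s in slices if s is not None]
        slices = list(slices)
        # The XOR of all k+1 slices is zero, so the XOR of the
        # survivors equals the missing slice.
        slices[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(slices[:-1])[:length]


obj = b"object stored across several locations"
stored = encode(obj, k=4)       # 4 data slices + 1 parity slice
stored[2] = None                # simulate a lost slice
recovered = decode(stored, len(obj))
```

Production erasure codes generalize this so that any chosen number of slices can be lost, at the cost of more parity slices and more computation.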
Until now, Caringo used replication to protect its clusters. “Customers never had to worry about backups for CAStor clusters,” Goros said. “But as storage requirements get greater and we get to multiple petabytes, people are looking for ways to save space, power and cooling. You can now mix and match between replicas or erasure codes. For small data sets, you want to replicate because erasure code is not effective for that.”
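The space argument can be sketched with simple arithmetic. The Python below compares the raw capacity needed under replication and under erasure coding. The parameters (three replicas, 10 data slices plus 6 parity slices, 1,000 TB of data) are illustrative assumptions, not Caringo settings.

```python
def replication_raw_tb(logical_tb: float, copies: int) -> float:
    """Raw capacity consumed when every object is stored `copies` times."""
    return logical_tb * copies


def erasure_raw_tb(logical_tb: float, data_slices: int, parity_slices: int) -> float:
    """Raw capacity consumed when each object becomes data + parity slices."""
    return logical_tb * (data_slices + parity_slices) / data_slices


logical = 1000.0  # TB of user data (assumed)
replicated = replication_raw_tb(logical, copies=3)                        # 3000.0 TB
erasure_coded = erasure_raw_tb(logical, data_slices=10, parity_slices=6)  # 1600.0 TB
saved = replicated - erasure_coded                                        # 1400.0 TB
```

In this assumed configuration, erasure coding tolerates the loss of six slices while using roughly half the raw capacity of three full copies.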
CloudScaler consists of a software gateway appliance and a management portal. The gateway includes RESTful API and multi-tenant authentication and authorization capabilities. The portal provides tenant management and handles quotas, bandwidth and capacity metering. CloudScaler can be configured as public, private or hybrid cloud storage, but Goros said it is especially useful for building private clouds. He describes CloudScaler as “Amazon S3-like storage, but fast and secure in your own data center.”
The Indexer consists of a NoSQL data store that indexes objects in a CAStor cluster and allows searching by file name, unique identifier or metadata. The Indexer runs on separate hardware from CAStor but can integrate with the CloudScaler portal to present information in the GUI.
Data protection is probably the most fundamental requirement in Information Technology (IT), and is generally aligned with storage overall. But, data protection is perceived as overhead — a tax on IT operations.
Because of that, data protection gets attention (and major funding) when there is a significant problem. There is an increasing problem in getting the protection done in the allotted time, meeting the recovery time objectives (RTO) and recovery point objectives (RPO). With capacity demand growing, the current methods of protecting data are being examined to improve the approaches.
At the Dell Storage Forum in Boston last week, there was more talk that IT has made a transition to include the use of snapshot and replication in the data protection process. Snapshots, or point-in-time copies that are synchronized with applications for a coherent snapshot copy, have become the primary means for making a copy that can meet the RTO for many of the primary cases where restores are required. About 90% of restores occur within 30 days of when that data was created or updated. The snapshots are typically done using features in the storage system, but may also use special host software.
Replication is typically a remote copy that is used for disaster protection and is also leveraged for restores of data that may have been damaged (corrupted or deleted) locally. The mechanics of the recovery vary significantly between the different vendor solutions.
Backup is still used and still a valuable tool in the data protection arsenal. It is now just a part of the overall picture which includes snapshots and replication. Extensions to backup software are capitalizing on these transitions by IT and include such capabilities as invoking the storage system-based snapshots, managing the catalog of snapshot copies, and managing the remote copies of data.
Exploiting storage system or hypervisor-based features such as Changed Block Tracking is another means of improving data protection, by reducing both the time required and the amount of data. This is another developing area and will be a differentiator among backup software solutions and the storage system hardware that has those capabilities.
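The principle behind changed block tracking can be shown with a toy example. The Python sketch below records which blocks were written since the last backup and copies only those. The `TrackedVolume` class is hypothetical and is not the interface of any hypervisor or storage system.

```python
class TrackedVolume:
    """Toy block device that records which blocks changed since the last backup."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set()  # indexes of blocks written since the last backup

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.changed.add(index)

    def incremental_backup(self) -> dict:
        """Return only the changed blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


vol = TrackedVolume(num_blocks=1000, block_size=4)
vol.write(7, b"abcd")
vol.write(42, b"wxyz")
delta = vol.incremental_backup()  # contains 2 of the 1,000 blocks
```

The backup moves two blocks rather than scanning or copying all 1,000, which is where the time and data savings come from.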
Backup software will effectively need to be renamed to something that reflects that what it does goes beyond traditional backup.
The transitions occurring in data protection are being driven by IT to meet requirements to protect data while also meeting operational considerations. Software and hardware solutions can enable the transitions and make the operations more seamless. This will continue to be a developing area – both for vendor products and the adoption by IT.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
NetApp is embracing Hadoop with a converged system combining its two major storage platforms with compute and networking from partners. The vendor also broadened its partnerships with Apache Hadoop companies this week by forging a joint partnership with Hortonworks.
The NetApp Open Solution for Hadoop Rack includes NetApp FAS and E-Series storage along with Hewlett-Packard servers and Cisco switches. The base configuration consists of four Hadoop servers, two FAS2040 storage modules, three E2660 NetApp storage modules for 360TB of storage, 12 compute servers and two Ethernet switches. The system scales with data expansion racks made up of four NetApp E2660 modules, 16 compute servers and two Cisco switches.
The FAS2040 – including NFS – is used in the Hadoop NameNode and the E2660 with Hadoop Distributed File System (HDFS) is used in the DataNode. The goal is to enable enterprises to move Apache Hadoop quickly from the test lab into production.
“We’ve taken the approach that there is an issue with the NameNode in Hadoop,” said Bill Peterson, who heads solutions marketing for NetApp’s Hadoop and “Big Data” systems. “If that crashes, you lose the entire Hadoop cluster. The community is fixing that so it will no longer be a single point of failure. We decided we would put a FAS box inside the solution, so we could do a snapshot of the NameNode. We use E-Series boxes for MapReduce jobs. So the database of record is on FAS and fast queries are on the E-Series.”
The NetApp Open Solution for Hadoop Rack became available this week.
NetApp also signed on to develop and pre-test Hadoop systems that use the new Hortonworks Data Platform (HDP), which became generally available Wednesday. NetApp joint solutions with Hortonworks are expected later this year. NetApp also has partnerships with Apache and Cloudera, and will support all three versions of Hadoop on its Open Solutions Rack.
“That’s why NetApp has open in the name. We want as many partnerships there as possible,” Peterson said.
For greater detail on using Hadoop with enterprise storage, I recommend the excellent series from John Webster of Evaluator Group on SearchStorage.com.
Violin Memory is providing a window into its roadmap this week at Microsoft TechEd.
Violin and Microsoft are demonstrating what the vendors call a NAS “cluster-in-a-box” with Windows Server 2012 running natively on Violin’s 6000 Flash Memory Array. Violin intends to eventually ship the product as a specialized appliance to handle enterprise file services.
Violin’s current arrays handle block storage. For the NAS box, it added two x86 Intel servers to run Windows. Windows Server 2012 gives the array snapshot, deduplication and replication features.
Other appliances tuned to specific applications will likely follow, says Violin marketing VP Narayan Venkat.
“This cluster-in-a-box is intended to deliver highly scalable file services for large enterprises and internal private clouds,” Venkat said. “It’s the first in a possible series of application appliances. We’ll release the file services one first. The others may be database-in-a-box or private-cloud-in-a-box. We have a tremendous amount of interest from other OEMs. The types of applications that would leverage the 6000 would be databases, ‘big data’ analytics or massive VDI [virtual desktop infrastructure] in a box.”
Violin VP of corporate marketing Matt Barletta said the Violin 6000 has a street price of around $6 per gigabyte to $9 per gigabyte.
Violin has raised $180 million in funding since late 2009, making it the best funded of the all-flash storage array startups. Barletta said EMC helped prime the market for all-flash storage when it spent $430 million to acquire XtremIO last month. The best part for Violin is that EMC won’t ship an XtremIO array until next year.
“My birthday is next week, and I view that as an early birthday present,” Barletta said.
Compression of data on primary storage has taken center stage in the storage wars now with IBM’s release of Real-Time Compression on the Storwize V7000 and the SAN Volume Controller.
Although not the first product to offer data reduction in primary storage, IBM raised the bar by doing compression inline (real-time) and without performance impact. Other solutions in the open systems storage area primarily compress data and sometimes dedupe it as a post-processing task after the data has been written.
Competition for storage business is intense, and inline compression of data for primary storage will be a major competitive area because of the economic value it brings customers. If the compression can effectively reduce the amount of data stored, the reduction amount serves as a multiplier to the amount of capacity that was purchased.
IBM claims a 5x capacity improvement, which gives customers five times as much capacity as they pay for. Even if IBM’s compression comes in at 2x, that would still be significant savings despite an additional license fee for the feature.
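The multiplier effect, and the way a license fee eats into it, can be worked through in a few lines of Python. Only the 5x and 2x ratios come from the text above. The array size, hardware price and license fee are hypothetical figures chosen to make the arithmetic easy.

```python
def effective_capacity_tb(purchased_tb: float, compression_ratio: float) -> float:
    """Usable capacity once data is compressed by the given ratio."""
    return purchased_tb * compression_ratio


def cost_per_effective_tb(hardware_cost: float, license_fee: float,
                          purchased_tb: float, compression_ratio: float) -> float:
    """Total spend divided by the capacity the customer can actually use."""
    usable = effective_capacity_tb(purchased_tb, compression_ratio)
    return (hardware_cost + license_fee) / usable


PURCHASED_TB = 100.0   # hypothetical array size
HARDWARE = 200_000.0   # hypothetical price in dollars
LICENSE = 20_000.0     # hypothetical compression license fee

no_compression = cost_per_effective_tb(HARDWARE, 0.0, PURCHASED_TB, 1.0)  # 2000.0 per TB
at_2x = cost_per_effective_tb(HARDWARE, LICENSE, PURCHASED_TB, 2.0)       # 1100.0 per TB
at_5x = cost_per_effective_tb(HARDWARE, LICENSE, PURCHASED_TB, 5.0)       # 440.0 per TB
```

Under these assumed prices, even the 2x case cuts the cost per usable terabyte nearly in half after the license fee is paid.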
Doing compression with no performance impact means the compression is transparent to the application and server operating system. The customer gets increased capacity benefits without having to make an accommodation such as installing another driver or version of an application. The effective compression rate will vary with data types, but there has been a long history of compressing data and the types and compression rates are not a new science. Vendors usually publish an expected average and sometimes offer a guarantee associated with the purchase.
Compression of real-time data in the mainframe world goes back to the StorageTek Iceberg (later offered as the IBM Ramac Virtual Array) that compressed mainframe count-key-data in the 1990s. That system compressed data at the channel interface and then stored the compressed information on disk.
The use of the Log Structured File system and the intelligence in the embedded storage software allowed the system to manage the variable amount of compressed data (done on a per-track level), and removed the direct mapping to a physical location. That was an effective compression implementation and demonstrated the effect that compression multiplies the actual capacity.
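A rough sketch of why a log-structured layout suits variable-size compressed data follows. This is a toy in Python using zlib, not the Iceberg design: each track is compressed, appended to a log, and found again through a map, so no track is tied to a fixed physical location. The class and method names are hypothetical.

```python
import zlib


class LogStructuredStore:
    """Toy store: compressed tracks are appended to a log and located via a map."""

    def __init__(self) -> None:
        self.log = bytearray()
        self.index = {}  # track id -> (offset, length) of the latest version

    def write_track(self, track_id: int, data: bytes) -> None:
        compressed = zlib.compress(data)
        # Always append; any earlier version of the track becomes garbage
        # that a real system would reclaim later.
        self.index[track_id] = (len(self.log), len(compressed))
        self.log += compressed

    def read_track(self, track_id: int) -> bytes:
        offset, length = self.index[track_id]
        return zlib.decompress(bytes(self.log[offset:offset + length]))


store = LogStructuredStore()
track = b"customer record " * 500     # 8,000 bytes of repetitive data
store.write_track(0, track)
store.write_track(0, track + b"!")    # an update lands at a new offset
latest = store.read_track(0)
```

Because the map, not the track number, determines where data lives, a track that compresses to a different size after an update simply lands at a new offset. This sketch omits the garbage collection that a real system needs to reclaim superseded versions.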
One of the more significant aspects of compressing data at the interface level was the effect that had on the rest of the system resources. With data that was reduced by something like 5x or 6x, the other resources in the system benefited.
• The cache capacity was effectively multiplied by that same amount, allowing for more data to be resident in cache giving higher hit ratios on reads and greater opportunity for write coalescing.
• The interface to the device had the data transfer bandwidth effectively multiplied for much faster transfer of data from the disk drive buffers.
• The disk devices, while storing more data, also would transfer more data over a period of time to the disk buffers and the controller.
Benefits similar to those gained by the implementation in the StorageTek system can be achieved in new systems targeted for primary storage in open systems.
In the case of the StorageTek system, the compression was a hardware-intensive implementation on the channel interface card. With IBM’s Storwize V7000 and SVC, the implementation is done in software, capitalizing on the multi-core processors available in the storage systems. Faster processors with more cores in succeeding generations should provide additional improvement. Having compressed data in cache and compressed data transferred on the device level interface and from the device means performance gains there offset time spent in the compression algorithm.
There are other potential areas where transparent compression could be done. Compressing the data in the device such as in the controller for solid state technology is another option.
Customers will benefit from reduction of data actually stored and the inline compression of data that is transparent to operations. The benefits are in the economics and this will be a competitive area for vendors.
There will be a considerable number of claims regarding implementations until this becomes a standard capability across storage systems from a majority of vendors. You can expect a rush to bring competitive solutions to market.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
EMC’s backup and recovery team says Hewlett-Packard is playing games with its numbers in claiming its B6200 backup system with StoreOnce Catalyst software is significantly faster than EMC Data Domain arrays with DD Boost.
HP said its StoreOnce B6200 disk target with Catalyst can ingest data at 100 TB per hour with the maximum of four two-node pairs, compared to EMC’s claim of 31 TB/hour with its new Data Domain DD990 with DD Boost. However, the B6200’s nodes are siloed. That means an eight-node system actually consists of four separate pools, and it would take an aggregate performance to get to 100 TB/hour.
In an email, an EMC backup/recovery spokesman pointed out the DD990 would achieve 620 TB per hour if measured the same way that HP measures performance. EMC’s 31 TB/hour claim is for a single storage pool, but 20 pools can be managed from one Data Domain Enterprise Manager console.
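The competing numbers are easier to compare once they are put on the same basis. The short Python calculation below uses only the figures quoted above and shows both the per-pool and the aggregate view.

```python
# Figures as quoted by the two vendors.
HP_AGGREGATE_TB_PER_HOUR = 100   # B6200 with four two-node pairs
HP_POOLS = 4                     # each pair is a separate pool
EMC_PER_POOL_TB_PER_HOUR = 31    # one DD990 with DD Boost
EMC_POOLS_PER_CONSOLE = 20       # pools managed from one console

# Per-pool view: what a single deduplication pool ingests.
hp_per_pool = HP_AGGREGATE_TB_PER_HOUR / HP_POOLS                   # 25.0 TB/hour

# Aggregate view: HP's method applied to EMC's numbers.
emc_aggregate = EMC_PER_POOL_TB_PER_HOUR * EMC_POOLS_PER_CONSOLE    # 620 TB/hour
```

Measured per pool, the HP figure is 25 TB/hour against EMC’s 31 TB/hour; measured in aggregate, it is 100 TB/hour against 620 TB/hour.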
According to EMC’s e-mail, “As lofty as they sometimes seem, we do make a concerted effort to keep our performance claims reasonable and defensible. This announcement by HP was, by contrast, very much a smoke and mirrors effort.”
The truth is that all vendor performance claims – including benchmarks – should be taken with a grain of salt because they are achieved in optimal conditions, often with hardware configurations that would bring the price up considerably. A smart backup admin knows that performance will vary, and these vendor claims need to be verified in real-world tests.
For all of its talk about smart storage this week at IBM Edge, Big Blue’s storage announcements amounted to mostly cosmetic changes. The lone exception was the addition of real-time inline compression for primary storage arrays.
IBM ported the Random Access Compression Engine (RACE) technology acquired from Storwize in 2010 into its Storwize V7000 and SAN Volume Controller (SVC) virtual storage arrays. This is IBM’s first integration of the compression technology into SAN arrays.
Until now, IBM used the technology only in its Real-Time Compression Appliances, which were re-branded boxes that Storwize sold before the acquisition. Even the Storwize V7000 launched in late 2010 lacked compression, despite its name.
Now IBM is claiming it can compress active primary data with no performance impact on SVC and Storwize V7000 storage, and says it can reduce primary data accessed via block-based protocols by up to 80%.
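A reduction percentage and a capacity multiplier are two ways of stating the same thing. As a small sketch in Python, with hypothetical function names, the conversion in each direction looks like this:

```python
def ratio_from_reduction(percent_reduced: float) -> float:
    """Capacity multiplier implied by a percentage reduction in stored data."""
    if not 0 <= percent_reduced < 100:
        raise ValueError("reduction must be at least 0 and below 100 percent")
    return 100 / (100 - percent_reduced)


def reduction_from_ratio(ratio: float) -> float:
    """Percentage reduction in stored data implied by a capacity multiplier."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 100 * (ratio - 1) / ratio


best_case = ratio_from_reduction(80)    # 5.0, an 80% reduction is a 5x multiplier
modest_case = reduction_from_ratio(2)   # 50.0, a 2x multiplier is a 50% reduction
```

So an "up to 80%" reduction claim and an "up to 5x" capacity claim describe the same best case.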
It turns out that integrating data reduction into primary storage isn’t easy. Dell bought primary deduplication startup Ocarina around the same time that IBM picked up Storwize, and has yet to port primary dedupe onto its Compellent or EqualLogic SAN arrays. Dell did launch a backup appliance using Ocarina dedupe in January, and may have a primary data dedupe announcement next week at its Storage Forum.
Other IBM enhancements include support for Fibre Channel over Ethernet (FCoE) and non-disruptive volume moves between I/O groups for SVC and Storwize V7000, and four-way clustering for Storwize V7000.
IBM added thin provisioning and Enhanced FlashCopy (allows for more snapshots) for DS3500 and S3700 midrange arrays, and a new web-based UI for the IBM Tivoli Storage Productivity Center (TPC) suite. For tape management, it added IBM Tape System Library Manager (TSLM) software that helps manage multiple libraries, and an IBM Linear Tape File System (LTFS) Storage Manager for customers using LTO-5 tape libraries and IBM’s LTFS Library Edition.
IBM also said it plans to extend its Easy Tier automated tiering software to direct attached server-based solid-state drives (SSDs) so customers can migrate data between disk systems and servers.
After seven years of partnering with Montreal-based Watch4net, EMC this week bought the software company to bolster its IT infrastructure management capabilities.
Watch4net describes its APG software as “a carrier-class performance management application that provides real-time, historical and projected visibility into the performance of the network, data centers and cloud infrastructures.”
EMC resold Watch4net software, and the software is already integrated into the EMC IT Operations Intelligence (ITOI) Suite. ITOI provides availability management, correlation and root-cause analysis for storage, networks and compute resources. Watch4net merges performance metrics from ITOI into custom reports and provides ITOI with alert information when performance thresholds are exceeded.
The acquisition gives EMC greater control over Watch4net’s intellectual property. Watch4net CEO Michel Foix and most of the company’s 70 employees will join EMC as part of its Infrastructure Management Group. EMC considers an expansion of its infrastructure management software a key part of its move to provide cloud services.
Flash storage vendor GreenBytes closed a $12 million funding round this week, led by Al Gore’s venture capital firm.
GreenBytes sells two platforms of hybrid arrays combining solid-state drives with hard drives, and in February launched Solidarity, an all-SSD array with a starting price of under $100,000 for 13.4 TB. The startup said it will use its second funding round to expand sales, marketing and channel development.
Generation Investment Management LLP, co-founded by Al Gore in 2004, led the round with a contribution from Battery Ventures and GreenBytes management. Former U.S. vice president Gore is chairman of Generation Investment, which claims to make investments based on a company’s economic, environmental, social and governance sustainability factors.
That means it’s probably more interested in the green than the bytes with its new investment. Flash vendors say they are green because their systems have a smaller footprint and use less power than spinning disk storage.
Regardless of their environmental impact, flash array vendors are pulling in the green. EMC reportedly paid $430 million for XtremIO this month, and Violin Memory, Whiptail and Starboard Storage have closed funding rounds this year.
Last week’s EMC World would have to be viewed as a major success for EMC. Customers, press, analysts, resellers and even other vendors were there, with attendance of around 20,000 people counting EMC employees. The crowd was so large that getting through the corridors of the Venetian was a moving body rub.
The access to EMC executives and staff was a real credit to the event. They provided information and fielded questions and did not just “make an appearance” and then bolt. EMC technical people were available as well, and many attendees took advantage to ask about usage in specific environments. For analysts, there was a program to meet with executives and product owners and find out about directions and new capabilities.
There has certainly been a sea change in storage events over the years. Meeting with vendors and hearing about their products or their latest announcements was done at large, multi-vendor events such as Storage Networking World (SNW) in the past. Now, the information is more available at the vendor events such as EMC World.
Most vendors have also changed their approach to releasing information. The vendor events are when the next generation of products or features is announced and future capabilities are previewed. There was a time when vendors rarely pre-announced products or capabilities. The announcements came when the products were available.
That practice was relaxed somewhat to coincide with major industry events and products or features were announced that were going to be available within the next quarter. At the major vendor events now, the announcements may be for features or products that will come out over the next three quarters. This is a much longer view and has both positive aspects such as creating interest and publicity for the vendor and negatives in that it may freeze purchases while customers wait for future releases.
EMC uses its show as what the vendor calls a “mega launch,” with significant announcements or releases involving most of its key products. It creates interest and has turned EMC World into a must-attend event where the amount of information is so great that not attending will leave people feeling they might be missing valuable information. Certainly that is the intention and it works well. The result may be that the industry-wide events such as SNW have less new information and their importance has diminished in the minds of many.
For an analyst, access to the executives and product people, along with the explanations of the announcements, is incredibly valuable. It is also a great chance to catch up with friends that you’ve worked with in the past.
The economics for these mega events are well understood by marketing – they cost a great deal and must result in increased or sustaining revenue to justify the investment. EMC has certainly set the bar for these shows, and we can expect more of the same from their rivals in the storage arms race.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).