There is a prevalent problem in Information Technology today – too much data.
Most of the data is in the form of files and called unstructured data. Unstructured data continues to increase at rates that average around 60% per year according to most of our IT clients.
Structured data is generally thought of as information in databases and this type of data is experiencing a much smaller increase in size than unstructured data. The unstructured data is produced internal to IT and from external sources. The external sources include sensor data, video information, and social media data. The growth of this type of data is alarming because there are so many sources and the information is used in data analytics that typically originates outside of IT.
The big issue is what to do with all that data that is being created. The data is stored while needed, which is during the processing for applications or analytics and while it may be required for reference, further processing, or the inevitable “re-run” in some cases. But what is to be done with the data later? Later in this case means when the probability of access drops to the point that it is unlikely to be accessed again. There are also cases when the processing is complete (or the project is complete) and the data is to be “put on the shelf” much as we would in closing the books on some operation. Does the data still have value as new applications or potential usages develop? Will there be a potential legal case that will require the data to be produced?
The default decision for most operations is to save everything forever. This decision is usually made because there is no policy around the data. IT operations do not set the policies for data deletion. Because the different types of data have different value and the value changes over time, the business owners or data owners must set the policy. IT professionals generally understand the value but usually are not empowered to make those policy decisions. Sometimes the legal staff sets the policy, which absolves IT of the responsibility, but that may not be the best option. In a few companies, a blanket policy is used to delete data after a specific amount of time. This may not withstand a legal challenge in some liability cases.
Saving all the data has compounding cost issues. It requires buying more storage, adding products to migrate data to less expensive storage, and increasing operational expenses for managing the information, power, cooling, and space. Moving the data to a cloud storage location has some economic benefit, but that may be short-sighted. The charges for data that does not go away continue to compound. Storing data outside the immediate concern of IT staff takes away from the imperative to make a decision about what to do with it.
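As a rough illustration of how the cost compounds, the sketch below projects capacity forward at the roughly 60% annual growth rate cited earlier. The starting capacity and the cost per terabyte are made-up placeholders, not figures from any client, and the model ignores price declines and tiering.

```python
def project_capacity(start_tb, annual_growth, years):
    """Return the capacity (in TB) at the end of each year when nothing is deleted."""
    capacities = []
    capacity = start_tb
    for _ in range(years):
        capacity *= 1 + annual_growth
        capacities.append(capacity)
    return capacities


# Assumed inputs for illustration only.
START_TB = 100.0          # hypothetical starting capacity
ANNUAL_GROWTH = 0.60      # the ~60% per year growth rate reported by clients
COST_PER_TB_YEAR = 300.0  # hypothetical fully loaded cost in dollars

projection = project_capacity(START_TB, ANNUAL_GROWTH, 5)
cumulative_cost = 0.0
for year, capacity in enumerate(projection, start=1):
    cumulative_cost += capacity * COST_PER_TB_YEAR
    print(f"year {year}: {capacity:8.1f} TB, cumulative cost ${cumulative_cost:,.0f}")
```

Under these assumptions, capacity is more than ten times the starting point after five years, and the bill follows it whether the data sits on premises or in a cloud.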
Besides the costs of storing and managing the data, the danger is that there may be some legal liability for keeping data for a long time. The potential for an adverse settlement based on old data is there and has been proven extremely costly. More impacting to IT operations is the discovery and legal hold required. Discovery requires searching through all the data, including backups, for requested information and legal hold means no deletions of almost anything – no recycling of backups. This causes even more operational expense.
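One way to picture a deletion policy that the business owners set and IT enforces is sketched below. The data classes, retention periods, and the `legal_hold` flag are all hypothetical; a real policy would come from the data owners and legal staff.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, set by data owners rather than by IT.
RETENTION_DAYS = {
    "project_scratch": 365,
    "financial_record": 7 * 365,
    "sensor_capture": 2 * 365,
}


@dataclass
class DataSet:
    name: str
    data_class: str
    last_accessed: date
    legal_hold: bool = False


def may_delete(item: DataSet, today: date) -> bool:
    """Return True only when the policy allows deletion of this data set."""
    if item.legal_hold:
        # Legal hold suspends all deletion, whatever the retention period says.
        return False
    retention = RETENTION_DAYS.get(item.data_class)
    if retention is None:
        # No policy for this class: fall back to keeping the data.
        return False
    return today - item.last_accessed > timedelta(days=retention)


today = date(2013, 5, 1)
old_scratch = DataSet("render-cache", "project_scratch", date(2011, 1, 15))
held = DataSet("email-export", "project_scratch", date(2010, 6, 1), legal_hold=True)
unknown = DataSet("misc-share", "unclassified", date(2005, 1, 1))

print(may_delete(old_scratch, today))  # past retention, no hold
print(may_delete(held, today))         # blocked by legal hold
print(may_delete(unknown, today))      # no policy, so keep
```

The last case mirrors the default described earlier: where no policy exists, the data stays.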
Not establishing a deletion policy that can pass a legal challenge is a failing of a company and results in additional expense and liability. IT may be the first responders on the retain-forever policy, but it is a company issue.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
LAS VEGAS — Hitachi Data Systems and NetApp wasted little time sending reviews of EMC’s new ViPR software. Both sent e-mails panning EMC’s attempt at software-defined storage.
You obviously wouldn’t expect EMC’s competitors to have good things to say about ViPR, especially competitors who also offer storage virtualization. But after hearing EMC bang the drums about it this week at EMC World, let’s listen to other opinions:
“ViPR is essentially a YASRM — Yet Another Storage Resource Manager,” wrote Sean Moser, HDS VP of software platforms product management. “Another bite at the apple for EMC after the failure of Invista and its ancestors. In ViPR terms they call this function a control plane – an attempt to provide a single management framework across all EMC storage platforms, and eventually across third party storage as well.”
He called the attempt to provide a management platform across third-party storage “a pipe dream as there’s no motivation for third-parties to write to your SRM API to allow their products to be nicely managed by a tool not of their own making. So part one of ViPR is to create an SRM tool that allows clients to use enterprise storage much as they would Amazon — a set of software APIs that abstract the detail of the underlying storage, presenting Storage as a Service. While conceptually a good idea, it will be impossible to really do outside of EMC storage.
“The other key function with ViPR is storage virtualization; the long sought storage hypervisor. However, even for EMC’s own storage platforms (at least in version 1.0), ViPR only allows control plane (i.e. management functions) for file and block. The only data plane support is for object-based storage. So for now, it’s just a new Atmos front-end that adds an SRM management layer for block and file.”
Moser maintains that the Hitachi Content Platform (HCP) had the support for file, block and object that EMC claims ViPR will have. “Further, there’s no gymnastics required to make this happen – you get it straight out of the box,” he added.
Brendon Howe, NetApp vice president of product and solutions marketing, wrote that the software-defined storage concept is a good one. But, he added, NetApp does it better in its Clustered Data OnTap operating system.
“NetApp provides this capability with our open and flexible Storage Virtual Machine (SVM) technology in Clustered Data OnTap,” Howe wrote. “[NetApp provides] hardware independence spanning NetApp optimized to commodity hardware to the cloud with Amazon Web Services. Combining the best set of software-enabled data services with programmable APIs and the broadest set of integrations is precisely how Data ONTAP became the most deployed storage operating system.”
Well, you didn’t expect EMC’s claim of being the first to provide software-defined storage to go unchallenged, did you?
EMC will make a bunch of product launches next week at its annual EMC World conference in Las Vegas. But there were upgrades the vendor couldn’t wait to announce, so it revealed a handful of data protection changes this week.
The changes centered on EMC’s RecoverPoint replication software, which uses continuous data protection (CDP) to allow data recovery to any point in time.
RecoverPoint 4.0 and the Symmetrix Remote Data Facility (SRDF) replication application for EMC’s VMAX enterprise arrays have been integrated, to the degree where customers can use them on the same volume. Previously, SRDF and RecoverPoint ran on the same VMAX system but not the same volume.
RecoverPoint’s CDP can now run across two arrays, so every change made on the volume can be recorded and replicated remotely to another array. That makes data continuously available across two arrays, allowing customers doing technology refreshes to move data without downtime.
“This represents a key step forward in our integrated data protection strategy,” said Colin Bailey, EMC director of product marketing. “CDP brings an almost infinite number of points of recovery for total data protection for critical applications.”
EMC also now offers a software-only version of RecoverPoint called vRPA for VNX midrange arrays for easier and cheaper deployment on existing systems.
Maybe Brocade has been a little over-optimistic about Fibre Channel SANs.
After Brocade executives gushed about how lucrative the FC market remains on the switch maker’s last earnings call, the vendor Wednesday said the quarter that just ended didn’t go as planned. Brocade downgraded its forecast for the quarter, mainly because of a sharp drop in its SAN revenue.
Brocade said its overall revenue in the quarter that ended Tuesday would be between $536 million and $541 million, down from its previous forecast of $555 million to $575 million. FC SAN revenue is now expected to come in between $373 million and $376 million, down six percent to seven percent from last year and 10 percent to 11 percent from last quarter. Brocade said revenue for the quarter that ends in April typically drops five percent to eight percent from the previous quarter, which includes the end-of-year budget flush from many storage shops.
According to Brocade’s press release, “the lower-than-expected SAN revenue was due to storage demand softness in the overall market which impacted the company’s revenue from some of its OEM partners.”
Two of its largest OEM partners, EMC and IBM, reported disappointing results for last quarter. EMC missed Wall Street’s estimates for revenue and IBM continued its trend of declining storage hardware sales. According to EMC CEO Joe Tucci, “customers are still being very cautious with their IT spending.”
At least Brocade’s Ethernet business is going as expected. The forecast is for $163 million to $165 million in revenue, up 14% to 15% from last year and down four percent to five percent from the previous quarter.
After Brocade’s last earnings report in February, its new CEO Lloyd Carney said his optimism about FC SANs was one of the reasons he took the job. “Fibre’s not dead anymore,” he declared.
Maybe it’s just napping. In Brocade’s release Wednesday, Carney hinted that the FC SAN revenue drop will not be permanent. “We believe that by leading the Fibre Channel industry with innovative technology and solutions that are relevant to the problems that customers face today, Brocade continues to be well-positioned for long-term success in the data center,” Carney said.
It may not help Brocade that its switch rival Cisco is rolling out its first major FC product overhaul in years, and is upgrading to 16 Gbps FC nearly a full year after Brocade.
Brocade will give its full earnings report May 16.
The term archiving can be used in different contexts. Its use across vertical markets and in practice leads to confusion and communication problems. Working on strategy projects with IT clients has led me to always clarify what archive means in their environments. To help this out, here are a few basics about what we mean when we say “archive.”
Archive is a verb and a noun. We’ll deal with the noun first and discuss what an archive means depending on the perspective of the particular industry.
In the traditional IT space such as commercial business processing, etc., an archive is where information is moved that is not normally required in day-to-day processing activities. The archive is a storage location for the information and typically seen as either an online archive or a deep archive.
An online archive is where data is moved from primary storage that can be seamlessly and directly accessed by the applications or users without involving IT or running additional software processes. This means the information is seen in the context in which the user or application would expect. The online archive is usually protected with replication to another archive system separate from the backup process. The size of an online archive can be capped by moving information based on criteria to a deep archive.
A deep archive is for storing information that is not expected to be needed again but cannot be deleted. While it is expected to be much less expensive to store information there, accessing the information may require more time than the user would normally tolerate. Moving data to the deep archive is one of the key areas of differentiation. Some online archives can have criteria set to automatically and transparently move data to the deep archive while others may require separate software to make the decisions and perform the actions.
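The criteria-based move from online archive to deep archive might look something like the sketch below. The 180-day idle threshold, the capacity cap, and the record fields are assumptions for illustration, not the behavior of any particular archive product.

```python
from dataclasses import dataclass


@dataclass
class ArchivedFile:
    path: str
    size_gb: float
    days_since_access: int


def select_for_deep_archive(files, cap_gb, min_idle_days=180):
    """Pick files to move so the online archive fits under cap_gb.

    Only files idle for at least min_idle_days are candidates, and the
    longest-idle files are moved first.
    """
    total = sum(f.size_gb for f in files)
    candidates = sorted(
        (f for f in files if f.days_since_access >= min_idle_days),
        key=lambda f: f.days_since_access,
        reverse=True,
    )
    to_move = []
    for f in candidates:
        if total <= cap_gb:
            break
        to_move.append(f)
        total -= f.size_gb
    return to_move


online = [
    ArchivedFile("/archive/q1-report.pdf", 40.0, 30),
    ArchivedFile("/archive/2009-logs.tar", 300.0, 900),
    ArchivedFile("/archive/2011-scans.zip", 200.0, 400),
]
moved = select_for_deep_archive(online, cap_gb=300.0)
print([f.path for f in moved])
```

Whether this logic lives inside the online archive or in separate software is exactly the differentiation described above.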
In healthcare, information such as radiological images is initially stored in an archive (which translates to primary storage for those in the traditional IT space). Usually as images are stored in the archive, a copy is made in a deep archive as the initial protected copy. The deep archive will be replicated as a protected copy. Based on policies, the copy in the archive may be discarded after a period of time (in many cases, this may be one year) with the copies on the deep archive still remaining. Access to the copy on the deep archive is done by a promotion of a copy to the archive in the case of a scheduled patient visit or by a demand for access due to an unplanned visit or consultative search.
For media and entertainment, the archive is the repository of content representing an asset such as movie clips. The archive in this case may have different requirements than a traditional IT archive because of the performance demands on access and the information value requirements for integrity validation and for the longevity of retention, which could be forever. Discussing the needs of an archive in this context is really about an online repository with specific demands on access and protection.
As a verb, archive is about moving information to the physical archive system. This may be the actual application that stores the information in the archive. An example of this would be a Picture Archiving and Communications System (PACS) or Radiology Information System (RIS) in healthcare. In other businesses, third-party software may move the information to the archive. In the traditional IT space, this could be a solution such as Symantec Enterprise Vault that could move files or emails to an archive target based on administrator-set criteria.
As archiving attracts more interest because of the economic savings it provides, there will be additional confusion added with solution variations. It will always require a bit more explanation to draw an accurate picture.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
Startup Nimble Storage is taking a page out of NetApp’s playbook with its private cloud reference architecture put together with Cisco and Microsoft. And it is going beyond other storage vendors’ monitoring and analytics capabilities with its InfoSight services.
This week Nimble launched its SmartStack for Microsoft Windows Server and System Center reference architecture. It includes a three-rack unit of Nimble’s CS200 hybrid storage, Cisco UCS C-Series rackmount servers and Windows Server 2012 with Hyper-V and Microsoft System Center 2012. The reference architecture is designed to speed deployment of private clouds with up to 72 Hyper-V virtual machines.
Last October, Nimble rolled out a reference architecture for virtual desktop infrastructure (VDI) with Cisco and VMware.
The reference architecture model is similar to that of NetApp’s FlexPod, which also uses Cisco servers and networking. NetApp has FlexPod architectures for Microsoft and VMware’s hypervisors. EMC added Vspex reference architectures last year, two years after NetApp launched FlexPods.
Nimble’s InfoSight appears ahead of other storage vendors’ analytics services. It goes beyond “phone-home” features to collect performance, capacity, data protection and system health information for proactive maintenance. Customers can access the information on their systems through an InfoSight cloud portal.
What makes InfoSight stand out is the depth of the information amassed. Nimble claims it collects more than 30 million sensor values per array per day, grabbing data every five minutes. It can find problems such as bad NICs and cables, make cache and CPU sizing recommendations and give customers an idea of what type of performance they can expect from specific application workloads.
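A quick back-of-the-envelope check shows what those figures imply per collection. The calculation assumes the collections are evenly spaced at five-minute intervals around the clock, which Nimble’s claim does not spell out, and it treats “more than 30 million” as exactly 30 million.

```python
MINUTES_PER_DAY = 24 * 60
COLLECTION_INTERVAL_MIN = 5            # from Nimble's claim
VALUES_PER_ARRAY_PER_DAY = 30_000_000  # lower bound from Nimble's claim

collections_per_day = MINUTES_PER_DAY // COLLECTION_INTERVAL_MIN
values_per_collection = VALUES_PER_ARRAY_PER_DAY / collections_per_day

print(collections_per_day)           # 288
print(round(values_per_collection))  # 104167
```

That works out to roughly 100,000 sensor values gathered from each array every five minutes.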
“Nimble collects a much larger amount of data than is traditionally done in the industry,” said Arun Taneja, consulting analyst for the Taneja Group. “Traditionally, an array would grab something from a log file at the end of the day. These guys are grabbing 30 million data points. Then they return that information proactively to users in the form of best practices and provide proactive alerts about product issues. I think everybody will end up there, but it might take five years.”
The National Association of Broadcasters (NAB) conference has become a big focus for storage vendors. The growth in media content and the increased resolution of recordings make for a fast growing market for storage demand. And, the data is not thrown away (deleted). Media and entertainment (M&E) industry data is primarily file-based with a defined workflow using files of media in a variety of formats.
The large amount of content favors storage archiving solutions to work with media asset management for repositories of content. But, these archives are different than those used in traditional IT. The information in M&E archives is expected to be retrieved frequently and the performance of the retrieval is important. For rendering operations, high performance storage is necessary and the sharing capabilities for the post-production processes determine product usability.
Evaluator Group met with a number of storage vendors at this month’s NAB conference. Below are some of the highlights from a few of those meetings.
• For tape vendor Spectra Logic, Hossein Ziashakeri, the VP of Business Development, talked about changes in the media and entertainment market and at Spectra Logic. He said media and entertainment is becoming more of an IT environment. Software is driving this, particularly automation tools. And the new generation of people in media and entertainment is more IT savvy than in the past. M&E challenges include the amount of content being generated. The need to keep everything is driving an overwhelming storage demand. The cost and speed of file retrieval are major concerns. Spectra Logic is a player because the M&E market has a long history with tape, which has become more of an archiving play than a backup play.
• Mike Davis, Dell’s director of marketing and strategy for file systems, said Dell’s M&E play is primarily file-based around its Compellent FS8600 scale-out NAS. Davis said M&E customers also use Dell’s Ocarina data reduction, which allowed one customer to reduce 3 PB of data. The FS8600 now supports eight nodes and 2 PB in a single system.
• Quantum has had a long term presence in the media and entertainment market with StorNext widely deployed for file management and scaling. StorNext product marketing manager Janet Lafleur said Quantum will announce its Lattus-M object storage system integrated with StorNext in May. Quantum’s current Lattus-X system supports CIFS and NFS along with objects. Quantum also has a StorNext AEL appliance that includes tape for file archiving.
• Hitachi Data Systems (HDS) had a major presence at NAB with several products on display, including Hitachi Unified Storage (HUS), HNAS and Hitachi Content Platform (HCP) archiving systems. Ravi Chalaka, VP of solutions marketing, Jeff Greenwald, senior solutions marketing manager, and Jason Hardy, senior solutions consultant, spoke on HDS media and entertainment initiatives. HDS is looking at solid state drives (SSDs) to improve streaming and post-production work. HNAS to Amazon S3 cloud connectivity has been available for two months, and HDS has a relationship with Crossroads to send data from HCP to Crossroads’ StrongBox LTFS appliances.
• StorageDNA CEO Tridib Chakravrty and director of marketing Rebecca Greenwell spoke about the capabilities of their company’s data movement engine. StorageDNA’s DNA Evolution includes a parallel file system built from LTFS that extracts information into XML for searching. StorageDNA technology works with most media asset management software now. The vendor plans to add S3 cloud connectivity.
• Dot Hill sells several storage arrays into the M&E market through partnerships, including its OEM deal to build Hewlett-Packard’s MSA P2000 system. Jim Jonez, Dot Hill’s senior director of marketing, said the vendor has several partners in the post-production market.
• CloudSigma is a cloud services provider that uses solid state storage to provide services for customers such as content product software vendor Gorilla Technology. CloudSigma CEO Robert Jenkins said the provider hosts clouds in Zurich and Las Vegas built on 1U servers with four SSDs in each. The SSDs solve the problem of dealing with all random I/Os. He said CloudSigma plans to add object storage through a partnership with Scality, which will provide geo-replication.
• Signiant sells file sharing and file movement software into the M&E market. Doug Cahill, Signiant’s VP of business development, said his vendor supports the new Framework for Interoperable Media Services (FIMS) standard and recently added a Dropbox-like interface for end users. Signiant’s software works as a browser plug-in to separate the control path from the data path.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
The massive amount of unstructured data being created has vendors pushing to deliver object storage systems.
There are many object systems available now from new and established vendors, and others are privately talking about bringing out new object systems soon.
Objects, in the context of the new generation of object storage systems, are viewed as unstructured data elements (think files) with additional metadata. The additional metadata carries information such as the required data protection, longevity, access control and notification, compliance requirements, original application creation information, and so on. New applications may directly write a new form of objects and metadata but the current model is that of files with added metadata. Billions of files. Probably more than traditional file systems can handle.
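As a sketch of what “a file plus additional metadata” could look like, the record below attaches the kinds of attributes listed above to a blob of data. The field names and values are illustrative only and do not correspond to any vendor’s object format.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str
    data: bytes
    # Metadata fields mirror the categories named in the text; the
    # names themselves are hypothetical.
    protection_copies: int = 2
    retain_years: int = 7
    allowed_readers: set = field(default_factory=set)
    compliance_tags: set = field(default_factory=set)
    created_by_app: str = "unknown"


scan = StoredObject(
    object_id="obj-0001",
    data=b"...image bytes...",
    protection_copies=3,
    retain_years=10,
    allowed_readers={"radiology"},
    compliance_tags={"HIPAA"},
    created_by_app="pacs-viewer",
)

print(scan.object_id, scan.retain_years, sorted(scan.compliance_tags))
```

The point of carrying the metadata with the data is that the storage system, not a separate application, can act on it.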
Looking at the available object storage systems leads to the conclusion that these systems are not developed to meet real IT needs. Vendors are addressing the issue of storing massive numbers of objects (and selling lots of storage), but the real problem is about organizing the information. File systems usually depend on users and applications to define the structure of information as they store the information. This is usually done in a hierarchical structure that is viewed through applications, the most ubiquitous being Windows Explorer.
We need a way to make it easier to organize the information according to a different set of criteria, such as the type of application, user (person viewing the information) needs, age of information, or other selectable information. The management should include controls for protection and selectivity for user restores of previously protected copies of information. Other information management should be available at the control view rather than through management interfaces of other applications. This seems only natural but it has not turned out this way.
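To make the idea of organizing by selectable criteria concrete, here is a minimal sketch that builds different views of the same objects from their metadata rather than from a fixed directory hierarchy. The catalog entries and metadata keys are invented for the example.

```python
from collections import defaultdict


def view_by(objects, criterion):
    """Group object names by the value of one metadata key."""
    view = defaultdict(list)
    for obj in objects:
        view[obj["metadata"].get(criterion, "unspecified")].append(obj["name"])
    return dict(view)


# Hypothetical catalog; a real system would hold billions of entries.
catalog = [
    {"name": "q1.xlsx", "metadata": {"application": "finance", "year": 2013}},
    {"name": "scan42.dcm", "metadata": {"application": "radiology", "year": 2012}},
    {"name": "q4.xlsx", "metadata": {"application": "finance", "year": 2012}},
]

print(view_by(catalog, "application"))
print(view_by(catalog, "year"))
```

The same three objects appear grouped by application in one view and by age in another, with no change to where they are stored.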
Vendor marketing takes advantage of opportunities to ride a wave of customer interest. Vendors will characterize some earlier developed product as an object file system just as today almost everything that exists is being called “software-defined something.” But the solution for managing the dramatic growth of unstructured data must be developed specifically to address those needs and include characteristics to advance management of information as well as storage.
The investment in addressing object management needs to be made; otherwise, the object storage systems will be incomplete. Linking the managing of information and the object storage systems seems like a major advantage for customers. This will be an interesting area to watch develop.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).
Silver Peak Systems Inc. is building out its Virtual Acceleration Open Architecture (VXOA) that allows storage administrators to bypass network administrators when they need to improve application performance through WAN acceleration.
The company announced Web-based downloadable software products aimed at accelerating offsite data replication workloads. The Silver Peak VRX-2, VRX-4 and VRX-8 software are virtual WAN-optimizing products that support VMware vSphere, Microsoft Hyper-V, Citrix Xen and KVM hypervisors. The virtual WAN optimization software is compatible with IP-based array replication software from Dell, EMC, IBM, Hitachi Data Systems, Hewlett-Packard and NetApp.
SilverPeak VRX-2 can handle up to per replication throughput per hour, while the VRX-4 can handle 400 GBs per replication throughput per hour and the VRX-8 handles up to 1.5 TB per replication throughput per hour. Annual licenses for each cost $2,764, $8,297 and $38,731, respectively.
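To put the hourly figures in terms a storage manager would use, the sketch below estimates how long a replication job would take at each stated ceiling. It assumes decimal units (1 TB = 1,000 GB) and sustained peak throughput, and the 6 TB job size is a made-up example. The VRX-2 is omitted because no throughput figure is available for it here.

```python
# Throughput ceilings from the announcement, in GB per hour (1 TB taken as 1,000 GB).
THROUGHPUT_GB_PER_HOUR = {
    "VRX-4": 400,
    "VRX-8": 1500,
}


def hours_to_replicate(job_gb, model):
    """Best-case hours to move job_gb at the model's throughput ceiling."""
    return job_gb / THROUGHPUT_GB_PER_HOUR[model]


JOB_GB = 6000  # hypothetical 6 TB replication job
for model in THROUGHPUT_GB_PER_HOUR:
    print(f"{model}: {hours_to_replicate(JOB_GB, model):.1f} hours")
```

For this example job, the VRX-4 would need 15 hours and the VRX-8 four hours, which is the kind of difference that decides whether a recovery point objective can be met.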
Silver Peak CEO Rick Tinsley said the VRX-8 is positioned more for large deployments such as EMC’s Symmetrix Remote Data Facility (SRDF) asynchronous product, RecoverPoint and EMC Data Domain backup. The smaller VRX versions are tailored more for Dell EqualLogic replication.
In December 2012, Silver Peak Systems brought out Virtual Acceleration Open Architecture 6.0 WAN optimization software with expanded support for virtualization hypervisors. The WAN acceleration software, which operates on Silver Peak’s NX physical and VRX virtual appliances, is part of the company’s strategy to give storage administrators the ability to improve application performance and reduce bandwidth costs without involving network administrators to reconfigure network switches and routers.
“Back in December, we did make enhancements to our software that made it easier for storage managers to deploy our technology, which we call our Velocity initiative, but it was not productized specifically for storage managers at that time,” according to a SilverPeak spokesperson. “This is the next phase and culmination of those Velocity developments, where these new VRX software products are uniquely priced and positioned with the storage managers in mind by addressing storage concerns such as ‘shrinking RPOs’ and how many terabytes-per-hour can be moved to an offsite location.”
In March, Silver Peak announced its Virtual Acceleration Open Architecture (VXOA) software can be used for WAN optimization in Amazon cloud deployments for off-site replication and lower disaster recovery costs.
A recent conversation I had about the cost of storage made me think that talking about the cost of storage is the wrong way to approach it. The discussion should be about the value that storage delivers.
Trying to explain the complex nature of meeting specific demands for storing and retrieving information and advanced features for management and access is difficult when discussing it with someone who is focused only on how much it costs to store the information.
When the discussion is only about storage costs, there is an implicit assumption that all factors are equal in storing and retrieving information. But several factors should take priority:
• How fast must the information be stored and retrieved? The ingestion rate (how fast data arrives) and how long it takes for the data to be protected on non-volatile media with the required number of copies have a big impact on applications and potential risk. Retrieving information is about how fast the data can be accessed (latency) and the number of IOPS or the continuous transfer rate (bandwidth) that can be sustained.
• What type of protection and integrity are required? Information has different value and the value changes over time. Information protection may be as simple as a single copy on non-volatile storage or as complex as multiple copies with geographical dispersion. Integrity is another concern. Protection from external forces so the loss of one or more bits of data can be detected and corrected is highly valuable and often assumed without understanding what is involved. Additional periodic integrity checking is another assurance for the information. It also answers the question posed for many in IT: “How do you know that is the same data that was written?”
• The longevity of the information can have a major influence on storing and retrieving. A significant percentage of information is kept more than 10 years. Compliance requirements dictate the length of time and manner of control of information in regulated industries. Storing information on devices that have limited lifespans (such as when you can no longer purchase a new device to retrieve information), means that other considerations must be made. If the information can be transparently and non-disruptively migrated to new technology without additional administrative effort or cost, that should be a factor in the selection process.
Here’s an example of how this works with a real IT operation that needed to increase its transactions per second. Increasing the number of transactions allowed the organization to get more done over a period of time, expand its business and provide better customer service. In this case, more capacity was not the issue – the capacity for the transaction processing was modest. After evaluating where the limitations were, it was clear that adding non-volatile solid state technology for the primary database met and even exceeded the demands for acceleration. Storage selection was not based on the cost as a function of capacity ($/GB). It was based on the value returned in improving the transaction processing and gaining more value from the investments in applications and other infrastructure elements.
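A small comparison shows why the metric matters. The two options and all of their prices, capacities, and IOPS ratings below are invented for illustration and are not taken from the operation described above.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    price: float        # purchase price in dollars (hypothetical)
    capacity_gb: float  # usable capacity (hypothetical)
    iops: float         # sustained I/O operations per second (hypothetical)

    @property
    def dollars_per_gb(self):
        return self.price / self.capacity_gb

    @property
    def dollars_per_iops(self):
        return self.price / self.iops


options = [
    StorageOption("disk array", price=50_000, capacity_gb=100_000, iops=20_000),
    StorageOption("solid state", price=80_000, capacity_gb=10_000, iops=400_000),
]

for opt in options:
    print(f"{opt.name}: ${opt.dollars_per_gb:.2f}/GB, ${opt.dollars_per_iops:.2f}/IOPS")

cheapest_per_gb = min(options, key=lambda o: o.dollars_per_gb).name
cheapest_per_iops = min(options, key=lambda o: o.dollars_per_iops).name
print(cheapest_per_gb, "wins on $/GB;", cheapest_per_iops, "wins on $/IOPS")
```

With these made-up numbers the ranking flips depending on the metric, so a transaction-bound workload judged on $/GB alone would pick the wrong system.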
Storage must be evaluated on the value it brings in the usage model required. Comparing costs as a function of capacity can make for bad judgments or bad advice.
(Randy Kerns is Senior Strategist at Evaluator Group, an IT analyst firm).