Arun Taneja and I have focused on the topic of data protection management (DPM) in recent posts. He sees what I also see – DPM software is undergoing a significant transition in purpose. While DPM software is not new and companies like APTARE, Agite Software, Bocada, Tek-Tools, ServerGraph and others have offered it for years, only now are vendors and customers figuring out how to use it in a larger context within organizations.
Companies tend to think of and use DPM software in only a single context – day-to-day operations. For the most part, this software does a good job of monitoring and reporting on the successes and failures of backup jobs, identifying failed tape drives and tracking the utilization of media in tape libraries. However, this has not raised DPM software’s value proposition much beyond the purview of the day-to-day operations staff.
Now, some DPM vendors are reworking their messaging to make their products more appealing to a larger corporate audience – with capacity planners and storage architects their primary focus. Part of the motivation for this change is that more companies want to bring disk into their backup scheme. But, these organizations lack information on their current backup environment to confidently make these types of changes to their infrastructure.
For many companies, it’s a roll of the dice as to how well a new disk library will work in their environment. I have spoken to more than one user who purchased a disk library with a large amount of disk capacity only to find out the controllers on the disk library could not keep pace with the amount of data the backup software fed to it. This required them to purchase additional disk libraries, which created new sets of management problems.
DPM software can help to address these types of sizing issues by quantifying how well current backup resources are being used and trending their use over time. This allows companies to select and implement appropriately sized disk libraries based on facts, not assumptions. It may also give them the facts they need to justify waiting on a disk purchase, since they may identify better ways to utilize their existing tape assets.
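To make the idea of trending concrete, here is a minimal sketch in Python of the kind of calculation involved. It is not taken from any vendor's product. The function names, the weekly sampling interval and the straight-line growth model are all assumptions made for illustration. It fits a least-squares line to past utilization samples and estimates how long until a given capacity is reached.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for evenly spaced samples.

    `samples` is a hypothetical list of utilization figures (for example,
    TB written to backup media per week), oldest first.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def weeks_until_full(samples, capacity):
    """Estimate intervals remaining before the fitted trend hits `capacity`.

    Returns None when utilization is flat or shrinking, since a straight
    line then never reaches the capacity.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    fitted_now = intercept + slope * (len(samples) - 1)
    return max(0.0, (capacity - fitted_now) / slope)


if __name__ == "__main__":
    weekly_tb = [10, 12, 14, 16]          # made-up sample data
    print(weeks_until_full(weekly_tb, 26))
```

A real DPM product would presumably work from far more data points and a more careful growth model. The point is only that a sizing decision can rest on a measured trend rather than a guess.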
Performing trending and capacity planning is very different than sending out an alarm that a tape drive or a backup job has failed. Companies need to be sure that vendors are actually delivering a product makeover regarding reporting and analysis capabilities and not simply covering up their product’s deficiencies with its latest rendition of marketing literature.
Data Robotics, a new storage startup just coming out of stealth, has announced both itself and its first product today. Data Robotics was formerly working under the name Trusted Data, and is a venture led by BlueArc founder Geoff Barrall. The product is a consumer storage unit called Drobo that Data Robotics says it intends to scale up into the enterprise market.
Why should you care? Because if Data Robotics has its way, Drobo could change the concept of RAID storage.
Here’s how it works: the Drobo box, a black cube that will fit on a desk, contains four disk bays. Any SATA disk drives from any manufacturer of any size can be added and the box will automatically stripe data across them; the box uses no management software, and instead has a system of lights which show red, yellow or green. If it’s red, replace the disk. If it’s yellow, the disk is filling up. Green and all is well.
When disks fill up, they can be swapped out for a larger size and data is restriped automatically, using RAID levels that change according to the disk capacity left over. For example, in a system with 120 GB, 250 GB, 500 GB and 750 GB drives installed, 120 GB of each disk would use RAID 5 striping in a 3+1 configuration. As the 120 GB drive fills up, the system would put the remaining capacity into a 2+1 configuration, and then finally into a mirrored pair (in case you’re doing the math at home, the final 250 GB of the 750 GB drive would remain empty in that scenario).
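For those doing the math at home, the layering in that example can be sketched in a few lines of Python. This is only an illustration inferred from the example above, not Data Robotics' actual algorithm. The function name and the rule of slicing every non-empty drive at the size of the smallest one are assumptions.

```python
def layer_drives(capacities_gb):
    """Split mixed-size drives into RAID layers, smallest drive first.

    Each pass takes a slice the size of the smallest remaining drive from
    every drive that still has free space. With three or more drives the
    slice is treated as RAID 5 (one slice's worth of parity); with two
    drives it is treated as a mirror. Space left on a single drive cannot
    be protected and is reported as unused.
    """
    remaining = sorted(capacities_gb)
    layers = []
    while True:
        active = [c for c in remaining if c > 0]
        if len(active) < 2:
            break
        slice_gb = min(active)
        n = len(active)
        usable_gb = slice_gb * (n - 1)   # one slice goes to parity or the mirror copy
        kind = "mirror" if n == 2 else f"RAID 5 ({n - 1}+1)"
        layers.append((kind, slice_gb, usable_gb))
        remaining = [c - slice_gb if c > 0 else 0 for c in remaining]
    return layers, sum(remaining)


if __name__ == "__main__":
    layers, unused_gb = layer_drives([120, 250, 500, 750])
    for kind, slice_gb, usable_gb in layers:
        print(f"{kind}: {slice_gb} GB per drive, {usable_gb} GB usable")
    print(f"unused: {unused_gb} GB")
```

Run against the four drives in the example, this produces a 3+1 layer of 120 GB slices, a 2+1 layer of 130 GB slices, a mirrored pair of 250 GB slices, and 250 GB left unused on the largest drive, which matches the description above.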
Data Robotics president Dan Stevenson said the system was designed for the non-technical customer–it could perhaps even be termed the extremely non-technical customer. Hence the lights. “If you can figure out a traffic light, you can figure out how to manage this storage box,” Stevenson said.
Of course, you may have to understand a bit more than that to know that you shouldn’t take the 750 GB drive out to replace it without enough capacity left in the remaining disk slots to absorb the data while it’s missing. And Stevenson said so far Drobo isn’t pursuing any distribution deals with Best Buy, instead presenting this as an alternative to homegrown RAID arrays for BitTorrent-addicted power users or “prosumers”, professional digital photographers, and small businesses where, to quote Stevenson, “the IT guy is Frank’s son.”
One such company is Michael Witlin Associates (MWA), a five-man company that produces corporate events in Silicon Valley. Its owner, namesake and executive producer received a free eval copy of Drobo and says it’s been humming along nicely in the few months he’s tested it, in contrast to a Maxtor-drive-based RAID 5 array he struggled to manage. Every so often partitions on that array would “drop off” in Witlin’s user interface, “And I could never figure out what the problem was,” he said. Drobo plugs in via USB 2.0–something Witlin said he prefers–and “there’s nothing I have to do after that.”
Meanwhile, Data Robotics is aiming to take its approach to RAID to the big time. Stevenson envisions a Drobo-ruled utopia in which lower-paid “Tier 3” IT admins manage Tier 3 nearline storage that is automatically striped using Drobo’s RAID method, and need only the expertise to observe and act on a red, yellow or green light.
“Their algorithms could have a huge impact on enterprise storage, as well,” wrote Brad O’Neill, senior analyst with the Taneja Group in an email. “If you can begin to create heterogeneous arrays right down to the drive level, with no interruption of availability or performance, you’ve done something extremely disruptive to the market–capacity upgrades would become the equivalent of simply plugging in drives, waiting for a green light, then adding more. I could imagine large service providers using drive robots running swaps and upgrades not unlike tape robots do today.”
In the interest of full disclosure, O’Neill’s not just an enthusiastic supporter of Drobo’s vision, but also of their bottom line: “I have a Drobo plugged into my laptop right now via USB providing 700 gigabytes of…storage…with four different drives of varying capacities and vendors,” O’Neill confessed. “I [also] bought four of them and gave them to my friends.”
(His friends are probably used to those special Christmases with Brad…)
You have seen my writings on (and may even have heard me speak about) the Cross Correlation (CC) analytics engine as a necessary part of a Data Protection Management (DPM) product. DPM products make your backup and restore environment work more efficiently. Recently, I have seen the application of CC techniques to solve problems on the primary storage side. And much to my pleasure, I have also seen the technique applied to manage application performance.
Several players are delivering products in the DPM market including Aptare, Bocada, Illuminator, Servergraph, Tek-Tools and WysDM, and most recently, Symantec, with their NetBackup Reporter product. These products, as a category, are delivering real value, based on my conversations with many of you. EMC, who resells WysDM as Backup Advisor, is apparently shipping in large quantities. All big data protection vendors have gotten religion on this recently, and they are all scrambling to add DPM functionality via in-house R&D or through a partnership.
To be sure, not all products are created equal in terms of the strength of the CC engine (or even the existence of one), which to me is the essence of the product. Without a sound CC engine, the best a product can do is rudimentary analysis and basically report on changes.
I have seen two new and interesting uses of CC recently. First, WysDM announced WysDM for File Servers. Essentially, that means the same CC engine is being used to look at NetApp filers (primary storage) to determine if the filer is behaving as it should. Much as before, the product gathers data from the application and through all hardware and software layers that reside between it and the filer, and applies analytics to determine if the system is behaving within acceptable boundaries. Are response times to file requests deteriorating? Is capacity being utilized efficiently? Is a file system ready to run out of storage? What needs to be done to solve the problem? Will an additional GE connection make a difference? You get the point.
I know you are probably saying to yourself, “Don’t I get some of that information from the filer’s integral management tool?” Of course, you do. But, just like on the data protection side, the amount and type of information about the environment that was being delivered before this tool was available was rudimentary and static. Unless one steps outside of the filer and looks at the entire picture from end to end, it is hard to determine the root cause of a problem that exists or is in the making. That can only be done with a sophisticated CC tool. And only a sophisticated tool will give you predictive information with a high degree of confidence.
Another company that has applied CC to the primary storage is Illuminator Software, whose DPM product now includes functionality about snapshots and replication. But, the product is still true to its data protection roots. In this case, the product provides information on the readiness of volumes from a data recoverability point of view. Whether the volume is protected using snapshots or replication or secondary disk or tape, its recoverability is established and reported on. The product also offers advice on the actions necessary to improve recoverability.
The third company, Akorri Networks, has applied a CC engine for an entirely different purpose: to provide insight into application performance. Of course, application recoverability is improved when application availability is improved so there is an underlying connection here. But, the overt focus is to provide insight into how storage resources are being used to deliver a certain level of performance at the application level. In other words, given a particular SLA for an application, does one have adequate or inadequate storage resources applied? Would extra resources (higher throughput storage, more storage, another pipe to storage, etc.) help to bring application performance back into SLA boundaries? Or would it be a waste? What would help the most? With this kind of information the right type and quantity of resources can be applied thus saving time and resources.
The progress in these areas has been truly phenomenal in the last three years, and yet, we are still in the infancy stages of utilizing these tools. Most of these technologies have become available from smaller companies, whose reach is limited. Given that your environment is only getting more complex, it behooves you to check these out! Send me an email if you need any help.
Why is it still acceptable to the majority of us to protect and recover our data like we did in the 1980s? Obviously, backup software has evolved in the last 20 years to perform differentials, incrementals and synthetics, integrate with most major databases, take advantage of array-based snapshots and do SAN-based backups. But, at the end of the day, many data protection products still lag the wide-scale user desire – may I even suggest requirement – for near-instant recoveries.
It strikes me as ironic that low-tech industries like fast-food can serve up a hamburger in 30 seconds or less while those of us who work in the technology industry can’t recover data for many of our company’s applications in the same amount of time or less. The guys running the hamburger joint at some point figured out that they made more money and were more productive making hamburgers every 30 seconds than they did every 90 seconds. We should minimally seek to be as productive.
The fast food guys also managed to figure out that letting you fill your own drink and go back for free refills was cheaper and faster than dedicating two people behind the counter to do the same thing. So, why can’t we high-tech folks figure out a way to empower our users to recover data rather than always requiring storage administrators to perform this task for them?
Now, this is not meant to diminish the value that storage administrators provide or to say that data protection is akin to serving up a hamburger. Obviously, all data is not created equal and you don’t want just any user to be able to access and restore data for a mission-critical production environment. There is still too much complexity and the ramifications – financial, political and technical – if anything goes wrong are potentially enormous. But, should recovering a file on a file server in 2007 really require a call to the help desk, a storage administrator and a wait time of 30 minutes or longer?
Near-instant recovery of data in the 21st century should no longer be reserved for just applications deemed “mission-critical”. Companies have too few employees and too many applications running on too many different servers to possibly keep track of which applications are mission-critical and early indications are that the emerging world of virtual servers will only exacerbate this situation.
Now, I am not suggesting one immediately abandon one’s current backup software product in favor of new products like CommVault’s Continuous Data Replicator, NetApp’s Topio Data Protection Suite or InMage’s DR-Scout that can deliver near real-time data replication and recovery. Everyone should be careful with their data and proceed cautiously with any of these new products, because they all take time to implement and tune to your environment.
But, we should keep in mind this is 2007, not the 1980s, and there is a risk associated with not moving forward. Just as your computing environment has changed, new data protection technologies are available that are better suited for today’s environment. Unfortunately, if your company has not changed its fundamental approach to data protection and how it protects and recovers data, odds are your company is operating at a disadvantage when it comes to providing your users a level of service that in this day and age they should not have to ask for but should expect.
Hitachi Data Systems announced a major upgrade to its Content Archive Platform, which SearchStorage.com News Director Jo Maitland reported on today. If you’re considering adopting an archiving product, you might want to check out the second chapter of the Data Retrieval Research Guide, which we published this week. It highlights the key issues involved with retrieving data from archives, with lots of information on CAS and deduplication.
We also recorded a podcast on email archiving a while back, which offers information about archiving with CAS and dealing with unstructured data. Check that out below.
Managed service provider RenewData briefed us today on its launching of a data migration service specifically for transferring email archives from one archiving product to another while maintaining a legal chain of custody.
Renew has partnerships with EMC, Symantec and CA (for the former iLumin product), which allow its proprietary data migration software to bypass the archiving application and extract data directly from the archive for quicker transfers. According to James Smith, vice president of enterprise solutions for RenewData, the company had already begun offering these migration services on an on-demand basis for customers and Smith says Renew has performed dozens of migrations already — the formal packaging and marketing of the service is what’s new.
There were no firms willing to speak to the press about their use of the service, but the fact that Renew anticipates a market for such a service is interesting evidence of the influence that e-discovery and email archiving in particular have in the storage industry of late. It’s difficult to tell what it means at this point if there’s a large market for assisted migration between email archiving tools–would it mean that users are not making the best choice of archiving systems the first time? Or would it mean that email archiving systems are not delivering on their promises?
The bottom line is that this service is anticipating at least some market because in many ways email archiving, as well as migration between archives, can be a painful and proprietary exercise. According to Smith, the service can be used to create a “baseline” copy of data in intercustodial deduplicated format. The service can also export to “standard” formats such as HTML or XML. However, most often the service has been used to migrate from one proprietary archive to another, according to Smith.
“Very few products out there archive the pure message file,” he said. “They put it in their own format so that it’s more painful to migrate away.”
If you think data protection management (DPM) tools that monitor and report on backup successes and failures are going to disappear with the introduction of virtual tape libraries (VTLs), think again.
It is easy to view DPM tools only in the context of an all-tape environment, since that is where most backup troubles originate and where most of the tools’ value is derived. However, this can lead one to mistakenly assume that by bringing in a VTL, one can eliminate both the more vexing problems associated with backups and the need for DPM software. Unfortunately, VTLs create their own unique sets of problems that require users to keep DPM software available to help them identify and report on these issues.
This was made abundantly clear to me in a case study that Agite Software recently shared with me, in which a company had installed and tested Agite Software’s backupVISUAL DPM software. This company used it to monitor their backup environment. They had recently begun to use a Sepaton VTL and wanted to document to what degree the backup situation had improved since they switched from tape to disk. Much to everyone’s surprise, backups to the Sepaton VTL were still failing 30% of the time.
But, this was neither a Sepaton nor a backup software problem, per se. It was an oversight on the part of the administrators. The company had decommissioned the tape drives that some servers were previously using as their backup target, so those servers had nowhere to direct their backup jobs, and the subsequent jobs failed.
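The kind of check that would have caught this oversight is simple to sketch. The Python below is purely illustrative and is not how backupVISUAL or any other DPM product works. The job names, device names and data structures are hypothetical. It compares each backup job's configured target against the list of devices still in service.

```python
def jobs_with_missing_targets(jobs, active_devices):
    """Return the names of backup jobs whose target device is no longer in service.

    `jobs` maps a job name to the device it writes to, and `active_devices`
    lists the devices currently available. Both are hypothetical inputs that
    a real tool would have to gather from the backup software's configuration.
    """
    active = set(active_devices)
    return sorted(name for name, target in jobs.items() if target not in active)


if __name__ == "__main__":
    jobs = {
        "fileserver-nightly": "tape-drive-1",   # still pointed at a retired drive
        "mail-nightly": "vtl-1",
    }
    print(jobs_with_missing_targets(jobs, ["vtl-1"]))
```

Running a comparison like this before decommissioning hardware, rather than after the failures pile up, is the sort of cross-check a reporting tool makes routine.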
While this is obviously an extreme case (and I am sure one that Agite Software brought to my attention to demonstrate the value of their product), it does illustrate that there is always more to consider when purchasing any new product than just plugging it in and letting it rip.
In today’s environments, where everything is so interconnected and interdependent, no one should believe any vendor’s claim that their product is “Plug’N’Play”. And even if everything appears to work fine on the surface, rest assured that any level of examination will almost always unveil some blatant gaps in service and performance.
Things have kicked off in earnest out here in the Windy City at this year’s Storage Decisions conference. Today was the first full day of sessions, and attendees heard discussions of hot topics from blue-chip companies including United Airlines, Federal Reserve Bank, and Bank of America.
Gary Pilafas, managing director of enterprise architecture for United Airlines (UAL), gave a presentation this morning about his company’s DR plans, much of which centered around classifying data according to criticality, and setting disaster recovery levels appropriately, a common trend in DR of late. Pilafas said he steered application admins away from insisting on Tier 1 DR (after all, no application admin wants to say his data isn’t of top importance) by emphasizing cost.
On this he was challenged by Michael Thomas, storage architect for the Federal Reserve, who said he’d seen that kind of planning go awry in some cases after 9/11 and Hurricane Katrina. “Some business units had [scaled back] DR plans based on cost, but then their SLAs didn’t match their true business requirements,” Thomas said. “They still expected IT to respond, and we did, but not in as timely a manner as they would have liked in some cases.”
Pilafas acknowledged that getting a true sense of business requirements and managing application interdependencies made tiering for DR a tricky project. However, he said UAL is currently testing service-bus software products including IBM’s WebSphere MQ and BEA’s AquaLogic, layered over Hitachi Data Systems’ Universal Storage Platform for a services-oriented architecture. That plan, he said, will decouple data services from individual business units, specific applications or devices, eliminating the issue of application interdependencies. He said it will also go a long way toward addressing the confusion about business units and their priorities. “This way we can discuss each business unit’s priorities, map it back to services, and the higher-priority services float to the top,” he said. “It’s like taking the opposite of the lowest common denominator.”
Thomas himself had a different approach to making DR plans more effective, which is to go back to the drawing board with testing. “One of the big problems in this industry is that a lot of people don’t really test their DR plans,” he said. “They send people out a week in advance and prepare, and then test.” Thomas advocated more spontaneous tests and recounted one test in an earlier position where employees were “toe-tagged” at random to more realistically simulate a disaster scenario.
Meanwhile, if there’s anything that requires as much careful planning and precise procedure as DR, it’s e-discovery, and on hand with a keynote speech on that subject was Daniel Blair, e-discovery, investigation and incident support within the information security and business continuity division of Bank of America (say that five times fast).
Among the nuggets offered by Blair was the estimate that for every 1 GB of data produced for e-discovery, 6.25 GB of storage space is needed for multiple working copies, indexing and conversion to TIFF format as well as the production of copies for opposing counsel. BOA’s approach to cutting down on storage costs is to put the original “golden” copy of data onto lower-performing, high-capacity SATA disk (backed up vigorously, of course) and use higher-performing FC storage for the processing.
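As a quick back-of-the-envelope illustration of that ratio, the arithmetic looks like this in Python. The 6.25 factor is Blair's estimate as reported above. The function name and the 400 GB figure are made up for the example.

```python
# GB of storage needed per GB of data produced, per Blair's estimate
EDISCOVERY_STORAGE_FACTOR = 6.25


def ediscovery_storage_gb(produced_gb, factor=EDISCOVERY_STORAGE_FACTOR):
    """Estimate total storage needed to support a production of `produced_gb`."""
    if produced_gb < 0:
        raise ValueError("produced_gb must not be negative")
    return produced_gb * factor


if __name__ == "__main__":
    # A hypothetical 400 GB production would call for roughly 2.5 TB of storage.
    print(ediscovery_storage_gb(400))
```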
Blair wasn’t able to discuss specifics because of the sensitive nature of corporate litigation, but he did say that so far, he has yet to find a single comprehensive product for e-discovery. He also said that BOA uses a combination of in-house work and outsourcing, specifically with TIFF conversion, to lighten the workload and save financially.
Ultimately, though Blair said the new federal rules of civil procedure could make e-discovery a more bearable undertaking (since they recognize a “good faith” effort to preserve data), further attention on e-discovery means that more savvy practitioners will find new ways to key on process vulnerabilities during a lawsuit.
As the pressure grows, Blair said there’s plenty of room for improvement in the technology space. “Real-time indexing, content categorization, records management for the lifecycle, true policy-based management, and better scalability,” he listed off immediately when asked for ideas.
One other item of note: Compellent was the name on everybody’s lips during the expo on the show floor tonight. Users said they had always liked Compellent’s automated tiered storage feature, but it had taken some time to see more customer traction in the market and product maturity for the emerging company.
So, what are you hearing at the show? Give us your thoughts in the comments section.
Is your data on fire? In this case, I am not talking about how frequently your data is accessed or how valuable the information contained in your data is. I am talking about data that is literally on fire.
Why do I ask? This week I am attending the PRISM International conference (www.prismintl.org) in Savannah, Ga., and one focus of the conference is the lessons learned from last year’s Iron Mountain fire in London. In the first of two sessions on this topic, one of the questions asked was how many records management companies have had fires in their facilities. Out of about 200 to 250 attendees in the room, two or three raised their hands. Sure, that’s only about 1% of those in attendance, but from my perspective, that is a lot. And judging from the soberness of those in attendance, their sentiments would seem to match mine: the entire records management industry, and Iron Mountain in particular, is taking this occurrence very seriously and taking steps to prevent it from occurring again.
To its credit, Iron Mountain adopted an attitude of full disclosure and cooperation with the public fire officials in the U.K. The study by an outside independent consultancy found that Iron Mountain’s fire and security systems were properly maintained but its building services were not. That sounds worse than it is. It means items like pallets or a dumpster with flammable materials (cardboard, paper, etc.) were too close to the building. In this circumstance, if a fire does get started, even with these other systems in place, the fire now has a fuel source and a steady supply of oxygen, which can overwhelm the other systems and lead to a catastrophic loss, as in Iron Mountain’s case.
What is most disconcerting is that in London, according to Mike Murphy, a director with Osborn Associated, Ltd., the independent fire protection consulting firm in the U.K. that assisted with the Iron Mountain investigation, 60% of fires are the result of arson. Unfortunately, the statistics in the U.S. are similar. According to the most recent statistics on the U.S. Fire Administration’s Web site, there were 31,500 intentionally set fires in 2005, which caused 315 deaths and $664 million in structural losses.
So, what does this mean for the rest of us? We should not assume we are immune from something similar happening either to our records management provider or even to our own facility. We need to make sure the grounds around our own company’s facilities are clear of flammable debris of any kind. While debris obviously cannot catch fire by itself, with 50% of the fires in the U.S. set by juveniles, why give them any temptation? Also, be sure to ask your records management provider to do the same, and maybe even occasionally drive by and check out their facility to be sure they are doing so, because their standards for protecting your offsite data should be no less than your own.