Last week, Tory Skyers wrote a post about the unforeseen complexities of disaster recovery after his PC’s motherboard fried. This week, I had an interesting discussion with somebody working with a much bigger enterprise infrastructure who also found that people, process, and sometimes luck–good and bad–can influence disaster recovery planning more than any technology.
Mark Zwartz, manager of information technologies for privately held real estate conglomerate JMB Companies, has been signed on with SunGard’s Availability Services for virtual server-based DR since August. Last week Zwartz gave me a deeper look behind the scenes at his disaster recovery planning process, and the ways it wasn’t so simple.
For one thing, it might not have happened at all without a contract re-negotiation with SunGard. “Our original contract was a wacky deal with [one subsidiary] that expanded to the other entities, but the contracts were so goofy and webbed into each other that if we called with a problem or needing to fail over, the people at SunGard might not have any idea which company or machines we were talking about,” he said. “We saved money on lawyers and negotiations, but quite honestly, if we had to failover, nobody at SunGard might’ve known what to turn on.”
The renegotiation of that contract happened to coincide with the beta program for the SunGard service. “That was the foot in the door to change things,” Zwartz said. Because it was a beta program, the companies got three free months of testing, which caught the attention of Zwartz’s management.
Meanwhile, JMB was in the midst of a hardware refresh, as well as rolling out virtualization. Still, “one of the hardest parts was selling virtualization–these are highly intelligent people who are handling millions if not billions of dollars, and it’s hard to explain the concept that they don’t actually ‘own’ anything [with virtual servers],” he said.
A broker at a hedge fund firm in the conglomerate was highly reliant on Outlook contacts and notes in each contact file to do her daily business. Meanwhile, the company still worked with traditional tape backup, which wouldn’t offer the granular protection to recover all of those contacts in the event of an outage.
I’m sure most of you out there in blogland know what happened next. “She was synching her BlackBerry herself, and blew out her contacts,” according to Zwartz. The only option for restoring Exchange backups was to restore the entire Exchange database from tape to a separate server, which the company had declined to buy. “They lost a significant part of a trade before that,” he said. “Nobody realized such a small thing would make her unproductive.”
Of course, without a fully staffed and replicated secondary environment, Zwartz acknowledged, it’s impossible to be disaster-proof. But the incident also had a silver lining when it came to convincing management to participate in the new SunGard program. “It would’ve cost $1,500 to get a new backup device,” he said. “It wound up taking six weeks to get one contact back and cost between $15,000 and $20,000. It was a selling point when it came to virtual servers with SunGard.”
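The arithmetic behind that selling point is easy to sketch. The snippet below uses only the figures Zwartz quoted; the variable and function names are illustrative, and it leaves out staff time and the lost trade, which he did not put a number on.

```python
# Figures quoted by Zwartz; all names here are illustrative only.
backup_device_cost = 1_500    # the purchase the company declined, in dollars
recovery_cost_low = 15_000    # low end of what the recovery actually cost
recovery_cost_high = 20_000   # high end of what the recovery actually cost


def cost_multiple(actual: float, avoided: float) -> float:
    """Return how many times larger the actual cost was than the avoided one."""
    return actual / avoided


low = cost_multiple(recovery_cost_low, backup_device_cost)
high = cost_multiple(recovery_cost_high, backup_device_cost)
print(f"Recovery cost {low:.1f}x to {high:.1f}x the price of the device")
```

By those numbers, getting one contact back cost roughly 10 to 13 times what the backup device would have.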
NetApp today quietly pulled the plug on its SnapMirror for Open Systems (SMOS) heterogeneous data replication software, acquired from startup Topio for $160 million in 2006.
In a press release posted on the NetApp Web site – but not distributed – the vendor said it would discontinue SMOS and close the former Topio development facility in Haifa, Israel on Jan. 15. According to the release, NetApp “has not made final employment decisions” on the 51 employees in Haifa.
NetApp acquired Topio for its Data Protection Suite, at least partly in response to EMC’s purchase of Kashya six months before. But while EMC built Kashya’s replication and CDP into RecoverPoint – a staple of its replication platform – Topio’s heterogeneous replication offering never caught on, even after NetApp re-released it as ReplicatorX and then SMOS.
NetApp’s release blames the product’s failure on a lack of interest in replication between multiple vendors’ products. “Our decision to terminate SMOS product development was based on customer priorities and actual purchase histories,” the release said. “The market for replication products for disaster recovery purposes is dominated by homogeneous, rather than multivendor, solutions. Our ‘any-to-any’ solution with SMOS was never adopted by customers in the way we anticipated.”
NetApp added that it remains committed to its other SnapMirror versions for “any-to-NetApp” data protection. SMOS customers will get three years of maintenance and technical support.
Pillar’s CEO Mike Workman dropped by our office today, and said that while Pillar retains its earlier reservations about SSDs not being best utilized behind a network loop, the company will support them next year. The systems vendor will not have an exclusive partner for the drives, Workman said, though he mentioned Intel as one supplier.
Pillar’s Axiom arrays separate disk capacity from the storage controller with components called bricks (disk) and slammers (controllers). Workman says the Axiom will support SSDs in the bricks, and the arrays’ QoS features will be updated to support moving workloads to SSDs. This can happen either according to policy or automatically (with prior user approval) when the system is under intense workload.
This is definitely a change in tune, though Workman has always said Pillar’s systems were capable of supporting SSDs and probably would. He just thought network latency was too great, and he hasn’t retreated from that position. “It’s there,” he said. “There’s no way to get around that.”
But Workman says the biggest obstacle to SSD now is price. “When we show people how they work, they say, ‘Fine,’” he said. “Then we tell them how much it costs, and that’s when they keel over.”
Despite offering an 80% utilization guarantee earlier this year, Workman said only about 15% of Pillar’s customers are at 80%. But the company hasn’t been paying out lots of guarantee money, either. The details of the offer are vague to begin with: the terms are negotiated on a case-by-case basis, “to remediate any issue, as well as financial pain.” The terms of the guarantee would have to be negotiated as part of the original sale.
Analysts also said users might be wary of pushing utilization that high given that it requires capacity planning to be precise. “I can’t make [customers] write data to the system,” Workman said. “The guarantee was not that they will but that they can.” He added that the average Pillar customer’s utilization currently is 62%.
With sales of EMC storage systems sold through Dell on the decline, EMC CEO Joe Tucci declared last October “there’s a lot more we could and should be doing together” to strengthen the EMC-Dell relationship.
Today, Dell and EMC say they’ve extended their agreement to co-brand Clariion midrange SAN systems to 2013 and added EMC’s Celerra NX4 to the deal. Whether that’s doing “a lot more” or not will depend on how well Dell does with the NX4, which gives Dell another NAS product to sell along with its Windows-based NAS systems. Dell will begin selling the NX4 early next year.
Dell and EMC are calling the new deal a five-year extension, although it’s really a two-year extension because their previous deal was to run through 2011.
“The EMC-Dell relationship has been extremely successful,” says Peter Thayer, director of marketing for EMC’s multi-protocol group. “If you look at the storage industry, you won’t find any relationship as successful as this.”
He’s probably right, considering how many Clariion systems Dell has sold in the seven years since the co-marketing alliance began. But the relationship hasn’t been the same since Dell acquired EqualLogic for $1.4 billion in January, giving it its own midrange SAN system.
EqualLogic’s products are iSCSI only, so Dell still relies on Clariion for Fibre Channel SANs. But it’s probably no coincidence that Dell has sold fewer Clariion systems since picking up EqualLogic. Overall, EMC’s revenues from Dell declined 26% year over year last quarter, and Dell has gone from 15.8% of EMC revenue to 10.4% in a year. Dell accounted for 35% of Clariion revenue a few years back, but less than 30% last quarter.
Wachovia Capital Markets financial analyst Aaron Rakers calls the extension good news because it could end speculation about the two vendors’ commitment to each other. “There have clearly been increased questions surrounding this partnership throughout 2008, or rather since Dell has increasingly focused its attention on driving its EqualLogic business,” he wrote in a note to clients.
Financial analyst Kaushik Roy of Pacific Growth agrees that investors have been concerned about the direction the EMC-Dell relationship is going and the extension should help, but he isn’t sure about how much.
“We will have to wait to see if the relationship bears any fruit,” Roy said of the extension. “While we are not expecting EMC’s revenues from Dell to ramp up materially, investors would be happy if revenues do not decline precipitously.”
EMC is also involved with Dell’s plans to add data deduplication next year. While it hasn’t disclosed any products yet, Dell last month said its dedupe platform will include Quantum software and be compatible with EMC disk libraries.
IDC’s quarterly tracker numbers for the third quarter of 2008 show disk storage and storage software sales holding steady at a time when many industries are feeling the effects of recession.
According to an IDC press release, “worldwide external disk storage systems factory revenues posted 8.8% year-over-year growth totaling $4.9 billion…total disk storage systems market grew to $6.6 billion in revenues, up 1.1% from the prior year’s third quarter, driven by softness in server systems sales.”
Meanwhile, the storage software market grew year-over-year for the 20th consecutive quarter with revenues of $3.1 billion, up 11.6% over last year’s third quarter.
On the disk side, companies with server businesses showed declines. IBM disk revenue declined 18.1%, Dell dropped 8.7% and HP was down 0.5% in overall disk system revenue (including servers). Fujitsu Siemens and NEC also declined. But for external (networked) storage, HP increased 3.3% and Dell was up 8.6% over last year. Storage system-only vendors EMC (16.2%) and NetApp (13.8%) gained significantly over last year, as did Sun (up 25%).
Year-over-year numbers looked similar on the software side, with outliers like HP increasing revenue 106.2% in storage management software from one year to the next. However, nearly all the storage software vendors stumbled from the previous quarter. HP took a 19.7% sequential hit in storage management. Storage infrastructure software slipped 7.8% from the previous quarter, with NetApp revenue declining 18.3% in that category.
According to IDC’s software press release, storage software revenues in the third quarter are traditionally slower before a typically strong fourth quarter. However, the overall economy has been going in the opposite direction. Many industries look at the third quarter as the calm before the storm, with dire predictions of declines coming for next year. And having covered these trackers before, I can say anecdotally I don’t recall seeing quite such sharp declines in one column (the quarterly comparison) affecting almost all companies and almost all categories.
Other industry experts have told SearchStorage.com that the worst is yet to come. According to a report issued in October by Forrester Research, the third quarter for IT companies remained relatively stable because most vendors are still working through a sales pipeline. But poor sales are predicted for all IT vendors in the fourth quarter.
In the meantime, however, while budget growth may be constrained next year, storage managers have said they aren’t expecting their daily tasks to change drastically because of the recession.
I had an interesting conversation today with TheInfoPro’s managing director of storage research Rob Stevenson about the results of his firm’s latest survey of 250 Fortune 1000 and midsize enterprise storage users. Fortune 1000 users surveyed by TIP cited block virtualization as having the biggest impact on their environment this year. TIP expects 50% of the Fortune 1000 to have virtualization in use by the end of 2009. All of this is in response to ongoing and relentless data growth.
“Impact” is difficult to define, as TIP doesn’t offer definitions or parameters to the open-ended question for users, instead letting the responses shape the definition. (Midrange users cited server virtualization as having the biggest impact, and we all know that there are good and bad impacts involved).
What really stood out to me, though, when I discussed the results (as well as the semantics of the word impact) with Stevenson, was how block virtualization is being used and for what purposes. My general impression has been that block storage virtualization has not lived up to its initial round of hype as a “silver bullet” for single-pane-of-glass management of an overall storage environment. I wondered, had that changed when I wasn’t looking?
According to Stevenson, while adoption for block virtualization has risen steadily even since this past February (number of respondents with the technology “in use” went from 21% in February’s Wave 10 to 23% in Wave 11), 23% said they’re using it with just 2% of the overall storage capacity.
Stevenson said the users in this case were petabyte-plus shops that in the past year or so have seen storage balloon from the single petabyte range to 2.5 petabytes or more, with no signs of stopping. These admins are scrambling to consolidate storage, move to new technologies that offer better utilization, and automate tasks. Where block virtualization comes in for most of them is performing data migration while moving to new technologies or systems — hence the relatively small proportion of data being managed by block virtualization devices from day to day.
Meanwhile, midrange enterprises are increasingly looking to maximize their resources on the server side. Close behind that, though, come utilization improvement technologies for storage like thin provisioning and data deduplication.
It’s largely a matter of consolidating resources and improving utilization rates. “But the big Fortune 1000 shops have a bigger ‘legacy drag’ of data that they have to move,” Stevenson said. Hence the use of block virtualization tools.
Stevenson pointed to continued data growth and an increasing amount of complexity to go along with it. Storage managers are not only managing an average of 400 TB each compared to 200 TB each a year ago, but the number of LUNs to manage within that volume is also increasing. That drives a need for automated management. While the most popular use case for virtualization seems to be data migration, Stevenson said users are finding day-to-day data movement is also increasing, bringing these devices to the forefront once again for management.
Does this mean we could be seeing block virtualization tools proliferate once the midrange market reaches the petabyte level? (After all, as the old chestnut goes, a megabyte used to be a lot of data). That’s where Stevenson’s crystal ball grows, well, cloudier.
Right now, one tentative theory is that users whose data centers already have large amounts of data under management tend to also already have specialized staff. But as the midrange market comes up against a need to scale staff as well as technology, they may turn to service providers before turning to storage virtualization devices.
“In large data centers, we see the pooling of resources among multiple data center groups to balance workload, like moving storage networking to a networking team or data classification and archiving to server and application groups,” Stevenson said. “When it comes to midsize admins having to start ‘not doing things’, it’s probably not going to come with an increase in internal staffing–instead they may look to offshoring those tasks to cloud service providers.”
But that’s not to say large enterprises won’t be looking up at the clouds, too. “We’re still working out the ‘competing futures’ if you will,” Stevenson said.
Overland Storage’s employees and partners received an early Thanksgiving present last Wednesday when the struggling tape and disk vendor secured financing for up to $9 million of its domestic accounts receivable.
Marquette Commercial Finance will finance Overland’s accounts receivable. Accounts receivable financing means Overland will sell its invoices for a percentage of their total value to generate immediate cash. CEO Vern LoForti says Overland will determine how much of its domestic accounts it will sell to Marquette, up to $9 million. He also says Overland is negotiating another deal to finance its international receivables.
In its last quarterly earnings filing with the SEC, Overland revealed it might miss payments to vendors, liquidate assets and suspend some operations if it did not get financing in November. Overland was looking for $10 million.
“There was a lot of anticipation about this announcement,” LoForti says. “A lot of people were watching for this.”
Now that Overland can pay its bills, it will concentrate on marketing the Snap NAS business it acquired from Adaptec in June. LoForti hopes the NAS business becomes profitable by the first quarter of next year.
“We’re in the middle of a lot of big deals for Snap,” LoForti says. “We’re pushing it for video surveillance, which is a big market.”
He also sees Overland’s REO VTL as a potential growth area, but Overland is far from home free. The company lost $6.9 million last quarter and laid off 53 employees – 13% of its staff – in August. LoForti says no other staff reductions are planned now, although he won’t rule out more.
“We’re watching things closely, and prepared to do more if we need to,” he says, pointing out his rival Quantum’s cuts announced last week.
I’ve recently been at the fuzzy end of the data recovery/data availability lollipop. I lost a motherboard due to some crazy unknown issue/interaction with my front-mounted headphone jack, the motherboard and the sound card. During this nightmare I’ve come to appreciate even more the process of making sure that, in the event of a disaster, companies (even small ones like my home business) have access not only to their data but to their critical systems as well.
I’ve passed through all the phases of grief with this motherboard. At first, I was in denial for a good 24 hours, thinking “there’s no way this could be happening, something just tripped and all I have to do is reset a switch or jumper.” Well, I moved around the three jumpers on the board, and the myriad of switches at least 10 times each, and it was still dead. I took out the CPU, the memory, all the cards, and tried a new power supply. No go.
By now I’d been down for about 48 hours and panic was setting in. So I set out to try and at least recover my data. I have most (funny thing, I thought I had all) of my important data on my file server in the basement, my email via IMAP (replicated from a protected server on the Internet to a server in my home virtual server farm) and the applications I’d need to carry out my work functions available via ISOs on another file server. I figured these steps would be good enough to get me up and running in case I lost my desktop. But I was wrong, ooooh so wrong. As it turned out, neither repairing the motherboard nor restoring data from other devices even came close to solving the whole problem.
The first tenet of data recovery planning is “Know the value of thy data” (Jon Toigo). The second tenet is “Know where it is, dummy” (Curtis Preston). I thought I knew the value of all my data, and I was absolutely certain I knew where it was. I had scripts built to move that data around from where I created it (my now-dead desktop) to a “safer” place (my super-redundant file server), while some of the smaller file size and text-based items were created directly on the file server.
I routinely categorize my documents, images, invoices and other data I create as well. As far as data classification is concerned, I really do eat my own dog food.
But apparently, this wasn’t enough (or I need something new) because I still wasn’t able to work after my desktop went down. I was literally dead in the water–production in my office came to a screeching halt with terabytes of storage, servers and such still happily whirring away.
Why? Here’s the kicker. I was so used to my dual monitor setup with that fast storage subsystem that most of the things I was creating I couldn’t easily (or in some cases at all) shift to working on a laptop. Not only that, but I missed small things that I thought were unimportant, like Outlook email filters I created to organize my email (I get about 100 or so real messages out of the 500+ total messages on a weekday). I found it almost impossible to sift through all the email to get at the bits I needed. I kept running into situations where documents I was creating depended on some bit of data that was easily accessible when I was working on my desktop but took me close to two hours to find when I was on my laptop (I have a desktop search engine setup that indexes my document stores).
I’d also gotten so used to the notepad gadget on Vista’s sidebar that I stored all kinds of little notes to myself, URLs and such. All now inaccessible. While I could technically “work,” it was taking me eight hours to do what normally took 30 minutes.
Being caught completely off guard by this made all the steps I took to prepare for this situation seem all the more pointless. I had most of my data. I could access most of my data. But I was having serious problems with productivity because key pieces were missing.
This cost me. . .and not just in terms of productivity. I actually ended up paying $100 more for goods for my hobby e-shop because I couldn’t locate the original quote the company sent me and it had been a relatively long period of time between quoting and purchasing. Aargh!
Trying to find a motherboard (the same brand and model) locally was an exercise in futility. The board was out of production and stock had dried up everywhere but on the Internet, where the price was astronomical. I ended up having to RMA a second board and had to switch manufacturers and reinstall Vista three times.
What’s more, there are always complicating factors at work in any recovery situation. Right before my motherboard shorted, my wife and I — given the economy — had revisited our budget looking to cut costs, and, seeing how much we were paying for communications and television, decided to switch to Comcast VoIP from a Verizon land line.
In doing so, we discovered that the cable line coming into our house had a crack in it, and when the wind blew or a bird sat on the cable the cable swayed, and the signal strength would fluctuate too much for the VoIP Terminal Adapter. So the cable had to be replaced. This meant that when the motherboard died, not only was my main computer down, but I also had no reliable communications besides my cell phone. The only way for me to get on the Internet reliably was to tether with my cell phone–all this only a week after I got my computer back to a semi-productive state!
Comcast would replace our modem four times, and send five different technicians out to diagnose the issue. After two weeks of no (or nearly no) Internet, they replaced the cable all the way out to multiple poles along the street.
And those were just the infrastructure disasters. The work stoppages caused by them were disasters in and of themselves. I have a home office, and my wife works exclusively from her home office. Without the Internet she is, for all intents and purposes, out of business, and I’m not too far behind her. Over the five weeks it took for these events to unfurl we’ve calculated the lost man (and woman) hours at about 350. . .give or take a few working Saturdays.
Lessons learned for me:
- Have a spare board. Sure, it’s costly, but after almost two weeks of lost productivity just waiting for a board, I realized it’s cheaper to have a board on a shelf.
- As an infrastructure engineer I do my best to plan for disasters by building in replication facilities and sourcing storage subsystems that lend themselves to replication and can operate in hot/warm and hot/hot configurations. This, however, is not disaster recovery planning, as much as I’d like to pat myself on the back and say it is. That part of the process is simply being prudent about hardware choices. While it helps with DR, it cannot be relied on as your main plan no matter what hardware vendors tell you.
- Really planning for DR involves things that I’ve always felt should be left to folks with proven expertise. My recent experiences have firmly cemented that belief. A storage professional is not a DR professional by default, no matter how many storage professionals happen to be extremely proficient at DR. Having a great protection plan for data with SRDF, snapshots and gigawatts of backup power does not mean that you or your business will actually be able to function in the event of a disaster.
- Make efforts to truly understand the value of metadata, indexes and other things required to conduct business in the event of a disaster, not just the Word file and a copy of Microsoft Office.
- Internet access has become a requirement. It is no longer a luxury; plan for a backup line (DSL, cellular, etc.).
- If your computers not working means you will lose money at your business, pay someone to help you with a REAL DR plan. If you are a home-based business, do research on what you should be planning for and talk with a professional about DR.
- Have spares. . .wait, did I say that already?
Hopefully all this will spare some folks this nightmare by pushing them to take a real look at how they work and how they can continue working in the event of a disaster, whether it’s on a small or large scale.
Like the rest of corporate America, storage companies will spend the holiday season implementing cost-cutting measures to get through the current financial crisis.
Quantum today kicked off Thanksgiving week by disclosing it is chopping 180 jobs – about 8% of its workforce – and people inside the storage industry are waiting for larger vendors to announce layoffs in the coming weeks.
Quantum’s press release says the layoffs and other steps to decrease expenses will save the company around $18 million per year, after an initial $4.4 million cost to implement. Quantum emphasized it will increase its investment in data deduplication and replication, which leaves its declining tape business to feel the cuts.
The immediate goal of Quantum’s cuts is to get its stock price up. Quantum’s shares finished last week at a paltry $0.14. The New York Stock Exchange can delist Quantum if its shares do not rise to $1.00 by April 27. Quantum can also proceed with a reverse stock split that shareholders approved in August to raise its share price.
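For a sense of scale, here is a rough sketch of the reverse-split arithmetic. The split ratio shareholders actually approved is not given above, so the function below only computes the smallest whole ratio that would clear the $1.00 threshold from a $0.14 share price, under the simplifying assumption that the post-split price is just the old price times the ratio. The function name is made up for illustration.

```python
import math


def min_reverse_split_ratio(share_price: float, target_price: float) -> int:
    """Smallest whole N such that an N-for-1 reverse split reaches the target.

    Assumes the post-split price is simply share_price * N, which ignores
    any market reaction to the split itself.
    """
    return math.ceil(target_price / share_price)


ratio = min_reverse_split_ratio(0.14, 1.00)
print(ratio)  # 8
```

In other words, at last week's closing price it would take at least an 8-for-1 consolidation to get back over a dollar on paper.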
It’s certainly no surprise that Quantum is concentrating on dedupe for its disk backup appliances. Its disk and software revenue has been increasing mainly because of dedupe while its legacy tape business has declined over the past few quarters. EMC uses Quantum’s deduplication software with its backup disk libraries, and Dell recently said it would partner with Quantum to sell dedupe products next year.
Last week we reported privately held archiving vendor Copan Systems is laying off staff, giving unpaid leave to workers and slashing executives’ salaries while waiting to close a round of VC funding.
Hewlett-Packard, Dell, and Sun have all disclosed cost-cutting measures as well.
“Whether it’s a private or public company, everyone is feeling the crunch,” Pacific Growth financial analyst Kaushik Roy says. “The problem is that nobody knows if this is the bottom or if it could go down a lot more.”
Although proponents of 10 Gigabit Ethernet point to virtual servers, iSCSI, and Fibre Channel over Ethernet (FCoE) as reasons it will catch on in storage, it has yet to do so.
But vendors continue to build out the infrastructure in hopes of making 2009 – or 2010 at the latest – the year of 10 GigE.
Hewlett-Packard this week rolled out a Virtual Connect Flex-10 module that connects HP blade servers to shared MSA2000 SAS enclosures (HP also has a Virtual Connect 4Gb FC module). The Flex-10 module divides capacity of a 10 GigE port into four connections, and lets customers assign different bandwidth requirements to each connection instead of having to use multiple NIC cards for virtual servers.
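To make the idea concrete, here is a minimal sketch of carving one 10 GigE port into separately sized connections. This is not HP's configuration interface: the function, the connection names and the bandwidth figures are all hypothetical, and treating four as an upper limit (rather than a fixed count) is an assumption.

```python
PORT_CAPACITY_GBPS = 10.0   # one 10 GigE port
MAX_CONNECTIONS = 4         # four connections per port, per the description above


def validate_allocation(allocation: dict[str, float]) -> float:
    """Check a per-connection bandwidth plan and return the unassigned Gbps."""
    if len(allocation) > MAX_CONNECTIONS:
        raise ValueError("a port can be divided into at most four connections")
    if any(gbps <= 0 for gbps in allocation.values()):
        raise ValueError("each connection needs a positive bandwidth")
    total = sum(allocation.values())
    if total > PORT_CAPACITY_GBPS:
        raise ValueError(f"plan needs {total} Gbps, port has {PORT_CAPACITY_GBPS}")
    return PORT_CAPACITY_GBPS - total


# Hypothetical plan: workload names and numbers are made up for illustration.
plan = {"vm_traffic": 4.0, "iscsi": 4.0, "migration": 1.5, "management": 0.5}
headroom = validate_allocation(plan)
print(headroom)  # 0.0
```

The point of the sketch is the constraint itself: the four slices can be unequal, but together they cannot exceed the one physical port they share.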
“Flex 10 makes 10-gig useful,” said Mark Potter, vice president and general manager for HP BladeSystem. “This makes 10-gig to the server dramatically efficient and will help 10-gig take off on a rapid ramp.”
Alacritech this week announced 10GbE Scalable Network Accelerators (SNAs) that combine a NIC with a TCP/IP offload engine (TOE) on one card. Alacritech positions the card as a way to alleviate performance bottlenecks and make it feasible to run 10 GigE storage devices. The cards will be available in early 2009.
There have been other 10 GigE storage offerings in recent weeks. Woven Systems, trying to make a play as an Ethernet data center switch provider, released an EFX 5000 core switch to go with its backbone and top of rack switches. Woven also released a 10 Gigabit Ethernet Fabric Manager application to monitor multi-path fabric utilization, and measure latency and jitter.
InfiniBand chip maker Mellanox Technologies rolled out a ConnectX ENt 10 GigE chip that can power storage systems using FCoE and Data Center Ethernet.
Stephen Foskett, director of data practice for storage consultant Contoural, says FCoE is driving interest in 10 GigE among storage admins he talks to.
“There’s a lot more interest in 10-gig from people interested in FCoE and the converged network concept,” he said. “For FCoE, they need something faster than 4-gig [FC] and something with a roadmap past 8-gig [FC].”
Foskett says 10-GigE will catch on soon for iSCSI, so much so that “I would be shocked if in three years we didn’t have most iSCSI traffic on 10-gig.” And that will drive 10-gig TOE card adoption.
“If you’re going to use 10-gig, you’re really going to want an offload engine,” he says. “There’s not much support out there for offload engines in general, and that’s a hurdle that really has to be cleared before people start investing in 10-gig.”