December 1, 2008 4:33 PM
Posted by: Tskyers
Data center disaster recovery planning, small business storage
I’ve recently been at the fuzzy end of the data recovery/data availability lollipop. I lost a motherboard due to some crazy unknown issue/interaction with my front-mounted headphone jack, the motherboard and the sound card. During this nightmare I’ve come to appreciate even more the process of making sure that, in the event of a disaster, companies (even small ones like my home business) have access not only to their data but to their critical systems as well.
I’ve passed through all the phases of grief with this motherboard. At first, I was in denial for a good 24 hours, thinking “there’s no way this could be happening, something just tripped and all I have to do is reset a switch or jumper.” Well, I moved the three jumpers on the board and the myriad switches at least 10 times each, and it was still dead. I took out the CPU, the memory, all the cards, and tried a new power supply. No go.
By now I’d been down for about 48 hours and panic was setting in. So I set out to try and at least recover my data. I have most (funny thing, I thought I had all) of my important data on my file server in the basement, my email via IMAP (replicated from a protected server on the Internet to a server in my home virtual server farm) and the applications I’d need to carry out my work functions available via ISOs on another file server. I figured these steps would be good enough to get me up and running in case I lost my desktop. But I was wrong, ooooh so wrong. As it turned out, neither repairing the motherboard nor restoring data from other devices even came close to solving the whole problem.
The first tenet of data recovery planning is “Know the value of thy data” (Jon Toigo). The second tenet is “Know where it is, dummy” (Curtis Preston). I thought I knew the value of all my data, and I was absolutely certain I knew where it was. I had scripts built to move that data around from where I created it (my now-dead desktop) to a “safer” place (my super-redundant file server), while some of the smaller file size and text-based items were created directly on the file server.
I routinely categorize my documents, images, invoices and other data I create as well. As far as data classification is concerned, I really do eat my own dog food.
But apparently this wasn’t enough (or I need something new), because I still wasn’t able to work after my desktop went down. I was literally dead in the water: production in my office came to a screeching halt while terabytes of storage, servers and such were still happily whirring away.
Why? Here’s the kicker. I was so used to my dual monitor setup with that fast storage subsystem that most of the things I was creating I couldn’t easily (or in some cases at all) shift to working on a laptop. Not only that, but I missed small things that I thought were unimportant, like Outlook email filters I created to organize my email (I get about 100 or so real messages out of the 500+ total messages on a weekday). I found it almost impossible to sift through all the email to get at the bits I needed. I kept running into situations where documents I was creating depended on some bit of data that was easily accessible when I was working on my desktop but took me close to two hours to find when I was on my laptop (I have a desktop search engine setup that indexes my document stores).
I’d also gotten so used to the notepad gadget on Vista’s sidebar that I stored all kinds of little notes to myself, URLs and such. All now inaccessible. While I could technically “work,” it was taking me eight hours to do what normally took 30 minutes.
Being caught completely off guard by this made all the steps I took to prepare for this situation seem all the more pointless. I had most of my data. I could access most of my data. But I was having serious problems with productivity because key pieces were missing.
This cost me. . .and not just in terms of productivity. I actually ended up paying $100 more for goods for my hobby e-shop because I couldn’t locate the original quote the company sent me and it had been a relatively long period of time between quoting and purchasing. Aargh!
Trying to find a motherboard (the same brand and model) locally was an exercise in futility. The board was out of production and stock had dried up everywhere but on the Internet, where the price was astronomical. I ended up having to RMA a second board and had to switch manufacturers and reinstall Vista three times.
What’s more, there are always complicating factors at work in any recovery situation. Right before my motherboard shorted, my wife and I — given the economy — had revisited our budget looking to cut costs, and, seeing how much we were paying for communications and television, decided to switch to Comcast VoIP from a Verizon land line.
In doing so, we discovered that the cable line coming into our house had a crack in it, and when the wind blew or a bird sat on the cable the cable swayed, and the signal strength would fluctuate too much for the VoIP Terminal Adapter. So the cable had to be replaced. This meant that when the motherboard died, not only was my main computer down, but I also had no reliable communications besides my cell phone. The only way for me to get on the Internet reliably was to tether with my cell phone–all this only a week after I got my computer back to a semi-productive state!
Comcast would replace our modem four times, and send five different technicians out to diagnose the issue. After two weeks of no (or nearly no) Internet, they replaced the cable all the way out to multiple poles along the street.
And those were just the infrastructure disasters. The work stoppages caused by them were disasters in and of themselves. I have a home office, and my wife works exclusively from her home office. Without the Internet she is, for all intents and purposes, out of business, and I’m not too far behind her. Over the five weeks it took for these events to unfurl we’ve calculated the lost man (and woman) hours at about 350. . .give or take a few working Saturdays.
Lessons learned for me:
- Have a spare board. Sure, it’s costly, but after almost two weeks of lost productivity just waiting for a board, I realized it’s cheaper to have a board on a shelf.
- As an infrastructure engineer I do my best to plan for disasters by building in replication facilities and sourcing storage subsystems that lend themselves to replication and can operate in hot/warm and hot/hot configurations. This, however, is not disaster recovery planning, as much as I’d like to pat myself on the back and say it is. That part of the process is simply being prudent about hardware choices. While it helps with DR, it cannot be relied on as your main plan no matter what hardware vendors tell you.
- Really planning for DR involves things that I’ve always felt should be left to folks with proven expertise. My recent experiences have firmly cemented that belief. A storage professional is not a DR professional by default, no matter how many storage professionals happen to be extremely proficient at DR. Having a great protection plan for data with SRDF, snapshots and gigawatts of backup power does not mean that you or your business will actually be able to function in the event of a disaster.
- Make efforts to truly understand the value of metadata, indexes and other things required to conduct business in the event of a disaster, not just the Word file and a copy of Microsoft Office.
- Internet access has become a requirement. It is no longer a luxury. Plan for a backup line (DSL, cellular, etc.).
- If your computers being down means your business loses money, pay someone to help you with a REAL DR plan. If you are a home-based business, do research on what you should be planning for and talk with a professional about DR.
- Have spares. . .wait, did I say that already?
Hopefully all this will spare some folks this nightmare by pushing them to take a real look at how they work and how they can continue working in the event of a disaster, whether it’s on a small or large scale.
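One way to act on the metadata lesson above is to list everything a normal working day depends on, not just the documents, and check each item for a backup location. The Python below is a minimal sketch of that idea. The `WorkItem` type, the `unprotected` helper and every entry in the inventory are hypothetical illustrations loosely based on the items mentioned in this post, not a record of any real configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkItem:
    name: str                        # something the working day depends on
    kind: str                        # "data", "metadata" or "config"
    backup_location: Optional[str]   # None means it lives only on the desktop


def unprotected(items):
    """Return the names of items that have no backup location."""
    return [item.name for item in items if not item.backup_location]


# Hypothetical inventory, for illustration only.
INVENTORY = [
    WorkItem("documents and invoices", "data", "file server"),
    WorkItem("email (IMAP)", "data", "replicated mail server"),
    WorkItem("application ISOs", "data", "second file server"),
    WorkItem("email filter rules", "config", None),
    WorkItem("desktop search index", "metadata", None),
    WorkItem("sidebar notes and URLs", "data", None),
]

if __name__ == "__main__":
    for name in unprotected(INVENTORY):
        print(f"NOT PROTECTED: {name}")
```

The point of the sketch is the shape of the list rather than the code: filters, indexes and notes sit in the same inventory as the documents, so a gap in any of them shows up before the desktop dies rather than after.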
November 24, 2008 10:55 AM
Posted by: Dave Raffo
Like the rest of corporate America, storage companies will spend the holiday season implementing cost-cutting measures to get through the current financial crisis.
Quantum today kicked off Thanksgiving week by disclosing it is chopping 180 jobs – about 8% of its workforce – and people inside the storage industry are waiting for larger vendors to announce layoffs in the coming weeks.
Quantum’s press release says the layoffs and other steps to decrease expenses will save the company around $18 million per year, after an initial $4.4 million cost to implement. Quantum emphasized it will increase its investment on data deduplication and replication, which leaves its declining tape business to feel the cuts.
The immediate goal of Quantum’s cuts is to get its stock price up. Quantum’s shares finished last week at a paltry $0.14. The New York Stock Exchange can delist Quantum if its shares do not rise to $1.00 by April 27. Quantum can also proceed with a reverse stock split that shareholders approved in August to raise its share price.
It’s certainly no surprise that Quantum is concentrating on dedupe for its disk backup appliances. Its disk and software revenue has been increasing mainly because of dedupe while its legacy tape business has declined over the past few quarters. EMC uses Quantum’s deduplication software with its backup disk libraries, and Dell recently said it would partner with Quantum to sell dedupe products next year.
Last week we reported privately held archiving vendor Copan Systems is laying off staff, giving unpaid leave to workers and slashing executives’ salaries while waiting to close a round of VC funding.
Hewlett-Packard, Dell, and Sun have all disclosed cost-cutting measures as well.
“Whether it’s a private or public company, everyone is feeling the crunch,” Pacific Growth financial analyst Kaushik Roy says. “The problem is that nobody knows if this is the bottom or if it could go down a lot more.”
November 21, 2008 3:54 PM
Posted by: Dave Raffo
Although proponents of 10 Gigabit Ethernet point to virtual servers, iSCSI, and Fibre Channel over Ethernet (FCoE) as reasons it will catch on in storage, it has yet to do so.
But vendors continue to build out the infrastructure in hopes of making 2009 – or 2010 at the latest – the year of 10 GigE.
Hewlett-Packard this week rolled out a Virtual Connect Flex-10 module that connects HP blade servers to shared MSA2000 SAS enclosures (HP also has a Virtual Connect 4Gb FC module). The Flex-10 module divides capacity of a 10 GigE port into four connections, and lets customers assign different bandwidth requirements to each connection instead of having to use multiple NIC cards for virtual servers.
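To make the partitioning idea concrete, here is a minimal Python sketch that checks a proposed split of one 10 GigE port into up to four connections. It only models the two constraints stated above (four connections, 10 Gb/s total). The function name, the connection labels and the example numbers are hypothetical, and this is not HP's management interface.

```python
PORT_CAPACITY_GBPS = 10.0   # one 10 GigE port
MAX_CONNECTIONS = 4         # the module divides a port into four connections


def validate_partition(allocations_gbps):
    """Check a proposed split of one 10 GigE port.

    allocations_gbps maps a connection label to its bandwidth in Gb/s.
    Returns the unallocated bandwidth, or raises ValueError if the split
    breaks one of the two constraints modeled here.
    """
    if len(allocations_gbps) > MAX_CONNECTIONS:
        raise ValueError("a port can be divided into at most four connections")
    if any(bw <= 0 for bw in allocations_gbps.values()):
        raise ValueError("each connection needs a positive bandwidth")
    total = sum(allocations_gbps.values())
    if total > PORT_CAPACITY_GBPS:
        raise ValueError(
            f"allocations total {total} Gb/s, port has {PORT_CAPACITY_GBPS}"
        )
    return PORT_CAPACITY_GBPS - total


# Hypothetical split for a virtual server host.
example = {"vm-traffic": 4.0, "storage": 4.0, "migration": 1.5, "management": 0.5}

if __name__ == "__main__":
    print(f"unallocated: {validate_partition(example)} Gb/s")
```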
“Flex 10 makes 10-gig useful,” said Mark Potter, vice president and general manager for HP BladeSystem. “This makes 10-gig to the server dramatically efficient and will help 10-gig take off on a rapid ramp.”
Alacritech this week announced 10GbE Scalable Network Accelerators (SNAs) that combine a NIC with a TCP/IP offload engine (TOE) on one card. Alacritech positions the card as a way to alleviate performance bottlenecks and make it feasible to run 10 GigE storage devices. The cards will be available in early 2009.
There have been other 10 GigE storage offerings in recent weeks. Woven Systems, trying to make a play as an Ethernet data center switch provider, released an EFX 5000 core switch to go with its backbone and top of rack switches. Woven also released a 10 Gigabit Ethernet Fabric Manager application to monitor multi-path fabric utilization, and measure latency and jitter.
InfiniBand chip maker Mellanox Technologies rolled out a ConnectX ENt 10 GigE chip that can power storage systems using FCoE and Data Center Ethernet.
Stephen Foskett, director of data practice for storage consultant Contoural, says FCoE is driving interest in 10 GigE among storage admins he talks to.
“There’s a lot more interest in 10-gig from people interested in FCoE and the converged network concept,” he said. “For FCoE, they need something faster than 4-gig [FC] and something with a roadmap past 8-gig [FC].”
Foskett says 10-GigE will catch on soon for iSCSI, so much that “I would be shocked if in three years we didn’t have most iSCSI traffic on 10-gig.” And that will drive 10-gig TOE card adoption.
“If you’re going to use 10-gig, you’re really going to want an offload engine,” he says. “There’s not much support out there for offload engines in general, and that’s a hurdle that really has to be cleared before people start investing in 10-gig.”
November 20, 2008 6:04 PM
Posted by: Dave Raffo
With large storage vendors – and other corporations – laying off and struggling these days, where does that leave the little guys? Today’s economy doesn’t bode well for startups that rely on venture funding to survive.
Take Copan Systems for example. At the start of 2008, Copan CEO Mark Ward talked about taking the MAID systems vendor public this year. As 2009 approaches, Copan’s survival depends on it securing another funding round. While waiting for funding, the company has laid off a big chunk of its staff, cut the pay of its top execs, and will give many of the surviving workers unpaid time off.
Ward says he fully expects to get funding from new and existing VCs, but the Copan board suggested moves to cut costs.
“My board members said ‘Why aren’t you doing the same thing that Sun, HP and Dell are doing?’” Ward said, referring to companies that have announced layoffs and unpaid leave over the holidays.
Copan slashed 15% of its staff with almost all of the cuts coming in sales and marketing, Ward said. The CEO and other senior management members have taken pay cuts, which other industry sources say come to 20 percent for Ward and 10 percent for others. “It was a voluntary pay cut and I’ve taken more than everybody else,” Ward said.
Over the next two weeks, Ward said about half the employees – those not involved in new product development – will take an unpaid leave.
Ward says much of the sales slack will be picked up by a worldwide reseller deal with a large partner he hopes to name in a few weeks. According to Ward, Copan signed a multimillion-dollar federal government order last quarter and sales are going well. “We have 175 customers, and a path to profitability in 2009,” he said.
That’s providing it gets the funding to make it into 2009.
November 20, 2008 3:11 PM
Posted by: Beth Pariseau
By now it’s clear that all major storage vendors will support flash in their systems. But the debate rages over whether flash should be used as cache or as persistent storage.
Earlier this month NetApp revealed plans to support solid-state Flash drives as cache and persistent storage in its FAS systems beginning next year. The cache model will come first.
“We believe the optimal use case initially lies in cache,” says Patrick Rogers, NetApp VP of solutions marketing. NetApp has developed wear-leveling algorithms that will be incorporated into WAFL. WAFL’s awareness of access frequency and other characteristics for blocks will allow it to use both DRAM and flash, with flash as the “victim cache,” a landing spot for blocks displaced from primary DRAM cache.
Why not just use DRAM? “If you have a very large amount of data, and you can’t accommodate it entirely in [DRAM] cache, flash offers much higher capacities,” Rogers says.
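The victim-cache arrangement can be sketched in a few lines of Python. This is a generic two-tier LRU model written for illustration, not NetApp's WAFL implementation. The class name, the tier sizes and the `read_from_disk` callback are all assumptions, and real arrays track far more than recency.

```python
from collections import OrderedDict


class TwoTierCache:
    """Small primary (DRAM-like) LRU cache backed by a larger victim (flash-like) LRU cache."""

    def __init__(self, primary_size, victim_size):
        self.primary_size = primary_size
        self.victim_size = victim_size
        self.primary = OrderedDict()  # block id -> data, most recently used last
        self.victim = OrderedDict()

    def _insert_primary(self, block, data):
        self.primary[block] = data
        self.primary.move_to_end(block)
        if len(self.primary) > self.primary_size:
            old_block, old_data = self.primary.popitem(last=False)
            # The displaced block lands in the victim tier instead of being dropped.
            self.victim[old_block] = old_data
            self.victim.move_to_end(old_block)
            if len(self.victim) > self.victim_size:
                self.victim.popitem(last=False)

    def read(self, block, read_from_disk):
        """Return (data, source), where source is 'primary', 'victim' or 'disk'."""
        if block in self.primary:
            self.primary.move_to_end(block)
            return self.primary[block], "primary"
        if block in self.victim:
            data = self.victim.pop(block)
            self._insert_primary(block, data)  # promote back into the primary tier
            return data, "victim"
        data = read_from_disk(block)
        self._insert_primary(block, data)
        return data, "disk"
```

A block pushed out of the small primary tier can still be served from the larger second tier without a trip to disk, which is the capacity argument Rogers makes for flash over DRAM alone.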
EMC’s Barry Burke responded about a week later with a post on his blog, The Storage Anarchist, asking some detailed questions about Flash as cache. To wit:
- What read hit ratios and repetitive reads of a block are required to overcome the NAND write penalty?
- How will accelerated cell wear-out be avoided for NAND-based caches?
- What would be required to use NAND flash as a write cache – do you have to implement some form of external data integrity verification and a means to recover from a damaged block (e.g., mirroring writes to separate NAND devices, etc.)?
I asked Burke to answer his own questions when it came to Flash as persistent storage, which is EMC’s preference so far. He answered me in an email:
- Overcoming the Write penalty – not an issue, because storage arrays generally always buffer writes, notify the host that the I/O is completed and then destage the writes to the flash drives asynchronously. Plus, unlike a cache, the data doesn’t have to be read off of disk first – all I/O’s can basically be a single direct I/O to flash: read what you need, write what’s changed. As such, reads aren’t deferred by writes – they can be asynchronously scheduled by the array based on demand and response time.
- Accelerated Wear-out – not an issue, for as I noted, the write speed is limited by the interface or the device itself, and the drives are internally optimized with enough spare capacity to ensure a predictable lifespan given the known maximum write rate. Also, as a storage device, every write into flash is required/necessary, whereas with flash [as cache], there likely will be many writes that are never leveraged as a cache hit – cache will always be written to more than physical storage (almost by definition).
- Data Integrity – again, not an issue, at least not with the enterprise drives we are using. This is one of the key areas that EMC and STEC collaborated on, for example, to ensure that there is end-to-end data integrity verification. Many flash drives don’t have this level of protection yet, and it is not inherent to the flash technology itself. So anyone implementing flash-as-cache has to add this integrity detection and recovery or run the risk of undetected data corruption.
I also asked NetApp for a response. So far no formal response to Burke’s specific questions, but there are some NetApp blog posts that address the plans for Flash deployments, one of which links to a white paper with some more specifics.
For the first question, according to the white paper, “Like SSDs, read caching offers the most benefit for applications with a lot of small, random read I/Os. Once a cache is populated, it can substantially decrease the average response time for read operations and reduce the total number of HDDs needed to meet a given I/O requirement.”
Not as specific an answer as you could hope for, but it’s a start. NetApp also appears to have an offering in place for customers to use to determine which specific applications in their environment might benefit from Flash as cache, called Predictive Cache Statistics (PCS).
As to the second question, according to the white paper, “NetApp has pioneered caching architectures to accelerate NFS storage performance with its FlexCache software and storage acceleration appliances. FlexCache eliminates storage bottlenecks without requiring additional administrative overhead for data placement.”
Another precinct was also heard from in the vendor blogosphere on these topics, with a comment on Chuck Hollis’s blog late last week. With regard to the write penalty, Fusion-io CTO David Flynn argued that the bandwidth problem could be compensated for with parallelism, i.e., using an array of NAND chips in a Flash device.
Latency, on the other hand, cannot be “fixed” by parallelism. However, in a caching scheme, the latency differential between two tiers is compensated for by choice of the correct access size. While DRAM is accessed in cache lines (32 bytes if I remember correctly), something that runs at 100 times higher latency would need to be accessed in chunks 100 times larger (say around 4KB).
Curiously enough, the demand page loading virtual memory systems that were designed into OS’s decades ago does indeed use 4KB pages. That’s because it was designed in a day when memory and disk were only about a factor of 100 off in access latency – right where NAND and DRAM are today.
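The arithmetic behind that sizing argument is short enough to check. The sketch below takes the 32-byte cache line and the 100x latency ratio quoted above and scales one by the other. Rounding up to the next power of two is an added assumption to show how 3,200 bytes lines up with a 4KB page, and the function name is hypothetical.

```python
def scaled_access_size(base_bytes, latency_ratio):
    """Scale an access size by a latency ratio, then round up to a power of two.

    Returns (raw, rounded). The power-of-two rounding is an assumption made
    for this illustration, not part of the quoted argument.
    """
    raw = base_bytes * latency_ratio
    rounded = 1
    while rounded < raw:
        rounded *= 2
    return raw, rounded


# Figures from the comment above: 32-byte cache lines, 100x higher latency.
raw, rounded = scaled_access_size(32, 100)

if __name__ == "__main__":
    print(f"{raw} bytes raw, {rounded} bytes as a power of two")
```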
This is an extension of the debate that has been going on all year about the proper place for solid-state media. Server vendors such as Hewlett-Packard argue that Flash used as persistent storage behind a controller puts the bottleneck of the network between the application and the speedy drives, defeating the purpose of adding them to boost performance. And round and round we go…at the rate it’s going, this discussion could last longer than the disk vs. tape argument.
November 20, 2008 1:34 PM
Posted by: Dave Raffo
Within the next few weeks, Storage Soup is moving to IT Knowledge Exchange, a TechTarget site where IT pros can ask or answer technical questions or follow dozens of IT blogs hosted there.
We’re moving our blog there to bring you closer to your peers in the storage industry.
The content of Storage Soup won’t change. Only our address will change — and we’ll automatically redirect you there when the change happens.
Once we move, be sure to bookmark the new link, and if you’re into RSS, subscribe to us using your favorite feed reader.
November 18, 2008 3:21 PM
Posted by: Beth Pariseau
Backup software vendor Asigra is looking to “hold storage vendors’ feet to the fire” with a new free tool it will be offering to customers to validate IOPS on disk-based backup hardware. According to Asigra executive vice president Eran Farajun, sometimes customers end up buying systems to support disk-based backup that are overkill thanks to persuasive salespeople, and others “think they can get away with a cheaper version” and under-buy.
The new tool will generate read/write loads for different file system configurations and user profiles to simulate the work that Asigra’s backup software would place on the system. Users can simulate configurations of 300, 500 and 1,000 sites. “Maybe [getting adequate backup performance out of a disk array] means you switch from Fibre Channel to SATA drives, or you go from a 32-bit to a 64-bit NAS head,” Farajun said. Results of the testing done by the I/O simulator are generally ready in a week or two depending on workload, and can be fed into Excel spreadsheets or Crystal Reports.
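For readers who want a feel for what a load simulator measures, here is a toy Python sketch that issues random block-sized reads and writes against a scratch file and reports operations per second. It is not Asigra's tool and does not model backup workloads or multi-site profiles. The function name, block size, operation count and read/write mix are arbitrary assumptions. Because it goes through the operating system's file cache, the figure it prints will flatter the hardware.

```python
import os
import random
import tempfile
import time


def measure_iops(path, file_size=4 * 1024 * 1024, block_size=4096,
                 operations=2000, read_fraction=0.7, seed=0):
    """Issue random block-sized reads and writes against a file; return ops/second."""
    rng = random.Random(seed)
    blocks = file_size // block_size
    payload = os.urandom(block_size)

    # Preallocate the scratch file so reads always hit existing blocks.
    with open(path, "wb") as f:
        f.write(b"\0" * file_size)

    start = time.perf_counter()
    with open(path, "r+b") as f:
        for _ in range(operations):
            f.seek(rng.randrange(blocks) * block_size)
            if rng.random() < read_fraction:
                f.read(block_size)
            else:
                f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push buffered writes to the device before stopping the clock
    elapsed = time.perf_counter() - start
    return operations / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(f"{measure_iops(os.path.join(tmp, 'iops.bin')):.0f} IOPS")
```

Pointing `path` at a file on the array under test and varying `read_fraction` gives a rough sense of how the read/write mix changes the result. A serious test would bypass caching and run far longer.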
Vendors already post results of benchmark testing on sites like SPEC.org, but according to Farajun, “Most of the time, those numbers aren’t that helpful–they’re optimal, utopian, statistical numbers. It’s like going to buy a Camry and the salesperson tells you how fast the NASCAR version can go on a closed course with a professional driver.”
Enterprise Strategy Group analyst Bob Laliberte said he saw this move being less about taking on storage vendors and more about Asigra attempting to boost its software’s appeal in the market. “Everyone’s trying to do what they can to show value and value add in this economy,” Laliberte said, adding that one potential application for the new tool would be among service providers who could pass along any cost savings realized by paring down backup infrastructures to customers.
November 18, 2008 1:00 AM
Posted by: Beth Pariseau
Many in the storage industry are wondering about the fate of the Sun StorageTek business following Sun’s revelation of its umpteenth restructuring last week. In a press release issued Friday, Sun said it will be laying off between 5,000 and 6,000 employees, or 15% to 18% of its workforce, in an effort to save $700 million to $800 million in the next fiscal year.
Sun’s continually dismal earnings reports (it reported a revenue decline of $1.7 billion for the most recent quarter) already led to speculation that the company will be taken private or sell off parts of its business. But the buzz in the industry is intensifying with this latest layoff because of the restructuring’s keen focus on open source software, which is where Sun has been turning its efforts in storage as well.
The elephant left in the room is the “traditional” storage business, most of it acquired for $4.1 billion with StorageTek three years ago.
Sun’s storage business now consists mainly of tape from StorageTek and open storage. But Sun is primarily interested in developing its own ideas and making its own way. CEO Jonathan Schwartz made clear on the earnings call that open storage would be the focus going forward. “We believe we can expect strong growth in open storage as the adoption of ZFS continues and the need for open systems becomes ever more critical for customers seeking better performance at a lower price than traditional NAS,” he said.
Now, sources with inside knowledge of Sun’s storage strategy point to the realignment of key executives to focus on software as a confirmation that Sun is getting ready to pull the plug on the traditional storage business. Sun shifted Anil Gadre from chief marketing officer to the head of a new software-focused business unit, and moved the Solaris, Virtualization, and Systems Management Software divisions under Systems executive vice president John Fowler. One source said the shift in focus can’t bode well for the traditional business.
“Sun’s key systems and marketing execs are all now in charge of software business units – or they have left the company, like Andy Bechtolsheim,” said the industry insider.
According to this source, “the facts are that Sun can’t sustain its current business given its current fiscal performance. Sun’s core expertise and strategy is Solaris, file systems, [and] systems software. All indications are that Sun will continue to invest here, but economically the company must divest or sell other assets not critical to Sun’s future or core competency…the traditional storage business is clearly packaged for a sell off.”
One shareholder, Southeastern Asset Management, which took a 21% stake in Sun last week, has also stated publicly it views Sun as a software company. Reuters reported that Southeastern “said it might go around the technology company’s board to talk to ‘third parties’ about alternatives,” though that story also notes that a buyer for Sun as a whole is unlikely. However, “one business that Sun could sell fairly easily is StorageTek, a data storage business that it bought in 2005 for $4.1 billion. Bankers estimated it to be worth $750 million to $1 billion today,” the Reuters report adds.
If Sun is looking to sell off what’s left of StorageTek, who will buy? While a fire-sale price compared with what Sun paid for it, $750 million to $1 billion is still a hefty price for anyone to pay for tape in the current financial climate. Unless there’s some surprise white knight waiting in the wings to take on a tape storage business in this kind of economy, it could still be back to the drawing board for Sun…again.
November 17, 2008 6:06 PM
Posted by: Beth Pariseau
At the end of a busy Monday, Symantec revealed that John Thompson is giving up his CEO post and “retiring” to the role of Chairman of the Board, with COO Enrique Salem taking over as CEO.
I was somewhat surprised by this move. I met Thompson at Symantec Vision this June and found him sharp and personable. His age, 59, also seems a bit young for retirement.
However, financial bloggers have seen this differently. For example, 24/7 Wall Street.com put Thompson on its list of “CEO’s to Go” in 2008, with a detailed explanation of the ways Thompson and Symantec have run afoul of Wall Street. The complaints are mostly due to a stubbornly low stock price, attributable in part to the Veritas and Altiris acquisitions. “Wall Street hated the change of strategy [with Veritas] and still dislikes it. To us, the storage play and the data security play makes sense. But money talks and the money is against this merger even after two-years,” the 24/7 hit-list article stated.
In response to today’s news, 24/7 Wall St. author Jon C. Ogg writes, “this was with some mixed emotion because we have heard such great things about him and believed him to be a high-caliber person. Because we thought well of him, despite his company’s share performance, we said it isn’t too late for Thompson and we think there is a real shot that he’d be more valuable to keep as Chairman with a new CEO rather than an outright revolution.”
Sigh. I don’t know about you, but right about now, I am a bit sick of hearing about Wall Street. Maybe Thompson felt the same way.