Amazon’s S3 online storage service suffered an outage this morning for several hours, echoing the outage suffered by email service provider RIM last week. While RIM’s outage affected CrackBerry addicts with alternatives to email, the Amazon outage may have affected Web-based companies relying on S3’s storage to deliver core services. Not good.
However, one S3 user I talked to today, SmugMug CEO Don MacAskill, said his site didn’t feel a thing. “None of our customers reported any issues–we haven’t seen any problems that are customer facing,” he said.
But there’s also an important factor that may have led to SmugMug’s resiliency: the fact that after another outage last year, SmugMug started keeping about 10% of its data in a hot cache on-site. “It could have been that the hot cache was adequate for the 2 or so hours it was going on, or it could have been that for some people the outage was intermittent,” he added.
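To make the hot cache idea concrete, here is a minimal Python sketch of how a local cache can hide a remote storage outage for popular items. This is not SmugMug’s code: the names `HotCacheReader`, `fetch_remote` and `RemoteStoreUnavailable` are hypothetical, and a plain dictionary stands in for the on-site cache.

```python
class RemoteStoreUnavailable(Exception):
    """Raised by the (hypothetical) remote fetcher when the storage service is down."""


class HotCacheReader:
    """Serve reads from a local hot cache first, falling back to remote storage."""

    def __init__(self, fetch_remote, hot_cache):
        self.fetch_remote = fetch_remote  # callable: key -> bytes, may raise
        self.hot_cache = hot_cache        # dict holding the most popular objects

    def read(self, key):
        # Popular objects never touch the remote service, so an outage is invisible.
        if key in self.hot_cache:
            return self.hot_cache[key]
        # Cold objects still depend on the remote service being up.
        return self.fetch_remote(key)


def remote_is_down(key):
    # Simulates the outage: every remote request fails.
    raise RemoteStoreUnavailable(key)


reader = HotCacheReader(remote_is_down, {"popular.jpg": b"cached bytes"})
served = reader.read("popular.jpg")  # succeeds despite the simulated outage
```

If the 10% kept on-site happens to be the 10% most people ask for, a short outage would only show up on requests for the long tail, which may be part of why nobody noticed.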
Meanwhile, some users were still reporting issues as recently as five minutes ago on Amazon’s Web Services Developer Connection message board. According to an Amazon.com official response on the thread about an hour ago, “This morning’s issue has been resolved and the system is continuing to recover. However, we are currently seeing slightly elevated error rates for some customers, and are actively working to resolve this. More information on that to follow as we have it.”
Their businesses aren’t the same, but I think this ties in with what I was saying in my post about RIM’s Blackberry meltdown–as more and more data “eggs” are put into centralized service provider “baskets”, more and more of them are going to get broken, especially as the service-provider market ramps up.
Or as TechCrunch put it:
This could just be growing pains for Amazon Web Services, as more startups and other companies come to rely on it for their Web-scale computing infrastructure. But even if the outage only lasted a couple hours, it is unacceptable. Nobody is going to trust their business to cloud computing unless it is more reliable than the data-center computing that is the current norm. So many Websites now rely on Amazon’s S3 storage service and, increasingly, on its EC2 compute cloud as well, that an outage takes down a lot of sites, or at least takes down some of their functionality. Cloud computing needs to be 99.999 percent reliable if Amazon and others want it to become more widely adopted.
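For a sense of how high that “99.999 percent” bar is, here is a quick back-of-the-envelope calculation in Python. It assumes a 365-day year and treats this morning’s outage as a round two hours; the function names are mine, not anything from Amazon or TechCrunch.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_after_outage(outage_minutes):
    """Yearly availability if the only downtime all year is one outage."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


five_nines_budget = allowed_downtime_minutes(99.999)  # minutes per year
two_hour_outage = availability_after_outage(120)      # percent

print(f"Five nines allows about {five_nines_budget:.2f} minutes of downtime a year")
print(f"A single two-hour outage caps the year at about {two_hour_outage:.3f}%")
```

In other words, five nines leaves a budget of a little over five minutes a year, so one outage of a couple of hours uses up more than twenty years’ worth.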
Growing pains may have had something to do with it, according to Taneja Group analyst Eric Burgener. “There’s less of this going on than there used to be, but this is one of those things that gives people pause about services,” he said. A focus on secondary storage and storage for small companies has made this crop of service providers more successful than the SSPs of the bubble days, and even where companies are relying on services like this for primary storage, Burgener argued that the services option is still the better bet. “For small internet businesses services are still a perfect play–they allow businesses to start up rapidly without the kind of capital expense or infrastructure they need for an in-house system.”
I feel the need to make a confession here. Up until yesterday, despite spending a generous portion of my waking hours covering data backup, disaster recovery and data protection, I myself did not have a backup plan.
I do digital photography in my spare time, and creative writing outside work, and I’ve been a digital music addict since the advent of Napster. So I have about 100 GB on two IDE drives inside a Windows XP machine custom-built for me by a highly geeky friend. And it’s just been sitting there, waiting to be snatched away into the ether.
Then another friend of mine told me about how his MacBook hard drive crashed. On his birthday. While he also had the flu.
He told me how his entire visual design portfolio, an important part of his resume for the business he’s in, has been lost, along with all of his digital photographs, many of which he didn’t have posted on Flickr or stored anywhere else.
He went on to tell me that his costs for trying to recover the data from the drive are going to run him upwards of $2,000–if he’s lucky. It could be cheaper, but that would mean less of his data has been recovered, and so now he finds himself in the position of hoping he’ll have to spend more money.
It’s a bittersweet subject for him that so many people he knows, myself included, have credited his experience with finally getting them off their butts and backing up. But that’s the reality.
I ended up going with the 500 GB Western Digital MyBook, because that’s what my friend also ordered once he learned his lesson the hard way, and he’s far more technical than me, so I trust his judgment. The MyBook came with Memeo’s AutoBackup and AutoSync software, of which I’m only using the former. It also came with a bunch of Google software including Google Desktop, which I found rather odd.
Having covered data storage for the enterprise, I’ve had a chuckle whenever I’ve checked on the initial backup job’s progress. Granted, it’s got a QoS feature that cedes system resources to the PC, but let’s just say I’m not seeing the kind of data transfer rates with this thing I’m used to hearing about. It’s been funny, after being immersed in systems that perform at 8 Gbit or 10 Gbit for a few years, to watch my little PC poke along at what seems like 1 MB/hr, if that.
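Just for fun, here is what that “1 MB/hr” would mean if taken literally, as a small Python calculation. The 100 GB figure is my rough data total, I’m using 1 GB = 1,000 MB for round numbers, and the 20 MB/s comparison rate is purely a hypothetical, not a measurement of the MyBook.

```python
BACKUP_SIZE_MB = 100 * 1000  # about 100 GB, using 1 GB = 1,000 MB for round numbers


def hours_to_copy(size_mb, rate_mb_per_hour):
    """Hours needed to move size_mb at a steady rate given in MB per hour."""
    return size_mb / rate_mb_per_hour


literal_hours = hours_to_copy(BACKUP_SIZE_MB, 1)  # the "1 MB/hr" joke taken literally
literal_years = literal_hours / (24 * 365)

# Hypothetical sustained rate of 20 MB/s, chosen only for comparison.
assumed_hours = hours_to_copy(BACKUP_SIZE_MB, 20 * 3600)

print(f"At 1 MB/hr: {literal_hours:,.0f} hours (about {literal_years:.1f} years)")
print(f"At an assumed 20 MB/s: {assumed_hours:.1f} hours")
```

So the real rate is obviously a good deal better than it feels, or I’d be waiting more than a decade for the first full backup.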
But still. At least I have a backup. Finally. And I can finally rid my closet of that skeleton.
Now my issue becomes off-site disaster recovery. It’s far more likely that my hard drive(s) will crash than that my house will be napalmed or something (knock on wood), but no sooner had I told Tory that he could stop bugging me about backup, than he started bugging me about taking the drive to my office once the data transfer is done.
But the AutoBackup software, like so many low-end and consumer backup offerings, is set to automatically back up changed files, and what I told Tory was, I like having a low RPO over here. And I made that napalm comment, I’ll admit (I can just feel karma coming to get me). So I’m thinking about some kind of backup SaaS for off-site DR, but capacity with those services is at a much higher premium than it is in 3.5-inch external SATA. And so you know what that means…data classification!
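Here is roughly what home-grown data classification might look like, as a hedged Python sketch. The tiers, file extensions and the `pick_for_offsite` helper are all hypothetical choices of mine, not features of AutoBackup or of any backup service; the idea is simply to rank files by how irreplaceable they are and fill a limited off-site capacity in that order.

```python
from pathlib import PurePath

# Hypothetical tiers: a lower number means more important to keep off-site.
TIER_BY_EXTENSION = {
    ".doc": 1, ".txt": 1,  # writing: small and irreplaceable
    ".jpg": 2, ".raw": 2,  # photos
    ".mp3": 3,             # music
}
DEFAULT_TIER = 4           # everything else goes last


def classify(filename):
    """Return the tier for a file based on its extension."""
    return TIER_BY_EXTENSION.get(PurePath(filename).suffix.lower(), DEFAULT_TIER)


def pick_for_offsite(files, budget):
    """Greedily choose (name, size) pairs by tier, then size, within the budget."""
    chosen, used = [], 0
    for name, size in sorted(files, key=lambda f: (classify(f[0]), f[1])):
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen, used


# Sizes are in arbitrary units (say, GB) for illustration.
files = [("novel.doc", 2), ("beach.jpg", 5), ("album.mp3", 8), ("setup.exe", 3)]
chosen, used = pick_for_offsite(files, budget=8)
print(chosen, used)
```

With a budget of 8 units, the writing and the photos make the cut and the music stays home on the MyBook, which is about the trade-off I expect to be making.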
I may be poking along at 1 MB/hr, but it all feels like a slow-motion, small-scale version of the issues I cover every day. It’s interesting to see firsthand how “Digital Life ™” is, in fact, blurring the boundaries between home and business computing.
As approximately the last person in the Western Hemisphere not to own a PDA, I escaped the Great Blackberry Outage of Aught Eight last week, and got to have that much more time to be smug about my lack of dependence on such a thing before I inevitably get one and grow so dependent on it I need Tommy John surgery on my thumbs.
This week, though, the plot thickened for storage folks as it was revealed that the outage was caused by a failure during a systems upgrade. According to Reuters, the outage was caused by an upgrade to a data routing system inside one of the company’s data centers. In the past, RIM suffered an outage to its Blackberry service because of cache upgrades. Drunken Data auteur Jon Toigo thinks they’re still having storage problems, and cites an AP report on MSNBC saying the failure happened during a system upgrade designed to increase capacity.
Meanwhile, Reuters seems to imply that at heart, data growth is what bit RIM. “RIM has been adding corporate, government and retail subscribers at a torrid pace and has had to expand its capacity in step to handle increased e-mail and other data traffic. Its total subscriber base sits at about 12 million according to latest available data.”
The fact of the matter is that no system is failproof–but I think Reuters brings up a good point. We’re opening up new frontiers in massive multi-tenancy and creating new and unprecedented demands on computer systems; we’re also consolidating data into the hands of service providers like RIM. My sense is we’re going to start seeing more of this kind of issue as these trends continue, especially as more and more new services come online. So maybe I’ll just rely on good old dinosaur Outlook for a little while longer.
After my posts on militant dolphins and black holes, you could be forgiven for taking that headline literally, but this time I’m referring to the software kind of wizard, not the pointy-hat/Harry Potter kind.
What prompted this post were two stories I saw this week. First, Reldata announced new adaptive software wizards for its storage gateways and I had an in-depth conversation with the company’s CEO, David Hubbard, about that very subject. Second, everyone’s favorite, Storage Magazine, ran a trends story this month headlined “Storage staffing shortage looms.”
Reldata’s adaptive wizards are a little different from the wizards that companies like HP have announced for low-end products, in that they’re not just there for setup. Rather, the adaptive wizards are there for several stages of deployment for the gateway’s iSCSI SAN functions (NAS, replication and clustering wizards are still on the to-do list).
We’re hearing a lot about ease of use these days; even I have been guided through setting up volumes on disk arrays from emerging storage companies by way of proving, “See! Anyone can do it!”
But are we headed toward the point where that will literally have to be true?
When Dell purchased email archiver MessageOne for $155 million today, the computer giant didn’t have to welcome the small startup into the family. MessageOne has been in the Dell family from the start, literally.
MessageOne was co-founded by Adam Dell, brother of Dell founder Michael Dell. Michael Dell also had a financial interest in MessageOne. The Dell founder, his wife, parents, and a trust for his children are investors in two investment funds that backed MessageOne. Adam Dell manages the funds, and served as MessageOne’s chairman.
So when the smoke clears after the deal, Adam Dell will receive around $970,000; Michael Dell, Susan Dell and their children’s trust will receive a total of around $12 million; and Dell’s parents will receive around $450,000. According to the press release Dell issued announcing the deal, the $12 million paid to Michael and Susan Dell and their children will be donated to charity.
To the Dells’ credit, they disclosed these numbers in the press release. The company also claims Michael Dell was not involved in the negotiations for MessageOne. Dell’s directors – excluding Michael Dell and CFO Don Carty – handled negotiations and received an opinion from Morgan Stanley & Co. that the price was fair to the company.
You can expect Michael Dell to be especially careful, considering the company had some accounting problems with the SEC in recent years that were part of the reason the founder came back to replace Kevin Rollins as CEO. And the acquisition will have to pass muster with regulatory agencies. Still, the results of this deal will be watched especially closely over the next few months. While Dell can easily justify acquiring email archiving and storage software as a service (SaaS), there will be questions about whether the price was right — even if it is merely tip money compared to the $1.4 billion paid for EqualLogic.
So if Dell doesn’t see a quick boost from MessageOne’s products and services, Michael Dell will have to explain more than why the integration is taking longer than expected. He’ll have to convince investors and skeptics that the deal wasn’t just a nice payday and perhaps a lifeline for his brother’s company.
First, I need to define constraints before we dig into the meat: What I consider a small to medium-sized business (SMB) is a company that would have a problem justifying a $50,000 purchase for a product that would perform a migration then have no use for it for 3 to 5 years until they migrate again, or have one to two IT people doing the work, or think a SAN is just a typo for SAN-D that you’d find at a beach. I know IBM, Sun, Symantec et al. have migration services but I’m looking at the smaller business space where people need to store more on tighter budgets that were small to begin with.
We’ve recently upgraded our SAN infrastructure and while our data migration chores aren’t all that intense, I’d still prefer that a computer did it. I’ve built some tools to handle my cleanup work (I’ll share them as soon as some bugs are worked out) but only because I couldn’t easily buy something to do the same or better. Now I’ll admit that sometimes I can be blind or ignorant (or both), but I’ve noticed a HUGE gap in the availability of migration tools for the lower end of the SMB spectrum. With me being a part of The Matrix like I am, or akin to Mr. Universe from Serenity, one would think I’d have caught a whiff of something significant.
For a company known primarily for spending hundreds of millions of Larry Ellison’s Oracle bucks, the folks at Pillar Data have a good sense of humor. Take this video Pillar put together for its Application-Aware Storage release this week: http://www.youtube.com/watch?v=b0Kx0w7fYx4
Funny. But I have a feeling that a lot of storage administrators might react much as the people at the malls and McDonald’s did to Pillar’s claim that it’s the first to offer application-aware storage. Application awareness is helpful but not new in storage, let alone “game-changing,” as Pillar claimed when it announced it this week.
“Is this a new feature? Well, not for the industry, but certainly for Pillar,” said analyst Greg Schulz of The StorageIO Group. “Others have tried, including Sun. So for Pillar, it’s new and game-changing. For the industry, well, maybe game-changing for those who have not seen it before.”
But is it even new for Pillar? What Pillar describes in its release — writing scripts that assign an application to either the outside, middle or inside of the disks in a volume — was supposedly in their product from the start.
In his blog explaining Pillar’s application-aware storage, here’s how Pillar CEO Mike Workman describes it: “. . .application-awareness implies configuration of disk, but in the case of Pillar’s Axiom it also implies things like cache configuration, network bandwidth, CPU priority, and layout of data on the disk platters. In other words, all the system resources are tailored to the application — set up to make the application see the best possible disk attributes out of the resources in the array.”
Workman also writes that this is the approach Pillar took when it started shipping its Axiom systems 2-1/2 years ago.
Pillar customer Greg Thayer, director of IT at voice data network provider Comm-Works, says application awareness was a key part of why he bought a Pillar system last September. “It was a compelling reason for us,” he said. “I can characterize my data by what is the most important information that users access, and that goes on the outside of the disk where things are spinning more often.”
But why is Pillar trumpeting a feature that it’s had from the start? Cynics in the industry say the company is trying to generate buzz because of stalled sales. Pillar has watched less well-funded storage system vendors Compellent and 3Par go public and Dell scoop up EqualLogic for $1.4 billion. For Ellison’s $300 million or so investment, Pillar claims 300 customers, which means it has spent at least $1 million per customer.
Still, let’s hope Pillar sticks around. No other storage company is running videos on YouTube that are nearly as interesting.
It certainly beats watching this guy carry on about server virtualization conspiracies.
That, friends, is without a doubt the best headline I’ve ever written.
As many of you are surely aware, underwater Internet cables in Asia were cut last week, one of them by an errant ship’s anchor, and another two (or three–I’ve seen stories that say there were a total of three cut cables, and stories that say there were four)…unexplained.
It all happened last week, but repairs are still ongoing in the region. The cable cut by the anchor has been fixed, and reportedly most of the region of Asia, the Middle East and North Africa that was Net-less has come back online (all those Saharan nomads are surely relieved wireless is back on their laptops again). Fixes to the other cables should be done Sunday according to authorities.
As always when human beings encounter the unknown, their immediate instinct is to fill it in with knowledge or theory as quickly as possible. This story is no exception, and according to this AFP piece, the conspiracy theories are flying fast and furious. Many suspect terrorism, yet no one knows how it would have been accomplished.
All of which leads to the following paragraph, which I will now quote verbatim:
Bloggers have speculated that the cutting of so many cables in a matter of days is too much of a coincidence and must be sabotage. Theories include a US-backed bid to cut off arch-foe Iran’s Internet access, terrorists piloting midget submarines or “vengeful militant dolphins.”
If this blog were the Daily Show, that right there would be your Moment of Zen.
But in seriousness. While all this is happening, there are no doubt companies suffering a complete outage, and if the estimates for the repairs are true (personally I apply the same projection-to-reality formula for Internet fixes as I do to cable repair guy appointment times), these companies will have been suffering complete outages for at least a week to ten days.
Helpfully, IT companies are reminding us through press releases that most companies are not equipped to survive outages longer than seven days (per Gartner). They’re also reminding everyone that had these companies been using their product(s), and presumably a sufficiently distant secondary site, they would’ve been fine. How that would be if you don’t have a WAN to replicate and restore data, or a network through which to conduct commerce, is beyond me, but that’s really not the point; here in the trade press we expect to get press releases linking IT products to every conceivable natural or worldwide disaster, regardless of how tenuous the link may be.
The more I thought about it, the more I wondered…unless you’re a multinational company, how do you survive an outage that big? We’ve all heard about how 9/11 taught people to expand the scope of their DR plans, and Katrina taught people to expand the geographic area they consider potentially disaster-affected when sending tapes offsite. This type of disaster, though, is too big to be escaped by all but the biggest of global corporations. And it does raise the question–how far can DR go? How do you respond to a disaster of global or hemispheric proportions? Many companies are going through a painstaking process of broadening the scope of DR plans beyond their local area as a result of Katrina–should they start planning DR hot sites in Siberia instead?
Yet even as IT shops slowly inch toward better preparedness, disasters, and the global economy, wait for no man. Given our worldwide dependence on the Internet (and imagine what the effect would be if this had happened in North America and Europe), has this disaster suggested a practical limit to technical DR? If so, what’s the contingency plan for that?
Why is it when storage vendors hawk a system below a certain price point – say $10,000 – it automatically becomes an SMB product?
Take HP’s MSA upgrade launched today. According to HP’s press release,
“The easy-to-use, enterprise-class systems are designed for small and mid-size businesses …”
But the real use for the system comes next, after a however,
“… enterprises also will find the MSA2000 is an ideal solution for their remote office, departmental, secondary and tertiary storage needs.”
The real purpose for the MSA2000 is the second one listed. SMBs are listed first because that’s considered the hot “greenfield” market today. But just because the MSA is at the low end of HP’s SAN portfolio doesn’t make it right for SMBs. They’re more an option for existing HP customers to add smaller storage deployments or to hook up to blade servers. That doesn’t make them bad; but they’re smaller versions of HP’s SANs and not built for SMBs.
Charles Vallhonrat, MSA product manager for HP’s StorageWorks division, admits as much. When I asked about the system being for SMBs, he said yes, the price point is low and management is simple, “but we also see a large uptake with large customers putting it in remote offices and departments.”
HP says the list price starts at $4,999, but you’d better have a lot of unused storage lying around because that price includes no disk. That price covers a single iSCSI controller. If you want 4.5 TB of SATA storage, it costs $7,993. A single Fibre Channel controller costs $5,999 without disk and $8,993 with 4.5 TB of SATA drives. Dual controller systems add $2,500 to the price. More expensive SAS drives are also available.
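To see how quickly those numbers add up, here is a small Python sketch of the price list. It uses only the figures quoted here, reads the Fibre Channel with-disk price as $8,993, and assumes the $2,500 dual-controller charge applies on top of any configuration; the `list_price` helper is mine, not an HP tool.

```python
# List prices quoted above, in US dollars. The Fibre Channel "with disk" figure
# is read as 8,993 (an assumption about the intended number).
BASE_PRICE = {"iscsi": 4999, "fc": 5999}       # single controller, no disk
WITH_SATA_PRICE = {"iscsi": 7993, "fc": 8993}  # single controller plus 4.5 TB SATA
DUAL_CONTROLLER_UPCHARGE = 2500                # assumed to apply to any configuration


def list_price(interface, with_disk=False, dual_controller=False):
    """Return the quoted list price for one configuration."""
    price = WITH_SATA_PRICE[interface] if with_disk else BASE_PRICE[interface]
    if dual_controller:
        price += DUAL_CONTROLLER_UPCHARGE
    return price


# What the 4.5 TB of SATA disk costs, implied by the two price points per interface.
implied_disk_cost = {k: WITH_SATA_PRICE[k] - BASE_PRICE[k] for k in BASE_PRICE}
dual_fc_with_disk = list_price("fc", with_disk=True, dual_controller=True)

print(implied_disk_cost)
print(dual_fc_with_disk)
```

Under those assumptions the disk works out to the same $2,994 on either interface, and a dual-controller Fibre Channel system with disk lands at $11,493, past the $10,000 line I mentioned at the top.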
Vallhonrat says a single controller system is viable because the controllers include transportable cache. If one fails, you can move the cache to a new controller and recover data. That’s useful for a storage administrator, but probably not something the person who manages systems at an SMB wants to deal with.
HP isn’t alone in labeling its small enterprise systems as SMB offerings. EMC does the same with its Clariion AX systems. The difference is HP has a real SMB system – called the All-In-One.
Vallhonrat says the MSA2000 platform is meant to compete with IBM’s DS3000, Dell’s MD3000 and the lower end of the AX4.
As for differences between the MSA2000 and All-In-One, he compared it to using a dedicated printer or scanner as opposed to a multifunction device. The All-In-One is the multifunction device with iSCSI and NAS for block and file storage while the MSA2000 handles only block storage.
“All-In-One is for people who have a need for multiple storage types [file and block], but not the best performance for one type,” he said. “Like a multifunction printer, it’s not best scanner or best printer but does all. The MSA is for if you need better performance or availability, but not the ease of use or functionality of the All-In-One.”
I’m sure any number of you can come up with witty figurative responses to that, but I actually mean it literally.
Back in August I did a case study on CERN, the world’s largest physics laboratory, in Switzerland, and the petabytes of data storage that are going to support research on its Large Hadron Collider (LHC). LHC is a 12-story-high, 10-mile-wide underground system of tunnels, magnets and sensors that’s designed to do no less than recreate atomic conditions at the creation of the universe and capture particles that until now have been only theoretical.
Having spoken with CERN about their research and the way the whole system is set up, I was surprised when I logged in to my personal email this morning and got a friend request from a profile titled STOP CERN. According to the profile:
This space has been set up to spread awareness of the risks a project due to be launched at CERN next year poses to our planet. For the first time in many decades someone has built a machine that exceeds all our powers of prediction, and although they estimate the possibility of accidentally destroying the planet as extremely low, the LHC propaganda machine that ‘everything is safe’ is well funded by your tax dollars, paying large salaries to thousands of people who have much to lose financially should the LHC be unable to prove its safety. As most of them perceive the risk to be small, they are willing to take that ‘small risk’ at our expense. The actual risk cannot presently be calculated, and a Large Hadron Collider [LHC] legal defense fund has even been set up to challenge CERN on the project.
I don’t have any kind of physics background, so I don’t know if the criticisms are legit, but I was doubly surprised to find that the MySpace profile is only the tip of the iceberg of people questioning CERN. In addition to some other critical websites, an LHC Legal Defense Fund has been started with the goal of legally intervening to stop CERN from turning on LHC this May, which its backers fear could create a black hole within the collider and accidentally destroy the planet.
By the way, isn’t that really every geek’s dream? To be working on a machine that even theoretically could accidentally destroy the planet?
Anyway, the debate seems to be whether or not something called “Hawking evaporation” (presumably named after physicist Stephen Hawking) will neutralize the microscopic black holes that could be created by the particle collisions in LHC, or if they’ll continue to grow and, well, eat France.
According to another anti-CERN site:
If MBH’s [microscopic black holes] are created, there is a likelyhood [sic] that some could fall unimpeded to the centre of the Earth under gravity…Scientists have estimated that a stable black hole at the center of the earth could consume not only France but the whole planet in the very short time span of between 4 minutes and 30 seconds and 7 minutes.
I’m a little more inclined to believe the multiple accredited physics organizations around the world involved in the LHC project know what they’re doing than I am to believe some people I’ve never heard of from the Internet, but what do I know? The criticism has at least been strong enough to prompt CERN to post a kind of FAQ page about black holes, strangelets, and all manner of interesting potential doomsday scenarios that have been envisioned for LHC.
Despite the impressive power of the LHC in comparison with other accelerators, the energies produced in its collisions are greatly exceeded by those found in some cosmic rays. Since the much higher-energy collisions provided by Nature for billions of years have not harmed the Earth, there is no reason to think that any phenomenon produced by the LHC will do so.
Wouldn’t it just be something, though, if after centuries of war and pollution and all the other things mankind has done to compromise the planet, Armageddon was actually brought about by a bunch of guys in a physics lab?