Amazon’s S3 online storage service suffered an outage lasting several hours this morning, echoing the outage suffered by email service provider RIM last week. While RIM’s outage hit CrackBerry addicts who at least had alternatives to email, the Amazon outage may have affected Web-based companies relying on S3’s storage to deliver core services. Not good.
However, one S3 user I talked to today, SmugMug CEO Don MacAskill, said his site didn’t feel a thing. “None of our customers reported any issues–we haven’t seen any problems that are customer facing,” he said.
One factor may explain SmugMug’s resilience: after another outage last year, the company started keeping about 10% of its data in a hot cache on-site. “It could have been that the hot cache was adequate for the 2 or so hours it was going on, or it could have been that for some people the outage was intermittent,” he added.
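For readers wondering how an on-site hot cache could mask a remote outage, here is a minimal sketch of the general idea. It is not SmugMug’s actual implementation, which MacAskill did not describe in detail. The names `HotCache`, `RemoteStoreUnavailable` and `fetch_remote` are hypothetical, and least-recently-used eviction is an assumption made for the example.

```python
from collections import OrderedDict
from typing import Callable, Optional


class RemoteStoreUnavailable(Exception):
    """Hypothetical error raised by the remote fetch function during an outage."""


class HotCache:
    """Keeps a bounded set of recently read objects on-site.

    Reads are served locally when possible. A miss goes to the remote
    store; if the remote store is down, the miss returns None.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch_remote = fetch_remote  # assumed to raise RemoteStoreUnavailable
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        # Cache hit: serve locally and mark the object as recently used.
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]

        # Cache miss: try the remote store, which may be unavailable.
        try:
            data = self._fetch_remote(key)
        except RemoteStoreUnavailable:
            return None

        # Remember the object and evict the least recently used one if full.
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)
        return data
```

In a setup like this, an outage is only visible to customers who ask for something outside the cached set, which fits MacAskill’s guess that the hot cache may have been adequate for a two-hour window.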
Meanwhile, some users were still reporting issues as recently as five minutes ago on Amazon’s Web Services Developer Connection message board. According to an Amazon.com official response on the thread about an hour ago, “This morning’s issue has been resolved and the system is continuing to recover. However, we are currently seeing slightly elevated error rates for some customers, and are actively working to resolve this. More information on that to follow as we have it.”
Their businesses aren’t the same, but I think this ties in with what I was saying in my post about RIM’s BlackBerry meltdown: as more and more data “eggs” are put into centralized service provider “baskets”, more and more of them are going to get broken, especially as the service-provider market ramps up.
Or as TechCrunch put it:
This could just be growing pains for Amazon Web Services, as more startups and other companies come to rely on it for their Web-scale computing infrastructure. But even if the outage only lasted a couple hours, it is unacceptable. Nobody is going to trust their business to cloud computing unless it is more reliable than the data-center computing that is the current norm. So many Websites now rely on Amazon’s S3 storage service and, increasingly, on its EC2 compute cloud as well, that an outage takes down a lot of sites, or at least takes down some of their functionality. Cloud computing needs to be 99.999 percent reliable if Amazon and others want it to become more widely adopted.
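To put the “99.999 percent” figure in perspective, here is a quick back-of-the-envelope calculation. It assumes reliability means the fraction of a 365-day year that the service is up, which is my reading and not a definition TechCrunch gives, and the function names are just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year permitted at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Availability over one year, given total downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # "Five nines" leaves a little over five minutes of downtime per year.
    print(f"{allowed_downtime_minutes(99.999):.2f} minutes per year")
    # A single two-hour outage, with no other downtime all year.
    print(f"{availability_percent(120):.3f} percent")
```

By this measure, five nines allows roughly 5.26 minutes of downtime a year, while a single two-hour outage works out to about 99.977 percent for the year even if nothing else goes wrong.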
Growing pains may have had something to do with it, according to Taneja Group analyst Eric Burgener. “There’s less of this going on than there used to be, but this is one of those things that gives people pause about services,” he said. A focus on secondary storage and on storage for small companies has made this crop of service providers more successful than the storage service providers (SSPs) of the bubble days. Even where companies rely on services like this for primary storage, Burgener argued that the services option is still the better bet. “For small internet businesses services are still a perfect play–they allow businesses to start up rapidly without the kind of capital expense or infrastructure they need for an in-house system.”