View From Above

Oct 23 2012   7:59AM GMT

An Outage is an Outage

Ron Miller

Users don’t care if the service that went down was hosted in the cloud or your data center. An outage is an outage.

Yesterday, social networks, those that were still up, that is, lit up with complaints about Amazon going down and taking down many useful services with it. As someone who writes frequently about the cloud, and as a user, I understand the frustration people feel when a service is down, but I’m wondering how many Exchange servers went down yesterday and we didn’t hear a word.

The thing is, when Amazon goes down it’s very public. When you’re an IT pro and you get called from your kid’s soccer game because one of your crucial systems has gone down, you and your colleagues aren’t likely yakking about it on Twitter: “Oh man, the Exchange Server at Acme Widgets went down. Bill Smith in IT is in deep doo-doo.” Not likely to see a tweet like that.

Now think about the likes of Foursquare, Pinterest, Reddit, Flipboard, Heroku, Airbnb and lots of others, and it’s no wonder it felt like the Internet was broken yesterday. As an article on The Next Web pointed out, the last major Amazon outage was in June when an electrical storm was the root of the problem. If you had four months between issues in your data center, and the last was due to natural causes, I’m guessing you would be happy with that.

That’s because things go wrong in private data centers all the time. Ask anyone who’s on call in IT and they’ll tell you some stories. The difference is that when a public cloud platform goes down, it has a much greater impact and plays out in much more public view. You can’t hide when Twitter is blowing up about your company being down, and all those major properties are affected, yet the result is pretty much the same. Your customers are dead in the water.

When you think about the difference between a public cloud and a private one, the public one serves many different companies, while your private one serves your internal customers. Each one is offering a set of services. From the user perspective, if you’re down it doesn’t matter who’s running the data center. The bottom line is that you can’t do your work.

The New York Times reported that the issue was at the Northern Virginia data center, but it was still not clear what happened as of last night, and it may take several days for Amazon to sort it out. It appears that everyone is back up and running this morning, so whatever it was has been resolved.

All Things Digital recently reported on a Forrester survey that suggested that people aren’t using the Internet as much as they used to, but the survey architects themselves said this might be a perception problem. Forrester analyst Gina Sverdlov told All Things Digital, “Despite the fact that they always have connected devices and are always online, they don’t really realize they’re online.”

And the same dynamic is likely in play in your data center. Most employees don’t know the source of their application, and don’t really care if it’s hosted in your data center or in the cloud, but they will care if they can’t do their work for whatever reason, because in the end an outage is an outage. The only difference is who’s getting paid to fix the problem.

 Photo Credit | Nature’s Images on Flickr. Used under Creative Commons License.

2  Comments on this Post

  • Christine Herbert
    I think this may relate to the large solar flare that came our way a couple of days ago. See here:
  • KenLC
    A couple of points about the Amazon outages:

    1. The June outage was technically caused by nature, but poor management "allowed" it to happen. If they had switched the diesels manually 30-45 minutes before the storm hit, they may not have had any outage. Any good data center manager knows to go to diesel BEFORE severe weather hits. Human error?

    NOTE: Microsoft had a major outage when they made a major network change on a Monday morning at 11 am. Enterprise data centers (and traditional outsourcing providers) do not make changes during primary business hours. Human error?

    2. The level of communications with users during an Amazon outage has been horrible. During the June outage, there were 3-4 hour gaps in status updates. This would NEVER be the case in an enterprise's own data center. Amazon (and the other mega-cloud providers) need to address this gap, even if it is an additional charge for "Platinum customer service", where clients can get guaranteed status updates every 15/30/60 minutes.

    3. More of a question: have the mega-cloud providers allowed their "shared" environment to simply become too large? Amazon had availability zones, but several outages have affected clients in multiple availability zones. Should they be building a series of isolated smaller shared cloud environments? I believe several outages have been exacerbated by replication storms, where the automated recovery overwhelmed their networks.

    I view most of these issues as a lack of maturity in the mega-cloud provider world. They need to actively look at how large enterprises and traditional outsourcing providers run their environments. One problem is that they are mixing consumer and SMB environments with larger enterprise, mission-critical environments and treating them with the same priority. The impacts are dramatically different, and these providers need to understand the differences and differentiate their services.

    I have every faith that the Cloud Computing world will get there... eventually. Until progress is made, I believe larger enterprises will keep their mission-critical environments in-house.
