Posted by: CarlBrooks
It’s been a rough patch for cloud computing in the “perceptions of reliability” department. Gremlins working overtime caused EBS to fail at Amazon, taking down a bunch of social media sites, among others. Naturally, that got a lot of attention, much as throwing an alarm clock down a wind tunnel will make a disproportionate amount of noise.
As the dust was settling and the IT media echo chamber was polishing off the federally mandated outrage/contrarian outrage quota for all kerfuffles involving Anything 2.0, more outages struck, including a Blogger outage that no one in IT really cared about, although this reporter was outraged that it temporarily spiked a favorite blog.
While nobody was caring about Blogger, Microsoft’s hosted (cloud) Exchange and collaboration platform, Business Productivity Online Services (BPOS, now a part of Office 365), went down, which people in IT most assuredly did care about. Especially, as many of the forum posters said, if they had recently either been sold on “Microsoft cloud” or sold their organization on it as a preferable option to in-house Exchange.
“I’ve been with Microsoft online for two weeks now, two outages in that time and the boss looks at me like I’m a dolt. I was THIS close to signing with Intermedia,” said one poster. That’s the money quote for me; Intermedia is a very large hosted Exchange provider and this (probably) guy was torn between hosted Exchange and BPOS. Now he feels like he might have picked wrong: notice he didn’t discuss the possibility of installing on-prem Exchange, just two service options.
Microsoft posted a fairly good postmortem on the outage in record time, apparently taking heed from the vicious pillorying AWS got for its lack of communication (AWS’ postmortem was also very good, just many days after the fact):
“Exchange service experienced an issue with one of the hub components due to malformed email traffic on the service. Exchange has the built-in capability to handle such traffic, but encountered an obscure case where that capability did not work correctly.”
Anyone who’s had to administer Exchange feels that pain, let me tell you. It also tells us BPOS-S is using Exchange 2000 (That is a JOKE, people).
What ties all these outages together is not their dire effect on the victims. That’s inconsequential in the long term, and won’t stop people from getting into cloud services (there are good reasons to call BPOS cloud instead of hosted application services but that’s another blog entirely). It’s not the revelation that even experts make mistakes in their own domain, or that Amazon and Microsoft and Google are largely still feeling their way around on exactly what running a cloud means.
It’s the communication. If anything could more clearly delineate “cloud service” from “hosted service,” it’s the lack of transparency, lack of customer touch, and the unshakeable, completely relative perception of users across the board, that when outages occur, they are on their own.
Ever been in a subway car and the power dies? I grew up in Boston, so that must have happened hundreds of times to me. People’s fear and unease grow in direct proportion to the time it takes the conductor to yell out something to show they’ve got the situation in hand. Everything is always fine, the outage is temporary, no real harm done, but people start to freak out only when they get no assurance from the operator.
Working in IT and having a service provider fall over is the same thing, only you’re going to get fired, not just have a loud sweaty person flop all over you in the dark (OK, that may happen in a lot of IT shops). Your boss doesn’t care that you aren’t running Microsoft’s data center; you’re still responsible. Hosters have learned from long experience that they need to be engaged, or at least appear engaged, when things go wrong, so their users can have something to tell their bosses. I used to call up vendors just to be able to tell my boss I’d been able to yell at “Justin our engineer” or “Amber in support” and relay the message.
Cloud hasn’t figured out how to address that yet; either we’re all going to get used to faceless, nerve-wracking outages or providers are going to need to find a way to bridge that gap between easy, anonymous and economical on one side and enterprise-ready on the other.