Saas archives - IT Trenches

Oct 15 2009   12:51PM GMT

Google’s Postini services restored - cascading issues caused message delivery issues



Posted by: Troy Tate
Google, cloud services, saas, antispam, antivirus, service outage, service level, incident report, root cause analysis, corrective actions

I recently posted about delivery issues with Google’s Postini, the cloud email security service. This is a follow-on post about the incident root cause analysis and corrective actions. Maybe there are some lessons learned here that you can use in your organization’s service delivery.

The impact on customer email services lasted more than 24 hours while Postini engineers worked to resolve the issues. So, this was not an insignificant event. During this period, messages were delayed and users were not able to get to their quarantines to release messages trapped by filters. Administrators were also unable to access the administration console. The Postini support portal was unreachable at times due to the high volume of users trying to get updates on the event. The support phone line queues were very long and it took a long time to reach a support agent. Nothing like this has happened before in all of the years we have been a Postini customer.

I just received the incident report about the service disruption and wanted to share some of the information with IT Trenches readers.

Oct 13 2009   7:59PM GMT

Google’s Postini - cloud email security service - delivery issues



Posted by: Troy Tate
Google, cloud services, saas, antispam, antivirus, service outage, service level

Since very early today, US Eastern Daylight Time, Google’s Postini services have been experiencing service issues. As of this writing, the cause and full scope of the issue are unknown. However, when logging into the Postini support portal, an administrator is given the following status indicators:

Postini system status on October 13, 2009

We have been Postini customers for over four years now and this is the first time an outage like this has happened. It’s not a full outage, as messages are still coming in, although at a trickle rather than at normal expected volumes. This outage is so bad that my ability to log in to the support portal is impacted. I receive either an internal 500 server error or “Too many connectionsCould Not Select DB”. A recent update notification said that a secondary Postini data center has been enabled.

The recent Gmail outage raised some concerns about cloud computing. I wonder if today’s Google Postini outage is a symptom of some deeper Google service delivery problem.

Thanks for reading & let’s continue to be good network citizens! Hopefully you are not trying to send me any messages; who knows how long it might take for a message to reach me today. Otherwise, let me know what you think here in the comments.