Google’s Postini services restored - cascading issues caused message delivery issues
Posted by: Troy Tate
I recently posted about Google’s Postini - cloud email security service - delivery issues. This is a follow-on post about the incident root cause analysis and corrective actions. Maybe there’s some lessons learned here that you can use in your organization’s service delivery.
The impact on customer email services lasted more than 24 hours while Postini engineers worked to resolve the issues. So, this was not an insignificant event. During this period, messages were delayed and users were not able to get to their quarantines to release messages trapped by filters. Administrators were also unable to access the administration console. The Postini support portal was unreachable at times due to the high volume of users trying to get updates on the event. The support phone line queues were very long and it took a long time to reach a support agent. Nothing like this has happened before in all of the years we have been a Postini customer.
I just received the incident report about the service disruption and wanted to share some of the information with IT Trenches readers. Continued »


