<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>IT Trenches &#187; service level</title>
	<atom:link href="http://itknowledgeexchange.techtarget.com/it-trenches/tag/service-level/feed/" rel="self" type="application/rss+xml" />
	<link>http://itknowledgeexchange.techtarget.com/it-trenches</link>
	<description></description>
	<lastBuildDate>Fri, 19 Nov 2010 14:37:59 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Google&#8217;s Postini services restored &#8211; cascading failures caused message delivery issues</title>
		<link>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-services-restored-cascading-issues-caused-message-performance-issues/</link>
		<comments>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-services-restored-cascading-issues-caused-message-performance-issues/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 12:51:15 +0000</pubDate>
		<dc:creator>Troy Tate</dc:creator>
				<category><![CDATA[antispam]]></category>
		<category><![CDATA[antivirus]]></category>
		<category><![CDATA[Cloud Services]]></category>
		<category><![CDATA[corrective actions]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[incident report]]></category>
		<category><![CDATA[root cause analysis]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[service level]]></category>
		<category><![CDATA[service outage]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/it-trenches/?p=286</guid>
		<description><![CDATA[I recently posted about Google’s Postini &#8211; cloud email security service &#8211; delivery issues. This is a follow-on post about the incident&#8217;s root cause analysis and corrective actions. Maybe there are some lessons learned here that you can use in your organization&#8217;s service delivery. The impact on customer email services lasted more than 24 hours while [...]]]></description>
				<content:encoded><![CDATA[<p>I recently posted about <a href="http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-cloud-email-security-service-delivery-issues/" target="_blank">Google’s Postini &#8211; cloud email security service &#8211; delivery issues</a>. This is a follow-on post about the incident&#8217;s root cause analysis and corrective actions. Maybe there are some lessons learned here that you can use in your organization&#8217;s service delivery.</p>
<p>The impact on customer email services lasted more than 24 hours while Postini engineers worked to resolve the issues, so this was not an insignificant event. During this period, messages were delayed, and users could not reach their quarantines to release messages trapped by filters. Administrators were also unable to access the administration console. The Postini support portal was unreachable at times because of the high volume of users trying to get updates on the event, and the support phone queues were so long that reaching an agent took a long time. Nothing like this had happened before in all the years we have been Postini customers.</p>
<p>I just received the incident report about the service disruption and wanted to share some of the information with IT Trenches readers.<span id="more-286"></span></p>
<p>The event started at about 6:25 PM GMT on Tuesday, October 13. At that time, customers began experiencing severe mail delays and disruption. Some senders received delivery failure notifications after multiple resend attempts failed. About an hour later, automated monitoring systems detected the mail flow issues, and traffic was automatically failed over to a secondary data center. However, message flow through the secondary data center was also poor.</p>
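<p>The detect-and-fail-over step described above can be illustrated with a minimal Python sketch. The incident report does not describe Postini&#8217;s actual monitoring logic, so everything here is an assumption: the <code>MailFlowMonitor</code> class, the per-minute throughput samples, the 50% degradation threshold and the three-sample trigger are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class MailFlowMonitor:
    """Hypothetical monitor: fail over when delivered-message throughput stays
    below a fraction of its normal baseline for several consecutive samples."""

    baseline_per_min: float        # assumed normal delivered messages per minute
    threshold_ratio: float = 0.5   # assumed: below 50% of baseline counts as degraded
    samples_to_trigger: int = 3    # assumed: consecutive degraded samples before failover
    active_site: str = "primary"
    low_streak: int = 0

    def record(self, delivered_per_min: float) -> str:
        """Record one throughput sample and return the site now carrying traffic."""
        if delivered_per_min < self.baseline_per_min * self.threshold_ratio:
            self.low_streak += 1
        else:
            # A healthy sample resets the streak, so brief dips do not trigger failover.
            self.low_streak = 0
        if self.active_site == "primary" and self.low_streak >= self.samples_to_trigger:
            self.active_site = "secondary"
            self.low_streak = 0
        return self.active_site


if __name__ == "__main__":
    monitor = MailFlowMonitor(baseline_per_min=1000)
    for sample in [950, 400, 300, 200]:
        site = monitor.record(sample)
    print(site)  # secondary
```

<p>The sketch also hints at the limit of this approach: failover only moves traffic. If the underlying cause (for example, a bad filter update) travels with that traffic, the secondary site can degrade in the same way, which may explain why message flow through the secondary data center was also poor.</p>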
<p>To improve message flow, engineers directed message traffic across both the primary and secondary data centers. To reduce the load on data center resources, they also disabled access to the administration console and other web consoles. Eventually, the engineers identified the causes of the message flow issues:</p>
<ul>
<li>A message filter update inadvertently caused performance issues.</li>
<li>Unusual malformed messages increased scan processing and, in combination with the bad message filter update, disrupted mail delivery.</li>
<li>A power supply failure on a database storage system reduced processing capacity, which increased message processing latency.</li>
</ul>
<p>As you can see, this event was caused by a combination of issues rather than a single failure. The hardware was repaired and the filter update was rolled back, but not before many messages were either deferred or not delivered.</p>
<p>The corrective actions to prevent future outages due to similar conditions include:</p>
<ul>
<li>Create a standard procedure for reverting message filter updates. <em>Isn&#8217;t it always a good idea to be able to back out updates?</em></li>
<li>Improve monitoring of database server power failures. <em>Apparently this power failure was not detected by their current monitoring process.</em></li>
<li>Improve communication with customers during service outage events. <em>This would help relieve some frustration and help customers understand the severity and scope of the outage.</em></li>
</ul>
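<p>The first corrective action, a standard procedure for reverting message filter updates, can be sketched in Python by keeping every published rule set so that the previous version is always one step away. The <code>FilterRuleStore</code> class and the rule strings are hypothetical; the incident report does not describe how Postini stores or deploys its filters.</p>

```python
from __future__ import annotations


class FilterRuleStore:
    """Hypothetical store that keeps every published filter rule set,
    so a bad update can be reverted with a single, well-defined call."""

    def __init__(self, initial_rules: list[str]) -> None:
        # Version 0 is the initial, known-good rule set.
        self._history: list[list[str]] = [list(initial_rules)]

    @property
    def version(self) -> int:
        return len(self._history) - 1

    @property
    def active_rules(self) -> list[str]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._history[-1])

    def publish(self, rules: list[str]) -> int:
        """Deploy a new rule set and return its version number."""
        self._history.append(list(rules))
        return self.version

    def revert(self) -> int:
        """Roll back to the previous rule set and return its version number."""
        if len(self._history) == 1:
            raise RuntimeError("no earlier filter version to revert to")
        self._history.pop()
        return self.version


if __name__ == "__main__":
    store = FilterRuleStore(["block:known-spam-signature"])
    store.publish(["block:known-spam-signature", "scan:deep-attachment"])
    store.revert()
    print(store.version, store.active_rules)  # 0 ['block:known-spam-signature']
```

<p>Popping the history discards the bad version; a real procedure would more likely keep it for root cause analysis. The point of the sketch is that reverting becomes a rehearsed, single step rather than an improvised fix in the middle of an outage.</p>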
<p>Fortunately, email services have now returned to normal. However, during this 24-hour period, there was a high level of frustration and concern about how to work around the impact on email delivery. I think most IT organizations can learn some lessons and improve service delivery by reading about incidents like this and the lessons learned from them. I know my team will discuss this event and work to improve the resiliency of our service delivery.</p>
<p>Thanks for reading &amp; let&#8217;s continue to be good network citizens!</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-services-restored-cascading-issues-caused-message-performance-issues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Postini &#8211; cloud email security service &#8211; delivery issues</title>
		<link>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-cloud-email-security-service-delivery-issues/</link>
		<comments>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-cloud-email-security-service-delivery-issues/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 19:59:16 +0000</pubDate>
		<dc:creator>Troy Tate</dc:creator>
				<category><![CDATA[antispam]]></category>
		<category><![CDATA[antivirus]]></category>
		<category><![CDATA[Cloud Services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[service level]]></category>
		<category><![CDATA[service outage]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-cloud-email-security-service-delivery-issues/</guid>
		<description><![CDATA[Since very early today, US Eastern Daylight Time, Google&#8217;s Postini services have been experiencing service issues. As of this writing, the cause and full scope of the issue are unknown. However, when logging in to the Postini support portal, an administrator sees the following status indicators: We have been Postini customers [...]]]></description>
				<content:encoded><![CDATA[<p>Since very early today, US Eastern Daylight Time, Google&#8217;s Postini services have been experiencing service issues. As of this writing, the cause and full scope of the issue are unknown. However, when logging in to the Postini support portal, an administrator sees the following status indicators:</p>
<div id="attachment_283" class="wp-caption aligncenter" style="width: 231px"><a href="http://cdn.ttgtmedia.com/ITKE/uploads/blogs.dir/46/files/2009/10/20091013-postinistatus.jpg"><img class="size-medium wp-image-283" src="http://cdn.ttgtmedia.com/ITKE/uploads/blogs.dir/46/files/2009/10/20091013-postinistatus.jpg" alt="Postini system status on October 13, 2009" width="221" height="168" /></a><p class="wp-caption-text">Postini system status on October 13, 2009</p></div>
<p>We have been Postini customers for over four years now, and this is the first time an outage like this has happened. It&#8217;s not a full outage: messages are still coming in, although at a trickle rather than at normal volumes. The outage is so severe that even my ability to log in to the support portal is affected. I receive either a 500 Internal Server Error or &#8220;Too many connectionsCould Not Select DB&#8221;. A recent update notification said that a secondary Postini data center has been enabled.</p>
<p>The recent <a href="http://www.channelinsider.com/c/a/Cloud-Computing/Gmail-Outage-Rattles-Cloud-Computing-Confidence/" target="_blank">Gmail outage raised some concerns about cloud computing</a>. I wonder if today&#8217;s Google Postini outage is a symptom of some deeper Google service delivery problem.</p>
<p>Thanks for reading &amp; let&#8217;s continue to be good network citizens! Hopefully you are not trying to send me any messages; who knows how long it might take for a message to reach me today. Otherwise, let me know what you think here in the <a href="#comments" target="_self">comments</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/it-trenches/googles-postini-cloud-email-security-service-delivery-issues/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>IT services and The Three Chinese Curses</title>
		<link>http://itknowledgeexchange.techtarget.com/it-trenches/it-services-and-the-three-chinese-curses/</link>
		<comments>http://itknowledgeexchange.techtarget.com/it-trenches/it-services-and-the-three-chinese-curses/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 18:38:11 +0000</pubDate>
		<dc:creator>Troy Tate</dc:creator>
				<category><![CDATA[bot]]></category>
		<category><![CDATA[botnet]]></category>
		<category><![CDATA[career]]></category>
		<category><![CDATA[information security]]></category>
		<category><![CDATA[information technology]]></category>
		<category><![CDATA[infosec]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[network analysis]]></category>
		<category><![CDATA[professional]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[service level]]></category>
		<category><![CDATA[support]]></category>
		<category><![CDATA[trojan]]></category>

		<guid isPermaLink="false">http://itknowledgeexchange.techtarget.com/it-trenches/it-services-and-the-three-chinese-curses/</guid>
		<description><![CDATA[In America, October is the time when haunting, evil spirits and curses come to mind. Earlier today I posted a blog entry titled Can IT education bring an end to the recession? I used a quote that is attributed to a series of Chinese curses that go in ascending order of severity. After I used [...]]]></description>
				<content:encoded><![CDATA[<p>In America, October is the time when haunting, evil spirits and curses come to mind. Earlier today, I posted a blog entry titled <a href="http://itknowledgeexchange.techtarget.com/it-trenches/can-it-education-bring-an-end-to-the-recession/" target="_blank">Can IT education bring an end to the recession?</a> I used a quote attributed to a series of Chinese curses that go in ascending order of severity. After I used it, I pondered the other two curses and how they apply to IT services.</p>
<p>According to <a href="http://en.wikipedia.org/wiki/May_you_live_in_interesting_times" target="_blank">Wikipedia</a>, the three curses are:</p>
<blockquote>
<ul>
<li>May you live in interesting times.</li>
<li>May you come to the attention of those in authority (sometimes rendered &#8220;May the government be aware of you&#8221;).</li>
<li>May you find what you are looking for.</li>
</ul>
</blockquote>
<p style="text-align: left"><span id="more-282"></span>These curses have no time frame attached, so they could apply yesterday, today, or tomorrow. The first curse, about living in interesting times, is very true for most IT shops. The challenges never seem to end, but the funds do. The business requirements are fuzzy, but the support requirements are real. What makes for interesting times in your role and your organization?</p>
<p style="text-align: left">The second curse could apply to information security services. It most assuredly should apply to those who employ botnets and trojans for unrighteous financial gain. The most recent reports of trojans and bots that can dynamically alter web pages during a banking transaction are very scary. Fortunately, information security professionals are aware of the threat and are working to address it.</p>
<p style="text-align: left">The third curse may apply to those who support networks and must answer the ever-present question, &#8220;Why is the network slow?&#8221; It seems like we are always pressed to identify the reasons for slow application and service performance without having the toolset for proper diagnosis and troubleshooting. It then becomes a challenge to find what we are looking for.</p>
<p style="text-align: left">Maybe IT is a cursed profession after all. What do you think? Is there a more cursed profession?</p>
<p style="text-align: left">Thanks for reading and let&#8217;s continue to be good network citizens!</p>
]]></content:encoded>
			<wfw:commentRss>http://itknowledgeexchange.techtarget.com/it-trenches/it-services-and-the-three-chinese-curses/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
