The Troposphere

Mar 23 2009   5:51PM GMT

The Tale of Three Cloud SLAs

John Willis (JohnMWillis)

Wikipedia says…

The SLA records a common understanding about services, priorities, responsibilities, guarantees and warranties. Each area of service scope should have the ‘level of service’ defined. The SLA may specify the levels of availability, serviceability, performance, operation, or other attributes of the service such as billing. The ‘level of service’ can also be specified as ‘target’ and ‘minimum’, which allows customers to be informed what to expect (the minimum), whilst providing a measurable (average) target value that shows the level of organization performance. In some contracts penalties may be agreed in the case of non compliance of the SLA (but see ‘internal’ customers below). It is important to note that the ‘agreement’ relates to the services the customer receives, and not how the service provider delivers that service.

There has been a lot of hype around the discussion of Service Level Agreements (SLA) in the cloud as of late. SLA hype has been around as long as I can remember and definitely predates the cloud. In the enterprise you will typically hear numbers like 99.999 or terms like “Five Nines”. Five nines is sort of the gold standard in the enterprise, equating to about 5 minutes of outage per year. However, whenever I think of the five nine metric it reminds me what one of my old Six Sigma mentors used to say about five nines – “Five Nines equates to about one commercial plane crash per day for a year.”
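To put numbers on the "nines", here is a minimal sketch of the arithmetic, assuming a 365 day year and treating availability as a simple percentage of total minutes. The function name is mine, not from any provider's tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of outage per year permitted at a given availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


if __name__ == "__main__":
    for pct in (99.9, 99.95, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability allows about {minutes:.2f} minutes of outage per year")
```

Under these assumptions five nines works out to roughly 5.26 minutes per year, which matches the "about 5 minutes" figure above.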

Negotiating an SLA has always been a difficult task for any client. One of the main points of contention is outage credits and how they are applied. Does the customer get a reimbursement for the lost services, or is the credit applied to future service? In the classic hosting example, a future credit might mean hours added after your service contract terminates. A credit for future services is always a suspect model for an SLA. With some of the newer pay-as-you-go models in the cloud, it is easier to apply these types of credits (e.g., to next month’s bill). Even so, the test of a great SLA is whether it gives the customer a direct reimbursement for lost services. Another area of difficulty is defining what counts as an outage. A few times I have been on the provider side of defining an SLA, and it is always in the best interest of the provider to supply extremely clear SLA definitions. Detailed reports are a good practice for both the provider and the customer. Without clear definitions and documented reports, an SLA will in most cases be useless. Here are the three main areas I typically focus on when discussing an SLA:

  1. Defining the outage.
  2. How does a customer prove an outage to get credit?
  3. How does the credit get applied?

I thought I might take a look at some of the top Cloud providers who provide server instances and see what their SLAs are in relation to the aforementioned three areas.

Amazon Web Services EC2

The Amazon Web Services EC2 SLA can be found at http://aws.amazon.com/ec2-sla/. In it, Amazon claims a 99.95% SLA. Let’s break down their SLA based on the three areas described above.

Defining the outage

The definition of an outage in the AWS SLA is confusing at best. It basically means that you cannot launch a replacement instance within a 5 minute period while at least two availability zones within the same region are down. I take this to mean that if two out of three data centers are available and you still can’t launch or run any application on your EC2 server, it will not be defined as an outage. To further complicate the matter, AWS calculates its 99.95% based on the previous 365 days. If the customer doesn’t have 365 prior days of service with AWS, the prior days are counted as 100% available. This means that if you are a new customer (say 2 months) and a catastrophic event happens to hit two of the three US based data centers and you can’t start an instance for three days, you would get a 10% credit for only one day’s prorated costs for EC2 services. The first two days would not be below the 12 month 99.95% threshold. Also complicating the AWS EC2 SLA, the up front fees for the new reserved instances are not eligible for outage credits. Whoops, they also have an exclusion for the scenario described above: outages “caused by factors outside of our reasonable control, including any force majeure event.”
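The effect of counting days before sign-up as 100% available can be sketched as below. This is a rough illustration of the rule as I have paraphrased it, not Amazon's actual calculation. The function names, the 60 day customer, and the 120 minutes of outage are all hypothetical.

```python
WINDOW_DAYS = 365
MINUTES_PER_DAY = 24 * 60
SLA_THRESHOLD_PERCENT = 99.95


def trailing_uptime_percent(outage_minutes: float) -> float:
    """Uptime over the full 365 day window; days before sign-up count as fully up."""
    window_minutes = WINDOW_DAYS * MINUTES_PER_DAY
    return 100.0 * (window_minutes - outage_minutes) / window_minutes


def own_uptime_percent(days_of_service: int, outage_minutes: float) -> float:
    """Uptime measured only over the days the customer actually had service."""
    service_minutes = days_of_service * MINUTES_PER_DAY
    return 100.0 * (service_minutes - outage_minutes) / service_minutes


if __name__ == "__main__":
    days, outage = 60, 120  # hypothetical: 2 months of service, 2 hours of outage
    padded = trailing_uptime_percent(outage)
    actual = own_uptime_percent(days, outage)
    print(f"365 day window with padding: {padded:.3f}%")
    print(f"Customer's own {days} days:   {actual:.3f}%")
    print("Below threshold (padded)?", padded < SLA_THRESHOLD_PERCENT)
    print("Below threshold (own days)?", actual < SLA_THRESHOLD_PERCENT)
```

In this made-up case the customer's own two months fall below 99.95%, but the padded 365 day figure does not, so no credit would be due under the rule as sketched.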

How does a customer prove an outage to get credit?

In order to receive a credit for a defined AWS EC2 outage, a customer has to capture, document, and send a request to Amazon to be processed. In other words, the onus is on the customer to prove the outage. AWS does not provide any interface or report documentation to help the customer define their outages. Furthermore, Amazon requires the customer to document the region and all instance IDs, and to provide service logs. The customer is also required to cleanse confidential information from the logs, and all of this must be done within 30 days of the outage.

How does the credit get applied?

First off, the AWS credit is applied against future bills and is not a reimbursement for lost services. As previously stated, it is the customer’s responsibility to provide all of the proof and to do it within a 30 day period. If the customer supplies all of the documentation and Amazon agrees that the outage brought availability below 99.95%, Amazon will then apply a 10 percent discount to the next month’s bill.

SLA Grade “C”

AWS puts a heavy burden on the client to prove the outage. The terms of the SLA are difficult at best to understand.

RackSpace/Mosso Cloud Sites

Cloud Sites was formerly called Mosso. Cloud Sites is a service that provides a platform based cloud where users share scalable back end load balancing, web services, and database clusters. The Cloud Sites SLA can be found here: http://www.mosso.com/sla.jsp. Let’s break down their SLA based on the three areas described above.

Defining the outage

Based on the agreement, the definition of an outage from Cloud Sites is extremely simple and is described in fewer than 150 words. The AWS EC2 SLA is over 1000 words. Simply put, if you open a support incident report with Cloud Sites, they will credit you with a 1 day prorated credit for every hour of downtime. Supposedly all you have to do is tell the rep that you have an outage. They should then start the incident and calculate the outage.

How does a customer prove an outage to get credit?

Here is the rub: they don’t tell you about recording the outage when you call. You have to tell them to record the incident as an outage, and then you have to continuously monitor the situation and call support back to confirm the ending time. Cloud Sites does not do this automatically for you. I know this because one of my blog sites is hosted on Mosso/Cloud Sites. I have never been given an automatic credit even though I have had at least 5 or 6 outages over the last year. I have also called in at least 5 or 6 times, and an incident report was never discussed. In fact, on most calls they say that a specific cluster is down and that it should be up soon, with no mention of a start time or stop time for recording the outage. You have to point this out to them. One of the things that has always annoyed me about Mosso/Cloud Sites is that they never notify you when the outage is fixed, even if you call in and ask about the outage. You have to find that out for yourself. This makes it extremely difficult to document an outage. Another problem with Cloud Sites is that you don’t have access to the servers the way you do with AWS EC2, so it is difficult to gather the appropriate logs to document the outage.

How does the credit get applied?

Cloud Sites credits are applied against future bills and are not a reimbursement for lost services. The Mosso Cloud Sites SLA is a great example of where less is actually less. The brevity of their SLA seems attractive at first; however, they do not have a defined process for requesting a credit. At least AWS has a documented email address where you can send your detailed information. There is nothing in the Cloud Sites’ short 58 word SLA that tells you how to go about getting a refund. The client would have to assume they could call Cloud Sites support and request a refund, assuming they actually documented the start and end time of the outage. All in all, it seems like a very confusing process for something that is supposed to be, as described, very simple. If you can get through all of the above, then Cloud Sites will credit you a one day prorated credit for every 60 minutes of documented downtime.
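The "one day for every 60 minutes" rule can be sketched as follows. This is only an illustration of the arithmetic, not Cloud Sites' billing logic. The function name, the 30 day month, counting only complete 60 minute blocks, and capping the credit at the monthly fee are all my own assumptions, since the SLA text does not spell them out.

```python
def cloud_sites_credit(monthly_fee: float, downtime_minutes: int,
                       days_in_month: int = 30) -> float:
    """Estimate the credit: one day's prorated fee per full hour of downtime."""
    daily_rate = monthly_fee / days_in_month
    full_hours = downtime_minutes // 60  # assumption: partial hours earn nothing
    credit = full_hours * daily_rate
    return min(credit, monthly_fee)      # assumption: credit never exceeds the fee


if __name__ == "__main__":
    # Hypothetical plan price and outage length, for illustration only.
    fee, downtime = 100.0, 150
    print(f"Estimated credit: ${cloud_sites_credit(fee, downtime):.2f}")
```

Note that none of this matters unless the start and end times were recorded, which is exactly the part the customer has to chase.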

SLA Grade “B minus”

Cloud Sites offers a simple plan; however, they need to be more clear on their SLA. In the SLA, they state that if you open up an incident report they will start the clock. However, they don’t even have a ticketing system for customers to input incidents. You have to call or start a live chat. Also, they do not notify you when the outage is cleared and this makes it difficult for a customer to keep track of their outages.

3Tera

3Tera recently announced a new 99.999 SLA for their Virtual Private Datacenter (VPDC) customers. The SLA announcement can be found at the following location: http://www.3tera.com/News/Press-Releases/Recent/3Tera-Introduces-the-First-Five-Nines-Cloud-Computing.php. Let’s break down their SLA based on the three areas described above.

Defining the outage

According to the 3Tera announcement, the customer does not have to define the outage. 3Tera will automatically detect and calculate outages. The AppLogic Cloud Computing Platform constantly monitors and reports the availability of the system and instantly alerts 3Tera’s operations team of critical issues. Some might think that the 3Tera five nines announcement is the significant part of their SLA compared to AWS 99.95; however, it’s the automatic recording of the outage that is an unprecedented feature of their SLA. While other cloud vendors require the customer to prove the outage times, 3Tera automates this process.

How does a customer prove an outage to get credit?

Short and sweet: automatically, with no human intervention.

How does the credit get applied?

3Tera’s credit gets applied to the current month’s bill. VPDC customers automatically receive SLA service credits for any calendar month where availability falls below the targeted 99.999 percent. If availability is anywhere between 99.999 percent and 99.9 percent, a 10 percent credit applies to the whole VPDC service for the entire month. If availability is lower than 99.9 percent, a 25 percent credit applies.
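The tiers above are simple enough to sketch in a few lines. This is my reading of the announcement, not 3Tera's code. The function names are hypothetical, and treating exactly 99.9 percent as falling in the 10 percent tier is an assumption, since the announcement does not say which side of the boundary it lands on.

```python
def vpdc_credit_percent(availability_percent: float) -> int:
    """Credit percentage for a calendar month, per the announced tiers."""
    if availability_percent >= 99.999:
        return 0   # target met, no credit
    if availability_percent >= 99.9:
        return 10  # assumption: exactly 99.9 counts as this tier
    return 25


def vpdc_credit_amount(monthly_bill: float, availability_percent: float) -> float:
    """Credit applied to the whole VPDC bill for the month."""
    return monthly_bill * vpdc_credit_percent(availability_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical monthly bill of $1,000 at three availability levels.
    for availability in (100.0, 99.95, 99.5):
        credit = vpdc_credit_amount(1000.0, availability)
        print(f"{availability}% availability -> ${credit:.2f} credit")
```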

SLA Grade “A minus”

I would have otherwise given 3Tera a solid “A”; however, the service has just been announced and is not available yet. When they actually post the SLA page and the actual customer contract, I will adjust this rating accordingly.
