The SLA records a common understanding about services, priorities, responsibilities, guarantees, and warranties. Each area of service scope should have the ‘level of service’ defined. The SLA may specify levels of availability, serviceability, performance, operation, or other attributes of the service, such as billing. The ‘level of service’ can also be specified as ‘target’ and ‘minimum’, which lets customers know what to expect (the minimum) while providing a measurable (average) target value that shows the level of organizational performance. In some contracts, penalties may be agreed in the case of non-compliance with the SLA (but see ‘internal’ customers below). It is important to note that the ‘agreement’ relates to the services the customer receives, and not how the service provider delivers that service.
There has been a lot of hype around the discussion of Service Level Agreements (SLA) in the cloud as of late. SLA hype has been around as long as I can remember and definitely predates the cloud. In the enterprise you will typically hear numbers like 99.999 or terms like “Five Nines”. Five nines is sort of the gold standard in the enterprise, equating to about 5 minutes of outage per year. However, whenever I think of the five nine metric it reminds me what one of my old Six Sigma mentors used to say about five nines – “Five Nines equates to about one commercial plane crash per day for a year.”
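For a quick sanity check on those numbers, the downtime allowed per year at a given availability level is simple arithmetic. Here is a minimal sketch (my own figures, not any vendor's guarantee):

```python
# Downtime allowed per year at a given availability percentage.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of outage per year permitted at a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.9, 99.95, 99.99, 99.999):
    print(f"{nines}% -> {allowed_downtime_minutes(nines):.1f} min/year")
```

Five nines works out to roughly 5.3 minutes of outage per year, while 99.95% (the AWS figure discussed below) allows about 4.4 hours.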
The age-old problem of negotiating an SLA has always been a difficult task for any client. One of the main contention points in negotiating an SLA is around outage credits and how they are applied. Does the customer get a reimbursement for the lost services, or is the SLA applied as a future credit? In the classic hosting example, a future credit might include hours added after your service contract terminates. The credit for future services is always a suspect model for an SLA. With some of the newer pay-as-you-go models in the cloud, however, it is easier to apply these types of credits (e.g., to next month’s bill). In any case, the test of a great SLA is one that gives a customer a direct reimbursement for lost services. Another area of difficulty when negotiating an SLA is defining the SLA outages. A few times I have been on the provider side of defining an SLA, and it is always in the best interest of the provider to supply extremely clear SLA definitions. Detailed reports are always a best practice for both the provider and the customer. Without clear definitions and documented reports, in most cases, an SLA will be useless. Here are the three main areas I typically focus on when discussing an SLA:
- Defining the outage.
- How does a customer prove an outage to get credit?
- How does the credit get applied?
I thought I might take a look at some of the top Cloud providers who provide server instances and see what their SLAs are in relation to the aforementioned three areas.
Amazon Web Services EC2
The Amazon Web Services EC2 SLA can be found at http://aws.amazon.com/ec2-sla/. In the AWS EC2 agreement, Amazon claims a 99.95% SLA. Let’s break down their SLA based on the three areas described above.
Defining the outage
A defined outage in AWS is confusing at best. It basically means that you cannot launch a replacement instance within a 5-minute period while at least two availability zones within the same region are down. I take this to mean that if two out of three data centers are available and you still can’t launch and/or run any application on your EC2 server, it will not be defined as an outage. To further complicate the matter, AWS calculates its 99.95% based on the previous 365 days. If the customer doesn’t have 365 prior days of service with AWS, the prior days are counted as 100% available. This means that if you are a new customer (say, two months in) and a catastrophic event happens to hit two of the three US-based data centers so that you can’t start an instance for three days, you would get a 10% credit for only one day’s prorated costs for EC2 services; the first two days would not push the 12-month availability below the 99.95% threshold. Also complicating the AWS EC2 SLA: the up-front fees for the new reserved instances are not eligible for outage credits. And whoops, they have an exclusion for the catastrophic scenario described above – outages “caused by factors outside of our reasonable control, including any force majeure event.”
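To see why the fixed trailing-365-day window matters, here is a rough sketch of the dilution effect. This is my own arithmetic for illustration, not Amazon's published formula:

```python
# Rough illustration (my own arithmetic, not Amazon's published formula)
# of availability measured over a fixed trailing-365-day window, where
# days before you were a customer count as 100% available.
HOURS_PER_YEAR = 365 * 24  # the window is always a full year: 8,760 hours

def trailing_availability(outage_hours: float) -> float:
    """Availability % over the trailing 365 days for a given total outage."""
    return 100 * (HOURS_PER_YEAR - outage_hours) / HOURS_PER_YEAR

# At 99.95%, the whole year's allowance is only ~4.4 hours of downtime:
allowance = HOURS_PER_YEAR * (1 - 0.9995)
print(f"allowed: {allowance:.2f} h; 72h outage -> {trailing_availability(72):.3f}%")
```

The point of the sketch: because the denominator is always a full year, the practical effect of the window is on how the credit is prorated, not on whether a multi-day outage registers.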
How does a customer prove an outage to get credit?
In order to receive a credit for a defined AWS EC2 outage, a customer has to capture, document, and send a request to Amazon to be processed. In other words, the onus is on the customer to prove the outage. AWS does not provide any interface or report documentation to help the customer define their outages. Furthermore, Amazon requires the customer to document the region and all instance IDs, and to provide service logs. The customer is also required to cleanse confidential information from the logs, and all of this must be done within 30 days of the outage.
How does the credit get applied?
First off, the AWS credit gets applied against future charges and is not a reimbursement of lost services. As previously stated, it is the customer’s responsibility to provide all of the proof, and to do it within a 30-day period. If the customer supplies all of the documentation and Amazon approves an outage that drops availability below 99.95%, they will then apply a 10 percent discount to the next month’s bill.
SLA Grade “C”
AWS puts a heavy burden on the client to prove the outage. The terms of the SLA are difficult at best to understand.
RackSpace/Mosso Cloud Sites
Cloud Sites was formerly called Mosso. Cloud Sites is a service that provides a platform-based cloud where users share scalable back-end load balancing, web services, and database clusters. The Cloud Sites SLA can be found here: http://www.mosso.com/sla.jsp. Let’s break down their SLA based on the three areas described above.
Defining the outage
Based on the agreement, the definition of an outage from Cloud Sites is extremely simple and is described in less than 150 words. (The AWS EC2 SLA is over 1,000 words.) Simply put, if you open a support incident report with Cloud Sites, they will credit you with a one-day prorated credit for every hour of downtime. Supposedly, all you have to do is tell the rep that you have an outage. They should then start the incident and calculate the outage.
How does a customer prove an outage to get credit?
Here is the rub: they don’t tell you about recording the outage when you call. You have to tell them to record the incident as an outage, and then you will have to continuously monitor the situation and call support back to confirm the ending time. Cloud Sites does not do this automatically for you. The reason I know this is that one of my blog sites is hosted on Mosso/Cloud Sites. I have never been given an automatic credit even though I have had at least five or six outages over the last year. I have also called in at least five or six times, and an incident report was never discussed. In fact, on most calls they say that a specific cluster is down and that it should be up soon, with no mention of a start time or stop time for recording the outage. You have to point this out to them. One of the things that has always annoyed me about Mosso/Cloud Sites is that they never notify you when the outage is fixed, even if you call in and ask about the outage. You have to find that out for yourself. This makes it extremely difficult to document an outage. Another problem with Cloud Sites is that you don’t have access to the servers the way you do with AWS EC2, so it is difficult to gather the appropriate logs to document the outage.
How does the credit get applied?
Cloud Sites outages get applied against future charges and are not a reimbursement of lost services. The Mosso Cloud Sites SLA is a great example of where less is actually less. The brevity of their SLA seems attractive at first; however, they do not have a defined process for requesting a credit. At least AWS has a documented email address where you can send your detailed information. There is nothing in Cloud Sites’ short 58-word SLA that tells you how to go about getting a refund. The client would have to assume they could call Cloud Sites support and request a refund, assuming they actually documented the start and end times of the outage. All in all, it seems like a very confusing process that is supposed to be, as described, very simple. If you can get through all of the above, then Cloud Sites will credit you a one-day prorated credit for every 60 minutes of documented downtime.
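The scheme itself is easy to express, which makes the lack of a documented claims process all the more frustrating. A hypothetical sketch of the credit math as described (one prorated day per full hour of downtime; the 30-day month is my assumption):

```python
# Hypothetical sketch of the Cloud Sites credit scheme as described above:
# one day's prorated fee credited for every full 60 minutes of downtime.
def cloud_sites_credit(monthly_fee: float, downtime_minutes: float,
                       days_in_month: int = 30) -> float:
    """Credit in dollars: one prorated day per full hour of downtime."""
    daily_rate = monthly_fee / days_in_month
    full_hours = int(downtime_minutes // 60)
    return full_hours * daily_rate

# e.g., a $100/month plan with 3.5 hours of documented downtime
# credits 3 prorated days (~$10):
print(cloud_sites_credit(100.0, 210))
```

Note that under this reading, 59 minutes of downtime credits nothing, which is exactly why the undocumented start/stop times matter so much.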
SLA Grade “B-”
Cloud Sites offers a simple plan; however, they need to be more clear on their SLA. In the SLA, they state that if you open up an incident report they will start the clock. However, they don’t even have a ticketing system for customers to input incidents. You have to call or start a live chat. Also, they do not notify you when the outage is cleared and this makes it difficult for a customer to keep track of their outages.
3Tera Virtual Private Datacenter
3Tera recently announced a new 99.999% SLA for its Virtual Private Datacenter (VPDC) customers. The SLA announcement can be found at the following location: http://www.3tera.com/News/Press-Releases/Recent/3Tera-Introduces-the-First-Five-Nines-Cloud-Computing.php. Let’s break down their SLA based on the three areas described above.
Defining the outage
According to the 3Tera announcement, the customer does not have to define the outage. 3Tera will automatically detect and calculate outages. The AppLogic Cloud Computing Platform constantly monitors and reports the availability of the system and instantly alerts 3Tera’s operations team of critical issues. Some might think that the 3Tera five nines announcement is the significant part of their SLA compared to AWS 99.95; however, it’s the automatic recording of the outage that is an unprecedented feature of their SLA. While other cloud vendors require the customer to prove the outage times, 3Tera automates this process.
How does a customer prove an outage to get credit?
Short and sweet: automatically, with no human intervention.
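3Tera hasn't published implementation details, but conceptually an automated outage recorder is just a health-check loop that timestamps state transitions instead of asking the customer for logs. A minimal, hypothetical sketch (the URL, polling interval, and check logic are all my own illustration, not 3Tera's design):

```python
import time
import urllib.request

def check_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the service answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False  # connection error, timeout, or HTTP error

def monitor(url: str, interval: float = 60.0):
    """Poll forever, yielding (timestamp, 'UP'|'DOWN') on each state change."""
    was_up = True
    while True:
        up = check_up(url)
        if up != was_up:
            yield (time.time(), "UP" if up else "DOWN")
            was_up = up
        time.sleep(interval)
```

The design point is that the provider, not the customer, owns the DOWN/UP timestamps, so the credit calculation needs no claim filing at all.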
How does the credit get applied?
3Tera’s credit gets applied to the current month’s bill. VPDC customers automatically receive SLA service credits for any calendar month where availability falls below the targeted 99.999 percent. If availability is anywhere between 99.999 percent and 99.9 percent, a 10 percent credit applies to the whole VPDC service for the entire month. If availability is lower than 99.9 percent, a 25 percent credit applies.
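The tiered scheme described above is straightforward to express. A sketch based solely on the percentages in the announcement:

```python
# 3Tera VPDC credit tiers, per the percentages in the announcement.
def vpdc_credit_pct(availability_pct: float) -> int:
    """Percent of the month's VPDC bill credited at a given availability."""
    if availability_pct >= 99.999:
        return 0   # target met: no credit
    if availability_pct >= 99.9:
        return 10  # below five nines but at least 99.9%
    return 25      # below 99.9%

print(vpdc_credit_pct(99.95))  # -> 10
```

Contrast this with AWS: the credit applies to the current month's whole bill and requires no customer paperwork.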
SLA Grade “A-”
I would have otherwise given 3Tera a solid “A”; however, the service has just been announced and is not available yet. When they actually post the SLA page and the actual customer contract, I will adjust this rating accordingly.
EPIC, the Electronic Privacy Information Center, has formally asked the Federal Trade Commission to open an investigation into Google’s cloud computing services — including Gmail, Google Docs, and Picasa — to determine “the adequacy of the privacy and security safeguards.” The petition follows the recent report of a breach of Google Docs.
Here’s a link to the EPIC petition against Google.
Developers looking to use Windows Azure to build apps this weekend faced the blue screen of death as a 22-hour outage locked them out of the service.
The outage is a reminder to users tapping into cloud services that the availability of the system is out of their control. Microsoft has so far declined to comment on what happened.
A few weeks ago, Rackspace made an announcement about hosting the first PCI compliant cloud solution. PCI is short for the Payment Card Industry Data Security Standard, a worldwide security standard for merchants who store, process, or transmit credit card holder data. Rackspace’s Cloud Sites (formerly called Mosso) was used to enable an online merchant, The Spreadsheet Store, to move to the cloud without having to compromise the security of their online transactions (i.e., PCI compliance). What should have been a great success story for the Rackspace/Mosso team turned into a bit of a PR debacle.
Some cloud security experts and thought leaders took exception to the Rackspace/Mosso announcement titled “Cloud Hosting is Secure for Take-off: Mosso Enables The Spreadsheet Store, an Online Merchant, to become PCI Compliant”, and they called out Rackspace/Mosso on its bold claim of being the first cloud provider to offer PCI compliance. Craig Balding, an IT security practitioner and cloud expert, was the first blogger to point this out, in his blog article “What Does PCI Compliance in the Cloud Really Mean?”:
Mosso/Rackspace recently announced they have “PCI enabled” a Cloud Site’s customer that needed to accept online credit card payments in return for goods (i.e. a merchant).
However, the website hosted on Mosso’s Cloud, doesn’t actually receive, store, process, transmit any data that falls under the requirements of PCI.
Or to put it another way, its ‘compliance’ through not actually needing to be…
Craig goes on to say that Rackspace’s “PCI How To” document is just an “implementation of an age-old Internet architecture that involves redirecting customers wishing to pay for the contents of their online basket to an approved and compliant online payment gateway.”
Christopher Hoff, another cloud and security expert, also raises an objection to the aforementioned Rackspace/Mosso PCI hype, stating the following in his blog post “How To Be PCI Compliant in the Cloud…”:
So after all of those lofty words relating to “…preparing the Cloud for…online transactions,” what you can decipher is that Mosso doesn’t seem to provide services to The Spreadsheet Store which are actually in scope for PCI in the first place!*
The Spreadsheet store redirects that functionality to a third party card processor!
So what this really means is if you utilize a Cloud based offering and don’t traffic in data that is within PCI scope and instead re-direct/use someone else’s service to process and store credit card data, then it’s much easier to become PCI compliant. Um, duh.
Ben Cherian, on his blog, also refers to the Rackspace/Mosso antics as a trick, stating the following:
When I saw this, I wondered how it was possible, but as I read closer it became clear that it was just a trick! It seems that their “PCI-compliant” solution requires Mosso not to store any information that requires PCI compliance. Instead they offload the burden of compliance to a third-party payment gateway (Authorize.Net).
However, keeping it real, Greg Hrncir the Director of Operations at Mosso shot back with the following comment on Craig’s blog:
The truth is that we are the first Cloud, that we know of, that enabled its Cloud customers to gain PCI compliance using multiple technologies. The future of Cloud technologies is full of these types of hybrid solutions that combine the best of both worlds. The goal for a customer and online merchant, is to get PCI compliance, not be purist in terms of technology. On line merchants want to leverage the Cloud for scaling, and this is a good way to do it by combining both worlds.
In summary, I think they were all right. Craig, Chris, and Ben were perfectly within bounds to call out the Rackspace/Mosso hype, and in doing so they all did a brilliant job educating us on what PCI really means, in or outside of a cloud. However, as Greg Hrncir points out, what Mosso did was a first-mover step, and as a hybrid model it lays the building blocks for otherwise roadblocked initiatives. In my opinion, what Rackspace has done is significant from a “cloud” industry standpoint; however, being “cloud” leaders, they should have used a little more discretion in their announcement. With all the hype already associated with cloud computing, it is important for the leaders in this space to keep the discussion a little grounded. This reminds me of an old friend of mine who, every time he would get into a fight, would stick his chin out and say “hit me”. In the Mosso/PCI debate, it looks like Mosso got hit.
What do cloud computing and teenage sex have in common?
Everyone talks about it, few actually do it, and even fewer get it right, according to the following story.
Check it out:
Eli Lilly and Co. tapped into Amazon Web Services to crunch a vast chunk of data associated with the development of a new drug. Its research time collapsed from three months to two hours, a huge advantage in the highly competitive pharmaceutical business.
The company repatriated the data over a secure line that connected end-to-end with Amazon. But the firm found there was no way to prove that all its data had left the Amazon cloud. It had to take Amazon’s word for it, which raised security concerns.
Read the full story here:
If anyone can solve the problems associated with selling cloud computing to enterprises, it should be IBM. Today the company announced a step in that direction with a slew of new services and partnerships to build cloud services for businesses.
The key challenges to overcome are:
- Identity management – who sees my application and data in the cloud? Security and regulatory requirements are crucial.
- Which workloads and applications are appropriate for cloud computing?
- How does my application in the cloud get access to my data, which is still stored onsite?
IBM doesn’t have all these answers yet but says it is working with the following organizations to understand these issues.
- Elizabeth Arden, Nexxera, The United States Golf Association, and Indigo Bio Systems sign on as new IBM cloud computing customers.
- IBM Global Services will offer data protection software “as a service” through the cloud, in addition to a new IBM cloud environment for businesses to safely test applications.
- First live demonstration of a global “overflow cloud”: IBM and Juniper Networks will install hybrid cloud capabilities across IBM’s worldwide Cloud Labs for customer engagements. This is to let users bridge between private clouds and IBM’s public cloud offerings and turn up resources as needed.
At 13 worldwide cloud centers, IBM offers server capacity on demand, online data protection, and Lotus e-mail and collaboration software.
IBM Rational AppScan 7.8 lets users continuously monitor the Web services they publish into the cloud to check that they are secure, compliant and meet business policies.
Service Management Center for Cloud Computing contains a set of offerings including Tivoli Provisioning Manager 7.1 and the new Tivoli Service Automation Manager, to automate the deployment and management of private clouds.
Finally, IBM said it will launch a Tivoli Storage as a Service offering through its Business Continuity & Resiliency Services cloud. The offering won’t be available until late 2009; users will be able to consume Tivoli data protection technologies via the cloud and pay for only what they use. EMC and Symantec are already offering these kinds of services.
From these announcements it looks like IBM will be able to help businesses figure out which workloads to shift into the cloud, but there are no details yet on how it will ensure identity management, security and compliance.
An industry insider close to Amazon’s Web Services (AWS) business unit told us the company claims to have 400,000 customers using its web services offering.
AWS includes EC2, the compute-on-demand offering; S3, the hosted storage service; SimpleDB, for hosted databases; Simple Queue Service (SQS), a communication channel for developers to store messages; and CloudFront, a content delivery network.
Amazon has not publicly discussed much detail about its customers and how they are using AWS. For instance, of these 400,000 users, how many are using both EC2 and S3, just S3, or just EC2? Is anyone using SimpleDB or CloudFront yet? How many of these users were one-time customers? My hunch is that the 400,000 number includes any customer that has ever touched AWS, regardless of whether they are still using it.
In conversations with IT users, it’s clear they are interested in these services, but need more reference cases on how to use it. A great success story goes a long way.
During a webinar on cloud computing today, James Staten, principal analyst at Forrester Research, said enterprises need more transparency from EC2 to show that it can meet SLAs. “The predictability [of the service] is not good enough for business,” he said, noting that EC2 had two lengthy outages in 2008. Small businesses and gaming and entertainment companies are the biggest adopters of EC2, he said. The former can’t afford to build their own datacenters, while gaming and movie companies require extra infrastructure around the release of new games and movies, which can be set up and torn down as needed.
Staten said enterprises are using cloud services like EC2 for R&D projects, quick promotions, partner integration and collaboration, and new ventures. He called for more companies to share how they are using these services and recommended that IT shops begin to experiment with them. Staten suggested endorsing one or two clouds as “IT approved” and establishing an internal policy for using these services. He urged IT organizations to let cloud providers know what you want and what’s most important to you. Secure enterprise links, standards, SLA expectations, levels of support (24/7 phone support, for example)? My guess would be all of the above. If you’d rather, I can hammer on the vendors, so let me know.
VMware, Inc. is on a mission to show companies that they can get the benefits of cloud computing without handing their mission-critical applications over to an outside provider; with the upcoming Virtual Datacenter Operating System (VDC-OS), IT will be able to create secure, private cloud environments.
The yet-to-be-released VDC-OS represents the evolution of the VMware Infrastructure; the platform, which is due for release sometime this year, will transform traditional data centers into internal cloud environments. The business case for creating a private cloud is less complexity in the data center; software like VDC-OS will virtualize and automate systems to the point that there is less ‘knob turning’ and more time spent on tasks that improve the business, said VMware Sr. Director of Product Marketing Bogomil Balkansky.
“Too much of IT budgets are spent on management tasks and keeping the lights on, instead of on tasks that actually improve business,” Balkansky said. “Infrastructure complexities should not get in the way of this, but they do.”
While external clouds like Amazon EC2 offer the same benefits as internal clouds, VMware is betting that large enterprises won’t send their mission-critical applications outside the four walls of their data centers to these providers. Instead, they will want to create private cloud compute infrastructures using software like VDC-OS.
“There are security challenges with public clouds; enterprises don’t trust [outsiders] with their customer and financial data,” Balkansky said. “We want to transfer the notion of cloud computing to internal data center operations.”
VMware is also hosting a webinar on January 29 about Internal Cloud Computing, if you want to hear more on this.
Balkansky said private cloud computing environments will gain traction in large data centers, but that could just be a self-serving prophecy. After all, most public cloud providers won’t pay for VMware software and use free and open source Xen instead; hence, VMware has no place to go but within the enterprises that already know and love VMware.
While VMware is on a private cloud advocacy mission, as the largest virtualization provider on the planet it can’t ignore the need to play well with public clouds. That’s where VMware’s vCloud initiative comes into play; it will eventually allow VMware users to move their virtual machines on demand between their datacenters and cloud service providers, and over 200 partners have signed up to support vCloud so far, Balkansky said.
Hifn says its new Express DS4100 NIC card is “optimized for the cloud”. What’s next? Cables? Batteries? My desk?
The problem with the cloud is speed in terms of uploads and downloads, says Hifn’s PR person.
“Try uploading a terabyte to the cloud and see how long it takes.” He has a point there, but I think it takes more than a slick NIC to fix this problem.
The exact speeds and feeds of the DS4100 will not be available until the official release next week, but ballpark pricing will be $1,000 per card. It also supports virtualization and service-oriented architectures, just in case you need the whole ball of yarn.