Mosso archives - The Troposphere


Jun 30 2009   7:43PM GMT

Rackspace falls over in Dallas, tweets the whole thing



Posted by: Carl Brooks
Rackspace, service outages, Data Center, Mosso, reliability

More than 24 hours after users began reporting Rackspace hosted services were unresponsive, and the main site went dark, Rackspace has possibly set a new record for transparency and accountability, if not customer satisfaction, by tirelessly tweeting the entire episode.

They also ran to update the company blog (how droll, so Web 1.0, right?) and blamed power outages in their Dallas data center.

For additional amusement, see the vultures, er, competitors flock to #rackspacefail

An official statement has not been made, and a request for comment has gone unanswered to date, so the root of the problem is still to be determined. Amazon’s recent calamity was exacerbated by lightning unaccountably penetrating a supposedly world-class data center; it’ll be interesting to see if Rackspace’s facilities have similar flaws.

UPDATE: Rackspace HAS NOT released their incident report, but it’s out in the wild. According to the report, which I won’t post, but will summarize from, since the content is fair game at this point: a mains breaker flipped and one line of generator backups had “excitation failure” which means they didn’t start up properly. Subsequently 3 banks of UPS batteries bled out and slammed a bunch of racks — which means they weren’t charging properly or worse, underdesigned for the load.

What this means in the simplest possible terms: “Heads Will Roll”. Between this and Amazon’s air-to-ground static electricity adventure, data center types are wagging their grimy, highly redundant fingers as hard as possible at these incidents.

UPDATE: the incident report is now public

Mar 23 2009   5:51PM GMT

The Tale of Three Cloud SLAs



Posted by: John M. Willis
3tera, sla, Mosso, Rackspace, aws, amazon

Wikipedia says…

The SLA records a common understanding about services, priorities, responsibilities, guarantees and warranties. Each area of service scope should have the ‘level of service’ defined. The SLA may specify the levels of availability, serviceability, performance, operation, or other attributes of the service such as billing. The ‘level of service’ can also be specified as ‘target’ and ‘minimum’, which allows customers to be informed what to expect (the minimum), whilst providing a measurable (average) target value that shows the level of organization performance. In some contracts penalties may be agreed in the case of non compliance of the SLA (but see ‘internal’ customers below). It is important to note that the ‘agreement’ relates to the services the customer receives, and not how the service provider delivers that service.

There has been a lot of hype around the discussion of Service Level Agreements (SLA) in the cloud as of late. SLA hype has been around as long as I can remember and definitely predates the cloud. In the enterprise you will typically hear numbers like 99.999% or terms like “Five Nines”. Five nines is sort of the gold standard in the enterprise, equating to about 5 minutes of outage per year. However, whenever I think of the five-nines metric, it reminds me of what one of my old Six Sigma mentors used to say about it: “Five Nines equates to about one commercial plane crash per day for a year.”
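As a rough check on figures like these, the relationship between an availability percentage and its yearly downtime budget can be sketched in a few lines of Python. This is plain arithmetic, not any provider's official method; the function name is made up for illustration, and a 365-day year is assumed.

```python
# Convert an availability percentage into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years are ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.95, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
```

Five nines works out to a little over 5 minutes a year, while 99.95% allows a bit under four and a half hours.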

The age old problem of negotiating an SLA has always been a difficult task for any client. One of the main contention points in negotiating an SLA is around the outage credits and how they are applied. Does the customer get a reimbursement for the lost services or is the SLA applied to a future credit? In the classic hosting example, a future credit might include hours added after your service contract terminates. The credit for future services is always a suspect model for an SLA. However, with some of the newer pay-as-you-go models in the cloud, it is easier to apply these types of credits (e.g., next month’s bill). In any case, the test of a great SLA is whether it gives a customer a direct reimbursement for lost services. Another area of difficulty when negotiating an SLA is defining the SLA outages. A few times I have been on the provider side of defining an SLA, and it is always in the best interest of the provider to supply extremely clear SLA definitions. Detailed reports are always a good practice for the provider and the customer. Without clear definitions and documented reports, in most cases, an SLA will be useless. Here are three main areas I typically focus on when discussing an SLA:

  1. Defining the outage.
  2. How does a customer prove an outage to get credit?
  3. How does the credit get applied?

I thought I might take a look at some of the top Cloud providers who provide server instances and see what their SLAs are in relation to the aforementioned three areas.

Amazon Web Services EC2

The Amazon Web Services EC2 SLA can be found at: http://aws.amazon.com/ec2-sla/ and describes the details of the AWS EC2 SLA. In the AWS EC2 SLA agreement, Amazon claims a 99.95% SLA. Let’s break down their SLA based on the three areas described above.

Defining the outage

A defined outage in AWS is confusing at best. It basically means that you cannot launch a replacement instance within a 5 minute period while at least two availability zones within the same region are down. I take this to mean that if two out of three data centers are available and you still can’t launch and/or run any application on your EC2 server, it will not be defined as an outage. To further complicate the matter, AWS calculates their 99.95% based on the previous 365 days. If the customer doesn’t have 365 prior days of service with AWS, the prior days are calculated as 100% available. This means if you are a new customer (say 2 months), and a catastrophic event happens to hit two of the three US based data centers and you can’t start an instance for three days, you would get a 10% credit for only one day’s prorated costs for EC2 services. The first two days would not be below the 12 month period 99.95% outage percentage. Also complicating the AWS EC2 SLA, the up-front fees for the new reserved instances are not eligible for outage credits. Whoops, they also have an exclusion for the scenario described above: outages “caused by factors outside of our reasonable control, including any force majeure event.”

How does a customer prove an outage to get credit?

In order to receive a credit for a defined AWS EC2 outage a customer has to capture, document, and send a request to Amazon to be processed. In other words, the onus is on the customer to prove the outage. AWS does not provide any interface or report documentation to help the customer define their outages. Furthermore, Amazon requires the customer to document the region, all instance ids, and provide service logs. The customer also is required to cleanse confidential information from the logs and all of this must be done within a 30 day period of the outage.

How does the credit get applied?

First off, the AWS credit gets applied against future bills and is not a reimbursement for lost services. As previously stated, it is the customer’s responsibility to provide all of the proof and to do it within a 30 day period. If the customer supplies all of the documentation and Amazon agrees that the outage pushed availability below 99.95%, they will then apply a 10 percent discount on the next month’s bill.

SLA Grade “C”

AWS puts a heavy burden on the client to prove the outage. The terms of the SLA are difficult at best to understand.

Rackspace/Mosso Cloud Sites

Cloud Sites was formerly called Mosso. Cloud Sites is a service that provides a platform-based cloud where users share scalable back end load balancing, web services, and database clusters. The Cloud Sites SLA can be found here: http://www.mosso.com/sla.jsp. Let’s break down their SLA based on the three areas described above.

Defining the outage

Based on the agreement, the definition of an outage from Cloud Sites is extremely simple and is described in less than 150 words. The AWS EC2 SLA is over 1000 words. Simply put, if you open a support incident report with Cloud Sites, they will credit you with a 1 day prorated credit for every hour of downtime. Supposedly all you have to do is tell the rep that you have an outage. They should then start the incident and calculate the outage.

How does a customer prove an outage to get credit?

Here is the rub: they don’t tell you about recording the outage when you call. You have to tell them to record the incident as an outage, and then you will have to continuously monitor the situation and call support back to confirm the ending time. Cloud Sites does not do this automatically for you. The reason I know this is because one of my blog sites is hosted on Mosso/Cloud Sites. I have never been given an automatic credit even though I have had at least 5 or 6 outages over the last year. I have also called in at least 5 or 6 times, and an incident report was never discussed. In fact, on most calls they say that a specific cluster is down and that it should be up soon, with no mention of a start time or stop time for recording the outage. You have to point this out to them. One of the things that has always annoyed me about Mosso/Cloud Sites is that they never notify you when the outage is fixed, even if you call in and ask about the outage. You have to find that out for yourself. This makes it extremely difficult to document an outage. Another problem with Cloud Sites is that you don’t have access to the servers the way you do with AWS EC2, so it is difficult to gather the appropriate logs to document the outage.
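Since the customer ends up keeping the outage record, here is a minimal sketch of how one might turn a log of periodic availability checks into start and end times. It assumes you already run some external probe against your site and record each result as a (timestamp, is_up) pair; the probe itself, the function name and the sample log are all hypothetical, and nothing here is a Cloud Sites tool.

```python
from datetime import datetime, timedelta


def outage_windows(probes):
    """Collapse (timestamp, is_up) probe results into (start, end) outage windows.

    `probes` must be sorted by timestamp. An outage starts at the first failed
    probe and ends at the next successful one. An outage still in progress at
    the last probe is reported with end=None.
    """
    windows = []
    start = None
    for ts, is_up in probes:
        if not is_up and start is None:
            start = ts  # first failed check of a new outage
        elif is_up and start is not None:
            windows.append((start, ts))  # service is back, close the window
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Hypothetical probe log: one check every 5 minutes.
t0 = datetime(2009, 3, 1, 12, 0)
log = [(t0 + timedelta(minutes=5 * i), up)
       for i, up in enumerate([True, False, False, True, True, False])]

for start, end in outage_windows(log):
    print(start, "->", end if end else "ongoing")
```

The resolution of the record is only as good as the probe interval, so a 5 minute check can misstate each outage by up to 5 minutes at either end.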

How does the credit get applied?

Cloud Sites credits get applied against future bills and are not a reimbursement for lost services. The Mosso Cloud Sites SLA is a great example of where less is actually less. The brevity of their SLA seems attractive at first; however, they do not have a defined process for requesting a credit. At least AWS has a documented email address where you can send your detailed information. There is nothing in the Cloud Sites’ short 58 word SLA that tells you how to go about getting a refund. The client would have to assume they could call Cloud Sites support and request a refund, assuming they actually documented the start and end time of the outage. All in all, it seems like a very confusing process for something that is described as very simple. If you can get through all of the above, then Cloud Sites will credit you a one day prorated credit for every 60 minutes of documented downtime.
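To make the "one day for every 60 minutes" rule concrete, here is a small sketch of the arithmetic. The SLA as summarized here does not say how partial hours are treated, how a "day" is prorated, or whether the credit is capped, so the choices below (whole hours only, a 30 day month, a cap at one month's fee) are assumptions, as are the function name and the sample fee.

```python
def cloud_sites_credit(monthly_fee: float, downtime_minutes: int,
                       days_in_month: int = 30) -> float:
    """Estimate the credit for documented downtime under a day-per-hour rule."""
    daily_rate = monthly_fee / days_in_month
    credited_days = downtime_minutes // 60  # assumption: partial hours earn nothing
    credit = credited_days * daily_rate
    return min(monthly_fee, credit)  # assumption: never more than the month's fee


# Hypothetical account paying 100.00 a month with 150 minutes of documented downtime.
print(round(cloud_sites_credit(100.0, 150), 2))
```

Under these assumptions, 150 documented minutes counts as two full hours, which earns two days of credit.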

SLA Grade “B minus”

Cloud Sites offers a simple plan; however, they need to be more clear on their SLA. In the SLA, they state that if you open up an incident report they will start the clock. However, they don’t even have a ticketing system for customers to input incidents. You have to call or start a live chat. Also, they do not notify you when the outage is cleared and this makes it difficult for a customer to keep track of their outages.

3Tera

3Tera recently announced a new 99.999 SLA for their Virtual Private Datacenter (VPDC) customers. The SLA announcement can be found at the following location: http://www.3tera.com/News/Press-Releases/Recent/3Tera-Introduces-the-First-Five-Nines-Cloud-Computing.php. Let’s break down their SLA based on the three areas described above.

Defining the outage

According to the 3Tera announcement, the customer does not have to define the outage. 3Tera will automatically detect and calculate outages. The AppLogic Cloud Computing Platform constantly monitors and reports the availability of the system and instantly alerts 3Tera’s operations team of critical issues. Some might think that the 3Tera five nines announcement is the significant part of their SLA compared to AWS 99.95; however, it’s the automatic recording of the outage that is an unprecedented feature of their SLA. While other cloud vendors require the customer to prove the outage times, 3Tera automates this process.

How does a customer prove an outage to get credit?

Short and sweet: automatically, with no human intervention.

How does the credit get applied?

3Tera’s credit gets applied to the current month’s bill. VPDC customers automatically receive SLA service credits for any calendar month where availability falls below the targeted 99.999 percent. If availability is anywhere between 99.999 percent and 99.9 percent, a 10 percent credit applies to the whole VPDC service for the entire month. If availability is lower than 99.9 percent, a 25 percent credit applies.
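The tiers described above are simple enough to sketch as a lookup. This follows the announcement as summarized here, not an actual contract; how the exact boundary values are treated, the 30 day month and both function names are assumptions.

```python
def monthly_availability(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Availability for the month, as a percentage, given total downtime."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


def vpdc_credit_percent(availability_percent: float) -> int:
    """Credit tier for a calendar month, per the announced 3Tera terms."""
    if availability_percent >= 99.999:
        return 0   # target met, no credit
    if availability_percent >= 99.9:
        return 10  # assumption: exactly 99.9 falls in the 10 percent tier
    return 25


for minutes in (0, 1, 45):
    avail = monthly_availability(minutes)
    print(f"{minutes} min down -> {avail:.4f}% -> {vpdc_credit_percent(avail)}% credit")
```

In a 30 day month, five nines leaves room for less than half a minute of downtime, so even a single minute lands in the 10 percent tier.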

SLA Grade “A minus”

I would have otherwise given 3Tera a solid “A”; however, the service has just been announced and is not available yet. When they actually post the SLA page and the actual customer contract, I will adjust this rating accordingly.


Mar 16 2009   3:21PM GMT

The Rackspace/Mosso PCI Debate



Posted by: John M. Willis
Mosso, Rackspace, pci

A few weeks ago Rackspace made an announcement about hosting the first PCI-compliant cloud solution. PCI is short for Payment Card Industry; the PCI Data Security Standard (PCI DSS) is a worldwide security standard for merchants who store, process or transmit credit card holder data. Rackspace’s Cloud Sites (formerly called Mosso) was used to enable the online merchant, The Spreadsheet Store, to move to the cloud without having to compromise the security of their online transactions (i.e., PCI compliance). What should have been a great success story for the Rackspace/Mosso team turned into a little bit of a PR debacle.

Some of the cloud security experts and thought leaders took exception to the Rackspace/Mosso announcement, titled “Cloud Hosting is Secure for Take-off: Mosso Enables The Spreadsheet Store, an Online Merchant, to become PCI Compliant”, and they called out Rackspace/Mosso on their bold claim of being the first cloud provider to offer PCI compliance. Craig Balding, an IT Security Practitioner and cloud expert, was the first blogger to point this out, in his blog article “What Does PCI Compliance in the Cloud Really Mean?”:

Mosso/Rackspace recently announced they have “PCI enabled” a Cloud Site’s customer that needed to accept online credit card payments in return for goods (i.e. a merchant).

However, the website hosted on Mosso’s Cloud, doesn’t actually receive, store, process, transmit any data that falls under the requirements of PCI.

Or to put it another way, its ‘compliance’ through not actually needing to be…

Craig goes on to say that Rackspace’s “PCI How To” document is just an “implementation of an age-old Internet architecture that involves redirecting customers wishing to pay for the contents of their online basket to an approved and compliant online payment gateway.”

Christopher Hoff, another cloud and security expert, also objects to the aforementioned Rackspace/Mosso PCI hype, stating the following in his blog post “How To Be PCI Compliant in the Cloud…”:

So after all of those lofty words relating to “…preparing the Cloud for…online transactions,” what you can decipher is that Mosso doesn’t seem to provide services to The Spreadsheet Store which are actually in scope for PCI in the first place!*

The Spreadsheet store redirects that functionality to a third party card processor!

So what this really means is if you utilize a Cloud based offering and don’t traffic in data that is within PCI scope and instead re-direct/use someone else’s service to process and store credit card data, then it’s much easier to become PCI compliant. Um, duh.

Ben Cherian, on his own blog, also refers to the Rackspace/Mosso antics as a trick when he states the following:

When I saw this, I wondered how it was possible, but as I read closer it became clear that it was just a trick! It seems that their “PCI-compliant” solution requires Mosso not to store any information that requires PCI compliance. Instead they offload the burden of compliance to a third-party payment gateway (Authorize.Net).

However, keeping it real, Greg Hrncir, the Director of Operations at Mosso, shot back with the following comment on Craig’s blog:

The truth is that we are the first Cloud, that we know of, that enabled its Cloud customers to gain PCI compliance using multiple technologies. The future of Cloud technologies is full of these types of hybrid solutions that combine the best of both worlds. The goal for a customer and online merchant, is to get PCI compliance, not be purist in terms of technology. On line merchants want to leverage the Cloud for scaling, and this is a good way to do it by combining both worlds.

In summary, I think they were all right. Craig, Chris, and Ben were perfectly within bounds to call out the hype in the Rackspace/Mosso announcement, and in doing so they all did a brilliant job educating us on what PCI really means in or outside of a cloud. However, Greg Hrncir also points out that what Mosso did was a first, and that as a hybrid model they are laying the building blocks for otherwise roadblocked initiatives. In my opinion, what Rackspace has done is significant from a “cloud” industry standpoint; however, being “cloud” leaders, they should have used a little bit more discretion in their announcement. With all the hype already associated with cloud computing, it is important for the leaders in this space to keep the discussion a little bit grounded. This reminds me of an old friend of mine who, every time he got into a fight, would stick his chin out and say “hit me”. In the Mosso/PCI debate it looks like Mosso got hit.


Oct 22 2008   9:18PM GMT

Rackspace: From managed hosting to cloud hosting



Posted by: Alex Barrett
Storage, Virtualization, VMware, Xen, Rackspace, VPS, Mosso

In an effort to wrap my mind around this cloud computing stuff, I watched the webcast of Rackspace’s cloud computing launch today, where the company laid out its plans to move from simple managed hosting provider to cloud provider extraordinaire, taking on Amazon Elastic Compute Cloud, or EC2, and Simple Storage Service, or S3, in the process.

Rackspace’s plan centers on acquisition, partnership and expanding its existing Mosso Web hosting product into three broad offerings: Cloud Sites website hosting, Cloud Files storage service, and Cloud Servers virtual private servers.

On the acquisition side, Rackspace has acquired Jungle Disk, a cloud-based desktop storage and backup provider that has thus far relied on Amazon’s S3. It also acquired Slicehost, a provider of Xen-based virtual private servers (VPSs) that claims 11,000 customers and 15,000 virtual servers.

As far as new Mosso offerings, the new Cloud Files will come in at $0.15 per GB of replicated data, or if the data is distributed across a content delivery network (CDN), at $0.22 per GB. CDN capabilities come by way of a partnership with Limelight Inc.
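For a feel of what those two rates mean in practice, here is a back-of-the-envelope sketch. Only the per-GB rates come from the announcement; the billing period, any bandwidth or request charges, the function name and the 500 GB example are not stated there and are assumptions for illustration.

```python
from decimal import Decimal

RATE_REPLICATED = Decimal("0.15")  # dollars per GB of replicated data
RATE_CDN = Decimal("0.22")         # dollars per GB when distributed via the CDN


def cloud_files_cost(gigabytes, via_cdn: bool = False) -> Decimal:
    """Storage cost in dollars at the announced per-GB rates, rounded to cents."""
    rate = RATE_CDN if via_cdn else RATE_REPLICATED
    return (Decimal(str(gigabytes)) * rate).quantize(Decimal("0.01"))


# Hypothetical 500 GB of stored data.
print(cloud_files_cost(500))
print(cloud_files_cost(500, via_cdn=True))
```

`Decimal` is used instead of floats so that the cents come out exact.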

Also as part of Cloud Files, Rackspace will partner with Sonian Networks to provide cloud-based email archiving starting at $3/mailbox.

Coming soon, Cloud Servers is Mosso’s new name for Slicehost’s VPS offering. Under Slicehost, the service starts at $20/month for a virtual Xen server with 256MB of RAM, 10GB of storage, and 100GB of bandwidth. “Slices” scale to 15.5GB of RAM, 620GB of storage and 2,000GB of bandwidth for $800/month.

When it comes to the Xen-based Slicehost — aka Cloud Servers — I should note that Mosso is a longtime VMware customer that has publicly pondered the viability of the relationship as it expands its services. It will be interesting to see whether this acquisition signals a break from VMware or whether it will continue to use VMware as the underpinning of its Cloud Sites offering. Rackspace, care to comment?

On another note, Slicehost is one of many hosting providers that use open source Xen as the basis of their cloud offerings. Presumably, it’s also the kind of company to which Simon Crosby, CTO of Citrix Systems Inc., referred when Citrix announced XenServer Cloud Edition and Citrix Cloud Center (C3) at VMworld 2008.

At the time, Crosby said that luring these hosting providers into Citrix support contracts was a huge priority. “Trivially, we looked around and found a couple hundred hosted IT infrastructure providers using open source Xen,” he said. “XenServer Cloud Edition is intended to win greenfield accounts but also to bring the open source Xen guys back home.” XenServer Cloud Edition boasts features like the ability to run Windows guests and commercial support.

One final thought: If any of you find this whole cloud computing thing a bit, ahem, nebulous, Lew Moorman, Rackspace’s chief strategy officer, made an interesting distinction between different types of cloud offerings. “Cloud apps,” Moorman said, are what we used to think of as Software as a Service (SaaS); “cloud hosting,” meanwhile, refers to pooled external compute resources. And of course, there’s cloud storage. Rackspace, it seems, will offer all three.