The Enomaly-powered cloud brokerage service SpotCloud has been chugging along for some time, rolling on-demand virtual machine providers into its fold. It’s set to go live next week, a clearing house for cloud computing resources, a bit like a rudimentary RightScale, but unlike RightScale it handles the cash. Sellers pay a brokerage fee and set their own prices based on whatever the hell they feel like.
That’s neat, everyone’s been saying cloud computing is going to make infrastructure a commodity, so along comes the commodities broker to match the bulk producer with the consumer. It’s been tried before- Deutsche Telekom spun out Zimory as a cloud brokerage a few years ago. It wasn’t successful and it’s withered on the vine into a weird sort of cloud management/infrastructure delivery thingie.
SpotCloud might have a fighting chance, though, mostly because it’s about three years later. Virtualization is much more widespread at the service provider/data center level, the market for public cloud is robust and so on.
It also has actual capacity to offer. SpotCloud says it’s got a solid 10,000 physical servers committed to the platform from scores of operators all over the world. Right there, that puts it at a comparable scale with Rackspace and Amazon’s cloud environments.
It works the same way those do- login, pick a server, launch it, pay for it. It’s incredibly light on features; ‘bare-bones cloud’ comes to mind. But you want a server somewhere? Get a server. Essentially without lifting a finger, SpotCloud just built out one of the largest contiguous cloud computing environments in the world. It remains to be seen whether or not sellers can offer prices that actually tempt buyers.
That’s where the surprising news comes in. One, SpotCloud now supports VMware; Enomaly CTO and SpotCloud creator Reuven Cohen said they were basically forced into it. Originally SpotCloud only worked with Enomaly-powered clouds, but that had to change, since the overwhelming majority of sellers with capacity to offer were on VMware. “Honestly I was a little surprised…VMware is basically everywhere,” he said.
Cohen said he pitched his platform as free and easy, just park it on servers that are only getting used for SpotCloud anyway and away you go, but VMware was king. The other surprise was that the sellers on SpotCloud aren’t all up to date, cloud-ready hosters with unused capacity they want to monetize. Cloud computing firms aren’t the sellers on the cloud computing brokerage.
“It was really, uh, little guys, other types, guys you would not associate with the cloud,” Cohen said. In other words, the excess virtualized capacity was floating around the hosting world. It’s floating around in data centers and colos of all stripes and sizes, from Joe’s Basement Hosting Shoppe to operators stuck with dead weight investments.
Cohen said the pitch had gone from “Join the cloud, get in on the action” to “Clean out your closets, the server-man is here.” One large Australian DC had jumped the gun on a customer project and ended up with $3 million worth of infrastructure that was sitting dark. The client had bailed; they had 2 TB of RAM and servers just sitting there. They signed it up for SpotCloud and at least it’s got a chance; any money is better than none on dead weight and they can just pull out if they find a better use for it. Another example was an LA-based DC that served the entertainment industry.
It’s interesting to me that what is, for all intents and purposes, one of the largest true IaaS clouds a) popped up overnight and b) is built out of scrap iron and used tires. That’s definitely a validation of the cloud computing model and how far we’ve come on platform. The real test though is whether anyone is going to buy enough of it to matter, since a commodity is by definition easy to get a hold of.
“It’s been harder finding the buyers,” admitted Cohen. Color me unsurprised.
Back in August we reported that pharmaceutical giant Eli Lilly was looking for cloud providers in addition to Amazon, for better support.
Lilly has picked Indiana-based hosting and cloud provider BlueLock, according to a wink and a nudge from Twitter sources close to BlueLock. The hoster is in bed with VMware and the vCloud initiative, providing all the bells and whistles that go along with the vCloud Datacenter service.
Presumably Eli Lilly was willing to pay more for better support than it had been getting at AWS?
We reached out to all parties for comment and got these responses:
“Thanks for contacting us about this topic. Unfortunately, due to conflicting priorities we aren’t able to participate,” said Carole Copeland, corporate communications manager for Eli Lilly, via email.
And this response from BlueLock:
“We will have to kindly decline to comment. It is already public knowledge that Eli Lilly has been in discussion with a few cloud players (BlueLock included), however the discussion and any outcomes are between the companies involved due to competitive and partnership reasons.”
It’s unfortunate everyone is keeping tight-lipped on this cloud implementation as it would be really helpful to understand and learn from what happened here. It’s clear that terms of service and SLAs played an important role in Eli Lilly’s decisions.
A new report from the 451 Group paints a picture of cloud computing poised to spread but bound by limits of infrastructure. It’s still a tiny fraction of the IT market but advanced Asian governments are promoting cloud, with government-run data center and technology parks courting cloud development projects in Malaysia, Singapore, Hong Kong and elsewhere.
Communications and local government initiatives are two major forces driving the deployment of clouds in developing Asia, said Agatha Poon, research manager on global cloud computing for 451. Many parts of Asia are very technologically advanced, often well over what anywhere in the Americas can boast — South Korea is the most broadband-connected country in the world — but it’s localized in small pockets around strategic connection hubs.
There are massive dark (or dim) gaps where communication infrastructure is lacking, underpowered or evolving in weird new directions with mobile communications (like most of India). But, for example, last year Amazon Web Services (AWS) and IBM opened clouds in Singapore. AWS is (probably) in the giant Equinix facility and IBM in the Changi Business Park, run by the government’s Infocomm Development Authority of Singapore. AWS has an Availability Zone for its cloud there; IBM is doing something mysterious and high concept, naturally. Chinese telecoms are also experimenting with cloud platforms.
Poon says the market is gunning for SMBs who want new services based around cloud infrastructure; the large enterprises there are going to be more conservative, precisely the same pattern cloud is following in the US and Europe. The growth rate looks exponential: the 451 Group predicts that $1 million of cloud spending in 2009 will grow to $17 million by the end of 2011.
That’s small potatoes, but a significant vote of confidence. Poon said that businesses are going to get more options, but local ICT providers should gird their loins: if they don’t catch up or implement something cloud-like, they’ll get absolutely crushed by the big players in the region.
The big telcos in Asia are moving fast in the cloud space, Poon said, either through partnerships or strategic investments. She pointed to South Korean telco KT, which is making an aggressive shift toward offering cloud infrastructure (compute and storage) through a variety of platforms, like Cloud.com and enStratus, and said the tech giants that supply the world with hardware, like Fujitsu and Samsung, were also gearing up to service the cloud market.
It’s another interesting demonstration of the way the market for cloud works; it’s organic, demand-based, almost biological. It’s creeping out along the bright communication hubs, where the activity and the cash and electricity-based nutrition is, and avoiding the leaner pastures.
That’s because cloud computing, properly done, minimizes the fears of risky investment. A provider can go where the action is and drop in a little bit. If it works, they can add some more. There’s no need for anybody, provider or user, to stand up a massive data center operation and just hope for the best. They can just take it, or leave it. As long as there’s a big pipe nearby, of course.
Eucalyptus has announced it is in a technical partnership with Red Hat to bundle Deltacloud, Red Hat’s cloud platform project with the much more mature Eucalyptus platform. Check out the Euca-Hat FAQ here.
Red Hat’s Deltacloud tools will function more or less as a cloud management layer when used with Eucalyptus; their strength is reportedly in enabling the use of multiple public cloud services and internal, private cloud resources in a single view: cloud management, much like enStratus does.
CEO Marten Mickos said in an interview that the user bases of the two companies are simpatico, and that’s why he wanted the deal. “We see a very good overlap; the same people who are downloading Eucalyptus are downloading Red Hat,” he said.
Of course, you can do that for free, so that’s a good sign of interest but not necessarily potential revenue. Mickos said the deal was a good way for Eucalyptus to broaden its appeal and look towards the next few years when, he said, enterprises will be moving almost universally to a hybrid cloud model.
Right now, the products will be offered by both companies as a cloud lineup, but support and updates will come from each company separately. Mickos said this was an opportunity for Red Hat as well.
“For Red Hat, it is great because it allows them to compete against VMware going forward,” he said. Red Hat gets a robust cloud platform and Eucalyptus gets a monster-sized install base. A match made in free/open source software (FOSS) heaven.
Could a buy be in the works? Eucalyptus says it is roaring ahead on customers and capitalized to the tune of $35 million, putting a potential sale price around a minimum $120 million (VC investors like to get four times their money back, goes the common wisdom). Cloud technology is definitely a niche product, but the Mickos MySQL pedigree could be worth a lot…
SimpleCDN had a simple premise: people will buy into a cheap, reliable content distribution network (CDN). Turns out it was too cheap, and maybe too cavalier with its choice of customers. As a result, the service was booted off its hosting provider, leaving thousands of users without access to massive amounts of digital content. It’s a microcosm of all the things that can go wrong in the cloud model.
SimpleCDN went dark for the majority of its customers on Saturday, Dec. 11, followed by an angry, terse explanation from Frank Wilson, senior engineer for SimpleCDN. He said that his company had been summarily booted from its hosting infrastructure at Texas-based SoftLayer, which does dedicated and cloud hosting in three locations in the U.S. SimpleCDN had bought SoftLayer hosting through a reseller called 100TB.com, a subsidiary of the UK2 group. Customers are being pushed to a competitor.
100TB.com offered unlimited, unmetered network access and, unlike most hosting providers, did not charge extra for it, claiming you could use up to 100 TB of transit every month and still pay only roughly average VPS hosting costs ($600 per month for a decent quad-core server).
“I think they were doing about 30 GBps sustained at the end,” mused Jason Read, professional cloud watcher at CloudHarmony. Read recapped that SimpleCDN was doing business with a second-tier hoster at a level most people would consider a full-on DDOS, all day, every day. Read also added that SimpleCDN was able to offer their bargain prices based on 100TB.com’s marketing. “[SimpleCDN] was kind of milking that 100 TB unlimited bandwidth offer and I think SoftLayer told UK2 they had to amend their terms,” said Read.
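To put Read's figure next to the 100 TB offer, here is a rough back-of-the-envelope sketch. It assumes decimal units and a 30-day month, and because "GBps" could mean gigabytes or gigabits per second, it works out both readings. The function and variable names are illustrative only.

```python
# Back-of-the-envelope comparison of the 100 TB monthly allowance with the
# sustained rate quoted above. Decimal units and a 30-day month are
# assumptions made for illustration.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month
TB = 10**12  # bytes in a decimal terabyte
GB = 10**9   # bytes in a decimal gigabyte


def monthly_transfer_tb(rate_bytes_per_sec: float) -> float:
    """Total transfer in TB for one month at a constant byte rate."""
    return rate_bytes_per_sec * SECONDS_PER_MONTH / TB


def sustained_rate_mbps(allowance_tb: float) -> float:
    """Average rate in megabits per second that uses up an allowance in a month."""
    bytes_per_sec = allowance_tb * TB / SECONDS_PER_MONTH
    return bytes_per_sec * 8 / 10**6


# Reading "30 GBps" as gigabytes per second.
as_gigabytes = monthly_transfer_tb(30 * GB)
# Reading it as gigabits per second instead (one eighth of the byte rate).
as_gigabits = monthly_transfer_tb(30 * GB / 8)
# The steady rate that a single 100 TB allowance works out to.
allowance_rate = sustained_rate_mbps(100)

print(f"30 GB/s for a month: {as_gigabytes:,.0f} TB")
print(f"30 Gb/s for a month: {as_gigabits:,.0f} TB")
print(f"100 TB per month:    {allowance_rate:,.0f} Mb/s average")
```

Under either reading the monthly total runs to thousands of terabytes, which is consistent with the later description of hundreds of servers each drawing on its own allowance.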
That may have happened, of course. SimpleCDN’s Wilson states that he thinks that SoftLayer was getting massively undersold on its own CDN business, which is much more expensive than SimpleCDN, or that they were getting slammed by his booming business.
“…our best guess currently is that these organizations could not provide the services that we contracted and paid for, so instead they decided that terminating services would be the best solution for them,” he said.
Of course, CDN services are an incidental part of SoftLayer’s hosting business, probably a tiny percentage of its revenue; something they offer because they can or because customers are asking for it. Many hosters do the same and consider it a value add rather than a critical part of the business. Same with UK2 — they were reselling Akamai as a CDN offering. In the CDN market, there’s Limelight and Akamai, and then there’s everyone else. SoftLayer also lives in Dallas, one of the world’s hubs for Internet connectivity. If they were running out of bandwidth, they could simply buy more and sell it to the UK2 Group. If anyone was getting killed in this deal it was UK2, the middleman.
What led to UK2 terminating SimpleCDN was the nature of the traffic it served. SimpleCDN hosted a lot of live video streams, and anyone who’s taken even a cursory look into it knows that there is a booming business in streaming pirated content to U.S. audiences; some Web communities have users that will post entire seasons of a TV show or movies to an online service for anyone to watch for free, unauthorized marathons that entertainment companies take a very dim view of.
CDNs know this, of course, and “monitor” their networks by pulling streams when they get a DMCA takedown notice, which doesn’t mean a thing to the 99 other pirate streams going at the same time. SimpleCDN even had an automated DMCA action form. Somebody out there got sick of playing whack-a-mole with SimpleCDN and went straight to the provider of record, SoftLayer.
SoftLayer would not comment officially for this story except to say that SimpleCDN was not their customer but rather UK2’s and they had no commercial relationship with SimpleCDN. However, they are still the ones hosting all this content and it can be assumed they got a DMCA notice, which they are going to take VERY seriously, because unlike UK2 or SimpleCDN, they have actual physical infrastructure and assets. They would have gone to UK2 and said, “We are holding you responsible for this.” Wilson’s letter says that UK2 accused him of content violations and changed their Terms of Service (ToS) on the fly to put him in violation.
Why did 100TB/UK2 change the rules of the game, instead of duly passing along SoftLayer’s DMCA, as they should have done? SimpleCDN was murder on their bottom line. Their “free bandwidth” offer was a fiction when put to the test. SimpleCDN took them at face value, ran hundreds of servers and hosted thousands of terabytes of data with them, but it was gone in a flash, because it was based on a false economic premise and shady marketing.
So the lesson for cloud is two-fold: one is “too good to be true” usually is. If SimpleCDN had started directly with SoftLayer, it would have had to pay those bandwidth costs and its prices wouldn’t have been so attractive. Likewise, the DMCA issues would have had one less hop.
Second, for the business user, it’s a new wrinkle in vetting a service. Popular online services haven’t been known to pop like a soap bubble and vanish overnight, taking massive amounts of data with them; that’s the province of shady warehouse distribution operations and basement stock brokerages. Now they do, fueled by the explosion of middlemen and easy access that drives cloud computing. It’s not enough to examine whether a provider is sound; you have to make sure you understand who they rely on too.
UPDATE: Both UK2 Group and SimpleCDN were contacted by phone and email for this article but neither responded by press time.
Startup CloudSwitch has released version 2.0 of its software, which lets enterprise users connect their private data centers with cloud computing services and, crucially, extends their internal security policies into the cloud.
With CloudSwitch, applications remain integrated with enterprise data center tools and policies, and are managed as if they were running locally, the company claims.
The new features in Enterprise 2.0 include:
Provisioning of new virtual machines in the cloud (in addition to migration of existing ones), through:
– Network boot support
– ISO support (CD-ROM/DVD)
Web services and command-line interfaces for programmatic scaling to meet peak demands
Broader networking options to extend enterprise network topologies into the cloud:
– Layer-2 connectivity with option for Layer-3 support through software-based firewall/load balancing in the cloud
– Public IP access to cloud resources with full enterprise control
– Multi-subnet support
Enhanced user interface support for better scalability, control and ease of use
Broader geographic coverage (Terremark vCloud Express & eCloud, Amazon EC2 East, West, EU and Asia Pacific regions).
CloudSwitch officials said the company has landed pharmaceutical giant Novartis and Orange San Francisco, the subsidiary of telecommunications operator Orange. It has about 10 customers in total from pharma, retail and financial services. These customers are using CloudSwitch for a range of use cases including cluster scale-outs, web application hosting, application development and testing and labs on demand.
CloudSwitch Enterprise 2.0 is available now, with a free 15-day trial. Pricing begins at $25,000 for an annual license including basic support and up to 20 concurrent virtual machines under management in the cloud. Additional server packs are available for scaling. Cloud usage fees are paid separately to the cloud provider.
Despite blowback from refusing to supply services to whistleblower website WikiLeaks, controversy over congestion, uptime and customer service (or lack thereof), it seems that cloud computing giant Amazon Web Services (AWS) has never seen better days. The cloud provider is making new services and news announcements at a record clip. Here’s a roundup of the most significant recent announcements.
Cluster Compute GPU instances: Amazon made high-performance computing (HPC) headlines when it launched a special type of high-powered compute instance based on hardware normally found only in supercomputers. It’s impossible to fake or emulate the kinds of uses scientific and functional computing demands, so Amazon built a mini-supercomputer in its own Virginia data center and opened it up to the world. Now they’ve done it again, but this time it’s graphical processing unit (GPU) instances that are available.
Originally driven by the video game market, the chips that run video adapters have gotten so powerful that they’ve paved the way to new areas of HPC. Again, impossible to emulate, so Amazon has clearly laid out some serious capital to build a GPU-powered supercomputer to go along with the one that fires Cluster Compute instances.
This may speak to AWS’ operational maturity; their systems have evolved to the point where they can accommodate a variety of hardware in their billing, provisioning and management systems.
DNS in the cloud: Domain name servers, the street maps of the Internet, have long been the province of Web hosters and Internet service providers. They’re vital to delivering what customers want — Internet traffic — to the right place, and providers must be able to keep control of how traffic is distributed and account for that. Without access to DNS servers, you can’t properly control your email server, for instance.
Amazon, despite hosting one of the world’s signature collections of Internet traffic, hadn’t made DNS available to its customers until now. The omission might have been due to the theory that, as a pure infrastructure host, its users should run their own. That’s led to angst when Amazon, the provider of record, gets blocked or banned by the Internet community for users’ misbehavior. It also effectively crippled the Elastic Load Balancing (ELB) service for many users, since they were unable to point their root domain at the ELB service. “Too complicated,” they were told.
The Route 53 DNS service lets users create zone files for their own domains, a little bit like handing the steering wheel back to the driver for a network admin. It’s based on popular free software (of course) djbdns and goes for $1 per month. Most website hosters provide DNS service for free. Regardless, AWS users are largely ecstatic over this, although it does not fix the zoning problems with Elastic Load Balancers quite yet.
UPDATE: Not all are ecstatic; Cantabrigian Unix developer Tony Finch writes a worthy critique of Route 53.
PCI DSS compliance in the cloud (sort of): This is a big deal to many. That’s because Visa and other credit card companies won’t let you take credit card payments, online or anywhere else, unless you can pass a PCI DSS audit. To date, Amazon hasn’t been able to do that, shutting it out of the market for e-commerce applications (in part). PCI DSS 2.0 was announced, which added new and vague but apparently satisfactory guidelines for virtualization; weeks later, Amazon is now PCI DSS Level 1 compliant.
Does this mean your new online business is automatically PCI compliant if you use Amazon to host it? Absolutely not. All it means is that it is now possible for merchants to consider using EC2 and S3 to process card payments. The responsibility of passing one’s own PCI audit hasn’t gone away. In the event of a data breach, you’re still on the hook if you store customers’ credit card info on AWS.
New SDKs for mobile developers: Amazon is committing to the support of mobile development, releasing new software development kits (SDKs) for the iPhone and Android devices. The SDKs facilitate writing apps that can connect directly to AWS’ infrastructure and do fun stuff. No one in their right mind who is developing for mobile isn’t already using AWS, parts of AWS or some other cloud service, so this isn’t exactly a brainteaser. It’s a significant show of support, however, and again demonstrates the increasing maturity of AWS as an environment.
CloudWatch updates: A raft of updates to AWS’ CloudWatch service, a rudimentary notification system which is less rudimentary now. A shining example of starting off with something that is basically kind of broken (the original CloudWatch was considerably more limited than the monitoring features built into your average microwave oven) and gradually becoming a truly useful part of the tool kit.
That includes features like threshold notifications, health checks and policy-based actions (like not auto-scaling up your application in response to unexpected traffic and leaving you with an eye-watering bill).
5 TB files in S3: Users now have the ability to upload files up to 5 TB in size, three orders of magnitude greater than the previous 5 GB limit. One has to assume they’ve made a major stride in their storage architecture, Dynamo. Jeff Barr, AWS evangelist, posits direct connections between a genome sequencer, S3 and the HPC cluster on EC2 for near-real time data processing.
Of course, keeping it there will cost you a solid $125 per month per TB, and getting it back out of S3 for archiving or other purposes will cost you as well. But if you don’t have a supercomputer handy to go with your genomics research institute, this may look pretty handy. Make sure none of your giant files are classified, too, or you might get “WikiLeaked”…
Whew. Exciting six months, right, kids? Nope, that was just the last three weeks. Crazy.
Web company Mixpanel delivered an informative tirade on why they are leaving Rackspace Cloud for Amazon Web Services (AWS) today. The story basically boils down to “AWS is better potting soil for Web apps,” although there are choice words for Rackspace support and operations failures as well.
Mixpanel makes an app that tracks your website’s use in some detail; it’s a tool for site operators and e-commerce types. It left Rackspace for a few significant reasons, one of which was the Elastic Block Store (EBS) feature of AWS, the persistent block storage system attached to your virtual machines; another was the lack of a fully developed API for Rackspace. Big deal, Rackspace makes hay over customer wins, too.
What this highlights is the difference in the two offerings — Rackspace Cloud is much closer to traditional hosting, both in concept and design, than AWS. Go to the site, click on a button, get a server/website/whatever. You also have to deal with humans after a certain size, submitting a request to increase resources here and there.
AWS is a completely hands-off, completely blinded set of resources and rules that have much less to do with the way standard hosting operates; it’s fundamentally different even if the end result (you get a server) is the same.
Mixpanel wants (apparently) a generally new but now well-established concept; they want Web stuff and they want it all the time and everywhere. They mention Amazon’s superlative CDN, the range of instance sizes and so on, but it’s really the fact that you’re not actually dealing with infrastructure, except in the loosest concept, that’s pulling them over.
Storage and CPU and bandwidth are logically connected, but so loosely that you can’t really say it’s mimicking the operation of a physical facility. It’s just buckets of ability you buy, like power-ups in a video game or something. This is ideal for a Web application, since that’s how users are looking at the application, too. Maybe not so much for someone running a different kind of application. Encoding.com, for instance, chose Rackspace because their video encoding service needed Rackspace’s superior internal connectivity and CPU, not application flexibility.
Anyway, the fun part starts in the comment section of the blog, where users come on to gripe about AWS in almost the same way Mixpanel is griping about Rackspace; one developer said he was mysteriously slapped with charges over bandwidth that could not possibly have occurred and is now unwilling to turn his test instance back on, since AWS simply refuses to address the issue. Sounds like some place where they put a premium on customer support might be a better fit, you know, where they have “fanatical support”…
En route to the Cloud Computing Expo this week, I ducked into Abiquo’s offices in Redwood City to catch up with CEO Pete Malcolm.
By the end of the year he said the cloud management startup will pull in a second round of venture capital funding to add to the $5.1 million raised in March, 2010. His lips were sealed on the amount, but it will be enough to see the company through 2011/12.
Abiquo has 35 employees and somewhere between 10 and 50 customers using its cloud provisioning and automation software. Most of these are hosting companies, like BlueFire in Australia, which use the software as an enabling technology to sell more advanced cloud infrastructure services to their customers.
Enterprises have tested the software and Malcolm expects real deployments next year, once the budget for it kicks in. He said most companies did not have cloud in their budget in 2010 but will in 2011.
Abiquo just announced the fourth version of its cloud management software, Abiquo 1.7, which will be available in 45 days. The biggest new feature is a policy engine that allows organizations to allocate virtual resources based on different business and IT considerations, including governance, security, compliance and cost, as well as a variety of utilization models. The business rules can be applied at multiple levels, and customized for individual physical data centers, racks, servers and storage, as well as virtual enterprises and virtual data centers.
CA, VMware, Cloud.com and Eucalyptus among many others are all vying for the same market as Abiquo, and it looks like 2011 is shaping up to be a crucial year for gaining market share.
Mark Russinovich — Microsoft technical fellow, a lead on the Azure platform and a renowned Windows expert — took pains at PDC ’10 (Watch the “Inside Windows Azure” session here) to lay out a detailed, high-level overview of the Azure platform and what actually happens when users interact with it.
The Azure cloud(s) is (are) built on Microsoft’s definition of commodity infrastructure. It’s “Microsoft Blades,” that is, bespoke OEM blade servers from several manufacturers. It’s probably Dell or HP, just saying, in dense racks. Microsoft containerizes its data centers now and pictures abound; this is only interesting to data center nerds anyway.
For systems management nerds, here’s a 2006 presentation from Microsoft on the rudiments of shared I/O and blade design.
Azure considers each rack a ‘node’ of compute power and puts a switch on top of it. Each node — servers+top rack switch — is considered a ‘fault domain’ (see glossary, below), i.e., a possible point of failure. An aggregator and load balancers manage groups of nodes, and all feed back to the Fabric Controller (FC), the operational heart of Azure.
The FC gets its marching orders from the “Red Dog Front End” (RDFE). RDFE takes its name from nomenclature left over from Dave Cutler’s original Red Dog project that became Azure. The RDFE acts as a kind of router for requests and traffic to and from the load balancers and Fabric Controller.
Russinovich said that the development team passed an establishment called the “Pink Poodle” while driving one day. Red Dog was deemed more suitable, and Russinovich claims not to know what sort of establishment the Pink Poodle is.
How Azure works
Azure works like this:
- |___Aggregators and Load Balancers
- |___Fabric Controller
The Fabric Controller
The Fabric Controller does all the heavy lifting for Azure. It provisions, stores, delivers, monitors and commands the virtual machines (VMs) that make up Azure. It is a “distributed stateful application distributed across data center nodes and fault domains.”
In English, this means there are a number of Fabric Controller instances running in various racks. One is elected to act as the primary controller. If it fails, another picks up the slack. If the entire FC fails, all of the operations it started, including the nodes, keep running, albeit without much governance until it comes back online. If you start a service on Azure, the FC can fall over entirely and your service is not shut down.
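The failover behavior described above can be sketched in a few lines. This is a toy model, not Microsoft's implementation: the class names, the fields and the "lowest healthy id wins" election rule are all assumptions made for illustration, since the session does not spell out how the primary is chosen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ControllerReplica:
    """One Fabric Controller instance; every field here is hypothetical."""
    replica_id: int
    rack: str
    healthy: bool = True


@dataclass
class ControlPlane:
    replicas: List[ControllerReplica]
    # Services started earlier. They run on the nodes, not in the controller,
    # so losing the controller does not remove them.
    running_services: Set[str] = field(default_factory=set)

    def primary(self) -> Optional[ControllerReplica]:
        """Pick the healthy replica with the lowest id (an assumed rule)."""
        healthy = [r for r in self.replicas if r.healthy]
        return min(healthy, key=lambda r: r.replica_id) if healthy else None

    def start_service(self, name: str) -> bool:
        """Starting something new needs a primary; report False without one."""
        if self.primary() is None:
            return False
        self.running_services.add(name)
        return True


plane = ControlPlane([
    ControllerReplica(1, "rack-a"),
    ControllerReplica(2, "rack-b"),
    ControllerReplica(3, "rack-c"),
])
plane.start_service("web-role")

plane.replicas[0].healthy = False      # primary fails, another replica takes over
print(plane.primary())

for replica in plane.replicas:         # the whole controller falls over
    replica.healthy = False
print(plane.primary(), plane.running_services)
```

The point of the last two lines is the one Russinovich made: with no controller left, nothing new can be started, but what is already running stays in the set.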
The Fabric Controller automates pretty much everything, including new hardware installs. New blades are configured for PXE and the FC has a PXE boot server in it. It boots a ‘maintenance image,’ which downloads a host operating system (OS) that includes all the parts necessary to make it an Azure host machine. Sysprep is run, the system is rebooted as a unique machine and the FC sucks it into the fold.
The Fabric Controller is a modified Windows Server 2008 OS, as are the host OS and the standard pre-configured Web and Worker Role instances.
What happens when you ask for a Role
The FC has two primary objectives: to satisfy user requests and policies and to optimize and simplify deployment. It does all of this automatically, “learning as it goes” about the state of the data center, Russinovich said.
Log into Azure and ask for a new “Web Role” instance and what happens? The portal takes your request to the RDFE. The RDFE asks the Fabric Controller for the same, based on the parameters you set and your location, proximity, etc. The Fabric Controller scans the available nodes and looks for (in the standard case) two nodes that do not share a Fault Domain, and are thus fault-tolerant.
This could be two racks right next to each other. Russinovich said that FC considers network proximity and available connectivity as factors in optimizing performance. Azure is unlikely to pick nodes in two different facilities unless necessary or specified.
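As a rough illustration of that placement step, the sketch below picks two nodes with spare capacity that sit in different fault domains and prefers pairs in the same facility. The `Node` fields, the `distance` score and the tie-break on spare capacity are assumptions for the example; the real Fabric Controller logic was not described in that much detail.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    fault_domain: str   # per the description above, effectively the rack
    facility: str       # data center the rack sits in
    free_slots: int     # hypothetical measure of spare capacity


def distance(a: Node, b: Node) -> int:
    """Toy proximity score: 0 within a facility, 1 across facilities."""
    return 0 if a.facility == b.facility else 1


def place_pair(nodes: List[Node]) -> Optional[Tuple[Node, Node]]:
    """Return the closest pair of nodes with capacity in different fault domains."""
    with_capacity = [n for n in nodes if n.free_slots > 0]
    candidates = [
        (a, b)
        for a, b in combinations(with_capacity, 2)
        if a.fault_domain != b.fault_domain
    ]
    if not candidates:
        return None
    # Prefer nearby nodes first, then the pair with the most spare capacity.
    return min(
        candidates,
        key=lambda p: (distance(p[0], p[1]), -(p[0].free_slots + p[1].free_slots)),
    )


inventory = [
    Node("n1", "rack-1", "dc-a", 2),
    Node("n2", "rack-1", "dc-a", 4),
    Node("n3", "rack-2", "dc-a", 1),
    Node("n4", "rack-9", "dc-b", 8),
]
pair = place_pair(inventory)
print([n.name for n in pair] if pair else "no fault-tolerant placement available")
```

In this inventory the two racks in the same facility win over the roomier node in the second facility, which matches the remark that Azure avoids spanning facilities unless it has to.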
The Fabric Controller, having found its juicy young nodes bursting with unused capacity, then puts the role-defining files on the host. The host OS creates the requested virtual machines and three Virtual Hard Drives (VHDs) (count ’em, three!): a stock ‘differencing’ VHD (D:\) for the OS image, a ‘resource’ VHD (C:\) for user temporary files and a Role VHD (next available drive letter) for role-specific files. The host agent starts the VM and away we go.
The load balancers, interestingly, do nothing until the instance receives its first external HTTP communication (GET); only then is the instance routed to an external endpoint and live to the network.
The Platform as a Service part
Why so complicated? Well, it’s a) Windows and b) the point is to automate maintenance and stuff. The regular updates that Windows Azure systems undergo, the same ones as the rest of the Windows world (within the specifications of what is running), typically happen about once a month and require restarting the VMs.
Now for the fun part: Azure requires two instances running to enjoy its 99.9% uptime service-level agreement (SLA), and that’s one reason why. Microsoft essentially enforces a high-availability, uninterrupted fault tolerance fire drill every time the instances are updated. Minor updates and changes to configuration do not require restarts, but what Russinovich called ‘VIP swaps’ do.
Obviously, this needs to be done in such a way that the user doesn’t skip a beat. A complicated hopscotch takes place as updates are installed to the resource VHD. One instance is shut down and its resource VHD updated, then the other one. The differencing VHD makes sure new data that comes into the Azure service is retained and synced as each VM reboots.
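The one-at-a-time hopscotch is easier to see in code. The following is a toy simulation under stated assumptions, not Azure's update mechanism: instances are plain dictionaries, `rolling_update` is a hypothetical helper, and "updating" just bumps a version number. It does show why a single-instance service cannot be updated without an outage.

```python
def rolling_update(instances, new_version):
    """Update instances one at a time, refusing to take an instance down
    unless at least one other instance is still serving.

    Returns a log of (instance_being_updated, instances_still_serving).
    """
    log = []
    for inst in instances:
        others = [i["name"] for i in instances
                  if i is not inst and i["serving"]]
        if not others:
            raise RuntimeError("update would leave no instance serving")
        inst["serving"] = False        # take it out of rotation
        log.append((inst["name"], others))
        inst["version"] = new_version  # stand-in for patching and rebooting
        inst["serving"] = True         # back into rotation
    return log


service = [
    {"name": "web_0", "version": 1, "serving": True},
    {"name": "web_1", "version": 1, "serving": True},
]
update_log = rolling_update(service, new_version=2)

# With a single instance, the same update cannot avoid downtime.
lonely = [{"name": "web_0", "version": 1, "serving": True}]
try:
    rolling_update(lonely, new_version=2)
    single_instance_ok = True
except RuntimeError:
    single_instance_ok = False
```

With two instances, each step of the log shows the other one still serving, which is the property the two-instance SLA requirement is there to guarantee.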
Virtualization and security
What is it running on, we asked? Head scratching ensued for many moons as Microsoft pushed Hyper-V to customers but claimed Azure was not compatible or interoperable with Hyper-V.
It is, in fact, a fork of Hyper-V. Russinovich said it was basically tailored from the ground up for the hardware layout that Microsoft uses, same as the Azure OSes.
Russinovich said that the virtual machine is the security boundary for Azure. At the hypervisor level, the host agents on each physical machine are trusted. The Fabric Controller OSes are trusted. The guest agent, the part the user controls, is not trusted. The VMs communicate only through the load balancers and the public (user’s endpoint) IP and back down again.
Some clever security person may now appear and make fun of this scheme, but that’s not my job.
The Fabric Controller handles network security and Hyper-V uses model-specific registers (MSRs) to verify basic machine integrity. That’s not incredibly rich detail, but it’s more than you knew five minutes ago and I guarantee it’s more than you know about how Amazon secures Xen. Here’s a little more on Hyper-V security.
New additions to Azure, like full admin rights on VMs (aka elevated privileges) justify this approach, Russinovich said. “We know for a fact we have to rely on this [model] for security,” he said.
Everyone feel safe and cozy?
New user-built VM Roles are implemented a little differently
Azure now offers users the ability to craft their own Windows images and run them on Microsoft’s cloud. These VM Roles are built by you (sysprep recommended) and uploaded to your blob storage. When you create a service around your custom VMs and start the instances, Fabric Controller takes pains to redundantly ensure redundancy. It makes a shadow copy of your file, caches that shadow copy (in the VHD cacher, of course) and then creates the three VHDs seen above for each VM needed. From there, you’re on your own; Microsoft does not consider having to perform your own patches an asset in Azure.
A healthy host is a happy host
Azure uses heartbeats to measure instance health: each instance simply pings the Fabric Controller every few seconds and that’s that. Here again, fault tolerance is in play. You have two instances running (if you’re doing it right; Azure will let you run one, but then you don’t get the SLA). If one fails, the heartbeat times out, the differencing VHD on the other VM starts ticking over and Azure restarts the faulty VM, or recreates the configuration somewhere else. Then changes are synced and you’re back in business.
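Heartbeat timeouts are a standard technique, and a bare-bones version looks something like this. The 15-second timeout, the `HeartbeatMonitor` class and the instance names are assumptions for illustration; the session only said the pings arrive every few seconds. Timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per instance and reports the ones that
    have been silent for longer than the timeout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def beat(self, instance, now):
        # Record that `instance` checked in at time `now` (in seconds).
        self.last_seen[instance] = now

    def timed_out(self, now):
        # Instances whose most recent heartbeat is older than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout_seconds=15.0)  # assumed value
monitor.beat("web_0", now=0.0)
monitor.beat("web_1", now=0.0)
monitor.beat("web_0", now=10.0)   # web_0 keeps checking in
# web_1 goes quiet after its first heartbeat.

failed = monitor.timed_out(now=20.0)
```

Whatever `timed_out` returns is the list of instances a controller would then restart or recreate elsewhere.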
Do not end these processes
Now that we have the ability to RDP into our Azure Roles and monkey around, Russinovich helpfully explains that the processes Azure runs within the VM are WaAppHost.exe (Worker Role), WaWebHost.exe (Web Role), clouddrivesvc.exe (All Roles) and a handful of others, including a special w3wp.exe for IIS configuration and so forth. All of these were previously restricted from user access but can be accessed via the new admin privileges.
Many of the features set out here are in development and beta but are promised to the end user soon. Russinovich noted that the operations outlined here still could change significantly. At any rate, his PDC session provided a fascinating look into how a cloud can operate, and it’s approximately eleventy bajillion percent more than I (or anyone else, for that matter) know about how Amazon Web Services or Google App Engine works.
Azure: Microsoft’s cloud infrastructure platform
Fabric Controller: A set of modified virtual Windows Server 2008 images running across Azure that control provisioning and management
Fault Domain: A set of resources within an Azure data center that are considered non-fault tolerant and a discrete unit, like a single rack of servers. A Service by default splits virtual instances across at least two Fault Domains.
Role: Microsoft’s name for a specific configuration of Azure virtual machine. The terminology is from Hyper-V.
Service: Azure lets users run Services, which then run virtual machine instances in a few pre-configured types, like Web or Worker Roles. A Service is a batch of instances that are all governed by the Service parameters and policy.
Web Role: An instance pre-configured to run Microsoft’s Web server technology Internet Information Services (IIS)
Worker Role: An instance configured not to run IIS but instead to run applications developed and/or uploaded to the VM by the end user
VM Role: User-created, unsupported Windows Server 2008 virtual machine images that are uploaded by the user and controlled through the user portal. Unlike Web and Worker Roles, these are not updated and maintained automatically by Azure.