The National Science Foundation and Microsoft have announced they will be giving away Azure resources to researchers in an attempt to “shift the dialogue to what the appropriate public/private interaction” is for research computing, according to Dan Reed, Corporate Vice President for Extreme Computing (yes, really) at Microsoft.
For three years, Microsoft is giving away an unspecified amount of storage and support, as well as CPU time, for research applications to be run on Azure. NSF assistant director for Computer & Information Science & Engineering Jeannette Wing suggested that cloud computing platforms, and Azure in particular, should be considered a better choice for research institutions than building and maintaining their own facilities.
“It’s just not a good use of money or space,” she said.
Look at the Large Hadron Collider, said Wing, which has 1.5 petabytes of data already, or digital research projects that can generate an exabyte of data in a week or less. She urged researchers to use Azure to figure out new ways of coping with all that information.
This is a nice, charitable gesture, not unlike Amazon’s occasional giveaways of EC2 instances and bandwidth to worthy scientific projects. But there are significant caveats that Microsoft and the NSF have papered over.
First, from all reports, Azure is a very large data center operation, possibly as large as some of the less prestigious high-performance computing facilities that researchers use around the world. Unless Microsoft is giving away the whole thing, it’s not going to make much of a dent in the demand.
Second, go down to the local university science department and tell a professor he or she can hop on a virtualized, remote Windows platform and process their experiment data. Go on, I dare you.
99% of experimental, massive-data, high performance computing is done on open source, *nix-based platforms for some very sound reasons. Microsoft won’t gain much traction suggesting that researchers can do better on Azure. It may find some eggheads desperate for resources, but that’s a different story.
So what is the real import, the overall aim of setting up Azure as a platform to host boatloads of raw data and let people play with it? Both Reed and Wing said they wanted to see researchers with new ideas on how to search and manage these large amounts of data.
Well that makes more sense–go sign up for a grant, but read the fine print, or you could be inventing the next Google, brought to you by Microsoft…
Sharp eyes to Shlomo Swidler, who posted an update to an old thread and an old complaint on AWS: getting lumped into spam blacklists. EC2 staffer “Steve@AWS” announced today a private beta that sets up PTR records for selected users to help get them off real-time blacklists. PTR records are a standard DNS tool conspicuously absent from AWS.
A major problem for AWS and EC2 since their inception is that users with the publicly generated EC2 IP addresses handed out by Amazon are extremely susceptible to getting stuck on spam blacklists, like Spamhaus or Trend Micro’s (Spamhaus is by far the more influential).
Read coverage about the most severe blacklist to date here.
It’s been an ongoing problem because Amazon doesn’t provide the usual level of service for users running websites or sending email from within EC2. Most hosts provide ways for an email server to politely verify that it does, in fact, originate with the domain name it says it does. PTR records, which map an IP address back to a hostname, do that, and they have become a de facto standard for email hosts. Without them, a single spam complaint can get entire swaths of IP addresses tagged as spam sources and knocked out of the daylight.
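As a rough illustration of the check a receiving mail server can make, here is a minimal Python sketch of forward-confirmed reverse DNS: look up the PTR name for the sending IP, then confirm that name resolves back to the same IP. The function names and the injected lookup callables are assumptions made for this example, not anything Amazon or the blacklist operators publish, and the addresses come from the reserved documentation range.

```python
import ipaddress
from typing import Callable, Iterable, Optional


def ptr_query_name(ip: str) -> str:
    """Return the DNS name that holds the PTR record for this IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(
    ip: str,
    reverse_lookup: Callable[[str], Optional[str]],
    forward_lookup: Callable[[str], Iterable[str]],
) -> bool:
    """True if the PTR hostname for `ip` resolves back to the same `ip`.

    Both lookups are passed in (hypothetical helpers) so the logic can be
    exercised without touching a real resolver.
    """
    hostname = reverse_lookup(ip)
    if hostname is None:
        # No PTR record at all: the situation EC2 users were stuck in.
        return False
    return ip in set(forward_lookup(hostname))


# Made-up records standing in for a real DNS zone.
PTR_RECORDS = {"192.0.2.10": "mail.example.com"}
A_RECORDS = {"mail.example.com": ["192.0.2.10"]}

print(ptr_query_name("192.0.2.10"))
print(forward_confirmed("192.0.2.10", PTR_RECORDS.get, lambda h: A_RECORDS.get(h, [])))
print(forward_confirmed("192.0.2.11", PTR_RECORDS.get, lambda h: A_RECORDS.get(h, [])))
```

Against a live resolver, the two callables could be thin wrappers around the standard library’s socket.gethostbyaddr and socket.gethostbyname_ex; they are kept separate here so the example runs offline.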
The only way for hosts to get unflagged after their IPs are dirtied up with the spammer label is for the host provider to individually verify and notify the blacklist provider that the address is good. Amazon, being very highly automated and very popular, doesn’t do that well, and it took the blackout by Spamhaus last year to force the cloud provider to open up and start to reform its practices of not responding to customers having email trouble.
Hopefully this private beta is a sign that Amazon is going further and moving towards accepting more of its responsibilities as a web host. After all, giving out the address means you need to police the streets, collect the garbage and make sure the mail can go through. Hosters have taken this on their shoulders since the telecoms washed their hands of responsibility for spam a decade ago; it’s well past time for Amazon to join in.
UPDATE: Amazon confirms they are adding new features for DNS and conducting a private beta for selected users.
“Benchmarks!” you explode, your exquisitely sensitive logical faculties coruscating in outrage. “Benchmarks are vendor-driven popularity contests that cherrypick tests for results! It’s a swamp of dismal nonsense, a perpetual statistical hellhole that means nothing! Benchmarks make engineers maaaad!!” you say.
Well, fine, then you do it, because there aren’t any for cloud yet, and I don’t care if Rackspace did go have it done. They had sound reasons, there is precious little precedent, and the results are informative, useful and not overtly flawed.
Analyst and high-traffic expert Matthew Sacks carried out the benchmarks on his website The Bitsource. Overall, the tests were rudimentary and carefully controlled. He tested the time it took to compile a Linux kernel on every type of instance offered by the two services, and used IOzone to measure read/write performance for storage systems.
Sacks, a systems administrator at Edmunds.com, said he developed the methodology himself.
“The idea behind it is that we can get a pretty good idea on how these instances stack up,” he said, but it’s not a comprehensive metric. “I decided on kernel compilation as a general measure of CPU,” he said, because it was well understood, easy and fast to replicate and uncomplicated in results.
Sacks said that Rackspace’s motivation was good old fashioned boosterism. “They had received reports from their customers that Rackspace was way better than EC2,” he said, so they decided to test that out with a third party. The results seem to bear them out.
“There are clear wins in CPU and disk performance,” he said. Rackspace beat EC2 instances in compiling by a slender margin in every case but one, and showed 5 to 10 times the amount of CPU availability. Disk read/write throughput was also higher, sometimes twice as high, although random access tests were much closer, suggesting throughput on EC2 lags behind Rackspace even if data request execution doesn’t.
However, Sacks took pains to say that his tests did not mean that an application running on EC2 could be shifted to Rackspace and save either time or money by default. He said that users always had to consider the application, not just the infrastructure, and he wanted his tests to be a resource for people to come and compare their specific needs. “The variables are so great it’s hard to come up with a standard for testing [cloud],” he said.
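For anyone who wants to try the same idea, a minimal Python sketch of timing a fixed workload over repeated runs might look like the following. It is not Sacks’ harness, whose exact commands the article doesn’t give; the helper name time_command and the tiny stand-in workload are assumptions for illustration, and a real run would substitute the build command of your choice.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run `argv` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build
        # is never recorded as a fast one.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs anywhere. For a kernel build
    # you would pass something like ["make"] from inside the source tree
    # (an assumption, not the command used in the tests described above).
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    results = time_command(workload, runs=3)
    print("median seconds:", statistics.median(results))
```

Repeating the run and reporting the median is a design choice to dampen noise from other tenants on shared hardware; a single timing on a virtualized instance can be misleading.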
“What I would like to do is test more providers,” said Sacks. Perhaps he’ll get the opportunity- he said the testing and review only took a few weeks and it wouldn’t be hard to repeat for different platforms.
Sacks’ two-man experiment aside, I can think of at least one cloud out there that could make EC2 and Rackspace look like a snail chasing molasses when it comes to kernel compilation and disk I/O:
“NewServers’ Bare Metal Cloud makes Cloud Servers and EC2 look like two sick men in a sack race” sounds catchy to me. I wonder if Rackspace will sponsor that benchmark?
But that’s the point– I want more third party tests like Sacks’. I want more and more and more independent review of what providers say and what they do. I can find more compelling independent information about the USB stick in my pocket than any one of the clouds that want me to trust my business to them.
It only took Matthew Sacks a few weeks to make a clean, well documented and useful set of benchmarks, even if that came at the behest of a vendor.
So let’s make more.
It’s fun being at the top of a technology wave. The past year in cloud computing has moved with the giddy, inexorable pace that marks a major technological shift in how we use computing power, and more importantly, how we think about it.
Cloud computing, barely a whisper three or four years ago, is now firmly embedded, if still nascent, in the ontology of mainstream information technology. It’s a part of any conversation about IT anywhere. Even the dyed-in-the-wool Grumpy Sysadmin(TM) will, grumpily, talk about cloud computing.
It started with Amazon, online retailer par excellence, who found a way to get IT pros what they wanted, without the hassle of shipping a hundred pounds of metal and very clever sand per buy. The world had moved on to the Internet, they reasoned, so why not get what they want (CPU cycles and plenty of bit storage) without the part they hated?
And it worked. By 2008, the tipping point was reached, and analysts officially began cramming ‘cloud’ into their IT buying predictions, which, naturally, immediately drove IT management insane trying to figure out a) what the cloud was b) what it cost and c) whether or not they needed it. That made 2009 a lot of fun.
So what happened to turn cloud computing from ridiculed buzzword to reality?
Most of us started the year wondering what the devil it was. Fortunately, the government came up with a pretty definitive answer, which should tell anyone with an ounce of sense how robust and uncomplicated the concept is. Many others jumped on the bandwagon with glee, ‘cloud washing’ any old thing with an Internet connection.
Cloud terminology hit mainstream newspapers, and the boob tube, where we got the standard expression of polite interest. It’s now on a par with ‘hacker’, ‘firewall’ and ‘servers’ for IT terms the regular press doesn’t understand but is happy to sprinkle over any tech reporting.
And that’s the long and short of it, kids. No matter what happened, cloud made sense to users, practically and economically. They bought in and they’re still buying in. Analysts and pundits weighed in, promising riches and/or wrack and ruin, security folks went through the roof at every turn, and yet, somehow, the idea makes enough sense that people don’t care. They’ll put up with the potholes for the sake of the ride; it’s still a lot better than walking.
Just so with cloud computing.
Amazon says its new EC2 Spot Instances offering lets users bid on unused EC2 capacity and run those instances for as long as their bid exceeds the current Spot Price. The price will change based on supply and demand, so it is obviously not for everyone.
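To make the rule concrete, here is a small Python sketch of how long an instance would survive under that description: it keeps running while the bid exceeds the Spot Price and is lost the first time it doesn’t. The function name and the price series are made up for illustration; this models only the rule as stated above, not Amazon’s actual billing or termination behavior.

```python
def periods_before_interruption(bid, spot_prices):
    """Count consecutive pricing periods an instance keeps running.

    The instance runs while `bid` is strictly greater than the current
    spot price and stops at the first period where that no longer holds.
    """
    periods = 0
    for price in spot_prices:
        if bid > price:
            periods += 1
        else:
            break
    return periods


# Hypothetical price history in dollars per hour (not real AWS figures).
prices = [0.03, 0.04, 0.05, 0.02]

print(periods_before_interruption(0.05, prices))  # stops when price reaches the bid
print(periods_before_interruption(0.10, prices))  # bid clears every period
```

A job that cannot tolerate being cut off partway would need a bid comfortably above the prices it expects to see, or checkpointing so it can resume later.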
Amazon says applications suited to this kind of pricing model might be image and video processing, conversion and rendering, scientific research data processing and financial modeling and analysis, all typical EC2 use cases already.
From Amazon’s perspective, it might as well sell as much of its unused capacity as it can, even if it is just for cents on the dollar. Presumably there’s a way through the AWS console for users to be alerted when bids and pricing change, to warn them if they are about to lose their instance to someone willing to pay more for it. That sounds like a nightmare waiting to happen.
There is a feature called Persistent Request that supposedly prevents your instance from terminating before it has finished your job, but you might want to try this out a few times before you do anything real on it!
Hats off to Amazon, though, for continuing to innovate around pricing, and during its peak season too.
Is Google a self-regulatory public servant or a rapacious and unscrupulous monopolist? So asked Eric Clemons, professor of operations and information management at The Wharton School, during a talk at the Supernova conference in San Francisco today.
Clemons drew a parallel with Microsoft’s ability in the 1980s and 90s to use its dominant position in one market, operating systems, as leverage to control and dominate another, the browser market. Microsoft’s Internet Explorer browser, bundled with its OS, killed Netscape, a competitor in the browser market. This anti-competitive, monopolistic power was the basis of the DOJ’s case against Microsoft.
A decade later, Google has 70% of the market for internet search and is launching new products like Gmail, calendaring, office apps, mobile operating systems, laptop operating systems and a cloud computing development platform, all making headway in the market and in some cases knocking out established players. Ironically, Google recently beat Microsoft to a contract for outsourced email and calendaring in Japan by bidding 40% less than Microsoft wanted for the renewal of that contract. Touché indeed, but it’s bittersweet.
Clemons believes Google will face increased scrutiny by the DOJ for its potentially predatory monopoly in a non-contestable market, one that could harm the competitive process, and that this will likely kick off an antitrust suit against the company.
I think the real message in the professor’s comments is that the high-tech industry moves so fast that it lends itself to monopoly and that market players need to be aware and know when they are in danger of a Google or Microsoft jumping in.
For instance, watch Microsoft’s Windows Azure cloud computing development platform. This could do for Platform as a Service what Google did for search.
Interop was great. It highlighted for me how far and how fast cloud has come along. It was all too brief; all the cloud sessions were slammed and debate ran high.
Panelists like AT&T’s VP for Business Development and Strategy Joe Weinman, for instance, did a splendid job laying out a cost model for computing that follows the utility model, building on the old saw that “nobody has a generator in their backyard anymore”, and arguing that computing services are subject to the same rules as the utility market.
They aren’t, for one reason paramount above all others. Data is unique. It’s not a commodity. One datum of information does not equal another. You don’t care if your neighbor washes his dishes with water drop 1 or water drop 2, but you’re sure going to care if he’s using your data set instead of his to make money.
For instance, I asked Joe Weinman what happens to his rosy cost model if net neutrality falls apart and carriers can engage in prejudicial pricing for network users. Naturally, he didn’t answer that, but it’s a primary example of the fundamental problems with treating cloud as a utility.
Of course, what happens is that AT&T makes more money and users that went headlong into public clouds get royally screwed. No large enterprise is a) unaware of this or b) going to do it.
Net neutrality is hardly the last political consideration- it’s just the one I brought up because there was a telco in the room. It’s easy to conceive of legislation that would irrevocably compromise data stored in outside repositories- indeed, as far as the rest of the world is concerned, in the US, it already is.
Political wrangling that raises your electric bill a few cents an hour is one thing. Losing exclusivity to your company data by fiat is quite another. It’ll be a cold day in hell before any large (or even medium) size enterprise is going to commit to using public compute as a utility with vulnerabilities like that. Sure, they’ll put workloads in Amazon, and take them out again; they’ll shift meaningless data into these resources, but they’ll never, ever be thinking of it the same way they think about the electric company.
It’s still a useful metaphor, in a rudimentary fashion; it gets the concept of ‘on-tap’ across, and it’s balm to the ears of people who didn’t really understand their IT operations anyway. But futurologists and vendors, especially vendors who want to monetize cloud on top of carrier services, need to understand that the message has come home; we know what cloud is, and we know what the real risks are. Now, show us your answers. In the meantime, we’ll keep our ‘generators’ and our data to ourselves.
Appirio, a cloud integrator that helps companies use cloud services, has published a map of the marketplace.
It attempts to distinguish between services that are true cloud offerings versus those that are hosted (single tenant / multi-tenant) versus private cloud offerings.
We didn’t get a chance to look at it in detail yet, but we did notice that Salesforce.com is listed under platforms, but not applications. The last we checked, their CRM in the cloud app was way more popular than Force.com, their development platform.
Still, it’s a great straw man that everyone should check out and give feedback on. Send suggestions to email@example.com.
The site consists of Paessler’s PRTG Network Monitor software running in various public cloud environments around the world. Each PRTG ‘sensor’ reports back information on performance, in real time, and displays each stream on the site for all the world to see. Founder Dirk Paessler said the idea struck him after he began testing his Windows-only software kit on newly available EC2 Windows images.
“We began to create a network of PRTG installations,” he said, to see how each cloud would stack up in terms of performance. “My initial interest was to find a way to compare these clouds to each other,” he said. Public clouds were technologically diverse, and he wanted a way to visualize that in a rudimentary fashion for users.
The results, especially over time, are fascinating. Best performers for the money?
“NewServers has the highest performance,” Paessler said. “They’re an interesting case because they give you bare metal,” he said, despite selling capacity and time the same way Amazon does. That may give virtualization boosters pause for thought.
Does that mean everyone else is a poor relation? No, said Paessler. Amazon had excellent overall performance, and the major indicator for performance may be in how you consume, rather than who you consume.
“If you compare, on Amazon, a small CPU [instance] with a c1.medium, the c1.medium is giving you a lot more bang for the buck,” he said, something value-conscious cloud consumers may not realize. Different applications on different platforms may simply be better suited for one flavor of cloud over another, too. Paessler noted that his software was written with Intel processors in mind; running it at a provider whose cloud was based on AMD CPUs showed a steep and inaccurate performance disadvantage.
Yet another twist was that cloud performance varied to a very great degree as tests moved from one cloud to another, Paessler said. The monitoring software tests short-term CPU capacity, data transfer, network response and similar metrics, and how a test turned out really depended on where you were watching from.
“We found out the connection from EC2-EU to EC2-US was very very fast, very reliable, but from EC2-EU to NewServers,” it dropped off sharply, he said. Similar variations in response and performance were seen moving data to and from other clouds as well. Analysis is proving more complex than he had imagined. “We are talking about a [world-wide] web here,” he said.
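The “bang for the buck” comparison is simple arithmetic: divide a benchmark score by the hourly price and rank the results. A minimal Python sketch follows; the instance labels, scores and prices are placeholders invented for the example, not Paessler’s measurements or Amazon’s price list.

```python
def score_per_dollar(score, hourly_price):
    """Benchmark score obtained per dollar of hourly spend."""
    if hourly_price <= 0:
        raise ValueError("hourly_price must be positive")
    return score / hourly_price


def rank_by_value(instances):
    """Return instance names sorted from best to worst score per dollar.

    `instances` maps a name to a (score, hourly_price) pair.
    """
    return sorted(
        instances,
        key=lambda name: score_per_dollar(*instances[name]),
        reverse=True,
    )


# Placeholder figures only: (benchmark score, dollars per hour).
candidates = {
    "instance_a": (100, 0.10),
    "instance_b": (250, 0.20),
}

print(rank_by_value(candidates))
```

In this made-up case the pricier instance wins on value because its score more than doubles while its price only doubles, which is the kind of result Paessler describes.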
Paessler doesn’t make too much of his new toy, however. He said that the site shows only the most basic and rough kind of information, and doesn’t take into account any number of factors. That will come with time as cloud matures, and as he finds ways to improve his little experiment.
“As with all benchmark testing, this is only a clue,” he said. “If you do consider cloud hosting, try two, three, four [providers] and try them with your application.” Don’t assume Paessler’s results hold true across the board.
In the meantime, the site will prove a fascinating time sink for statisticians, analysts and cloud watchers. Paessler said he put the site up partly to softsoap the cloud community, partly as a public service, and partly, just because he could.
With a few dozen reporting nodes, co-locating in so many locations would have cost a pretty penny, but in the cloud, the site doesn’t cost Paessler more than a few hundred dollars to run, and the results for observers are proving well worth it.
We all appear to have swallowed the cheerful news about cloud computing hook, line and sinker, thanks in part to a confluence of economic woes and a sudden maturation of the technology. By happy chance, cloud computing seemed like the answer to a lack of cash. Economies of scale and self-service meant we could gorge ourselves on CPU cycles and bandwidth and quit whenever the price went too high; we didn’t have to over-provision or worry about fixed costs. It was brilliant.
So that story’s told. But what about the other side? If legitimate businesses can sip or chug from cloud’s sea of resources as they wish, so can the crooked ones. Spammers, scammers, extortionists, terrorists, bot herders, warez traders and internet privateers that fight unseen wars over international IT channels for profit and patriotism, and so forth- all of them use servers and software just like regular upstanding folks.
Now, with self-service and automated compute clouds, they can have all they want for pennies a shot. Even better, there’s no work involved in snaffling up a PC or a web server for malicious intent. Why bother pwning boxes when you can rent? All you need is a credit card that doesn’t have your home address on it (i.e., someone else’s) and away you go. Of course, this is already happening, as recent events at Rackspace and Amazon will attest, not to leave out the awful LxLabs tragedy. The industry is aware of the potential problems, but how ready is it?
Rackspace’s Tom Sands wrote a feel-good blog about it back in April; theoretical expert IT zombie holocaust survivor Hoff has also detailed a few potential Zerg rush techniques.
What scares me, however, is that the bad guys are better than the good guys at technology. Much, much better–they have to be. The black market is a very pure example of a well-regulated free market economy- purer by far than any in the ‘white market’. A well regulated free market economy, as we all know, is the most potent driver of innovation there is. The regulation comes from above in the form of punitive and reactionary measures taken by industry and government. The innovation comes from being forced to re-invent ways to obtain malicious goals.
That means that attackers are seeing and using the true potential of cloud computing long, long before the rest of the world will. Bad guys have already taken advantage of public cloud resources in fairly rudimentary ways, like hop-scotching around the world to fire up spam servers as they get detected, and engaging in cheap DDoS attacks.
Now, with cloud cartography a reality, the possibilities are staggering. As attackers realize the real fundamental change that cloud computing brings to IT (the ability to think in hundreds and thousands of nodes whenever and wherever, rather than a few piled up in a heap), we will see astonishing feats.
There are millions of credit card numbers floating around out there- how long before someone bothers to nab a few tens of thousands, open up EC2 accounts and start up every single available instance on Amazon all at once? I mean every last scrap of CPU they have, at once.
How about a 10,000 instance, 10,000 hour rolling blackout of Google that moves from Azure to GoGrid to AWS to Rackspace or from Mexico to Brazil to Canada to Japan?
Never mind the idea that someone could compromise actual cloud infrastructures; it’s not like operators use simple, well known authentication and web-based management consoles to administer these astonishingly potent resources, right? Right?
Now, fast forward a few years- Brazil, Russia, China, Korea and India all have services comparable to Amazon. Now what, kids?
UPDATE: Why, look here! Step-by-step instructions on cracking PGP passphrases with Amazon EC2! Skip to the end: job time reduced from 5 years to, oh, several days. Wait!! Amazon is not keen on hundreds of nodes unexpectedly firing up all at once.
Oh, wait, somebody found a way to fix that. Carry on.