Server Farming

Aug 22 2008   4:50PM GMT

VMmark a server vendor leapfrog game

Bridget Botelho

This week I wrote a follow-up story on VMware Inc.’s virtualization performance benchmarking tool, VMmark, and found that vendors mainly use it to market their servers.

Server vendors run the VMmark test under a set of guidelines and submit results to VMware for posting. I suspect that vendors play leapfrog with VMmark, looking at existing VMmark results and submitting their own only when they are as good or better.

For instance, IBM submitted a benchmark for its 16-core System x3850 M2 running VMware ESX v3.5, which trumped the other results published as of March 2008. IBM then issued a press release to brag about the results, but within a few months Dell submitted results for three PowerEdge systems sporting better virtual machine (VM) performance than IBM’s, and Hewlett-Packard (HP) beat them all with its ProLiant DL585 G5 server results, published August 5.

HP also sent out an email to the press this week boasting about their top 32-core results, but didn’t mention one minor detail: they are the only vendor with results in the 32-core category so far. Sure, they are number one. They are the only one.

System administrator Bob Plankers sums this game up nicely in a blog post called “Why VMmark Sucks.” Here is what Plankers had to say:

“Having a standard benchmark to measure virtual machine performance is useful. Customers will swoon over hardware vendors’ published results. Virtualization companies will complain that the benchmark is unfair. Then they’ll all get silent, start rigging the tests, scrape and cheat and skew the numbers so that their machines look the greatest, their hypervisor is the fastest. Along the way it’ll stop being about sheer performance and become performance per dollar. Then CapEx vs. OpEx. Watt per tile. Heat per VM. Who knows, except everybody will be the best at something, according to their own marketing department.”

In addition, the benchmark is a real pain to set up and run, and the ‘free’ VMmark software requires other expensive software to work. According to VMware’s website, VMmark requires licenses for the following software packages:

  • Microsoft Windows Server 2003 Release 2 Enterprise Edition (32-bit): three 32-bit copies per tile (two for virtual machines and one for that tile’s client system), and one 64-bit copy per tile (for the Java server virtual machine)
  • Microsoft Exchange Server 2003 Enterprise Edition
  • SPECjbb2005 Benchmark
  • SPECweb2005 Benchmark

Plankers said he won’t be wasting any time or money running VMmark. “Instead, I’ll be in meetings explaining to folks why we are maxed out at 30 VMs per server when the vendor says they’ll run 50. Or why we chose VMware over Xen, when Xen claims 100 on the same hardware. I’ll have to remember the line from the FAQ that says ‘VMmark is neither a capacity planning tool nor a sizing tool.’

Which begs the question: if it isn’t for use in sizing or capacity planning, exactly what is it good for?”

VMware says the benchmark is good for users who are making hardware purchasing decisions.

“The intention [of VMmark] is that customers can look at the results and make decisions based on what they see. It isn’t just about the fastest server; it’s about making system comparisons; between blades and rackmounts or a two-core or four-core system. Someone can see how much more performance they get from upgrading to four core processors, for instance,” said Jennifer Anderson, the senior director of research and development at VMware.

This makes sense, but as Plankers said, users should beware of benchmark manipulation by vendors and know that the results do not reflect the same workloads that users will run in their own data center environments.

1 Comment on this Post

  • Tuomoks
    Benchmarks! Well, generic benchmarks especially, even TPC, can be used as guidelines, mostly within one vendor or at least the same type of architecture, BUT only if you have a lot of experience in how to read the results and analyze the environment very carefully - duh! I have been running both generic and application benchmarks since '72 on all sizes of systems, from small to supercomputers and huge clusters, both as a customer and as a vendor. So - yes, after 10 years or so and a lot of work you start getting a hunch about what the results really mean converted to the real world - just to find that even a "small" change changes the whole picture next month. Very amazing what you can sell to most customers today. Even some of the large users have (once again?) lost the view and are just reading the marketing / advertising numbers or getting them blindly from a vendor rep. The real problem is, seen this before, companies don't realize that a system is a sum of many things, and the cost/benefit especially is much more complicated than just a benchmark result. Anyone remember those MIPS fights (even long before PCs existed)? It just comes and goes, again and again. It is a catch-22 for smaller companies, which don't have the resources or can't influence the vendors; it should be no problem for large corporations, but.. My advice for smaller companies would be: read the reports with a (big) grain of salt, do your own homework - even a little helps - but best of all, be active in computer user groups and communities, etc. - maybe the companies don't share their information very easily, but the developers, analysts, whatever are not so shy. For a large company it is a no-brainer: any vendor arranges a benchmark environment for your application when they hear words such as 10K txs/second or nnn number of nodes/clusters/whatever. Unfortunately (not again!) even large companies seem to have forgotten that your infrastructure has to match, otherwise it doesn't matter how fast you execute something or how much throughput one of the components has. IMHO capacity planning has gotten worse, not better, lately and is already causing problems everywhere.