December 11, 2007 9:52 PM
Posted by: Joe Foran
Tags: Joseph Foran, VMware ESX
At the risk of sounding like a commercial, Hyperic HQ is my leading choice among agent-based systems management tools for handling both VMware and non-VMware systems. Personally I tend to prefer non-agent-based systems, but the Hyperic tools work, and work especially well for VMware environments. I like them because I’m an open-source nut. I was first attracted to open source because of the price; I found I had a zero-dollar-a-day addiction to the LAMPP stack, MySQL, and Dia. Like all GPL junkies, I kept looking for more, and after a few years I found Nagios, then Groundwork, then Hyperic while doing some research for a presentation at Data Center Decisions 2006. I’ve been hooked on them since. I particularly like Hyperic’s rewards program for contributors who find bugs, fix bugs, and make the software better.
From 50,000 feet, Hyperic’s monitoring architecture looks like this:
You may be wondering why I use anything to monitor my VMware environment aside from VirtualCenter – number one is that I do use VC, but I prefer not to use multiple tools. I’ve been in a Fortune 100 company where there were so many 32″ LCD screens on the wall that you didn’t really know what was happening because you were getting so many different results from so many different tools. It was about as useful as having nothing at all except for a user’s phone call to tell you something was down. I have physical and virtual systems that I need to monitor, and until the day comes that my company goes 100% virtualized, I need one tool to monitor them all (please feel free to insert your own LotR joke here).
I’ll bypass the non-VMware material and get to the relevant point – using Hyperic to monitor VMware products – after this brief warning:
Reading the install manual is generally a must: there are several caveats to getting Hyperic fully functional, notably around graphing and charting and deprecated libraries that may need to be installed. Or, you can skip all that by downloading the pre-made Virtual Appliance. If you go that route, install the VMware Tools, or else time drift will cause problems with reporting.
For this run I’m using the prebuilt virtual machine. If you need to install your own server, you need the following:
- 1 GHz or higher Pentium 4, or equivalent (2 x 2.4GHz Pentium Xeon or equivalent recommended)
- 1 GB RAM (4 or more GB recommended)
- 1-5 GB Free Disk Space
On Linux systems, you’ll also need an X server running (or at least the libraries).
To install, run the command setup.sh -full and answer the prompted questions. Overall, it’s a straightforward installation. On a Linux system, start the server with hq-server.sh start. At another point in the series, I’ll go into using databases other than the default. You can use Oracle or Postgres, but not MySQL. I’m a big MySQL fan, so I would like to see support for it added later. EnterpriseDB, being a Postgres database engine, is supported.
Now, onto the agent part of the installation… it requires touching the guests, and this can be easily forgotten when you’re of the mindset that you can manage so much through VC. Some preparatory work is needed in order for proper operations on the ESX host. First amongst these is the creation of a user account (hqadmin is the default used by the agent) on the local machine. This account needs to have the admin-level role in ESX.
To install the agent:
(where x = the version number of the agent you’re installing)
You will get some prompts, most of them self-explanatory, about what sort of install you want to perform. I recommend saying yes to secure communications and using port 7443 instead of 7080 as the default port. When you are prompted for the user name, use the account you created earlier.
Configuring ESX3 to report to the HQ server requires some modification of the firewall. It’s easily accomplished with a couple of commands:
esxcfg-firewall --openPort 7443,tcp,out,HypericHQAgent
esxcfg-firewall --openPort 2144,tcp,in,HypericHQAgent
Note that if you selected the default port (7080) when you set up the agent, rather than the SSL port of 7443, you will have to use that port number. Again, I recommend using 7443 for secure communications.
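As a rough sketch of the port choice described above, the two firewall commands can be generated from whichever server port was picked during agent setup. The helper name hq_firewall_cmds is made up for illustration and is not part of Hyperic or ESX. It only prints the commands, so they can be reviewed before anything is run on the service console.

```shell
#!/bin/sh
# Sketch only: prints the esxcfg-firewall commands, does not run them.
# hq_firewall_cmds is a hypothetical helper name.
#   $1 = port the agent uses to reach the HQ server
#        (7443 for SSL, 7080 for the non-SSL default)
# Port 2144 is the agent's inbound port, as given in the post.
hq_firewall_cmds() {
    server_port="${1:-7443}"
    echo "esxcfg-firewall --openPort ${server_port},tcp,out,HypericHQAgent"
    echo "esxcfg-firewall --openPort 2144,tcp,in,HypericHQAgent"
}

# Review the output first; pipe it to sh on the ESX host to apply it.
hq_firewall_cmds 7443
```

Calling hq_firewall_cmds 7080 instead would print the outbound rule for the non-SSL port.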
Once the host has the agent installed, you can install agents on the guests (virtual machines) in the same fashion. When these agents are installed, their descriptor in the Hyperic management console will indicate to which host they belong.
The VMware-specific monitoring information covers a lot of VM- and Host-specific functions on ESX hosts. The following, taken straight off Hyperic’s documentation, lists them:
VMware Monitoring Specification
- General Server Metrics (CPU used, Total Memory Used, etc.)
- Memory Available for VMs
- Memory Used by VMs
VMware ESX 2.x and 3.x VM NIC Metrics
- Packets Transmitted
- Packets Transmitted per Minute
- Packets Received
- Packets Received per Minute
- Bytes Transmitted
- Bytes Transmitted per Minute
- Bytes Received
- Bytes Received per Minute
VMware ESX 2.x and 3.x VM Disk Metrics
- Reads per Minute
- Writes per Minute
- Bytes Read
- Bytes Read per Minute
- Bytes Written
- Bytes Written per Minute
VMware ESX 2.x and 3.x VM Metrics
- Process Virtual Memory Size
- Process Resident Memory Size
- Process Page Faults
- Process Page Faults per Minute
- Process Cpu System Time
- Process Cpu System Time per Minute
- Process Cpu User Time
- Process Cpu User Time per Minute
- Process Uptime
- Process Cpu Total Time
- Process Cpu Total Time per Minute
- Process Cpu Usage
- VM Cpu Wait
- VM Cpu Wait per Minute
- VM Cpu Used
- VM Cpu Used per Minute
- VM Cpu Sys
- VM Memory Shares
- VM Memory Minimum
- VM Memory Maximum
- VM Memory Size
- VM Memory Ctl
- VM Memory Swapped
- VM Memory Shared
- VM Memory Active
- VM Memory Overhead
- VM Uptime
Most of these have a default report interval of ten minutes, though some of the more critical and/or volatile metrics report every five minutes. Most of the ESX host reporting and all of the VM Disk and NIC reporting are on ten-minute report timers.
This opens up some unique operational opportunities for managing virtual desktops as well as servers: you can proactively monitor individual workstations the way most enterprises already monitor servers, and catch system faults before they become productivity-impacting problems for users and generate helpdesk tickets.
That should be enough for now… more in later posts in this series, complete with some screenshots.
December 11, 2007 9:51 PM
Posted by: Akutz
Tags: VMware High Availability (VMware HA)
Systems admin Michael Gildersleeve is evaluating VMware for High Availability (VMware HA); but he’s not sure if that product is going to work well with his legacy software. He’s not sure, either, if HA is as mature and robust as other products on the market.
I’m answering his call for more information. I hope that you will, too, either by commenting on this post or emailing me at firstname.lastname@example.org.
Gildersleeve works for a company that has a Progress database running on a UNIX server. Hundreds of Windows clients and Web applications are attached to that database and server through Progress Brokers via service file ports.
“I need to provide 365 by 24 by 7 up time,” Gildersleeve said. “With our new web business, East and West coast facilities, and vendors managing our stock and replenishment, we need to be available all of the time.”
He wants to run his database across at least two servers, in a setup like an Oracle Real Application Cluster.
“This would allow me to upgrade the OS (operating system), reboot a server or take a server down for maintenance without affecting the database or the users. So far I have only found solutions that will give me a two-to-five minute downtime between switching from one server to another.”
Gildersleeve has looked a little at server virtualization. He’s evaluating server virtualization options and VMware HA to see if he can cut the downtime to nil. It seems to him, however, that virtualization options only cover one server at a time. He wants 100% uptime across several servers used for database activities.
“What if I need to do an OS update or patch, or what if some critical hardware fails? What I have seen so far is that if I upgrade my Progress app to v10 (Progress OpenEdge), and then move to two Integrity servers running (VMware) High Availability; if one server fails or if we need to do maintenance on a server, we can manually switch to the second server. But the problem with this is that my users will feel the switch because I will need to bring one server down. They will need to log out and in again to the app, or whatever needs to be done to bring the ready server into production mode.”
Gildersleeve is willing to evaluate Sun Microsystems options, if they are truly viable for running Progress. Microsoft operating systems are out of the question, however.
In his evaluations, Gildersleeve has come up with a lot of questions, and he’s looking for advice from HA experts. Could you provide some advice and share your experiences by commenting on this post or emailing me at email@example.com?
December 7, 2007 7:59 PM
Posted by: Joe Foran
After much hoopla and fanfare, VMware spent a month and a half without issuing a SKU for its new small-to-mid-sized business edition of VI3.5. Happily, and with thanks to a couple of vendors who took the time to email me, I’ve learned that the new SKUs are out. While the pricing has been available since just after the announcement, the lack of a SKU meant that no quotes or orders could be placed. I don’t know if the kits are actually shipping, but that at least gives me something to follow up on in a subsequent post.
This is good news to the many SMBs that stand to benefit from virtualization, and good for VMware as having a real product to order is better than vaporware in the growing SMB battle between VMware, Citrix, and Virtual Iron.
December 7, 2007 7:58 PM
Posted by: Rick Vanover
Tags: VMware ESX
How many times have you gone back to your resource pools and wondered why your performance is not what you were expecting? Here is a quick configuration tip that may help you understand your situation. For small- to medium-sized ESX implementations, use a uniform value for the CPU and RAM shares on your resource pools. Modifying the shares values can lead to issues throughout your ESX implementation if not done cautiously.
Fair Playing Field
In my experience, when the shares are equally set (at the default values for ‘normal’), your configurations for reservations and limits can be more correctly enforced. I’ve tried many times to grasp the concept of shares, and this description seems to capture it best: “Consider the shares as bandwidth to use the resource reservations and limits you have set forth.” In this fashion, the limits and reservations can have the behavior you are expecting.
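To make the “bandwidth” idea a little more concrete, here is a small arithmetic sketch. It assumes a simplified model in which sibling pools under contention split a resource in proportion to their shares, and it ignores reservations and limits entirely. The share values and the 6000 MHz of contended CPU are made-up numbers, and entitlement is a hypothetical helper name.

```shell
#!/bin/sh
# entitlement <pool_shares> <total_shares_of_all_siblings> <contended_capacity>
# Integer arithmetic; multiply before dividing to limit rounding loss.
entitlement() {
    echo $(( $3 * $1 / $2 ))
}

# Three pools with equal shares (1000 each) competing for 6000 MHz:
entitlement 1000 3000 6000    # each pool: 2000

# Raise one pool to 2000 shares (the total is now 4000):
entitlement 2000 4000 6000    # the raised pool: 3000
entitlement 1000 4000 6000    # each of the other two: 1500
```

In this toy model, bumping one pool’s shares quietly takes 500 MHz from each sibling, which is the kind of side effect that keeping shares uniform avoids.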
Do not Have Anything Set to Unlimited
While we are talking about the resource pools we should definitely mention that if you have anything set to ‘unlimited’, you are bypassing all management of the pool. Anything configured this way will go after the host’s resources in an unlimited fashion, and can negatively affect other guests.
December 7, 2007 7:53 PM
Posted by: Rick Vanover
Tags: Rick Vanover, VMware ESX
VMware ESX has many commands on the console that you can use to get detailed information on the status or configuration of many elements of your installation. Recently I had an opportunity to run esxcfg-info on the console (it can be run in the same fashion over SSH) and really liked the information it provides. This tool will give you basically everything about the ESX host, including an inventory and details of the virtual machines on the host.
Lots of data
I ran the following series of commands to make a place for these files on the ESX system:
esxcfg-info > esxcfg-info.export.12.7.2007.text
Yes, the name of the file is long – but that will come into play later. Don’t run this command interactively on the screen; way too much information is generated. On my single system with only nine virtual machines in the inventory, it was around 900 KB. I really like this export as a documentation tool and benchmarking mechanism.
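One way to keep the exports organized is to build the file name from the host and the date. The sketch below is an assumption rather than the exact commands used here: export_name is a hypothetical helper, the esxcfg-exports directory is invented, and the date stamp uses YYYY-MM-DD (instead of the 12.7.2007 style above) so that files sort in order.

```shell
#!/bin/sh
# export_name <host> [date-stamp]
# Hypothetical helper: builds a file name that carries the host and date.
# The stamp defaults to today's date as YYYY-MM-DD.
export_name() {
    host="$1"
    stamp="${2:-$(date +%Y-%m-%d)}"
    echo "esxcfg-info.export.${host}.${stamp}.text"
}

# On the ESX console the export itself might then look like this
# (commented out so the sketch runs on any machine; the directory
# name is an assumption):
#   mkdir -p "$HOME/esxcfg-exports"
#   esxcfg-info > "$HOME/esxcfg-exports/$(export_name "$(hostname)")"

export_name esx01 2007-12-07
```

The last line prints esxcfg-info.export.esx01.2007-12-07.text.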
I will find myself running this command as part of my server documentation (kept locally in the \home\adminprofiles path or copied elsewhere), benchmarking, and baseline configuration information. I’ll also find this file beneficial for comparisons among systems within VMI.
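For comparisons between systems, a plain diff of two exports tends to be noisy because of the runtime statistics in the file. A hedged alternative is to pull out one labelled field at a time. The helpers field_value and compare_field below are hypothetical names, the parsing assumes the dotted-leader layout shown in the snippets in this post, and the second address in the demo (172.16.25.9) is invented.

```shell
#!/bin/sh
# field_value <label> <export-file>
# Prints the value after the dotted leader for lines such as
#   |----Primary Name Server......172.16.25.2
# A label that occurs more than once yields one value per line.
field_value() {
    grep -- "----$1\.\." "$2" | sed 's/.*\.\.//'
}

# compare_field <label> <file-a> <file-b>
compare_field() {
    a=$(field_value "$1" "$2")
    b=$(field_value "$1" "$3")
    if [ "$a" = "$b" ]; then
        echo "$1: same ($a)"
    else
        echo "$1: differs ($a vs $b)"
    fi
}

# Demo with two one-line stand-in files; real exports are far larger.
dir=$(mktemp -d)
printf '%s\n' '|----Primary Name Server......172.16.25.2' > "$dir/host-a.text"
printf '%s\n' '|----Primary Name Server......172.16.25.9' > "$dir/host-b.text"
compare_field "Primary Name Server" "$dir/host-a.text" "$dir/host-b.text"
rm -r "$dir"
```

The demo prints "Primary Name Server: differs (172.16.25.2 vs 172.16.25.9)".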
Surely this is mainly a support tool for VMware, but there is really good information in here. I’ve taken out lines for simplicity; check out the following snippets of the output:
|----Primary Name Server................................172.16.25.2
|----Secondary Name Server..............................172.16.25.3
Virtual Machine Guest Information
\==+Virtual Machines:
|----UUID................................50 01 42 0b e6 a9 6d fb-37 54 e9 b1 42 c2 3e ce
\==+Memory Client Stats :
|----Current Size.....................512.00 MB
|----Target Size......................372.36 MB
\==+Memory Allocation :
\==+CPU Allocation :
Check out the tool on your ESX systems. How have you used some of the other esxcfg-… commands?
December 3, 2007 10:22 PM
Posted by: Akutz
Not everything works virtually the same way it does physically. In this post, I’m referring to the network drivers that come with VMware Fusion (latest release). This is an important issue because it brings to light a point that many people gloss over: the difference between the physical and virtual IT world.
Take, for example, the case of one nerd-without-a-life-but-mysteriously-with-a-wife who spent last weekend setting up a VPN using L2TP over IPSec and took forever getting it working. This nerd — we’ll call him Landrew Lutz — even went so far as reducing his configuration to PPTP just to lose some of the complexities. Still no joy!
Turning on debugging in PPTP reveals the following truth facts:
Dec 2 05:09:52 pan pptpd: CTRL: Starting call (launching pppd, opening GRE)
Dec 2 05:09:52 pan pptpd: GRE: Bad checksum from pppd.
Dec 2 05:10:22 pan pptpd: GRE: read(fd=6,buffer=610ba0,len=8196) from PTY failed: status = -1 error = Input/output error, usually caused by unexpected termination of pppd, check option syntax and pppd logs
Dec 2 05:10:22 pan pptpd: CTRL: PTY read or GRE write failed (pty,gre)=(6,7)
It seems that the generic routing encapsulation (GRE) packet is not being successfully read from the network interface and thus resulting in a bad checksum. Okay, not so odd, right? Except it was odd because that EXACT pptpp.options file that was working for PPP on Landrew’s router — the router has a PPTP daemon — did not work on the VM (virtual machine) he was using for testing. WTF mate?!? What was the difference? Landrew thought about it and came to the realization that the router was physical and the VM was, well, virtual.
Now, Landrew did so happen to have a physical Linux server running Ubuntu Feisty Fawn 7.04 64-bit. He installed PPTP on that and using the same configuration EVERYTHING WORKED THE FIRST TIME!!! Again, WTF mate?!?! Given that Landrew’s original VM was actually running Ubuntu’s latest release, Gutsy Gibbon 7.10 64-bit, he decided to blame everything on the version of PPP that ships on Gutsy as opposed to what is running on his physical server and router. Except that PPP has not been updated in over a year (ftp://ftp.samba.org/pub/ppp/). Yeah well, Landrew decided to give virtualization the benefit of the doubt and installed a VM with Feisty Fawn and set PPTP up one more time. It did not work!!!
This means that the likely cause of the error is the network drivers that VMware is providing with Fusion. Something is cutting off packets before they can go home. Do the packets think they are too good for their home? Why do the packets hate poor Landrew? This is a rather glaring example of when going virtual can hurt rather than help.
Although the last few paragraphs only took a few minutes to read, it took Landrew between 20 and 30 hours to figure all of this out; because, after all, virtualization is never to blame, right?
I’ve further tested this on ESX 3.0.1 and VMware Server 2.0 without error. Can anyone test this on VMware Workstation and VMware Server 1.0.4 for me? Thanks!
November 30, 2007 7:12 PM
Posted by: Akutz
Tags: VMware ESX
Up until now there has not been a way to manage VI3 from OS X clients. This stinks for those of us with our shiny PowerBooks and MacBook Pros (I just can’t let my 12″ go!). What to do, what to do… Then, suddenly and without warning BOMP BOMP BOMP (10 points if you know where that line is from) VMware released the VI Perl Toolkit, or as I like to call it, the viper toolkit (look at the directory name from a shell). Unfortunately for us there is no OS X version of the installer, but since VMware released the source, all that is required is a little ingenuity and we have ourselves a working VI client.
Step 1, cut a hole in a box. Oops, wrong series of steps. Step 1, download the viper toolkit from VMware at http://www.vmware.com/download/sdk/index.html. Make sure you get the source code (VI Perl Toolkit – source), not one of the pre-built installers.
Once the tarball is downloaded, extract it and change directories into it. Go ahead and attempt to create a makefile with:
perl Makefile.PL
You will receive something similar to:
akutz@amends:viperltoolkit$ perl Makefile.PL
Checking if your kit is complete…
Warning: prerequisite Class::MethodMaker 2.08 not found.
Warning: prerequisite Crypt::SSLeay 0.51 not found.
As you can see, some Perl modules are missing. Good thing OS X (10.5.1 Leopard) comes with cpan. Install the missing modules by typing:
cpan Class::MethodMaker Crypt::SSLeay
Accept all of the defaults for both of the modules. Now you can create the makefile needed to build the viper toolkit:
perl Makefile.PL
After you create the makefile, test it with:
make test
You should receive something similar to:
akutz@amends:viperltoolkit$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
All tests successful.
Files=1, Tests=1, 1 wallclock secs ( 0.75 cusr + 0.06 csys = 0.81 CPU)
You want it to say “All tests successful.” Next install the viper toolkit by typing:
sudo make install
This will place the viper toolkit common runtimes and libraries and man pages in the appropriate locations. For OS X that is ‘/Library/Perl/5.8.8/’ and ‘/usr/local/share/man/man3/VMware’. You should see the following output:
akutz@amends:viperltoolkit$ sudo make install
Appending installation info to /System/Library/Perl/5.8.8/darwin-thread-multi-2level/perllocal.pod
Well, now you have a working VI client for OS X! Stay tuned as I show you more of how the viper toolkit can make your life as an ESX administrator much easier.
Hope this helps!
November 28, 2007 6:19 PM
Posted by: Joe Foran
VMware and Foedus jointly issued a white paper some time ago called Tips and Tricks of Implementing Infrastructure Services on ESX Server. It’s a good read, although the first two paragraphs are meant for the virtualization-uninitiated and can be skipped if you are familiar with how to use VMware, VMotion, DRS, and HA to achieve redundancy and uptime improvements. The paper does focus largely on Windows networking, with Active Directory being repeatedly referenced. Since similar non-Windows services (such as OpenLDAP, OpenDirectory, eDirectory, etc.) are affected in similar ways, it’s still a good cross-platform read despite the focus on Microsoft technologies.
If you are only going to scan the paper, there is one section I highly recommend reading, because it’s very often overlooked: Shutdown and Startup Sequence.
* I * Cannot * Stress * the * Importance * of * this * Enough *
Things crash. Even in the best environment, you can expect to have problems now and again, and having other forms of reliable failover is no excuse for not planning this out. I have personally seen, in the lab and in practical reality, what happens when this tool isn’t used: it gets very nasty when you have a database server come up AFTER the server hosting an application that needs to access that database, or when VPN servers come up early in the order and can’t find an authentication source.
A great tip for larger companies:
User environments with many Active Directory driven policies will want to ensure that the ESX Server %Ready time as viewed in the ESXTOP utility stays under 10
There is also the obligatory mention of time-syncing. Going back to my view on startup sequence… if you don’t install VMware Tools, you will have clock drift. If you have clock drift, services that depend on timed replication (particularly DNS, Active Directory, File Replication Services, and the Distributed File System) will misbehave. If you have an operating system that isn’t compatible with VMware Tools, the simple answer is this: don’t virtualize it.
What I was pleasantly surprised to see was a strong chunk of space allocated to setting up DMZ environments in VMware, which is somewhat of a sticky subject because of the two main ways it can be done (in virtual switching, or in the physical network using additional NICs in the host). The paper references one of my favorite open source projects of all time, IPCop, a firewall that supports multiple DMZs and is available as a VMware appliance. The paper’s approach is, appropriately for the material, focused on the virtual switching, and they cover it well. Truthfully, I think the paper should be renamed “How to use VMware and IPCop to make a DMZ, plus some other generic advice” because of the time they spend on DMZ vs. everything else.
To set up IPCOP as a DMZ firewall, create two virtual switches on one or more ESX Server hosts named DMZ-EXT and DMZ-INT. Plug the red NIC of IPCOP into DMZ-EXT and the orange NIC into DMZ-INT. Plug the green NIC into whatever virtual switch is associated with the internal LAN address space.
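On the ESX service console, creating the two switches might look roughly like the sketch below. It is not taken from the paper. The helper dmz_vswitch_cmds is a hypothetical name, naming each port group after its switch is an assumption, and the function only prints the commands rather than running them. Attaching a physical uplink to DMZ-EXT and wiring the IPCop NICs are left out because the quoted passage doesn’t give those details.

```shell
#!/bin/sh
# Sketch only: prints esxcfg-vswitch commands, does not run them.
# -a adds a virtual switch; -A adds a port group to a virtual switch.
# Naming each port group after its switch is an assumption.
dmz_vswitch_cmds() {
    for sw in DMZ-EXT DMZ-INT; do
        echo "esxcfg-vswitch -a $sw"
        echo "esxcfg-vswitch -A $sw $sw"
    done
}

# Review the output first; pipe it to sh on the ESX host to apply it.
dmz_vswitch_cmds
```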
This paper is less a tips and tricks paper and more of a general advice paper, so I wasn’t overly impressed. From the title I was expecting more focus on VMware tips and tricks, but because of the DMZ section, it still gets 7 pokers.
November 27, 2007 6:43 PM
Posted by: Rick Vanover
Tags: Rick Vanover, VMware ESX, Windows Computing
The Virtual Infrastructure Client used to manage your VMware ESX hosts may not perform all tasks correctly when running on Windows Vista. Users on the VMware communities board have reported a handful of issues, such as an “Object reference not set to an instance of an object” error when trying to use the VMware cloning features. I was fortunate enough to experience this issue just today. This thread suggests updating some time zone information in the registry. On another Vista system, we performed this registry tweak, and the VI client worked. However, this is not a good resolution from the Vista environment perspective.
Run Critical VI Tasks on XP
If you are performing some critical VMware tasks that are prone to failure when the VI client is installed on Vista, a better idea would be to perform those tasks from a Windows XP system. That said, the issues reported with Vista are minor, and the VMware server-side components are intelligent enough to manage the issues as they occur, avoiding integrity problems with the virtual machines or other pending tasks.