Posted by: Ed Tittel
add a network integrity/uptime check to MSS diagnostics HP, further adventures in network troubleshooting, HP EX-495 MSS goes south and gets returned
Over the past few weeks, I’ve recounted some misadventures in the wake of replacing my heavily used but now retired HP EX-475 MediaSmart Server (MSS). That unit has been replaced with a brand-new EX-495, which upgrades the AMD Sempron processor to an Intel Pentium Dual-Core E5200 and bumps the internal memory from 1 GB to 2 GB inside the same compact, attractive, and highly functional enclosure.
Alas, my attempts to bring the new model up and get it working kept running into one technical problem after another. I let one HP support tech convince me it was all my fault because I’d upgraded the unit’s Realtek GbE NIC driver (which, BTW, I had also done on the EX-475 without hiccup or incident), so I rebuilt the server from scratch. Then I got with another extremely savvy HP support tech, who had me run the following tests after I reported that the unit kept dropping off the network on a regular but intermittent and mostly unpredictable basis:
1. Unplugged the unit from my NETGEAR GS108 ProSafe 8-port GbE switch and plugged it directly into one of the GbE ports on my D-Link DIR-655SW router/switch device. No change in behavior ensued.
2. Started the EX-495 back up, got it up on the network, then unplugged it from the DIR-655SW and let it run all night. It was still working when I plugged it back into the network the next morning (night-time is when the unit does its backups by default, so this eliminated the activity that was underway when most of the network drop-offs occurred).
3. Plugged the EX-495 directly into the GbE port of a single machine, which I then proceeded to back up. Not only did the backup fail because the unit once again dropped off the network, but I also observed that unplugging the RJ-45 cable from the unit and then plugging it back in would make it resume activity as if nothing had happened. It took several such “in-and-out” maneuvers to complete that backup, but complete successfully it did.
Fortunately, when I reported all this to the next HP support tech I spoke with, he agreed with my diagnosis that the NIC in the EX-495 was just plain wonky. He declared my unit “DOA” and authorized a return and exchange through the HP Shopping.com site, which let me ship the unit out on Saturday at HP’s cost. A brand-new replacement will ship to me as soon as HP’s RMA unit receives my incoming shipment. Estimated turnaround is about a week, so I may actually be able to get something done in the meantime without another EX-495 to futz around with.
Probably purely on an automated schedule, a customer satisfaction survey showed up from HP this weekend, ironically right after I’d boxed up my first and apparently failed EX-495 to ship it back for a warranty replacement. Don’t get me wrong: although it took a week to figure out the unit wasn’t working as it should have been, the Canadian-based MSS support team was great to work with, and I actually wound up learning some very useful things from them that even I hadn’t come across before (and just for the record, I’ve reviewed all three generations of MSS servers for Tom’s Hardware as they’ve come out, including the EX-47*, -48*, and -49* models). I also really like these products and find them to be an important component of my home network, where their ability to back up systems automatically every night has saved my hindquarters more than once when various hardware glitches required me to restore (and in one case to completely rebuild) a vital production or test machine.
That left me a little uncertain as to how to respond to a survey of my satisfaction. In general, I am very satisfied with the MSS boxes, and I was completely happy with the level of support, knowledge, and professionalism of the MSS support team (over the week I was troubleshooting, I worked with 4 different people and they were all great). In particular, though, I am pretty unhappy with the unit they shipped me, and likewise unhappy that I probably spent 20-plus hours troubleshooting a box that’s supposed to deliver a straightforward plug-and-go experience to (mostly unsophisticated and technically unsavvy) users. If I didn’t know as much about networking as I do, the MSS support team could easily have spent 3 or 4 times the 4 or 5 hours in total they spent on the phone with me, and the poor schlub on the other end of the phone could have spent 40 or more hours just to figure out the unit wasn’t working properly and needed to be replaced.
Ultimately, I would recommend to HP that they add some kind of network monitor, or a keepalive or network-availability heartbeat check, to the MSS software arsenal. It would have detected the network drop-offs (which never showed up in the Application or System logs as errors on that machine) within the first day of the install, and would have shortened and streamlined the whole diagnostic and RMA process. That’s how you learn what kinds of tools are needed, I guess, so my concluding beef is that I turned out to be the guinea pig who had to learn that lesson so that others could avoid my pain.
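For what it’s worth, here is roughly what I have in mind by a heartbeat check, as a minimal Python sketch run from another PC on the LAN. This is my own illustration, not anything HP ships: the function names (`tcp_probe`, `find_dropoffs`, `monitor`) are made up for the example, and it assumes the server answers TCP connections on some known port (for instance 445, the usual Windows file-sharing port, which is an assumption about how the box is configured).

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def find_dropoffs(samples):
    """Turn time-ordered (timestamp, is_up) samples into outage intervals.

    Returns a list of (went_down_at, came_back_at) pairs; came_back_at is
    None when the host was still unreachable at the last sample.
    """
    outages = []
    down_since = None
    for stamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = stamp                    # start of a new outage
        elif is_up and down_since is not None:
            outages.append((down_since, stamp))   # outage ended
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages


def monitor(host, port, interval=30.0, count=10,
            probe=tcp_probe, sleep=time.sleep, clock=datetime.now):
    """Probe the host `count` times, `interval` seconds apart, and
    return the outages observed. probe/sleep/clock are injectable
    so the logic can be exercised without a real network."""
    samples = []
    for i in range(count):
        samples.append((clock(), probe(host, port)))
        if i < count - 1:
            sleep(interval)
    return find_dropoffs(samples)
```

Something like `monitor("mediasmart", 445, interval=30.0, count=2880)` (the hostname is hypothetical) would cover about 24 hours and return a list of drop-off intervals, which is exactly the evidence that never made it into the Application or System logs on my unit.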