Posted by: Ed Tittel
incremental troubleshooting puts new production machine on solid footing, not smart to overclock production PCs
File this one under the heading of “Another Windows war story.” It dwells on strange shenanigans, and lessons learned, in switching over from an old, familiar, and reasonably stable desktop to a newer and snappier, almost unknown, and possibly stable replacement desktop. My biggest reason for making the switch comes from increasing use of virtualization, where 4 GB of RAM just doesn’t cut it any more. And then, too, there’s always the chance to get a bigger, faster, more powerful machine any time you make such a move.
Right now, I’m almost through migrating from my three-year-old production PC (Gigabyte X38-DQ6 mobo, QX-9650 quad-core CPU, Windows 7 Ultimate x86, Intel 80 GB SSD, and 4 GB RAM) to my year-old former test machine (Asus P6X58D-E mobo, i7 930 CPU, Windows 7 Professional x64, OCZ Vertex 3 SSD, and 24 GB RAM). I had overclocked the test machine to see how fast I could push the i7 930 Bloomfield processor it contains. The chip is rated at 2.8 GHz, and I got it to 3.8 GHz with what I thought was a reasonable degree of stability. I also pushed the 667 MHz memory to 800 MHz without any signs of instability.
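For a rough sense of how aggressive those settings were, here is a minimal back-of-the-envelope sketch in Python. It uses only the clock figures quoted above; the helper name `overclock_percent` is purely illustrative and not part of any tool mentioned in this post.

```python
def overclock_percent(stock: float, overclocked: float) -> float:
    """Return the percentage increase of an overclocked speed over stock."""
    return (overclocked - stock) / stock * 100.0


# Figures from the post: CPU in GHz, memory in MHz.
cpu_gain = overclock_percent(2.8, 3.8)
mem_gain = overclock_percent(667, 800)

print(f"CPU:    {cpu_gain:.1f}% over stock")
print(f"Memory: {mem_gain:.1f}% over stock")
```

By this arithmetic the CPU was running roughly 36% above its rated speed and the memory about 20% above, which is a lot of headroom to ask of a machine that is about to take on a production workload.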
But alas, those conditions persisted only until I switched the machine from test to production duty, and really started hammering away at it. And of course, I started hanging the typical plethora of peripherals most production machines tend to acquire (and with which very few test systems must ever contend): two 27″ monitors, a laser printer, USB keyboard and mouse, USB media card reader, 2 USB external drives (1 USB2, the other USB3), 2 eSATA external drives along with two more internal 1 TB+ conventional hard disks, and a high-end Axiom audio output rig to my speakers.
I also doubled up the memory in the unit. This mobo uses triple-channel memory, so I’d inserted 3×4 GB DIMMs for 12 GB of total RAM for testing. Another trio of the same memory modules (G.Skill F3-12800C19-4GBRL units that run 9-9-9-25-34 at 667 MHz) brought the total RAM configuration up to 24 GB, now running quite nicely on my new production desktop. Here’s a snap of CPU Monitor showing the new clocking and memory size:
On Sunday morning, I sat down at the machine to search out and install Samsung’s own latest driver for its ML-2851ND laser printer. Once I’d finished that task and tested how well the printer was working, the machine started shutting down on me. Because the Devices and Printers widget in Control Panel appeared to have returned to normal operation, I didn’t think the problem was driver-related. My suspicion that the print driver wasn’t the culprit was confirmed when (a) I succeeded in printing test and other pages without difficulty and (b) the machine continued to shut down and crash intermittently over the next two hours as I got into troubleshooting mode.
Having seen weird behaviors in the past on Gigabyte motherboards (in the ICH3 to ICH7 era) when all memory slots were populated, I first tried removing half the RAM to see if the system would stabilize. No joy. Next, I jumped into the BIOS, turned off the overclocking for both the CPU and the memory channel, and presto! Everything settled down to its usual rock-solid behavior, so I made a disk image. After installing a bunch of useful but not mission-critical utilities to give the system a workout, I was satisfied that stability was restored. And in the 20 hours or so since I re-inserted the three new RAM sticks, the machine has continued to run without any serious hiccups (other than a disconnected wireless mouse transceiver that fooled me into a forced shutdown), as shown in my current Reliability Monitor graph:
Before I started migrating on 5/28, the test machine showed nothing but solid “perfect 10” performance. Once I started installing new devices and drivers on 5/28 (the first big dip in the curve), I shook things up with a Windows hang and a couple of major issues with my Dell AIO968 drivers (that printer is now happily attached to my wife’s PC upstairs, where we now use it only for printing color output). Configuring various applications (Outlook, mostly) got me dinged once, and realizing that the ML-2851ND driver I downloaded from DriverAgent was hosing my machine cost me a couple more hickeys as well.
Yesterday, I got dinged when my attempt to remove the old ML-2851ND driver caused a system crash, and then again when the system started shutting down spontaneously immediately thereafter. I still have issues with the video driver for my GeForce GTX 460 shutting down right after system startup, but the PC recovers quickly and without discernible side effects, so I’m OK with waiting to identify and install a more stable driver for that graphics card.
Otherwise, returning to safe clock settings for CPU and RAM seems to have brought things to a quiet, steady level, just the way I like them. And now, the new production machine is starting to feel like a real production machine, indeed.