A few weeks ago we had a major crash that was caused by a bad install. I wiped the machine and reinstalled the OS (Windows Server 2003). At first everything appeared to be fine.
We are using:
2 x quad-core 2.33 GHz Xeon E5345
Intel S5000XVN motherboard
ATI Radeon X300 video card
The machine is used primarily as a workstation for research data extraction and analysis using SQL Server 2005.
The problem is that a query would run for close to 10 minutes and then the system would hang with a complete freeze-up. An NMI was not thrown, and no errors were thrown at all. I believe this points to hardware.
I first flashed the BIOS and updated the firmware. I then tested the RAM using Memtest86+, and the RAM tested out fine. I swapped video cards, which seemed to fix the problem, as I was no longer getting any freezes while running queries.
Yesterday the system started hanging again, so I started checking the OS. The event log had some errors regarding WMIxWDM. I checked a number of forums, which recommended flashing the BIOS and checking for driver conflicts. I proceeded to remove any unneeded drivers that might cause conflicts and updated all the drivers I could. The errors in the Event Log disappeared; the hangs, however, have not, and now the system hangs at the 20-minute mark.
So I am essentially at a loss. The RAM tested out, so I cannot see it being that. The power supply has not given me any indication that it is failing - no odors, and all voltages look good (HWiNFO32). The temperature of each of the cores is stable between 36C and 46C (96.8F to 114.8F). The RAM, I think, is a bit hot - 93.5C to 108.5C (200.3F to 227.3F). The main board is around 40C (104F). Other than that, the hard drives are running on a hardware SAS RAID and seem to have no issues.
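The Fahrenheit values are simply converted from the Celsius readings using F = C x 9/5 + 32. In case anyone wants to double-check the conversions, here is a minimal Python sketch. The function name c_to_f and the labels are just ones I made up for illustration; the Celsius numbers are the readings quoted above.

```python
def c_to_f(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Celsius readings quoted in this post (labels are illustrative only)
readings_c = {
    "CPU cores, low": 36.0,
    "CPU cores, high": 46.0,
    "RAM, low": 93.5,
    "RAM, high": 108.5,
    "Main board (approx.)": 40.0,
}

# Print each reading in both units, rounded to one decimal place
for label, celsius in readings_c.items():
    print(f"{label}: {celsius:.1f}C = {c_to_f(celsius):.1f}F")
```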
Any ideas would be of great help. I am running out of things to try - other than telling the bosses they need a new machine.