By your calculations, the average hard drive should last over 41 years (500/12) and a DIMM module would last over 83 years (1000/12). Considering that the average life of a server is around 4 or 5 years, I don't see either of the numbers you supplied as "failure prone." In my experience, and I believe many others will agree, the first parts to fail are mechanical, i.e. hard drives and fans. A major cause of component failure, other than mechanical wear, is generally heat (oftentimes because of a failed fan...).

This is just a theory, but processors are the focus of the cooling systems in a computer, as is the power supply. Often there is a heat sink on the north bridge chip and on video cards as well. These cooling systems keep components at a relatively regulated temperature while operating. RAM, on the other hand, has no cooling system. RAM generally does not run that hot, but repeated hot/cold cycles can cause material fatigue in just about anything. Still, if your average DIMM will last 83 years according to the numbers you have supplied, that is probably why companies are not too concerned with putting another noise-making fan in the box.

I have found that most problems with RAM are caused by poor handling, or by heat due to dust buildup. If you keep them cool, keep them clean, and supply them with clean power, your server hardware failure rate will be extremely low.
---------------------
I'll have to agree with Flame here. I rarely see RAM, or any other electrical component, fail. However, I have seen hard drives fail on a regular basis. I've probably worked with a couple of thousand servers over the years, and other than RAM that was shipped bad from the vendor, I can probably count on one hand the sticks of RAM that have failed after being in use for more than 30 days.
Flame, when I say that the current disks are failing every “500 months of server use”, that is not the same as saying each disk fails every five hundred months. A facility with 10,000 servers experiences 10,000 server-months of use each month, so on average 20 disk drives will need to be replaced each month if they are IBM-quality disk drives. In four years that facility will experience 480,000 server-months of use and, again with IBM-quality disk drives, 960 disk replacements. But if those servers are from HP, those 480,000 server-months would produce 3,117 disk replacements, because HP drives fail once every 154 server-months.
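The fleet arithmetic above boils down to one division: total server-months of use divided by the server-months per replacement. Here is a minimal sketch, assuming a constant replacement rate over the whole period; the function name `expected_replacements` is hypothetical.

```python
def expected_replacements(servers: int, months: int, months_per_replacement: float) -> float:
    """Expected part replacements for a fleet.

    Assumes (hypothetically) a constant rate: one replacement per
    `months_per_replacement` server-months of use.
    """
    server_months = servers * months  # total usage across the whole fleet
    return server_months / months_per_replacement


# 10,000 servers for one month, IBM disks (one replacement per 500 server-months)
print(expected_replacements(10_000, 1, 500))
# Same fleet over four years (48 months)
print(expected_replacements(10_000, 48, 500))
# Same fleet and period with HP disks (one replacement per 154 server-months)
print(round(expected_replacements(10_000, 48, 154)))
```

The three calls reproduce the 20, 960 and 3,117 figures from the post.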
You might be interested in the following replacement rates, in “server use months” per replacement (higher numbers mean fewer failures):
Component         IBM     HP   Dell    Sun  Fujitsu
System board    1,291    742    895    727    1,427
Memory            407    255  1,044    163      150
Power supply    3,874  1,293  1,044    283    2,855
Disk              500    154    261    232      317
CPU             2,113 10,024  6,267  1,490    2,855
HBA             2,213  8,019  2,089  2,981      571
The best composite server (Fujitsu system board, Dell memory, IBM power supply, IBM disk drive, HP CPU, HP HBA) would have 61% fewer failures than the best existing server.
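One way to check a composite figure like this is to treat each server as a set of independent parts whose replacement rates add: a part replaced once every N server-months contributes 1/N replacements per server-month. The sketch below assumes exactly that (a simplifying assumption, not a method stated in the thread), and the names `failure_rate` and `best_composite` are hypothetical. With the table's numbers it picks the same best-of-breed parts listed above and ranks IBM as the best single vendor. Under this additive model the composite's rate comes out roughly 35% lower than IBM's, so the 61% figure presumably rests on a different aggregation.

```python
# Server-months per replacement, copied from the table above.
TABLE: dict[str, dict[str, float]] = {
    "IBM":     {"system_board": 1291, "memory": 407,  "power_supply": 3874, "disk": 500, "cpu": 2113,  "hba": 2213},
    "HP":      {"system_board": 742,  "memory": 255,  "power_supply": 1293, "disk": 154, "cpu": 10024, "hba": 8019},
    "Dell":    {"system_board": 895,  "memory": 1044, "power_supply": 1044, "disk": 261, "cpu": 6267,  "hba": 2089},
    "Sun":     {"system_board": 727,  "memory": 163,  "power_supply": 283,  "disk": 232, "cpu": 1490,  "hba": 2981},
    "Fujitsu": {"system_board": 1427, "memory": 150,  "power_supply": 2855, "disk": 317, "cpu": 2855,  "hba": 571},
}


def failure_rate(parts: dict[str, float]) -> float:
    """Replacements per server-month, assuming independent parts whose rates add."""
    return sum(1.0 / months for months in parts.values())


def best_composite(table: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each component, take the largest server-months-per-replacement across vendors."""
    components = next(iter(table.values())).keys()
    return {c: max(parts[c] for parts in table.values()) for c in components}


rates = {vendor: failure_rate(parts) for vendor, parts in TABLE.items()}
best_vendor = min(rates, key=rates.get)  # lowest combined replacement rate
composite = failure_rate(best_composite(TABLE))
reduction = 1 - composite / rates[best_vendor]

for vendor, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{vendor:8s} one replacement per {1 / rate:6.0f} server-months")
print(f"composite one replacement per {1 / composite:.0f} server-months")
print(f"reduction vs {best_vendor}: {reduction:.0%}")
```

Swapping `failure_rate` for another aggregation (for example, weighting parts by repair cost) only requires changing that one function.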
If users measured the “maintenance rates” of servers and the “replacement rates” of the six major components that account for 90+% of all failures, the vendors would immediately respond by making much more reliable servers and components. And the cost of maintenance, which is higher than either cooling or power costs, would decrease substantially.
Jim4522
Thank you for the clarification; it makes more sense to me now. I was thinking about this on the way to work: as integrated circuits in general become more powerful, they are also becoming more fragile. Perhaps RAM chips are starting to demonstrate the physical limitations of their manufacturing process. I’d imagine that CPUs and RAM are both made on densely populated dies with extremely narrow connections. CPUs at least get a fan. Your question has caused me to look into the chip manufacturing industry, and I’m learning a lot!
Thanks!
-Flame