Here’s an interesting issue that I came across. I have a few clients who are still using IBM servers; the one referred to here is an x226 server with a ServeRAID 7 controller.
The client said that when he came in to work in the morning to check the backups, he had to enter a reason for why the server had shut down. He mentioned that the backups had also not been working for the last week or so. Just to let you know, this wasn’t my fault: the client is not on managed services, so it is up to him to monitor the server and let me know if there are issues. Anyways, back to the blog :). We came in to troubleshoot the backup and found that we could run a test backup. Just in case, a backup to disk was also set up. The next morning it was reported that the backup had failed again.
I went onsite and used IBM UpdateXpress. This CD provides an all-in-one firmware update for all supported components in the server. You can download it here. I updated the server’s BIOS, the ServeRAID BIOS/firmware, and the firmware of the drives themselves.
I also ran an app from IBM’s site called Dumplog, which “dumps” the configuration and event logs from the ServeRAID controller. Don’t try to decipher any of the info in the txt file; you need to send it to IBM, and they will tell you what your next step is based on the info in that file. Download it here.
Well, to make a long story short… the server would crash when stressed with I/O. I figured it had to be the controller, so I ran the onboard diagnostics, and sure enough the ServeRAID test failed. I exported the test log to a text file so I could send it to an IBM tech. Once the IBM tech saw the Dumplog files, he was able to tell me that a specific drive was failing, but the drive was not reporting the failure to the controller properly, so the global hot-spare wasn’t kicking in. I ended up running ServeRAID Manager and marking the bad drive defunct, then I pulled the drive out of the server. The global hot-spare then kicked in and the rebuild started.
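For anyone who wants to put a disk subsystem under I/O load on purpose rather than waiting for the nightly backup to do it, a rough sketch in Python might look like the following. This is not what was used on this server; the function name `stress_io` and its parameters are made up for illustration. It writes random data to a scratch file, forces it to disk, reads it back, and compares checksums. Keep in mind that the read-back may be served from the operating system’s cache, so a passing result does not prove the disks are healthy.

```python
import hashlib
import os
import tempfile


def stress_io(directory, total_mb=64, chunk_mb=4):
    """Write random data to a scratch file, read it back, compare checksums.

    Hypothetical helper for illustration only. Returns True when the data
    read back matches what was written.
    """
    chunk_size = chunk_mb * 1024 * 1024
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".iotest")
    try:
        # Write phase: generate random chunks and hash them as they go out.
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                chunk = os.urandom(chunk_size)
                written.update(chunk)
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk

        # Read phase: hash the file contents chunk by chunk.
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                read_back.update(chunk)

        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file


if __name__ == "__main__":
    ok = stress_io(tempfile.gettempdir(), total_mb=64)
    print("I/O pass OK" if ok else "checksum mismatch")
```

Point `directory` at the volume you want to exercise and raise `total_mb` to generate more load. On a box that is already crashing under I/O, only run something like this when you are prepared for it to go down.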
All seems well. It would have been nice if the drive had just marked itself bad at the beginning; the issue would have been resolved much faster.
IBM Tech Support requires firmware and drivers to be up to date before they will really help you, so everything I did needed to be done anyway. IBM is now sending a tech onsite to replace the drive and also the tape drive, as it still didn’t work in the end. A backup-to-disk job was configured before I went off site.
Till next time!