Two approaches to maintaining servers

Tags:
Backup
Server maintenance
Virtualization
When a server suffers a hardware failure, there are two ways to respond. The traditional way is to pay for onsite maintenance, which can be very expensive; the other is to replace the failing server with a backup device. In a large server site, backup seems the more economical approach, but operations often opposes it because formatting and setting up a new server seems to create problems. Is it easier to use a backup server if the shop is using virtualization? Jim4522

Answer Wiki


Hey Jim,

For all the servers in my environment I have redundant SCSI cards and duplexed drives. If a controller or a hard drive fails, the server switches over automatically. We use all the same or similar hardware. If a motherboard dies, we have a spare machine waiting for us to pop our cards and drives into, and we are back up shortly.

If you are using virtualization, you can keep backups of the machines on standby and flip the switch when necessary. But personally I like having options, and different setups suit different environments. If you update this with a little more information on your setup, I can give you a better answer.

Discuss This Question: 3  Replies

 
  • Jim4522
    Karl, I am doing an analysis of an IT organization that has about 10,000 servers from five different vendors, totalling 115 different models. It pays about $2.5 million per year to one of its vendors to maintain all of its servers in five different locations. The organization wants maximum protection, so it pays for 24/7 onsite maintenance response, but when I look at what the vendor does for the money paid, it doesn't seem to match the problem. The vendor does about a thousand installs and uninstalls and about 3,500 service actions per year. But 55% of those service calls are to perform functions any user could perform, such as rebooting a server or powering it on or off. The remaining 45% mainly involve replacing disk drives, power supplies, DIMM cards and system boards, which don't seem that complicated. So one of the alternatives I am thinking of is to suggest that the user consider doing his own maintenance and just pay the vendor a set fee for every install and uninstall. The other problem I am looking at is the very real possibility that the vendor is over-maintaining this customer. I say that because I am currently monitoring two different organizations with similar hardware but with different vendors doing the maintenance. Over a year's time, one user averages a maintenance action about once every 500 server-months; that is, in a room of 2,000 servers he averages 4 maintenance actions per month. The other user averages one maintenance action every 50 server-months; that is, in a room of 2,000 servers he averages 40 maintenance actions per month. I would think that if the user in the first room were being under-maintained, the effect on server performance would be noticeable, but over-maintenance is not noticeable unless you are in a position to compare the two. Jim4522
  • Pressler2904
    Karl / Jim - Having a (physical) backup server in place is a great idea, especially in an environment consisting of multiple identical servers: if one fails, just swap components and you are up and running again in a short while (as Karl points out). The nice thing about virtualization, especially if you have many server "heads" and networked storage (SAN), is that the server(s) can be virtualized and "hot swapped," so to speak, very quickly, much like the physical disks in a SAN enclosure can. All of this you already know.... The key here is your "maximum protection" statement: you state a factor of 10 difference in incidents per server-month between the two organizations. Are they both involved in the same or substantially similar businesses? I've been involved in organizations where a degraded RAID array (for example) would be addressed (rebuilt) within a few minutes of discovering the issue, and at places where a degraded array is allowed to run indefinitely, provided the server is up and running at all... Is there a legal or contractual requirement for one company to ensure "five nines" uptime, thus mandating that ANY service incident be addressed immediately, no matter how minor (think certain DoD requirements)? Has one of the organizations been "burned" in the past by deferred, incomplete or shoddy maintenance practices? Is one company deferring maintenance on minor issues until there is a major / catastrophic system(s) failure? You state that one of the companies is a distributed environment (five locations): are the server maintenance incidents distributed (relatively) evenly across all five locations? Perhaps there is no qualified person on site to perform the maintenance required, hence dictating the use of the external vendor for minor issues, such as power cycling a server as in your additional information. If there is no IT dept on site, travel expenses could also become an issue...
  • Jim4522
    Karl, both of these organizations are major financial organizations. Both are reasonably well-managed. Yet one averages a maintenance action every 50 server-months of use while the other averages approximately a maintenance action every 500 server-months of use. The mix of servers among hardware vendors is about the same, and each has turned over maintenance control to one of its server vendors. I have looked for a logical explanation for the difference and have found none. Both maintenance vendors charge a fixed fee per server per year, so it is not in the interest of either vendor to do more maintenance than is necessary. I first considered three possibilities. One, the IT organization that is experiencing a maintenance action every 500 server-months is being under-maintained. Two, the IT organization that is experiencing a maintenance action every 50 server-months is being over-maintained. Three, the 50 server-month organization has a serious local environmental problem. It is not the latter, because you would have to assume that whatever the environmental problem was, it was affecting all five server sites, since all sites experience about the same maintenance rate. It is unlikely that the second reason is true, because the vendor has no incentive for doing ten times more maintenance than is necessary. The first reason is unlikely because if a vendor continually under-maintains a site, the user would notice that its problems are not being solved. So what is the answer? I hate to say it, but the data I am looking at would indicate that there is a combination of explanations. First, the IT organization owning the servers is not paying attention to how much maintenance is being done because it doesn't impact their cost. Second, the management of the vendor performing the maintenance is not paying attention because the profits on the agreement are more than adequate. And thirdly, someone working for the vendor is stealing parts and selling them on the open market (eBay, Craigslist or others) and covering his tracks by documenting maintenance actions that are never really performed. Who would know, unless you were in a position, as I am, to compare two different IT organizations and understand that their numbers are way out of whack? More later. Jim4522
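
For anyone who wants to check the arithmetic in Jim's replies, here is a minimal sketch in Python. It only restates the figures quoted in the thread (2,000 servers per room, one action per 500 versus 50 server-months, 3,500 service actions per year with a 55% share of trivial calls, $2.5 million per year across about 10,000 servers). The function names are hypothetical and made up for illustration, and treating the whole contract fee as a flat per-server cost is an assumption, not something the thread states.

```python
from typing import Tuple


def actions_per_month(servers: int, server_months_per_action: float) -> float:
    """Expected maintenance actions per month for a room of `servers` machines.

    Each server contributes one server-month per calendar month, so the room
    accumulates `servers` server-months every month.
    """
    return servers / server_months_per_action


def split_service_calls(total_calls: int, trivial_share: float) -> Tuple[int, int]:
    """Split yearly service calls into (trivial, parts_replacement) counts."""
    trivial = round(total_calls * trivial_share)
    return trivial, total_calls - trivial


def cost_per_server(yearly_fee: float, servers: int) -> float:
    """Flat yearly maintenance cost per server (assumes the fee covers all servers equally)."""
    return yearly_fee / servers


if __name__ == "__main__":
    # Figures as quoted in the thread.
    low = actions_per_month(2000, 500)
    high = actions_per_month(2000, 50)
    print(f"500 server-months per action: {low:.0f} actions/month")
    print(f"50 server-months per action:  {high:.0f} actions/month")
    print(f"ratio between the two rooms:  {high / low:.0f}x")

    trivial, parts = split_service_calls(3500, 0.55)
    print(f"reboot/power calls per year:  {trivial}")
    print(f"parts replacement calls:      {parts}")

    print(f"fee per server per year:      ${cost_per_server(2_500_000, 10_000):.2f}")
```

The server-month figures work out to 4 and 40 actions per month, which matches the factor of 10 Pressler2904 refers to.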
