How could automated failover ever be justified?

Start with the fact that even the least reliable server has only a 2% chance of failing during its reasonable lifetime of 4 years. If I understand the argument for MS's failover facility, I can reduce that 2% probability by an unknown percentage if I am willing to keep a standby backup server. But the backup server is going to cost me as much as the primary server: the same hardware, software, cooling, power, footprint, maintenance, and administration costs. Apparently this total cost can be justified by the importance of what the primary server is doing. But we have no real way of computing the dollar cost of that 2% chance that the primary will ever fail, which also means there is a 98% chance the backup will never be used.

More importantly, the backup server can't protect the primary if the power in the data center fails, if the wrong cable is pulled by accident, if the primary is the victim of a denial-of-service attack, or if operations does something really dumb. Nor can it help if MS decides that the software controlling the failover system needs an update or patch that requires a reboot: that reboot would hit the primary and the backup at the same time, because the whole idea of the backup is that it is identical to the primary. The same goes for any other software on the primary or backup that needs an update requiring a reboot, or for some tech in IT who hates his job or his boss and stops both systems to prove how much IT will miss him after he quits.

Jim4522

Answer Wiki


Typically, when you set up an Active/Passive cluster, the only additional software license you need for the passive node is the Windows OS. For example, if the clustered application is SQL Server, you only need to license one copy of SQL Server.

When patching is done correctly, both nodes of the cluster should never be down at the same time: you patch one node, and you don't patch the next node until the first one is back online.
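The one-node-at-a-time rule can be sketched as a small Python simulation. `Node`, `patch_and_reboot`, and `rolling_patch` are hypothetical names, not part of any Windows or cluster API; a real script would call your actual patching tool and poll the node until it rejoins the cluster. The returned log records how many nodes were still serving while each node rebooted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool = True
    patched: bool = False


def online_count(nodes):
    # Number of nodes currently able to carry the workload.
    return sum(n.online for n in nodes)


def patch_and_reboot(node, nodes, log):
    """Hypothetical stand-in for installing updates and rebooting one node."""
    node.online = False
    log.append(online_count(nodes))  # nodes still serving during this reboot
    node.patched = True
    node.online = True  # simulated: the reboot succeeds immediately


def rolling_patch(nodes):
    log = []
    for node in nodes:
        # Only take this node down if every other node is up to take the load.
        if online_count(nodes) < len(nodes):
            raise RuntimeError(f"a node is already down; not patching {node.name}")
        patch_and_reboot(node, nodes, log)
        # Confirm the node is back before moving on (a real script would poll here).
        if not node.online:
            raise RuntimeError(f"{node.name} did not come back online; stopping")
    return log


nodes = [Node("node-a"), Node("node-b")]
log = rolling_patch(nodes)
print(log)  # [1, 1]
```

The log never drops to 0, which is the point: a patch that needs a reboot takes down one node at a time rather than the primary and the standby together.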

An additional point to factor in is the loss of productivity while everyone works to restore one server to working condition, versus having the other server take over and losing little or no productivity while the first server is brought back into service. That is usually what decides whether it is worth it.
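One way to put a dollar figure on "is it worth it" is an expected-value break-even. This is a simplified sketch under stated assumptions: all dollar and hour figures below are hypothetical, at most one failure occurs over the server's life, and `p_fail` should count only failures a standby can actually cover (not the power, cabling, or operator errors that hit both machines). The function names are illustrative.

```python
def break_even_cost_per_hour(standby_cost, p_fail, restore_hours, failover_hours):
    """Hourly outage cost at which a standby server pays for itself.

    standby_cost:   total cost of the standby over the server's life, in dollars
                    (hardware, licenses, power, cooling, admin)
    p_fail:         probability the primary fails in a way the standby covers
                    during that life (e.g. 0.02 for 2% over 4 years)
    restore_hours:  outage length if you must repair or rebuild the primary
    failover_hours: outage length if the standby takes over instead
    """
    hours_saved = restore_hours - failover_hours
    return standby_cost / (p_fail * hours_saved)


def standby_pays_off(standby_cost, p_fail, restore_hours, failover_hours, cost_per_hour):
    # Expected lost productivity avoided over the server's life.
    expected_savings = p_fail * (restore_hours - failover_hours) * cost_per_hour
    return expected_savings > standby_cost


# Hypothetical figures: $20,000 standby, 2% failure chance,
# 24 hours to restore the primary vs. 15 minutes to fail over.
threshold = break_even_cost_per_hour(20_000, 0.02, 24, 0.25)
print(f"Standby pays off if an hour of downtime costs more than ${threshold:,.0f}")
# prints: Standby pays off if an hour of downtime costs more than $42,105
```

With these made-up numbers, a standby only makes sense for a workload whose downtime costs tens of thousands of dollars per hour; plugging in your own restore time and hourly cost is what turns the 2% into a decision.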

Discuss This Question: 1 Reply

  • Jim4522
    Technochic, I understand the value of getting the application up quickly instead of waiting for the failed server to be fixed, but doesn't that happen whether I fail over to an inactive standby server or to an active server in a cluster? It seems to me that a standby backup server accomplishes nothing unless the primary fails, and my primary has only a 2% chance of failing in its 4-year service life. If I use this type of backup for a large number of servers, I am paying for a lot of standby servers that may never do anything in 4 years. Jim4522
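The fleet-wide version of this concern is simple arithmetic. A minimal sketch, assuming each server independently has the same 2% chance of failing over 4 years (independence is an assumption; the shared causes listed in the original question, such as power loss, would make failures correlated). The fleet size of 50 is hypothetical.

```python
def fleet_failure_stats(n_servers, p_fail=0.02):
    """Over one service life, assuming independent failures with probability p_fail."""
    expected_failures = n_servers * p_fail
    # Chance that at least one primary in the fleet fails.
    p_at_least_one = 1 - (1 - p_fail) ** n_servers
    # Dedicated standbys expected never to be used.
    expected_idle_standbys = n_servers * (1 - p_fail)
    return expected_failures, p_at_least_one, expected_idle_standbys


failures, p_any, idle = fleet_failure_stats(50)
print(f"expected failures: {failures:.1f}")      # expected failures: 1.0
print(f"P(at least one fails): {p_any:.1%}")     # P(at least one fails): 63.6%
print(f"expected idle standbys: {idle:.1f}")     # expected idle standbys: 49.0
```

Under these assumptions, 50 one-to-one standbys would be expected to see about one activation in total over 4 years, even though the chance that some server in the fleet fails is better than even.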