A random thought about failovers and reboots

Tags:
Failover
Reboot
I had the opportunity to examine tens of thousands of problems in multiple server farms and was struck by how many were resolved by rebooting. I then examined what happened to those servers later and found that many repeatedly experienced the same failure until someone finally took the problem seriously and found a real solution.

Based on this experience, I began to wonder whether reboots encourage managers to kick the problem down the road rather than resolve it.

I now wonder if failovers are going to make what I observed even worse. At least a reboot took time, so a user might be tempted to make sure that whatever the problem was didn’t repeat itself. But to me a failover is a reboot without the time factor, so even more problems get kicked down the road.

I know that failover looks different because you are moving from one device to another, but since most reboots seemed to solve the problem on the same device, why wouldn’t moving to a new device also seem to solve it? My own feeling is that the inability to diagnose failures on servers has led to a solution of "when in doubt…reboot". And now the new idea is "when in doubt…failover". Any thoughts or comments? Jim4522

Answer Wiki


<b><i>“…a failover is a reboot without the time factor, so even more problems get kicked down the road.”</i></b>

Since downtime usually means money, it makes a lot of sense to replace reboots with failover and reduce downtime to close to zero.

The cost would be too high if every incident had to be investigated in real time before bringing the system back into service, so when time is crucial it makes sense to find a way to reduce downtime, and that way often includes a reboot. However, the incident should still be investigated afterward until the root cause is found and corrected, <b>but this depends on people</b>, and unfortunately it is often not done.

And I agree, failover <b>could</b> make this worse if people don’t do their jobs responsibly.

-CarlosDL

————

Discuss This Question: 1  Reply

 
  • Jim4522
    CarlosDL, it is interesting that you mention downtime. I traditionally think of downtime as applying to hardware: the time from the failure until the hardware is functioning again. But in your answer you are referring to the downtime of the application, not the hardware. And that makes sense, because when someone refers to the high cost of downtime they are not referring to the hardware being down, which is only a problem for IT, but to how long the application is down, since that is where the potential for real expense lies. Jim4522