What kind of failure event would cause a primary server to fail over to a standby backup?

820 pts.
Tags:
Failover
Failover Clustering
I am aware that when a primary server is backed up by a standby server, the standby server is constantly testing the pulse beat of the primary. What kind of problem will cause a failover to take place? Is it only when the primary stops functioning, does it have to stop functioning for a period of time, or can the testing of the pulse beat sense intermittent failures? Jim4522

Answer Wiki


A SQL Server cluster provides both automatic and manual failover
capability for SQL Server services to another node in case the active node is
down. The active node can be down due to an operating system or a hardware
failure, in which case the automatic failover to an available node on the same
cluster can happen and the users will start using the application through the
failover node. SQL Server services can also be moved manually to a different
node when planned maintenance, such as an operating system upgrade or
patching, is required on the active node.
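
On the question of whether a single missed pulse beat is enough: failure detectors of this kind commonly wait for several consecutive missed heartbeats before acting, so that a brief glitch does not trigger a failover. The sketch below is only an illustration of that idea. The class name, the one-second interval, and the threshold of five misses are assumptions for the example, not the settings or API of any particular cluster product.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Toy failure detector run by the standby node (hypothetical, not a cluster API)."""
    interval_s: float = 1.0      # assumed time between expected heartbeats
    missed_threshold: int = 5    # assumed consecutive misses before failover
    missed: int = 0

    def observe(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval; return True if failover should start."""
        if heartbeat_received:
            self.missed = 0      # any successful heartbeat resets the count
        else:
            self.missed += 1
        return self.missed >= self.missed_threshold


def first_failover_tick(monitor, heartbeats):
    """Return the index of the interval at which failover triggers, or None."""
    for tick, ok in enumerate(heartbeats):
        if monitor.observe(ok):
            return tick
    return None


# Intermittent glitches: never five misses in a row, so no failover.
flaky = [True, False, False, True, False, True, True, False, False, False, True]
print(first_failover_tick(HeartbeatMonitor(), flaky))   # None

# Sustained outage: the fifth consecutive miss triggers failover.
outage = [True, True] + [False] * 6
print(first_failover_tick(HeartbeatMonitor(), outage))  # 6
```

With a detector like this, intermittent failures shorter than the threshold go unnoticed by the failover logic; lowering the threshold makes failover faster but also more likely to fire on a transient problem.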

When you decide to cluster SQL Server, you can configure either an Active/Active or an Active/Passive cluster:

*-An Active/Active SQL Server cluster means that SQL Server is running on both nodes of a two-way cluster. Each copy of SQL Server acts independently, and users see two different SQL Servers. If one of the SQL Servers in the cluster should fail, the failed instance of SQL Server will fail over to the remaining server. Both instances of SQL Server will then be running on one physical server instead of two.

As you can imagine, if two instances have to run on one physical server, performance can be affected, especially if the servers have not been sized appropriately.
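
As a rough way to think about that sizing, the sketch below checks whether the combined load of all instances would still fit on a single node after a failover. The function name, the percentage figures, and the 20% headroom are assumptions made up for the illustration; real sizing would look at CPU, memory, and I/O separately.

```python
def fits_on_one_node(instance_loads_pct, headroom_pct=20):
    """Return True if all instances can share one node and still leave headroom.

    instance_loads_pct: typical load of each instance, as a percentage of one node.
    headroom_pct: assumed spare capacity to keep free on the surviving node.
    """
    usable_pct = 100 - headroom_pct
    return sum(instance_loads_pct) <= usable_pct


print(fits_on_one_node([35, 35]))  # True: 70 <= 80
print(fits_on_one_node([55, 40]))  # False: 95 > 80
```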

*-An Active/Passive SQL Server cluster refers to a SQL Server cluster where only one instance of SQL Server is running on one of the physical servers in the cluster, and the other physical server does nothing other than wait to take over should the primary node fail.

From a performance perspective, this is the better solution. On the other hand, this option makes less productive use of your physical hardware, which means this solution is more expensive.

Personally, I prefer an Active/Passive configuration as it is easier to set up and administer, and overall it will provide better performance. Assuming you have the budget, this is what I recommend.

*-Two- or Four-Node Clustering?

SQL Server can be clustered using two nodes (using Windows Advanced Server), or it can be clustered using more than two nodes (using Windows Datacenter). Since I don’t personally have any experience in three- or four-node clustering, I won’t be discussing it here. But for the most part, what I say about two-node clustering also applies to three- or four-node clustering.

*-What is Log Shipping?

Essentially, log shipping is the process of automating the backup of database and transaction log files on a production SQL Server, and then restoring them onto a standby server. But this is not all. The key feature of log shipping is that it will automatically back up transaction logs throughout the day (at whatever interval you specify) and automatically restore them on the standby server. This in effect keeps the two SQL Servers in “synch”. Should the production server fail, all you have to do is point the users to the new server, and you are all set. Well, it’s not really that easy, but it comes close if you put enough effort into your log shipping setup.
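
Because the standby only has what was in the last shipped log backup, the backup interval bounds how much work can be lost when the production server fails. The sketch below models that with made-up commit times. It assumes a log backup is taken and restored instantly at every multiple of the interval and that the unshipped part of the log cannot be recovered from the failed server; real setups also have copy and restore delays, which widen the window.

```python
def lost_transactions(commit_times, backup_interval, failure_time):
    """Return the commits that never reached the standby.

    All times are in minutes. Assumes (for illustration only) that a log backup
    is shipped and restored instantly at every multiple of backup_interval.
    """
    last_backup = (failure_time // backup_interval) * backup_interval
    return [t for t in commit_times if last_backup < t <= failure_time]


commits = [1, 4, 9, 14, 16, 22, 27, 29]    # hypothetical commit times
print(lost_transactions(commits, 15, 29))  # [16, 22, 27, 29]
print(lost_transactions(commits, 5, 29))   # [27, 29]
```

Shortening the interval shrinks the exposure but never brings it to zero in this model: anything committed after the last shipped backup is missing on the standby.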

Discuss This Question: 4  Replies

 
  • Technochic
    Also, when you set the cluster up with its shared resources, you can decide which resource failures will cause a failover. In the properties of the resource, on the Advanced tab, there is an option to specify that a failure of the resource will not affect the group and therefore will not cause a failover. We set it this way for our backup software, so if the backup service fails the servers do not fail over. So you do have some control over what will cause a failover.
    57,010 points
  • Jim4522
    ITArts and Technochic, thank you both for your responses to my question. Regarding the following in ITArts' answer, "The key feature of log shipping is that it will automatically backup transaction logs throughout the day (for whatever interval you specify) and automatically restore them on the standby server": what is a reasonable interval between transaction log backups throughout the day, and how does the length of that interval affect what is or is not lost as a result of the failure of the original active server? What I am trying to determine is whether failover between active and passive nodes guarantees that nothing is ever lost, or whether it means that almost nothing is ever lost.
    820 points
  • ITArts
    Does failover between active and passive nodes guarantee that nothing is ever lost? Nothing is 100%, but it has worked for me so far.
    160 points
  • Jim4522
    ITArts, thank you for your response. I have two questions regarding your latest response: “Failover between active and passive nodes guarantee that nothing is ever lost? Nothing is 100 %”.
    Question 1. Does that statement apply only to active-passive, or does it also apply to active-active?
    Question 2. Does the “nothing is 100%” at the tail end of your statement mean that you are not sure that nothing is ever lost, or are you saying that your definition of “nothing” is 100%, which means you really mean nothing is ever lost?
    My second set of questions deals with your earlier statement in this thread: “The active node can be down due to an operating system or a hardware failure in which case the automatic failover to an available node on the same cluster can happen and the users will start using the application through the failover node”.
    Question 3. Does this mean that failover only applies to operating system failures and hardware failures, but not to application failures or network failures?
    Question 4. Does it also apply to “firmware” failures?
    Question 5. It is my understanding that whether you are using active-passive or active-active, failing over on the basis of a hardware failure makes sense because you are moving to a different hardware device. But why does it make sense for an operating system failure, since you are presumably moving to the same version of the operating system, although a different copy of the version that just failed? It sounds to me as though you could accomplish the same fix by rebooting the failing node. Is it just a matter of saving the downtime involved in the reboot?
    Question 6. You indicated your failover preference is active-passive as opposed to active-active. In a pool of 10,000 servers, is not active-active a very expensive solution?
    Question 7. If you knew that approximately 50% of all hardware failures on servers were found to be “no trouble found” after running extensive diagnostic tests, would this concern you? I ask this question because it seems to me that the faster IT can move away from failures that are not really failures by using this failover facility, the less IT cares whether the failures really happened or not. That could mean that 50% of all maintenance calls are accomplishing nothing yet are being paid for. Then add the fact that in an active-passive environment you have also increased the maintenance cost by 50%.
    820 points
