Posted by: Denny Cherry
While many of the technologies used for HA and DR are similar (or even the same), HA and DR address two different types of events, which should be handled differently.
Before we can even begin to discuss the differences we need to make sure that we are talking about the same thing.
HA (or High Availability) is the technique of keeping a service up and running in the event of a system failure or hardware failure of a single server. For example, getting the database back up and running after the motherboard fails. This would be resolved by your HA solution, not your DR solution.
DR (or Disaster Recovery) is the technique of keeping your operations running in the event of a complete loss of a data center. For example, if the fibre lines into your CoLo or facility were cut, you would need to implement your DR strategy, because your HA strategy wouldn’t do any good once the facility is off the network.
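To make the distinction concrete, here is a minimal sketch that maps the scope of a failure to the plan that should respond to it. The names FailureScope and responding_plan are made up for illustration; they are not part of any SQL Server tooling.

```python
from enum import Enum


class FailureScope(Enum):
    """Hypothetical failure scopes, matching the two definitions above."""
    SINGLE_SERVER = "single server"  # e.g. a failed motherboard
    WHOLE_SITE = "whole site"        # e.g. the fibre into the facility is cut


def responding_plan(scope: FailureScope) -> str:
    """Return which plan covers the failure: 'HA' or 'DR'."""
    if scope is FailureScope.SINGLE_SERVER:
        return "HA"
    # Anything that takes the whole facility offline is a DR event.
    return "DR"


print(responding_plan(FailureScope.SINGLE_SERVER))  # HA
print(responding_plan(FailureScope.WHOLE_SITE))     # DR
```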
Now, I’m not saying that you can’t combine the two to save some cash and better utilize your DR environment, because in this day and age getting the most for your money is key. But make sure that if you are going to do this, you choose the correct technology and implement it correctly for the correct situation.
Clustering and Mirroring, when combined correctly, are probably the best way to build a top-notch HA / DR setup.
Using Mirroring for your HA and Clustering for your DR probably isn’t the correct approach to take. If you are clustering for DR, then you’ve got a SAN in place at both ends, so you can configure clustering locally as well. If you want to use Clustering for your DR, go for it; just set up a three node cluster and use it for your HA as well.
If your logic for using Mirroring for HA and Clustering for DR is that you want to put your data on two different Storage Arrays at your local site, that’s fine; you can do that while clustering. Cluster the SQL Servers normally and use SAN replication to replicate the data between the Storage Arrays over the fibre channel. This will handle the data copying better, as the fibre channel is going to be much faster than the Ethernet network the servers will be talking over, and it won’t put any additional load on your CPUs. Now if the SAN happens to go offline (it is very, very rare for a SAN to have an unplanned outage), you can simply mount the storage from the other SAN to the cluster and you are back up and running.
After clustering locally like this, you can use SQL Mirroring to get the data to your DR site. Typically Mirroring will have a lower bandwidth cost than the SAN replication which needs to be used for a geographically distributed cluster. This is because the Storage Array will be replicating things at the block level, typically in 64k blocks. So if a single bit is changed on the block, all 64k has to be transmitted. With database mirroring, by contrast, only the transaction log entries are moved. And since this would only be the changed data and the row identifiers (as well as some other housekeeping data), this will usually be smaller than replicating the entire 64k block. And don’t forget the overhead of turning the fibre channel data into IP data via a FCIP switch, which adds even more into the mix.
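The arithmetic is easy to rough out. The sketch below is a back-of-the-envelope estimate only: the 100 bytes of per-record overhead, the 200 byte rows, and the worst case of every changed row sitting on a different block are all assumptions picked for illustration, not measured figures, and FCIP overhead is ignored.

```python
BLOCK_SIZE = 64 * 1024  # 64k replication block, as described above


def block_replication_bytes(touched_blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes sent when every touched block is shipped whole."""
    return touched_blocks * block_size


def log_shipping_bytes(changed_rows: int, avg_row_bytes: int,
                       per_record_overhead: int = 100) -> int:
    """Bytes sent when only the log entries move.

    per_record_overhead is an assumed allowance for row identifiers
    and housekeeping data, not a real SQL Server number.
    """
    return changed_rows * (avg_row_bytes + per_record_overhead)


# Hypothetical workload: 1,000 changed rows of 200 bytes each,
# with each row landing on a different 64k block (worst case).
san_bytes = block_replication_bytes(1000)
mirror_bytes = log_shipping_bytes(1000, 200)
print(san_bytes)     # 65536000
print(mirror_bytes)  # 300000
```

Under those assumptions the block-level copy moves about 218 times as much data. When many changed rows share a block the gap shrinks, so plug in your own numbers before drawing conclusions.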
With database mirroring you also get the advantage of transactional consistency. Depending on how your SAN vendor handles the replication, you could end up with an inconsistent database at the remote site: in the event of a real disaster, the transmission of blocks could be cut mid-stream, with only some of the changes applied to the DR site.
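Here is a toy model of that failure mode. It is not how any particular SAN or SQL Server implements replication; the block layout, the log record format, and the function names are all invented for the example. A transfer moves 50 from account A (stored on block 0) to account B (stored on block 1), and the link dies partway through.

```python
def replicate_blocks(primary, remote, blocks_sent):
    """Copy only the first `blocks_sent` blocks, as if the link died mid-stream."""
    result = dict(remote)
    for block_id in sorted(primary)[:blocks_sent]:
        result[block_id] = dict(primary[block_id])
    return result


def replicate_log(remote, log_records, records_sent):
    """Apply only the transactions whose commit record actually arrived."""
    received = log_records[:records_sent]
    committed = {r["txn"] for r in received if r["op"] == "commit"}
    result = {block_id: dict(rows) for block_id, rows in remote.items()}
    for r in received:
        if r["op"] == "update" and r["txn"] in committed:
            result[r["block"]][r["key"]] = r["value"]
    return result


def total(state):
    """Sum of all balances; a transfer should never change this."""
    return sum(v for rows in state.values() for v in rows.values())


before = {0: {"A": 100}, 1: {"B": 0}}  # state at the DR site
after = {0: {"A": 50}, 1: {"B": 50}}   # state at the primary after the transfer
log = [
    {"txn": 1, "op": "update", "block": 0, "key": "A", "value": 50},
    {"txn": 1, "op": "update", "block": 1, "key": "B", "value": 50},
    {"txn": 1, "op": "commit"},
]

print(total(replicate_blocks(after, before, 1)))  # 50: the debit arrived, the credit did not
print(total(replicate_log(before, log, 2)))       # 100: no commit seen, nothing applied
print(total(replicate_log(before, log, 3)))       # 100: the whole transfer applied
```

The block-level copy leaves the DR site with money missing, while the log-based copy is either entirely before or entirely after the transaction.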
Please don’t take this to mean that I don’t think SAN replication is great; I do. But like all technologies, it needs to be used correctly and at the right time. If there’s a better solution out there, SAN admins need to suck it up and work with the other IT professionals in the organization to get that better solution implemented.