I just got a job as the messaging engineer for a large firm. I've used Exchange for ten years, but now I have to start specializing in it. I had to get up to speed quickly, so I was curious about the same things you are, and about how they are implemented here.
First things first: you have to start thinking more proactively about your disaster recovery approach. Yes, you should be prepared for a disaster, but the #1 priority is keeping everything up and running. Smaller fires will happen a lot more often than a severe disaster.
They use Microsoft Clustering here, which I was unfamiliar with. When there's a failure, we're down for less than 30 seconds while the cluster switches to another node. Without clustering, a hardware failure on your server would be a serious problem.
Next is how you handle restoring single mailboxes and messages, so you never have to restore an entire information store just for a single mailbox. We use Commvault, and restoring deleted emails couldn't be easier.
I actually haven't spent much time worrying about the MAJOR problems just yet, like what we'd do if the server room flooded. Many people recommend building a lab server, restoring a copy of your production information store to it, and running eseutil against that copy to fix corruption problems, etc. I've yet to start testing this, but I'm sure it's important, considering that defragmenting and checking the database is part of normal maintenance anyway.
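For what it's worth, here is a rough sketch of how I'd script the lab check once I get around to it. It only builds the eseutil command lines in order (header dump, integrity check, offline defrag) and prints them; it doesn't run anything. The database path and the function name are my own placeholders, not anything from our environment, and I've left the repair switch (/p) out on purpose because a repair can discard data.

```python
from pathlib import PureWindowsPath

# Hypothetical location of the restored store copy on the lab server.
LAB_DB = PureWindowsPath(r"D:\LabRestore\priv1.edb")


def eseutil_plan(db):
    """Return the eseutil commands to run, in order, against an offline copy.

    The helper name and the ordering are assumptions for illustration;
    the switches themselves are standard eseutil modes.
    """
    db = str(db)
    return [
        ["eseutil", "/mh", db],  # dump the database header (shutdown state)
        ["eseutil", "/g", db],   # integrity check; does not modify the file
        ["eseutil", "/d", db],   # offline defragmentation
    ]


if __name__ == "__main__":
    # Print the plan so it can be reviewed before anything is run by hand.
    for cmd in eseutil_plan(LAB_DB):
        print(" ".join(cmd))
```

Run these only against the restored lab copy, with the store dismounted, never against the live production database.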
Your options also depend on which version of Exchange you are using. LCR and CCR are only available in Exchange 2007. If you are using another version, please provide those details so we can make more targeted suggestions. Thanks.