Posted by: Colin Smith
Disaster, Recovery, Systems Administration
We had a bit of fun today. The AC units in our secondary data center went out, which, as you can imagine, caused a bit of a panic. We run some DR, Test, Dev, and some Production systems from this facility, and we certainly do not want to lose any hardware. We were in a mad rush to determine what we could shut down and what had to stay up. We were able to get about 60% of the hosts at that location shut down before the temperature became hot enough to cause any damage. About 10 minutes after we started shutting down, they were able to get the AC back on. It was out for less than 1 hour, and the temperature in the room climbed to over 100 degrees. Within 30 minutes of the AC coming back up, we were back down into the mid 80s.

Close call, but it appears that we have made it out OK this time. I think we will leave most servers down until tomorrow just to make sure the AC is dependable overnight. All I can say is it is a good thing we have alarms in that room. Just make sure that you do as well.