Posted by: Colin Smith
corruption, DBA, DR, On-Call, Planning, SQL Administration
Just the other morning at 3:00 AM I got a page (love being on call) and saw some errors about MSDB. This concerned me, but it was 3:00 AM and my head was not really on straight. I dozed back off, and ten minutes later I got two more pages: one with errors about MSDB again, and another telling me that the instance associated with this MSDB was down.
Well, now I got up. I knew this was a production instance, or I would not be getting pages at 3:00 AM. I know this because I wrote all of our monitoring scripts. (I love PowerShell.) So I dragged myself out of bed, went to my computer, logged in, and what did I see? Errors saying that the solution may be to restore MSDB from a previous backup. I know that I back MSDB up, but I honestly have never attempted to restore it. I will be doing that soon, though. Anyway, I started researching and found that restarting the SQL Server service usually repairs the issue, and if it does not, then you restore. The service was already stopped, so all I did was start it. Everything came up fine, but I was not really convinced yet, so I ran DBCC CHECKDB and found that all did actually seem to be well.
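For anyone who wants to repeat that integrity check across the system databases, here is a minimal sketch in Python that only builds the T-SQL text. The helper names `quote_name` and `checkdb_statements` and the default database list are assumptions made for illustration; `DBCC CHECKDB` with `NO_INFOMSGS` and `ALL_ERRORMSGS` is standard T-SQL, and the output would still need to be run through SSMS or sqlcmd.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def checkdb_statements(databases=("master", "model", "msdb")):
    """Build one DBCC CHECKDB statement per database (hypothetical helper).

    NO_INFOMSGS suppresses informational output; ALL_ERRORMSGS reports
    every error found rather than a truncated list.
    """
    return [
        "DBCC CHECKDB ({}) WITH NO_INFOMSGS, ALL_ERRORMSGS;".format(quote_name(db))
        for db in databases
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a query window.
    for stmt in checkdb_statements():
        print(stmt)
```

The sketch deliberately stops at generating text: it does not connect to the server, so nothing here restarts a service or touches a database.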
I then wiped the sweat off my brow and continued to look for a reason why this may have occurred and why the instance shut down. I have found nothing, but I have had no issues with the server from that point until now. I would really like to know what went wrong, but I cannot find anything in the SQL Server logs or the Windows logs. If anyone knows where I should look, I would appreciate it.
That was a good experience for me. Now I really understand that I need to back up my system databases and also verify that I can restore them. I would also like to set up a test server, start simulating database corruption, and work out how to recover from it. Then document it and have a plan, just in case.
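As a starting point for that plan, here is a rough Python sketch that generates a backup statement and a matching verification statement for each system database. The function names and the `D:\Backups` folder are hypothetical; `BACKUP DATABASE ... WITH CHECKSUM` and `RESTORE VERIFYONLY` are standard T-SQL. tempdb is left out because it cannot be backed up.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    """Render a T-SQL Unicode string literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def backup_and_verify(database, backup_dir):
    """Return [backup statement, verify statement] for one database."""
    # Assumed layout: one .bak file per database in a single folder.
    path = backup_dir.rstrip("\\") + "\\" + database + ".bak"
    disk = quote_literal(path)
    return [
        # INIT overwrites any existing backup sets in the file.
        "BACKUP DATABASE {} TO DISK = {} WITH CHECKSUM, INIT;".format(
            quote_name(database), disk
        ),
        "RESTORE VERIFYONLY FROM DISK = {} WITH CHECKSUM;".format(disk),
    ]


if __name__ == "__main__":
    # Hypothetical backup folder; adjust to the real backup location.
    for db in ("master", "model", "msdb"):
        for stmt in backup_and_verify(db, "D:\\Backups"):
            print(stmt)
```

Keep in mind that `RESTORE VERIFYONLY` only confirms the backup set is complete and readable. It does not restore anything, so an actual test restore on a scratch server is still the stronger proof.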