Posted by: Roger Crawford
Active Directory Restore, USN Rollback, Windows 2003 DC Restore, Windows 2003 Server
Well, we had a DC die on us Friday in our office. This DC held the FSMO roles for the domain and also hosted the Enterprise CA. The DC was brought up virtually on our BDR device, but somewhere in that process it went into USN rollback, meaning its copy of AD was older than the copies on the rest of the DCs in the domain. Things seemed to be working, but they were not right. Not good. Sure enough, I got emails from our Network Admin saying something was amiss in the domain and asking if I could look at it.
As I dug through the event logs on the DC in question and on the other DCs, I kept finding that the servers were not replicating. When I ran repadmin /showrepl, the DSA options came back as IS_GC DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL, and dcdiag reported the same thing along with the commands to run to try to correct the problem, which I ran. But it was not long before the error showed up again. I also found the Netlogon service in a paused state, which is another sign of USN rollback. Basically, all the DCs had gone into a mode that blocked replication because of the old data from the bad DC.
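If you want to spot this condition quickly on several DCs, the check can be scripted. Below is a minimal Python sketch (not something we actually ran) that reads saved repadmin /showrepl output, assuming it contains a "DSA Options:" line like the one above, picks out the two flags that block replication, and builds the repadmin /options commands that would clear them. The helper names are hypothetical. Keep in mind that clearing the flags only treats the symptom; as we found, they come right back until the rolled-back DC itself is dealt with.

```python
from __future__ import annotations

# Flags that stop a DC from replicating in or out.
BLOCKING_FLAGS = {"DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL"}


def dsa_options(showrepl_text: str) -> set[str]:
    """Return the flags listed on the 'DSA Options:' line (empty set if absent)."""
    for line in showrepl_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() == "DSA Options":
            return set(rest.split())
    return set()


def replication_blocked(showrepl_text: str) -> set[str]:
    """Return only the DSA options that block inbound/outbound replication."""
    return dsa_options(showrepl_text) & BLOCKING_FLAGS


def clearing_commands(dc_name: str, flags: set[str]) -> list[str]:
    """Build 'repadmin /options <DC> -FLAG' commands that remove each flag."""
    return [f"repadmin /options {dc_name} -{flag}" for flag in sorted(flags)]
```

For example, feeding it output containing "DSA Options: IS_GC DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL" returns both DISABLE_* flags and ignores IS_GC, since being a global catalog is normal.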
I searched and finally came to the conclusion that we were going to have to demote the server and bring it back into the domain as a member server. I called Randy with the bad news, just what he wanted to hear at 10 on a Saturday evening, and we came up with a plan: use ntdsutil on the remaining DCs to seize the roles and then clean the bad DC out of AD. I had already changed the primary DNS IP on a lot of the servers in the network to the soon-to-be new FSMO master and verified that DNS was still working.
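For reference, seizing all five FSMO roles onto the surviving DC on Windows 2003 follows a fixed ntdsutil sequence. The Python sketch below only builds that sequence for a given target server (the server name is a placeholder); it does not run anything, and each seize still asks for confirmation when you enter it. ntdsutil also accepts the same commands as quoted arguments on one line, which the second helper produces. Cleaning the bad DC's metadata out of AD is a separate step under ntdsutil's metadata cleanup menu and is not shown here.

```python
from __future__ import annotations

# The five FSMO roles as named by ntdsutil's "seize" commands.
FSMO_ROLES = ["schema master", "domain naming master", "RID master", "PDC", "infrastructure master"]


def seize_script(target_dc: str) -> list[str]:
    """Return the ntdsutil commands, in order, to seize every role onto target_dc."""
    return [
        "roles",                          # enter fsmo maintenance
        "connections",                    # enter server connections
        f"connect to server {target_dc}", # bind to the DC that will own the roles
        "quit",                           # back to fsmo maintenance
        *[f"seize {role}" for role in FSMO_ROLES],
        "quit",                           # back to the ntdsutil prompt
        "quit",                           # exit ntdsutil
    ]


def as_command_line(commands: list[str]) -> str:
    """Join the commands into a single ntdsutil invocation with quoted arguments."""
    return "ntdsutil " + " ".join(f'"{c}"' for c in commands)
```

Typing the lines from seize_script("NEWFSMO") one at a time at the ntdsutil prompt is the safer way to do it, since you see each confirmation and result as you go.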
Randy had restored the server onto a physical machine and shut it down. We paused the virtual DC with the issues, then brought up the restored server on the physical machine, plugged into a switch by itself, and ran dcpromo /forceremoval to strip AD off the server. While that was running, I seized the roles onto the new FSMO master, cleaned the bad DC out of AD Sites and Services, and removed any remnants of it from DNS. This got AD straightened out and replicating to all sites as it should, with DNS functioning properly.
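Cleaning the bad DC out of DNS mostly means finding every record that still points at it. A DC registers itself under a predictable set of names, so here is a Python sketch that lists the common ones to check in the DNS console. It is not a complete list, and the domain, forest, site, and DSA GUID values are placeholders you would fill in yourself. The GC and PDC entries matter in a case like ours, because the failed DC was a global catalog and held the FSMO roles.

```python
from __future__ import annotations


def dc_dns_names(domain: str, forest: str, site: str, dsa_guid: str,
                 is_gc: bool = False, is_pdc: bool = False) -> list[str]:
    """List common DNS names a DC registers, to check for stale entries.

    Under each SRV name, only the entry whose target is the removed DC
    should be deleted; other DCs register under the same names.
    """
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"{dsa_guid}._msdcs.{forest}",  # CNAME other DCs use to find it for replication
    ]
    if is_pdc:
        names.append(f"_ldap._tcp.pdc._msdcs.{domain}")
    if is_gc:
        names += [
            f"_gc._tcp.{forest}",
            f"_gc._tcp.{site}._sites.{forest}",
            f"_ldap._tcp.gc._msdcs.{forest}",
            f"gc._msdcs.{forest}",  # A record listing global catalog IPs
        ]
    return names
```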
When restoring a DC, whether into a virtual environment or onto a physical machine, there are some steps you need to take before bringing it back online in the domain. Here they are. They hold true for virtual and physical servers alike, and as we move more toward virtual servers this really needs to be watched or you will run into the same problem.
Procedure for using the recovery option:
- “Restore” the image
- !!! Boot into DSRM !!! (not connected to the network)
- Note the value of “DSA Previous Restore Count”
(HKLM\System\CurrentControlSet\Services\NTDS\Parameters) (Not visible? –> Assume value of 0)
- Add the entry “Database restored from backup” (DWORD) with a value of 1
(HKLM\System\CurrentControlSet\Services\NTDS\Parameters) (This triggers the actions needed for AD right after a system state restore!)
- Stop the “File Replication Service (NTFRS)” and assign the value “D4” (for an auth. or primary restore) or “D2” (for a non-auth. restore) to the entry “BurFlags” in (HKLM\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup)
(This triggers the actions needed for the SYSVOL right after a system state restore!) (and other replicated DFS namespaces!)
(also see: Using the BurFlags registry key to reinitialize File Replication Service replica sets – http://support.microsoft.com/?id=290762)
- Boot into normal DC mode (not connected to the network)
- Check the value of “DSA Previous Restore Count”
(HKLM\System\CurrentControlSet\Services\NTDS\Parameters) (New value = old value + 1)
- In the DS event log check for event ID 1109
- In the FRS event log check for event ID 13565 & 13520 if a non-auth. restore was performed for the SYSVOL
- In the FRS event log check for event ID 13566 if an auth. restore was performed for the SYSVOL
- Connect to the network again
- Check the health of the DC (AD & SYSVOL)
- DCDIAG /D /C /V
- NETDIAG /DEBUG /V
- GPOTOOL.EXE /CHECKACL /VERBOSE
- REPADMIN.EXE /SHOWUTDVEC <FQDN DC> <NC>
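The two registry changes in the procedure can be prepared ahead of time as a .reg file and imported with regedit while in DSRM, which avoids typos in the long key paths. Here is a Python sketch that produces that file's text; the function name is hypothetical, and the BurFlags values are the ones from the list above (D2 for non-authoritative, D4 for authoritative). As the procedure says, stop NTFRS (net stop ntfrs) before the BurFlags value is applied.

```python
from __future__ import annotations

NTDS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
FRS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs"
           r"\Parameters\Backup/Restore\Process at Startup")

# BurFlags values: 0xD2 = non-authoritative, 0xD4 = authoritative/primary.
BURFLAGS = {"non-authoritative": 0xD2, "authoritative": 0xD4}


def restore_reg(mode: str = "non-authoritative") -> str:
    """Return .reg file text setting 'Database restored from backup' and BurFlags.

    Raises KeyError if mode is not one of the BURFLAGS keys.
    """
    burflags = BURFLAGS[mode]
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{NTDS_KEY}]",
        '"Database restored from backup"=dword:00000001',
        "",
        f"[{FRS_KEY}]",
        f'"BurFlags"=dword:{burflags:08x}',  # e.g. dword:000000d2
        "",
    ]
    return "\r\n".join(lines)  # .reg files use Windows line endings
```

To save it, something like open("restore.reg", "w", encoding="utf-16", newline="").write(restore_reg()) writes a UTF-16 file with a byte order mark, which is the encoding regedit itself uses when exporting version 5.00 files.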
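The checks after the first normal boot come down to a few comparisons, sketched below in Python. You would gather the inputs yourself (the restore count noted in DSRM, the count after the normal boot, and the event IDs seen in the Directory Service and File Replication Service logs); the hypothetical check_restore function only encodes the expected values from the list above and reports what does not match.

```python
from __future__ import annotations


def check_restore(count_before: int | None, count_after: int,
                  ds_event_ids: set[int], frs_event_ids: set[int],
                  authoritative_sysvol: bool) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # A missing "DSA Previous Restore Count" value is treated as 0.
    before = 0 if count_before is None else count_before
    if count_after != before + 1:
        problems.append(f"DSA Previous Restore Count is {count_after}, expected {before + 1}")
    if 1109 not in ds_event_ids:
        problems.append("Directory Service event 1109 not found")
    # Expected FRS events depend on how the SYSVOL was restored.
    expected = {13566} if authoritative_sysvol else {13565, 13520}
    for event_id in sorted(expected - frs_event_ids):
        problems.append(f"FRS event {event_id} not found")
    return problems
```

Only when this comes back clean would you move on to reconnecting the DC and running the health checks.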
Next up: more on the Enterprise CA and what we had to do to bring it back.
'Til later, just Roger