The error reporting in SQL Server Replication isn’t all that great. This is a well known issue that pretty much everyone knows about. Something that I don’t know if a whole lot of people know about, is that there is a way to get a lot more information from replication about what’s going on, and what’s going wrong.
This is most easily done by running the replication job manually from within a command window on the distributor. This will allow you to add switches or change values as needed and easily see the output, or redirect the output to a file for analysis, sending to Microsoft, your consultant, etc.
Replication is run by jobs, with job steps of some funky types. These step types simply mask what is happening in the background. SQL is shelling out and running a command line app and passing it all the switches as they are within the job step.
All the command line apps which replication uses are in the “C:\Program Files\Microsoft SQL Server\90\COM” folder by default (for SQL 2000 replace the 90 with 80, for SQL 2008 replace the 90 with 100). In that folder you will find a few apps which are of interest. When you run the snapshot job snapshot.exe is called. When you run a distribution job DISTRIB.exe is called (I’ve got no idea why it’s uppercase). When the log reader is running logread.exe is run. When you are running merge replication replmerg.exe is run.
All of these can be run manually from a command prompt. For starters open up the SQL Job and edit step two, the one which actually does the work. Copy all the text in the command window and paste is after the filename in the command prompt window and press enter. You will need to stop the SQL Agent job before you can actually run the command from the command prompt, as replication is designed so that you can only run the commands one at a time.
Now the whole point of this was to get more log info because the replication is failing. This is done by added the “-OutputVerboseLevel” switch to the command. This switch has between 3 and 5 logging levels depending on which command you are running. 0 (zero) is basically no logging, and as the number goes up more data will be shown. The distrib.exe, replmerg.exe and snapshot.exe takes 0-2, while the logread.exe takes 0-4.
You should only do this when replication is failing and you can’t figure out why, and all SQL is telling you is some cryptic error message.
Hopefully you’ll find this information useful.