SQL Server with Mr. Denny

Jan 28 2015   5:00PM GMT

Random Unexplained Replication Job Failure in a large SQL Server Replication Topoligy

Denny Cherry Denny Cherry Profile: Denny Cherry

Tags:
Data Replication
Database replication
SQL
SQL replication
SQL Server
SQL Server 2005
SQL Server 2008
SQL Server 2008 R2
SQL Server 2012
SQL Server 2014
SQL Server replication

One of the environments that I work with includes a very large replication topology. While there are only a few servers involved there are dozens of publications between those servers. One of the problems that replication has is that it can only run so many agents at a time.

Now remember that SQL Server replication agents are just SQL Server Agent jobs. And while there are command line applications on the distributor (and subscriber if you are doing pull replication) that’s not quite how SQL Server replication works. You see, when you setup SQL Server replication there are custom job types for the various replication tasks. These custom tasks tell the SQL Server Agent to call custom DLLs which then do some magic and do the same thing as the command line applications which sit on the servers hard drive. At least that’s the theory.

Now on smaller topologies this all works just fine. Everything is happy, the jobs all run as expected. But on larger topologies you start getting some “strange” behavior from these jobs. They’ll randomly throw error messages. Or the agents will say that they are running, but they actually aren’t. The problem is something to do with the DLL that you are running as part of the SQL Agent job.

Thankfully, the trick to fixing this is actually pretty simple. Simply open up the properties for the SQL Agent job which is running the replication agent, and edit step 2 of the job. Change the type from whatever the replication operation is, to “Operating system (CmdExec)”. Then edit the command. Before the batch of parameters in there you want to add the full command line path to the replication agent that you are trying to run. By default this will be “c:\program files\Microsoft SQL Server\120\com\” for SQL Server 2014. For older versions of SQL Server just change the 120 to the version number 110 for SQL 2012, 100 for SQL 2008, 90 for SQL 2005 and 80 for SQL 2000. Then put the name of the executable that you need to use, you’ll find the list of applications listed below.

Application Name Replication Agent
snapshot.exe Snapshot Agent
logread.exe Load Reader Agent
distrib.exe Distribution Agent

When editing the SQL Agent job, if there’s spaces in the path to the executable, you’ll need to put double quotes around the full path and executable, but not the parameters as shown below.

“c:\program files\Microsoft SQL Server\120\com\snapshot.exe” -Publisher [Server1] -PublisherDB [MyDB] -Distributor [Server2] -Publication [MyPublication] -DistributorSecurityMode 1

You shouldn’t need to make any changes to the parameters. Just save and close the SQL Agent job, then restart the replication agent either from the replication monitor or by manually starting the SQL Agent job and everything will just keep on cranking along as expected.

Denny

 Comment on this Post

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when other members comment.

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

Share this item with your network: