Posted by: Denny Cherry
Network Latency, Network Traffic, Replication, SQL Server
When dealing with a high load replication environment network latency and network traffic can become as issue. There are a few settings you can play with, within the subscription agents themselves such as the –PacketSize, -UseInprocLoader, and –SubscriptionStreams. While working with these settings may or may not help (personally I’ve had very limited success getting them to improve the data flow rate) there are some other techniques you can use.
The best technique to you is to include your stored procedures as articles within the publication. This will allow SQL Server to instead of replicating the entire record from one server to another, it will simply run the stored procedure on the subscriber with the same input settings as it was run with on the publisher. If you have very complex stored procedures, you’ll want to evaluate the increase load on the subscribers CPU against the reduction of network traffic between the distributor and subscriber.
This technique is critical to use to reduce the network traffic if you have a multi-tier replication topology. This is where your subscribers are also publishers to subscribers of their own. This is because of the way that the replication writes data to the subscriber. Replication uses stored procedures on the subscriber to update the values of the record. Assuming a table which looks like this:
CREATE TABLE Employee
(EmployeeId INT IDENTITY(1,1),
The stored procedure which Microsoft uses for replication will look like this.
CREATE PROCEDURE MS_Employee_Update
SET EmployeeId = CASE WHEN @Columns & 1 = 1 THEN @EmployeId ELSE EmployeeId END,
LastName = CASE WHEN @Columns & 2 = 2 THEN @LastName ELSE LastName END,
FirstName = CASE WHEN @Columns & 4 = 4 THEN @FirstName ELSE FirstName END,
DepartmentId = CASE WHEN @Columns & 8 = 8 THEN @DepartmentId ELSE DepartmentId END
WHERE EmployeeId = @KeyField
While this technique works great for a two tier replication topology, we suddenly have a problem when we try to replicate from the second tier to the third tier. On the second tier all the columns have been updated, no matter which columns have actually been updated. This will cause an undo amount of replication traffic between the second and third tier servers as every column must now be replicated instead of just the values which have changed.
When you replicate the stored procedures as well as the tables when the stored procedure is fired to update the data on the publisher, that same stored procedure will be fired on the subscriber thereby reducing the amount of data which needs to be sent from the second tier to the third tier as again only the stored procedure execution is replicated. On very wide tables this reduction of data can reduce your network traffic load up to 70% (or higher depending on the data being changed and the width of the table).
You can verify that SQL Server is indeed using the stored procedures by using SQL Profiler to profile the distribution process on the subscriber. If you see your stored procedure names being fired then it’s using this technique. If you see procedures named by Microsoft, then it is not.
This technique will only work, if your application has no inline SQL. If you use a combination of stored procedures and inline SQL within the application, the data which is changed by stored procedure execution will be replicated using this technique; while the data which is changed by the inline code will be replicated using the traditional technique. If you have dynamic SQL within your stored procedures the SQL Server will not be able to use the replicated procedure to update the data on the subscriber. The dynamic SQL will prevent the SQL Server from recognizing that the stored procedure is what caused the data change, and the data will be replicated using the normal technique.