Well today is the one year birthday of my blog. It’s been a very fun and exciting year. Here’s hoping that the next year will be just as fun.
The company I work for just finished our office move into our newer slightly larger office. I’ve managed to snag myself a Window seat this time around. I just wanted to share the view with everyone. This is the view out my desk window.
Some times its great to live in Southern California.
Orange County now has its own SQL Server User Group. It is being run as a part of the Orange County .NET Users Group. The first speaker was my friend Lynn Langit last month on August 14th. I’ve been asked to be the next speaker on October 16th. The meeting starts at 6pm and runs until 9pm. The meeting is being held at New Horizons in Anaheim at 1900 S. State College Blvd near the corner of Katella Ave.
I’ll be talking about Clustering SQL Server. It is going to be similar to the session about Clustering SQL Server which I’ve given at the prior SoCal Code Camps. If you would like a preview of the slide deck it is available for download. If I change the slide deck any I’ll update the version available for download.
If you are not able to make it to the session feel free to post your questions below and I’ll answer them as quickly as I can.
With SQL Injection Attacks being all the rage these days, I’ve been asked a couple of times for T/SQL code to clean up the database.
So I threw this code together to clean up the data. This code will clean all the character and uni-code columns in all the user defined tables in the system. You’ll need to be dbo or sysadmin to run this without error. If you have TEXT or NTEXT columns it will through an error for those columns. Cleaning TEXT and NTEXT columns is a little more complex as you can’t use the REPLACE function on a TEXT or NTEXT datatype.
DECLARE @sql NVARCHAR(4000) DECLARE @InsertedValue NVARCHAR(1000) SET @InsertedValue = 'The Script tags which were inserted' DECLARE cur CURSOR FOR select 'update [' + sysusers.name + '].[' + sysobjects.name + '] set [' + syscolumns.name + '] = replace([' + syscolumns.name + '], ''' + @InsertedValue + ''', '''')' from syscolumns join sysobjects on syscolumns.id = sysobjects.id and sysobjects.xtype = 'U' join sysusers on sysobjects.uid = sysusers.uid where syscolumns.xtype in (35, 98, 99, 167, 175, 231, 239, 241, 231) OPEN cur FETCH NEXT FROM cur INTO @sql WHILE @@FETCH_STATUS = 0 BEGIN exec (@sql) FETCH NEXT FROM cur INTO @sql END CLOSE cur DEALLOCATE cur
Hopefully you find this useful. If you need code for TEXT or NTEXT columns just post a comment and I’ll throw something together.
This code will work on SQL 2000 and up (it’ll probably work on SQL 7 as well, but I don’t have a SQL 7 machine to test against).
This morning I released another tip on SearchSQLServer.com called “SQL Server consolidation: Why it’s an optimization technique“. It’s a quick overview on consolidating SQL Server Instances onto a single server along with some of the gotchas.
When dealing with a high load replication environment network latency and network traffic can become as issue. There are a few settings you can play with, within the subscription agents themselves such as the –PacketSize, -UseInprocLoader, and –SubscriptionStreams. While working with these settings may or may not help (personally I’ve had very limited success getting them to improve the data flow rate) there are some other techniques you can use.
The best technique to you is to include your stored procedures as articles within the publication. This will allow SQL Server to instead of replicating the entire record from one server to another, it will simply run the stored procedure on the subscriber with the same input settings as it was run with on the publisher. If you have very complex stored procedures, you’ll want to evaluate the increase load on the subscribers CPU against the reduction of network traffic between the distributor and subscriber.
This technique is critical to use to reduce the network traffic if you have a multi-tier replication topology. This is where your subscribers are also publishers to subscribers of their own. This is because of the way that the replication writes data to the subscriber. Replication uses stored procedures on the subscriber to update the values of the record. Assuming a table which looks like this:
CREATE TABLE Employee
(EmployeeId INT IDENTITY(1,1),
The stored procedure which Microsoft uses for replication will look like this.
CREATE PROCEDURE MS_Employee_Update
SET EmployeeId = CASE WHEN @Columns & 1 = 1 THEN @EmployeId ELSE EmployeeId END,
LastName = CASE WHEN @Columns & 2 = 2 THEN @LastName ELSE LastName END,
FirstName = CASE WHEN @Columns & 4 = 4 THEN @FirstName ELSE FirstName END,
DepartmentId = CASE WHEN @Columns & 8 = 8 THEN @DepartmentId ELSE DepartmentId END
WHERE EmployeeId = @KeyField
While this technique works great for a two tier replication topology, we suddenly have a problem when we try to replicate from the second tier to the third tier. On the second tier all the columns have been updated, no matter which columns have actually been updated. This will cause an undo amount of replication traffic between the second and third tier servers as every column must now be replicated instead of just the values which have changed.
When you replicate the stored procedures as well as the tables when the stored procedure is fired to update the data on the publisher, that same stored procedure will be fired on the subscriber thereby reducing the amount of data which needs to be sent from the second tier to the third tier as again only the stored procedure execution is replicated. On very wide tables this reduction of data can reduce your network traffic load up to 70% (or higher depending on the data being changed and the width of the table).
You can verify that SQL Server is indeed using the stored procedures by using SQL Profiler to profile the distribution process on the subscriber. If you see your stored procedure names being fired then it’s using this technique. If you see procedures named by Microsoft, then it is not.
This technique will only work, if your application has no inline SQL. If you use a combination of stored procedures and inline SQL within the application, the data which is changed by stored procedure execution will be replicated using this technique; while the data which is changed by the inline code will be replicated using the traditional technique. If you have dynamic SQL within your stored procedures the SQL Server will not be able to use the replicated procedure to update the data on the subscriber. The dynamic SQL will prevent the SQL Server from recognizing that the stored procedure is what caused the data change, and the data will be replicated using the normal technique.
If you have a very high load on your database, but no specifically massive tables you may want to look into a fully load balanced solution for your database. This solution falls somewhere between your normal replication solution, and a federation solution. It’s not truly a federation as no one table is spread across several servers; every server in the federation holds every record of every table.
When laid out on paper this type of replication looks very similar to the Pyramid Federation technique we talked about earlier. This type of setup is ideal for an OLTP environment. We use transactional replication to move the data from one server to another as quickly as possible (you’ll want a pretty fast distributor to handle this). Because all writes are done to a single server (to prevent any potential identity column issues) this solution requires the most code change at the application layer as all commands which write data to the database must go to one connection string which talks to the publisher, which all the reads go to the load balancer which reads data from the federation of subscribers.
Because the publisher and distributor are both single points of failure in this setup, it’s recommended that they both be clustered so that they can survive a hardware failure. There is no need to cluster the subscribers as they are redundant by the fact that there are several of them behind a load balancer.
This concludes my mini-series on database federation. I hope that you have found it useful. As always questions or comments are welcome in the comments section below. (There’s no alerts when a response it posted, it’s something that I’ve already brought up with the ITKE staff, so check back for a response.)
Another type of database federation is what I call the full replication federation. This is where you place all the dimension tables (sticking with our data warehouse example from last time) on all servers of the federation. In addition to having the dimension tables on all the servers in the federation, we also allow all the users to connect to all the servers in the federation. This effectively creates an Active/Active solution as users should be connecting to the SQL Servers through a load balancer. As the dimensions are going to be read only as far as the users are concerned it doesn’t matter which server they connect to.
I call this the full replication federation as we setup replication on all tables which aren’t our large table which has been federated.
As we are connecting to all the servers, we can’t have our view and table named the same thing. In this case we want to have our actual table and view with different names. I prefer to simply use a different schema to hide the table where I want it. This changes our view to look more like this (using a three server federation).
CREATE VIEW dbo.FACT_Sales AS
I like to put the local database name in the view script, so that the same script can be easily deployed to each server. You can at your discretion remove the local server and database name.
You can now query the Data.FACT_Sales table on all three servers by simply querying the view on the local table.
You may end up with some of the same “interesting” optimizer query plans as when using the Pyramid federation technique, and the same solutions which we discussed in the “The Pyramid Federation ” post will still apply.
I’d like to say thanks to the San Diego SQL Server Users Group for inviting me to speak last night.
I had a great time speaking to the group, and just like last time the questions were all excellent.
The slide deck for last night session on Federated Databases is now available. I believe that it will also be made available on the San Diego SQL Server Users Group website.
For those that missed the session, it is one of the sessions which I’m giving at the SoCal Code Camp on October 25 and 26 at USC in Los Angeles. Based on the time the presentation took last night, I’ll be expanding it a bit to better fill the two hours I’ve allocated for it at the Code Camp. If you are going to attend the Code Camp, be sure to mark the interested check box after you register so that the Code Camp staff knows how large of a room to put all the session in.