SQL Server with Mr. Denny:

Federated Database

Oct 25 2008   11:00AM GMT

Slide Decks and Sample Code for SoCal Code Camp at USC



Posted by: mrdenny
Storage, In Person Events, Back To Basics, SoCal Code Camp, Federated Database

Here are the slide decks and sample code from my sessions at this weekend’s SoCal Code Camp.

Back To Basics; Getting Back To The Basics of SQL Server
Scaling that database bigger than ever
Storage for the DBA

While the first two are both two-part sessions, there is only one download for each, as the two halves run together.

Denny

Sep 25 2008   11:00AM GMT

A Load Balanced Federated Database Solution



Posted by: mrdenny
Federated Database, Scaling Out

If you have a very high load on your database, but no particularly massive tables, you may want to look into a fully load balanced solution for your database. This solution falls somewhere between your normal replication solution and a federation solution. It’s not truly a federation, as no one table is spread across several servers; every server in the federation holds every record of every table.

When laid out on paper this type of replication looks very similar to the Pyramid Federation technique we talked about earlier. This type of setup is ideal for an OLTP environment. We use transactional replication to move the data from one server to another as quickly as possible (you’ll want a pretty fast distributor to handle this). Because all writes are done to a single server (to prevent any potential identity column issues), this solution requires the most code change at the application layer. All commands which write data to the database must go to one connection string which talks to the publisher, while all the reads go to the load balancer, which reads data from the federation of subscribers.
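
A minimal sketch of the publisher side of this layout. It assumes the distributor is already configured, and every name here (MyDatabase, MyDatabase_Pub, dbo.Orders, SQLSUB1) is hypothetical. The stored procedures are the standard transactional replication procedures, but only a small subset of their parameters is shown, so treat it as a starting point rather than a finished script. Tables published this way need a primary key.

```sql
-- Run on the publisher. Assumes distribution is already configured.
-- All object and server names below are hypothetical.
USE MyDatabase;
GO
-- Enable the database for publishing.
EXEC sp_replicationdboption
    @dbname  = N'MyDatabase',
    @optname = N'publish',
    @value   = N'true';
GO
-- Create a continuous (transactional) publication.
EXEC sp_addpublication
    @publication = N'MyDatabase_Pub',
    @repl_freq   = N'continuous',
    @status      = N'active';
GO
-- Snapshot agent, used for the initial synchronization of each subscriber.
EXEC sp_addpublication_snapshot
    @publication = N'MyDatabase_Pub';
GO
-- Publish one table; repeat for every table the readers need.
EXEC sp_addarticle
    @publication   = N'MyDatabase_Pub',
    @article       = N'Orders',
    @source_owner  = N'dbo',
    @source_object = N'Orders';
GO
-- Push subscription for one subscriber; repeat for each server
-- that sits behind the load balancer.
EXEC sp_addsubscription
    @publication       = N'MyDatabase_Pub',
    @subscriber        = N'SQLSUB1',
    @destination_db    = N'MyDatabase',
    @subscription_type = N'Push';
GO
```

On the application side this works out to two connection strings: one pointing at the publisher for anything that writes, and one pointing at the load balancer’s address for everything that only reads.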

Because the publisher and distributor are both single points of failure in this setup, it’s recommended that they both be clustered so that they can survive a hardware failure. There is no need to cluster the subscribers as they are redundant by the fact that there are several of them behind a load balancer.

This concludes my mini-series on database federation. I hope that you have found it useful. As always, questions or comments are welcome in the comments section below. (There are no alerts when a response is posted; it’s something that I’ve already brought up with the ITKE staff, so check back for a response.)

Denny


Sep 22 2008   11:00AM GMT

The full replication federation



Posted by: mrdenny
Federated Database, Scaling Out

Another type of database federation is what I call the full replication federation. This is where you place all the dimension tables (sticking with our data warehouse example from last time) on all servers of the federation. In addition to having the dimension tables on all the servers in the federation, we also allow all the users to connect to all the servers in the federation. This effectively creates an Active/Active solution as users should be connecting to the SQL Servers through a load balancer. As the dimensions are going to be read only as far as the users are concerned it doesn’t matter which server they connect to.

I call this the full replication federation because we set up replication on every table other than the large table which has been federated.

As we are connecting to all the servers, the view and the actual table can’t share the same name. I prefer to simply use a different schema to hide the table where I want it. This changes our view to look more like this (using a three server federation).

CREATE VIEW dbo.FACT_Sales AS
SELECT *
FROM SQL0.MyDataWarehouse.Data.FACT_Sales
UNION ALL
SELECT *
FROM SQL1.MyDataWarehouse.Data.FACT_Sales
UNION ALL
SELECT *
FROM SQL2.MyDataWarehouse.Data.FACT_Sales
GO

I like to keep the local server and database name in the view script, so that the same script can be deployed unchanged to each server. You can, at your discretion, remove them from the branch which reads the local table.

You can now query the Data.FACT_Sales table on all three servers by simply querying the view on the local server.
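
Hiding the table in its own schema could look something like the sketch below. It assumes each server currently holds its slice of the data in dbo.FACT_Sales and that no Data schema exists yet; run it on every server in the federation before creating the view.

```sql
-- Run on each server in the federation (sketch; assumes the local slice
-- currently lives in dbo.FACT_Sales and the Data schema does not exist yet).
CREATE SCHEMA Data AUTHORIZATION dbo;
GO
-- Move the local slice out of dbo so the view can take over its name.
ALTER SCHEMA Data TRANSFER dbo.FACT_Sales;
GO
-- Once the dbo.FACT_Sales view has been created, users query it
-- exactly as they would have queried the table.
SELECT COUNT(*) AS TotalRows
FROM dbo.FACT_Sales;
GO
```

Using a schema rather than a renamed table keeps the object name the same on every server, which is what lets a single view script be deployed everywhere.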

You may end up with some of the same “interesting” optimizer query plans as when using the Pyramid federation technique, and the same solutions which we discussed in “The Pyramid Federation” post will still apply.

Denny


Sep 19 2008   7:04PM GMT

Great Turnout At the San Diego SQL Server Users Group Last Night



Posted by: mrdenny
SQL, In Person Events, SoCal Code Camp, Federated Database, San Diego SSUG

I’d like to say thanks to the San Diego SQL Server Users Group for inviting me to speak last night.

I had a great time speaking to the group, and just like last time the questions were all excellent.

The slide deck for last night’s session on Federated Databases is now available. I believe that it will also be made available on the San Diego SQL Server Users Group website.

For those that missed the session, it is one of the sessions which I’m giving at the SoCal Code Camp on October 25 and 26 at USC in Los Angeles.  Based on the time the presentation took last night, I’ll be expanding it a bit to better fill the two hours I’ve allocated for it at the Code Camp.  If you are going to attend the Code Camp, be sure to mark the interested check box after you register so that the Code Camp staff knows how large a room to put each session in.

Denny


Sep 18 2008   11:00AM GMT

The Pyramid Federation



Posted by: mrdenny
Federated Database, Scaling Out

There are several techniques which can be used to federate your database. The first one we will be talking about is the Pyramid Federation (I have no idea if that’s what it’s actually called, but that’s what I’ve named it). In a pyramid federation we have a single server which holds the bulk of the tables. Then a set of servers sits beneath this server holding the large table which has been spread over the federation. Normally no data is replicated between these servers; however, data can be replicated if this will improve query performance. That’s a decision which you’ll have to make depending on your system design and platform load.

This type of system is great if you have just a few tables which you need to federate because of their size and the length of time it takes to return queries issued against them.
The basic layout of the system is that we have a single front end server which holds all our other tables. Using a data warehouse as an example, we keep all our dimensions on this front end server, while our large fact table is spread over the three servers which make up our back end system. While a user could connect to any of the four servers in the system, the only server from which all the data will be available is the front end server which holds the dimensions. The users don’t need to know that the table is spread out across three physical servers, as they will query a view just as they normally would a single table on the system.

In our example our head server will be SQL_Main, and our three back end servers will be SQL0, SQL1 and SQL2. The table which we have spread over the federation is called FACT_Sales, and we have designed it to hold many, many years’ worth of sales data totaling several billion rows. Each of the SQL0-SQL2 servers will hold 1/3 of the data for the table. We use the MOD (%) function to decide which SQL Server the data is stored on. (We’ll cover this later, I promise.)

On our SQL_Main server we have a view called FACT_Sales. This view will be set up something like this.

CREATE VIEW FACT_Sales
AS
SELECT *
FROM SQL0.MyDataWarehouse.dbo.FACT_Sales
UNION ALL
SELECT *
FROM SQL1.MyDataWarehouse.dbo.FACT_Sales
UNION ALL
SELECT *
FROM SQL2.MyDataWarehouse.dbo.FACT_Sales
GO
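
For the four-part names in the view to resolve, SQL_Main needs a linked server for each back end server. A minimal sketch, assuming the three back end servers are reachable under the names SQL0, SQL1 and SQL2 and that the default security mapping is acceptable (both are assumptions; your logins will probably need explicit mapping):

```sql
-- Run on SQL_Main. Registers each back end server as a linked server.
EXEC sp_addlinkedserver @server = N'SQL0', @srvproduct = N'SQL Server';
EXEC sp_addlinkedserver @server = N'SQL1', @srvproduct = N'SQL Server';
EXEC sp_addlinkedserver @server = N'SQL2', @srvproduct = N'SQL Server';
GO
-- Optional: defer remote schema checks until a query actually needs
-- data from that server.
EXEC sp_serveroption @server = N'SQL0', @optname = N'lazy schema validation', @optvalue = N'true';
EXEC sp_serveroption @server = N'SQL1', @optname = N'lazy schema validation', @optvalue = N'true';
EXEC sp_serveroption @server = N'SQL2', @optname = N'lazy schema validation', @optvalue = N'true';
GO
```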

As you can see from the view definition, the view is fairly simple: we query the three remote servers for all the data, matching whatever parameters we pass to the view when we call it. When we create the FACT_Sales tables on the SQL0-SQL2 servers an additional column should be created. As we are using the SalesId value (which is populated by our sales system, not the data warehouse) to figure out which server the row should be stored on, we place a SalesMod column on the table. We will also place a constraint on this column so that the table on SQL0 can only have a SalesMod value of 0, the table on SQL1 can only have a SalesMod value of 1, and the table on SQL2 can only have a SalesMod value of 2. Loading the data can be done in two ways.
1. The first option is to simply bulk load all the data into the FACT_Sales view, and let the SQL Servers figure out where everything needs to go. This technique will work just fine for smaller sets of data. Just make sure to include a column as part of the select from the sales system(s) which has the formula of SalesId%3 (one value for each of the three back end servers); this will give you the value of the SalesMod column to split the data between servers.
2. The second option is to split the data into three select statements through our ETL process and load each of the backend servers separately. If you have a larger amount of data to process this may be faster as there is one less server processing the data, and therefore one less network hop to work with. In addition when using the first technique all data must be written through the linked servers to the backend database servers, and linked servers are not the most efficient way to move a large amount of data.
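
To make the SalesMod column and the second loading option more concrete, here is a sketch of what the table on SQL0 and its load might look like. Only SalesId, SalesMod and DateTimeId come from this post; the SalesAmount column, the constraint names and the Staging.dbo.Sales source table are made up for illustration. The modulo is 3 because there are three back end servers, and the sketch assumes SalesId is never negative. SalesMod is included in the primary key because SQL Server expects the partitioning column to be part of the key if you want to insert through the view (the first option).

```sql
-- Run on SQL0. On SQL1 and SQL2 the CHECK value and the WHERE filter
-- become 1 and 2 respectively.
CREATE TABLE dbo.FACT_Sales
(
    SalesId     BIGINT         NOT NULL,
    SalesMod    TINYINT        NOT NULL,  -- SalesId % 3, set by the load
    DateTimeId  INT            NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL,  -- hypothetical measure column
    CONSTRAINT PK_FACT_Sales PRIMARY KEY (SalesId, SalesMod),
    CONSTRAINT CK_FACT_Sales_SalesMod CHECK (SalesMod = 0)
);
GO
-- Option 2: the ETL process loads only this server's third of the rows.
-- Staging.dbo.Sales is a hypothetical staging table fed by the sales system.
INSERT INTO dbo.FACT_Sales (SalesId, SalesMod, DateTimeId, SalesAmount)
SELECT s.SalesId,
       s.SalesId % 3,
       s.DateTimeId,
       s.SalesAmount
FROM Staging.dbo.Sales AS s
WHERE s.SalesId % 3 = 0;
GO
```

The CHECK constraint does double duty: it rejects rows that belong on another server, and it tells the optimizer which server can possibly hold a given SalesMod value.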

When using this technique to federate your database, you must be very careful with your queries. You may find that if your dimensions are large, and you are using the dimensions to filter your data, you can end up with some extremely inefficient queries. If this happens you may wish to replicate some of the dimensions from the SQL_Main server to the three back end servers, and reference these replicated dimensions in your query. This will make your query much more complex, but if done correctly it can help the SQL optimizer make much more effective decisions. An example query could be using the DIM_DateTime dimension to filter your records.

SELECT *
FROM FACT_SalesData
JOIN DIM_DateTime on FACT_SalesData.DateTimeId = DIM_DateTime.DateTimeId
AND DIM_DateTime.Year = 2006

This could, under some circumstances, cause the SQL Optimizer to make some “interesting” decisions. Adjusting the indexes of the FACT tables will usually resolve this issue; however, in some cases it may not. SQL Profiler will be your best friend when attempting to resolve these issues, as it will allow you to see exactly what commands the SQL Server you are connected to is sending to the remote servers. If indexing doesn’t help, a more effective query plan could result from a query something like this.


SELECT *
FROM FACT_SalesData
LEFT OUTER JOIN SQL0.MyDataWarehouse.dbo.DIM_DateTime d0 ON FACT_SalesData.DateTimeId = d0.DateTimeId
AND d0.Year = 2006
AND FACT_SalesData.SalesMod = 0
LEFT OUTER JOIN SQL1.MyDataWarehouse.dbo.DIM_DateTime d1 ON FACT_SalesData.DateTimeId = d1.DateTimeId
AND d1.Year = 2006
AND FACT_SalesData.SalesMod = 1
LEFT OUTER JOIN SQL2.MyDataWarehouse.dbo.DIM_DateTime d2 ON FACT_SalesData.DateTimeId = d2.DateTimeId
AND d2.Year = 2006
AND FACT_SalesData.SalesMod = 2

By joining to all three servers’ DateTime dimensions, and specifying that each one should join only to the FACT_SalesData rows stored on its own server, SQL Server should restrict each query to the server holding that data, and return the subset of data that we are looking for. It may, however, be necessary to manually break up the queries against each server within their own UNION ALL blocks.

SELECT *
FROM SQL0.MyDataWarehouse.dbo.FACT_SalesData FACT_SalesData
LEFT OUTER JOIN SQL0.MyDataWarehouse.dbo.DIM_DateTime DIM_DateTime ON FACT_SalesData.DateTimeId = DIM_DateTime.DateTimeId
AND DIM_DateTime.Year = 2006
UNION ALL
SELECT *
FROM SQL1.MyDataWarehouse.dbo.FACT_SalesData FACT_SalesData
LEFT OUTER JOIN SQL1.MyDataWarehouse.dbo.DIM_DateTime DIM_DateTime ON FACT_SalesData.DateTimeId = DIM_DateTime.DateTimeId
AND DIM_DateTime.Year = 2006
UNION ALL
SELECT *
FROM SQL2.MyDataWarehouse.dbo.FACT_SalesData FACT_SalesData
LEFT OUTER JOIN SQL2.MyDataWarehouse.dbo.DIM_DateTime DIM_DateTime ON FACT_SalesData.DateTimeId = DIM_DateTime.DateTimeId
AND DIM_DateTime.Year = 2006

Be sure to use the UNION ALL clause, and not the UNION clause, so that the head SQL Server doesn’t try to do a distinct on these values. The rows are already going to be distinct between the servers, as the MOD value will be different on each one.

Depending on each situation you’ll need to make some decisions on which query technique gives you the best performance based on your specific data layout and dimension size. Different queries in your environment may have different query requirements.
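
One rough way to compare the techniques is to run each candidate with timing statistics turned on and look at the elapsed time reported in the Messages tab. This is only a sketch: it reuses the table names from the examples above, assumes FACT_SalesData has a SalesId column, swaps the outer joins for inner joins with COUNT(*) so that only matching rows are counted, and shows just two of the back end servers to keep it short.

```sql
SET STATISTICS TIME ON;
GO
-- Candidate 1: filter through the view on the head server.
SELECT COUNT(*) AS MatchingRows
FROM FACT_SalesData
JOIN DIM_DateTime ON FACT_SalesData.DateTimeId = DIM_DateTime.DateTimeId
WHERE DIM_DateTime.Year = 2006;
GO
-- Candidate 2: one block per back end server, each joining to its own
-- replicated copy of the dimension (add a third block for SQL2).
SELECT COUNT(*) AS MatchingRows
FROM (
    SELECT f.SalesId
    FROM SQL0.MyDataWarehouse.dbo.FACT_SalesData f
    JOIN SQL0.MyDataWarehouse.dbo.DIM_DateTime d ON f.DateTimeId = d.DateTimeId
    WHERE d.Year = 2006
    UNION ALL
    SELECT f.SalesId
    FROM SQL1.MyDataWarehouse.dbo.FACT_SalesData f
    JOIN SQL1.MyDataWarehouse.dbo.DIM_DateTime d ON f.DateTimeId = d.DateTimeId
    WHERE d.Year = 2006
) AS x;
GO
SET STATISTICS TIME OFF;
GO
```

Run each candidate more than once before comparing, as the first run will include the cost of loading data into cache on the back end servers.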

When working with a federated database platform it is especially important to have an experienced query writer writing the bulk of the queries against the database platform, to reduce as much as possible the number of poor execution plans run against the database engine.

Look for my next post on database federation where we look into another technique for federating your database.

Denny


Sep 15 2008   11:00AM GMT

Scaling the database out, not up.



Posted by: mrdenny
Federated Database, Scaling Out, Microsoft Cluster Service

When your database has grown beyond the performance capabilities of a single SQL Server, there are still ways to increase the system performance.  This requires a technique called federating the database, which is also known as scaling the database out.  When you increase a server’s capacity by increasing the CPU count within the SQL Server it is called scaling up the system.  When you increase a system’s capacity by adding additional servers to the system it is called scaling out the system.  By scaling out the system we add additional entire servers to the database, creating a database federation.  There are a couple of ways in which you can create the database federation.  The technique that you use will depend on your own system requirements.

A database federation is not a high availability solution.  The correct solution to use for a high availability solution would be Microsoft Cluster Service (MSCS) or Database Mirroring (SQL 2005 and up).

There are some potential down sides to federating your database which you need to be aware of in order to make an informed decision.

1.       If any server in the federation is taken offline, the entire database system will become unavailable.  This is because the way the federation works requires online access to all nodes of the federation.  As the database which is being federated is probably an important asset to the company, this risk can be mitigated by using clustering in combination with database federation to provide a high availability solution to build your database federation on top of.

2.       Licensing for a database federation is extremely expensive.  SQL Server Enterprise Edition must be used, as database federation requires the use of distributed queries, which is an Enterprise Edition only feature.  Another reason for Enterprise Edition would be the number of CPUs supported.  As the system is apparently CPU bound (which is one of the key reasons to use a federated database) you will want to use SQL Servers which have as many CPUs as possible in them.  This would lead you to select servers along the line of the HP DL700 series of servers, or the Sun Fire 4600 series of servers.  Use of these massive servers will decrease the number of servers in your federation, thereby making the federation easier to set up.

3.       The design of a database federation is not a simple task.  It requires an intimate knowledge of not only the database, but the entire application platform which works with the database back end.  In addition you need to have a solid grasp of not only the current system requirements, but of the far reaching expandability requirements of the database as well.  This knowledge is key, as changing the design of your database federation is an extremely complex task which, if not done correctly, can easily lead to hours of down time and poor performance while data is moved from one node of the federation to another.

While these are some pretty important things to think about, federating your database has some major upsides as well.

1.       By federating your database, you will increase the amount of data that can be loaded into cache, as each server loads its own data into its own cache.  This allows you to go well beyond the 64 Gigs of memory that Windows 2003 or Windows 2008 Enterprise Edition support.  With enough servers in the federation this will allow you to go beyond the 2 TB limit of Windows 2008 Data Center Edition.

2.       In addition to the additional data cache you have access to, you also get access to more CPUs than you would be able to fit in a single server, unless you were to purchase a very high end system such as an HP Superdome, or one of the IBM iSeries servers.

3.       Given that the data is laid across multiple servers, this increases the number of disk controllers or HBAs that you have access to, which can increase the available throughput to the disk.  It also increases the number of PCI busses which you have access to, thereby reducing contention as the data crosses through the HBAs or RAID controllers and through the PCI bus on its way to the CPUs and RAM.

 

Now that we’ve gone over some of the basics of the federated databases, read through my next few posts as I talk about the various techniques which can be used to federate a database, and we go through the design processes to use each one within your database environment.

Denny


Sep 4 2008   11:00AM GMT

Speaking about Federated Databases at the San Diego SQL Users Group



Posted by: mrdenny
In Person Events, Federated Database, Scaling Out

I’ve been asked to come back to the San Diego SQL Server Users Group on September 18, 2008.  This time around I’ll be speaking about Federated Databases, and the various techniques which you can use to federate your systems.

For those that can’t make it I’ll be speaking on this same topic at the SoCal Code Camp on October 25 and 26.

I’m still finishing up the slide deck and demos.  I’ll try and get them posted in advance.  If I can’t I’ll post them shortly after.

Bring your business cards for a drawing as I’ll be giving away a copy of Laptop Cop, the laptop retrieval product by Awareness Technologies (the company which I work for).

Denny