When setting up AlwaysOn Availability Groups you may receive error 41158, which references error 41006, when you attempt to join the replica to the Availability Group. What these errors mean in a nutshell is that it ain’t going to work with your current configuration.
Assuming that you ran through your SQL Server installation and went next, next, next through the install, this result is to be expected. The reason for this is that your SQL services are all running under local accounts which don’t have the ability to log into each other. There are two solutions to this problem at this point. The first is supported, the second isn’t.
Option #1 – aka. The Supported Option
Reconfigure the SQL Services which will be hosting the Availability Group replicas to run under a single domain account. Restart the services. Give the domain account that the services are running under sysadmin rights on each instance. The replicas should sync up automatically at this point. If they don’t, you can run the ALTER AVAILABILITY GROUP command with the JOIN option on the secondary replica to join it to the AG.
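As a rough sketch of what Option #1 looks like in T-SQL: the account name BACON\sqlservice, the group name DemoAG and the database name YourDatabase are placeholders I made up for illustration, so swap in your own. ALTER SERVER ROLE ... ADD MEMBER assumes SQL Server 2012 or later, which you're on anyway if you're building an Availability Group.

```sql
-- Run on every replica: make sure the shared service account has a login
-- and is a sysadmin. BACON\sqlservice is a hypothetical account name.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'BACON\sqlservice')
    CREATE LOGIN [BACON\sqlservice] FROM WINDOWS;
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [BACON\sqlservice];
GO

-- Run on a secondary replica that didn't join on its own.
-- DemoAG is a hypothetical Availability Group name.
ALTER AVAILABILITY GROUP [DemoAG] JOIN;
GO

-- Then join each database on that secondary. The database must already be
-- restored there WITH NORECOVERY. YourDatabase is a placeholder.
ALTER DATABASE [YourDatabase] SET HADR AVAILABILITY GROUP = [DemoAG];
GO
```

If the replica has already joined, the JOIN statement will throw an error telling you so, which is harmless in a demo lab.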
Option #2 – aka. Totally Unsupported, but works great for a demo
Add the domain computer account for each of the nodes of the cluster to each other’s SQL instances so that they can log in. For example, the four computers which I use for my demo are called ALWAYSON1, ALWAYSON2, ALWAYSON3, and ALWAYSON4. So on machine ALWAYSON1 I added the domain accounts BACON\ALWAYSON2$, BACON\ALWAYSON3$, and BACON\ALWAYSON4$ as members of the sysadmin fixed server role (again this is for my demo lab so I’m going for working, not secure). On machine ALWAYSON2 I added BACON\ALWAYSON1$, BACON\ALWAYSON3$, and BACON\ALWAYSON4$, and so on for machines 3 and 4. Once that is done the replicas should begin syncing up automatically. If they don’t, either use ALTER AVAILABILITY GROUP or use the UI to force a retry.
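For ALWAYSON1 that works out to something like the script below. It's a minimal sketch using the machine and domain names from my lab, and it assumes SQL Server 2012 syntax for the role membership. Again, sysadmin for computer accounts is demo-only territory.

```sql
-- Run on ALWAYSON1. Repeat on the other three nodes, each time swapping in
-- the computer accounts of the *other* three machines.
CREATE LOGIN [BACON\ALWAYSON2$] FROM WINDOWS;
CREATE LOGIN [BACON\ALWAYSON3$] FROM WINDOWS;
CREATE LOGIN [BACON\ALWAYSON4$] FROM WINDOWS;
GO

-- Demo lab only: sysadmin is far more than these accounts really need.
ALTER SERVER ROLE [sysadmin] ADD MEMBER [BACON\ALWAYSON2$];
ALTER SERVER ROLE [sysadmin] ADD MEMBER [BACON\ALWAYSON3$];
ALTER SERVER ROLE [sysadmin] ADD MEMBER [BACON\ALWAYSON4$];
GO
```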
Some seats are still available for my SQL Server 2012 class which kicks off March 19th, 2012 in Los Angeles, CA. If you are planning on deploying SQL Server 2012 in the near future this training class is for you. But don’t wait to get signed up. The sooner you get signed up the better off you’ll be.
This training class is 4 days long and will be focusing on 4 key areas of SQL Server 2012.
- Planning and Installation
- Mission Critical Deployments (aka High Availability, programming and migrations)
- Breakthrough Insights (aka BI)
- Manageability and Security
This full four day class includes not only lecture, but lots of hands on labs which are only available through this class. All this is available for just $1200 which covers all four days.
So get signed up today!
I’m sad to say that I’m going to have to cut back on the number of SQL Saturdays that I’ll be able to attend this year. It’s not because I love PASS, or SQL Saturday, or the attendees any less than I did before, because I don’t. I’m just so busy that I’ve been royally screwing up the whole work life balance thing so far this year. In the first two months of the year I’ve been home for something like 10 days, and 4 of those I was sick with the flu (by the end of March I’ll have been home for 2-3 weeks total depending on if a trip happens or is canceled).
Between work and the conferences like Tech Ed, SQL Days 2012, etc. that I’ll be at I just need to make sure that I’ll be at home at least a little bit so that Kris doesn’t kill me.
I’ll be at my local Code Camps and SQL Saturdays for sure (I’ve even got to leave SQL Saturday Huntington Beach early for my flight to SQL Bits), and I’ll be in Atlanta for sure. If there’s one the weekend before or after PASS I’ll try and hit that one. Other than that I’m afraid that I’ll probably have to keep it pretty light. Hopefully next year I can cut back all this other travel and get back on the SQL Saturday circuit a bit more.
Hopefully I’ll see you at one of the few SQL Saturday events that I’m able to attend, or one of the bigger conferences. SQL Saturday 120 is next, followed by SQL Bits.
Over the weekend I had the pleasure of presenting to the great folks at SQL Saturday 109. I’m pretty sure this was the largest SQL Saturday I’ve attended to date, with over 400 people attending the session. Because of the massive number of great speakers, including MVPs, Microsoft employees and local speakers, I only got to present one session. But my session went really well, and I think everyone who attended got something out of it.
Several people asked for my slide deck, which you can download here.
Hopefully everyone had as much fun attending the session as I did presenting the session.
One of the great features with SQL Replication is the ability to initialize a subscription from a backup instead of from a snapshot. The official use for this is to take a database backup and restore it to a subscriber, then replicate any changes made after the backup to the restored copy.
However this technique can also be used to get replication back up and running after moving the publisher to another SQL Server. Simply set up the publication just like normal, then back up the database and add the subscription using the “initialize with backup” value for the @sync_type parameter as shown in the sample code below.
If you were going to actually initialize a new subscription using a backup like the feature was written to be used, then after the backup has happened restore the database to the subscriber under the correct database name.
BACKUP DATABASE YourDatabase
    TO DISK = 'E:\Backup\YourDatabase.bak'
    WITH FORMAT, STATS = 10
GO
USE YourDatabase
GO
EXEC sp_addsubscription
    @publication = N'YourDatabase Publication',
    @subscriber = N'ReportServer',
    @destination_db = N'ReportingDatabase',
    @article = 'all',
    @sync_type = 'initialize with backup',
    @backupdevicetype = 'disk',
    @backupdevicename = 'E:\Backup\YourDatabase.bak'
GO
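If you are initializing a brand new subscriber, the restore step might look something like the sketch below, run on the subscriber once the backup file is reachable from that server. The logical file names (YourDatabase and YourDatabase_log) and the target paths are assumptions on my part. Run RESTORE FILELISTONLY against the backup to see the real logical names.

```sql
-- Run on the subscriber. Check the logical file names in the backup first.
RESTORE FILELISTONLY
    FROM DISK = 'E:\Backup\YourDatabase.bak'
GO

-- Restore under the name used for @destination_db.
-- The MOVE names and paths below are placeholders, not values from a real system.
RESTORE DATABASE ReportingDatabase
    FROM DISK = 'E:\Backup\YourDatabase.bak'
    WITH MOVE 'YourDatabase' TO 'E:\Data\ReportingDatabase.mdf',
         MOVE 'YourDatabase_log' TO 'E:\Logs\ReportingDatabase.ldf',
         RECOVERY, STATS = 10
GO
```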
This technique should work on SQL Server 2005 up through SQL Server 2012 without issue. As far as I’m aware the “initialize with backup” sync type isn’t available in SQL Server 2000.
So if you follow me on twitter you might have seen this tweet a little while back.
Since there aren’t many people out there that get the chance to buy and build a brand new data center from scratch, I figured that I’d go over the process with you. This is the first of who knows how many blog posts on the topic.
The first step in buying colo space and moving into it involves getting completely fed up with your current hosting company. Currently we are with a large managed hosting provider named RackSpace (I probably shouldn’t name them), and have become totally fed up with them. The costs are too high and we get almost nothing from their support team but grief. They have actually unplugged a firewall’s power cable in the middle of the day by accident. We actually have to have paper signs taped to the racks with the equipment which say to not touch anything in these racks between 6am eastern and midnight eastern without manager approval (or something to that effect) because it has happened so many times.
The first step to moving into your own CoLo (this process has taken about a year at this point) is to figure out how much processing power and storage you need to purchase. This doesn’t need to be an exact figure, but a rough estimate. This will eliminate some hardware options for you.
You also need to know what features you are looking for. Here are some questions that can help you figure these things out.
- Are you going to be virtualizing servers?
  - A few large VM hosts?
  - Lots of little VM hosts?
- Will you need storage level replication to another data center later on for DR?
- If you will be virtualizing servers, will you need to be able to setup a Windows cluster as a VM?
- How long do you need to keep backups for?
- How much data growth is expected?
  - Over one year?
  - Over two years?
  - Over three years?
- What IO rates need to be supported?
- How much IO throughput needs to be supported?
So let’s break these questions down a little bit.
Are you going to be virtualizing servers?
This one is pretty much a given. Most every company should be virtualizing at least some of their servers. If nothing else, things like domain controllers and other infrastructure servers should be virtualized. It just doesn’t pay to have physical servers sitting around using 1% of the CPU all day. Other servers like web servers and app servers are also usually a no brainer when it comes to virtualizing them. The big questions come down to your mission critical servers: SQL Server, Oracle, Exchange (yeah I know, it’s not mission critical, but just wait for Exchange or mail to go down then tell me it isn’t mission critical), SAP, etc. These machines may or may not be able to be virtualized.
It’s OK to have some machines be virtual and others be physical. In the case of this project everything is virtual except for the SQL Server cluster (too large to be a VM), the vCenter management server (cause I’m old school and want it physical), monitoring (it’ll run on the vCenter server for the most part), and some physical appliances which have to be racked. All the web, file, and infrastructure servers will be VMs.
In our case we are going with a few larger hosts instead of a bunch of smaller hosts. As we got through the hardware review process we landed on Cisco UCS blades and servers. For the VMware hosts we are running on several of the dual socket, 8 core per socket blades with something like 96 or 128 Gigs (might be even more at this point) of RAM per blade.
For the SQL Server cluster we are also using blades as they ended up being less expensive than their physical counterparts. The SQL Server blades are quad socket, 8 core per socket blades with 256 Gigs of RAM per blade. We didn’t pick these blades for the VMware hosts because it was actually cheaper to have the dual socket blades over the quad socket blades, and nothing that will be a VM will be getting more than 4 or 6 vCPUs so having the smaller blades isn’t an issue.
Will you need storage level replication to another data center later on for DR?
If you are planning on building a DR site at some point in the future this is important to know now. It would really suck to buy a storage solution that doesn’t support this when you will need it in the future. Just because you will need it doesn’t mean you need to buy the replication software now, or set up the second DR site now. But you need to plan ahead correctly for the project to ensure that everything that you want to do with the hardware is supported. Nothing sucks more than having to go to management in the middle of the DR build and tell them that all that storage that you’ve purchased will be useless and needs to be replaced, not only at the DR site but also at the primary site. Issues like this can delay DR build out projects for months or years as you now have to pause the DR build out (probably while still paying for the DR site and equipment), buy and install new storage, migrate to that storage, then restart the DR project and start up the replication.
In the case of this project management said that yes, we will want to spin up a DR site, probably within a couple of years, so this limited our search for equipment to storage platforms which fully supported storage level replication. This includes having consistency groups so that sets of LUNs are kept in sync together (kind of important for databases, Exchange, etc), integration with the Windows VSS provider, support for snapshots, etc.
Now if your storage doesn’t support replication, or you want to have a nice expensive storage array at the primary site and a much less expensive storage solution at the DR site, you can look into EMC’s RecoverPoint appliance. It supports replication between two storage arrays and doesn’t even require that they be the same brand of array. It isn’t a cheap solution, but if you’ve got a million dollar solution in one site and a $100k solution in another site RecoverPoint might be a good fit.
If you will be virtualizing servers, will you need to be able to setup a Windows cluster as a VM?
The reason that this question needs to be asked is to ensure that the storage array supports iSCSI. The only way to build a Windows cluster as a VM is to use iSCSI to attach the VMs to the storage directly. Most every storage array supports iSCSI these days, but there are some that don’t so this is important to know.
How long do you need to keep backups for?
As much as we all hate dealing with backups, backups are extremely important. And keeping backups for a period of time will save you some headaches in the event that a backup becomes corrupt. Also there might be regulations on how long backups are kept around for. Your SOX auditor might have a requirement, as might your HIPAA auditor and your PCI auditor. You just never know what these guys might throw at you.
Then there’s the question of off site backups. Having backups is great, but you need to get those backups off site in case something happens to the building that the backup system is in. You’ve got a couple of different options here.
- Go old school and have Iron Mountain or someone pull the tapes and store them somewhere.
- Get a virtual tape library (VTL) and back up to that. Then get a second VTL and put it in an office or another CoLo and replicate between the two.
- Put your backups on a LUN and replicate that LUN to another facility.
- Outsource the backups to the CoLo.
Option 1 is the way that it’s always been done. It’s reliable, slow and can be pretty costly. Option 2 is a pretty new concept, probably just a few years old now. It can work, if your backups are small enough and if you’ve got enough bandwidth. Storing a month’s worth of backups can take a LOT of space. Option 3 probably isn’t the greatest unless the only backups to worry about are the SQL Server backups, as SQL can handle the purging of backups itself. Option 4 is worth looking at. Depending on the amount of space needed and what your CoLo charges it might be worth it to have the CoLo handle this for you.
In the case of this project we went with a combination of options #1 and #2. We have a VTL to back up to so that the backups run very fast (a VTL is basically just a separate storage array that is only used by the tape backup software and includes compression and deduplication to reduce the size of the backups). So we will back up to the VTL then copy the backups to tape. Then Iron Mountain will take the tapes off site for us. The VTL will hold about 2 weeks worth of backups on site, which we’ll have a second copy of on tape. Once we have the DR site we’ll get another VTL and replicate to that, probably increasing its storage to 4-6 weeks, and dump the need for the tape and offsite backups as everything will be backed up in two different CoLos in two different cities.
How much data growth is expected?
Knowing how much space you need today is important. Knowing how much space you need in 3 years is more important. Just because a storage array supports your data size today doesn’t mean that it will support it in 3 years. We use three years for a couple of reasons. First that’s typically how long the maintenance contract on the hardware is. Second that’s typically how long the financing term is for these kinds of purchases.
If you have 20 TB of space needed today, but in 3 years you’ll need 80 TB of space that’ll drastically change the kind of equipment that you can purchase.
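If you don’t have growth numbers handy, the backup history in msdb is one place to start for the SQL Server side of things. This is just a sketch: it only covers databases that are being backed up on that instance, it only goes back as far as the history that has been kept in msdb, and it tells you nothing about file servers or anything else that isn’t SQL Server.

```sql
-- Average full backup size per database per month, in GB,
-- taken from the backup history that SQL Server keeps in msdb.
SELECT  database_name,
        YEAR(backup_finish_date)  AS backup_year,
        MONTH(backup_finish_date) AS backup_month,
        CAST(AVG(backup_size) / 1073741824.0 AS decimal(12, 2)) AS avg_full_backup_gb
FROM    msdb.dbo.backupset
WHERE   type = 'D'   -- 'D' = full database backup
GROUP BY database_name, YEAR(backup_finish_date), MONTH(backup_finish_date)
ORDER BY database_name, backup_year, backup_month;

-- Compound annual growth rate implied by going from 20 TB to 80 TB in 3 years.
-- The CAST to float matters: POWER on a decimal input rounds to that input's scale.
SELECT POWER(CAST(80 AS float) / 20, 1.0 / 3) - 1 AS annual_growth_rate;
```

The second query is only there to put the 20 TB to 80 TB example into a yearly rate that you can sanity check against the trend from the first query.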
What IO rates need to be supported?
How much IO throughput needs to be supported?
These two questions go right along with the prior one: how much IO needs to be supported, and how much throughput needs to be supported? These numbers will tell you if you need an array which supports flash drives, and how many drives need to be supported. Without these metrics you are totally shooting in the dark about what you actually need.
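For the SQL Server portion of the load you can get a rough feel for both numbers from the sys.dm_io_virtual_file_stats DMV (SQL Server 2005 and up). The counters are cumulative since the instance started, so the sketch below takes two samples and looks at the difference. The 60 second window is an arbitrary choice of mine, and a single sample says nothing about your peaks, so run it at the busy times of day and treat the output as a starting point rather than a sizing number.

```sql
-- First sample of the cumulative per-file IO counters.
SELECT  database_id, file_id,
        num_of_reads, num_of_writes,
        num_of_bytes_read, num_of_bytes_written
INTO    #io_start
FROM    sys.dm_io_virtual_file_stats(NULL, NULL);

-- Sample window: 60 seconds (arbitrary; keep it in step with the 60.0 divisors below).
WAITFOR DELAY '00:01:00';

-- Second sample minus the first, rolled up per database.
SELECT  DB_NAME(e.database_id) AS database_name,
        SUM((e.num_of_reads + e.num_of_writes)
          - (s.num_of_reads + s.num_of_writes)) / 60.0 AS avg_iops,
        SUM((e.num_of_bytes_read + e.num_of_bytes_written)
          - (s.num_of_bytes_read + s.num_of_bytes_written)) / 1048576.0 / 60.0 AS avg_mb_per_sec
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS e
JOIN    #io_start AS s
        ON  s.database_id = e.database_id
        AND s.file_id = e.file_id
GROUP BY e.database_id
ORDER BY avg_iops DESC;

DROP TABLE #io_start;
```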
Once you’ve gotten all these questions answered you’d think that it’s time to start looking at hardware, and you’d be wrong. It’s time to go to management and get this thing approved to move forward. Join me next time as we look at that process.
P.S. This series will be at least half a dozen posts long. I’ll be tagging all of them with the tag “Building a new CoLo” to make it easier to follow just these posts via RSS if you aren’t interested in the rest of my stuff.
My SQL 2012 class is just a few short weeks away, but there are still seats available for the class. Take the time and get signed up now for this great four day class where we will be diving into SQL Server 2012 with loads of hands on labs to really get you ready to deploy SQL Server 2012 as soon as it is released.
As this class is all about the hands on part, we won’t just be going through four days of lecture. Instead we will be doing a great combination of lecture and lab so that you know not just the theory from slides but will have actual hands on experience using the product to work through real-life scenarios.
Hopefully you’ll get signed up today and I’ll see you at the class next month.
So while installing a new Cisco UCS system with one of the newer builds of the firmware, 2.0(1s), we were getting an error with both chassis which said they were in an unsupported-connectivity state. The config that we set up was pretty basic and straightforward. We have two UCS Fabric Interconnects, with two chassis, and two cables from each blade in each chassis going to the Fabric Interconnects. There’s a diagram over there of what the system looks like for one of the two chassis (I love the fact that the management app makes nice diagrams like this).
When you configure a Cisco UCS system you tell the system how many cables the Fabric Interconnects should use to discover each chassis. At least that is how the screen is worded. What the setting actually means is how many cables the Fabric Interconnects should expect to see between each Interconnect and each blade in the chassis. We configured the setting for “Platform Max” in case we decided to add more cables later, even though we only had two cables now and only planned on having 2 cables (as shown in the diagram) for the time being. With the setting this way we had this strange unsupported-connectivity state error showing up for each of the chassis.
To fix this problem we had to change the discovery policy from “Platform Max” to 2 as shown in the below screenshot.
To make this change, open UCS Manager and select the Equipment tab in the left menu. On the right select the Policies tab. In the new lower tab menu select “Global Policies”. From there you can change the Chassis Discovery Policy which you can see in the screenshot below.
Hopefully this helps you if you run into this problem.
SQL Server is a damn good product, but it sure isn’t perfect. Like any good product out there, people have come up with things that can be bolted onto the core to make SQL Server even better. Without these bolt on parts SQL Server looks a little dull. These bolt on parts may not make the engine run better, but they make it look a lot better and that makes us want to make the SQL engine run better.
Some of my favorites (in no particular order) include:
The SSMS Tool Pack is a great add on for SQL Server Management Studio. It’ll save you if SSMS crashes by auto saving all those unsaved SQL Scripts for you. It’s got a great feature to help you read execution plans, a way to easily run a script against multiple databases, various templates, and much more.
Michelle (aka SQL Fool) has written a great index rebuild and defrag stored procedure that anyone who is walking into a shop which isn’t doing maintenance can take and throw onto the servers and happily know that the SQL Server will have some good maintenance being done automatically. The script will do rebuilds online when possible, offline when it must, and figures out the order that things should be done in.
Adam has written sp_whoisactive, and this is probably the gold standard for looking at what is causing SQL statements to wait, getting their execution plans, and a lot more. I’m pretty sure that there is a switch in there somewhere that will tell sp_whoisactive to make me breakfast. Adam has included loads of ways to filter the output so you can quickly and easily filter out all the spids that you don’t care about and get into the ones that you want. You can even control the formatting of the output in a variety of ways so that it fits your needs.
I’m cheating a little on this one, as I’m the one that wrote sp_who3, but it’s my list and I’m allowed to do that. sp_who3 will normally show the same output as its mild mannered cousin sp_who2. But when you call sp_who3 and pass it a spid that you are looking for, a massive dump of information about that spid is returned. This dump includes the current statement which is being processed, the entire batch which is being processed, all the information formatted like the old sysprocesses table about all the threads for the SPID (very useful when seeing CXPACKET waits) and a ton of locking information. While the output isn’t very pretty, it’s functional. Personally I use sp_who3 to dig into parallel queries after I’ve done the initial identification of the problem using sp_whoisactive. (While the site only says SQL 2005 as the newest version, that version works just fine on anything newer than SQL 2005.)
Now go download and install these bolt-ons to your SQL Servers. I’ll wait…
Now that you’ve got all these bolt on parts installed, can’t you see how much nicer it is to work on the SQL engine? It’s easier to get at the information that you need. It’s easier to keep the system up and running. And you want to work on the system more now that it’s prettier and easier to work on. Much like my motorcycle is much prettier now that it has all those shiny parts bolted onto it.
So I think it’s been kept pretty quiet so far, as I’ve been insanely busy the last few weeks, but I am so thrilled to say that I’ve been picked to present a pre-con at the SQL PASS Rally in Dallas, Texas either on May 8th or May 9th (I haven’t been told which day yet). Either way I can’t wait to come down and talk about storage and virtualization with you, yes you, for the whole day. If that sounds like something you’d be interested in then get signed up (as soon as registration opens) and we’ll have some fun learning and making fun of your favorite storage admin.