While working on a new part of our production system we begun getting the following error message back from the SQL Server.
Server: Msg 8623, Level 16, State 1, Line 1
Internal Query Processor Error: The query processor could not produce a query plan. Contact your primary support provider for more information.
Now I’ve seen this error before, but it was supposed to have been fixed in Cumulative Update 3 for SQL Server 2005 SP 2 but apparently it hasn’t been. I’m currently running build 3239 which is Cumulative Update 7. (I’m running the x64 edition.)
The only difference between my query and what’s documented in the KB article is that I was doing a basic insert statement.
INSERT INTO Table3
There are obviously no complex queries being run here. However I did have relational integrity enabled through foreign keys. I ended up having to remove the foreign key constraints from this table for the insert statement to work correctly.
Hopefully someone finds this info useful.
You can follow along with the Microsoft development teams progress on Microsoft Connect. It’s feedback number 332815.
The service broker queue is where the messages are stored after they are delivered, but before they are processed. They are kind of like tables as you insert data into them (sending the message) and you view the data in them (receiving the message). While tables can have triggers, queues can have what are called activated procedures. If a queue has an activated procedure defined when the message arrives in the queue the queue runs the procedure. No data is passed to the procedure, that procedure still needs to do a receive against the procedure (if that’s what actually needs to be done).
Setting up an activation procedure is a bit tricky as you have to have the queue to create the procedure, and you have to have the procedure to set it as an activation procedure. So what I like to do is create the queue, then the procedure, then alter the queue to setup the activation. This way I can send messages to the queue right away without worrying about getting errors because the procedure doesn’t exist.
Unlike tables, you do not get to define the structure of the queue. All data is stored within the message_body field, so if you need to send more than one value it’s recommend that you use an XML document to send the data. The syntax to create the queue is very simply.
CREATE QUEUE [QueueName]
Because the queue does hold the messages you can define what file group you would like the data within the queue to be stored in. If you have a business need to keep the messages around after processing you can specify the RETENTION keyword setting it to ON. By default queues are enabled when they are created. If you do not want to allow messages to be delivered to the queue specify the STATUS keyword setting it to OFF.
Every Service Broker conversation requires a contract. The contract defines which message type or types will be used, as well as who (sender or receiver) can use which message types. The contract is specified when the conversation is created.
The syntax for creating the contract is pretty straight forward.
CREATE CONTRACT [ContractName]
(MessageType SENT BY ANY)
The SEND BY portion can be used as many times as is needed for each Message Type which is defined on the contract. It must be listed at least once as you can not have a contract with no message types. You can specify that the Message Type be used by the INITIATOR, the TARGET or by both (by using the ANY keyword).
A more complex Contract would look something like this.
CREATE CONTRACT [ContractName]
(MessageType_I SENT BY INITIATOR,
MessageType_T SENT BY TARGET)
On 6/4/08 at noon PST (3pm EST) I’ll be presenting a MSDN Geek Speak session entitled “Spatial Data in SQL Server 2008 with Denny Cherry”. In the session we will explore the new support for spatial data in Microsoft SQL Server 2008. I will also be talking about managing huge SQL Server databases, and answer your questions about SQL Server. The Microsoft hosts will be Mike Benkovich and Mithun Dhar.
It’s a very free form session where we rely on Audience Q&A to really drive the direction of the session.
Here is the URL to register for the session. Just click the URL and register, you’ll get an email reminder, etc from the Microsoft Events system.
I hope everyone can attend. I hope to make my first geekSpeak a huge success.
The SoCal Code Camp is coming up shortly (June 28th and 29th). I’ve just finished posting my abstracts on the site for all to see. I’ve tagged them all with SQL Server to make them easier to find. I’ve currently got about 6 sessions on there.
I’m crazy I know.
If you aren’t in Southern California this might be a great way to get the boss to pay for a weekend in San Diego, California.
There’s also a free dinner which is called the Geek Dinner.
I hope to see you all there.
So, a little while ago I was having a major problem at work that I’d like to share in the hopes that someone else finds this information useful. (When I was having this problem, Google wasn’t able to find anything for me.)
The symptoms that we were having in a nut shell was that the tempdb was taking all the drive space which was allocated to it. On this system, tempdb is sitting on it’s own 50 Gig drive, so SQL can take the entire thing if it needs to. The problem was that sp_spaceused was showing that the entire database was empty, but we were getting out of space errors when running queries which needed to use tempdb. First thing we did was create another tempdb file off on another drive which is a 200 Gig drive. This gave us space to work with while we tried to figure the entire mess out.
The first thought that we had (well OK, not the first) was that we were having the problem described in MSKB 947204. The solution to this KB Article was to apply SQL 2005 SP 2 Update 6. So we did this, but to no avail.
Now, I need to take a little break and give you some additional background before we continue.
About 2 months ago we made a hardware swap on the SQL Server. The old server wouldn’t take a memory upgrade, so we got a new server with the correct amount of RAM (old server had 8 Gigs, new server had 16) and did a fork lift migration to the new server. The migration went fine, totally as expected.
Now another piece of information that you’ll need is that we are using the service broker heavily within our database.
What we didn’t notice, and what was eventually found to be the root cause of the problem, was that the service broker wasn’t working correctly. The reason we didn’t notice it was that there were never any error messages. Messages were being sent, but they never actually got delivered.
There were no errors being logged in the errorlog, on in SQL Profiler (when monitored using either the Errors and Alerts or Broker monitors). The error could only be seen by querying the sys.transmission_queue. In the transmission_status column was some sort of error along the lines of “An error occurred during decryption”. It also gave some sort of error about the database master key.
Working off of the database master key error I did an ALTER MASTER KEY and reset the master key, but that didn’t help. I then decided to reset the connection between the master key and the service key by using the ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY command. This took forever to execute, which I assumed was a good thing since I didn’t see any blocking. Once it was finished the CPUs on the SQL Server started running very hot, and the disk IO went up very high. All the CPU was being taken by SPIDs 16 and 20-24 (or so) with a command of BRKR TASK (Broker Task). So the service broker was now chewing on all the messages in the system (we had no idea how many there were, we estimated it at millions). After a while of watching the system process messages which were no longer relevant we decided that we needed to kill all the messages in the service broker as quickly as possible. Fortunitly the few messages which we would need to be able to keep we would be able to easily recreate.
I wrote a nice little script which queried the top 100 conversation IDs from the sys.transmission_queue DMV and did an END CONVERSATION WITH CLEANUP on them to get rid of them. This was then wrapped in a loop so that it would kill messages in-mass. Because of the massive load on the database after 2 days of running only about 350,000 conversations had been closed. During this two day window the SQL service had to be restarted twice because tempdb had grown to over 200 Gigs with 198 Gigs of that showing free space, but the system still reporting out of space messages.
We decided that a more drastic approach was needed. We thought about deleting the queues, but that wouldn’t help. Service broker would simply hold the messages until the queues were back then start processing the messages again. I thought about deleting the Service Broker Service, and tried it but the process hung waiting for the disk, as I assume it was removing all the messages which went with that service. I didn’t want that sort of blocking and load on the system, so I killed that.
This is when I remembered that all messages being sent or received are done to a specific service broker GUID, not to a specific database name. This gave me an idea. If I change the GUID, the messages which the service broker is chewing on will become invalid and SQL will remove them as bogus messages. So I stopped the SQL Service, and restarted it in single user mode with a minimal config (not actually needed, but I needed to get all the users out of the database, and with 30,000 people hitting the system 24x7x365 this was just the easiest way to do it). I then did an ALTER DATABASE MyDatabase WITH NEW_BROKER. This changed the GUID of my database to a new GUID which stopped SQL from processing the messages. I restarted the service normally and brought the system back online. The tempdb is sitting at 30 Gigs.
Some of the queries which will come in handy to see if you are having this same kind of issue are:
The first one is pretty self explanatory.
If there are records in the sys.transmission_queue with an error about the master key, or decryption.
SELECT TOP 10 *
The second value is the important one here. The internal_obj_kb is the amount of space taken in the tempdb by internal objects like table variables, stored procedure input parameter definitions, etc.
SUM (user_object_reserved_page_count)*8 as usr_obj_kb, SUM (internal_object_reserved_page_count)*8 as internal_obj_kb, SUM (version_store_reserved_page_count)*8 as version_store_kb, SUM (unallocated_extent_page_count)*8 as freespace_kb, SUM (mixed_extent_page_count)*8 as mixedextent_kb FROM sys.dm_db_file_space_usage
This query shows the number of pages and the number of Megs that the internal objects where using.
SELECT SUM(internal_object_reserved_page_count) AS [internal object pages used],
(SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in MB]
By using these queries we were able to find out that we had Gigs of space assigned as reserved for internal objects, even those sp_spaceused was showing us that all the space in the database was available because there wasn’t actually any data written to it from a physical object. We normally have between 7 and 15 Megs of internal object space, but until the root cause of the issue was resolved we were running about 100 Gigs of internal object space.
So if you have moved from one server to another, and you use SQL Service Broker, and you aren’t getting messages processed because of a master key or decryption error, and your tempdb is filling up like crazy, this might be your problem.
I hope that no one ever has this problem, but if you do and you have run across this posting, I hope that you find it useful in fixing your problem.
EMC has posted a bunch of video’s from EMC World online for all to see.
[kml_flashembed movie="http://www.youtube.com/v/n_SSvKXsgKw" width="425" height="350" wmode="transparent" /]
You can see all of EMC's video's here.
EMC World was a blast this year. I’ve posted some highlights of the event through out the week. I’ll continue to post as much of the info as I can through out the next few weeks.
Wednesday at EMC World was a session packed day to be sure.
I started my morning with CLARiiON Rebuild Settings and Data which was an in depth look at how exactly the CLARiiON systems handle rebuilds, and how long various rebuilds take. In addition we went into detail as to how the CLARiiON will not see that a disk is going to fail, and instead of waiting for it to fail and then having to rebuild it will actually copy the data from the disk ahead of time to a hot-spare then mark the disk as failed so that it can be replaced before the disk actually fails there by providing you with no point in time where your data is unprotected.
Another session went through the changes to the EMC PowerPath product and all the new features they are building into PowerPath such as the encryption of data when it leaves the server on its way to the disk.
Another session went through some tuning tips an tricks for getting the best performance from the EMC CLARiiON product. These include the strip size, cache settings at both the LUN and SP level.
The night ended with the Billy Crystal performance. I figured that the show would be funny. I was wrong, it was hilarious. Like everyone else I’ve seen Billy Crystal on TV and in Movies and thought he was funny, but in person he was probably the best comedy performance that I had ever seen.
Yesterday was an very busy day. I didn’t have time to think, much less put together a post about it. I hit every session which I was looking for including the always hard to get into Navisphere Manager Hands-on workshop.
The session I probably got the most out of was the session on what’s new in the FLARE version 26 which was released a few months ago.
FLARE 26 now supports Active/Active presentation of the LUNs. What this means is that in the event of a fibre cut on either the front end or the back end the host machine (Server) will no longer need to trespass the LUN to the other SP. The LUN can simply send the IO request to the other SP. The non-preferred SP will then forward the request to the preferred SP automatically for completion. Upon the preferred SPs connectivity coming back online the requests will then be sent to the preferred SP. The newest version of PowerPath is required for this to work, or the native multipathing driver such as the Windows 2008 driver must support ALVA.
The support for supporting a broken connection between the host and the storage is from ALVA. The support for handling the request when the connection is broken between the SP and the DAE is an EMC only extension of ALVA.
FLARE 26 also includes RAID 6 support. When comparing RAID 6 with RAID 5 on the same system read performance will typically be better as the data is spread across all the drives in the RAID 6 array. Unlike a lot of other systems the EMC CLARiiON array spreads the parity sectors of (RAID 5 and) RAID 6 across all the drives in the RAID Group. So because there is an extra drive in the array a 4+2 RAID 6 RAID Group will give better read performance that a 4+1 RAID 5 RAID Group. When doing a full strip write the write speed between a RAID 5 and RAID 6 array will be basically the same. When doing smaller writes a RAID 5 array will have a faster write time than a RAID 6 array because RAID 6 has the extra parity to account for. The rebuild times for rebuilding after a failed drive will be about the same between a RAID 5 and RAID 6 array which have suffered a single drive failure. If the RAID 6 array has to recover from a dual drive failure it will take longer to recover than the single drive failure as the data must be recalculated from the two parity bits rather than from a single parity bit. However the odds of a dual disk failure are slim.
Just like with RAID 5 within the CLARiiON the RAID 6 supports the proactive hot spare. This is where when the system sees that a drive is going to fail it will automatically copy the data from the failing disk to a hot spare and mark the disk as bad. As the data does not have to be rebuilt this is a very quick operation.
FLARE 26 now supports a Security Administrator role. Members of this role have no access to the storage settings it self. They can only create accounts within the Array.
A very important change is that the SPs can now be setup to sync thier system time to a networked NTP time server. This will force the time on the SPs to be the same. Until now the times could end up getting a little off which could make tracking down event information very hard to do as the log entries would have different times on each SPs log file.
FLARE 26 now supports replication over the built-in iSCSI ports on the new CX3 line of systems. This is a great change as before you had to use the iSCSI ports on a FC-IP switch to do this replication. This includes SAN Copy, MirrorView, etc.
MirrorView /S should only be used for connections within ~100 miles as beyond that you start to get to much latency between the arrays.
Starting later this year (Q3 or so) there will be an extension to MirrorView /S called MirrorView /SE (Cluster Enabler) for Microsoft Cluster Service. This will give you the ability to use CLARiiON to setup a geographically disbursed cluster. In other words you can have servers in two different cities setup in a Windows Cluster.