During TechEd Europe I got a very scary phone call. A very large table was throwing errors that the value being inserted into the primary key column was overflowing the data type that makes up that column. In this case the data type in question was INT, so we were trying to stick the number 2,147,483,648 into the column and that number just doesn’t fit. The error that we got looked something like this:
System.Data.SqlClient.SqlException: Error 8115, Level 16, State 1, Procedure SomeStoredProcedure, Line 103, Message: Arithmetic overflow error converting IDENTITY to data type int. Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
Ourshort term solution was quick and easy, to change identity seed of the column from 0 to the lowest possible number, in this case -2,147,483,648 which got us back up and running pretty quickly. This however isn’t a very long term solution for this system. The application is only about 3 or 4 years old, and has been growing very quickly over the last year or so. We estimated based on the last few weeks worth of data that we would run out of negative numbers within about 12 months at the most. We sent an estimate of 9 months to the business when we advised them that the system was back up and running. We also told them that we wanted this fixed within 5-6 months to be save because if we didn’t get this fixed before running out of negative numbers there wouldn’t be any short term fix and that we’d be looking at a multi-day outage to fix the problem.
We couldn’t just rush into a database side fix, as the fix in this case is to change the data type from INT to BIGINT. As the application does use this column in a couple of places the .NET application needed to be reviewed to ensure that anything that was looking for an INT was corrected to handle the BIGINT correctly.
Based on the amount of data within the table (about 300 Gigs) it was decided that taking an outage to make the change in place wasn’t really an option as doing the size change in place would require somewhere around a 5 day outage to remove and rebuild all the non-clustered indexes. To make things a little more complex there is a second table which has a 1=1 relationship with this table, and the second table is even larger (about 800 Gigs), though thankfully the second table doesn’t have any non-clustered indexes.
The solution that was decided on was to move the data from the current table to a table with a BIGINT data type while the application was running. To do this meant that we needed to get all the data copied over to the new table and in sync while the old table was being used. It was decided that the easiest way to do this would be to use triggers. In this case instead of one complex trigger to handle all insert, update and delete operations three separate triggers were used for each of the two tables. First I created two new tables, with the exact same schema as the old tables, with the exception that the new tables used the bigint data type for the primary key instead of the int data type for the primary key and the new table was setup with the ident being the next available positive number. Once that was done the triggers were setup. The insert trigger is pretty basic. Take the data that was just loaded and stick it into the new table.
CREATE TRIGGER t_MyTable1_insert ON dbo.MyTable1
SET IDENTITY_INSERT MyTable1_bigint ON
INSERT INTO MyTable1_bigint
(Col1, Col2, Col3, Col4…)
SELECT Col1, Col2, Col3, Col4
SET IDENTITY_INSERT MyTable1_bigint OFF
The update and delete triggers required a little more logic. The trick with the triggers was that I needed to avoid doing massive implicit data conversions. In order to ensure that SQL was doing what I wanted (which it should be doing anyway, but it made me feel better doing explicit conversions) I explicit conversions into place for the JOIN predicates as shown. The update trigger is shown first, then the delete trigger.
CREATE TRIGGER t_MyTable1_update ON dbo.MyTable1
SET MyTable1_bigint.[Col2] = inserted.[Col2],
MyTable1_bigint.[Col3] = inserted.[Col3],
MyTable1_bigint.[Col4] = inserted.[Col4],
WHERE cast(inserted.Col1 as bigint) = MyTable1_bigint.Col1
CREATE TRIGGER t_MyTable1_delete ON dbo.MyTable1
DELETE FROM MyTable1_bigint
WHERE Col1 in (SELECT cast(Col1 as bigint) FROM deleted)
Once these tables were up and running all the new data changes were being loaded into the table. At this point it was just a matter of coping the existing data into the table. There are a few ways that this can be done. In this case I opted for an SSIS package with a single data pump task, and two data transformations within the data pump task with one transformation for each table.
In order to make the load as fast as possible I used the fast load option and loaded 1000 rows at a time. Within the data task if there was an error I redirected the rows to another data pump task which simply dropped the rows into the same table, but this time going row by row. Any failures from that import were simply ignored. While handling failed rows like this is time consuming it is easier than running T-SQL scripts to verify which rows are needed and which rows aren’t needed. SSIS also gives an easy option to ignore the foreign key relationship between the two tables so if the child table gets rows first that isn’t a problem as we know that the parent table will catch up. The SSIS package looked like this:
When all is said and done and the data is in sync between the new and old tables, the current tables will be dropped and the new tables will be renamed and put into place so that the application can continue to run without issue, with just a few minutes of downtime.
So why did this happen? When the applications database was being designed the developers didn’t think about how many rows the database was going to get over time, so they didn’t account for needing to support more than 2.1 billion rows over time. If I (or another data architect) had been involved in the project at it’s start this hopefully would have been caught at design time. However when the application was first being designed the company was brand new and didn’t have the funds for a data architect to help with the application design so this problem was missed.
Hopefully you never hit this problem, but if you do this helps you get out of it.
One of the companies which I work with has the occasion to create new databases when they do releases of their software. Normally this isn’t a problem, except that they are setup to use maintenance plans to handle the backup and pruning of their transaction logs. As all the new databases are created in the full recovery model this can end up causing some problems for them as within 12 minutes they start getting emails saying that the transaction log backup job has failed. And these emails will keep coming in, possibly for hours until the full backup job kicks in later that night.
To solve this problem, I added a DDL trigger to the server which will cause the new database to be backed up as soon as the database is created. The trigger itself is rather simple. Most of the trigger is setting variables. Then I make sure that the database isn’t a database snapshot, as database snapshots can’t be backed up. If it isn’t a snapshot we continue with everything else.
Then I create a subfolder for the backups to be put into (the backups for each database go into their own folder, so as this is a new database the folder needs to be created). Then I commit the transaction, as database backups can’t be taken within a transaction. Then we do the actual database backup. I then throw a message to the end user using the RAISERROR statement telling them that they can ignore the other error that they are going to get about the transaction being closed before the trigger was complete. This is just an annoyance of my needing to commit the transaction before taking the backup. Sure I could have setup a job which takes the backup and emails if there was a failure, but that just seems to complex for something so simple. The code for my trigger is below.
CREATE TRIGGER BackupNewDatabase
ON ALL SERVER
declare @database sysname, @event_data XML = EVENTDATA(), @folder nvarchar(4000), @file nvarchar(4000)
SET @database = @event_data.value(‘(/EVENT_INSTANCE/DatabaseName)’, ‘sysname’)
set @folder = ‘X:\Backups\’ + @database
set @file = @folder + ‘\’ + @database + ‘.bak’
if exists (select * from sys.databases where name = @database and source_database_id is null)
EXEC master.dbo.xp_create_subdir @folder
BACKUP DATABASE @database to disk=@file
raiserror( ‘You can ignore the error message which says that the transaction ended within the trigger.’, 16,1)
Hopefully you find this solution helpful if you get into a situation like this,
Today was the 1st day of the SQL PASS Summit and the keynote was full of some great announcements and demos. The keynotes are a massive with almost all of the 3894 conference attendees from 57 different countries.
All of the keynotes from this years conference not only are the keynotes being streamed live, but many of the sessions are also being streamed live via the SQL PASS website on PASS TV. The full schedule can found on the PASS TV page.
Bill Grazino told us turning his keynote that pass has sponsored 9 non-PASS events around the world. Including these 9 events, pass has provided well over 500,000 hours of training to members of the community, much of it like SQL Saturday’s, User Groups, etc. being available to the attendees free of charge. PASS has created a business analytics specific conference which will be held April 10-12 2013 in Chicago, IL. More information will be available on the SQL PASS website shortly.
When Ted Kummert took the stage he talked about some of the great products that Microsoft has released this year including SQL Server 2012, the Surface, Windows Server 2012, Office 2013 as well as many others.
Ted announced that SQL Server 2012 has been released as of today. This release aligns with Office 2013 and the new SharePoint 2012. Ted also announced project Hekaton which is a new in memory database engine which will be introduced within the next major release of SQL Server. Project Hekaton is a fully in-memory database engine which will greatly improve OLTP performance. While this new engine is a part of the SQL Server database engine, this portion has been totally rewritten to really maximize the hardware under the SQL Server database engine.
Including Product Hekaton is the Hekaton AMR Tool which (when renamed) will help identify bottlenecks which can be resolved by converting the database to a Hekaton database engine. Because the Hekaton engine is running with the entire database being in memory latching has effectively been removed. Hekaton also supports recompiling stored procedures from running T-SQL to running as native machine code. This allows the stored procedures to run much, much faster without any actual code change. While not all workloads are going to be right for the Hekaton database engine, but those that are should see a massive performance improvement. Because project Hekaton runs within the same database engine that we have today if you know SQL Server you will already know how to work with a project Hekaton database.
Another great announcement is the clustered column store which will be introduced in the next major release of SQL Server. Along with the clustered column store index we will also be able to update tables which have column store indexes created on them, no matter if the column store index is a clustered column store index or a non-clustered column store index. This will introduce massive performance improvements to real time data warehouses and make data warehouses much easier to setup and performance tune; even with the largest workload.
For the non-relational side of things you can today download the Microsoft HDInsight Server CTP which is Microsoft’s Hadoop offering.
Ted also announced a new update to the Microsoft PDW which will be available in the 1st half of 2013 and will be called the SQL Server 2012 Parallel Data Warehouse. This update will include some of the features which will be included in the next major release of SQL Server such as the updatable column store index. Another big enhancement of the new SQL Server 2012 PDW is that SQL Server Management Studio will now be the query tool which is used to run the queries against the PDW system. During the demo against the PDW doing a simple count against a 1PB table the query ran in just a few seconds. Doing data aggregations against the same table for a real world style reporting query the query again ran in just a few seconds.
The next announcement was Polybase. Polybase allows you to query data from an external source from within the SQL Server 2012 PDW. This will allow you to create an external table which simply links to a Hadoop cluster. This will be done with the CREATE EXTERNAL TABLE statement which accepts the schema that you want and a hdfs URL to Hadoop without needing to create a linked server to the Hadoop cluster.
The next step of BI is here in SQL Server 2012 SP1 and Office 2013. Office 2013 includes PowerView natively within the product. It also supports some new mapping features natively within Office 2013 without any additional downloads or plugins. This support includes term extraction which is done through HDInsight. This allows us to view text data and extract names, sentiment, locations, etc. from simple text data and use those extracted strings in order to build PowerView reports. PowerView also now supports DAX queries to a MOLAP cube straight out of the box.
Stay tuned for more updates from the PASS Summit through out the week.
With the release of SQL Server 2012 and specifically the AlwaysOn feature we need to rethink the way that we handle the licensing for servers that will be passive servers. In the past it’s been assumed that passive servers didn’t need to be licensed, and that was true, for the most part. The official docs read that you get one passive server per active server that you have licensed, provided that you have a current Enterprise Agreement. Without the Enterprise Agreement in place any and all passive servers need to be licensed.
With SQL Server 2012’s AlwaysOn feature we have an active server called the Primary Replica and we have up to 4 secondary replicas. Even if none of those 4 secondary replicas are in use for anything, you will still need to license two of them to be properly licensed. This is because when licensing SQL Server’s each licensed server gets you only a single free passive server. So for a 5 instance AlwaysOn Availability Group deployment you’ll need to license at least 3 of those instances which would give you two passive instances. As long as those two passive instances aren’t being used for read access they are free.
While I applaud the hard work and ingenuity of the sysadmins who works through the days after hurricane Sandy to keep the generators running, one thing kept coming to my mind. Why was this needed?
I’ve moved lots of companies into data centers before, and each time I’ve done so it has included a site visit during to RFP process to check on things like cooling, generator, site configuration, etc. During these site visits I stick my head through every door that the sales guy and the engineer who are doing the tour will open. If they won’t open a door for me they better have a damn good reason to not show me what’s in there.
If I went to do a site visit at a facility that was located just at sea level just a few blocks from the ocean, they’d better have some pretty impressive plans for how they are going to handle flooding. If the answer was “we’ve never had a problem with flooding” or something similar they’d be off my list as they haven’t done their due diligence to insure that the site will be online during the worst emergencies possible.
Now before you start telling me that I’ve got no idea what I’m talking about, and that data centers in the North East have different problems from data centers in the South West. I actually do as I’ve moved companies into data centers on both coasts. Most recently I moved the data center for Phreesia from a managed service facility in Texas to a colo in New Jersey. As a part of this move we looked at a number of facilities in the New York / New Jersey area. Many of the New York City data centers were eliminated due to cost, or being just to close to the water as we didn’t want to deal with situations like this exact one.
The data center we settled on is in New Jersey about a 30-40 minute drive from the Phreesia office (once you get out of New York City). While the data center is near a river, the river is a little over a mile away. The data center itself is on a slope with a cinder block wall on the outer edge which will divert water away in the event of a river overflow (it also protects the facility from someone driving a car or truck into the facility). The generators and fuel pumps are all mounted on raised feet (not very tall, but tall enough) so that they keep running in the event of a flood. The cables from the generators to the building have all been buried under the ground so that tree branches which are torn loose during the hurricane can’t cut those cables.
Our diligence in selecting a data center paid off. While the folks mentioned in that article were dragging buckets of fuel up 17 floors worth of stairs the team at Phreesia just sat back and rode out the storm with their families. The sites didn’t go down and the team didn’t have to rush into a very hazardous situation. The team was able to focus on their families and neighbors without having to worry about the systems. Those of us that aren’t in the New York area monitored the systems remotely and founds that everything was running perfectly normally just on generator power instead of city power.
This entire event just shows that when doing data center evaluations the worst possible case situation needs to be planned for and expected. If they aren’t they are going to come back and bite you. Especially in todays world of storms which are ever increasing in destructive power.
If you are in a facility which has risks such as fuel pumps which are below sea level (they are in the basement and the road is at sea level) then a second data center becomes very important very quickly. This became very clear during this hurricane when some very large important websites went offline because they didn’t have that DR facility that they could fail over to, unlike sites like StackOverflow (and ServerFault and the Stack Exchange network).
If you are at risk now is an excellent time to sit down with management and go over all the “what ifs” of your current hosting solution and think about the cost of a DR site along with the cost of not having one.
Like every year at Bush Garden in the international district of Seattle, WA the SQL PASS community will descend onto Bush Garden at 9:30pm Wednesday night for an evening that it to be remembered. This is our annual SQL Karaoke evening event, which for the second time ever is being sponsored. This years proud sponsor is Idera Software who is also a silver sponsor for the SQL PASS Summit itself.
What this sponsorship means is that the drinks are paid for, provided that you have a wristband which you have picked up from the Idera booth in the Exhibit hall Wednesday during the day. If you don’t get a wristband, don’t worry you can still come and have a great time, the event isn’t limited to those with a wristband. The only catch if you don’t have a wristband is that you’ll have to buy your own drinks. Now don’t worry about singing, singing isn’t a requirement of attending SQL Karaoke. Trust me. I can’t sing at all (see the videos below if you don’t believe me).
If you sing or don’t, if you have a wristband or not, the night is a great night, with some great singing, and some truly awful singing (this one probably isn’t safe for work) (yes I’m in all awful videos, this should give you an idea of my singing ability), it’s some awesome singing and an awesome night, and isn’t to be missed.
Now to be fair Bush Garden is small, cramped, hot, and the service can be a tad on the slow side (ok, that’s being pretty generous), but the vibe in the place is just right and SQL Karaoke just isn’t the same anywhere but Bush Garden (though we will try really hard next year in Charlotte) and pretty much everyone who attends has a great time. If you don’t believe me, watch the videos above to get some idea of the insanity.
Spread the word, SQL Karaoke is just 6 days away. Be there!
P.S. Don’t forget to head over to the Idera booth Wednesday during the exhibit hall hours to get a wristband for the drinks.
This was reposted from Clean Up BlogThis feed and make it a snipit http://feedproxy.google.com/~r/google/aTBw/~3/gU_WbGyETfg/ written by jstrate. They get all the credit for this, not me.
Obviously the title of this post is an exercise in bad grammar, but it’s a little catchy. Well, maybe not, but a post needs a title.
But let’s get to the point. Turns out a few days ago Kalen…
To continue reading this (hopefully important) post please click through to the authors blog post using the link above or below.
Additional reading can be found at the original author’s post.
While preparing for a major disaster is what should be done ahead of time, that’s a little late for some people on the east coast of the US today. So if you find yourself (or your company or your servers) on the east coast, what are some things you can do today to help keep things running if things go very, very bad.
On the SQL Server side of things…
- Verify your most recent full backup by restoring it to another server (both user and system databases)
- Make sure that your backups are copied to another servers or USB Drive
- Get a copy of the backups out of the facility (a USB drive is a great way to do this, the network will work if needed)
- Generate some scripts to ensure that you can restore the databases to another machine when ready
- A way to get the transaction log backups off site, maybe FTP’ed to a personal website (after encrypting the files)
On the Hyper-V / VMware side of things…
- Do you have backups of critical VMs? If not take them now.
- Get a copy of the backups out of the facility (a USB drive is a great way to do this, the network will work if needed)
- Don’t forget to do a system state backup of Active Directory
If there is a disaster, the things that you’ll need at the new data center will include…
- Servers (get your order in as soon as you know that your data center doesn’t exist any more, you won’t be the only one ordering servers)
- Switches, Routers, etc.
- Lots of USB Hard Drives (The SAN won’t be available, and it’ll take weeks or months to get a new one)
- Zip ties/cable ties
- Network cables
- All the backups that you took from above
- Installation media for all your OSes and databases servers
- As lots of things
As you are getting the installations all setup, some things to keep in mind…
- You’ll want to enable Instant File Initialization
- You’ll want to restore system databases, then user databases
- Restore AD from the system state backup, otherwise all the Windows authentication SIDs will be useless within SQL Server
If your data center looses power and it’ll be out for a while, and you don’t have a big expensive generator to run the data center know where you can get a generator. A few small ones from Lowes, Home Depot, etc. will work as well for keeping the most critical systems online. Get some large box fans to keep the air in the data center moving. If you are using small generators, use a different one for the fans as they will have very dirty power, which could damage the power supplies of the servers. Plastic sheeting can be used to direct air in or out of the data center so that you get cool air coming in from outside and so you can direct the hot air back out (ask me some time about turning a 6 story building into a giant chimney for a data center).
Once we are in a disaster situation we are no longer worried about peak performance, we are worried about keeping the business running for our customers. Don’t worry about redundant power supplies, or passive nodes, reporting servers, etc. Just worry about getting the core systems online so that websites work, and so that customers can login to your applications. If your application supports it be sure to put up a message telling your customers that performance won’t be very good because of the weather. Just because you know that the weather is a mess doesn’t mean that your customers are aware that there’s a problem. Most people are pretty understanding and if you tell them that the weather is causing performance problems they will be fine with that. This will save you a lot of phone calls and a lot of headache.
No matter what, when getting things setup, rebuilt, brought back online, etc. do it safely. Don’t get hurt and risk hurting yourself or others to get things back up and running. It just isn’t worth it.
With hurricane Sandy dropping into the East Coast of the US this week this is a perfect time to think about DR plans. DR planning isn’t something that people should take personally when it is brought up. There are some IT professionals who consider DR planning to be a personal insult to their ability to setup and configure systems, and there are some developers who consider DR planning to be an insult to their programming abilities.
Setting up DR (or backups in this case) has nothing to do with insulting the IT staff, or that the programmers don’t know what they are doing. As IT workers our job is to hope for the best, but plan for the worst. In the case of this week the east coast of the US is being hit with about the worst case that they can get, a full blown hurricane going all the way up the coast.
I don’t care how good you are at racking servers, installing Windows, writing software, etc. if the power at your data center goes out for a week, and they can’t get fuel to the data center for a week (depending on the number of trees which are down between the highway and the data center, this is a real possibility) the systems will be down and you won’t have planned correctly for the worst.
If you think about this from home perspective instead of the work perspective, when a disaster strikes you don’t want to have to rush to the store to try to find bread and other food items to keep yourself and your family fed during the emergency. If you live somewhere that has regular natural disasters (which is pretty much everywhere at this point) you hopefully have canned food, bottled water, flashlights, etc. at home so that you can ride out this sort of disaster for a few days at the least without running out of food and water. Why shouldn’t you plan accordingly at the office as well.
This sort of planning isn’t something that can be done last minute, because you can’t always see the disaster coming so you don’t always have time to plan (or shop) right before (or as) the disaster happens.
With some forethought and proper planning any business can ride out any disaster. But it requires planning ahead of time and the dedication of the company and the employees to properly setup and test the DR solution. DR projects can be big scary projects if you don’t have someone on staff who has experience with these sorts of things. But that’s OK, that’s what we have consultants for who specialize in these sorts of projects. Not every company needs to keep staff on hand that can plan out DR plans, but you should bring someone in who knows how to plan and execute these sorts of projects successfully. While the consultant may cost a few dollars an hour, it’ll be much less than a failed DR project, and a whole lot less than a failed DR failover.
This week I’ve found some great things for you to read. These are a few of my favorites that I’ve found this week.
- Using filetable with full-text? Lose the redundant index
- Cloud Computing: Start with the problem
- How To Support Multiple SQL Server Package Configurations in SSIS
- Tracking Extended Events for a SQL Agent Job
- Most common latch classes and what they mean
- This weeks SQL Server person to follow on Twitter is: SQLSoldier also known as Robert L Davis
Hopefully you find these articles as useful as I did.