While attending SQL Saturday 194 in Exeter, England, one of the attendees came to Mladen Prajdić, Andre Kamman and myself with an interesting problem. She had a database table of about 200 GB and wanted to delete about half of the data in it. The catch was that the table was full of LOB data and the rows were very large, with an average LOB data size of over a megabyte. She also needed to shrink the database after the data was deleted so that she could reclaim the space. Oh, and all this had to be done on SQL Server 2005 Standard Edition. (Everything here applies to SQL Server up through SQL Server 2012 as well.)
Deleting the data from the database is the easy part; a simple delete loop will handle that nicely. The problem is that when you delete rows from a table which contains LOB data, the LOB pages aren’t cleared when they are deallocated. We can see this by running the following code.
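CREATE DATABASE Lobtest
GO
USE Lobtest
GO
CREATE TABLE t1 (c1 int IDENTITY(1,1) PRIMARY KEY, c2 ntext)
-- Cast to nvarchar(max) first; otherwise REPLICATE truncates its result at 8,000 bytes
INSERT INTO t1 (c2) VALUES (REPLICATE(CAST('a' AS nvarchar(max)), 20000))
DBCC IND ('Lobtest', 't1', 1)
DBCC TRACEON(5201, -1)
-- Trace flag 3604 sends DBCC PAGE output to the client session
DBCC TRACEON(3604)
DELETE FROM t1
DBCC IND ('Lobtest', 't1', 1)
-- Separate DECLARE and SET, since inline initialization is not available on SQL Server 2005
DECLARE @dbid int
SET @dbid = DB_ID('Lobtest')
-- 231 was the LOB page ID on the test system; substitute the ID reported by the first DBCC IND
DBCC PAGE (@dbid, 1, 231, 3)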
In the first DBCC IND output you can see that page 231 (the page ID may differ on your system) is a LOB page which is allocated to the table t1. When you look at the actual page using DBCC PAGE after the row has been deleted, you can see that the data is still in the page and that the page header still shows the page as belonging to the table t1. This can be seen in the page header value labeled “Metadata: ObjectId = 245575913” (the object ID will be different on your system).
When you go to shrink the database, the SQL Server engine will get to the LOB pages and will need to figure out whether each LOB page belongs to a row which still exists. To do this, SQL Server has to scan through the pages which make up the table, looking for any rows which reference the page it is trying to delete.
When you shrink after deleting large amounts of LOB data, SQL Server generates large amounts of IO while figuring this out, and the shrink operation takes an extremely long time. (Paul Randal talks more about it here.)
So the question this person at SQL Saturday had was: how can I reclaim the space from my database within a reasonable time?
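If you want to watch how slowly one of these shrinks is moving, a query along these lines can help. This is just a rough sketch that is not part of the original conversation: it assumes the shrink is running in another session and relies on the `percent_complete` column of `sys.dm_exec_requests`, which SQL Server populates for shrink operations.

```sql
-- Run from a second connection while the shrink is in progress.
-- The LIKE filter is an assumption; it is meant to catch the DBCC command
-- names that a running shrink reports in sys.dm_exec_requests.
SELECT session_id,
       command,
       percent_complete,
       total_elapsed_time / 1000 AS elapsed_seconds,
       reads,
       writes
FROM sys.dm_exec_requests
WHERE command LIKE 'Dbcc%'
```

Running it a few times a minute apart gives a feel for how much IO the shrink is generating compared with how far it has progressed.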
The solution that we came up with was actually pretty simple. Do the data deletion as normal. Then back up and restore the database. Then do the shrink, followed by rebuilding the clustered indexes in order to fix the fragmentation which the shrink will introduce.
This works for a pretty simple reason: the PFS page shows that the LOB page isn’t allocated even though the page is full of data (you can verify this by looking at page 1 in file 1 in the sample database created by the script above). When the database engine backs up the database, it looks at the PFS pages to figure out which pages to back up. Because the PFS pages show that these pages are empty, the database engine doesn’t bother to back them up, so when the database is restored they come back as blank pages. This means that after the restore the shrink operation can run without an issue.
In the case of this application there was a maintenance window which we could take advantage of to allow the backup and the restore to happen.
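As a rough sketch, the whole sequence could look something like the script below. It reuses the Lobtest database and t1 table from the sample script; the delete predicate, the batch size, the backup path and the 10 percent free space target are all placeholders that I have made up for illustration, not the values used at the event.

```sql
USE Lobtest
GO
-- 1. Delete in small batches so each transaction stays short
DECLARE @rows int
SET @rows = 1
WHILE @rows > 0
BEGIN
    DELETE TOP (1000) FROM t1 WHERE c1 <= 100000  -- hypothetical cutoff
    SET @rows = @@ROWCOUNT
END
GO
-- 2. Back up, then restore over the top of the existing database
--    (no other connections can be using the database during the restore)
USE master
GO
BACKUP DATABASE Lobtest TO DISK = N'D:\Backup\Lobtest.bak' WITH INIT  -- placeholder path
RESTORE DATABASE Lobtest FROM DISK = N'D:\Backup\Lobtest.bak' WITH REPLACE
GO
-- 3. Shrink, leaving roughly 10 percent free space in the files
DBCC SHRINKDATABASE (N'Lobtest', 10)
GO
-- 4. Rebuild the indexes to remove the fragmentation the shrink introduces
USE Lobtest
GO
ALTER INDEX ALL ON t1 REBUILD
GO
```

Everything in this sketch uses syntax that is available on SQL Server 2005, including `DELETE TOP (n)` and `ALTER INDEX ... REBUILD`.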
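To check the PFS page yourself, something like the following should do it. It is only a small sketch against the sample Lobtest database, and it assumes the LOB page you are interested in falls within the range of pages tracked by the first PFS page in the file, which will be the case in a database this small.

```sql
-- Send DBCC PAGE output to the client session
DBCC TRACEON(3604)
DECLARE @dbid int
SET @dbid = DB_ID('Lobtest')
-- Page 1 of file 1 is the first PFS page; print option 3 lists the
-- allocation status recorded for each page it tracks
DBCC PAGE (@dbid, 1, 1, 3)
```

Look for the entry covering the LOB page ID that DBCC IND reported before the delete.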
Another option we came up with, which would require less downtime, was to use database mirroring. By configuring database mirroring (which is initialized via a backup and restore, giving us the same basic approach) and then failing over to the mirror, we would end up in the same position. We could then shrink the database without issue (probably pausing database mirroring so that we didn’t have to wait for the second server to process the shrink in real time) and then fail the database back to the original server.
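For the mirroring route, the failover, pause and fail back steps could be sketched like this. It assumes a mirroring session for the Lobtest database is already configured and synchronized, and the comments note which server I am assuming each statement runs on; treat it as an outline rather than a tested runbook.

```sql
-- On the original principal: fail over to the mirror
USE master
GO
ALTER DATABASE Lobtest SET PARTNER FAILOVER
GO

-- On the new principal (the former mirror): pause mirroring, then shrink and rebuild
ALTER DATABASE Lobtest SET PARTNER SUSPEND
GO
DBCC SHRINKDATABASE (N'Lobtest', 10)  -- 10 percent free space is a placeholder target
GO
USE Lobtest
GO
ALTER INDEX ALL ON t1 REBUILD
GO

-- Still on the new principal: resume mirroring and let the partner catch up
USE master
GO
ALTER DATABASE Lobtest SET PARTNER RESUME
GO

-- Once the session is synchronized again, fail back to the original server
ALTER DATABASE Lobtest SET PARTNER FAILOVER
GO
```

Keep in mind that while the session is suspended the transaction log on the principal keeps growing until mirroring is resumed and the partner has caught up.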
As geeky as it was, Mladen, Andre and I had a great time figuring this out, and the attendee had a great time watching us go through all the possible options as we excluded them one by one. And most importantly she got her problem solved.
So if you end up in this situation, here’s a solution that will help you shrink the database and reclaim the space that the LOB data pages are taking up without having to wait forever.