SQL Server Help

25 pts.
Tags:
INSERT SELECT statement
SQL Records
SQL Server
SQL Server administration
Stored Procedures
Hi all,
I am running into a bit of a problem. I need a stored procedure that inserts records into a table, where the target table is decided by the data selected.
Example:
select top 1 SerialNo from dbo.Serial
--returns 'ASEEADL1234567'
The deciding value is the 8th character from the end:
substring(reverse(SerialNo), 8, 1) = 'L'
Based on that 'L', I look it up in another table called SerialMfgYear, and insert the row into the table named in its SerialTableName column:
SerialMfgYear  SerialTableName
L              Serial2010
So I need to select the letter first, then dynamically insert into the correct table based on that letter. I would like to run this over the whole database via select * from dbo.Serial and insert each row into its correct table.
I have a working way with a cursor and a while loop, but it is far too time-consuming and doesn't seem to be the correct way to get this done.
Example of while loop:
create procedure test1
as
begin
    set nocount on

    declare @i int       -- iterator
    declare @iRwCnt int  -- rowcount

    set @i = 0 -- initialize

    create table #tempSerial (ID int identity(1,1), Value varchar(20))

    insert into #tempSerial (Value)
    select top 100 SerialNo
    from dbo.Serial
    where substring(SerialNo, 1, 1) != 'B'
      and substring(reverse(SerialNo), 8, 1) between
          (select top 1 SerialMfgYear from dbo.SerialMfgYear order by SerialMfgYear)
      and (select top 1 SerialMfgYear from dbo.SerialMfgYear order by SerialMfgYear desc)
    set @iRwCnt = @@ROWCOUNT

    create clustered index idx_tmp on #tempSerial (ID) with fillfactor = 100

    declare @input as varchar(20)
    declare @year as varchar(1)
    declare @table as varchar(15)
    declare @sql as nvarchar(250)

    while @i < @iRwCnt
    begin
        set @i = @i + 1
        select @input = Value from #tempSerial where ID = @i
        set @year = substring(reverse(@input), 8, 1)
        set @table = (select SerialTableName from dbo.SerialMfgYear where SerialMfgYear = @year)

        select @sql = 'insert into ' + @table + ' select * from dbo.Serial
            where SerialNo not in (select SerialNo from ' + @table + ') and SerialNo = @SerialNo'
        exec sp_executesql @sql, N'@SerialNo nvarchar(15)', @SerialNo = @input
    end
    drop table #tempSerial
end
I am stuck :(


Software/Hardware used:
windows 7, SQL server management studio
ASKED: September 7, 2011  8:36 PM
UPDATED: March 31, 2012  5:02 PM

Answer Wiki


Dynamic SQL is the only way that you’ll be able to do this. The code you have looks fine to me.

[kccrosser]
There are a couple of key things to do to speed this up – a LOT.

1. Eliminate those “select top 1 …” queries. Each one causes a table scan and sort where none is necessary. If you have to get the max/min values for the character, do it once outside the loop and save the values in variables.

2. Do large transactions. Explicitly BEGIN TRANSACTION and COMMIT TRANSACTION, but only do that after processing several hundred – perhaps thousands – of records. In the current code, since you aren’t explicitly defining transaction scopes, each INSERT is running as its own transaction, which adds a lot of overhead. Committing a batch of transactions at once is much more efficient, even if they span a few tables.

3. Since the key to the dbo.SerialMfgYear table is a single character, it is at most 36 rows of data (assuming there are no special characters, there are only 10 numeric and 26 alpha characters to index into the table). You can directly JOIN to that table in your query, or you can select from the table directly on the character – don’t use those select top 1 … queries. However, since that table is so small, SQL Server will cache it after the first query, so we don’t need to get fancy here.

4. Assuming that you have a primary key on the 26-36 output tables that uses the Serial number, attempting to insert the record and handling a DUP VAL error is actually much faster than trying to check if it is there before the insert.
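Points 1 and 3 can also be combined into a mostly set-based alternative: instead of routing one serial at a time, loop over the (at most ~36) rows of dbo.SerialMfgYear and issue one dynamic INSERT…SELECT per target table. This is only a sketch, assuming the column names from the question (SerialNo, SerialMfgYear, SerialTableName) and that every target table has the same column layout as dbo.Serial:

```sql
-- Set-based sketch: one INSERT...SELECT per target table,
-- instead of one INSERT per serial number.
declare @year char(1);
declare @table sysname;
declare @sql nvarchar(max);

declare tblCur cursor local fast_forward for
    select SerialMfgYear, SerialTableName from dbo.SerialMfgYear;

open tblCur;
fetch next from tblCur into @year, @table;
while @@FETCH_STATUS = 0
begin
    -- One statement moves every matching row for this year at once.
    set @sql = N'insert into ' + quotename(@table) + N'
        select s.* from dbo.Serial s
        where substring(reverse(s.SerialNo), 8, 1) = @year
          and s.SerialNo not in (select SerialNo from ' + quotename(@table) + N')';
    exec sp_executesql @sql, N'@year char(1)', @year = @year;
    fetch next from tblCur into @year, @table;
end;
close tblCur;
deallocate tblCur;
```

Note that the cursor here runs once per row of SerialMfgYear (≈36 iterations), not once per serial, so the per-row overhead disappears.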

Cursor example:

<pre>declare @SerialNo varchar(255);
declare @SerialTable varchar(255);
declare @Year char(1);
declare @sql nvarchar(500);
declare @iTranCount int;
declare @iTranCommit int;
set @iTranCommit = 1000;
declare myCur cursor for
select SerialNo
from Serial with (NOLOCK)
order by substring(reverse(SerialNo), 8, 10); -- this just gets us an ordered list,
-- with all the ones going to a particular table in consecutive order, to improve the
-- batch transactions
set @iTranCount = 0;
open myCur;
begin transaction;
fetch next from myCur into @SerialNo;
while @@FETCH_STATUS = 0
begin
set @Year = substring(reverse(@SerialNo), 8, 1);
select @SerialTable = SerialTableName
from dbo.SerialMfgYear with (NOLOCK)
where SerialMfgYear = @Year;
set @sql = 'insert into ' + @SerialTable + ' select * from dbo.Serial with (NOLOCK) where SerialNo = ''' + @SerialNo + '''';
-- by wrapping the argument in apostrophes, we don't need to pass arguments
begin try
exec (@sql);
end try
begin catch
-- should test for DUP VAL error here, but for now just assume it is a dup...
set @Year = @Year; -- need some statement here for valid syntax
end catch
set @iTranCount = @iTranCount + 1;
if @iTranCount % @iTranCommit = 0
begin
commit transaction; -- close current transaction
begin transaction; -- open a new one
end;
fetch next from myCur into @SerialNo;
end;
commit transaction; -- don't forget there might be some pending inserts left over
close myCur;
deallocate myCur;</pre>
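The catch block can be made explicit rather than assuming every error is a duplicate. In SQL Server, a primary-key or unique-constraint violation raises error 2627 (a unique-index violation raises 2601), so a sketch of the check might be:

```sql
begin catch
    -- Swallow only duplicate-key errors; re-raise anything else.
    -- 2627 = PRIMARY KEY / UNIQUE constraint violation, 2601 = unique index violation.
    if ERROR_NUMBER() not in (2627, 2601)
    begin
        declare @msg nvarchar(2048) = ERROR_MESSAGE();
        raiserror(@msg, 16, 1);
    end
end catch
```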

Give this a try and see if it isn’t much faster.

You should probably be able to get this to process 20 million rows in a couple of hours. I have a very complex data conversion process – much tougher than this one, with many intermediate lookups on several other tables – and it is running 4000 records/minute on a small VM.

By the way – note the “with (NOLOCK)” hint on the select statements. If this is going to run for a while and other processes might want to access any of these tables, this avoids escalating to table locks on the various tables. It also tells SQL Server that you aren’t intending to use the select results to update the source tables. This can substantially speed up some queries.

Discuss This Question: 7  Replies

 
  • Cdizl
    I guess the only problem I am running into is that I just ran the code and it only populated 700 rows in 20 minutes. This definitely will not work if I have 20 million records. It seems like it should be faster than this.
  • TomLiotta
    So the purpose is to do a kind of conversion that breaks things out into multiple tables? Is this part of an attempt to restructure the database? Tom
  • TomLiotta
    I should have added that there might be better methods if this is part of a conversion or restructuring. A method that is best for a regularly repeating process is often inappropriate for a process that will run once or only a limited few times. Knowing something about the purpose may lead to different answers. Tom
  • Kccrosser
    TomLiotta has a good point. If this is a one-time conversion, the above approach isn't bad. However, if this is going to be run repeatedly, you need to think about how to improve the process.
    At the very least, you need to add some attribute to the source data to know whether each record has been processed or not. Without some indicator, the process will have to re-run on all 20 million records each time, even if there are only a few thousand new ones to be processed.
    Beware adding a simple "flag" to a large table (i.e., a yes/no type of value). Most database engines don't like to use indexes on columns that are "non-discriminant" (have few distinct values). When I have had to deal with this kind of situation, I usually add a numeric column and a sequence. Set new values to 0 or -1 as an indicator of records to be processed, and then set the processed records to the sequence number when done.
    You can then use a compound index on the new flag column + the serial number and get the records to be processed with "select serialno ... where newdata = 0". This is extremely efficient with a two-column index - it can actually retrieve the serialno value from the index without even touching the actual data records.
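    A sketch of that bookkeeping follows; the column name (NewData), index name, and run number are illustrative assumptions, not part of the original schema:

```sql
-- Illustrative sketch: add a processing marker and a covering compound index.
-- The column name NewData and the index name are hypothetical.
alter table dbo.Serial add NewData int not null default 0;

-- The compound index lets "where NewData = 0" be answered from the
-- index alone, without touching the data pages.
create index ix_Serial_NewData_SerialNo on dbo.Serial (NewData, SerialNo);

-- Records still to be processed:
select SerialNo from dbo.Serial where NewData = 0;

-- After a batch is processed, stamp it with the current run number (e.g. 42):
update dbo.Serial set NewData = 42 where NewData = 0;
```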
  • Cdizl
    Sorry guys, I am not getting the emails when someone comments on this for some reason. Yes, this is an attempt to restructure our database before the number of records gets out of hand. The first run will go over the 20 million records, but after that it should run daily on only the new records added to the serial table. I'll try what has been posted and keep you guys updated. Thanks to all!
