SQL Server with Mr. Denny

Jul 1 2015   10:00AM GMT

Splitting Files for SQL DW

Denny Cherry Denny Cherry Profile: Denny Cherry

Tags:
Azure
SQL Azure
SQL Server
Windows Azure

When doing data warehouse loads into SQL DW you’ll want to break your single large file that you extract into multiple smaller files in order to do loads into the SQL DW (via Azure Blob Storage) as fast as possible. There’s a few ways that you can do this. One way would be to handle this in your ETL that extracts the data from your source database. Depending on how you are extracting the data this can be anywhere from painful to REALLY, REALLY painful.

A much easier way is to just extract the data to a single text file, then split that single text file using a command line tool. Now you are probably asking yourself where you might get this handy command line tool. Well thankfully you can download it right here. While doing some work for a client on SQL DW we went ahead and put this tool together for you which as a command line tool you can easily build into your ETL process. The only requirements for using the tool is that you have .NET 4.0 installed on the machine which is running the tool. It runs very quickly and will create evenly sized files from your large source file. I used it recently on a ~16.7 Gig file and it was able to process the file in about 20 minutes (keep in mind I was using a slow 7200 RPM consumer hard drive. Smaller 2 Gig files processed in about a minute (again same slow disk). As time permits we’ll work on making the software run even faster.

Denny

 Comment on this Post

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when other members comment.

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

Share this item with your network: