Today’s keynote started with Rick giving us some important news. He first announced that the SQL Rally website has gone online for the first SQL Rally event as well as the first SQL Rally pre-con speakers. The date and location of the next SQL PASS summit has also been announced as October 10th-14th.
After Rick gave a lucky winner an XBOX Kinect we were treated to a presentation by David Dewitt on how the optimizer works within the SQL Server database engine. Now there isn’t really anything that I can post about David’s presentation other than it was awesome to get this sort of insight into the query optimizer other than the fact that optimizer’s are very hard to write. One really cool thing that David did was to accept questions from the audience both online and in real life via email which were culled by people from PASS HQ and were passed to him after the power point was completed.
After listening to David’s keynote my head hurts. Tonight there will be more singing and drinking at Bush Garden to make the pain go away.
Today was the second day of the SQL PASS summit. The first item of business was to award The passion award to Wendy Pastrick (blog | twitter) . After the awesome award was given out by Bill Graziano, Bill talked about the budget stuff. Bill also promised to wear a kilt on Wednesday of next year, something which I’ve done for the first time this year supporting Women In Technology (wit).
Today was also the first time that I used my iPad to blog for the day. I went totally laptop free for the day to see how well it works for me.
Quentin Clark was our next keynote speaker for today. Quentin started by talking about the high level features around the new Denali product (also called SQL 11). SQL Server Danali will be bringing in an “Always On” concept which will allow ever database to have the ultra high availability requirements that many database applications have. This new “Always On” solution doesn’t require any sort of shared storage to work. It will replicate data to multiple servers with the ability of having some servers available for automatic failover ads well as having some servers available for reading the data as well as for backup so you don’t have to take the backups from the active server (not in the current CTP). The awesome part of the setup for this feature doesn’t require that you copy backups around. It will handle the moving of those backups automatically. I have a lot of questions about how this happens and how it effects the system cache.
On the T-SQL side we now have paging, sequences, and much better error handling. Paging allows you to specify the size of each page to display and which page to display with the database engine doing the hard work. Sequences and some more flexibility into the engine by allowing the developer to use the same sequence numbers for multiple tables.
The new filetable feature is a feature which I really don’t see the use for. This feature allows you to create a table which is exposed as a network share so that applications can write files into the database without any application changes. Under the hood it uses filestream feature to handle the IO. We have spent years keeping this blob data out of the database because of the performance issues of having TB of data within the database. Some questions which need answering about the filestream is how does this affect the buffer pool when pulling files in and out of the database engine? Is there a point at which this feature can’t handle a file load, in other words how many files can I shove into a single filetable table before it’ll slow down performance?
(Sorry this was so late coming out, I’m such a slacker.)
Ted Kummert was our keynote speaker for this mornings keynotes. Before Ted come onstage there was a very interesting video which gave us some history of the SQL Server product. The thing that I found the most astonishing was that SQL Server 6.0 and 6.5 was a $100M a year business for Microsoft. Today the SQL Server product is a $3B a year business. That is a massive amount of growth over just 12-13 years or so.
Parallel Data Warehouse
The first announcement was that the Parallel Data Warehouse (PDW) has been released, and the first appliances will be shipped from Dell and HP starting in December of 2011. When you create a database without specifying where the database is stored. When you create a table you specify the column which is used to distribute the data across the servers, as well as how the table should be partitioned. The admin console for the PDW is a web based system which allows you to view how all the cores are running and really see what is happening with the system, this is in addition to the SQL Server Management Studio of course.
The next thing which we saw was Yahoo telling us about their SSAS system which they have. It processes 3.5 billion rows a day which is 1.2 TB of new data per day being processed into a 12 TB SSAS cube. Truly an amazing scale.
Microsoft has announced the new project “Atlanta” which is a new self service configuration assessment service for Microsoft SQL Server which will hopefully reduce the number of support calls. Atlanta is a Azure based system which analyzes your system looking for problems with the SQL Server configuration. The Atlanta system will scan the system configuration configuration and upload the configuration to the Atlanta application nightly. Atlanta will then process the configuration and show the knowledge base articles which show how to resolve the potential issues. When systems settings are changed Atlanta will track those changes to allow the DBA to more quickly find changes which have been made to the SQL Servers.
SSRS is now available as an Azure product where you can simply upload your SSRS reports to the Azure SSRS website and it’ll get its information from your SQL Azure databases. These reports are edited and published using the same BIDS application which we already use today.
The new Azure Marketplace allows you do purchase a variety of known good data from a variety of sources such as local weather. The data is available to be imported into SSRS reports via a data source using an ODATA URL from SQL Server Reporting Services, either in the cloud or a local SSRS report. This published data can then be used side by side with the business data.
Denali (aka SQL 11)
The new Denali CTP (the first public one) has been released to the public. It will be shown during tomorrows keynote, so today was just the announcement. You can download the package today from the Microsoft website. I’m sure there will be much more information tomorrow about the product tomorrow. Be sure to watch Twitter (specifically the #sqlpass hash tag) to watch the information in real time.
Check back tomorrow for more keynote information.
Yesterday was my first day at the SQL PASS summit. Normally I don’t have anything official to do on that day, however this year was different. This year I was allow to present a pre-con at the summit. The pre-con was a heck of a lot of work to put together, but it was a heck of a lot of fun to present for the entire day. There were about 80 people in the room for the day and it seems that everyone had a great time, and pretty much everyone I talked to enjoyed the session and learned something.
There were a ton of questions being asked, and I was even able to almost all of them. Hopefully I’m able to do another one next year at the summit.
After the pre-con day was done I was suckered into being a part Quiz Bowl game at the welcome reception party. We did very poorly as far as the questions go, but we all had a great time with the game. Hopefully they make the mistake of asking me to come do the game again.
I’ll do my best to post something on my blog daily (the morning after probably).
If you aren’t at the conference, and you are a SQL Server professional, and you aren’t at the summit, then you are missing the biggest, best SQL Server conference available.
This week was my first dev connections conference. The conference was a bit bigger than I thought it would be, and I had a blast. The couple of sessions that I went to where excellent, but I wouldn’t expect much else from Paul Randal (Blog | Twitter), Kimberly Tripp (Blog | Twitter), and Buck Woody (Blog | Twitter). The people that attended by sessions were great, although they were a little quiet and didn’t ask many questions (I love questions during my sessions).
I made some new friends, some from Microsoft and some that were attendees. Many of which I’ll hopefully be seeing at future events.
It was a great conference, but I’m exhausted, and the PASS conference is coming up next week. I’ve got about 2 days to relax before the insanity of the SQL PASS conference starts up. If you are going, I’ll see you there. If not you should.
So after I upgraded from VMware’s vSphere 4.0 to 4.1 I ran across an awesome “feature”. Apparently for security purposes VMware’s vSphere decided that after the upgrade no one can log into the server except for root, and the vpxuser (the account that the vCenter server uses to log into the hosts). Given that root can’t log into the server remotly that presents with a little bit or a problem as without going to the data center (or using a remote KVM of some sort) you have no access to the physical console.
Fixing this is actually a rather easy fix. Log into the server’s console as root, then edit the /etc/security/access.conf and add a new line for each user that needs access.
Now if you have several users that need access to the physical hosts, then create a group in unix, and add this group to the access.conf file. Each new line should look something like…
In the case of my account the line looks something like this.
If you wanted to use a group, then the line is similar.
Have fun fixing this little one if you’ve got a lot of VMware hosts to fix.
I was working on performance tuning the data deletion process that I use to remove data from our database. We delete massive amounts of data on a daily basis, so our data deletion process is pretty important to us. As we’ve grown our customer base the amount of data to be deleted every day has also grown, and the amount of time to delete that data has gone up as well.
We it has started to take stupid amounts of time to delete the data, so I started digging into the data deletion procedures.
The first thing I looked at was the actual delete statements. These seams ok, the actual deletes were happening very quickly (we process deletes in batches of 1000 records per batch to minimize locking of data). So next I looked at the part of the code where we select the records to be deleted. Looking at the execution plan, everything looked ok.
But this little chunk of code took about 50 minutes to run. Pretty bad when only returning 1000 numbers back from the database.
SELECT TOP (@BatchSize) a.PolicyIncidentId FROM PolicyIncident a WITH (NOLOCK) JOIN #ComputersToProcess ComputersToProcess ON a.ComputerId = ComputersToProcess.ComputerId WHERE CaptureTimestamp < ComputersToProcess.StartDeleteAt
The first thing that I did was put a primary key on the @ComputersToProcess table variable. That turned the table scan into a Clustered Index Scan, but didn’t do anything for performance.
The next thing I did was switch the table variable to a temp table (without a primary key). This really didn’t do anything to speed up the process as there is still no statistics on the data. However this time the execution plan actually shows you that there’s no statistic on the temp table.
Now, I didn’t want to put at non-clustered index on the table keeping the table as a heap, and a clustered index that wasn’t a primary key wasn’t going be any more effective than a primary key, so I put a primary key on the table. While the query cost percentage went up from 2% to 7% the actual run time went down from 50 minutes to just 1 second.
Now I didn’t make any other code changes to the procedures, just changing from the table variable to the temp table, and adding a primary key and this one little three line query went from an hour to a second. Its amazing how much such a small change can make things run smoother.
Now obviously this isn’t going to fix every problem. But in my case I’m putting a little over 190k rows into the table variable (now temp table) and this is just to much for the table variable to take. Keep in mind that with table variables the SQL Server has statistics, but it assumes only a single row per temp table, no matter how much data is actually in the table variable.
Last year the SQL PASS conference had an official bloggers table setup in the keynotes so that some people who were tapped as the official bloggers would blog and use twitter in real time during the keynotes. Somehow I got on this list last year. Apparently I didn’t show up hung-over enough and I was asked to sit at the table again.
So once again I’ll be blogging and tweeting live from the keynotes at PASS for your education/entertainment/shear horror.
On Tuesday I will be cutting out a little early to get setup for my spotlight session which starts right after the keynotes as I have a couple of different VMs that I need to power up for the demo.
A little while ago I went to the SSWUG office and recorded some sessions for the upcoming SSWUG vConference. One additional thing which I recorded was a short little interview video which they’ve already got posted up on Facebook.
Sometimes a project comes around that requires knowledge beyond the normal SQL Server knowledge. This is where having the extra knowledge can really make you standout. Recently I was talking to Allen Kinsel (blog | twitter) about IPv6 on a Windows Cluster which was being blocked by Symantic which was causing all sorts of problem.
I then mentioned that this would create all sorts of productions for Direct Access as it requires IPv6 to function. Which let to a quick back and forth about what Direct Access was and how it worked. Suddenly Wendy Pastrick (blog | twitter) comes into the conversation asking specifically about Direct Access. Apparently she has a new client which has many remote SQL instances installed on peoples laptops and those laptop use merge replication to sync up data with the central database. This is a perfect situation for Direct Access to be deployed.
What’s Direct Access?
Direct Access is a feature of Windows 2008 R2 and Windows 7 where the client computers can create an automatic SSL protected connection to the company network on demand without the user needing to initiate the connection.
How can it help?
The current solution that the company has to deploy requires that the user initiate a VPN connection then the user would need to start the SQL Replication job to begin the data transfer (or have the distribution agent setup to try over and over until it succeeds). Using Direct Access when the SQL Server attempts to connect to the distributor (I’m assuming a pull subscription here) the computer will see the attempt to request access to an internal server, so it’ll then connect to the direct access server effectively making a VPN connection, which would then allow the data transfer to complete without the user even knowing that the connection was needed.
Obviously Direct Access isn’t a feature that most DBAs would know about. Now that you know about this feature you can pitch it if you are in the need for a distributed merge replication solution that will allow for automatic replication of data without the remote user knowing that the replication needs to take place.