All too often we end up with duplicate rows in a table. The best way to keep duplicate rows out of the database is to not let them in, but assume that they are already there. This bit of sample code shows how to delete those duplicate rows quickly and easily in a single statement; no extra work tables are required (I use a temp table only to stage the sample data). This code is for SQL Server 2005 and up, as it uses features which were introduced in SQL Server 2005; SQL Server 2000 would require a totally different technique.
CREATE TABLE #DuplicateRows (Col1 INT, Col2 INT, Col3 INT); /*Create a new table*/
INSERT INTO #DuplicateRows (Col1, Col2, Col3) /*Load up duplicate rows (sample values)*/
SELECT 1, 1, 1 UNION ALL
SELECT 1, 1, 1 UNION ALL
SELECT 2, 2, 2 UNION ALL
SELECT 2, 2, 2 UNION ALL
SELECT 3, 3, 3;
SELECT * FROM #DuplicateRows; /*Check that the data is actually hosed*/
WITH Cleaning AS (SELECT ROW_NUMBER() OVER(ORDER BY Col1, Col2, Col3) AS row,
        Col1, Col2, Col3
    FROM #DuplicateRows)
DELETE FROM Cleaning /*Delete the rows which are duplicates*/
WHERE row NOT IN (SELECT row FROM (SELECT Col1, Col2, Col3, MIN(row) row
        FROM Cleaning a
        GROUP BY Col1, Col2, Col3) b);
SELECT * FROM #DuplicateRows; /*Check the table to see that it is clean*/
DROP TABLE #DuplicateRows; /*Clean up the table*/
Hopefully you find this code useful.
Yes, for crying out loud yes.
Every server that can access the Internet, be accessed from the Internet, or be accessed from a computer that can access the Internet should have an anti-virus on it. Preferably use a corporate-wide solution like Trend Micro, McAfee, Norton, etc., so that each server reports back to a central server, making it easier to find out when a machine has a problem.
Next comes what should be scanned. I prefer to exclude the mdf, ndf, and ldf files. I don't like to exclude the entire folder, as this creates a hiding place where a virus could stick infected files. If possible, have it exclude the mdf, ndf, and ldf files from the correct folders only. Even if a virus scanner wanted to scan the database files, it wouldn't be able to, as the files are locked open by SQL Server so that nothing else can access them. By not excluding the files, all you are doing is throwing alerts to the monitoring server that files couldn't be scanned.
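As a rough illustration of the idea (using Windows Defender's PowerShell cmdlets, which are much newer than the corporate products named above, so treat this purely as a sketch; your AV product's console will have its own equivalent), excluding the extensions rather than whole folders looks something like:

```shell
# Exclude only the database file extensions, not the folders that hold them
Add-MpPreference -ExclusionExtension ".mdf", ".ndf", ".ldf"

# Review the resulting exclusion list afterwards
Get-MpPreference | Select-Object -ExpandProperty ExclusionExtension
```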
Odds are a full scan doesn't need to be done against the server all that often, as the files on the hard drive of the server aren't going to change all that often. Any virus that comes in from the network should be caught by the real-time engine that is running at the time. You will want to do a full scan every once in a while (every couple of weeks or so) in case something came in over the network, was saved, and was set up to launch at the next reboot but wasn't yet in the virus definition file.
All too often I see people online asking some sort of question about connecting to their CoLo'ed SQL Server, and it turns out they connect directly over the Internet. This is nuts, people. If you can access your SQL Server via Management Studio from anywhere on the Internet, so can the people who would love to break into your SQL Server and use the machine for their own purposes.
SQL Servers shouldn't ever be directly accessible from the Internet. Even if you have to use public IPs to host the machines, make sure that there is a firewall set up between that server and the public Internet so that no one has any sort of direct access to the machine from outside of the data center.
How do you manage the SQL Server in this case? You use the router's built-in functions to set up a point-to-point VPN with your office router, so that you can securely communicate with the servers in the CoLo without sending that data in plain text over the Internet.
For that matter, while you are locking down the SQL Server, I suggest that the web servers be locked down as well. The only ports that they should have open are 80 and 443, unless you are running streaming servers or known FTP servers.
If your servers have been sitting exposed on the public Internet, then I highly suggest that you install an anti-virus on them and check for viruses, malware, etc. that's doing things you don't want it to be doing.
Several years ago I was doing some work for a company that had Windows 2003 servers sitting directly on the Internet without an anti-virus and with no firewall. When I got to the machines and took them off the Internet for cleaning, there were over 200 viruses on them that the company had no idea were there. Their complaint was that the machines were running slow and network costs kept going up. God only knows what sort of network traffic these viruses were generating as they did whatever it was they were trying to do.
Microsoft SQL Server 2005 and 2008 Standard Edition will be getting the Lock Pages in Memory feature that SQL Server Enterprise Edition has had since SQL Server 2005. This was announced by Bob Ward via the CSS blog in his post “SQL Server, Locked Pages, and Standard SKU…“. Per Bob's post, a CU will be released for SQL 2005 SP3 and SQL 2008 SP1 which will allow a trace flag to be used to turn this feature on.
On behalf of the users I’d like to thank Bob and the rest of the SQL Server team for being able to get this into the product.
On behalf of the developer team, I implore you not to turn this on without a reason. Only use this feature once you understand what it does, and only in the correct places.
The Locked Pages flag basically tells SQL Server to ignore requests from the OS to flush its RAM to disk. If the setting is enabled, SQL doesn't flush to disk. If you find that your SQL Server is flushing to disk, don't just enable the flag and move on. I implore you to contact CSS and figure out why SQL is being told to flush to disk; that is the only way the bug will get fixed. Once the issue has been reported to Microsoft and they have the information they need, enable the flag until the bug is fixed. Then install the patch that fixes the bug, disable the flag, and you'll be fine.
Because this is how bugs are found and fixed, I hope that this is a CSS-only CU which will require that users contact CSS before they can get the fix.
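For reference, the feature is tied to a startup trace flag on Standard Edition (845, to my recollection; verify the number against the KB article that ships with the CU before relying on it). A sketch of enabling and verifying it:

```sql
-- -T845 is a STARTUP trace flag: add it to the service's startup parameters
-- in SQL Server Configuration Manager (DBCC TRACEON won't enable it), and
-- grant the service account the "Lock pages in memory" Windows policy right.

-- After restarting the service, confirm the flag is active:
DBCC TRACESTATUS(845, -1);

-- On SQL Server 2008 you can also see how much memory is actually locked:
SELECT locked_page_allocations_kb
FROM sys.dm_os_process_memory;
```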
There's a new SQL Server 2008 MCTS certification book on the market: “The Real MCTS SQL Server 2008 Exam 70-432 Database Implementation and Maintenance“. I normally don't post book releases, as there are a lot of new books coming out these days. However, I'm posting this one because I wrote part of it. The book is a compilation of several authors' work, and I wrote three of the chapters in it.
So if you are looking to get your MCTS Certification I’d recommend that you check it out.
This is my first book credit (though not listed on Amazon), and there will be a second one from the same publisher coming out soon.
Probably one of the least fun things about being a DBA is having to deal with the licensing of SQL Server.
One of my developers came to me asking me to add more RAM to the C++ build server. However, the machine was already at 2 GB, so I wasn't sure whether adding more RAM would help. It turns out that VS 2005 doesn't support AWE, so adding more RAM wasn't going to be of much help, until I found a post by Steve Harman entitled Hacking Visual Studio to Use More Than 2Gigabytes of Memory.
After making those changes it seemed to work.
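For the curious, the hack described in Steve's post boils down to two steps on 32-bit Windows; the install path below assumes a default VS 2005 location, so adjust it for your machine:

```shell
REM 1) Mark devenv.exe as large-address-aware (back it up first; run this
REM    from a Visual Studio command prompt so editbin.exe is on the PATH).
editbin /LARGEADDRESSAWARE "C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\devenv.exe"

REM 2) On 32-bit Windows, add the /3GB switch to the OS entry in boot.ini
REM    so user-mode processes get 3 GB of address space instead of 2 GB,
REM    then reboot for the change to take effect.
```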
For those of you who know me, or have heard me talk at a Code Camp in the last year, you've heard me talk about a data center migration that I want to do from Rackspace in Texas to our own equipment in the LA area. Well, that day has finally come.
Our current environment has served us well, but we have outgrown the services that Rackspace can offer us, and we have purchased our own production environment. This isn't any rinky-dink environment either. We are starting out with a fully redundant, highly available environment which can be scaled by simply deploying more VMs and, in the event that the VMware hardware is overtaxed, by simply plugging another VMware server into the mix and shifting VMs from one node of the cluster to another.
We are very proud of our new environment, so I figured that I’d give you some of the tech specs of it (yeah, I’m totally bragging here).
On the storage side of things we’ve got an EMC CX4-240 with ~35TB of storage laid out in three tiers. This is connected via multiple 4 Gig fibre cables to a pair of Cisco fibre switches. Each fibre switch is connected to each of the SAN attached servers.
We went with Dell servers (I would have preferred HP servers, but I was overruled).
The SQL Servers and the VMware servers are identical. Quad chip, quad core servers each with 64 Gigs of RAM. Each pair will be clustered for High Availability. The VMware servers will look a little like they puked cables out of the back. Because of all the various subnets and to ensure that each subnet is connected to each of the redundant network switches each of the VMware ESX servers will have 11 Ethernet cables, and 2 fibre cables coming out of the back.
The VMware vCenter services are running on a little single-chip, quad-core server. This is the only part of the system which isn't redundant, but ESX can run fine for up to 14 days without the license server running, and since this machine has a four-hour turnaround on parts, we'll be fine if the machine dies.
The file servers, which host the screenshots, emails, etc. that have been captured by our application and are served to the website upon request, are a pair of dual-chip, quad-core servers, also clustered for high availability.
All the servers are SAN attached via the fibre and all data will be saved on the SAN.
Our current environment is much smaller: a single SQL Server, three web servers, and two file servers. The only redundant pieces are the fibre cables from the SQL Server to the SAN, and the fact that we have three web servers. However, if the newest web server goes out in the middle of the day, the other two will choke under the load at this point.
Rackspace has been pretty good to us over the years. It just wasn't cost effective for us to purchase this level of hardware before now, and Rackspace was able to provide us with a good service level for a reasonable price. But at this point, because of the amount of hardware we were looking to move to and the amount of bandwidth we are going to be using, it simply became more cost effective for us to host the systems at a local CoLo.
The main reason that I'm telling everyone this is that if you have been trying to find me for the last two weeks or so, this is why I can't be found. I've been spending pretty much every waking moment putting this together and getting it all set up so that we can migrate over to it.
Needless to say it's an awesome project. How many people get the chance to build a new data center and design it the way they want to from scratch? Pretty much no one. Data centers usually grow in a sporadic way from a small config of a server or two, and they are inherited from one person to the next. But this time I get to design everything the way I want to from the ground up. It's going to be a blast.
I haven't made a decision on putting databases in the cloud. I think Amazon and now Microsoft have the right idea for cloud databases: give people lots of options, use name-value pairs (AmazonDB or the old-school Azure database) when it makes sense, and use a full-blown RDBMS when it makes sense.
But the big question that I have (besides pricing) is how does all this fit into the overall picture for my company or my client.
Do I see a lot of large enterprises moving large parts of their environment into the Cloud? Probably not.
Do I see the small/medium business moving customer facing applications to the Cloud? Possibly, it’s going to depend on the application and the business model.
Do I see the cloud being a stepping stone on an eventual path to building your own data center? Very much so.
Why don't I see large enterprises moving data into the cloud? Mainly control and compliance. Large companies (and even larger medium-sized companies) want to control everything about their data. They also need to be able to ensure that no one who isn't authorized to view the data can view it. The easiest way to do this is to own the machines that have the data. Large companies also have to have DR plans, and those plans usually can't depend on some other company saying “Yes, it will be back up.”
I said above that I see the cloud being a stepping stone to getting your own servers and data center. The path that I see in my mind is for the small to medium business that can't afford to set up its own servers onsite or at a colo. For them cloud computing is a great first step to let them get started and see where the application goes. If nothing happens, then there isn't much capital lost. If it grows like crazy, then everything scales nicely (I'm not yet sure how well and how automatically the databases scale). This gives the SMB the ability to judge where the business is going to go and how fast it's going to get there.
Some applications may be able to stay in the cloud forever. Either they persist a lot of the data at the client, or they simply never outgrow the cloud. On the other hand, I see a lot of applications going from running in the cloud to moving to an MSP (Managed Services Provider) such as Rackspace, MaximumASP, etc. These guys offer the benefits of dedicated hardware without having to shell out massive amounts of cash up front. Over time, however, it becomes cost ineffective to continue at an MSP, and buying your own hardware simply becomes the correct thing to do. The trick is knowing when this is the case, so that you aren't spending too much money at the MSP.
Now for those that were paying attention, you'll have noticed that I skipped the point above about the SMB moving some things into the cloud. I think this falls into both answers above: some things will make sense to host up in the cloud, other things won't.
What do you think will happen to the cloud? Where do you see it being really useful? Will you be moving applications into the cloud at your current company, at a future company?
These are questions that you’ll need to ask yourself at some point, so why not now? In these times of rapid change to the IT world (and the world in general) don’t be afraid to change your answers to these questions.
Personally I don't get most of the social networking sites / products / whatever you want to call them.