So the other day I got a ticket opened by my company's Managed Service Provider. There was a backup failure on one of our databases. The response from the hosting provider was:
This failed due to a "Error: 3041, Severity: 16, State: 1." error. This is because you must perform a full database backup before you back up the transaction log for a database in SQL Server. If you require any further assistance or have any queries regarding the above please do not hesitate to contact us at any time or update this ticket.
Now the database in question has been on the server for months, if not longer. And quickly looking at the database properties would tell you that a full database backup has been done, so this response was pretty much BS. Looking at the ERRORLOG for that time shows three error messages.
Error: 3041, Severity: 16, State: 1.
BACKUP failed to complete the command BACKUP DATABASE prod_phreesia_print. Check the backup application log for detailed messages.
Error: 18210, Severity: 16, State: 1.
BackupMedium::ReportIoError: write failure on backup device '920a8f3f-be50-4e54-9b0f-f1bfeddea12a'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
Error: 18210, Severity: 16, State: 1.
BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device '920a8f3f-be50-4e54-9b0f-f1bfeddea12a'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).
Looking at these three error messages (which were all logged during the same second) gives a pretty good idea as to what happened. The tape backup system threw an error, causing the full or differential backup to fail. Without looking through the system any deeper, I know that this had to be either the full or differential backup that threw the error, because the full and differential backups go directly to tape while the log backups go to disk. Since the backup device listed is a GUID, that tells me that it is a backup going directly to tape via the tape backup solution. Since the tape backups are managed by the MSP, one would think they would be able to quickly match the two errors together.
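As a rough illustration of that triage, here is a minimal Python sketch that pulls the error numbers and backup device names out of the ERRORLOG text and checks whether a device name looks like a GUID. The function names are made up for this example, and treating a GUID device name as a sign of a third-party tape backup is my inference from this environment, not a general rule.

```python
import re
import uuid

# The three messages from the ERRORLOG excerpt above, copied as text.
ERRORLOG = (
    "Error: 3041, Severity: 16, State: 1. "
    "BACKUP failed to complete the command BACKUP DATABASE prod_phreesia_print. "
    "Error: 18210, Severity: 16, State: 1. "
    "BackupMedium::ReportIoError: write failure on backup device "
    "'920a8f3f-be50-4e54-9b0f-f1bfeddea12a'. "
    "Error: 18210, Severity: 16, State: 1. "
    "BackupVirtualDeviceFile::RequestDurableMedia: Flush failure on backup device "
    "'920a8f3f-be50-4e54-9b0f-f1bfeddea12a'."
)


def summarize(log_text):
    """Return (error numbers in order, set of backup device names)."""
    errors = [int(n) for n in re.findall(r"Error: (\d+), Severity", log_text)]
    devices = set(re.findall(r"backup device '([^']+)'", log_text))
    return errors, devices


def looks_like_guid(name):
    """True when the device name parses as a GUID rather than a file path."""
    try:
        uuid.UUID(name)
        return True
    except ValueError:
        return False


errors, devices = summarize(ERRORLOG)
for device in devices:
    kind = "GUID (assumed tape agent)" if looks_like_guid(device) else "path"
    print(errors, device, kind)
```

The 3041 on its own only says that a backup failed; the 18210 entries next to it are what name the device and the I/O error.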
But apparently I’m expecting too much from a company that my company pays over $40k a month.
Moving data into the cloud is a huge thing with Microsoft and Amazon (among others). However, for those in countries that aren’t the US there are some major roadblocks to putting your data into the cloud.
The first issue is that when the cloud provider is a US company (which most of them are at the moment), your data falls under US law, as well as any laws which exist if the data is being stored in a non-US data center. So basically if you are a non-US company and you use these services to host your data, the US government can issue a National Security Letter to the cloud provider and get a copy of all the data that is hosted by the US company. (This affects those of us who live in the US too, but there’s nothing we can do about it as we are already here.)
Additionally there’s some major gray area in US law at the moment. This has to do with the abandoned property laws. These laws were put together in the 90s or so when leaving data on a server for long periods of time was abnormal. These laws basically say that if data has been hosted on a 3rd party server (the cloud in this case) for over 90 days this data is abandoned and the US Government can request access to the data, without the need for a search warrant. Seeing as the whole point of using the cloud is to host your company data on a third party server these laws become a pretty big stumbling block.
Some countries have laws about where the data for their citizens can be stored. Check with your national government to see if it is even legal for you to store data in another country. Don’t forget that when picking your cloud provider, even if the data center you pick to host your data is within your country, there wouldn’t be anything stopping your cloud provider from replicating your data to another country for backup or high availability / disaster recovery.
There are lots of other laws to be informed about before moving your data into the cloud. Be sure to do your research before moving your application to the cloud.
Happy new year to everyone. May all your data storage and query tuning dreams come true in 2011.
We are currently running on an EMC SAN. Did I hear you correctly, we should have databases, backups, temp, and logs on the same LUN? Thanks.
No, you want to put them on different LUNs. IBM currently recommends that on the XIV array everything can be put on a single LUN, as the LUN is spread over all the disks in the array. For an EMC array you’ll want to have separate LUNs for each.
So I’m sitting at home reading an article on the NY Times website about the current travel mess in New England, and a statement within the article really scared me (about 2/3s of the way down).
“He wonders why during times like these, airlines, which are now profitable, cannot simply rent additional computing power and hire temporary customer-service workers.”
Now the person who is wondering this isn’t just your average guy on the street; he is Tom Groenfeldt, who “publishes a blog on financial technology” (http://www.techandfinance.com/), so he should know a little something about technology, or at least I would hope so.
Computer processing power can’t just be rented so that everything gets faster as soon as you sign the contract. If you need more web servers, the site needs to be deployed to the new servers, and those servers need to be put into the load balancer. If you did go to one of the cloud sites like Amazon’s EC2 or Microsoft’s Azure and deployed your application to their web servers, you now need to reconfigure your network to allow these outside network connections in. But your back end databases aren’t going to get any faster; you just have more web servers.
Increasing the capacity of your database engine is going to require a little more than “poof, it’s faster”. New hardware needs to be brought in, and the systems migrated to this new hardware. Before this can happen an OS of some sort needs to be installed, the system needs to be tested to ensure that there are no problems with the new hardware, etc. Now if this new hardware needs to be brought in, how is it going to get there? The airlines are canceling flights across the country, making it pretty tough to fly large computers around at the drop of a hat. So you’ll need to truck that new equipment from wherever it is to wherever you need it. Assuming that you only need to go halfway across the country, that is still a 2-3 day drive (assuming that the roads are drivable).
As to the other half of his statement, “hire temporary customer-service workers”, sure, no problem. Do you know 500 people (probably more than 500 are needed; after all, there are something like 10 million people that need to talk to customer service at the moment) that they can hire at the drop of a hat, and that are located where their call center is? Does their call center have somewhere for 500 more people to sit? So we’ll have them all work at home. So we need to issue them computers and phones, and they all need to have high speed internet. Assume for a moment that they all have high speed internet and a computer. They still need to either be given an office phone which will use their high speed connection so they can take phone calls, or be given the company’s Voice Over IP (VOIP) soft phone software which will run on their computer, and they will need a headset (something else which now needs to be purchased and issued, so we probably need to wait for these to be shipped in from somewhere).
Not to mention the little thing about these people needing to be interviewed, and background checks needing to be run. Since these people will be taking people’s credit card numbers, etc., we’ll probably want to make sure that they aren’t going to steal the customers’ credit card information.
Needless to say asinine statements like the one above don’t serve anyone’s interest except to cause people to be pissed off for no reason. This person apparently doesn’t have a clue about how technology really works (I skimmed a couple of pages on his blog and I didn’t see anything about technology on there at all). He should stop talking about technology at all, unless he actually understands how technology works, and until he has worked with technology as a technology professional.
I’m happy to say that I officially can’t make a single change to my book “Securing SQL Server”. The publisher made the last little changes and sent it to the printer a couple of days ago. As I understand it, the book should be shipping out to Amazon, me, etc. around February 1st, 2011, slightly ahead of schedule. Now that the entire process is complete, I can say that going through the book writing process has been a very interesting learning experience.
The book lists at $49.95, but Amazon currently has it listed for $32.97 for pre-order. I don’t know if the book will be available for the Kindle or not, but hopefully it will be. Even if it is, feel free to order the hard copy and the digital.
Once the book ships, if you bring the hard copy to any event that I’m at I’ll be happy to sign your copy for you.
I will have a few copies to give away, which I’ll be doing at select events over the year (I’ll say which ones once I know which ones I’ll be giving the book away at).
Hopefully you pick up a copy of the book, and hopefully you find the book useful.
Back at Tech Ed 2010 a good friend, Robert Cain (blog | twitter), and I were wandering the exhibit hall floor when we got stopped by the guys from the Deep Fried Bytes podcast, who asked us if we’d be willing to sit down and record a podcast. Of course we said sure, we’d be happy to, even though we had no idea what they would want to talk about.
Do you have any affordable recommendations (mid-size company) for DR to a secondary site, about 11 TB of storage?
It depends on how you are going to handle your data replication. If you are going to use native array based replication then you’ll need to either use the array’s data replication options or look at EMC’s RecoverPoint.
If however you are going to handle the replication outside of the array:
- DFS for file servers
- Mirroring, log shipping, etc. for SQL Server
- Native Exchange replication for mail data
Then you’ve got a few options available to you.
Dell has a storage option called EqualLogic which can serve as a great storage solution for a DR environment where you need lots of storage, but less IO than you need in your production environment (I’m not saying that EqualLogic is a slow solution, but there are solutions which are much faster and will cost you more). You’ll probably be able to start with a single shelf, then add another as needed.
EMC also has a smaller array option called the AX4. The AX4 array is based on the same concepts as the CX4 array which is a much more expensive array, but the CX4 has more features available. The AX4 runs on SATA disks, and supports up to 60 disks which can be up to 1TB each. For an 11 TB solution I’d recommend at least 15 disks (one full shelf). If performance ever becomes an issue you simply buy another shelf and move some of the LUNs to the new shelf.
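To sanity check the disk count, here is a back-of-the-envelope sketch in Python. The RAID layout is an assumption on my part (three RAID 5 groups of 4+1, no hot spare), and `usable_tb` is just a helper name for this example. Real formatted capacity will come in somewhat lower, so treat the result as an upper bound.

```python
def usable_tb(disks, disk_tb, group_size, parity_per_group):
    """Usable capacity when disks are carved into equal-sized RAID groups.

    Disks that do not fill a whole group are left out of the total.
    """
    groups = disks // group_size
    return groups * (group_size - parity_per_group) * disk_tb


# One full shelf of 15 x 1 TB disks as RAID 5 (4+1): an assumed layout.
shelf = usable_tb(disks=15, disk_tb=1, group_size=5, parity_per_group=1)
print(shelf, "TB usable before formatting overhead")
```

Under those assumptions one shelf covers the 11 TB requirement with very little headroom, which is why a second shelf is the natural next step if either capacity or performance gets tight.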
There are a couple of other options which you can look at from smaller companies, but when it comes to storage I prefer to stick with the bigger companies. Yes you have to pay a little more, but you know that EMC and Dell will be around in a few years, and that the technologies which they are using have been around for quite a while.
This weekend I was migrating some databases from one server to another. As these were rather large databases (about 2 TB in size) and I didn’t have the clusters set up on a SAN (using DAS instead), I needed to set up log shipping from one cluster to another so that the downtime when moving was kept to a minimum.
But a little problem came up: when I restored one database I got a torn page error when trying to roll the logs forward. My first thought was ok, maybe there was a network hiccup or a problem with the storage on the destination server. So I restored the database again, and got a torn page error again. However, the second time the error was on a different page. This is very important to know. Because the torn page had moved from one page (1:1396373) to another (1:24815312) and I used the same database backup both times, I knew that the torn page wasn’t caused by the source database or the database backup. If it was, then the page which was torn would have been the same both times.
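That reasoning is simple enough to write down as a small Python sketch. The function names and the wording of the verdicts are made up for illustration; the only outside fact used is that SQL Server data pages are 8 KB, which lets you turn a page number into a byte offset within the data file.

```python
PAGE_SIZE = 8192  # SQL Server data pages are 8 KB


def parse_page_id(text):
    """Turn '(1:1396373)' into (file_id, page_number)."""
    file_id, page_no = text.strip("()").split(":")
    return int(file_id), int(page_no)


def byte_offset(page_id):
    """Byte offset of the page from the start of its data file."""
    return page_id[1] * PAGE_SIZE


def likely_culprit(torn_pages):
    """torn_pages: pages reported by repeated restores of the SAME backup."""
    if len(torn_pages) < 2:
        return "inconclusive, restore the same backup again"
    if len(set(torn_pages)) == 1:
        return "source database or backup file"
    return "restore path or destination hardware"


pages = [parse_page_id("(1:1396373)"), parse_page_id("(1:24815312)")]
for page in pages:
    print(page, "starts at byte", byte_offset(page))
print(likely_culprit(pages))
```

If the same page had come back torn on both restores, the same check would point back at the source database or the backup file instead.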
Now you’ll notice that the torn page was detected when the transaction logs were being rolled forward, not while the actual restore was happening. The reason for this is that torn pages aren’t checked for during a full backup restore. Torn pages are only detected during two specific operations: when the pages are read into the buffer pool (my database is in NORECOVERY so that isn’t going to happen), and when the pages are changed (in this case they are changed when the transaction log rolls forward). So the only other way I would have been able to verify that there wasn’t a torn page would have been to bring the database up in STANDBY mode and do a scan of every index (both clustered and non-clustered), which would have caused all the pages to be read into the buffer pool and caused the torn page message to be thrown. But this can’t happen in this case because I’m log shipping from SQL Server 2005 to SQL Server 2008, and the database can’t be brought up in STANDBY mode because a database upgrade needs to be done. So checking for additional torn pages requires taking a fully rolled forward database out of NORECOVERY and putting it into RECOVERY, then querying every table on every index (by using index hints) so that every page is touched, to ensure that there are no torn pages.
This not so quick bit of testing (the database backup in question is about 300 Gigs) told me that there was a problem with either the HBAs or the DAS that the cluster was using. To further test we failed the SQL Cluster over to the other node and ran the restore again which didn’t cause any torn page error messages when rolling the logs forward. This would appear to put the DAS in the clear, and put the problem either on the other node of the cluster, the HBAs in that node, cables, etc. Something which only that server uses to connect to the storage.
If you ever set up log shipping and you get torn pages, let this be a lesson: don’t assume that there is a problem on the source database. It might be a problem on the destination database instead.
I just received notice today that I received a passing score on the first of the MCM exams. I must say that the last few weeks have been extremely nerve-racking waiting for the results. I took the exam the day before Thanksgiving (which was awesome since the testing center is an hour away normally) and it took until today to get the results.
Now that the first exam is out of the way, I can start worrying about the lab portion. I’ve got a few weeks before I can take the exam, as not all the testing centers can offer the lab portion, so I’ll have to wait until my center has the exam ready. But when it does I’ll be there taking that beta exam and praying for the best.
If you are planning on taking the MCM exams when they come out, obviously I can’t tell you what is on the exam (you have to pinky swear not to tell before you can take it) but I will tell you it is by far the hardest Microsoft Exam I’ve taken. Even harder I think than the Business Intelligence exams which were very tough.
I’m off to start studying for the lab, which will be tough as I have no idea what is going to be covered, or what I’ll be asked to do.