Back in the old days of SQL Server 2012, we’d always tell people that creating indexes was half art and half science. That was true, but back then we’d create some non-clustered indexes (designed correctly, of course) and the system would magically go faster. These days, in modern versions of SQL Server (or SQL Database, if you’re using Azure), it can be a lot more art than science.
You have to know the data, at least somewhat, to create truly useful indexes. You need to know what the application is going to do with the data (you don’t need to know what the application looks like, just the queries that it runs against the database). If you have a general idea of the queries that are hitting the server, that will point you to the kinds of indexes that should be created.
If your application is doing summary reports on large amounts of data (sales by month over the last ten years, for example), then a clustered columnstore index will probably be your best option for the table.
If the application is pulling back small amounts of data, more of a data dump than a summary report, then a columnstore index probably isn’t the best option. Instead, a non-clustered index would probably serve the application, and therefore performance, better.
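To make that concrete, here's a minimal sketch (not from the original post) of the two options described above, run from Python through pyodbc. The table and column names (dbo.Sales, SaleDate, CustomerId, SaleAmount), the connection string, and the index names are all hypothetical placeholders.

```python
# Hedged sketch: dbo.Sales and its columns are made-up examples, and the
# connection string below is a placeholder you'd replace with your own.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=YourDatabase;"
    "UID=youruser;PWD=yourpassword"
)
cursor = conn.cursor()

# Option 1: large summary reports (e.g. sales by month over ten years)
# lean toward a clustered columnstore index covering the whole table.
cursor.execute("CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON dbo.Sales;")

# Option 2: narrow lookups that pull back small amounts of data lean toward
# a targeted non-clustered index instead.
cursor.execute(
    "CREATE NONCLUSTERED INDEX IX_Sales_CustomerId "
    "ON dbo.Sales (CustomerId) INCLUDE (SaleDate, SaleAmount);"
)

conn.commit()
conn.close()
```

In practice you’d pick one of the two for a given table based on the workload the application generates, not run both blindly.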
It just depends on what the application needs and what it does. As a consultant, I have to drop into a company, review the application, quickly figure out what kind of application it is, and work out what kinds of indexes need to be added (as well as how to define those indexes). It’s a hard job, but it’s a fun one, and I’m thrilled that I get the chance to do it.
But when push comes to shove, indexing is still an art form to some degree, and I think it always will be. We have some great tools like PSSDiag and the Query Store that can help with this, but when it comes down to it, it’s still people looking at a server and making it faster.
In just a couple of short weeks, I’ll be presenting an all-day session on Microsoft Azure. The session is titled “Designing Azure Infrastructure for Data Platform Projects” and will be held on the 21st of May, 2018. This session will cover all the components of the Azure platform, including a look at the database platforms.
Hopefully everyone who’s interested in Azure can come out and see this session on the various components of Azure. The session will be in Lingen, Germany, on the 21st, with the SQL Grillen conference the next day on the 22nd. I’ll see everyone there in Lingen.
This question actually comes up a lot. With the new managed storage it’s less of an issue, as managed storage takes care of the high availability for you when you build your storage accounts. However, if you are still manually building storage accounts, or you have older systems in Azure that use either classic storage accounts or manually created storage accounts in Resource Manager (basically anything that isn’t managed storage), then you have to take care of this yourself. Doing so is fairly easy, but it requires a little checking and a little manual work.
Setting this up requires an understanding of how the Azure backend works. You’ve probably heard by now that Azure keeps three copies of all data that is uploaded to it. That’s true, but it isn’t the HA that I’m talking about in this post. What I’m talking about here is when there are two members of an availability group, web farm, etc. Azure storage accounts are created on what are known as stamps. A stamp is basically the physical hardware that sits under the VMs and storage accounts. The trick is to make sure that the storage accounts are on different stamps. Now, there’s no way to control this in the Azure portal or with PowerShell. The upside is that checking it is still pretty easy; all it takes to see whether your storage accounts are on different stamps is a DOS window.
Each storage account has a public name and an IP address to go with it, and each stamp has an IP address. Looking at the IP of each storage account will tell you whether the storage accounts are on different stamps or the same one. If the storage accounts are on different stamps, then the VMs that use those storage accounts will be highly available. Looking at the storage accounts behind the two MySQL servers that run the www.dcac.co website (see the image), we can see that the IP addresses for the storage accounts are different, which means that they are on different stamps.
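If you want to script the check rather than run it by hand, here's a minimal sketch of the same idea; the two storage account names below are hypothetical placeholders, not the actual accounts behind www.dcac.co.

```python
# Resolve each storage account's public name and compare the resulting IPs.
# If the IPs differ, the accounts are on different stamps (per the logic above).
import socket

accounts = [
    "examplestorage1.blob.core.windows.net",  # placeholder account names
    "examplestorage2.blob.core.windows.net",
]

ips = {name: socket.gethostbyname(name) for name in accounts}
for name, ip in ips.items():
    print(f"{name} -> {ip}")

if len(set(ips.values())) == len(accounts):
    print("Different IP addresses, so the accounts are on different stamps.")
else:
    print("Matching IP addresses, so the accounts appear to share a stamp.")
```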
If, when you check the storage accounts, the IP addresses match, fixing the problem is pretty straightforward: delete the storage account and recreate it. Odds are it’ll be created on a new stamp; if it isn’t, delete the storage account, wait a few minutes, and create it again. That should cause the storage account to land on another stamp and give you the high availability that you’re looking for.
There have been some questions recently about how people should pass certification exams.
Taking the exams
The answer is pretty straightforward: use the product that the exam is on. Certification exams (I’ve only taken the Microsoft ones, but I’m sure other vendors’ are similar) are built around actual product use. Now, you may not use every feature of the product in your day-to-day life, but the exams aren’t going to be based on what you do. They’ll be based on what the product can do.
Let’s look at one of the SQL Server exams as an example, specifically “Administering a SQL Database Infrastructure.” The notes on the exam say what it will cover and in what percentages. For example, 35-40% of the exam will be on managing and monitoring SQL Server instances. And if you aren’t sure what counts as “Manage and monitor SQL Server instances,” the website spells it out as “Monitor database activity, queries, indexes, statistics, and instances.”
Now, you may be thinking that you use Redgate’s SQL Monitor or SolarWinds’ Database Performance Analyzer (or another third-party tool) to monitor your instances. The exams won’t test you on those. Assume that only the native tools exist, because that’s what Microsoft will be testing you on. After all, Microsoft can’t test on another company’s product.
There may be times when you don’t agree with the percentages on the exams. I know there’s a big argument over how much XML should be on the exam, and I refer you to the notes on the exam on the Microsoft site. If it says the exam may be 40% of something, then it probably will be.
What’s the best way to PASS the exams?
First, let’s talk about brain dumps. They’re not always right, and there’s no way to know when they are wrong. And when they are wrong, who are you going to complain to? The person or company that violated their NDA with Microsoft to make the brain dump?
When you take a Microsoft exam (or any exam), you sign an NDA saying that you won’t talk about the content of the exam and you won’t cheat on it. That means that if you use a brain dump, tell other people to use a brain dump, or help build a brain dump, you’re cheating and potentially in violation of your NDA with Microsoft. What that NDA says, in short, is that Microsoft can revoke your certifications and prevent you from taking any future certification exams, all for talking about their exams. Now, you may feel like that’s a bit strong, but Microsoft takes its anti-piracy actions around the certification exams very seriously.
The good news is that there’s an easy way to get a second attempt: the Microsoft Exam Replay option, which lets you take the exam again if you don’t pass. Buy the Replay and take the exam. If you don’t pass, that’s OK; you’ll now know what you didn’t know on the exam, so you can study up in those areas and take it again for free.
Also keep in mind that the exam documentation online will say how much product experience is expected to pass the exam. Most of the exams assume 3-5 years of experience. If you’re just getting into technology, then you probably aren’t ready for the exams yet.
Practice tests are written by different people than the exams, so the questions will be different. This is on purpose; the questions can’t be the same, because that would be cheating. While practice tests are good, they aren’t the actual exam.
Now, I know this hasn’t all been good news, but it wasn’t supposed to be. It was supposed to be an honest breakdown of what the certification exams look like and how to study for them. The first step is to have a couple of years of experience using the product. The second step is to use the Exam Replay offer that Microsoft provides, or whatever they move to in the future.
Not all cloud providers are the same. Some are flexible and will grow and shrink with your needs and your business (Azure, Amazon, Google Cloud), and some will not (IBM, Rackspace, and other bare-metal clouds). The whole point of using a cloud provider is to be able to scale up and scale down as needed, preferably without having to do anything past the initial setup.
For example, let’s say that we have an IoT application that typically gets 100,000 new messages per second uploaded to it. At that rate, any cloud (or “cloud”) provider will do. Now say that Time magazine runs a feature about our product, and IoT product sales shoot through the roof, so instead of 100,000 new messages per second we’re now getting between 10,000,000 and 100,000,000 messages per second. How will your cloud handle this? If you’re in one of the public clouds like Amazon, Azure, or Google Cloud, your resources should magically scale. Boxes on a bare-metal cloud will simply stop responding until someone in IT realizes they need to scale up, provisions a bunch of machines, and configures them. And that’s assuming your cloud provider even has an IoT framework in place to handle these messages.
Now, odds are you don’t have a wildly successful IoT application. But you’ve probably got a website that customers hit to do business with your company in some way; maybe they place orders on it. What would happen if a massive amount of web traffic started coming in with no notice, and IT didn’t hear about it until the site crashed? Would you want your IT department deploying and configuring new servers (bare metal), or would you want the cloud to handle this by automatically scaling the web tier wider so that you can handle the additional requests?
I can tell you what I want for our customers: I want the web tier scaling automatically so that we can keep taking orders, rather than the website being unavailable for hours (or days) depending on how quickly the bare-metal provider can respond to the need for new hardware and how quickly the IT department can spin up new services on those new resources.
If you’re using some bare-metal cloud provider and thinking that you’re getting the uptime you were promised, you probably aren’t, unless you have an architect in place to make sure that you’ve got HA and DR built into your platform. Because that’s the biggest thing you aren’t getting with bare-metal cloud providers (beyond auto-scale): any HA/DR. If you think you’re getting some HA/DR, you probably aren’t (at least not what I’d call HA/DR) without paying for a lot of extras. (The same applies if you’re doing IaaS in the cloud; here I’m talking about the PaaS services in Azure, Amazon, or Google Cloud.)
What this all boils down to is that “cloud” has become a marketing word, not a word that means what it says anymore. Companies will use “cloud” for anything from “VMs on someone else’s hardware” all the way through “auto-scaling Platform as a Service.” And the closer you are to “VMs on someone else’s hardware,” the further you are from the true nature of cloud platforms.
In August of this year I’ll be presenting my second precon of the year. At the Data Platform Summit I’ll be giving a precon on High Availability and Disaster Recovery. If you’ll be in Bangalore, India, on August 8th, that’s the day I’m presenting it.
Book soon to save the most on the conference, as prices go up each month.
Get booked, and I’ll see you there.
I’m thrilled to be able to say that I’ll be presenting a precon at the PASS Summit 2018. This year I’ll be giving a precon on database security. In this session we’ll go over the various security features within Microsoft SQL Server, then put those techniques into action by trying them out on a virtual machine hosted in Azure as the day wraps up (so make sure you bring a laptop that you can use to RDP into an Azure VM).
I’ll see you at the PASS Summit.
Being a vendor/exhibitor at a few conferences has been an eye-opening experience, for sure. Sadly, exhibitors can’t sponsor every event that’s out there. It used to be that vendors could sponsor them all, but back then there were just a few conferences relevant to a given vendor, so it was “affordable.” Today, with SQL Saturday events, Code Camps, and User Group meetings, as well as large conferences like Build, Ignite, and so on, fighting for those marketing dollars is a lot harder than it used to be.
What’s it worth?
Vendors, both small and big, have to get something out of the event. With most of these events, the name of the game is email addresses to add to their mailing lists (you didn’t think booth swag was free, did you?), and if a vendor has already sponsored an event in the past, especially over a couple of years, then that vendor probably already has the contact information for most of the attendees. If a large percentage of attendees are new to the event each year, I’d recommend highlighting that in the sponsor prospectus that you send out to vendors to get them to sponsor.
Speakers are not sponsors
The other major thing that events have going against them is treating speakers like sponsors. Now, I’ve been a speaker at events for a decade, and I’ve always drawn a clear line between being a speaker and making myself an exhibitor for free (or getting paid to be one when I have a precon). Some speakers haven’t always done this, and some have gotten rather blatant about it, with the events they present at doing nothing to stop those speakers from getting the benefits of being an exhibitor without paying for the privilege. Recently a speaker was collecting contact information from attendees during their precon in exchange for a recording of the session. The attendees who handed over their information were leads for the speaker, the same kind of leads that an exhibitor would pay a large amount of money for on the exhibit hall floor.
People are going to say that events are welcome to let speakers act like exhibitors. And they absolutely are; those events should also not be shocked when the sponsor money dries up. As a vendor, there’s nothing that says I must sponsor any particular event, and if as a sponsor I don’t like how I and the other sponsors are being treated, then I’m free to take my sponsorship dollars elsewhere. This is one of those things that events only get to do once. Once an event has a reputation with the various sponsors and exhibitors, that reputation is going to stick around for a while, even after the speakers are no longer being treated like sponsors.
Sometimes speakers need to draw that line between speaker and sponsor themselves. That line is swag: selling or giving away goodies to attendees. Now, I know that people love getting swag, and I know that people love giving swag, but people who aren’t paying the event for the privilege of giving it away (and yes, that’s basically what sponsors are paying for) shouldn’t be doing it. Events cost money to run, usually a lot of money, and events get that money from their sponsors. If sponsors don’t feel like they’re getting their money’s worth from the event, then the money will go away along with the sponsor, and the event may not have the cash on hand to run again. Suddenly that’s a lose-lose proposition for everyone.
In case you didn’t catch it, I’ve been out for a while, with good reason. Apparently, emergency brain surgery takes a while to recover from. I’m not back to 100% yet, but I’ll be speaking at my first event post-surgery, and it’s this weekend at SQL Saturday Orange County. I’ll have Kris, Joey, and John with me at the speaker dinner and after-party on Saturday evening. I’ll be talking about on-prem storage and maybe a little Azure/cloud storage as well. Overall, it’ll be a great day to see some old friends and meet some new ones. So get registered for the SQL Saturday, and we’ll see you there.
I get where Cloudflare was going with their 1.1.1.1 DNS service, but the rollout, in my opinion, has been a disaster.
For starters, most people are already running DNS servers at home. They may not know it, but they are. Odds are your router is running DNS for you, and it’s probably pretty quick. Even if your router answers in 40ms and Cloudflare’s DNS boasts 15ms response times, only the first lookup is a little faster; after that, it’s all cached. Once the first lookup is done your computer caches the DNS entry locally, so you’re saving 25ms or so (in this example) exactly once.
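If you want to see this for yourself, here's a quick, unscientific sketch that times a first lookup and a repeat lookup of the same name. Whether the repeat is answered from a cache depends on your OS resolver, and the hostname is just an example.

```python
# Time an initial DNS lookup and a repeat lookup of the same name.
import socket
import time

def timed_lookup(name: str) -> float:
    """Return how long a lookup of `name` takes, in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(name, 443)
    return (time.perf_counter() - start) * 1000

first = timed_lookup("www.example.com")
second = timed_lookup("www.example.com")
print(f"first lookup:  {first:.1f} ms")
print(f"repeat lookup: {second:.1f} ms")

# With a 40 ms router resolver versus a 15 ms upstream resolver, the saving
# is roughly 40 - 15 = 25 ms, and only on that first, uncached lookup.
```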
Cloudflare also claims that their service stops your ISP from seeing where you are surfing on the web. It doesn’t. I spent years working for an ISP. We were moving the packets to the websites you visited, and we knew what websites you were going to even without tracking your DNS lookups.
Cloudflare claims that they’ll be deleting the logs and not selling any data collected by this service. Now, I don’t have an MBA, but where’s the profit? Running a global DNS service isn’t free, or even cheap. Companies don’t do this out of the goodness of their hearts. They have to make a profit on a service, or they pull the plug on it. So something has to be making money, or the board at Cloudflare will get sick of funding this real quick.
Several ISPs are blocking access to 1.1.1.1. I know that my ISP at home does; I get a lovely “Unable to connect” error in Firefox when I try to browse to the website running on 1.1.1.1. And yes, I know it isn’t my machine, as it works fine when I VPN into our colo, which has a different network provider. Several other ISPs are blocking this access as well. Years ago I worked for an ISP, and we knew where every customer went, not because of DNS, but because we were capturing the headers of the network packets so that we could find response problems on sites. It really wouldn’t be hard to tie this data back to a user. Knowing which IP you got from DNS really wouldn’t stop us from tracking you on the Internet if we wanted to.
And they do this, I assume, for reasons that are talked about in the blog post from Cloudflare: there’s a lot of junk data being sent to these IPs, so a lot of ISPs are just blocking access to them to make their lives easier and save themselves some of the network costs of carrying that data.
The blog post that Cloudflare released talks about how Twitter was used during the Turkish uprising and how people got around the country’s blocks by using Google DNS instead of the in-country DNS.
This shows that the blocking done by the country was lazy, not that DNS from Google fixed anything. If Turkey (or another country) wanted to block access to Twitter no matter what DNS you’re using, it could simply block the /24 (or whatever ranges the service’s public IPs fall into) at the network level.
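To illustrate the point, here's a minimal sketch of blocking by destination address range rather than by DNS; the /24 below is a documentation range used as a stand-in, not Twitter's actual address space.

```python
# Blocking by IP range: the answer a resolver hands back doesn't matter,
# because the destination address is what gets checked.
import ipaddress

blocked_range = ipaddress.ip_network("192.0.2.0/24")  # example range to block

def is_blocked(destination_ip: str) -> bool:
    """Return True if the destination falls inside the blocked /24."""
    return ipaddress.ip_address(destination_ip) in blocked_range

print(is_blocked("192.0.2.57"))    # True: inside the blocked range
print(is_blocked("198.51.100.9"))  # False: outside the blocked range
```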
Login Pages
On top of that, several hotels, hospitals, convention centers, etc. use 1.1.1.1 as the login page for their captive portal, so they block external requests to that IP. One of the reasons everyone uses that IP for their login page is right there in the Cloudflare blog post: that IP wasn’t publicly used before this service from Cloudflare, because so much junk traffic was being sent to it. Because of that, lots of networks either use it or block it. You can see this right on Twitter, where SwiftOnSecurity shared a DM from a network engineer. Should they be using that IP? Maybe not, but they are.
We can’t expect every company that’s using 1.1.1.1 internally to reconfigure their network because Cloudflare decided to start offering this service. It’s even the default for some Cisco models deployed around the world. I know that in a variety of hotels (and the hospital I was in last year) 1.1.1.1 was the login portal for the Wi-Fi. If I set my DNS to 1.1.1.1 on my laptop and went to any of those sites, I wouldn’t be able to browse. Stopping people from using their computer until they make a configuration change is a problem.
I get that 1.1.1.1 isn’t a reserved private IP, but there are RFCs, and then there is the real world. And in the real world that IP is in use on private networks all over the world, and it’s well known that it is in use.
Would you be surprised to learn that Cloudflare doesn’t own the IP space used by their DNS service? I sure was. The two public addresses that have been published are 1.1.1.1 and 1.0.0.1. Those are both owned by APNIC Research and Development, which means that APNIC could decide that Cloudflare is done and simply shut down the service with no notice to Cloudflare or its users. And since Cloudflare doesn’t own the IP addresses, there’s nothing Cloudflare could do if that happened besides having a PR disaster.
Should we block?
Now, I’m not saying that places should be blocking access here. But if I were a dictator looking to keep my people from getting online, there are much easier ways to do it than blocking DNS (I’m assuming details like this are left up to some systems team somewhere).
Will all this get better? Cloudflare says that it will. I don’t see it getting much better. We’re talking about reconfiguring a large number of hotels, convention centers, hospitals, etc., with little to no benefit to them. We as a technology community have been trying to get IPv6 in place for 20 years, and that still isn’t even close to happening, and that’s a much smaller number of companies that have to reconfigure things.