Having too many NUMA nodes can lead to some really strange CPU behavior. What you’ll probably see is that the CPU workload of the machine isn’t evenly balanced across the CPUs. If for example you have 32 cores on the VM, and you have 16 NUMA nodes, with two cores per NUMA node you’ll probably see two cores (might be four, or six it depends on a lot of factors) running very hot compared to the other cores on the server.
At one client I was working with in 2015 they had a server which was configured very similarly to what I’ve described above. They had 32 cores, with 11 NUMA nodes. One NUMA node has 4 cores, the next 9 had three cores each, while the 11th NUMA node had one core in it.
When monitoring the performance on the server what we saw was that three cores were running at 100% (or close to it) while the other cores were running at about 4%.
The reason for this, is that SQL Server is NUMA aware, and it was trying to get the best performance possible out of the configuration that it had access to.
The solution in this case was to reconfigure the virtual machine so that it had only two vNUMA nodes. We did this by changing the number of virtual sockets to 2, and put 16 virtual cores on each virtual socket. Once this was done CPU load on the server was much more balanced across the cores. Now instead of 3 cores running at 100% 16 cores were handing the same workload (that workload was still isolated to a single NUMA node, but now the node had 16 cores in it) but the CPUs were bouncing around the 15-20% range. Suddenly the box was happier, and more importantly the application which the end users was using was behaving much better and queries were responding much faster as they no longer needed to wait for a CPU to become available.
Now you may be asking yourself how a system could end up with a screwy NUMA configuration like this. And the answer is pretty simple. The system grew over time, and more and more cores were added to the server little by little as the years went by. As the system needed more CPU resources more cores were added, but no attention was paid to how those cores were presented to Windows. So we ended up with Windows seeing 32 cores in a very strange configuration.
Because most SQL Servers are running inside VMs these days, we need to be careful how we configure those VMs to ensure that we get the proper configuration of the virtual machine. If we don’t have the correct configuration of the virtual hardware there’s little hope of getting good performance out of the VM.
Check your NUMA node configuration to make sure that it’s correct and if it isn’t correct it. Especially if the configuration is very odd looking. And if you aren’t sure, check with us, we’re happy to help.
I’m thrilled and honored to have been selected to speak at the PASS Summit again. This year I’ll be presenting a session titled Database Security in SQL Server 2016 and beyond. In this session we will review the new enhancements to SQL Server security available in SQL Server 2016 and Azure SQL DB, including Always Encrypted, Row-Level Security, and Dynamic Data Masking, as well as whatever else Microsoft has released since this abstract was written. We’ll look at how to set these features up, how to use them, and most importantly, when to use them.
You can also find me at our booth called “Consultants Corner”, which we will be sharing once again with the team from SQL HA. Be sure to come by our booth, get entered for our giveaway and talk to the two teams who are there in the booth about any problems that you may be having that you want to bounce off of someone.
I was playing around with Azure Active Directory a little while ago, and I saw that Azure Active Directory can be setup to configure your Gmail implementation. This will allow Azure AD to configure new users in Gmail automatically when they are configured. If you have Azure Active Directory configured to sync with your on-premises Active Directory that means that when you create a new Active Directory user on-premises that use will automatically get a Gmail account created.
The configuration was shockingly simple to setup as well. To configure Azure Active Directory log into the Azure classic portal and select the Active Directory option. Open your Active Directory and select the Applications tab. Click “Add” down at the bottom and search for “Google”. Find the Google Apps option from the list and add it. After adding it run through the few page wizard. There’s really nothing to the wizard.
The only catch is that you need to already have your Google Apps account created and you have to have an account in the Google Apps account which has admin rights. You’ll be prompted to log into Google then test the connection. Once that’s done you’ll be taken back to the Azure AD application list. At the end of the wizard you’ll be prompted to select if you want to configure all your users in Google Apps or not. If you select to sync your users to Google Apps you can log into Google’s Admin screen and you should see all your users with accounts created.
And that’s it. When there’s a new user added to Azure Active Directory the user will be automatically created in Google Apps (and Gmail) automatically
If you are running a PCI compliant system on SQL Server 2005 you are going to fail your next audit. One of the audit requirements is that the vendors must support the version of your software which you are running on. As Microsoft no longer offers support for SQL Server 2005 that’s going to cause you to fail your next PCI audit.
Thankfully there is a way around this problem, upgrade. In a perfect world you’ll upgrade to SQL Server 2014 or SQL Server 2016, but in reality anything newer than SQL 2005 will do. The newer the version you upgrade to, the longer you can wait before your next upgrade project needs to kick off.
So if you have any PCI systems (or really anything running SQL 2005, or older) now is the time to upgrade.
(And yes, Microsoft really will not support you on SQL 2005. I just recently told someone to try calling support, and they were told there was nothing Microsoft support could do to help them.)
Recommended reading from mrdenny for June 10, 2016.
This week I’ve found some great things for you to read. These are a few of my favorites that I’ve found this week.
- How many copies do I need?
- Amazon IoT Button Hacking: Part 1
- Checkpoint process for memory-optimized tables in SQL 2016 and implications on the log
- Power BI Enterprise Content Sharing
- SQL Server 2016 features: R services
Hopefully you find these articles as useful as I did. Don’t forget to follow me on Twitter where my username is @mrdenny.
Presentations that are given are user groups, events, etc. often require demos. Sometimes those demos are to large or complex to run on a laptop. To get around this problem people will use Azure for their demos. Doing this, depending on the demo, requires using RDP to connect to the VM in Azure so that you can run your demo.
But what happens when the network administrator has decided that they need to secure the outbound network connections and make it to that they prevent network connections on random TCP ports? This blocks RDP from working.
When you spin up a VM in Azure it uses a random port number as the network port that RDP listens on to make it harder for hackers to find your RDP port. Random port numbers are also used because every VM within a single resource group shares a public IP Address, so in order you to be able to RDP to the machines, every machine will need to have RDP setup on a different port number.
We can use this ability to gain access to our Azure VMs and get around this annoying practice of having the output connections blocked.
Now I should be clear that this isn’t required all the time, only when the network is blocking the outbound connections. Thankfully it doesn’t happen all the time, but it does happen. I’ve been at some corporate venues doing training for corporate clients where this has come up, as well as at some SQL Saturday events. Now there’s nothing that the event organizer can do to solve the issue, the network admins sadly aren’t going to change anything just for us. But thankfully they don’t need to, we can reconfigure our Azure VMs a little so that this isn’t an issue anymore.
Even the most locked down network is going to allow web traffic. That’s TCP ports 80 and 443. As long as that traffic is allowed, we’re good to go. What we’re going to do is change the random TCP port that the Azure firewall is using for RDP access and have it use port 443 instead.
To make this change, log into the Azure portal and open the properties of the VM that you want to work with. Select Endpoints from the VM’s settings. It’ll look something like this (the public port number will probably be different).
Change the public port number from whatever it is to 443. Do not change the private port number. Then click save at the top. It’ll take a minute or two for the firewall to be reconfigured. Once it is, download the new RDP connection file by clicking the Connect button on the VM’s properties blade.
You should now be able to connect to your VM.
If you have multiple VMs in a single resource group you’ll only be able to set one of them to use port 443. So just use one VM and use it as a jump box to then access all the other VMs.
We’ve been doing some really cool things in Azure recently with some of our clients as well as hosting our own websites in Azure pretty successfully. But we decided that sometimes just having some physical hardware comes in handy sometimes, especially when you want to build out weird or large environments to test things for customers. And sometimes the cloud just isn’t the right place to try and do all that.
Well, thankfully Denny Cherry & Associates Consulting doesn’t have to just want to do this anymore. We’ve just built out a brand new hosting environment in a CoLo here in downtown Los Angeles. As lab environments go it’s a pretty nice config. We’ve got 32 physical cores, 432 Gigs of RAM (which we’ll be upgrading later to 576) spread across 4 physical hosts, and ~20 TB of Tintri SAN space.
Given that testing out most workloads doesn’t take a whole lot of CPU power or RAM, we can easily run 100+ VMs (less if we need some actual resources).
We’re configured to be able to test show multi-subnet clustering just like in a multi-site environment and just about any on-premises configuration that a client wants to see.
This is just one of the benefits that we are bringing to our clients when we work with them. If you want to be able to see how AGs work in an on-premises environment, or test out some weird replication configuration you don’t need to have the resources needed to spin all that up for that testing. We’ve got the hardware and the software to help you do that testing.
I’m a big rebel, so I snuck a pic of the gear in the rack. It’s amazing how much processing power you can shove into such a small space. We’ve obviously got plenty of room to grow (if any other vendors want to see their storage in the pic, feel free to get in touch with us. As our clients need more CPU power and RAM for testing out new configs for their systems we’ll just have to expand out the server farm.
It’s also amazing how big the boxes are to hold that little bit of hardware (scroll down for that pic).
Now you may be asking why don’t we just do all this in Azure? And we could, but the reason we didn’t is pretty straight forward. Cost. Building tons of VMs in Azure and leaving them running for a few weeks for customers can cost a decent amount pretty quickly, even with smaller VMs. Here our cost is fixed. As long as we don’t need another power circuit (we can probably triple the number of servers before that becomes an issue) the cost is fixed. And if we need more power that’s not all that much per month to add on.
All and all, this will make a really nice resource for our customers to take advantage of, and give us a place to play with whatever we want without spending anything.
Back in the good old days (the late 1990s) you didn’t need to worry about things like NUMA unless you were running the top 0.001% of workloads. Today however even a fairly small server can have multiple NUMA nodes. NUMA configurations aren’t just for physical SQL Servers either. Virtual Machines can have NUMA configurations as well (typically called vNUMA).
There’s a few different ways of figuring out how many NUMA nodes you have in your server. If you are logged onto the server via RDP or on the physical console you can find this just by opening task manager. Once task manager is open go to the Performance tab (if it isn’t visible select the “More Details” button at the bottom) and select the CPU graph. You should see a graph on the right. Right click on that graph and select “Change graph to”, then you should see a NUMA node option. If the NUMA node option is greyed out, then you have one NUMA node. If it isn’t greyed out then select the option and however many graphs you see is how many NUMA nodes you have. It should look something like this.
The second option is to use SQL Server Management Studio. Connect to the server using the object explorer. Right click on the instance in the object explorer and select the CPU tab. Expand the “ALL” option. However many NUMA nodes are shown is the number of NUMA nodes that you have as shown below.
You can even expand each NUMA nodes to see which logical processors are in each NUMA node.
The next way to see what your NUMA configuration is, is to look at the ERRORLOG file. Every time SQL Server is started it will log the NUMA configuration towards the beginning of the ERRORLOG file. That line will look something like this example.
Server Node configuration: node 0: CPU mask: 0x00000000000000ff:0 Active CPU mask: 0x00000000000000ff:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
For each NUMA node there is in the server there will be a line in the ERRORLOG file. The CPUs which are in the NUMA node are shown based on the values in the CPU mask and the Active CPU mask. In this case the server has a single NUMA node with 8 cores in the NUMA node. How do we get a value of 000000ff for 8? This is hex, so the first core is 1, the second core is 2, the third core if 4 and the fourth code is 8. Add those up and you get 15. Because we’re dealing with hex, we count 1 through 9, then to get beyond 9 we use letters. A is 10, B is 11, C is 12, D is 13, E if 14 and F is 15 (yes F is supposed to equal 16 but if we start counting at 1 and skip 0 it works, so work with me; all you need to remember is that F means 4 CPUs). In almost all deployments the number of CPUs will be some multiple of 4 so you’ll end up with either 0s or Fs. If there’s other values in there that might be ok, or it might not. More digging will be required.
You can also use T-SQL (or performance monitor) to see how many NUMA nodes there are. By looking at the sys.dm_os_performance_counters DMV you can see how many NUMA nodes SQL sees. The query below will give you the exact number. For every row it returns there will be one NUMA node.
where object_name = ‘SQLServer:Buffer Node’
and counter_name = ‘Page life expectancy’
How many NUMA nodes is too many? Like most questions, that’s a pretty loaded question. If you have more than 4 then odds are something is configured incorrectly. If you have more than 8 I can pretty much guarantee that something in configured incorrectly. If you have one or two, that’s probably OK. The general rule of thumb that I tell people is that if you have more than 2 it needs to be reviews because it’s probably wrong. The most that I’ve seen is 16 NUMA nodes on a box, and that starts going some REALLY strange things. But more on that on another day.
For now, let’s just be happy that we know how many NUMA nodes are too many, and how to figure out how many we have.
Recently Jen McCown received a speaker evaluation form from a SQL Saturday attendee. Now normally this wouldn’t be a blog worthy event as this happens several times a year as there are easily 100 SQL Saturday’s a year.. The reason that this speaker evaluation form is blog worthy is due to the comment which was left.
The question on the form was “What could the speaker do differently to improve?”. Acceptable answers typically are something like “speaker slower”, “more demos”, “fewer demos”, etc. However on this specific evaluation form the attendee wrote “wear négligée”. Jen was kind enough to post the evaluation form for us all to see.
In case you weren’t aware, and apparently at least one person isn’t, this is completely unacceptable. This would barely be acceptable if her husband, Sean, had written it (he didn’t, she checked), from a stranger it’s unwanted, immoral, and completely unacceptable in either a professional or social setting.
Clearly someone needs to be reminded how we act in civil society. We don’t tell total strangers that we want to see them standing around in their underwear. Just because you’re seen someone give a presentation you aren’t in a relationship with them. You don’t get to post these sorts of things.
Speakers in our community are not your playthings to do with as you please. They are giving their own time and money to put together presentations and attend the events which they give the sessions at.
If you think that these sort of comments are acceptable, let me tell you now once and for all, they aren’t. Not to a male presenter and not to a female presenter. All the presenter is looking for, after spending their time and money is a round of applause and if you liked the session a nice “thank you”.
If you want to write this sort of thing on a speaker evaluation form, stay home. Your attendance isn’t required.
Recommended reading from mrdenny for May 13, 2016.
This week, I’ve found some great things for you to read. These are a few of my favorites that I’ve found this week.
- Oracle V. Google Being Decided By Clueless Judge and Jury
- Amazon and Microsoft Directors Charged in Prostitution Sting
- Fascinating EMC World 2016 vLab stats
- Police Reveal Tactics for Fighting Botnets
- EMC World at your fingertips in vLab
Hopefully you find these articles as useful as I did.
Don’t forget to follow me on Twitter where my username is @mrdenny.