One of our customers ran across some interesting data within their Azure bill. Specifically they were
looking at the billing for their SQL DW, but this same issue can show up in the billing statement for other services as well.
This customer has a single SQL DW setup with a service tier of DW300. Looking at their monthly bill last month, something looked kind of odd. A snippet of the bill is shown below.
Some things stand out to me when looking at this bill. First is the service that is being billed. It doesn’t say how many DWs we’re being billed more (more on that later).
Something else odd is that the SQL DW shows being billed for 1220 hours. Now in theory the most number of hours for any single service you can be billed for in a single month is 744 as that’s 24 (hours) *31 (days), but here we’re being billed for 1220 hours. Now we know for a fact that this SQL DW is powered off at times, so how are we using this resource for more than 31 days in a month?
The last oddity is the rate. $1.46 an hour. This points us to the reason for all of these strange numbers. That’s the rate to run a SQL DW at DW100 for one hour.
The entire SQL DW service is built around 100 DWUs and incrementing the cost and scale up from there. If you look at the pricing page for SQL DW you’ll notice that the price increase is linear. This is for a reason, it makes the billing MUCH easier.
So even though we’re being billed for 1220 hours, the SQL DW actually only ran for about 406 hours and change.
When reviewing your Azure bill, especially for PaaS services, keep in mind that the billing may not look like you expect it to because they have configured the billing at a lower tier then used a multiple to get to the correct pricing.
Let me start by saving to every event organizer thank you for running the events you run. This post may come off a little bad-tempered, but please know that this comes from me wanting to make your life easier, and to make your events even better than they already are.
Please, please communicate with potential speakers WELL in advance of your event. If you’ve got a list of speakers that you know based on their abstracts you’ll be accepting (even if you don’t know which specific session you’ll be picking) then let those speakers know that you’d love to have them speak at your event. If you don’t know yet which session you want from them, let them know. This gives them the chance to start booking travel as quickly as possible if they are coming from a remote location. If international airfare is required for the speaker booking early can mean saving hundreds of dollars in flight costs being saved. If the speaker is paying for the flights out of their own pocket this can be a big benefit.
Doing It Right
A perfect example of this is Data Platform Geeks Summit in India. The team there has been great about working with the international speakers several months in advance, in order to let them know that they’ve been accepted so that flights could be purchased. In addition to being able to book flights in advance some of the speakers were able to book other speaking engagements in Southeast Asia maximizing the ROI on the expense of booking the flights to get to the other side of the planet.
The more communication that you can have with your speakers and sponsors the better as it keeps everyone advised on what’s going on and what needs to get booked and shipped well in advance. This holds true even after you’ve announced your schedule—you won’t annoy speakers by over communicating.
The Trump Administration has put together an Election Integrity Investigation commission to investigate voter fraud here in the United States. They are requesting all state governments hand over a LOT of personal information including (but not limited to) your name, address, birthday, political party (if recorded), last four digits of your Social Security Number, which elections you’ve
participated in since 2006. The data is known as Personally Identifiable Information (PII).
This is not a political post, that’s for political folks to talk about. I’m an information security professional: I want to talk about the massive risk to all our personal data if this information is sent over.
Here’s the letter that was sent to the states. First let’s review how the data should be sent in: one option is via email. We all know email isn’t secure, witness the email hacking and data drops from the last election. Transmission of the information alone puts every individual at risk of a hack. Alternatively, they offer the SAFE system as a backup. Based on the URL, the system is run by the US Army, not the commission, which will require a secondary transfer of our PII, under the assumption it’ll end up in the correct place, won’t be lost, released, etc. Moreover, it’s entirely possible it could arrive at the U.S. Army and still be transferred via email, nullifying any attempt made in the name of personal security.
Now let’s dig into that second to last sentence there. “Please be aware that any documents that are submitted to the full Commission will also be made available to the public.” So the panel is going to take everyone’s personal data, the exact data needed to steal someone’s identity, and release it to the public? Where anyone can download it?
Finally there’s the request for the last four digits of your social security number, which is the most pernicious and misguided request in the letter, because it is the key to your identity. The digits you would be handing over are the most random ones in your coding as they are in sequential order of your request for a card. The first five, if you applied for a social security number before 2011 (which is almost all of us) are sequencing based on location: the first three digits were assigned based on the ZIP Code of your mailing address at the time of your social security application. You may still live in the same zip code, or you can be easily tracked for previous addresses online. The second two digits are a group number based on location, again possible to reference it with a little work.
As a final consideration, there are also people who need to stay hidden for their own protection: spouses who do not want their abusers to locate them, witnesses from court cases, adoption issues, even celebrities who don’t want to subject themselves to stalkers or superfans constantly knocking on their front door.
And there is no “opt out” for an individual, the record is historical in nature going back to 2006 and presented by the election board, not the individual.
This whole project seems to have been construed without consultation with anyone in the information security sector and it shows. Even if the panel could overcome the transfer and privacy issues, we have no way of knowing how this information will be stored (assuming that it isn’t released to the public as the letter suggests) so that confidential data can actually remain confidential.
Thankfully many of the Attorneys General from across the country have said that this won’t be happening because of state privacy laws. Encourage them, and your state election boards, to keep fighting this. Hopefully the commission will figure out that this is a horrible idea.
On Saturday a lot of winners of the Microsoft MVP award got emails that they’ve been waiting for telling them that they were renewed as MVPs for another year. Some people didn’t get renewed for one reason or another, and only Microsoft knows why.
You may have expected to see me tweet that I was renewed as a Microsoft MVP on Saturday, but I didn’t. That’s because I wasn’t. Now this doesn’t mean that I’m not an MVP, because I am. The thing is that unlike most of the other MVPs I wasn’t up for renewal as a Microsoft MVP July 1st. The reason why is a little lengthy, so bear with me.
Back in the olden days of the MVP program (2016) there were 4 renewal schedules, at the top of each quarter. I received my award in October the first time around, so I got renewed every October. When they announced the change to the program around the MVP Summit last year they had just renewed the October awardees so technically we’d all still be MVPs when renewal came up in July. Because of that Microsoft decided that we’d keep our October award expiration, then they’d extend that until the next July (our awards should have expired October 2017, so they now expire July 2018). Because of this we all get to keep our awards for an extra 9 months (January and April award winners got extended until July 2017, the July award winners didn’t get any extra time).
So yes, I’m still an MVP. You can tell who’s an MVP by looking at the MVP webpage. Now this isn’t a perfect solution as some MVPs opt to hide their profile for one reason or another, but it’s a pretty good bet that if someone isn’t listed there they aren’t an MVP anymore, and if they are listed there they are an MVP. As soon as someone is no longer an MVP there profile is removed so there’s no risk of finding people that aren’t MVPs in the MVP directory.
Congrats to the MVP Class of 2017, you’ve all earned it. See you at the next award cycle.
Running SSRS with the ReportServer and ReportServerTempDB databases has been supported for a while now. If however you want to use scheduled delivery of reports you need to do some extra stuff to make this work.
The first thing that you need to do, if when setting up the SSRS instance you need to move the AG from Replica to Replica to setup the correct permissions within the MSDB database (or manually setup the permissions for SSRS within the MSDB database. The easy solution is to make the SSRS database a member of the db_owner role in msdb. This is because apparently the permission that the SSRS configuration manager gives the SSRS account doesn’t include the ability to add a category to the msdb database, which it needs to do. db_owner was the easy fix in our case, and the risk of SSRS doing something it shouldn’t in MSDB is minimal.
That’ll allow SSRS to create all the needed jobs on each node of the Availability Group.
The next thing you need to do is configure the Availability Group to restart SSRS after the AG fails over. This needs to happen because SSRS doesn’t recreate the jobs when it reconnects to the AG, it only does this on SSRS startup. If you are running SSRS in a farm (which you should be doing for high availability) you only need to restart SSRS on one node of the farm.
This can be easily done with a three line batch file, which you configure as a generic resource in failover cluster manager within the resource group which hosts the availability group. Personally I put this script in c:\scripts with some other stuff.
sc \\ssrs-server stop ReportServer > c:\scripts\restart_ssrs.txt
sc \\ssrs-server start ReportServer >> c:\scripts\restart_ssrs.txt
The first line stops SSRS, the second starts it. The third line pauses the batch file so that it doesn’t complete and quit. If it did that then failover clustering would see that as a failure and a failure trigger a failover (or if you disable that a failed service). And since I like my clusters all green and happy, I pause the batch file so it sits there forever. The > c:\scripts\restar_ssrs.txt is logging the output to a text file.
Yes, I could have done this with PowerShell but it wouldn’t have gotten me anything except for more complex code while I check to see if the service is down, then attempt to start it. And SC works just fine for this.
The last thing you need to do is make the computer accounts for all the members of the availability group members of the local administrators group of the SSRS server. This needs to be done so that the service can be restarted (or in some other way give these accounts access to restart services). It needs to be the computer account of the member nodes, not the computer account of the cluster as the batch files run under the context of the computer.
After that, when an AG failover occurs your SSRS will restart within a minute or two and create any missing jobs.
Now the downside here is that you’ll have a bunch of jobs failing on the secondary replicas, but I’ll take that over a non-working configuration.
Recently Microsoft Azure introduced the ability to have multiple front end IP addresses on an Internal Load Balancer. This allows for
Failover Cluster Instances to have multiple instances per cluster, and Availability Group deployments to have multiple Availability Groups per deployment.
But with this improvement comes the need to be very careful about how our probe ports are setup on things. Otherwise things aren’t going to work exactly as we expect them to. And we’ve seen this on a couple of different deployments with customers now.
Much of this problem comes from the fact that all the sample code that’s out there and talks about how to setup an FCI or AG assumes a single front end IP and a single probe, as that’s all that was available for years.
Thankfully solving this is actually pretty straight forward. We need a single probe configured in the Load Balancer for each front end IP address that we configure in the load balancer. That means that each IP that we have configured in failover cluster manger also needs it’s own probe as well. If you don’t setup a separate probe for each clustered IP address then the only way to ensure that everything works as expected is to have all of those Clustered IP addresses on the same node of the windows cluster.
What I like to do is something like this.
The reason that we have to do this is pretty basic, you just have to remember how the probe ports work. The probe port that you configure in Windows isn’t listening on the clustered IP Address. It’s listening on 0.0.0.0 or all IP addresses. Also, the Internal Load Balancer doesn’t connect to the probe port on the clustered IP, but instead on the clustered node’s IP address.
We can see this is we run netstat -a -n on the node that’s hosting the Availability Group replica. In this case the node of the VM is 10.0.0.8 and the probe for the AG is listening on port 59999. The SQL Server Availability Group is listening on 10.0.0.14 in this case and the cluster root of the Windows Cluster is listening on 10.0.0.13 with it’s probe on 59998. Before we reconfigured this cluster in this way, if the Cluster root and the AG were running on the same node then everything was fine. However if either one was moved to a different node of the cluster then connectivity to SQL became “interesting”. SSMS would only draw 1/2 the folders it is supposed to show. The application got impossible for users to use, etc. Basically everything broke and fell apart. Once we realized the problem, the fix became pretty straight forward. Change the probe configuration for the cluster root (10.0.0.13 in this case) to another port and configure the Internal Load Balancer with another probe, the configure the load balancing rules for the cluster root to use the new probe. Once all that way done everything started working as expected.
These sorts of configuration problems can creep up on you. This environment was 6 months old when this problem first showed up. Because until then, the cluster root and the AG always happened to reside on the same node. But recently that changed because the third node of the cluster (we have an extra machine in the config so that we are using a majority node set quorum configuration) lost network, then got it back. This caused a cluster vote and this machine became the cluster manager so the cluster room (10.0.0.13 and it’s network name) moved to this node causing the probe to be open on two machines and causing all hell to break loose as 1/2 the SQL connections were being directed to a server that didn’t even have SQL installed on it, much less have the Availability Group listener running on it.
The probe is very important to configure for everything that you want to load balance to, it just needs to be configured correctly.
It’s shockingly easy to setup as well. You’ll need this WordPress plugin to handle the actual authentication. It’s not in the WordPress gallery, instead it’s on GitHub so you’ll have to download it from there and install it.
Once it’s setup you’ll need to log into the Azure portal and tell Azure Active Directory that you’ll have a new application using Azure Active Directory. Then copy the keys from Azure Active Directory to the settings page of the Plugin.
Now don’t worry, there’s instructions on the GitHub page for the plugin that’ll walk you through setting it all up. It frankly took me all of about 5 minutes to get it setup and working.
Now if your WordPress is configured to prompt you to do a math problem when you login, you’ll need to disable the Protect feature in the Security tab in the JetPack Security tab.
The benefits of using Azure Active Directory for your authentication are enormous, including robust Two Factor Authentication, corporate password policies, granting access via AD group membership as well as the fact that your login process is now backed by the Worldwide Azure Active Directory infrastructure.
If you have WordPress and have access to Azure Active Directory I would HIGHLY recommend setting up this authentication process.
One of my clients recently had the need to upload tens of terabytes of data into Azure Blob Storage. This gave us the perfect opportunity to use the Azure Import Export service to get these several terabytes of data into Azure by using the Azure Import/Export Service. The Azure Import/Export Service allows you to ship hard drives to Microsoft with your data on them, which is then copied up to the Azure Blob Storage. You can then move the data around from there as needed. All this is done using the Azure Import/Export Tool which is a command line tool, which has a few quirks.
The biggest querks that we ran into getting the Azure Import/Export Tool working was that it doesn’t support quotes in the parameters unless there are spaces in the folder or file names. So the fix here, don’t use spaces and don’t use quotes. For example at first I was trying to use the following in my command line.
But what I needed to use was actually this.
That’s a pretty small difference, but an important one. And the error message that the tool gives doesn’t say that you have invalid characters in the logdir parameter. It just tells you that you have invalid characters in the path or file name. Which means any of the paths or file names, and you have to specify several of them including the journal file (/j) the log folder (/logdir) the drive configuration layout for what drives it’s writing to (/InitialDriveSet) and what folders to pull onto those drives (/DataSet).
Another annoying thing was that WAImportExport.exe didn’t like having /DataSet at the right end of the command line. It only liked it when it was at the left hand side of the command line. Now this was before I figured out the double quotes issue, and that may have been part of it but with the double quotes on all the parameters and with the DataSet parameter on the right hand side, it complained that there was no DataSet parameter provided.
When configuring the DataSet CSV file, you need to put the container name in all lowercase.
Overall I was pretty impressed with how the processed worked. The CSV files were pretty easy to put together. The DataSet file that you provide just tells the application what folders to move where. In this example I’m moving the files from C:\something to the “something” container in my blob storage account (you can use a network share instead of a local file).
In the InitialDriveSet parameter you tell the application which drives that are attached to your computer that it’s using to ship the data.
In my base the drive was formatted and the disk was already bit lockered.
The application has some quirks to it, like I said earlier in the post. But once those got figured out, it was pretty easy.
There are a lot of ways to build a database in the cloud. Picking the correct solution for your workload can be a daunting task. When building a data warehouse solution that needs to scale out from terabytes to near petabyte scale, you suddenly have a lot fewer options. The two biggest players in the cloud market are Amazon’s Red Shift product, and Microsoft Azure SQL Data Warehouse (SQL DW).
There two solutions are going to take a very different approach to building and designing your solution, and migrating between the two solutions can be tricky as there is a lot of data movement that needs to happen to move your data warehouse.
Azure SQL DW has some distinct advantages over Red Shift.
One of the biggest is its ability to scale up and down as needed within minutes with just a small service disconnection. Scaling a Red Shift environment up and down requires large amounts of downtime, the bigger the data warehouse, the more downtime is required as data must be moved around within the Red Shift environment for the data warehouse to be scaled up and down. So if you want to start small, and as the data warehouse grows, scale up you really can’t without a large interruption of service to your users. With SQL DW, the scale up or down operation is kicked off, and within a few minutes more (or less, depending on your need) computer resources are available behind the SQL DW database.
Another huge advantage with SQL DW is that as far as most tools are concerned, it’s just a SQLServer Database. Basically, anything that supports T-SQL can connect to the SQL DW database in Azure and run queries against it. The experience is much better if you use a new version of SQL Server management studio, but it isn’t required. There’s some abstraction that happens on the server side so take the queries which are run, which are written as normal T-SQL statements, to parallelize them against all the backed computer resources to give you the compute scale out that you need with the SQL DW platform. This makes migrating from a SQL Server data warehouse on-premises very easy because the reports which you run today, can just be pointed to use the SQL DW with no changes to the reports. If you are moving from an Oracle Data Warehouse to SQL DW, odds are the reports will be able to run with little to no change, as most SELECT statements convert from Oracle to SQL Server syntax very easily.
From a data visualization and tools layer the entire SQL Server stack just works with SQL DW. Running SSAS on premises or in the cloud–no problem. Power BI can execute live reports against your SQL DW.
Additionally, SQL DW supports Azure Active Directory authentication, which means your users can login with their on-premises credentials. If you have Active Directory Federation Services configured, your users can have pass through authentication.
With SQL DW if there are times of the day or night where you know that there are no users running queries or data loads against the data warehouse we can simply pause the SQL DW so that we aren’t paying for compute resources when we aren’t using them. This can make it more cost effective to run a large sized SQL DW so that processing can be completed faster, then the data warehouse paused during non-work hours.
These are just a few of the reasons that we’ve been using Azure SQL DW with our clients. We’ve had great success with data warehouse projects which have been using Azure SQL DW and we hope to have many more of them. If you’d like to work with our team, we’d be happy to assist with your Azure SQL DW projects. Contact the DCAC team to get started with your Azure SQL DW project today.
This sort of question comes up a lot. And there’s a lot of it-depends built into the answer, so I’ll try and break this down a little bit for you. Be warned, we’re going to be talking about NUMA and other hardware-y things here for the most part.
You wants 1×6 (one socket, 6 cores) because standard edition will only use the first 4 sockets in a server (up to 16 cores combined). There’s no getting around that.
From a NUMA perspective as long a vNUMA at the Hypervisor is disabled then it doesn’t matter as SQL Server standard edition isn’t NUMA aware (NUMA awareness is an Enterprise Edition feature).
This is where things get more complicated if vNUMA is enabled in VMware or Hyper-V.
If vNUMA is enabled then you want one multiple vSockets and multiple vNUMA nodes so that SQL Server is aware of how the CPUs and memory are laid out within the hardware so that SQL can made good decisions on how the processes are being laid out against the hardware.
Now that said, you probably don’t want 6 vSockets. You probably want 2 vSockets with three cores each so that you get multiple cores per vSocket. But a lot of that will depend on if you can control how many vSockets there are per vNUMA node.
If vNUMA is disabled then you care less because SQL thinks that everything is in a single NUMA node so it’s going to make decisions based on that.
More RAM than a pNUMA node
If the amount of RAM configured for your VM is larger than a physical NUMA node, then you need to turn on vNUMA for the VM (no matter how many cores you have) and configure the VM to the cores equally across the NUMA nodes that you present to the VM.
Yes, this is all very hardware-y, and requires some understanding of how pNUMA, vNUMA, vSockets, etc. all work together. For most DBAs you’ll want to just kick this over to the VMware / Hyper-V admin and have them do some tweaking.