Let me start by saving to every event organizer thank you for running the events you run. This post may come off a little bad-tempered, but please know that this comes from me wanting to make your life easier, and to make your events even better than they already are.
Please, please communicate with potential speakers WELL in advance of your event. If you’ve got a list of speakers that you know based on their abstracts you’ll be accepting (even if you don’t know which specific session you’ll be picking) then let those speakers know that you’d love to have them speak at your event. If you don’t know yet which session you want from them, let them know. This gives them the chance to start booking travel as quickly as possible if they are coming from a remote location. If international airfare is required for the speaker booking early can mean saving hundreds of dollars in flight costs being saved. If the speaker is paying for the flights out of their own pocket this can be a big benefit.
Doing It Right
A perfect example of this is Data Platform Geeks Summit in India. The team there has been great about working with the international speakers several months in advance, in order to let them know that they’ve been accepted so that flights could be purchased. In addition to being able to book flights in advance some of the speakers were able to book other speaking engagements in Southeast Asia maximizing the ROI on the expense of booking the flights to get to the other side of the planet.
The more communication that you can have with your speakers and sponsors the better as it keeps everyone advised on what’s going on and what needs to get booked and shipped well in advance. This holds true even after you’ve announced your schedule—you won’t annoy speakers by over communicating.
The Trump Administration has put together an Election Integrity Investigation commission to investigate voter fraud here in the United States. They are requesting all state governments hand over a LOT of personal information including (but not limited to) your name, address, birthday, political party (if recorded), last four digits of your Social Security Number, which elections you’ve
participated in since 2006. The data is known as Personally Identifiable Information (PII).
This is not a political post, that’s for political folks to talk about. I’m an information security professional: I want to talk about the massive risk to all our personal data if this information is sent over.
Here’s the letter that was sent to the states. First let’s review how the data should be sent in: one option is via email. We all know email isn’t secure, witness the email hacking and data drops from the last election. Transmission of the information alone puts every individual at risk of a hack. Alternatively, they offer the SAFE system as a backup. Based on the URL, the system is run by the US Army, not the commission, which will require a secondary transfer of our PII, under the assumption it’ll end up in the correct place, won’t be lost, released, etc. Moreover, it’s entirely possible it could arrive at the U.S. Army and still be transferred via email, nullifying any attempt made in the name of personal security.
Now let’s dig into that second to last sentence there. “Please be aware that any documents that are submitted to the full Commission will also be made available to the public.” So the panel is going to take everyone’s personal data, the exact data needed to steal someone’s identity, and release it to the public? Where anyone can download it?
Finally there’s the request for the last four digits of your social security number, which is the most pernicious and misguided request in the letter, because it is the key to your identity. The digits you would be handing over are the most random ones in your coding as they are in sequential order of your request for a card. The first five, if you applied for a social security number before 2011 (which is almost all of us) are sequencing based on location: the first three digits were assigned based on the ZIP Code of your mailing address at the time of your social security application. You may still live in the same zip code, or you can be easily tracked for previous addresses online. The second two digits are a group number based on location, again possible to reference it with a little work.
As a final consideration, there are also people who need to stay hidden for their own protection: spouses who do not want their abusers to locate them, witnesses from court cases, adoption issues, even celebrities who don’t want to subject themselves to stalkers or superfans constantly knocking on their front door.
And there is no “opt out” for an individual, the record is historical in nature going back to 2006 and presented by the election board, not the individual.
This whole project seems to have been construed without consultation with anyone in the information security sector and it shows. Even if the panel could overcome the transfer and privacy issues, we have no way of knowing how this information will be stored (assuming that it isn’t released to the public as the letter suggests) so that confidential data can actually remain confidential.
Thankfully many of the Attorneys General from across the country have said that this won’t be happening because of state privacy laws. Encourage them, and your state election boards, to keep fighting this. Hopefully the commission will figure out that this is a horrible idea.
On Saturday a lot of winners of the Microsoft MVP award got emails that they’ve been waiting for telling them that they were renewed as MVPs for another year. Some people didn’t get renewed for one reason or another, and only Microsoft knows why.
You may have expected to see me tweet that I was renewed as a Microsoft MVP on Saturday, but I didn’t. That’s because I wasn’t. Now this doesn’t mean that I’m not an MVP, because I am. The thing is that unlike most of the other MVPs I wasn’t up for renewal as a Microsoft MVP July 1st. The reason why is a little lengthy, so bear with me.
Back in the olden days of the MVP program (2016) there were 4 renewal schedules, at the top of each quarter. I received my award in October the first time around, so I got renewed every October. When they announced the change to the program around the MVP Summit last year they had just renewed the October awardees so technically we’d all still be MVPs when renewal came up in July. Because of that Microsoft decided that we’d keep our October award expiration, then they’d extend that until the next July (our awards should have expired October 2017, so they now expire July 2018). Because of this we all get to keep our awards for an extra 9 months (January and April award winners got extended until July 2017, the July award winners didn’t get any extra time).
So yes, I’m still an MVP. You can tell who’s an MVP by looking at the MVP webpage. Now this isn’t a perfect solution as some MVPs opt to hide their profile for one reason or another, but it’s a pretty good bet that if someone isn’t listed there they aren’t an MVP anymore, and if they are listed there they are an MVP. As soon as someone is no longer an MVP there profile is removed so there’s no risk of finding people that aren’t MVPs in the MVP directory.
Congrats to the MVP Class of 2017, you’ve all earned it. See you at the next award cycle.
Running SSRS with the ReportServer and ReportServerTempDB databases has been supported for a while now. If however you want to use scheduled delivery of reports you need to do some extra stuff to make this work.
The first thing that you need to do, if when setting up the SSRS instance you need to move the AG from Replica to Replica to setup the correct permissions within the MSDB database (or manually setup the permissions for SSRS within the MSDB database. The easy solution is to make the SSRS database a member of the db_owner role in msdb. This is because apparently the permission that the SSRS configuration manager gives the SSRS account doesn’t include the ability to add a category to the msdb database, which it needs to do. db_owner was the easy fix in our case, and the risk of SSRS doing something it shouldn’t in MSDB is minimal.
That’ll allow SSRS to create all the needed jobs on each node of the Availability Group.
The next thing you need to do is configure the Availability Group to restart SSRS after the AG fails over. This needs to happen because SSRS doesn’t recreate the jobs when it reconnects to the AG, it only does this on SSRS startup. If you are running SSRS in a farm (which you should be doing for high availability) you only need to restart SSRS on one node of the farm.
This can be easily done with a three line batch file, which you configure as a generic resource in failover cluster manager within the resource group which hosts the availability group. Personally I put this script in c:\scripts with some other stuff.
sc \\ssrs-server stop ReportServer > c:\scripts\restart_ssrs.txt
sc \\ssrs-server start ReportServer >> c:\scripts\restart_ssrs.txt
The first line stops SSRS, the second starts it. The third line pauses the batch file so that it doesn’t complete and quit. If it did that then failover clustering would see that as a failure and a failure trigger a failover (or if you disable that a failed service). And since I like my clusters all green and happy, I pause the batch file so it sits there forever. The > c:\scripts\restar_ssrs.txt is logging the output to a text file.
Yes, I could have done this with PowerShell but it wouldn’t have gotten me anything except for more complex code while I check to see if the service is down, then attempt to start it. And SC works just fine for this.
The last thing you need to do is make the computer accounts for all the members of the availability group members of the local administrators group of the SSRS server. This needs to be done so that the service can be restarted (or in some other way give these accounts access to restart services). It needs to be the computer account of the member nodes, not the computer account of the cluster as the batch files run under the context of the computer.
After that, when an AG failover occurs your SSRS will restart within a minute or two and create any missing jobs.
Now the downside here is that you’ll have a bunch of jobs failing on the secondary replicas, but I’ll take that over a non-working configuration.
Recently Microsoft Azure introduced the ability to have multiple front end IP addresses on an Internal Load Balancer. This allows for
Failover Cluster Instances to have multiple instances per cluster, and Availability Group deployments to have multiple Availability Groups per deployment.
But with this improvement comes the need to be very careful about how our probe ports are setup on things. Otherwise things aren’t going to work exactly as we expect them to. And we’ve seen this on a couple of different deployments with customers now.
Much of this problem comes from the fact that all the sample code that’s out there and talks about how to setup an FCI or AG assumes a single front end IP and a single probe, as that’s all that was available for years.
Thankfully solving this is actually pretty straight forward. We need a single probe configured in the Load Balancer for each front end IP address that we configure in the load balancer. That means that each IP that we have configured in failover cluster manger also needs it’s own probe as well. If you don’t setup a separate probe for each clustered IP address then the only way to ensure that everything works as expected is to have all of those Clustered IP addresses on the same node of the windows cluster.
What I like to do is something like this.
The reason that we have to do this is pretty basic, you just have to remember how the probe ports work. The probe port that you configure in Windows isn’t listening on the clustered IP Address. It’s listening on 0.0.0.0 or all IP addresses. Also, the Internal Load Balancer doesn’t connect to the probe port on the clustered IP, but instead on the clustered node’s IP address.
We can see this is we run netstat -a -n on the node that’s hosting the Availability Group replica. In this case the node of the VM is 10.0.0.8 and the probe for the AG is listening on port 59999. The SQL Server Availability Group is listening on 10.0.0.14 in this case and the cluster root of the Windows Cluster is listening on 10.0.0.13 with it’s probe on 59998. Before we reconfigured this cluster in this way, if the Cluster root and the AG were running on the same node then everything was fine. However if either one was moved to a different node of the cluster then connectivity to SQL became “interesting”. SSMS would only draw 1/2 the folders it is supposed to show. The application got impossible for users to use, etc. Basically everything broke and fell apart. Once we realized the problem, the fix became pretty straight forward. Change the probe configuration for the cluster root (10.0.0.13 in this case) to another port and configure the Internal Load Balancer with another probe, the configure the load balancing rules for the cluster root to use the new probe. Once all that way done everything started working as expected.
These sorts of configuration problems can creep up on you. This environment was 6 months old when this problem first showed up. Because until then, the cluster root and the AG always happened to reside on the same node. But recently that changed because the third node of the cluster (we have an extra machine in the config so that we are using a majority node set quorum configuration) lost network, then got it back. This caused a cluster vote and this machine became the cluster manager so the cluster room (10.0.0.13 and it’s network name) moved to this node causing the probe to be open on two machines and causing all hell to break loose as 1/2 the SQL connections were being directed to a server that didn’t even have SQL installed on it, much less have the Availability Group listener running on it.
The probe is very important to configure for everything that you want to load balance to, it just needs to be configured correctly.
It’s shockingly easy to setup as well. You’ll need this WordPress plugin to handle the actual authentication. It’s not in the WordPress gallery, instead it’s on GitHub so you’ll have to download it from there and install it.
Once it’s setup you’ll need to log into the Azure portal and tell Azure Active Directory that you’ll have a new application using Azure Active Directory. Then copy the keys from Azure Active Directory to the settings page of the Plugin.
Now don’t worry, there’s instructions on the GitHub page for the plugin that’ll walk you through setting it all up. It frankly took me all of about 5 minutes to get it setup and working.
Now if your WordPress is configured to prompt you to do a math problem when you login, you’ll need to disable the Protect feature in the Security tab in the JetPack Security tab.
The benefits of using Azure Active Directory for your authentication are enormous, including robust Two Factor Authentication, corporate password policies, granting access via AD group membership as well as the fact that your login process is now backed by the Worldwide Azure Active Directory infrastructure.
If you have WordPress and have access to Azure Active Directory I would HIGHLY recommend setting up this authentication process.
One of my clients recently had the need to upload tens of terabytes of data into Azure Blob Storage. This gave us the perfect opportunity to use the Azure Import Export service to get these several terabytes of data into Azure by using the Azure Import/Export Service. The Azure Import/Export Service allows you to ship hard drives to Microsoft with your data on them, which is then copied up to the Azure Blob Storage. You can then move the data around from there as needed. All this is done using the Azure Import/Export Tool which is a command line tool, which has a few quirks.
The biggest querks that we ran into getting the Azure Import/Export Tool working was that it doesn’t support quotes in the parameters unless there are spaces in the folder or file names. So the fix here, don’t use spaces and don’t use quotes. For example at first I was trying to use the following in my command line.
But what I needed to use was actually this.
That’s a pretty small difference, but an important one. And the error message that the tool gives doesn’t say that you have invalid characters in the logdir parameter. It just tells you that you have invalid characters in the path or file name. Which means any of the paths or file names, and you have to specify several of them including the journal file (/j) the log folder (/logdir) the drive configuration layout for what drives it’s writing to (/InitialDriveSet) and what folders to pull onto those drives (/DataSet).
Another annoying thing was that WAImportExport.exe didn’t like having /DataSet at the right end of the command line. It only liked it when it was at the left hand side of the command line. Now this was before I figured out the double quotes issue, and that may have been part of it but with the double quotes on all the parameters and with the DataSet parameter on the right hand side, it complained that there was no DataSet parameter provided.
When configuring the DataSet CSV file, you need to put the container name in all lowercase.
Overall I was pretty impressed with how the processed worked. The CSV files were pretty easy to put together. The DataSet file that you provide just tells the application what folders to move where. In this example I’m moving the files from C:\something to the “something” container in my blob storage account (you can use a network share instead of a local file).
In the InitialDriveSet parameter you tell the application which drives that are attached to your computer that it’s using to ship the data.
In my base the drive was formatted and the disk was already bit lockered.
The application has some quirks to it, like I said earlier in the post. But once those got figured out, it was pretty easy.
There are a lot of ways to build a database in the cloud. Picking the correct solution for your workload can be a daunting task. When building a data warehouse solution that needs to scale out from terabytes to near petabyte scale, you suddenly have a lot fewer options. The two biggest players in the cloud market are Amazon’s Red Shift product, and Microsoft Azure SQL Data Warehouse (SQL DW).
There two solutions are going to take a very different approach to building and designing your solution, and migrating between the two solutions can be tricky as there is a lot of data movement that needs to happen to move your data warehouse.
Azure SQL DW has some distinct advantages over Red Shift.
One of the biggest is its ability to scale up and down as needed within minutes with just a small service disconnection. Scaling a Red Shift environment up and down requires large amounts of downtime, the bigger the data warehouse, the more downtime is required as data must be moved around within the Red Shift environment for the data warehouse to be scaled up and down. So if you want to start small, and as the data warehouse grows, scale up you really can’t without a large interruption of service to your users. With SQL DW, the scale up or down operation is kicked off, and within a few minutes more (or less, depending on your need) computer resources are available behind the SQL DW database.
Another huge advantage with SQL DW is that as far as most tools are concerned, it’s just a SQLServer Database. Basically, anything that supports T-SQL can connect to the SQL DW database in Azure and run queries against it. The experience is much better if you use a new version of SQL Server management studio, but it isn’t required. There’s some abstraction that happens on the server side so take the queries which are run, which are written as normal T-SQL statements, to parallelize them against all the backed computer resources to give you the compute scale out that you need with the SQL DW platform. This makes migrating from a SQL Server data warehouse on-premises very easy because the reports which you run today, can just be pointed to use the SQL DW with no changes to the reports. If you are moving from an Oracle Data Warehouse to SQL DW, odds are the reports will be able to run with little to no change, as most SELECT statements convert from Oracle to SQL Server syntax very easily.
From a data visualization and tools layer the entire SQL Server stack just works with SQL DW. Running SSAS on premises or in the cloud–no problem. Power BI can execute live reports against your SQL DW.
Additionally, SQL DW supports Azure Active Directory authentication, which means your users can login with their on-premises credentials. If you have Active Directory Federation Services configured, your users can have pass through authentication.
With SQL DW if there are times of the day or night where you know that there are no users running queries or data loads against the data warehouse we can simply pause the SQL DW so that we aren’t paying for compute resources when we aren’t using them. This can make it more cost effective to run a large sized SQL DW so that processing can be completed faster, then the data warehouse paused during non-work hours.
These are just a few of the reasons that we’ve been using Azure SQL DW with our clients. We’ve had great success with data warehouse projects which have been using Azure SQL DW and we hope to have many more of them. If you’d like to work with our team, we’d be happy to assist with your Azure SQL DW projects. Contact the DCAC team to get started with your Azure SQL DW project today.
This sort of question comes up a lot. And there’s a lot of it-depends built into the answer, so I’ll try and break this down a little bit for you. Be warned, we’re going to be talking about NUMA and other hardware-y things here for the most part.
You wants 1×6 (one socket, 6 cores) because standard edition will only use the first 4 sockets in a server (up to 16 cores combined). There’s no getting around that.
From a NUMA perspective as long a vNUMA at the Hypervisor is disabled then it doesn’t matter as SQL Server standard edition isn’t NUMA aware (NUMA awareness is an Enterprise Edition feature).
This is where things get more complicated if vNUMA is enabled in VMware or Hyper-V.
If vNUMA is enabled then you want one multiple vSockets and multiple vNUMA nodes so that SQL Server is aware of how the CPUs and memory are laid out within the hardware so that SQL can made good decisions on how the processes are being laid out against the hardware.
Now that said, you probably don’t want 6 vSockets. You probably want 2 vSockets with three cores each so that you get multiple cores per vSocket. But a lot of that will depend on if you can control how many vSockets there are per vNUMA node.
If vNUMA is disabled then you care less because SQL thinks that everything is in a single NUMA node so it’s going to make decisions based on that.
More RAM than a pNUMA node
If the amount of RAM configured for your VM is larger than a physical NUMA node, then you need to turn on vNUMA for the VM (no matter how many cores you have) and configure the VM to the cores equally across the NUMA nodes that you present to the VM.
Yes, this is all very hardware-y, and requires some understanding of how pNUMA, vNUMA, vSockets, etc. all work together. For most DBAs you’ll want to just kick this over to the VMware / Hyper-V admin and have them do some tweaking.
From a management perspective Azure SQL DB and Azure Elastic Pools are the same. As far as the DBA or the developer is concerned they are the same. Think of Elastic Pools as a billing construct. With SQL DB (not in Pools) you pay for X amount of performance for each database. With Elastic Pools you pay for D performance for E databases and each database gets at most F performance.
So lets say I buy a Standard Elastic Pool with 800 eDTUs. I can put up to 500 databases in there and they are going to share that 800 eDTUs. Each database can have up to 100 eDTUs per database before it’s going to have it’s performance throttled.
The idea behind the elastic pools is that the performance profile for each database will be different, and they won’t need to spike to their limit all at the same time. If they did that would be a performance problem as you’re run out of DTUs to spread across the databases. In this example pool basically 8 databases can hit 100% of their DTU limit before everyone suffers.
The reason that you’d want to setup things this way is from a pricing perspective. An 800 eDTU pool is going to cost you ~$1800 a month. A single 100 DTU standard database will cost you ~$150 a month. If you have more than 12 databases that all need to cap out at 100 DTUs, and their performance profiles aren’t the same (the have their spikes in workload at different times) then putting them in an elastic pool will be cheaper.
From a management perspective databases in a SQL DB and outside a SQL DB are basically the same. Everything that’s supported inside a normal SQL DB is supported inside an Elastic Pool database. They have the same features and limits. The big difference is that the performance limits are now a shared pool of resources not dedicated per database.
For DBAs who are used to working with instances on-prem this should be a familiar pattern for you. Think of an Elastic Pool as a SQL Instance on a server. The server is a pool of resources that you have to decide what databases to stick inside that pool. It’s the same idea here. The difference between on-prem and in Elastic Pools is that in Azure you don’t know how many cores or RAM you have, you just have the eDTU construct. And you can have multiple Elastic Pools per Azure SQL Server name (something.database.windows.net).
For monitoring you have the same options between Azure SQL DB and Elastic Pools. You can use the Azure portal, you can query the API using PowerShell or the CLI. Or you can use DB Sentry from SentryOne (SQL Sentry) to monitor things just like you used to.