It’s shockingly easy to setup as well. You’ll need this WordPress plugin to handle the actual authentication. It’s not in the WordPress gallery, instead it’s on GitHub so you’ll have to download it from there and install it.
Once it’s setup you’ll need to log into the Azure portal and tell Azure Active Directory that you’ll have a new application using Azure Active Directory. Then copy the keys from Azure Active Directory to the settings page of the Plugin.
Now don’t worry, there’s instructions on the GitHub page for the plugin that’ll walk you through setting it all up. It frankly took me all of about 5 minutes to get it setup and working.
Now if your WordPress is configured to prompt you to do a math problem when you login, you’ll need to disable the Protect feature in the Security tab in the JetPack Security tab.
The benefits of using Azure Active Directory for your authentication are enormous, including robust Two Factor Authentication, corporate password policies, granting access via AD group membership as well as the fact that your login process is now backed by the Worldwide Azure Active Directory infrastructure.
If you have WordPress and have access to Azure Active Directory I would HIGHLY recommend setting up this authentication process.
One of my clients recently had the need to upload tens of terabytes of data into Azure Blob Storage. This gave us the perfect opportunity to use the Azure Import Export service to get these several terabytes of data into Azure by using the Azure Import/Export Service. The Azure Import/Export Service allows you to ship hard drives to Microsoft with your data on them, which is then copied up to the Azure Blob Storage. You can then move the data around from there as needed. All this is done using the Azure Import/Export Tool which is a command line tool, which has a few quirks.
The biggest querks that we ran into getting the Azure Import/Export Tool working was that it doesn’t support quotes in the parameters unless there are spaces in the folder or file names. So the fix here, don’t use spaces and don’t use quotes. For example at first I was trying to use the following in my command line.
But what I needed to use was actually this.
That’s a pretty small difference, but an important one. And the error message that the tool gives doesn’t say that you have invalid characters in the logdir parameter. It just tells you that you have invalid characters in the path or file name. Which means any of the paths or file names, and you have to specify several of them including the journal file (/j) the log folder (/logdir) the drive configuration layout for what drives it’s writing to (/InitialDriveSet) and what folders to pull onto those drives (/DataSet).
Another annoying thing was that WAImportExport.exe didn’t like having /DataSet at the right end of the command line. It only liked it when it was at the left hand side of the command line. Now this was before I figured out the double quotes issue, and that may have been part of it but with the double quotes on all the parameters and with the DataSet parameter on the right hand side, it complained that there was no DataSet parameter provided.
When configuring the DataSet CSV file, you need to put the container name in all lowercase.
Overall I was pretty impressed with how the processed worked. The CSV files were pretty easy to put together. The DataSet file that you provide just tells the application what folders to move where. In this example I’m moving the files from C:\something to the “something” container in my blob storage account (you can use a network share instead of a local file).
In the InitialDriveSet parameter you tell the application which drives that are attached to your computer that it’s using to ship the data.
In my base the drive was formatted and the disk was already bit lockered.
The application has some quirks to it, like I said earlier in the post. But once those got figured out, it was pretty easy.
There are a lot of ways to build a database in the cloud. Picking the correct solution for your workload can be a daunting task. When building a data warehouse solution that needs to scale out from terabytes to near petabyte scale, you suddenly have a lot fewer options. The two biggest players in the cloud market are Amazon’s Red Shift product, and Microsoft Azure SQL Data Warehouse (SQL DW).
There two solutions are going to take a very different approach to building and designing your solution, and migrating between the two solutions can be tricky as there is a lot of data movement that needs to happen to move your data warehouse.
Azure SQL DW has some distinct advantages over Red Shift.
One of the biggest is its ability to scale up and down as needed within minutes with just a small service disconnection. Scaling a Red Shift environment up and down requires large amounts of downtime, the bigger the data warehouse, the more downtime is required as data must be moved around within the Red Shift environment for the data warehouse to be scaled up and down. So if you want to start small, and as the data warehouse grows, scale up you really can’t without a large interruption of service to your users. With SQL DW, the scale up or down operation is kicked off, and within a few minutes more (or less, depending on your need) computer resources are available behind the SQL DW database.
Another huge advantage with SQL DW is that as far as most tools are concerned, it’s just a SQLServer Database. Basically, anything that supports T-SQL can connect to the SQL DW database in Azure and run queries against it. The experience is much better if you use a new version of SQL Server management studio, but it isn’t required. There’s some abstraction that happens on the server side so take the queries which are run, which are written as normal T-SQL statements, to parallelize them against all the backed computer resources to give you the compute scale out that you need with the SQL DW platform. This makes migrating from a SQL Server data warehouse on-premises very easy because the reports which you run today, can just be pointed to use the SQL DW with no changes to the reports. If you are moving from an Oracle Data Warehouse to SQL DW, odds are the reports will be able to run with little to no change, as most SELECT statements convert from Oracle to SQL Server syntax very easily.
From a data visualization and tools layer the entire SQL Server stack just works with SQL DW. Running SSAS on premises or in the cloud–no problem. Power BI can execute live reports against your SQL DW.
Additionally, SQL DW supports Azure Active Directory authentication, which means your users can login with their on-premises credentials. If you have Active Directory Federation Services configured, your users can have pass through authentication.
With SQL DW if there are times of the day or night where you know that there are no users running queries or data loads against the data warehouse we can simply pause the SQL DW so that we aren’t paying for compute resources when we aren’t using them. This can make it more cost effective to run a large sized SQL DW so that processing can be completed faster, then the data warehouse paused during non-work hours.
These are just a few of the reasons that we’ve been using Azure SQL DW with our clients. We’ve had great success with data warehouse projects which have been using Azure SQL DW and we hope to have many more of them. If you’d like to work with our team, we’d be happy to assist with your Azure SQL DW projects. Contact the DCAC team to get started with your Azure SQL DW project today.
This sort of question comes up a lot. And there’s a lot of it-depends built into the answer, so I’ll try and break this down a little bit for you. Be warned, we’re going to be talking about NUMA and other hardware-y things here for the most part.
You wants 1×6 (one socket, 6 cores) because standard edition will only use the first 4 sockets in a server (up to 16 cores combined). There’s no getting around that.
From a NUMA perspective as long a vNUMA at the Hypervisor is disabled then it doesn’t matter as SQL Server standard edition isn’t NUMA aware (NUMA awareness is an Enterprise Edition feature).
This is where things get more complicated if vNUMA is enabled in VMware or Hyper-V.
If vNUMA is enabled then you want one multiple vSockets and multiple vNUMA nodes so that SQL Server is aware of how the CPUs and memory are laid out within the hardware so that SQL can made good decisions on how the processes are being laid out against the hardware.
Now that said, you probably don’t want 6 vSockets. You probably want 2 vSockets with three cores each so that you get multiple cores per vSocket. But a lot of that will depend on if you can control how many vSockets there are per vNUMA node.
If vNUMA is disabled then you care less because SQL thinks that everything is in a single NUMA node so it’s going to make decisions based on that.
More RAM than a pNUMA node
If the amount of RAM configured for your VM is larger than a physical NUMA node, then you need to turn on vNUMA for the VM (no matter how many cores you have) and configure the VM to the cores equally across the NUMA nodes that you present to the VM.
Yes, this is all very hardware-y, and requires some understanding of how pNUMA, vNUMA, vSockets, etc. all work together. For most DBAs you’ll want to just kick this over to the VMware / Hyper-V admin and have them do some tweaking.
From a management perspective Azure SQL DB and Azure Elastic Pools are the same. As far as the DBA or the developer is concerned they are the same. Think of Elastic Pools as a billing construct. With SQL DB (not in Pools) you pay for X amount of performance for each database. With Elastic Pools you pay for D performance for E databases and each database gets at most F performance.
So lets say I buy a Standard Elastic Pool with 800 eDTUs. I can put up to 500 databases in there and they are going to share that 800 eDTUs. Each database can have up to 100 eDTUs per database before it’s going to have it’s performance throttled.
The idea behind the elastic pools is that the performance profile for each database will be different, and they won’t need to spike to their limit all at the same time. If they did that would be a performance problem as you’re run out of DTUs to spread across the databases. In this example pool basically 8 databases can hit 100% of their DTU limit before everyone suffers.
The reason that you’d want to setup things this way is from a pricing perspective. An 800 eDTU pool is going to cost you ~$1800 a month. A single 100 DTU standard database will cost you ~$150 a month. If you have more than 12 databases that all need to cap out at 100 DTUs, and their performance profiles aren’t the same (the have their spikes in workload at different times) then putting them in an elastic pool will be cheaper.
From a management perspective databases in a SQL DB and outside a SQL DB are basically the same. Everything that’s supported inside a normal SQL DB is supported inside an Elastic Pool database. They have the same features and limits. The big difference is that the performance limits are now a shared pool of resources not dedicated per database.
For DBAs who are used to working with instances on-prem this should be a familiar pattern for you. Think of an Elastic Pool as a SQL Instance on a server. The server is a pool of resources that you have to decide what databases to stick inside that pool. It’s the same idea here. The difference between on-prem and in Elastic Pools is that in Azure you don’t know how many cores or RAM you have, you just have the eDTU construct. And you can have multiple Elastic Pools per Azure SQL Server name (something.database.windows.net).
For monitoring you have the same options between Azure SQL DB and Elastic Pools. You can use the Azure portal, you can query the API using PowerShell or the CLI. Or you can use DB Sentry from SentryOne (SQL Sentry) to monitor things just like you used to.
There is a massive crypto worm running around the internet destroying computers and networks. You need to patch your machine. The reason that patching is so
important to get protected from this work, is that this worm doesn’t require that you click on anything to attack you. Once it’s in your companies network it can start hitting machines around the network that aren’t patched and it could, depending on your patching settings, encrypt every machine in your network.
This is bad. In just a day 45,000+ computers have been encrypted.
In case you missed it, patch … NOW!!!!!!
If you want to read up, this is the security hole that you need to fix.
This page from Microsoft shows how to ensure that you are fully patched.
It’s Mothers Day in the US this weekend, so you’re going to call Mom anyway. Tell her to patch her computer.
Today at Microsoft’s //build conference Microsoft announced a couple of new data platform services that are available for people who are looking to run MySQL driven applications using Microsoft’s cloud in a Platform as a Service (PaaS) solution. This solution gives you the same flexibility and ease of management of the other PaaS data platform solutions that Microsoft has made available in their offering, but now with a fully Open Source platform.
In addition to the MySQL platform they also announced PostgreSQL as an open source offering in their PaaS offering. While the PostgreSQL announcement is just as huge, I’m focusing on the MySQL announcement as that’s the part of the platform that I’ve been working with as it hosts WordPress.
Why would Microsoft be putting these open source offerings? Because they’ve been on the roadmap for quite some time. Remember that a large portion of Azure is running open source software and platforms for customers. But moving those open source platforms into a PaaS offering is quite a bit harder than just standing up a couple of VMs and calling it a PaaS platform. There’s a massive amount of tooling and automation that has to happen on the back end to configure all this, plus setup the portal to handle management, as well as the integration with all the other components in Azure (OMS, firewall, monitoring, alert rules, etc.)
Now I know that some people are going to drone on and on about how this is just Microsoft reacting to Google’s announcement back in March. And that couldn’t be further from the truth. Microsoft has been working on this service for months and it’s been in Private Preview since long before Google’s announcement. I know this because several of our websites have been using this service as the database backend for months now.
What you get when you spin up a MySQL (or PostgreSQL) database in this new PaaS service is a MySQL instance with HA built into the platform. You just connect to it like you would any other MySQL service by using the MySQL command line tools (or any of the web based management tools) and you create databases and point your applications at them.
Is it complete?
My assumption (and hope) is that the goal for these open source PaaS Data Platform services is parity with the SQL Server based solutions. So as this platform gets closer and closer to GA we’ll hopefully see this service getting closer and closer to the parity in features between the SQL Server based data platform solution and the MySQL based solution. Personally I can’t wait for multi-site configurations to be available to we can stretch a MySQL database across multiple sites, then I’ll have to come up with a plugin that’ll redirect read only queries to another database server in a MySQL configuration which’ll be kind of cool to work on.
This sure is an exciting time to be working in Data Platform especially in Azure as Microsoft brings us more Open Source Software into the Azure stack.
Using MySQL in PaaS
DCAC has been using this MySQL in PaaS solution for quite a while now, since probably sometime in January to run several of our sites, including the development version of our main website where we do some WordPress plugin development. We put the dev site there because when doing development and testing of WordPress plugins you end up generating a lot of controlled workload for a server so you can see everything that’s coming up and being a problem. And frankly it’s been pretty easy to forget that this is a private preview service.
The response time for an OLTP application (which was what we were testing) was really good. The database queries were a few milliseconds longer than when the MySQL database was running locally, and that’s all because of the network difference between talking to a VM in the same subnet and talking to a service in the same region, but that’s understandable and acceptable. Going across regions introduces the expected level of network latency that you’d get going across the network from one city to another. As this service is lit up in more regions this problem will become less and less as you’ll be able to have MySQL databases in the same region as your web servers.
All in all we’ve been very happy with the service as we’ve been using it against development and customer facing websites over the last several months. And we’re really looking forward to see what Microsoft gives us with the product in the coming weeks and months so that we can eventually move all our sites to it and stop managing VMs in Azure.
Back in March Redgate kicked off a contest on a few blogs to give away some SQL Clone licenses and some Amazon Echo Dots. That contest has come to a close, and Redgate has announced the winners of the contest. So if you entered on the DCAC blog post, Joey’s post, or any of the other pages go check out Redgate’s post announcing the winners to see if you’ve won the Echo and a license.
(I believe there will be other posts from Redgate in the future that’ll have the winners of SQL Clone in them, I’m not really sure how they’re announcing them.)
Go check out their post, and check out SQL Clone as well with their free trial.
If you’ve worked with SQL Server for any period of time you’ve problem run into what appear to be pretty random occurrences of SQL Server starting to have performance problems which go away after you rebuild the indexes. What’s going on to make the problem happen, and what’s solving the problem are however not what they appear.
When this happens the common thought it that it’s index fragmentation that is causing the problem. And while that isn’t helping you, it isn’t the root cause of the issue.
In most situations, however the cause of the problem is out of date statistics. And the solution is to update the statistics, or more specifically have SQL Server update them more often automatically.
If you are running SQL Server 2008 R2 SP1 through SQL Server 2016 then you want to turn on trace flag 2371 when SQL Server status (you can set it as a startup parameter, and then turn it on by running DBCC TRACEON (2371, -1) and that’ll make it take effect until the next restart). There’s really no downside to this trace flag being on, especially for systems that have large amounts of data change impacting query performance. The only negative impact that this trace flag will have is that the system will trigger update statistics more often throughout the day. To ensure that this doesn’t cause problems we also need to enable async auto stats updates.
After you’ve made these changes, you’ll need to update statistics as update stats won’t trigger right away, and the problem shouldn’t return.
ColumnStore indexes are all the rage with data warehouses. They’re fast, they’re new(ish) and they solve all sorts of problems when dealing with massive amounts of data. However they can cause some issues as well if you aren’t very careful about how you setup your partitions on the ColumnStore index. This is because, you can’t split a ColumnStore partition once it contains data.
Now, if everything is going according to plan you create your partitions well in advance and there’s no issues.
However, if everything hasn’t gone according to plan and someone forgets to create the partitions and you end up with rows in the final partition, you can’t create any more partitions because you can’t split the partition.
We can see this pretty easily with a quick repro.
CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES ( 1, 100, 1000 );
create partition scheme myScheme1
as partition myRangePF1
all to ([primary])
create table t1
create clustered columnstore index cs on t1 on myScheme1(c1)
insert into t1
(1), (10), (100), (600)
With our table here we’ve got a partition function with break points at 1, 100, and 1000. We then insert rows into a columnstore index on that table with the values of 1, 10, 100 and 600. If we try and split our partitions using the ALTER PARTITION FUNCTION command with any value between 1 and 1000 we’ll get an error message.
ALTER PARTITION FUNCTION myRangePF1 ()
SPLIT RANGE (500);
This is because those partitions contain data. If we try and more any of the partition ranges (1, 100, or 1000) then we’ll get an error because we can’t merge them together either.
So how to be work around this issue? Well the easiest way is to delete the data from the table, then fix the partitions, then put the data back. And frankly, that’s the only option. There’s a variety of ways to do that. The easiest option is probably to partition switch out that partition to an un-partitioned table. Then it’s just a matter of moving that data back into the partition in question.
Now if the partition that you’re currently writing to is the last partition, and you need to keep writing to that partition while this is all happening thing are going to get a little more interesting as you’ll need to take an outage in order to do all this work. There’s really no good solution besides taking an outage to move data around and get it correct in order to resolve this issue.
I wish I had a better answer here, but so far there’s no good solution.