Posted by: Denny Cherry
Back To Basics, SQL Server, Storage
SQL Server has three different storage terms which you need to know when dealing with physical file storage. These are files, file groups and disks. While each one of these is totally different, then work together giving the SQL Server database engine a very flexible way to define where the data is actually stored.
Disks are the physical storage within the server. If you look at the front of the server they will be the little devices which are inserted into the front of the server and typically have lights on them that flash. In larger environments where there are lots of servers you may have a storage array (also called a SAN). In it’s simplest terms the storage array is a bunch of disks that are in one large box. The space on these small disks is put together in various ways and parts of this space are allocated to specific servers. While the storage array may have anywhere from a couple of terabytes to thousands of terabytes the specific servers will only have access to a very small percentage of this space. In either case, either using local disks in the front of the server or a large storage array the concept is the same as far as SQL Server is concerned. The storage that is connected to the server is presented as a drive letter in Windows.
If you were to log onto the console of the server you would see an computer which would look very familiar to you. This is because the computer running SQL Server in the server room or data center is running Windows just like the computer which is on your desk and just like the computer at your home (assuming that you aren’t using Linux or a Mac). If you opened the My Computer on a server you would see hard drives just like you do on your home computer. On the server there are probably more hard drives than your workstation or home computer and the drives on the server probably have a lot more space than the ones on your workstation, but the idea is still very much the same. Any files that are needed are written to the disks and then accessed by users or by programs, with Microsoft SQL Server simply being a large complex program.
The files which the SQL Server database uses are called MDF, NDF and LDF files (for the most part). These are physical files which sit inside the disks (see above). A database will have at least two files, one of which is the MDF file which holds all of the data and the second which is the LDF file which holds the transaction log. In it’s simplest form the transaction log is a complete record of all the changes which have happened to the database since the database was first created. We manage the size of the transaction log through backups (which I’ll talk about in a later post) so for now we’ll just say that the transaction log holds the record of everything that has changed within the MDF data file. If a row is added that is written to the transaction log first. If a row is deleted that change is written to the transaction log before it is actually deleted from the database.
The NDF database files are simply additional data files. While the database must have one MDF and one LDF data file the use of NDF data files is optional. There is no performance benefit to using NDF data files instead of just a MDF data file (so this is only the case 99.9% of the time but that’s good enough for this post). The only time you’ll get a performance benefit from using NDF data files is if the MDF data file is on one physical hard drive and the NDF is on a different physical hard drive.
Each specific file exists on one and only one hard drive. If you need to store the data on multiple hard drives then you’ll need multiple data files.
File groups are simply logical groupings of files. We use file groups so that specific tables can be larger than any one specific hard drive. If we didn’t have file groups when we created a table and we told the SQL Server where to store that table we would have to tell it which physical data file to store the data in. This would mean that the tables would be limited by the size of the largest disk which was available. Because we want to be able to spread our data across multiple physical files we have file groups. The file groups allow for multiple files to be put into a single group. SQL Server will then create the table within that file group putting portions of the database on all the files within the file group. This allows us to have tables which are very large as we are only limited by the number of disks we can attach to the server and the size of those disks.
When we create a new table (or index, or service broker queue, or any other object which is a physical object which holds data) we specify which file group we want to create the object within. This allows us to leverage any files which are members of the file group. If there are multiple files the object will be spread over all the physical files. If there is just one file today when we create the table but then we add another file to the file group later there’s nothing that we need to do. The SQL Server will automatically begin using the new file as well as the old file. When you add a new file to a file group SQL Server doesn’t rebalance the data across the database files, you have to do this manually by rebuilding all the indexes, but new rows would be written to the new file.
When SQL Server is writing data to the files within the file group it uses something called proportional fill. This means that the SQL Server will do it’s best to keep the amount of free space within the data files the same. While it isn’t perfect at doing there as there are a lot of things that can effect it’s ability to do this, the SQL Server will do it’s best.
Hopefully this helped explain these three complimentary yet different concepts.