I was listening to a podcast on the latest data classification trends and got to thinking about how good a job current methodologies are doing. When it comes to storage tiering, I’d have to say there’s plenty of room for improvement. Tools that assess application latency can help improve the situation.
In the storage space, data classification deals with ways to identify data objects (files, usually) so that they can be stored “appropriately.” What’s appropriate depends on the motivation for classifying the data in the first place. After all, it’s certainly easier to throw it all into a junk drawer than to worry about how your data’s organized.
It’s important to use the right classification scheme or technology in order to get data organized in a way that’s useful. E-discovery objectives for classification deal with what a data object is, especially records of communications between people (instead of production data). E-discovery classification tools are typically appliances or software that proactively crawl file systems, spending most of the time on archives or data that’s been saved, since this represents a much larger volume of data than current records do.
Data protection needs to classify data in order to support the applications (like backup) that ensure business continuity. It’s focused on grouping data by how critical it is to the business, how often it needs to be copied (RPO), how quickly it needs to be restored (RTO) and how often it changes, etc. It also deals with how many copies need to be made and where they need to be stored (DR). Backup systems typically do this grouping and manage the classification process through configuration options.
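As a rough illustration of that grouping, here’s a minimal Python sketch that assigns datasets to backup policies based on RPO and RTO. The class names, thresholds, and example datasets are all illustrative assumptions, not the configuration model of any particular backup product.

```python
# Hypothetical sketch: grouping data into backup classes by RPO/RTO.
# Thresholds and tier names are made-up examples for illustration only.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    rpo_minutes: int  # max tolerable data loss, in minutes
    rto_minutes: int  # max tolerable time to restore, in minutes

def backup_class(ds: Dataset) -> str:
    """Assign a backup policy tier from RPO/RTO requirements."""
    if ds.rpo_minutes <= 15 and ds.rto_minutes <= 60:
        return "continuous-replication"  # near-zero loss, fast restore
    if ds.rpo_minutes <= 24 * 60:
        return "nightly-backup"          # up to a day of loss acceptable
    return "weekly-archive"              # infrequently changing data

datasets = [
    Dataset("orders-db", rpo_minutes=5, rto_minutes=30),
    Dataset("hr-files", rpo_minutes=720, rto_minutes=480),
    Dataset("old-projects", rpo_minutes=10080, rto_minutes=2880),
]

for ds in datasets:
    print(ds.name, "->", backup_class(ds))
# orders-db -> continuous-replication
# hr-files -> nightly-backup
# old-projects -> weekly-archive
```

In a real backup system this mapping lives in the product’s configuration options rather than in code, but the logic is the same: the policy follows from how much loss and downtime the business can tolerate.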
Storage tiering is done mainly to save money. Let’s face it, if money were no object, all data would be kept on fast disk or SSD (to make the point, I’m ignoring power availability and DR considerations). The criteria for classification are mostly the performance requirements of the applications or the people using that data and, to a lesser extent, how often it’s accessed. Unlike other classification criteria, performance is not a static characteristic that can be read from a file header or determined solely by data type. It’s a dynamic variable that depends on the speed of the other devices in the compute environment and how fast they need data supplied from the storage system. Most organizations don’t really know “how fast” this is, so they just assume that their most critical applications need their fastest storage. Unfortunately, this can lead to over-provisioning storage and wasting huge amounts of money, which is ironic, since cost reduction was the objective of the tiering effort to begin with.
A new approach to storage tiering at the high end, sometimes referred to as “performance tiering,” is being used by companies to capture some real intelligence about the performance their storage systems actually need to provide in order to keep application latency to a minimum. The key is the use of network-connected, physical-layer access points to get real-time data on application latency as data is delivered through the network. As it turns out, most organizations are severely over-buying high-performance disk storage: available network bandwidth and other application traffic slow these data deliveries down to the point where they could be served by slower disk arrays. The statistics vary, but it’s not uncommon for high-performance, Tier 1 array utilization to be less than 20% of rated performance.
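The core idea can be sketched in a few lines of Python: compare the latency an application actually experiences end to end (as a network tap would measure it) with what each tier can deliver, and recommend the cheapest tier that still fits. The tier names, latencies, and costs below are made-up illustrative numbers, not vendor specifications.

```python
# Hypothetical sketch of latency-driven tier selection. If network and host
# overhead already dominate end-to-end latency, faster disk buys nothing, so
# pick the cheapest tier that still fits the observed latency.
TIERS = [  # (name, typical_latency_ms, relative_cost), cheapest first
    ("tier3-nearline", 15.0, 1),
    ("tier2-sas", 8.0, 3),
    ("tier1-ssd", 1.0, 10),
]

def recommend_tier(observed_app_latency_ms: float) -> str:
    """Return the cheapest tier whose latency fits within what the
    application actually experiences end to end."""
    for name, tier_latency_ms, _cost in TIERS:
        if tier_latency_ms <= observed_app_latency_ms:
            return name
    return TIERS[-1][0]  # nothing slow enough fits; use the fastest tier

# An app that already sees 10 ms end to end gains nothing from Tier 1:
print(recommend_tier(10.0))  # -> tier2-sas
print(recommend_tier(0.5))   # -> tier1-ssd
```

A production tool would of course work from continuous measurements and percentiles rather than a single number, but the point stands: the decision is driven by observed latency, not by assumptions about application criticality.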
For a VAR, performance tiering solutions can be a very disruptive tool to bring into an account that’s buying lots of high-end disk from someone else. Armed with this kind of knowledge, you can show customers how much they’re overpaying for Tier 1 storage, and possibly suggest a more cost-effective alternative.
Follow me on Twitter: EricSSwiss.