Posted by: Eric Hansen
compression, data, debate
With solid-state drives (SSDs) getting a lot of usage now, especially with the growing adoption of netbooks in place of aging notebooks and laptops, people have been bringing up the idea of on-the-fly data compression. The main question is: how efficient is this concept? As it stands, I have yet to find a Linux file system that supports it out of the box. What I want to address in this post is whether that should change or not.
What Is On-the-Fly Data Compression?
Basically, this whole post is about the ability to compress data on the hard drive and decompress it when it's accessed or requested. The premise is that this saves disk space, and that's it. Since SSDs have come onto the market with a punch, there has been a bit more pressure for this ability, especially since non-Linux file systems have been able to do this for years now.
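To make the idea concrete, here's a toy sketch of transparent compression in Python: data is compressed on the way to "disk" and decompressed on the way back, so callers never see the compressed form. This is just an illustration of the concept; the `disk` dict, `write_file`, and `read_file` are made-up names, and a real file system does this per-block in the kernel, not per-file in user space.

```python
import zlib

# Hypothetical block store standing in for the drive: path -> compressed bytes.
disk = {}

def write_file(path, data):
    # Compress on the way in; only the compressed form hits the "disk".
    disk[path] = zlib.compress(data)

def read_file(path):
    # Decompress on the way out; callers always see the original bytes.
    return zlib.decompress(disk[path])

text = b"hello world " * 1000
write_file("/notes.txt", text)
assert read_file("/notes.txt") == text           # readers see the original data
print(len(disk["/notes.txt"]), "<", len(text))   # on-disk size is smaller
```

The point is that the compression is invisible to applications: reads and writes go through unchanged, and only the stored bytes differ.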
Why Should I Use This?
To be unbiased about it, this isn't a clear-cut question (similar to "which Linux distro/flavor should I use?"). It has its moments, definitely, but it also has its limits, both of which I will do my best to address in this post.
How Do I Use This In Linux?
This is what draws me away from this technology/feature. For Linux, ext2 and ext3 do support it, but you have to patch the kernel for the support to be there. For ext2 there's e2compr, which works with the 2.2, 2.4, and 2.6 kernels (the last 2.6 support being 2.6.22-25). For ext3 there's a ported version of e2compr called e3compr. The problem with both of these solutions is that they haven't been updated in some time (e2compr since 2009, e3compr since 2008). As for the other file systems (including ext4 and ReiserFS), I'm not able to find any information on them supporting this feature.
Why Wouldn’t I Use This?
I really didn't want to start this post out with the negatives from the get-go, as I love this concept. But there are a few issues I see with it that I haven't brought up yet.
The first issue is disk performance. SSDs already have a shorter lifespan than their hard-drive step-brothers, which makes me wonder why people think this technology would be great for SSDs to begin with. I don't see many issues when it comes to reading files: there's no difference between a compressed and an uncompressed file in this regard, and stat() will still show the same information. However, the write side of things is what scares me the most. If you think there isn't much to it, consider this: even with laptops able to have 2 or more GB of memory, where does the file get decompressed? If it's decompressed on the drive, then you'll be using up to double the space (even if only temporarily, you'd still have to make sure you have that space). If you decide to decompress files to a tmpfs (RAM disk), you still have to make sure you have the room in RAM (which is even trickier, as RAM usage fluctuates a lot more). Of course, there's the possibility of swap helping out here, but it seems like a lost cause given the bigger chance of data loss or corruption, especially if you end up running a program that gets caught in a buffer overflow/overrun.
Would I use this in my everyday life? No, not even on my netbook. I don't feel that the possible threats justify the possible gain in disk space. Also, while I love using Linux, I don't like tinkering with the kernel, especially with patches that are more than 6 months old (let alone 2-3 years).
There are file systems out there (such as NTFS) that do offer this, but I feel there is a reason it isn't enabled by default. If you have an SSD set up, you might enjoy the added space, as SSDs are limited in what they can hold, but with hard drives running a good 500 GB to 1 TB or more, the sacrifice is too great.