Storage Soup

Sep 10, 2007, 9:15 AM GMT

Protecting millions of small files

Posted by: Carolyn E.M. Gibney

Every week I visit IT professionals, and I often hear the same complaint: their file server environment has grown out of control. These file servers hold millions of small files, and customers are looking for better ways to protect that file data.

Disk-based archiving truly fixes areas of the backup process that most disk-to-disk (D2D) solutions do not. Customers are highly frustrated with backup applications stumbling over what I call the “millions of small files issue,” caused primarily by the never-ending growth of a standard file server’s data. Most backup applications struggle with this scenario. Customers are counting on D2D to help, and it will… a little. The target disk may be faster, but mostly it is much more forgiving than tape. Tape needs to stream, meaning it must be fed a constant flow of data, in order to reach maximum write performance. Millions of small files make it difficult to feed those tape drives consistently. Disk backup, on the other hand, maintains the same write performance no matter how inconsistent the data feed is.

That solves half the backup problem. The other half of the performance problem with backing up millions of small files is that the backup software still needs to walk those millions of files to identify which ones need to be backed up. This file system walk can be very time consuming. Then, the backup software needs to update its own database that tracks which files were backed up and where. Imagine adding millions of records to a database every night, as fast as possible. That database gets huge in a hurry, can easily be corrupted and, even when everything goes right, is very time consuming to maintain. Lastly, with most D2D backup solutions you still need to send the entire data load across the network. Even with deduplication solutions, the entire data payload needs to reach the appliance before deduplication happens. All of this consumes network bandwidth. Disk-based archiving may circumvent or delay the need to upgrade network bandwidth by clearing this old data out of the way.
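To see why that walk is so expensive, here is a minimal sketch (Python, purely illustrative; the share path and the last-backup timestamp are assumptions, not any particular backup product's method) of the scan an incremental backup must perform. Note that it reads every file's metadata even when only a handful of files have changed:

import os
import time

def find_changed_files(root, last_backup_time):
    """Walk the entire tree; every file's metadata is read,
    even though only a fraction of files may have changed."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished mid-scan
            if mtime > last_backup_time:
                changed.append(path)
    return changed

# Hypothetical example: scan a file server share for files
# modified since last night's backup window.
last_backup = time.time() - 24 * 3600
to_back_up = find_changed_files("/srv/fileshare", last_backup)
print(f"{len(to_back_up)} files actually changed")

Even at a few milliseconds per stat call, that loop stretches into hours across millions of files, before a single byte of data has been moved.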

Disk-based archiving eliminates the need to move most of these millions of files. With disk-based archiving, the “old” files are stored on the archive and no longer need to be backed up. They are safer on disk than on tape (thanks to data integrity checking and replication), and they are out of the way. The backup software no longer needs to walk those files to find which ones need protection or send them across the wire to be backed up, and they do not consume disk space on the file server or on the D2D backup target. Additionally, since the archive is disk and not tape, you can be more aggressive with what is archived.
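As a rough illustration of the kind of policy such an archive enables, here is a minimal sketch (Python; the paths, the 180-day cutoff and the use of modification time are all assumptions for illustration, not any vendor's implementation) that sweeps cold files from primary storage to a disk archive share:

import os
import shutil
import time

ARCHIVE_ROOT = "/mnt/archive_share"   # hypothetical disk archive share
CUTOFF_DAYS = 180                     # assumed "aggressive" archive policy

def archive_cold_files(primary_root):
    cutoff = time.time() - CUTOFF_DAYS * 24 * 3600
    for dirpath, _, filenames in os.walk(primary_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime < cutoff:
                # Mirror the directory layout on the archive share
                rel = os.path.relpath(src, primary_root)
                dst = os.path.join(ARCHIVE_ROOT, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)  # file now lives only on the archive

archive_cold_files("/srv/fileshare")

Once moved, those files drop out of every subsequent backup walk entirely, which is the whole point.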

With a classic tape-based archive, customers wait for data to get very old before moving it to tape. In addition, they invest in elaborate data movers to provide transparent access to tape. Lastly, data that has stopped changing but is still being referenced or viewed cannot move to tape at all. With a disk-based archive, retrieval back to the user is relatively fast, so you can be more aggressive about moving data to archive disk storage, and there is less need to build elaborate access schemes. Most disk-based archives simply show up as a share on the network, and you can archive reference data, further reducing the amount of data that needs to be protected by traditional backup methods.

A disk-based archive is the perfect complement to D2D backup. It reduces the investment in disk needed for backup, and an archive strategy may pay for itself on that reduction alone. A disk-based archive clears out the fixed data (data that has stopped changing), which makes the software modules most backup applications require for D2D cheaper (since they charge on stored capacity), and it reduces the disk capacity needed for the disk backup target as well as the expensive primary disk on the file server.

What does this look like in hard cost savings? Disk-based archiving can reduce primary storage requirements (at least a 10X dollar saving: roughly $4/GB for archive disk vs. $43/GB for primary disk), and it can reduce backup requirements (fixed information is said to occupy, on average, 50% of most enterprise primary disk capacity), saving an additional $6/GB.
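As a back-of-the-envelope illustration using the figures above (the per-GB prices and the 50% fixed-data ratio come from this post; the 10 TB file server size is an assumption picked for the example):

# Hypothetical worked example using the per-GB figures above.
capacity_gb = 10_000          # assumed 10 TB file server
fixed_ratio = 0.50            # average share of fixed data cited above
primary_cost = 43.0           # $/GB of primary disk
archive_cost = 4.0            # $/GB of archive disk
backup_cost = 6.0             # $/GB of backup disk avoided

archived_gb = capacity_gb * fixed_ratio
primary_savings = archived_gb * (primary_cost - archive_cost)
backup_savings = archived_gb * backup_cost
print(f"Primary disk savings: ${primary_savings:,.0f}")   # $195,000
print(f"Backup disk savings:  ${backup_savings:,.0f}")    # $30,000

On that assumed 10 TB server, archiving the fixed half of the data frees up roughly $225,000 in combined primary and backup disk spend.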

For more information please email me at georgeacrump@mac.com or visit the Storage Switzerland Web site at: http://web.mac.com/georgeacrump.
