Posted by: Sharon Fisher
big data, ibm
The best part about IBM’s experimental 120-petabyte hard drive is reading all the ways that writers try to explain how big it is.
- 2.4 million Blu-ray discs
- 24 million HD movies
- 24 billion MP3s
- 1 trillion files
- Eight times as large as the biggest disk array previously available
- More than twice the entire written works of mankind from the beginning of recorded history in all languages
- 6,000 Libraries of Congress (a standard unit of data measure)
- Almost as much data as Google processes every week
- Or, four Facebooks
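One quick way to sanity-check these comparisons is to divide 120 PB by each count and see what per-item size each writer was assuming. Here is a minimal Python sketch, assuming decimal petabytes (10^15 bytes). The names `CAPACITY_BYTES`, `COMPARISONS`, `human` and `implied_sizes` are just illustrative:

```python
CAPACITY_BYTES = 120 * 10**15  # 120 PB, assuming decimal (SI) petabytes

# Each comparison from the list: item -> how many of them 120 PB is said to hold.
COMPARISONS = {
    "Blu-ray disc": 2_400_000,
    "HD movie": 24_000_000,
    "MP3": 24_000_000_000,
    "file": 1_000_000_000_000,
    "Library of Congress": 6_000,
    "Facebook": 4,
}

UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human(n_bytes: float) -> str:
    """Format a byte count with decimal (1000-based) units, e.g. 5e9 -> '5 GB'."""
    value = float(n_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:g} {unit}"
        value /= 1000
    return f"{value:g} {UNITS[-1]}"


def implied_sizes() -> dict[str, int]:
    """Bytes per item implied by each comparison (capacity divided by count)."""
    return {item: CAPACITY_BYTES // count for item, count in COMPARISONS.items()}


if __name__ == "__main__":
    for item, size in implied_sizes().items():
        print(f"{item}: {human(size)} each")
```

The numbers hang together: 50 GB per Blu-ray disc (the capacity of a dual-layer disc), 5 GB per HD movie, 5 MB per MP3, 120 KB per file, 20 TB per Library of Congress and 30 PB per Facebook.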
It is not one humungo drive; it is, in fact, an array of 200,000 conventional hard drives (not even solid-state disk) hooked together (which would make them an average of 600 GB each).
Unfortunately, you’re not going to be able to trundle down to Fry’s and get one anytime soon. No, this is something being put together by the IBM Almaden research lab in San Jose, Calif., according to MIT Technology Review.
What exactly it’s going to be used for IBM wouldn’t say, only that it was “an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena.” Most writers speculated that that meant weather, though Popular Science thought it could be used for seismic monitoring — or by the NSA for spying on people.
Like Cray supercomputers back in the day, and some high-powered PCs even now, the system is reportedly water-cooled rather than fan-cooled.
Needless to say, it also uses a different file system than a typical PC: IBM’s General Parallel File System (GPFS), which, according to Wikipedia, has been available on IBM’s AIX since 1998, on Linux since 2001 and on Microsoft Windows Server since 2008, and which some tests have shown can work up to 37 times faster than a typical system. (The Wikipedia entry also has an interesting comparison with the file system used by the big data framework Hadoop.)
In the entry’s words, “GPFS provides higher input/output performance by ‘striping’ blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel.”
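To make the striping idea concrete, here is a toy Python sketch of generic round-robin striping: a file’s bytes are cut into fixed-size blocks, block i goes to disk i mod N, and then all the disks are read at once and the blocks are put back in order. This is not GPFS’s actual code or block layout. The 4-byte block size and the function names are hypothetical, and the “disks” are just in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes per block; tiny for illustration (real file systems use far larger blocks)


def stripe(data: bytes, num_disks: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)  # block i lands on disk i mod N
    return disks


def read_disk(disk: list[bytes]) -> list[bytes]:
    # Stand-in for a slow I/O call against one physical disk.
    return list(disk)


def parallel_read(disks: list[list[bytes]]) -> bytes:
    """Read every disk concurrently, then re-interleave the blocks into file order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        per_disk = list(pool.map(read_disk, disks))
    total_blocks = sum(len(d) for d in per_disk)
    n = len(per_disk)
    # Block i sits on disk i % n at position i // n.
    return b"".join(per_disk[i % n][i // n] for i in range(total_blocks))


if __name__ == "__main__":
    data = b"GPFS stripes file blocks across disks"
    disks = stripe(data, 3)
    print(disks)
    print(parallel_read(disks) == data)
```

The speedup comes from the fact that each disk only has to deliver about 1/N of the file, and all N disks work at the same time.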
The system also has a kind of super-mondo RAID that lets dying disks store copies of themselves and then get replaced, which reportedly gives the system a mean time between failures of a million years.
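The reports don’t describe the mechanism in detail. The following is therefore only a generic Python illustration of the idea, not IBM’s implementation. It shows proactive sparing: data is copied off a disk that has been flagged as failing while it is still readable, and a replacement is then swapped in. The `Disk` class, the `failing` flag and `replace_failing` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)  # block number -> contents
    failing: bool = False  # hypothetical health flag, e.g. set by error-rate monitoring


def replace_failing(disks: list[Disk], spares: list[Disk]) -> list[Disk]:
    """Copy each failing disk's blocks onto a spare while it is still readable,
    then put the spare in its place. Consumes spares from the end of the list."""
    replaced = []
    for disk in disks:
        if disk.failing:
            if not spares:
                raise RuntimeError(f"no spare available for {disk.name}")
            spare = spares.pop()
            spare.blocks.update(disk.blocks)  # copy data off the dying disk first
            replaced.append(spare)
        else:
            replaced.append(disk)
    return replaced


if __name__ == "__main__":
    array = [Disk("d0", {0: b"ab"}), Disk("d1", {1: b"cd"}, failing=True)]
    print([d.name for d in replace_failing(array, [Disk("spare0")])])
```

Because the copy happens before the disk actually dies, no data has to be reconstructed from redundancy. That is one plausible reason such a scheme can push the array’s reliability figures so high.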
Technology Review didn’t say how much space it took up, but if a typical drive is, say, 4 in. x 5.75 in. x 1 in., we’re talking 4.6 million cubic inches just for the drives themselves, not counting the cooling system, cables and so on. That’s a 20-ft. x 20-ft. square about 6.7 feet high, just of drives. (These are all back-of-the-envelope calculations.)
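The estimate can be reproduced in a few lines of Python, which also confirms the 600 GB-per-drive average mentioned earlier. The drive dimensions and the 20-ft. square footprint are the same rough assumptions used above. The constant and function names are only illustrative.

```python
CAPACITY_BYTES = 120 * 10**15  # 120 PB, decimal units
DRIVES = 200_000
DRIVE_IN = (4.0, 5.75, 1.0)    # assumed drive dimensions in inches (rough estimate)
FOOTPRINT_FT = 20.0            # side of the square floor area, in feet


def per_drive_gb() -> float:
    """Average capacity per drive, in decimal gigabytes."""
    return CAPACITY_BYTES / DRIVES / 10**9


def total_volume_in3() -> float:
    """Combined volume of all drives, in cubic inches."""
    length, width, height = DRIVE_IN
    return length * width * height * DRIVES


def stack_height_ft() -> float:
    """Height of the drives packed solid onto a FOOTPRINT_FT x FOOTPRINT_FT floor."""
    footprint_in2 = (FOOTPRINT_FT * 12) ** 2  # 240 in. x 240 in.
    return total_volume_in3() / footprint_in2 / 12


if __name__ == "__main__":
    print(f"{per_drive_gb():.0f} GB per drive")
    print(f"{total_volume_in3():,.0f} cubic inches")
    print(f"{stack_height_ft():.2f} ft high")
```

With these inputs the drives alone come to 4.6 million cubic inches (about 2,660 cubic feet), which on a 20-ft. x 20-ft. floor stacks up to roughly 6.7 feet.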
In fact, the system needs two petabytes of its storage just to keep track of all the index files and metadata, Technology Review reported.