As enterprise Oracle DBAs know first-hand, data growth these days is exponential. Gartner calculates that the average site doubles its data under management every 6 to 18 months. Terabyte-sized data stores were a novelty not too long ago, but you’ll soon be hearing more and more about petabytes. That’s 1,000 terabytes. One million gigabytes. A lot of data.
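To put the Gartner range in concrete terms, here’s a quick Python sketch of what doubling every 6 versus every 18 months means. The 1 TB starting size, the three-year horizon, and the helper names are illustrative assumptions, not Gartner figures, and real growth is rarely this smooth.

```python
import math

# Back-of-envelope growth projection for the Gartner doubling range.
# Assumptions (illustrative only): a site starting at 1 TB, clean
# exponential growth, and decimal units (1 PB = 1,000 TB = 1,000,000 GB).

GB_PER_TB = 1_000
TB_PER_PB = 1_000


def projected_size_tb(start_tb: float, months: float, doubling_months: float) -> float:
    """Size in TB after `months` if the data doubles every `doubling_months`."""
    return start_tb * 2 ** (months / doubling_months)


def months_to_reach(start_tb: float, target_tb: float, doubling_months: float) -> float:
    """Months needed for start_tb to grow to target_tb at the given doubling period."""
    return doubling_months * math.log2(target_tb / start_tb)


print(f"1 PB = {TB_PER_PB:,} TB = {TB_PER_PB * GB_PER_TB:,} GB")
for period in (6, 18):
    after_3_years = projected_size_tb(1, 36, period)
    years_to_pb = months_to_reach(1, TB_PER_PB, period) / 12
    print(f"doubling every {period} months: 1 TB becomes {after_3_years:g} TB "
          f"after 3 years and reaches 1 PB in about {years_to_pb:.1f} years")
```

At the fast end of the range, a 1 TB site grows 64-fold in three years and crosses the petabyte mark in about five; at the slow end, it takes closer to fifteen.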
A petabyte is a difficult concept to comprehend. Consider this:
- A petabyte of data is the equivalent of 250 billion pages of text, enough to fill 20 million four-drawer filing cabinets. Or imagine a 2,000-mile-high tower of 1 billion diskettes.
- For comparison, the U.S. Library of Congress, with 130 million items on about 530 miles of bookshelves (including 29 million books, 2.7 million recordings, 12 million photographs, 4.8 million maps and 58 million manuscripts), can be stored in only 10 terabytes.
- A single sequential scan through a megabyte takes less than a second, but at 10 MB/s, scanning a full petabyte would take more than three years.
- It would be faster to send a petabyte of data from San Francisco to Hong Kong by sailboat than over a 1-megabit-per-second internet connection, which would need more than 250 years!
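The numbers in that list are easy to check. Here’s a rough Python sketch, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), a 365.25-day year, and perfectly sustained throughput; real disks and links add overhead, so actual times would be longer.

```python
# Back-of-envelope checks for the petabyte comparisons.
# Assumptions: decimal units, a 365.25-day year, no protocol or seek overhead.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

PETABYTE = 10**15  # bytes
TERABYTE = 10**12  # bytes


def scan_years(size_bytes: int, bytes_per_sec: float) -> float:
    """Years to read size_bytes sequentially at a sustained byte rate."""
    return size_bytes / bytes_per_sec / SECONDS_PER_YEAR


def transfer_years(size_bytes: int, bits_per_sec: float) -> float:
    """Years to push size_bytes over a link rated in bits per second."""
    return size_bytes * 8 / bits_per_sec / SECONDS_PER_YEAR


scan = scan_years(PETABYTE, 10 * 10**6)     # 10 MB/s sequential scan
link = transfer_years(PETABYTE, 1 * 10**6)  # 1 Mbit/s connection
libraries = PETABYTE / (10 * TERABYTE)      # Library of Congress at ~10 TB

print(f"scan 1 PB at 10 MB/s:  {scan:.1f} years")
print(f"send 1 PB at 1 Mbit/s: {link:.0f} years")
print(f"Libraries of Congress per PB: {libraries:.0f}")
```

Note the factor of 8 in the transfer calculation: network links are rated in bits per second, while storage is measured in bytes. The 10 MB/s scan works out to a bit over three years, the 1 Mbit/s transfer to roughly 254 years, and one petabyte holds about 100 Libraries of Congress.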
For this post, I began compiling what I thought would be a short list of petabyte-sized data collections (not necessarily single databases) to show what was on the horizon for DBAs. To my amazement, the list went on and on. A sampling:
- Stanford Linear Accelerator Center
- National Center for Atmospheric Research (which also just installed a massive 12-teraflop IBM supercomputer, capable of 12 trillion calculations per second, with 4 terabytes of memory!)
- CERN particle accelerator
- National Energy Research Scientific Computing Center (uses a Cray XT4 supercomputer with 19,344 CPUs, each with 2 GB of memory!)
- San Diego Supercomputer Center
- National Security Agency (which apparently keeps a record of every phone call made by every single person in the United States.)
- The Internet Archive
- Walt Disney
- Fair Isaac
- YouTube (streams approx. 200 TB/day, or 6 petabytes/month)
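Two of the entries above hide some easy arithmetic. This Python sketch assumes a 30-day month and decimal units; the sustained-bandwidth figure for YouTube is derived here for illustration, not something YouTube reported.

```python
# Rate and capacity arithmetic for the YouTube and NERSC entries.
# Assumptions: a 30-day month, an 86,400-second day, decimal units
# (1 PB = 1,000 TB, 1 TB = 1,000 GB = 10**12 bytes).

TB_PER_PB = 1_000
GB_PER_TB = 1_000
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 86_400

# YouTube: ~200 TB streamed per day
youtube_tb_per_day = 200
youtube_pb_per_month = youtube_tb_per_day * 30 / TB_PER_PB
youtube_gbit_per_s = youtube_tb_per_day * BYTES_PER_TB * 8 / SECONDS_PER_DAY / 10**9

# NERSC Cray XT4: 19,344 CPUs with 2 GB each
nersc_cpus = 19_344
gb_per_cpu = 2
nersc_total_tb = nersc_cpus * gb_per_cpu / GB_PER_TB

print(f"YouTube: {youtube_pb_per_month:g} PB/month, "
      f"about {youtube_gbit_per_s:.1f} Gbit/s sustained")
print(f"NERSC Cray XT4: {nersc_total_tb:.1f} TB of memory in total")
```

So 200 TB a day is indeed about 6 PB a month, which implies roughly 18.5 Gbit/s of outbound traffic around the clock, and the NERSC machine carries close to 39 TB of RAM in aggregate.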
I soon gave up compiling an exhaustive list (email me at firstname.lastname@example.org with your suggestions). Are petabyte data stores really the new normal? Not quite, but we’re getting surprisingly close.
Overall, the amount of data in the world grew from 5 exabytes in 2003 to 161 exabytes in 2006, according to IDC. The world’s storage systems can no longer hold all of the data being created: this year, the amount of information created and replicated (255 exabytes) will surpass, for the first time, the storage capacity available (246 exabytes).
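Taking the two IDC data points at face value, we can back out the implied growth rate. This Python sketch assumes smooth compound growth between 2003 and 2006, which real data never quite follows.

```python
import math

# Implied compound growth from the 2003 and 2006 figures.
# Assumption: growth is a smooth exponential between the two data points.

eb_2003, eb_2006 = 5, 161
years = 2006 - 2003
growth = eb_2006 / eb_2003  # total multiple over the period

annual_factor = growth ** (1 / years)                          # multiple per year
doubling_months = 12 * years * math.log(2) / math.log(growth)  # months per doubling

# Gap between information created and storage available this year
created_eb, capacity_eb = 255, 246
shortfall_eb = created_eb - capacity_eb

print(f"~{annual_factor:.2f}x per year, doubling every ~{doubling_months:.1f} months")
print(f"shortfall this year: {shortfall_eb} EB")
```

That works out to roughly 3.2x per year, or a doubling about every 7 months, which sits at the fast end of the Gartner range quoted at the top of this post, with about 9 exabytes of new data having nowhere to live.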
Database and storage admins: your jobs are safe! (Edit: Or maybe not. MySpace has NO storage admins for its petabytes.)