Faithful readers of this blog are aware that we sometimes visit the issue of “what is the bandwidth of a station wagon full of magnetic tapes speeding down the highway” and other ways of putting Really Enormous Amounts of Data in context.
Similarly, this blog recently addressed the issue of how much data the NSA could store.
However, this week Randall Munroe, the author of the geek comic xkcd, came up with a new measurement of data, based on a reader question: “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” In addition to drawing the comic, Munroe, a physicist who has worked for NASA, answers a hypothetical physics question like this one from readers once a week. Other examples include “How fast can you hit a speed bump while driving and live?” and “If you call a random phone number and say ‘God bless you,’ what are the chances that the person who answers just sneezed?”
Anyway, using publicly available data — sources of which were all dutifully footnoted — Munroe went through very much the same sort of back-of-the-envelope calculation that this blog and other sources have gone through, first to calculate the amount of data Google has — in punch card size — and next, to extrapolate from that the amount of data the NSA has.
In the process, there are several interesting bits. For example:
“To make things worse, given the huge number of drives they manage, Google has a hard drive die every few minutes,” he writes, dutifully footnoting the source of this information. “This isn’t actually all that expensive a problem, in the grand scheme of things — they just get good at replacing drives — but it’s weird to think that when a Googler runs a piece of code, they know that by the time it finishes executing, one of the machines it was running on will probably have suffered a drive failure.”
Anyway, the figure Munroe came up with for Google’s data store, after a bunch of these calculations, is 15 exabytes. How much is that in punch cards?
“15 exabytes of punch cards would be enough to cover my home region, New England, to a depth of about 4.5 kilometers,” Munroe writes. To put that into perspective (which is something he’s very good at), “That’s three times deeper than the ice sheets that covered the region during the last advance of the glaciers.”
Going on to the NSA, Munroe also pokes fun at some of the more breathless speculation. “A few headlines, rather than going with one estimate or the other, announced that the facility could hold ‘between an exabyte and a yottabyte’ of data … which is a little like saying ‘eyewitnesses report that the snake was between 1 millimeter and 1 kilometer long.’”
Munroe concludes with how to find out where the seekrit Google data centers are: as CNN’s Wolf Blitzer advises, “Monitor the pizzas.” “Google has created what might be the most sophisticated information-gathering apparatus in the history of the Earth … and the only people with information about them are the pizza delivery drivers,” he writes.