However bad a day you might have had lately, it can’t compare with that of James Howells.
Howells is the guy from Wales who realized that the hard disk he threw away actually contained a cryptographic key giving him access to Bitcoin – the Internet’s open payment network — worth up to $7.5 million, so now he’s trying to find a way to root through the dump in hopes of finding it.
“Sitting beneath about four feet of garbage in an area of a Welsh landfill the size of a football field sits a fortune — in the form of a computer hard drive that James Howells threw out this summer while cleaning up his workspace,” writes USA Today. “On it: the cryptographic “private key” he needs to access 7,500 Bitcoins. And since the digital currency hit a major milestone yesterday, with a single coin now worth more than $1,000 on the most popular exchange, that tossed hard drive is worth more than $7.5 million.”
So there’s a couple of nuances to that. First of all, the Bitcoin may not *actually* be worth $7.5 million. Howells acquired the Bitcoin in 2009. Even when he threw the disk drive away earlier this summer, the coins were worth about $800,000.
“Although Bitcoins have recently become part of the zeitgeist – with Virgin saying it will accept the currency for its Virgin Galactic flights, and central bankers considering its position in finance seriously – Howells generated his in early 2009, when the currency was only known in tech circles,” writes the Guardian. “At that time, a few months after its launch, it was comparatively easy to “mine” the digital currency, effectively creating money by computing: Howells ran a program on his laptop for a week to generate his stash. Nowadays, doing the same would require enormously expensive computing power.”
But just because an individual Bitcoin is worth $1,000 doesn’t mean that he would actually have been able to sell the whole lot for $7.5 million. It’s complicated.
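For the record, here is the back-of-the-envelope arithmetic behind those headline numbers, as a minimal sketch. It uses only the figures quoted above (7,500 coins, $1,000 per coin, about $800,000 over the summer); the function names are made up for illustration. It computes face value only, which is exactly the number you shouldn't assume you could cash out.

```python
# Figures quoted in the post; everything here is face value only.
COINS = 7_500
PRICE_AT_MILESTONE_USD = 1_000        # "more than $1,000" per coin
VALUE_WHEN_DISCARDED_USD = 800_000    # "about $800,000" earlier in the summer


def nominal_value(coins, price_usd):
    """Face value of a holding: number of coins times the quoted price."""
    return coins * price_usd


def implied_price(total_usd, coins):
    """Per-coin price implied by a total valuation."""
    return total_usd / coins


print(f"Face value at the milestone: ${nominal_value(COINS, PRICE_AT_MILESTONE_USD):,.0f}")
print(f"Implied price when discarded: ${implied_price(VALUE_WHEN_DISCARDED_USD, COINS):,.2f}")
```

In other words, the $800,000 figure implies a price of a bit over $100 a coin at the time the drive went into the trash.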
Second of all, Howells could actually have found himself out a lot more than $7.5 million, depending on what else might have been on that disk drive. Throwing away a disk drive with readable data on it? Really?
Periodically, someone discovers that discarded hard disks still have readable data on them. In 2006, a guy bought some hard disks on eBay and discovered all sorts of interesting account information from Idaho Power, a public utility in southwestern Idaho. It turned out that Idaho Power had contracted with a company to destroy 230 hard disks, and the company just put them up on eBay instead. And security experts such as Simson Garfinkel, now Associate Professor at the Naval Postgraduate School in Monterey, Calif., periodically go out and buy hard disks off eBay and Craigslist just to see what sort of interesting stuff people are throwing away.
In 2010, CBS News did a similar report noting that laser printers and photocopiers, too, had hard disks in them that contained data and that people were buying up old printers and finding interesting data on them.
In fact, for the next few months, it might be an even better idea than usual to be diligent about properly destroying a hard drive. After the news of Howells’ windfall, there may be a sudden surge of interest in discarded hard drives, in case someone else forgot about their Bitcoin trove.
If Howells had destroyed his hard disk properly, he’d still be out the $7.5 million – but at least he wouldn’t be trying to find a way to root through garbage looking for it. (And perhaps he’s better now about doing backups?)
There is one consolation, though – Howells doesn’t have to worry about someone else finding it first. USA Today reports that the city council has said other searchers will be turned away.
Now you’ve gone and infected the International Space Station (ISS).
Eugene Kaspersky, the eponymous founder of the Kaspersky Lab security software company, let drop this little bombshell recently while speaking to the National Press Club in Australia. He said he was told this by “Russian space guys.”
“The space guys from time-to-time are coming with USBs, which are infected. I’m not kidding. I was talking to Russian space guys and they said, ‘yeah, from time-to-time there are viruses on the space station,’” Kaspersky reportedly said.
There’s two things to note about this story:
- While some publications pinned the blame on Russian astronauts specifically, it isn’t actually clear which astronauts did this, and whether they did it on purpose or on accident, as my daughter used to say. Kaspersky’s “Russian space guys” apparently didn’t reveal that detail. Either way, the ISS doesn’t control its USB ports and scan USBs before plugging them into multimillion-dollar things in orbit? Srsly? Didn’t they watch “Independence Day?”
- It isn’t clear exactly what sort of malware has infected the ISS. At various points in time, at least as far back as 2008, it has been infected with malware intended to steal online game passwords. (This is what the astronauts do in their spare time? Play Spacecraft Simulator?) io9 reported receiving email from Kaspersky Lab claiming this incident is actually what Kaspersky had been referring to, not some nefarious plan to crash the ISS into Manhattan or something.
Oh, and the laptops in question were reportedly running not just Windows, but Windows XP. Oy. Reportedly, the ISS switched to Linux in May, partly to avoid the malware problem. Incidentally, at least in the past, the laptops on the ISS didn’t have virus scanning software. Perhaps they do now? Please?
What is clear is that, despite some reports, the ISS has not been infected with Stuxnet, the virus intended to disable Iranian nuclear facilities. In the same speech, Kaspersky had mentioned that Russian nuclear facilities had been infected with Stuxnet, and non-technical reporters, hearing the words “Stuxnet” and “ISS” in the same speech, got excited and conflated the two.
Even if Stuxnet were found aboard the ISS, it would only be a problem if they were running uranium centrifuges up there, and if they are, we have bigger problems.
All together now:
- Don’t stick strange USB sticks in your ports.
- Control access to the USB drives.
- Scan USB drives before inserting them.
We don’t want to have to tell you this again!
Western Digital announced this week a 6TB disk drive filled with helium. Let the jokes begin.
The technology isn’t new; the company first floated the idea – wups, sorry – a year ago last September. The company said at the time that it didn’t have any specifications but that it would release them – wups, sorry again — when the product was announced.
(In addition, old-timers discussing the announcement recalled that HP had produced a helium-filled drive in the 1970s.)
Well, here it is and here they are, sort of. It’s called the Ultrastar He6 – He being the chemical symbol for helium, get it? It’s 6 TB – hence the 6 – which, incidentally, also makes it the highest capacity 3.5-inch disk drive in the world. The company didn’t say how fast it goes, but Extreme Tech expects it to be 7200 rpm like the air-filled equivalents. And the company still hasn’t said how much the darn thing will cost.
The company said the drives would be particularly suited for “high-density data centers, massive scale-out data centers, containerized data centers, nearline storage applications, bulk storage, and enterprise and data center applications where density and capacity are paramount.” Perhaps for the NSA?
So how is it so much faster and has so much more storage than a standard drive packed in air? Because helium is less dense than air, by a factor of 7, it offers less friction, so the platters can go faster and it can have more of them – that is, up to 7 in a space that typically these days holds 4 or 5. (One can consequently assume that the He7 might be trotting along one of these days.) This also means it needs less power to fight against air’s friction, meaning that it uses 23 percent less power when it’s idle, 49 percent fewer watts per TB, and on the whole runs 4-5 degrees Celsius cooler.
This will add up when you’re a CERN, Netflix, Huawei, or HP – to name a few companies that were said to be testing them — and have a whole warehouse full of the things, notes Arik Hesseldahl of AllThingsD. “Deploying 11 petabytes of storage using current drive technology requires 12 racks and 2,880 hard drives, and about 33 kilowatts of power to run them,” he writes. “With the new helium-based technology, you could do it with eight racks and 1,920 individual drives, and run them on 14 kilowatts. The setup would take up less space, and require fewer cables, too.”
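To see where the savings come from, here is a rough sketch that breaks Hesseldahl's two configurations down per drive and per terabyte. The inputs are the numbers in the quote; the decimal conversion (1 PB = 1,000 TB) and the `summarize` helper are assumptions made for illustration.

```python
TB_PER_PB = 1_000  # assumption: decimal units, 1 PB = 1,000 TB


def summarize(capacity_pb, racks, drives, power_kw):
    """Break a deployment down into per-drive and per-terabyte figures."""
    capacity_tb = capacity_pb * TB_PER_PB
    return {
        "tb_per_drive": capacity_tb / drives,
        "drives_per_rack": drives / racks,
        "watts_per_drive": power_kw * 1_000 / drives,
        "watts_per_tb": power_kw * 1_000 / capacity_tb,
    }


# Numbers from the AllThingsD quote: 11 PB either way.
current = summarize(capacity_pb=11, racks=12, drives=2_880, power_kw=33)
helium = summarize(capacity_pb=11, racks=8, drives=1_920, power_kw=14)

for name, stats in (("current", current), ("helium", helium)):
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Both setups work out to 240 drives per rack, so the rack savings come entirely from fitting more terabytes into each drive (about 3.8 TB versus about 5.7 TB by this arithmetic, which lines up with 4 TB and 6 TB drives).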
While it’s not as fast as, say, a disk drive in vacuum — and no doubt some enterprising scientist is slaving away at that as we speak to eliminate the problems with that technology — it’ll do for now.
If you’ve already run into articles about this, you might be confused about which vendor we’re actually talking about. Some articles say Western Digital, others say Hitachi, others say HGST. Here’s the deal. Once upon a time, there was a company called IBM with a disk storage business. It wanted to get out of the business, so in 2002 it spun it off – wups, sorry again – where it was purchased by Hitachi, but to keep it separate from Hitachi’s own storage business, it was known as Hitachi Global Storage Technologies (HGST). Then, in 2012, Western Digital bought it, but kept it as a separate organization – so it’s the HGST part of Western Digital.
What took so long? While vendors have been toying with the idea of helium-filled drives for 30 years, Western Digital had to find a way to build a sealed case for the thing so all the helium doesn’t leak out (or, as one pedant points out, so air doesn’t leak in), like a balloon does after a couple of days. After reportedly working on it for the past ten years, it now has what it calls a patented HelioSeal technology for that – which, incidentally, should also make the drives immersible. (You first – though speculation is that it would enable them to be used in liquid-cooled facilities. And by the way, would it float? Might be useful for flood zones.)
As it is, it will be interesting to see how sturdy the things are, how long they’ll be able to hold a seal (especially if dropped or jostled), and whether there’s any mechanism to refill it with helium should it slow down.
Not to mention, is there any way to test whether there’s a leak, or do we just check to see if the sysadmins get squeaky voices? If it does leak, does the whole thing squeal to a stop? A commenter to one article, who identified himself as a Western Digital engineer, said that the ones his part of the company was working on had monitors and that they lasted about five years until too much helium leaked out for them to be useful.
The company also didn’t address the issue of the helium shortage that has been a pall on children’s birthday parties for the last couple of years. Or is this the source of the shortage in the first place? Hmmm.
Where were you a year ago? If you were on the East Coast, chances are you were dealing with Hurricane Sandy, a storm that was unprecedented not so much for its size and damage but for the way it seemed to target New York data centers. As we come up on the anniversary, what have we learned?
As you may recall, a number of data centers shut down abruptly due to losing power — which was often situated in the basement. While some companies got generators, others were stymied due to a lack of diesel fuel for them — or having to take diesel fuel up flights of stairs in a bucket brigade. Salt water and other debris also damaged equipment at some data centers.
Companies such as PEER 1 Hosting, which set up the famous bucket brigade, have been talking about what they’ve learned and offering advice to other companies that find themselves in similar situations.
As PEER 1’s Ryan Murphey notes, an important factor is people. “If you can’t ‘staff up’ before the storm, think about how you’ll get additional support to the facility if it’s needed,” he recommends, such as by setting up emergency response teams near data centers.
“Focus on the people, stupid,” agrees Barb Darrow of GigaOm. “Before Sandy, nobody seemed to imagine that highways, tunnels and subways could be out for days on end. Now there have to be plans in place for how personnel can get to the affected area, and for how other personnel can work remotely as effectively as possible.”
Murphey also suggests stocking up on equipment and setting up contracts ahead of time for items such as fuel. For example, the organization now has a pump that can reach the 18th floor, as well as fuel hoses on-site — which fit the generators. And for stored diesel, organizations need to set up filters and other systems to remove any potential water from the fuel, which could keep generators from running, warns Alastair Trower in Data Center Knowledge.
At the same time, Murphey notes that stuff happens and you can’t always count on being able to get what you need when you need it, contracts or no.
Other people and entities are also making preparations. For example, the state of New York is setting up a strategic gasoline reserve of as much as 3 million gallons, though it isn’t clear how much of that would be regular unleaded gasoline for vehicles vs. diesel fuel that could be used in generators. The New York Stock Exchange has devised a plan that takes advantage of the company’s data centers in New Jersey and in Chicago.
Some organizations are also working on getting better, more site-specific weather prediction in place so they have a better idea of what can happen in their own locations, Darrow writes.
At the same time, some things haven’t changed. While some organizations are looking at backup data centers in less hurricane-prone regions, such as Omaha, Nebraska, Darrow writes, an April survey found that two-thirds of data center managers would rather see the data center in the city where they worked — and even potential alternative locations tended to be vulnerable to natural disasters themselves. The most important reasons given for data center expansion, Digital Realty noted, were (in order of priority) the need for increased security, energy efficiency, new applications/services, and more space. It isn’t clear whether “Not Being Under Water,” “Not Being on Fire,” or other variations on “Not Being Destroyed” were choices.
Time-critical organizations such as stock exchanges and other financial companies are also concerned about latency, or the additional milliseconds involved in getting data from places like Nebraska rather than New York.
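For a sense of scale, a floor on that delay can be estimated from distance alone. This is a rough sketch, not a measurement: the distance (about 1,850 km from New York to Omaha in a straight line) and the fiber velocity factor (light in glass travelling at roughly two-thirds of its vacuum speed) are both assumptions, and real routes add switching and detours on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_VELOCITY_FACTOR = 0.67   # assumption: typical for light in optical fiber
NYC_TO_OMAHA_KM = 1_850        # assumption: rough straight-line distance


def one_way_delay_ms(distance_km, velocity_factor=FIBER_VELOCITY_FACTOR):
    """Minimum propagation delay over fiber, in milliseconds."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor) * 1_000


one_way = one_way_delay_ms(NYC_TO_OMAHA_KM)
round_trip = 2 * one_way
print(f"one way: {one_way:.1f} ms, round trip: {round_trip:.1f} ms")
```

Under those assumptions the round trip costs somewhere under 20 milliseconds, which is nothing to a backup job and an eternity to a trading system.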
And Murphey notes that, despite his company’s experience, it still is unlikely to put its electrical equipment anyplace other than the basement. First, real estate on higher levels is more expensive. Second, there are structural issues associated with supporting the weight of the equipment, as well as practical issues with storing diesel fuel anywhere other than a basement.
If nothing else, maybe you’d better stock up on buckets.
Researchers say they have developed a data disk that could last a million years (as long as you don’t hit it too hard or get it too hot), enabling us to save our culture for future generations, even after we’re no longer here.
Goody goody gumdrops. That’s not going to do it.
A quick recap of the disk technology – developed by Jeroen de Vries, a PhD candidate at the University of Twente in the Netherlands — is that it uses a base of tungsten, encapsulated in silicon nitride, and then is etched with lines 100 nm wide. The example they used was QR codes, but it could have been anything.
How do they know it will last a million years? Well, they don’t, exactly; they artificially aged it in an oven, saying that an hour in an oven at 445 degrees Kelvin was equivalent to aging it a million years, and then ascertained that the majority of the data was still readable. (As one commenter pointed out, “My oven survives at 200°C for more than 4 hours and I can assure you it won’t be around for longer than 20 years.”)
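The articles don't spell out how an hour in the oven gets converted into a million years. One common way to do that kind of conversion is the Arrhenius model for accelerated aging, so here is a sketch under that assumption. To be clear about what is assumed: the Arrhenius model itself and a storage temperature of 300 K; only the 445 K, one-hour, million-year figures come from the text above. The code works backward to the activation energy those figures would imply.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5
HOURS_PER_YEAR = 365.25 * 24


def acceleration_factor(ea_ev, t_use_k, t_test_k):
    """Arrhenius acceleration factor between a use and a test temperature."""
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_test_k))


def implied_activation_energy(factor, t_use_k, t_test_k):
    """Invert the Arrhenius relation: which energy gives this factor?"""
    return BOLTZMANN_EV_PER_K * math.log(factor) / (1 / t_use_k - 1 / t_test_k)


# One oven hour standing in for a million years of storage.
target_factor = 1_000_000 * HOURS_PER_YEAR

T_USE_K = 300.0    # assumption: roughly room temperature
T_TEST_K = 445.0   # oven temperature quoted in the post

ea = implied_activation_energy(target_factor, T_USE_K, T_TEST_K)
print(f"implied activation energy: {ea:.2f} eV")
```

Note how sensitive this is: the result depends exponentially on the assumed energy and on the storage temperature, which is one reason the commenter's oven joke lands.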
That said, even if the disk does last a million years, it’s not invulnerable. If something falls on it – like, you know, a wall — it can break. If it gets exposed to high heat – the example used in the articles was that of a house fire – the data will degrade. (Nobody seemed to want to use the words “nuclear bomb,” though it seemed an obvious question.) Presumably if the Yellowstone supervolcano goes off, we’re hosed as well.
So okay. We’ve got a disk that lasts a million years. But even assuming that future generations want to see Miley Cyrus twerking, chances are they’re not going to be able to – due to the same sort of problems we’re running into now with digital preservation.
When’s the last time you tried to read a Zip drive? How about a 3 ½” floppy? A 5 ¼” floppy? An 8” floppy? The media may be just fine, but if I don’t have a device to read it, if I don’t have drivers to communicate with the device, if I don’t know how to decode it, if I don’t know what language it’s written in, I’m SOL.
(I’m not going to mock the researchers for doing their testing with QR codes; they said themselves it was just an example and they weren’t actually suggesting that the QR code was a million-year format. By the way, have you seen a CueCat lately?)
Recall the potential problems that one game developer had in April 2012, trying to read disks from a game he’d developed more than two decades before:
- Finding a drive to read the disk
- Finding software to read the disk
- Dealing with whatever forms of copy protection the disk might have had
- Finding software to run the software on the disk
- Dealing with whatever damage the disk itself might have suffered during its 22 years in his dad’s garage
- Dealing with whatever “bit rot” the data might have suffered
Even if this million-year disk takes care of the last two problems, you still have the other four to deal with.
Oh, the scientist told Motherboard, hand-wavingly, one of the first things that the disk should do is teach future generations how to read the disk. If he could solve that problem alone, then he’d be doing something even more significant than developing a disk that lasts a long time.
So the newest thing lately is to design a top-seekrit data center, and then invite the media to come take a look at it and take pictures. Google did it a while back, now it’s Facebook’s turn.
You may recall that a little over a year ago, Facebook revealed it was building a “cold storage” facility in Prineville, Ore. — so-called because the data on it wouldn’t need to be retrieved very often. While it saved a lot of energy compared with storage systems that were always on, it also took longer to retrieve the data when it was needed, because the disks needed to spin up again, which could take, gasp, up to 30 seconds.
If you’re not familiar with Prineville, it’s smack in the middle of Oregon — about two hours from The Dalles hydropower facility, about three hours from Portland, and about an hour from Bend. The operative part is that this whole area of central Oregon is data center central, because of its access to cheap land — because it’s out in the middle of nowhere — and cheap power — because of its proximity to The Dalles. Google has a facility near The Dalles, while Apple also has one in Prineville.
You may also recall that Facebook is on a mission, called the Open Compute Project, to do for hardware what the open source movement has done for software — that is, figure out the best, most minimal ways to design hardware, and then tell the world about it. It’s done this for servers, storage, and now archival storage. The Prineville Data Center even has its own Facebook page, and the company is diligently offering grants and such to the nearby community to be a good neighbor. (In another such indication, the 70 staff and contract employees make 150 percent of the prevailing local wage.)
Hence the field trip. And in this case, it pretty literally is out in a field.
“Each disk in the cold storage gear can hold 4 terabytes of data, and each 2U system contains two levels of 15 disks,” writes Jordan Novet in Data Center Knowledge. “This configuration allows for 4 petabytes of cold storage in a rack (each storage head has 2 PB attached and there are 2 heads per rack).” There were also pictures, and Facebook had already published the cold storage specifications.
“Less than a week into its operation, the cold storage facility is already storing nine petabytes of user data,” writes Elon Glucklich in The Bulletin of Bend (which includes video as well as pictures). “That’s equal to nearly 9.7 billion megabytes. A typical uploaded photo ranges from 2 to 10 megabytes. When it’s full, the 16,000-square-foot cold storage building would be able to hold thousands of petabytes of data.” The company could also add additional wings totalling up to 32,000 square feet, he added, noting that while Facebook would not reveal the cost of the facility, county permits put the cost of the first wing at $6.8 million.
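If you're wondering where "nearly 9.7 billion megabytes" comes from, it falls out if you count in binary units (1,024 at each step) rather than round thousands. A quick sketch, using only the figures in the quote and assuming for the sake of argument that every stored byte is a photo:

```python
MB_PER_PB_BINARY = 1024 ** 3   # binary units: 1 PB = 1,073,741,824 MB

stored_pb = 9
stored_mb = stored_pb * MB_PER_PB_BINARY


def photo_count(total_mb, photo_mb):
    """How many photos of a given size fit, rounding down."""
    return total_mb // photo_mb


fewest = photo_count(stored_mb, 10)   # every photo at the 10 MB end
most = photo_count(stored_mb, 2)      # every photo at the 2 MB end

print(f"{stored_mb:,} MB stored")
print(f"between {fewest:,} and {most:,} photos")
```

So nine petabytes is somewhere between roughly one billion and five billion photos, depending on which end of the 2-to-10 MB range you pick.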
Facebook officials told the media that 80 percent of the photo requests come for just 9 percent of the photos. Hence the need for the facility. The data center is scheduled to reach capacity in 2017, depending on how many cat pictures we take.
The cold storage aspect means that the facility uses 52 percent less energy than a comparable data storage facility, writes Andy Giegerich for Sustainable Business Oregon, who goes on to note that the facility meets LEED Gold standards for its design, use of sustainable, locally sourced materials, and care in disposing of its waste.
“The social media giant has, as part of its drive to operate a green data center, launched two public dashboards that report continuous data for such key efficiency metrics as power and water usage effectiveness,” Giegerich writes. “Not only are the dashboards available to Facebook workers, they’re available to the public.”
Meanwhile, some enterprising reporters realized they could see the more secretive Apple data center from the Facebook one, and took the opportunity to take pictures of that, too, as well as check out its county filings. No word on when their field trip is, but knowing Apple’s reputation for secrecy, it’s probably best not to make reservations yet.
Update: I have recently been informed by David Eskelsen, a spokesman for Rocky Mountain Power and PacifiCorp Energy, that there are two errors in this story.
You may recall that people have been speculating about how much data the NSA will be able to store in its seekrit Utah facility, with some estimating it in the zettabyte range and others pooh-poohing that figure.
What everybody could agree on, though, is that it would take a powerful lot of ‘lectricity to run – nearly as much as nearby Salt Lake City.
The Utah data center is reportedly slated to use up to 65 megawatts of power, or as much as the entire city of Salt Lake itself. Forbes quoted [WWW developer Brewster] Kahle’s estimate of $70 million a year for 70 megawatts, while Wired reportedly estimated $40 million a year for 65 megawatts. (And recall that Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could tack on up to $2.4 million annually on to $40 million.)
[Security consultant Mark] Burnett’s power calculation is even higher. “250 million hard drives would require 6.25 gigawatts of power (great Scott!). Of course, drives need servers and servers need switches and routers; they’re going to need a dedicated nuclear power plant. They’re going to need some fans too, 4.25 billion btu definitely would be uncomfortable.”
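It's worth checking what those estimates imply. Here is a minimal sketch that turns each one into a price per kilowatt-hour and a per-drive power draw. The dollar and wattage figures are the ones quoted above; the assumption that the facility draws its full rated power around the clock, all year, is mine.

```python
HOURS_PER_YEAR = 8_760  # non-leap year; assumes full load around the clock


def cost_per_kwh(annual_cost_usd, megawatts):
    """Electricity price implied by an annual bill at a constant load."""
    kwh_per_year = megawatts * 1_000 * HOURS_PER_YEAR
    return annual_cost_usd / kwh_per_year


wired_rate = cost_per_kwh(40e6, 65)   # $40 million a year for 65 MW
kahle_rate = cost_per_kwh(70e6, 70)   # $70 million a year for 70 MW
utah_tax = 0.06 * 40e6                # 6% tax on the $40 million estimate
watts_per_drive = 6.25e9 / 250e6      # Burnett: 6.25 GW across 250 million drives

print(f"Wired estimate: ${wired_rate:.3f} per kWh")
print(f"Kahle estimate: ${kahle_rate:.3f} per kWh")
print(f"6% tax: ${utah_tax:,.0f} a year")
print(f"Burnett: {watts_per_drive:.0f} W per drive")
```

The two cost estimates differ mostly in the price of power they assume (about 7 cents versus about 11 cents a kilowatt-hour), and Burnett's gigawatt figure works out to 25 watts per drive.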
Well, the data center is apparently having trouble getting enough clean electricity to run the plant reliably, according to an article in the Wall Street Journal, which broke the story. In fact, the arcing – up to 10 incidents in the past 13 months, referred to as “meltdowns” — has slagged some of the equipment, as much as $100,000 worth per incident, delaying the opening of the data center for up to a year.
Oh, and they aren’t sure what causes it, but an NSA spokesperson assured the Journal that the problems have now been mitigated.
That’s not all. “Backup generators have failed numerous tests, according to project documents, and officials disagree about whether the cause is understood,” the WSJ writes. “There are also disagreements among government officials and contractors over the adequacy of the electrical control systems, a project official said, and the cooling systems also remain untested.”
Critics, of course, were having a field day with the story, suggesting sabotage, Stuxnet, and straight-out lying on the part of the NSA, as well as attributing the problem to whichever political party they didn’t happen to belong to. Another commenter, claiming he’d actually worked there, chalked it up to simple government incompetence.
Others, equating it to the Tower of Babel, suggested God might be angry. (This is Utah we’re talking about.) In addition, the power going into the facility was cursed during a demonstration on July 4, according to Fox News at the time. “I pray Lord that you would have a curse on that facility. On the water that goes into that facility. On the electricity that goes into that facility,” speaker Dale Williams reportedly said.
Some other companies, such as Apple, eBay, and Google — faced with the massive electricity their data centers require — have been incorporating renewable energy systems into their data centers. Power for the NSA facility is reportedly largely derived from coal.
NASA recently announced that humanity had finally made it to space beyond our solar system – using less memory than that of a low-end iPhone, an 8-track tape player for storage, and other technology that was cutting-edge in 1977 when it was launched.
Now, just because it’s an 8-track, that doesn’t mean you’re going to be able to pop your Slim Whitman tape into it. Because this is NASA, it’s a special 8-track, if you go back and look at the specs in the original documentation. (And bravo to NASA for OCRing the original documentation to make it easier to search.)
“The data-storage subsystem can record at two rates: TV pictures, general science and engineering at 115.2 kbps; general science and engineering at 7.2 kbps; and engineering only at 7.2 kbps,” the documentation reads. (To put that into perspective, the typical SATA drive today is specced at 3-6 Gbps.) “The tape transport is belt-driven. Its 1/2 in. magnetic tape is 328 m (1,076 ft.) long and is divided into eight tracks that are recorded sequentially one track at a time. Total recycleable storage capacity is about 536 million bits – the equivalent of 100 TV pictures. Playback is at four speeds – 57.6; 33.6; 21.6 and 7.2 kbps.”
In other words, it had a total capacity of about 67 megabytes. Today, we can get thumb drives for less than a dollar a gigabyte.
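As a sanity check on the documentation's figures, here is the unit conversion spelled out, along with how long it would take to play a full tape back at the fastest and slowest speeds. The inputs are the numbers quoted from the NASA documentation; decimal megabytes (a million bytes each) are an assumption.

```python
TAPE_CAPACITY_BITS = 536_000_000   # "about 536 million bits"
PICTURES_PER_TAPE = 100            # "the equivalent of 100 TV pictures"

capacity_mb = TAPE_CAPACITY_BITS / 8 / 1e6          # decimal megabytes
bits_per_picture = TAPE_CAPACITY_BITS / PICTURES_PER_TAPE


def playback_hours(bits, kbps):
    """Hours needed to play back a number of bits at a given rate."""
    return bits / (kbps * 1_000) / 3_600


print(f"capacity: {capacity_mb:.0f} MB")
print(f"per picture: {bits_per_picture / 1e6:.2f} million bits")
for rate_kbps in (57.6, 7.2):
    hours = playback_hours(TAPE_CAPACITY_BITS, rate_kbps)
    print(f"full tape at {rate_kbps} kbps: {hours:.1f} hours")
```

At the slowest playback speed, emptying a full tape takes the better part of a day.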
“That means next time you go out and take a picture with your new camera, just 1 picture at a high resolution is equal to all the data storage Voyager 2 had available during its Jupiter/Saturn/Uranus/Neptune flyby!” noted one space buff in 2008 — a data point that is itself outdated.
Every six months, the stored data would get played back. “Voyager transmits information back to Earth using a 23-watt signal,” writes Caitlin Dewey in the Washington Post. “For comparison, my college radio station broadcast on a 20-watt signal and couldn’t be heard even a few blocks off campus. It is, per NPR, about eight times stronger than the average cellphone.”
The downside came when the spacecraft started to near the edge of the solar system, explains the New York Times. NASA wanted to be able to record more data with it. As in many other organizations that have dealt with digital preservation issues, NASA engineers – some of whom probably hadn’t been born yet when Voyager took off – didn’t know how to deal with the antiquated technology.
“NASA’s young programmers were accustomed to working with virtually unlimited storage capacity,” writes Dale McFeatters in a Scripps-Howard News Service editorial. “The solution was to bring out of retirement 77-year-old NASA engineer Lawrence Zottarelli, who had worked with the eight-track units. The team successfully fed data into two computers [Suns] made by a company that was merged out of existence three years ago.”
Just remember that the next time somebody tries to tell you that engineers over 40 aren’t good for anything.
You may recall Nirvanix as the company that would send out a press release after each natural disaster, urging people to come use its service. Well, apparently that strategy didn’t work too well, or maybe we just haven’t had enough natural disasters lately, because several publications, including Information Age, reported that its customers had been told they had two weeks to find another repository for their data, presumably before it shuts down its service.
What that means is, “If you used Nirvanix for third or fourth duplicate copies you need assurance that data will be destroyed,” writes Simon Robinson in Computer Weekly. “If you used it for primary data you need that data back, and that is no trivial task right now.”
Consequently, there’s some degree of poetic justice to the fact that other companies are taking the occasion to jump out of the woodwork to issue their own press releases, promising Nirvanix customers that they can be taken care of. Attunity, for example, announced on Monday a migration service from Nirvanix to AWS’ S3 Cloud, using Attunity’s CloudBeam service, which is intended to simplify and accelerate data loading into Amazon S3.
Network administrators are also scrambling to find alternatives and to figure out the logistics of getting copies of their Nirvanix data installed somewhere else, if they hadn’t done it before. Even organizations that didn’t use Nirvanix are taking this as a wake-up call about whatever cloud storage vendor they’re using, while others — those who never cottoned to the idea of cloud storage in the first place — are patting themselves on the back for their prescience.
“When relying on cloud services it is important to have a backup plan–or at least a way out should the service become untenable,” writes Isha Suri in the Silicon Angle blog. “In the wake of the news of Nirvanix shutting down opinions have begun to rise about how to prepare for and handle such an event.”
Analysts such as Forrester’s Henry Baltazar and Gartner’s Kyle Hilgendorf are suggesting that organizations make sure they have an exit strategy when they sign up with a cloud service, but point out the difficulty of getting data out of the cloud once it’s in. “One of the most significant challenges in cloud storage is related to how difficult it is to move large amounts of data from a cloud,” he writes. “While bandwidth has increased significantly over the years, even over large network links it could take days or even weeks to retrieve terabytes or petabytes of data from a cloud.” He also recommends that organizations look for cloud storage vendors that offer direct connect or shipments of portable hard drives.
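To put "days or even weeks" into numbers, here is a rough sketch of how long a bulk download takes at a given link speed. The link speeds and the 80 percent efficiency factor (for protocol overhead and a link that isn't all yours) are illustrative assumptions, not figures from the analysts.

```python
def transfer_days(data_tb, link_gbps, efficiency=0.8):
    """Days needed to move data_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal line rate that is
    actually available for the transfer.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


for data_tb, link_gbps in ((100, 1), (1_000, 10), (1_000, 1)):
    days = transfer_days(data_tb, link_gbps)
    print(f"{data_tb:>5} TB over {link_gbps:>2} Gbps: {days:6.1f} days")
```

Under those assumptions, 100 TB over a gigabit link takes more than a week and a half, and a petabyte over the same link takes closer to four months, which is why a two-week notice stings and why shipping hard drives starts to look attractive.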
The company has finally officially announced its demise on its website, saying it was “working hard” to keep the service available until October 15 to give customers a chance to move their data.
Faithful readers of this blog are aware that we sometimes visit the issue of “what is the bandwidth of a station wagon full of magnetic tapes speeding down the highway” and other ways of putting Really Enormous Amounts of Data in context.
Similarly, this blog recently addressed the issue of how much data the NSA could store.
However, this week Randall Munroe, the author of the geek comic xkcd, came up with a new measurement of data, based on a reader question: “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” Munroe, a physicist who has worked for NASA, answers hypothetical reader questions involving physics like this once a week, in addition to drawing the comic. Other examples include “How fast can you hit a speed bump while driving and live?” and “If you call a random phone number and say ‘God bless you,’ what are the chances that the person who answers just sneezed?”
Anyway, using publicly available data — sources of which were all dutifully footnoted — Munroe went through very much the same sort of back-of-the-envelope calculation that this blog and other sources have gone through, first to calculate the amount of data Google has — in punch card size — and next, to extrapolate from that the amount of data the NSA has.
In the process, there’s several interesting bits. For example:
“To make things worse, given the huge number of drives they manage, Google has a hard drive die every few minutes,” he writes, dutifully footnoting the source of this information. “This isn’t actually all that expensive a problem, in the grand scheme of things – they just get good at replacing drives – but it’s weird to think that when a Googler runs a piece of code, they know that by the time it finishes executing, one of the machines it was running on will probably have suffered a drive failure.”
Anyway, the figure Munroe came up with for Google’s data store, after a bunch of this calculation, is 15 exabytes. How much is that in punch cards?
“15 exabytes of punch cards would be enough to cover my home region, New England, to a depth of about 4.5 kilometers,” Munroe writes. To put that into perspective (which is something he’s very good at), “That’s three times deeper than the ice sheets that covered the region during the last advance of the glaciers.”
Going on to the NSA, Munroe also pokes fun at some of the more breathless of the speculation. “A few headlines, rather than going with one estimate or the other, announced that the facility could hold ‘between an exabyte and a yottabyte’ of data … which is a little like saying ‘eyewitnesses report that the snake was between 1 millimeter and 1 kilometer long.’”
Munroe concludes with how to find out where the seekrit Google data centers are – as CNN’s Wolf Blitzer advises, it’s “Monitor the pizzas.” “Google has created what might be the most sophisticated information-gathering apparatus in the history of the Earth … and the only people with information about them are the pizza delivery drivers,” he writes.