So the newest thing lately is to design a top-seekrit data center, and then invite the media to come take a look at it and take pictures. Google did it a while back, now it’s Facebook’s turn.
You may recall that a little over a year ago, Facebook revealed it was building a “cold storage” facility in Prineville, Ore. — so-called because the data on it wouldn’t need to be retrieved very often. While it saved a lot of energy compared with storage systems that were always on, it also took longer to retrieve the data when it was needed, because the disks needed to spin up again, which could take, gasp, up to 30 seconds.
If you’re not familiar with Prineville, it’s smack in the middle of Oregon — about two hours from The Dalles hydropower facility, about three hours from Portland, and about an hour from Bend. The operative part is that this whole area of central Oregon is data center central, because of its access to cheap land — because it’s out in the middle of nowhere — and cheap power — because of its proximity to The Dalles. Google has a facility near The Dalles, while Apple also has one in Prineville.
You may also recall that Facebook is on a mission, called the Open Compute Project, to do for hardware what the open source movement has done for software — that is, figure out the best, most minimal ways to design hardware, and then tell the world about it. It’s done this for servers, storage, and now archival storage. The Prineville Data Center even has its own Facebook page, and the company is diligently offering grants and such to the nearby community to be a good neighbor. (In another such indication, the 70 staff and contract employees make 150 percent of the prevailing local wage.)
Hence the field trip. And in this case, it pretty literally is out in a field.
“Each disk in the cold storage gear can hold 4 terabytes of data, and each 2U system contains two levels of 15 disks,” writes Jordan Novet in Data Center Knowledge. “This configuration allows for 4 petabytes of cold storage in a rack (each storage head has 2 PB attached and there are 2 heads per rack).” There were also pictures, and Facebook had already published the cold storage specifications.
“Less than a week into its operation, the cold storage facility is already storing nine petabytes of user data,” writes Elon Glucklight in The Bulletin of Bend (which includes video as well as pictures). “That’s equal to nearly 9.7 billion megabytes. A typical uploaded photo ranges from 2 to 10 megabytes. When it’s full, the 16,000-square-foot cold storage building would be able to hold thousands of petabytes of data.” The company could also add additional wings totalling up to 32,000 square feet, he added, noting that while Facebook would not reveal the cost of the facility, county permits put the cost of the first wing at $6.8 million.
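For anyone who wants to check the arithmetic in that quote, here is a minimal sketch. It assumes binary units (1 PB = 1024^3 MB), which is the convention that makes nine petabytes land on "nearly 9.7 billion megabytes"; the function name is mine, not Facebook's or the Bulletin's.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: binary units, so 1 PB = 1024**3 MB.
MB_PER_PB = 1024 ** 3


def photos_that_fit(petabytes, photo_mb):
    """Return how many photos of photo_mb megabytes fit in the given petabytes."""
    return petabytes * MB_PER_PB // photo_mb


stored_mb = 9 * MB_PER_PB
print(f"9 PB = {stored_mb / 1e9:.2f} billion MB")
print(f"At 10 MB per photo: {photos_that_fit(9, 10) / 1e9:.2f} billion photos")
print(f"At  2 MB per photo: {photos_that_fit(9, 2) / 1e9:.2f} billion photos")
```

Under those assumptions, nine petabytes is somewhere between roughly one billion and five billion photos, depending on where in the 2-to-10-megabyte range the typical upload falls.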
Facebook officials told the media that 80 percent of the photo requests come for just 9 percent of the photos. Hence the need for the facility. The data center is scheduled to reach capacity in 2017, depending on how many cat pictures we take.
The cold storage aspect means that the facility uses 52 percent less energy than a comparable data storage facility, writes Andy Giegerich for Sustainable Business Oregon, who goes on to note that the facility meets LEED Gold standards for its design, use of sustainable, locally sourced materials, and care in disposing of its waste.
“The social media giant has, as part of its drive to operate a green data center, launched two public dashboards that report continuous data for such key efficiency metrics as power and water usage effectiveness,” Giegerich writes. “Not only are the dashboards available to Facebook workers, they’re available to the public.”
Meanwhile, some enterprising reporters realized they could see the more secretive Apple data center from the Facebook one, and took the opportunity to take pictures of that, too, as well as check out its county filings. No word on when their field trip is, but knowing Apple’s reputation for secrecy, it’s probably best not to make reservations yet.
Update: I have recently been informed by David Eskelsen, a spokesman for Rocky Mountain Power and PacifiCorp Energy, that there are two errors in this story.
You may recall that people have been speculating about how much data the NSA will be able to store in its seekrit Utah facility, with some estimating it in the zettabyte range and others pooh-poohing that figure.
What everybody could agree on, though, is that it would take a powerful lot of ‘lectricity to run – nearly as much as nearby Salt Lake City.
The Utah data center is reportedly slated to use up to 65 megawatts of power, or as much as the entire city of Salt Lake itself. Forbes quoted [WWW developer Brewster] Kahle’s estimate of $70 million a year for 70 megawatts, while Wired reportedly estimated $40 million a year for 65 megawatts. (And recall that Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could tack on up to $2.4 million annually on to $40 million.)
[Security consultant Mark] Burnett’s power calculation is even higher. “250 million hard drives would require 6.25 gigawatts of power (great Scott!). Of course, drives need servers and servers need switches and routers; they’re going to need a dedicated nuclear power plant. They’re going to need some fans too, 4.25 billion btu definitely would be uncomfortable.”
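To put those power figures side by side, here is a rough sketch of what they imply. It assumes the facility draws its full rated load around the clock for a 365-day year, which is almost certainly an overestimate; the function and variable names are my own.

```python
# Rough sanity check of the quoted power figures.
# Assumption: the load runs flat-out for all 8,760 hours of a non-leap year.
HOURS_PER_YEAR = 24 * 365


def implied_price_per_kwh(annual_cost_usd, megawatts):
    """Implied electricity price in dollars per kWh for a constant load."""
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR
    return annual_cost_usd / kwh_per_year


wired = implied_price_per_kwh(40e6, 65)    # $40 million a year for 65 MW
forbes = implied_price_per_kwh(70e6, 70)   # $70 million a year for 70 MW
tax = 0.06 * 40e6                          # 6% tax on the $40 million figure
watts_per_drive = 6.25e9 / 250e6           # Burnett: 6.25 GW across 250 million drives

print(f"Wired estimate:  ${wired:.3f} per kWh")
print(f"Forbes estimate: ${forbes:.3f} per kWh")
print(f"6% tax on $40 million: ${tax / 1e6:.1f} million")
print(f"Burnett's figure implies {watts_per_drive:.0f} watts per drive")
```

The two cost estimates work out to roughly 7 and 11 cents per kilowatt-hour, and Burnett's gigawatt figure assumes about 25 watts per drive.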
Well, the data center is apparently having trouble getting enough clean electricity to run the plant reliably, according to an article in the Wall Street Journal, which broke the story. In fact, electrical arcing (up to 10 incidents in the past 13 months, referred to as “meltdowns”) has slagged some of the equipment, as much as $100,000 worth per incident, delaying the opening of the data center for up to a year.
Oh, and they aren’t sure what causes it, but an NSA spokesperson assured the Journal that the problems have now been mitigated.
That’s not all. “Backup generators have failed numerous tests, according to project documents, and officials disagree about whether the cause is understood,” the WSJ writes. “There are also disagreements among government officials and contractors over the adequacy of the electrical control systems, a project official said, and the cooling systems also remain untested.”
Critics, of course, were having a field day with the story, suggesting sabotage, Stuxnet, and straight-out lying on the part of the NSA, as well as blaming the problem on whichever political affiliation wasn’t their own. Another commenter, claiming he’d actually worked there, chalked it up to simple government incompetence.
Others, equating it to the Tower of Babel, suggested God might be angry. (This is Utah we’re talking about.) In addition, the power going into the facility was cursed during a demonstration on July 4, according to Fox News at the time. “I pray Lord that you would have a curse on that facility. On the water that goes into that facility. On the electricity that goes into that facility,” speaker Dale Williams reportedly said.
Some other companies, such as Apple, eBay, and Google — faced with the massive electricity their data centers require — have been incorporating renewable energy systems into their data centers. Power for the NSA facility is reportedly largely derived from coal.
NASA recently announced that humanity had finally made it to space beyond our solar system — using less memory than that of a low-end iPhone, an 8-track tape player for storage, and other technology that was cutting-edge in 1977 when it was launched.
Now, just because it’s an 8-track, that doesn’t mean you’re going to be able to pop your Slim Whitman tape into it. Because this is NASA, it’s a special 8-track, if you go back and look at the specs in the original documentation. (And bravo to NASA for OCRing the original documentation to make it easier to search.)
“The data-storage subsystem can record at two rates: TV pictures, general science and engineering at 115.2 kbps; general science and engineering at 7.2 kbps; and engineering only at 7.2 kbps,” the documentation reads. (To put that into perspective, the typical SATA drive today is specced at 3-6 Gbps.) “The tape transport is belt-driven. Its 1/2 in. magnetic tape is 328 m (1,076 ft.) long and is divided into eight tracks that are recorded sequentially one track at a time. Total recycleable storage capacity is about 536 million bits – the equivalent of 100 TV pictures. Playback is at four speeds – 57.6; 33.6; 21.6 and 7.2 kbps.”
In other words, it had a total capacity of about 67 megabytes. Today, we can get thumb drives for less than a dollar a gigabyte.
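Converting the documentation's 536 million bits into more familiar units takes only a few lines. This is a sketch that assumes "kbps" means 1,000 bits per second and that the whole tape is played back in one go; the helper name is mine.

```python
# Convert the quoted tape capacity into bytes and estimate playback time.
# Assumptions: 8 bits per byte, 1 kbps = 1,000 bits per second.
capacity_bits = 536_000_000
capacity_bytes = capacity_bits // 8


def playback_hours(rate_kbps):
    """Hours needed to play back the full tape at the given rate."""
    return capacity_bits / (rate_kbps * 1000) / 3600


print(f"{capacity_bytes:,} bytes")
print(f"{capacity_bytes / 1e6:.0f} MB (decimal) or {capacity_bytes / 2**20:.1f} MiB (binary)")
print(f"Full playback at 57.6 kbps: {playback_hours(57.6):.1f} hours")
print(f"Full playback at  7.2 kbps: {playback_hours(7.2):.1f} hours")
```

That is 67 million bytes, and dumping a full tape takes somewhere between about two and a half hours and most of a day, depending on the playback speed.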
“That means next time you go out and take a picture with your new camera, just 1 picture at a high resolution is equal to all the data storage Voyager 2 had available during its Jupiter/Saturn/Uranus/Neptune flyby!” noted one space buff in 2008 — a data point that is itself outdated.
Every six months, the stored data would get played back. “Voyager transmits information back to Earth using a 23-watt signal,” writes Caitlin Dewey in the Washington Post. “For comparison, my college radio station broadcast on a 20-watt signal and couldn’t be heard even a few blocks off campus. It is, per NPR, about eight times stronger than the average cellphone.”
The downside came when the spacecraft started to near the edge of the solar system, explains the New York Times: NASA wanted to be able to record more data with it. As in many other organizations that have dealt with digital preservation issues, NASA engineers (some of whom probably hadn’t been born yet when Voyager took off) didn’t know how to deal with the antiquated technology.
“NASA’s young programmers were accustomed to working with virtually unlimited storage capacity,” writes Dale McFeatters in a Scripps-Howard News Service editorial. “The solution was to bring out of retirement 77-year-old NASA engineer Lawrence Zottarelli, who had worked with the eight-track units. The team successfully fed data into two computers [Suns] made by a company that was merged out of existence three years ago.”
Just remember that the next time somebody tries to tell you that engineers over 40 aren’t good for anything.
You may recall Nirvanix as the company that would send out a press release after each natural disaster, urging people to come use its service. Well, apparently that strategy didn’t work too well, or maybe we just haven’t had enough natural disasters lately, because several publications, including Information Age, reported that its customers had been told they had two weeks to find another repository for their data, presumably before it shuts down its service.
What that means is, “If you used Nirvanix for third or fourth duplicate copies you need assurance that data will be destroyed,” writes Simon Robinson in Computer Weekly. “If you used it for primary data you need that data back, and that is no trivial task right now.”
Consequently, there’s some degree of poetic justice to the fact that other companies are taking the occasion to jump out of the woodwork to issue their own press releases, promising Nirvanix customers that they can be taken care of. Attunity, for example, announced on Monday a migration service from Nirvanix to AWS’ S3 Cloud, using Attunity’s CloudBeam service, which is intended to simplify and accelerate data loading into Amazon S3.
Network administrators are also scrambling to find alternatives and to figure out the logistics of getting copies of their Nirvanix data installed somewhere else, if they hadn’t done it before. Even organizations that didn’t use Nirvanix are taking this as a wake-up call about whatever cloud storage vendor they’re using, while others — those who never cottoned to the idea of cloud storage in the first place — are patting themselves on the back for their prescience.
“When relying on cloud services it is important to have a backup plan–or at least a way out should the service become untenable,” writes Isha Suri in the Silicon Angle blog. “In the wake of the news of Nirvanix shutting down opinions have begun to rise about how to prepare for and handle such an event.”
Analysts such as Forrester’s Henry Baltazar and Gartner’s Kyle Hilgendorf are suggesting that organizations make sure they have an exit strategy when they sign up with a cloud service, but point out the difficulty of getting data out of the cloud once it’s in. “One of the most significant challenges in cloud storage is related to how difficult it is to move large amounts of data from a cloud,” he writes. “While bandwidth has increased significantly over the years, even over large network links it could take days or even weeks to retrieve terabytes or petabytes of data from a cloud.” He also recommends that organizations look for cloud storage vendors that offer direct connect or shipments of portable hard drives.
The company has finally officially announced its demise on its website, saying it was “working hard” to keep the service available until October 15 to give customers a chance to move their data.
Faithful readers of this blog are aware that we sometimes visit the issue of “what is the bandwidth of a station wagon full of magnetic tapes speeding down the highway” and other ways of putting Really Enormous Amounts of Data in context.
Similarly, this blog recently addressed the issue of how much data the NSA could store.
However, this week Randall Munroe, the author of the geek comic xkcd, came up with a new measurement of data, based on a reader question: “If all digital data were stored on punch cards, how big would Google’s data warehouse be?” Munroe, a physicist who has worked for NASA, in addition to the comic, answers hypothetical reader questions involving physics like this once a week. Other examples include “How fast can you hit a speed bump while driving and live?” and “If you call a random phone number and say ‘God bless you,’ what are the chances that the person who answers just sneezed?”
Anyway, using publicly available data — sources of which were all dutifully footnoted — Munroe went through very much the same sort of back-of-the-envelope calculation that this blog and other sources have gone through, first to calculate the amount of data Google has — in punch card size — and next, to extrapolate from that the amount of data the NSA has.
In the process, there are several interesting bits. For example:
“To make things worse, given the huge number of drives they manage, Google has a hard drive die every few minutes,” he writes, dutifully footnoting the source of this information. “This isn’t actually all that expensive a problem, in the grand scheme of things – they just get good at replacing drives – but it’s weird to think that when a Googler runs a piece of code, they know that by the time it finishes executing, one of the machines it was running on will probably have suffered a drive failure.”
Anyway, the figure Munroe came up with for Google’s data store, after a bunch of this calculation, is 15 exabytes. How much is that in punch cards?
“15 exabytes of punch cards would be enough to cover my home region, New England, to a depth of about 4.5 kilometers,” Munroe writes. To put that into perspective (which is something he’s very good at), “That’s three times deeper than the ice sheets that covered the region during the last advance of the glaciers.”
Going on to the NSA, Munroe also pokes fun at some of the more breathless of the speculation. “A few headlines, rather than going with one estimate or the other, announced that the facility could hold ‘between an exabyte and a yottabyte’ of data … which is a little like saying ‘eyewitnesses report that the snake was between 1 millimeter and 1 kilometer long.'”
Munroe concludes with how to find out where the seekrit Google data centers are — like CNN’s Wolf Blitzer advises, it’s “Monitor the pizzas.” “Google has created what might be the most sophisticated information-gathering apparatus in the history of the Earth … and the only people with information about them are the pizza delivery drivers,” he writes.
Prosecutors have dropped attempts to force a suspect to give up the encryption key for his hard drives. Unfortunately, they dropped the attempts not because it was the right thing to do, but because they succeeded in breaking into his hard drives another way and getting the information they wanted.
As you may recall, this all started when Jeffrey Feldman was suspected of having child pornography, based on the names of files he allegedly exchanged on a file-sharing site. However, of his 16 hard drives, 9 were encrypted, and he refused to provide law enforcement with the decryption key. In April, a judge ruled at first that Feldman was not required to give up the decryption key, but then reversed himself in May after law enforcement succeeded in decrypting one drive, which linked the drive to Feldman. However, in June, a different judge granted a stay on that order.
As we noted in May, when the judge reversed himself, this is part of a continuing process where courts are trying to figure out what an encryption key is, legally speaking. Is it a physical thing, like a key to a lockbox, which is not protected by the Fifth Amendment? Or is it like the combination to a safe — the “expression of the contents of an individual’s mind” — which is protected? In some countries, people have even been jailed for refusing to reveal an encryption key.
This case, like most of the other ones regarding revealing encryption keys, has to do with child pornography, which adds another nuance to the issue. Are law enforcement and the legal profession more likely to push the envelope of legal search because they so badly want to catch child pornographers? Or because they think people will be less likely to criticize their methods because the crime is so heinous? (Or as Mike Wheatley put it in his blog, Silicon Angle, about the original case, “Data Encryption Makes Perverts Untouchable.”)
“That’s also the whole point of the Bill of Rights: ‘mere suspicion’ is not enough to let the government search your premises and invade your privacy; the government needs actual evidence of wrongdoing before it can interfere with your life,” countered Jennifer Abel, in the Daily Dot, about the April case. “Nowhere in the text of the U.S. Constitution does it say ‘All rights listed herein may be suspended, if cops suspect you did something really really bad.’”
In July, the Electronic Frontier Foundation filed an amicus brief in the case, which laid out all the various reasons and legal precedents why it believes that forcing someone to reveal a decryption key violates the Fifth Amendment protection against self-incrimination. Increasingly, the EFF noted, people and businesses are encrypting their data for their own protection, not because they’re doing anything untoward.
In addition, Feldman’s attorneys contended in July that the prosecution had written its case in such a way as to make it sound like his encryption method and computer system was more sophisticated than that of the average person, with the intent to mislead the court. Examples it cited included describing Feldman’s drives having an “intricate electronic folder structure with thousands of files” when even Windows itself has such a folder structure.
In any event, Feldman was formally charged in August, based on evidence obtained when two of the hard drives were decrypted and sufficient evidence was found to charge him with the crimes. At that point, the prosecution dropped its efforts to force him to decrypt the drives.
The prosecution was under the gun here; the arrest happened the day before the prosecution was due to submit a brief explaining why its request would not violate Feldman’s Fifth Amendment rights, the Milwaukee-Wisconsin Journal Sentinel notes.
The upshot is that we’re no closer to a definitive ruling on whether people will be required to give up decryption keys based on law enforcement suspicions. Because of the varying rulings by lower courts, experts believe we will need a Supreme Court ruling before we get a definitive answer.
The virtual world was made real this week, as anybody who was anybody was in San Francisco, the site of this year’s VMworld conference for VMware. But there were more clouds in the air than the city’s traditional summer fog.
As always, such conferences feature a lot of new products, which you can read more about elsewhere. But what many found more interesting was what it all meant for VMware itself, in a year marked by technology and leadership changes. The company became famous for helping organizations use their servers more efficiently, but in a time when server sales are going down and users are moving to the cloud, VMware is in the classic “innovator’s dilemma,” trying to catch up with newer, nimbler competition without alienating its traditional base.
No less a presence than the New York Times (the Times knows from virtualization? Who knew?) writes,
“VMware’s main product, virtualization software, allows one computer server to do the work of many, and for complex tasks to be shared across several machines. That disrupted the old computer server business, and helped usher in the current model of big data centers and cloud computing. But now, as other companies offer both proprietary and open source virtualization, VMware has to move on from the world it helped destroy.”
In the same way that VMware virtualized servers, it and other vendors have virtualized other aspects of computing, such as storage. VMware is looking to extend that to the network itself, through NSX, a product family based on its purchase a year ago of Nicira. And certainly there was a slide ready, full of the logos of vendors that said they will support it, though some of them were complaining that the new APIs gave them less functionality than they had had.
On the other hand, one big name was missing: Cisco, which went on later in that week to criticize the whole idea of software-based networking. Of course, to a certain extent, Cisco is in the same dilemma as VMware — having to defend its turf against new, innovative technologies. “It’s hard to be a partner with someone when you’re on a collision course with them,” writes Barb Darrow for GigaOm.
All of this is happening against a backdrop of executives leaving the company in the past year — really, starting with Paul Maritz leaving as CEO to become chief strategy officer at EMC a year ago, and then heading up the Pivotal effort of “everything VMware had that wasn’t virtualization.” And current VMware CEO Pat Gelsinger has been talked about as a potential CEO for EMC once Joe Tucci decides to retire for good. But there’s been more, notes Darrow:
“Maritz took some people with him so they’re still under the umbrella held by parent company EMC. Others left as VMware de-emphasized or sold off “non-core” technologies like Zimbra, Sliderocket and Wavemaker etc. But the departure of other top executives – CTO Stephen Herrod, and especially former cloud infrastructure head Bogomil Balkansky, definitely contributed – right or wrong – to a perception of brain drain.”
On the other hand, she notes that VMware this week brought in former Microsoft CIO Tony Scott as CIO, and also recently brought former SAP mobile guy Sanjay Poonen aboard to lead its end-user computing effort.
It all creates a perception of a company that doesn’t quite know where it’s going, in contrast to the well-oiled machine that VMware has typically been thought of until now. As recently as March, VMware was predicting up to 20 percent revenue growth, because the formation of Pivotal was going to let it focus on its virtualization business. It will be interesting to see whether that prediction comes true.
Time to get out your Disaster Recovery binder. Skip past the sections on “Earthquakes,” “Tornadoes,” “Hurricanes,” “Forest Fires,” “Zombies,” and “Floods,” and stop at the one called “When the Sun Flips Magnetic Poles.”
What do you mean, you don’t have one? Better hurry up. You’re going to need it.
In case you’ve somehow missed the news, our sun is expected to flip its magnetic poles in the next few months. That is, the North Pole will be the South Pole, and vice versa. The sun itself doesn’t move — just the magnetic fields.
While this might sound surprising, it’s actually something the sun does every eleven years or so.
That’s fine, but what does that mean to you? It depends on whom you ask. It ranges from “Well, maybe nothing much, really” to “OMG, WE’RE ALL GONNA DIE!” And nobody really knows.
First of all, we don’t know how severe the associated magnetic shifts are going to be — just like we don’t know ahead of time what hurricane season will be like. Second, we’ve all acquired a lot more electronics in the past eleven years, and nobody really knows what effects the magnetic changes could have on them.
The “nothing much, really” contingent points out that the sun has flipped three times since 1976 and we haven’t had any tragedies yet and there’s no real reason to believe it’s going to be anything different this time.
The OMG contingent says it has the potential of blowing out all our electronics for months or years. “The big fear is what might happen to the electrical grid, since power surges caused by solar particles could blow out giant transformers,” reports National Geographic. “Such transformers can take a long time to replace, especially if hundreds are destroyed at once, said [the] co-author of a National Research Council report on solar-storm risks…The eastern half of the U.S. is particularly vulnerable, because the power infrastructure is highly interconnected, so failures could easily cascade like chains of dominoes. ‘Imagine large cities without power for a week, a month, or a year,’ [he] said. ‘The losses could be $1 to $2 trillion, and the effects could be felt for years.’”
GPSes and satellite systems are also vulnerable. As NASA notes, how’d you like to be coming in for a plane landing or a ship docking by GPS at that time?
A less severe event in 1989 caused power failures in Canada, and almost brought down the power grid on the East Coast. Scientists who studied an even more powerful storm in 1921 in the context of systems today found that a similar event now could cause cascading failures that could even affect the water system.
In addition, the OMG contingent is speculating that the flip could cause another “Carrington Event.” “The biggest solar storm on record happened in 1859, during a solar maximum about the same size as the one we’re entering,” writes National Geographic. It was discovered by an English guy named Richard Carrington, who just happened to be looking at the sun at the same time it emitted a Coronal Mass Ejection (CME), which acted like a giant magnetic fart. So he knew it was coming. When the fart reached the Earth, all sorts of interesting things reportedly happened.
“Just before dawn the next day, skies all over planet Earth erupted in red, green, and purple auroras so brilliant that newspapers could be read as easily as in daylight. Indeed, stunning auroras pulsated even at near tropical latitudes over Cuba, the Bahamas, Jamaica, El Salvador, and Hawaii,” writes NASA. “Even more disconcerting, telegraph systems worldwide went haywire. Spark discharges shocked telegraph operators and set the telegraph paper on fire. Even when telegraphers disconnected the batteries powering the lines, aurora-induced electric currents in the wires still allowed messages to be transmitted.”
What do you think that’s going to do to your iPod? Not to mention your data center? It could give “flash drive” a whole new meaning.
“In 2008 solar scientists predicted that a Carrington scale solar event today could cause blackouts affecting 130 million people and result in economic losses of ‘$1 trillion to $2 trillion during the first year alone…with recovery times of 4 to 10 years,’” writes Data Center Pro. In fact, the article continues, one scientist predicts a 12 percent chance of a Carrington event in the next decade. It’s serious enough that even Homeland Security is looking into it.
“At the time of the Carrington Event, only the 125,000 miles of wire set up for the nascent telegraph network had the correct properties for the induction of auroral currents,” wrote Eric Gallant, one of the primary experts on the phenomenon with respect to data centers, in 2009. “In 2009, there are many more targets for a geomagnetic storm, including transcontinental pipelines, communication lines and power transmission lines. In addition, our vulnerability to geomagnetic storms is increased because modern infrastructure networks are vastly larger than the simple systems of Carrington’s day. In particular, the electrical properties and extent of our national electric grid has led industry professionals to compare it to a continent-wide antenna for geomagnetic energy.”
Needless to say, if the OMG contingent is right, or if we have another Carrington Event, chances are it doesn’t make much difference what you do; we’ll all be hosed anyway. But if it’s simply going to be a heavier-than-usual sunspot day, here are some precautions to take before the magnetic storms reach their predicted peak in 2015:
- Have backup generators of some sort handy — preferably the kind that don’t require electronics to operate.
- Get UPSes, surge protectors, and so on, and make sure all your equipment is plugged into them. If the situation is severe enough, it won’t help, but it can’t hurt.
- Gallant recommends locating data centers near the lower latitudes, away from the poles.
- Pay attention to the news. The nice thing about the sun being so far away from the Earth — aside from the fact that if it weren’t, we’d, like, die — is that we have some warning. While it takes around eight minutes for light to get to the earth, it can actually take several days for a CME to get here, so you have time to, if necessary, unplug things in hopes there’ll still be a grid to plug them back into afterwards.
And get out your binoculars. The aurorae could be spectacular.
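For the curious, the warning time mentioned in the last tip above is easy to estimate. This is only a sketch: the Earth-sun distance and the speed of light are standard values, but the three CME speeds are illustrative assumptions I picked to span slow to fast, not measurements of any particular event.

```python
# Estimate how much warning we get between seeing a solar eruption and its arrival.
AU_KM = 149_597_870.7   # mean Earth-sun distance in kilometers
C_KM_S = 299_792.458    # speed of light in kilometers per second


def travel_time_hours(speed_km_s):
    """Hours for something moving at speed_km_s to cover one Earth-sun distance."""
    return AU_KM / speed_km_s / 3600


light_minutes = AU_KM / C_KM_S / 60
print(f"Light: {light_minutes:.1f} minutes")

# Hypothetical CME speeds in km/s, chosen for illustration only.
for speed in (500, 1000, 2000):
    hours = travel_time_hours(speed)
    print(f"CME at {speed} km/s: {hours:.0f} hours ({hours / 24:.1f} days)")
```

A slow ejection leaves days to react; a fast one leaves less than a day, which is another reason to pay attention to the news.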
“Much has been written about just how much data that facility might hold, with estimates ranging from ‘yottabytes’ (in Wired) to ‘5 zettabytes’ (on NPR), a.k.a. words that you probably can’t pronounce that translate to ‘a lot,’” writes Kashmir Hill in Forbes. “For some sense of scale, you would need just 400 terabytes to hold all of the books ever written in any language.”
However, Hill obtained what she said were actual blueprints for the data center that belied such figures.
“Within those data halls, an area in the middle of the room – marked ‘MR – machine room/data center’ on the blueprints – is the juicy center of the information Tootsie pop, where the digital dirt will reside. It’s surrounded by cooling and power equipment, which take up a goodly part of the floor space, leaving just over 25,000 square feet per building for data storage, or 100,000 square feet for all four buildings, which is the equivalent of a Wal-Mart superstore.”
Hill went to Brewster Kahle, who invented the precursor of the World Wide Web called WAIS, and who went on to found the Internet Archive.
“Kahle estimates that a space of that size could hold 10,000 racks of servers (assuming each rack takes up 10 square feet). ‘One of these racks cost about $100,000,’ says Kahle. ‘So we are talking $1 billion in machines.’
Kahle estimates each rack would be capable of storing 1.2 petabytes of data. Kahle says that voice recordings of all the phone calls made in the U.S. in a year would take up about 272 petabytes, or just over 200 of those 10,000 racks.
If Kahle’s estimations and assumptions are correct, the facility could hold up to 12,000 petabytes, or 12 exabytes – which is a lot of information(!) – but is not of the scale previously reported. Previous estimates would allow the data center to easily hold hypothetical 24-hour video and audio recordings of every person in the United States for a full year.”
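Kahle's estimate is simple enough to reproduce. The sketch below just plugs in the numbers from the quote (100,000 square feet, 10 square feet per rack, $100,000 and 1.2 petabytes per rack); the variable names are mine, and the decimal conversion of 1,000 PB per exabyte is an assumption that matches the quote.

```python
import math

# Inputs taken from the Forbes quote above.
floor_sq_ft = 100_000
sq_ft_per_rack = 10
cost_per_rack_usd = 100_000
pb_per_rack = 1.2
phone_calls_pb = 272

racks = floor_sq_ft // sq_ft_per_rack
total_cost_usd = racks * cost_per_rack_usd
total_pb = racks * pb_per_rack
# Whole racks needed to hold a year of U.S. phone calls.
phone_racks = math.ceil(phone_calls_pb / pb_per_rack)

print(f"Racks: {racks:,}")
print(f"Hardware cost: ${total_cost_usd / 1e9:.0f} billion")
print(f"Capacity: {total_pb:,.0f} PB, or {total_pb / 1000:.0f} EB")
print(f"Racks for a year of phone calls: {phone_racks}")
```

Change `sq_ft_per_rack` or `pb_per_rack` and the total moves accordingly, which is why different experts land on different figures.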
Other experts, such as Paul Vixie, had even lower numbers. “Assuming larger 13 square feet racks would be used, factoring in space between the racks, and assuming a lower amount of data storage per rack, he came up with an estimate of less than 3 exabytes of data capacity for the facility,” Forbes writes.
Hill isn’t the only one who’s been thinking about the storage capacity of that Utah data center.
“To put this into perspective, a yottabyte would require about a trillion 1tb hard drives and data centers the size of both Rhode Island and Delaware,” writes security consultant Mark Burnett. “Further, a trillion hard drives is more than a thousand times the number of hard drives produced each year. In other words, at current manufacturing rates it would take more than a thousand years to produce that many drives. Not to mention that the price of buying those hard drives would cost up to 80 trillion dollars–greater than the GDP of all countries on Earth.”
Even looking at a zettabyte, or 0.1 percent of a yottabyte, is unrealistic, Burnett continues. “Let’s assume that if you buy 250 million hard cheap consumer-grade drives you get a discount, so they get them at $150 each which would come to a $37.5 billion for the bare hard drives alone (well, and a billion tiny screws).”
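Burnett's drive counts follow from the unit definitions. The sketch below assumes decimal prefixes, and it assumes 4 TB per drive for the zettabyte case. That size is not stated in the quote, but it is what 250 million drives per zettabyte implies.

```python
# Decimal (SI) byte units.
TB = 10**12
ZB = 10**21
YB = 10**24

# A yottabyte on 1 TB drives, as in Burnett's first estimate.
drives_per_yb = YB // (1 * TB)

# A zettabyte on 4 TB drives.
# Assumption: 4 TB is inferred from the "250 million" figure, not quoted.
DRIVE_TB = 4
PRICE_PER_DRIVE_USD = 150
drives_per_zb = ZB // (DRIVE_TB * TB)
bare_drive_cost_usd = drives_per_zb * PRICE_PER_DRIVE_USD

print(f"1 TB drives per yottabyte: {drives_per_yb:,}")
print(f"4 TB drives per zettabyte: {drives_per_zb:,}")
print(f"bare drive cost:           ${bare_drive_cost_usd:,}")
```

That gives a trillion drives for the yottabyte, and 250 million drives at $37.5 billion for the zettabyte.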
That might sound familiar. You may recall that Backblaze powers its backup service (disclaimer: I use it) with commodity drives in that way. You may also recall that it occasionally has a hell of a time finding enough drives.
As it turns out, Backblaze has also examined the NSA claims — and it did so back in 2009:
“The cost per GB has dropped consistently 4% per month for the last 30 years. Assume the trend continues for the next 5 years, by when the NSA needs their yottabyte of storage. The costs in 2015 then would be:
* $8 trillion for the raw drives
* $80 trillion for a storage system
Well, that’s getting closer – a bit less than today’s global GDP.
Per historical metrics, a drive should hold 10 TB by 2015. The NSA would require:
* 100 billion hard drives
* 2 billion Backblaze storage pods
And of course, they would probably want this data backed up. That might really test our offer of $5 for unlimited storage.”
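To see what a steady 4 percent monthly decline does over five years, here is a rough sketch of the compounding. It uses only the figures Backblaze quotes. The drives-per-pod number is simply what those figures imply, not a spec I am asserting.

```python
# Compounding a 4% monthly drop in cost per GB over five years.
MONTHLY_DROP = 0.04
MONTHS = 5 * 12
remaining_fraction = (1 - MONTHLY_DROP) ** MONTHS

# Drive and pod counts for one yottabyte, using the quoted 10 TB drive.
TB = 10**12
YB = 10**24
DRIVE_TB = 10
PODS = 2_000_000_000        # "2 billion Backblaze storage pods"
drives_needed = YB // (DRIVE_TB * TB)
implied_drives_per_pod = drives_needed // PODS

print(f"cost per GB after {MONTHS} months: {remaining_fraction:.1%} of today's")
print(f"10 TB drives needed:       {drives_needed:,}")
print(f"implied drives per pod:    {implied_drives_per_pod}")
```

In other words, the trend leaves the price at under a tenth of where it started, which is how a yottabyte gets from wildly impossible to merely the size of the global economy.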
Backblaze isn’t the only vendor doing back-of-the-envelope calculations (perhaps practicing for an RFP?); NetApp technologist Larry Freeman is as well:
“Assuming that 40% of the 25,000 sq ft floor space in each of the 4 data halls would be used to house storage, 2,500 storage racks could be housed on a single floor (with accommodations for front and rear service areas). Each rack could contain about 450 high capacity 4TB HDDs which would mean that 1,125,000 disk drives could be housed on a single data center floor, with 4.5 Exabytes of raw storage capacity.”
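Freeman's capacity figure can be reproduced in a few lines. This sketch takes his 2,500-rack count as given rather than deriving it from the floor area, and assumes decimal units (1 exabyte = 1,000,000 terabytes).

```python
# Figures as quoted from Freeman.
STORAGE_RACKS = 2_500
DRIVES_PER_RACK = 450
TB_PER_DRIVE = 4

total_drives = STORAGE_RACKS * DRIVES_PER_RACK
raw_tb = total_drives * TB_PER_DRIVE
raw_eb = raw_tb / 1_000_000     # assumption: 1 EB = 1,000,000 TB

print(f"drives on one floor: {total_drives:,}")
print(f"raw capacity:        {raw_tb:,} TB ({raw_eb} EB)")
```

The output matches his 1,125,000 drives and 4.5 exabytes of raw capacity.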
And that’s not even getting into the power consumption aspect. The Utah data center is reportedly slated to use up to 65 megawatts of power, or as much as the entire city of Salt Lake itself. Forbes quoted Kahle’s estimate of $70 million a year for 70 megawatts, while Wired reportedly estimated $40 million a year for 65 megawatts. (And recall that Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could tack up to $2.4 million annually onto that $40 million.)
Burnett’s power calculation is even higher. “250 million hard drives would require 6.25 gigawatts of power (great Scott!). Of course, drives need servers and servers need switches and routers; they’re going to need a dedicated nuclear power plant. They’re going to need some fans too, 4.25 billion btu definitely would be uncomfortable.” Of course, there are other options, he notes. “Another option that would use much less electricity and far less space would be 128 GB microSDXC cards. Except that you would need 9,444,732,965,739,290 of them. At $150 each.”
Freeman’s power calculation is high as well.
“HOWEVER, each storage rack consumes about 5 Kilowatts of power, meaning the storage equipment alone would require 12.5 Megawatts. On the other hand, servers consume much more power per rack. Up to 35 Kilowatts. Assuming an equivalent number of server racks (2,500), servers would eat up 87.5 Megawatts, for a total of 100 Megawatts. Also, cooling this equipment would require another 100 Megawatts of power, making the 65 Megawatt power substation severely underpowered, and so far we’ve only populated a single floor. Think that the NSA can simply replace all those HDDs with Flash SSDs to save power? Think again, an 800GB SSD (3 watts) actually consumes more power per GB than a 4TB HDD (7.8 watts).”
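Freeman's power budget, and his SSD comparison, work out as follows. This is a sketch using only his quoted per-rack and per-drive figures; the variable names are mine.

```python
# Per-rack power figures as quoted from Freeman.
RACKS = 2_500                 # storage racks, and an equal number of server racks
STORAGE_KW_PER_RACK = 5
SERVER_KW_PER_RACK = 35
COOLING_MW = 100
SUBSTATION_MW = 65

storage_mw = RACKS * STORAGE_KW_PER_RACK / 1000
server_mw = RACKS * SERVER_KW_PER_RACK / 1000
total_mw = storage_mw + server_mw + COOLING_MW
shortfall_mw = total_mw - SUBSTATION_MW

# Power per gigabyte: 800 GB SSD at 3 W versus 4 TB HDD at 7.8 W.
ssd_mw_per_gb = 3 / 800 * 1000      # milliwatts per GB
hdd_mw_per_gb = 7.8 / 4000 * 1000   # milliwatts per GB

print(f"storage: {storage_mw} MW, servers: {server_mw} MW, total: {total_mw} MW")
print(f"shortfall against the substation: {shortfall_mw} MW")
print(f"SSD: {ssd_mw_per_gb:.2f} mW/GB, HDD: {hdd_mw_per_gb:.2f} mW/GB")
```

On these numbers a single floor needs about three times what the substation can supply, and the SSD draws nearly twice as much power per gigabyte as the hard drive.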
Something I haven’t seen anyone address is what buying that much storage would do to the revenues of the lucky hardware vendor — or vendors. How in the world would Seagate, or any of the component vendors, be able to keep a purchase of that size secret?
Moreover, with many hard drive component manufacturers located outside the U.S., and with there already being concern that computer components might have malware baked in, how would the NSA guarantee the integrity of non-U.S. components? (For that matter, with so many NSA whistleblowers wandering around, could it trust the integrity of U.S.-built components?)
Meanwhile, Datacenter Dynamics notes that, in this case, “size doesn’t matter,” particularly since the NSA is likely to be using state-of-the-art deduplication and compression technologies to reduce the amount of data stored. “The capacity for storing data is not nearly as important as being able to process data and derive valuable information from it,” writes Yevgeniy Sverdlik. “Making sense out of data is a lot harder than storing it, so the NSA’s compute capacity, in terms of processor cores, and the analytics methods its data-miners use are much more interesting questions.”
Incidentally, the NSA recently responded to a Freedom of Information Act request by saying it didn’t have the capability to search its own employees’ email in bulk.
A large number of Oregonians looking for state services — including 63,000 unemployed people expecting checks for a total of $18 million in benefits — were left high and dry for a day recently due to problems with a Hitachi storage upgrade.
Hitachi contractors were doing what was supposed to be a routine upgrade to the State Data Center in Salem when a connectivity issue caused the system to go down, KGW News reported state spokesman Matt Shelby as saying. “Hitachi worked overnight to fix the problem. All state agency websites were affected, but no data was lost,” the station said. The outage started at 7 p.m. Monday and was repaired by Tuesday morning, while state services were restored by midday.
Up to 90 percent of the weekly unemployment benefits are normally processed on Monday nights, according to an AP story in The Columbian.
Other issues, according to Oregon Public Radio and The Oregonian, included:
- Inability for the state’s more than 90 agencies to communicate directly with each other via email
- Any jobs that needed to pull data from the data center couldn’t run
- The Department of Transportation TripCheck was down
- The Department of Forestry, which was fighting a fire in Prineville (ironically, where Facebook has one of its data centers), didn’t have access to email or database forms
- 35 applications for food stamps scheduled for overnight processing were delayed
Ironically, to a certain extent Oregon brought this on itself by planning to consolidate its various state data centers into the single State Data Center in 2004. “The State Data Center was authorized in July 2004 to consolidate the computer operations of the 12 largest agencies,” notes the Statesman-Journal. “A $20 million building on Airport Road SE houses the center, which opened in fall 2005. Lawmakers in 2005 approved $43.6 million for the consolidation process.” But in July, 2008 — almost exactly five years ago — the state’s plan for consolidating data centers was sharply criticized for not adequately consolidating the servers themselves.
The system has also been plagued by crashes. In October, 2009, a network failure on the State Data Center system caused an overload on the unemployment system, shutting it down for 12 hours. In October, 2011, unemployment payments were delayed a day because a computer upgrade had “unintended consequences.” Then in May, 2012, a number of state websites were down for most of a day due to problems in a Texas data center that stored their content.
That was just two months after the Secretary of State’s office performed an audit of the department, noting that it needed improvement in the area of disaster recovery. That letter referenced the Federal Information Systems Controls Audit Manual, which notes, among other things, that “Spare or backup hardware is used to provide a high level of system availability for critical and sensitive applications.”
And, a month ago, three senior officials in the Department of Employment lost their jobs due in part to problems with the department’s computer systems. “Audit after audit exposed leadership problems that festered as the agency wasted as much as $30 million on computer software programs that didn’t work,” reported The Oregonian. “IT employees ‘are appointed to positions that they may or may not be suitable for, they are not coached and then their job duties were significantly changed.’” It said that the IT division needed “leadership, governance, priority setting, methodology, contract administration and appropriate HR practices.”
State officials pointed out that no data was lost in the recent incident, and that it was simply a matter of access to the systems that was lost for a day.
This is not to pick on Oregon; as IEEE Spectrum pointed out, the state government computer systems of New Mexico, Kansas, North Carolina, New Jersey, and Iowa all ran into problems that same week. These incidents do demonstrate, though, the challenges for citizens needing services (who tend to be the less computer-savvy ones) when increasingly computerized state systems run into problems.
“Just who in their right mind upgrades a live system?” asked one commenter.
Analyst Greg Schulz of Storage I/O agrees, calling it “CYA 101.” “Anytime there is a person involved — regardless of if it’s hardware, cables, software, firmware, configurations or physical environments –something can happen,” he writes. “If the vendor drops the ball or a cable or card or something else and causes an outage or downtime, it is their responsibility to discuss those issues. However, it is also the customer’s responsibility to discuss why they let the vendor do something during that time without taking adequate precautions. Likewise, if the storage system was a single point of failure for an important system, then there is the responsibility to discuss the cost cutting concerns of others and have them justify why a redundant solution is not needed.”