“Much has been written about just how much data that facility might hold, with estimates ranging from ‘yottabytes’ (in Wired) to ‘5 zettabytes’ (on NPR), a.k.a. words that you probably can’t pronounce that translate to ‘a lot,’” writes Kashmir Hill in Forbes. “For some sense of scale, you would need just 400 terabytes to hold all of the books ever written in any language.”
However, Hill obtained what she said were actual blueprints for the data center that belied such figures.
“Within those data halls, an area in the middle of the room – marked ‘MR – machine room/data center’ on the blueprints – is the juicy center of the information Tootsie pop, where the digital dirt will reside. It’s surrounded by cooling and power equipment, which take up a goodly part of the floor space, leaving just over 25,000 square feet per building for data storage, or 100,000 square feet for all four buildings, which is the equivalent of a Wal-Mart superstore.”
Hill went to Brewster Kahle, who invented WAIS, a precursor of the World Wide Web, and who went on to found the Internet Archive.
“Kahle estimates that a space of that size could hold 10,000 racks of servers (assuming each rack takes up 10 square feet). ‘One of these racks cost about $100,000,’ says Kahle. ‘So we are talking $1 billion in machines.’
Kahle estimates each rack would be capable of storing 1.2 petabytes of data. Kahle says that voice recordings of all the phone calls made in the U.S. in a year would take up about 272 petabytes, or just over 200 of those 10,000 racks.
If Kahle’s estimations and assumptions are correct, the facility could hold up to 12,000 petabytes, or 12 exabytes – which is a lot of information (!) – but is not of the scale previously reported. Previous estimates would allow the data center to easily hold hypothetical 24-hour video and audio recordings of every person in the United States for a full year.”
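For anyone who wants to check the math, here is a minimal sketch of Kahle's estimate as quoted above. Every input is a figure from the Forbes piece; the only thing added is the assumption of decimal units (1 PB = 1,000 TB, 1 EB = 1,000 PB), and the variable names are just for illustration.

```python
import math

# All inputs are figures quoted above. Decimal unit conversions
# (1 PB = 1,000 TB, 1 EB = 1,000 PB) are an assumption.
FLOOR_SPACE_SQ_FT = 100_000       # four halls at about 25,000 sq ft each
SQ_FT_PER_RACK = 10               # Kahle's assumption
COST_PER_RACK_USD = 100_000       # Kahle's figure
TB_PER_RACK = 1_200               # 1.2 PB per rack, expressed in TB
US_CALLS_TB_PER_YEAR = 272_000    # 272 PB for a year of U.S. phone calls

racks = FLOOR_SPACE_SQ_FT // SQ_FT_PER_RACK
hardware_cost_usd = racks * COST_PER_RACK_USD
capacity_pb = racks * TB_PER_RACK // 1_000
capacity_eb = capacity_pb // 1_000
# Round up: a partly filled rack is still a rack.
racks_for_calls = math.ceil(US_CALLS_TB_PER_YEAR / TB_PER_RACK)

print(f"{racks:,} racks, ${hardware_cost_usd:,} in hardware")
print(f"{capacity_pb:,} PB, or {capacity_eb} EB")
print(f"{racks_for_calls} racks for a year of U.S. phone calls")
```

Under those assumptions the numbers come out to 10,000 racks, $1 billion, 12 exabytes, and 227 racks for the phone calls, which squares with the "just over 200" in the quote.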
Other experts, such as Paul Vixie, had even lower numbers. “Assuming larger 13 square feet racks would be used, factoring in space between the racks, and assuming a lower amount of data storage per rack, he came up with an estimate of less than 3 exabytes of data capacity for the facility,” Forbes writes.
Hill isn’t the only one who’s been thinking about the storage capacity of that Utah data center.
“To put this into perspective, a yottabyte would require about a trillion 1tb hard drives and data centers the size of both Rhode Island and Delaware,” writes security consultant Mark Burnett. “Further, a trillion hard drives is more than a thousand times the number of hard drives produced each year. In other words, at current manufacturing rates it would take more than a thousand years to produce that many drives. Not to mention that the price of buying those hard drives would cost up to 80 trillion dollars–greater than the GDP of all countries on Earth.”
Even looking at a zettabyte, or .1 percent of a yottabyte, is unrealistic, Burnett continues. “Let’s assume that if you buy 250 million hard cheap consumer-grade drives you get a discount, so they get them at $150 each which would come to a $37.5 billion for the bare hard drives alone (well, and a billion tiny screws).”
That might sound familiar. You may recall that Backblaze powers its backup service (disclaimer: I use it) with commodity drives in that way. You may also recall that it occasionally has a hell of a time finding enough drives.
As it turns out, Backblaze has also examined the NSA claims — and it did so back in 2009:
“The cost per GB has dropped consistently 4% per month for the last 30 years. Assume the trend continues for the next 5 years, by when the NSA needs their yottabyte of storage. The costs in 2015 then would be:
* $8 trillion for the raw drives
* $80 trillion for a storage system
Well, that’s getting closer – a bit less than today’s global GDP.
Per historical metrics, a drive should hold 10 TB by 2015. The NSA would require:
* 100 billion hard drives
* 2 billion Backblaze storage pods
And of course, they would probably want this data backed up. That might really test our offer of $5 for unlimited storage.”
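Backblaze's figures are easy to sanity-check. The sketch below is not Backblaze's own calculation; it assumes decimal units (a yottabyte is 10^24 bytes, a terabyte 10^12), and the 50-drives-per-pod and $80-per-drive numbers are inferred here from the quoted totals rather than stated in the quote.

```python
YOTTABYTE = 10**24   # bytes, decimal units (assumption)
TERABYTE = 10**12    # bytes, decimal units (assumption)

def drives_needed(total_bytes, drive_bytes):
    """Whole drives required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // drive_bytes)

def cost_factor(monthly_drop, months):
    """Fraction of today's cost per GB left after a steady monthly decline."""
    return (1 - monthly_drop) ** months

drives = drives_needed(YOTTABYTE, 10 * TERABYTE)     # 10 TB drives
drives_per_pod = drives // 2_000_000_000             # inferred from 2 billion pods
cost_per_drive_usd = 8_000_000_000_000 // drives     # inferred from $8 trillion
remaining = cost_factor(0.04, 60)                    # 4% a month for 5 years

print(f"{drives:,} drives, {drives_per_pod} per pod")
print(f"${cost_per_drive_usd} per drive")
print(f"cost per GB falls to about {remaining:.1%} of today's")
```

The drive count matches the quoted 100 billion, and a 4% monthly decline compounded over 60 months leaves a bit under 9% of the starting cost per GB.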
Backblaze isn’t the only vendor doing back-of-the-envelope calculations (perhaps practicing for an RFP?). NetApp technologist Larry Freeman is as well:
“Assuming that 40% of the 25,000 sq ft floor space in each of the 4 data halls would be used to house storage, 2,500 storage racks could be housed on a single floor (with accommodations for front and rear service areas). Each rack could contain about 450 high capacity 4TB HDDs which would mean that 1,125,000 disk drives could be housed on a single data center floor, with 4.5 Exabytes of raw storage capacity.”
And that’s not even getting into the power consumption aspect. The Utah data center is reportedly slated to use up to 65 megawatts of power, or as much as the entire city of Salt Lake itself. Forbes quoted Kahle’s estimate of $70 million a year for 70 megawatts, while Wired reportedly estimated $40 million a year for 65 megawatts. (And recall that Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could add up to $2.4 million annually to that $40 million bill.)
Burnett’s power calculation is even higher. “250 million hard drives would require 6.25 gigawatts of power (great Scott!). Of course, drives need servers and servers need switches and routers; they’re going to need a dedicated nuclear power plant. They’re going to need some fans too, 4.25 billion btu definitely would be uncomfortable.” Of course, there are other options, he notes. “Another option that would use much less electricity and far less space would be 128 GB microSDXC cards. Except that you would need 9,444,732,965,739,290 of them. At $150 each.”
Freeman’s power calculation is high as well.
“HOWEVER, each storage rack consumes about 5 Kilowatts of power, meaning the storage equipment alone would require 12.5 Megawatts. On the other hand, servers consume much more power per rack. Up to 35 Kilowatts. Assuming an equivalent number of server racks (2,500), servers would eat up 87.5 Megawatts, for a total of 100 Megawatts. Also, cooling this equipment would require another 100 Megawatts of power, making the 65 Megawatt power substation severely underpowered — and so far we’ve only populated a single floor. Think that the NSA can simply replace all those HDDs with Flash SSDs to save power? Think again, an 800GB SSD (3 watts) actually consumes more power per GB than a 4TB HDD (7.8 watts).”
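Freeman's capacity and power numbers, from both of his quotes above, can be reproduced in a few lines. This is a rough sketch using only his stated inputs; the decimal unit conversions and the helper name `milliwatts_per_gb` are assumptions made here for illustration.

```python
# Inputs are Freeman's quoted figures for a single data center floor.
RACKS_PER_FLOOR = 2_500
DRIVES_PER_RACK = 450
TB_PER_DRIVE = 4
STORAGE_RACK_KW = 5
SERVER_RACK_KW = 35
SUBSTATION_MW = 65

drives = RACKS_PER_FLOOR * DRIVES_PER_RACK
raw_eb = drives * TB_PER_DRIVE / 1_000_000      # 1 EB = 1,000,000 TB (decimal)

storage_mw = RACKS_PER_FLOOR * STORAGE_RACK_KW / 1_000
server_mw = RACKS_PER_FLOOR * SERVER_RACK_KW / 1_000
it_mw = storage_mw + server_mw
total_mw = it_mw * 2                            # cooling quoted as another 100 MW

def milliwatts_per_gb(watts, gigabytes):
    """Power draw per gigabyte stored, in milliwatts."""
    return watts * 1_000 / gigabytes

ssd_mw_per_gb = milliwatts_per_gb(3, 800)       # 800 GB SSD at 3 watts
hdd_mw_per_gb = milliwatts_per_gb(7.8, 4_000)   # 4 TB HDD at 7.8 watts

print(f"{drives:,} drives, {raw_eb} EB raw")
print(f"{it_mw} MW of equipment, {total_mw} MW with cooling, "
      f"against a {SUBSTATION_MW} MW substation")
print(f"SSD {ssd_mw_per_gb:.2f} mW/GB vs. HDD {hdd_mw_per_gb:.2f} mW/GB")
```

On these figures a single floor would want roughly three times what the substation supplies, and the SSD does draw about twice as much power per gigabyte as the HDD.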
Something I haven’t seen anyone address is what buying that much storage would do to the revenues of the lucky hardware vendor — or vendors. How in the world would Seagate, or any of the component vendors, be able to keep a purchase of that size secret?
Moreover, with many hard drive component manufacturers located outside the U.S., and with there already being concern that computer components might have malware baked in, how would the NSA guarantee the integrity of non-U.S. components? (For that matter, with so many NSA whistleblowers wandering around, could it trust the integrity of U.S.-built components?)
Meanwhile, Datacenter Dynamics notes that, in this case, “size doesn’t matter,” particularly since the NSA is likely to be using state-of-the-art deduplication and compression technologies to reduce the amount of data stored. “The capacity for storing data is not nearly as important as being able to process data and derive valuable information from it,” writes Yevgeniy Sverdlik. “Making sense out of data is a lot harder than storing it, so the NSA’s compute capacity, in terms of processor cores, and the analytics methods its data-miners use are much more interesting questions.”
Incidentally, the NSA recently responded to a Freedom of Information Act request by saying it didn’t have the capability to search its own employees’ email in bulk.
A large number of Oregonians looking for state services — including 63,000 unemployed people expecting checks for a total of $18 million in benefits — were left high and dry for a day recently due to problems with a Hitachi storage upgrade.
Hitachi contractors were doing what was supposed to be a routine upgrade to the State Data Center in Salem when a connectivity issue caused the system to go down, KGW News reported state spokesman Matt Shelby as saying. “Hitachi worked overnight to fix the problem. All state agency websites were affected, but no data was lost,” the station said. The outage started at 7 p.m. Monday and was repaired by Tuesday morning, while state services were restored by midday.
Up to 90 percent of the weekly unemployment benefits are normally processed on Monday nights, according to an AP story in The Columbian.
Other issues, according to Oregon Public Radio and The Oregonian, included:
- Inability for the state’s more than 90 agencies to communicate directly with each other via email
- Any jobs that needed to pull data from the data center couldn’t run
- The Department of Transportation TripCheck was down
- The Department of Forestry, which was fighting a fire in Prineville (ironically, where Facebook has one of its data centers), didn’t have access to email or database forms
- 35 applications for food stamps scheduled for overnight processing were delayed
Ironically, to a certain extent Oregon brought this on itself by planning to consolidate its various state data centers into the single State Data Center in 2004. “The State Data Center was authorized in July 2004 to consolidate the computer operations of the 12 largest agencies,” notes the Statesman-Journal. “A $20 million building on Airport Road SE houses the center, which opened in fall 2005. Lawmakers in 2005 approved $43.6 million for the consolidation process.” But in July, 2008 — almost exactly five years ago — the state’s plan for consolidating data centers was sharply criticized for not adequately consolidating the servers themselves.
The system has also been plagued by crashes. In October, 2009, a network failure on the State Data Center system caused an overload on the unemployment system, shutting it down for 12 hours. In October, 2011, unemployment payments were delayed a day because a computer upgrade had “unintended consequences.” Then in May, 2012, a number of state websites were down for most of a day due to problems in a Texas data center that stored their content.
That was just two months after the Secretary of State’s office performed an audit of the department, noting that it needed improvement in the area of disaster recovery. That letter referenced the Federal Information Systems Controls Audit Manual, which notes, among other things, that “Spare or backup hardware is used to provide a high level of system availability for critical and sensitive applications.”
And, a month ago, three senior officials in the Department of Employment lost their jobs due in part to problems with the department’s computer systems. “Audit after audit exposed leadership problems that festered as the agency wasted as much as $30 million on computer software programs that didn’t work,” reported The Oregonian. “IT employees ‘are appointed to positions that they may or may not be suitable for, they are not coached and then their job duties were significantly changed.’” It said that the IT division needed “leadership, governance, priority setting, methodology, contract administration and appropriate HR practices.”
State officials pointed out that no data was lost in the recent incident, and that it was simply a matter of access to the systems that was lost for a day.
This is not to pick on Oregon; as IEEE Spectrum pointed out, the state government computer systems of New Mexico, Kansas, North Carolina, New Jersey, and Iowa all ran into problems that same week. These incidents do demonstrate, though, the challenges for citizens needing services (who tend to be the less computer-savvy ones) when increasingly computerized state systems run into problems.
“Just who in their right mind upgrades a live system?” noted one commenter.
Analyst Greg Schulz of Storage I/O agrees, calling it “CYA 101.” “Anytime there is a person involved – regardless of if it’s hardware, cables, software, firmware, configurations or physical environments – something can happen,” he writes. “If the vendor drops the ball or a cable or card or something else and causes an outage or downtime, it is their responsibility to discuss those issues. However, it is also the customer’s responsibility to discuss why they let the vendor do something during that time without taking adequate precautions. Likewise, if the storage system was a single point of failure for an important system, then there is the responsibility to discuss the cost cutting concerns of others and have them justify why a redundant solution is not needed.”
We’re always into the geekly here at Yottabytes, like data under glass and so on. Naturally, we were fascinated to read about “freezing” light and its implications for data storage.
If you missed it, a detailed description comes from the BBC:
“The team fired a light beam called a signal pulse through a sealed glass cylinder containing a hot gas containing atoms of the element rubidium, illuminated by a strong ray of light known as a control beam. While the pulse was travelling through the rubidium gas, the researchers switched off the control beam, creating a holographic imprint of the signal pulse on the rubidium atoms,” the BBC reports. “Earlier experimental methods had then switched on a single control beam to recreate the signal pulse, which then continued on its way. However, in this latest study, researchers switched on two control beams which created an interference pattern that behaves like a stack of mirrors. As the regenerated signal pulse tries to continue on its way through the glass cylinder, the photons bounce back and forth, but the overall signal pulse remains stationary. The light beam was essentially frozen.”
The light was frozen for an entire minute. While this may not seem like long, it’s enough time for light to make about 20 round trips to the moon.
Another version was also printed in io9 (though in the process they said light traveled at 300 mps; hilarity ensued).
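The moon comparison is simple enough to check. A quick sketch, assuming an average Earth-moon distance of about 384,400 km (the real distance varies over the month); the speed of light is the defined SI value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458     # exact by definition
EARTH_MOON_DISTANCE_M = 384_400_000      # average distance, an approximation

def round_trips(seconds, one_way_m=EARTH_MOON_DISTANCE_M):
    """How many out-and-back trips light could make in the given time."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / (2 * one_way_m)

print(f"{round_trips(60):.1f} round trips in one minute")
```

Under those assumptions the answer is a little over 23, so "20 round trips" is, if anything, on the conservative side.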
(You can also read the actual abstract.)
Research into the stopping-light area has been going on for some time, reports New Scientist. “Physicists managed to slow it down to just 17 metres per second in 1999 and then halt it completely two years later, though only for a fraction of a second. Earlier this year, researchers kept it still for 16 seconds using cold atoms.” In this particular experiment, the light-freezing was also enhanced using magnetism.
The storage angle came in as part of the demonstration. “And they proved the accomplishment by storing – and then successfully retrieving – information in the form of a 100-micrometer-long picture with three horizontal stripes on it.” The one-minute storage time is about six orders of magnitude longer than previous experiments, notes the American Physical Society. Moreover, the fact that the storage time can be manipulated based on the use of magnetism means that storage could be “spatially multiplexed, i.e., can store different quantum bits as different pixels,” they write.
Of course, nobody’s talking yet about when this might actually be usable for storage. “The efficiency of the storage (<1% in the present scheme) will have to be significantly increased for applications,” the American Physical Society admits. However, the researchers are planning to try different substances to increase the duration of information storage. Tens of seconds of light storage are needed for a device called a quantum repeater, which would stop and then re-emit photons used in secure communications, to preserve their quantum state over long distances, New Scientist says.
There are also implications for security, the BBC adds. “Quantum cryptography might provide very secure forms of electronic encryption, because the process of eavesdropping on an electronic message would introduce errors in the message, garbling it.” How Heisenberg of it.
If you — or, more likely, your boss — are having conniptions about the alleged Seekrit Backdoors in HP storage hardware, you can relax. Sort of. On the other hand, you may have a bigger problem.
To recap — a blogger discovered an administrative account with an easily-guessed password in HP’s StoreOnce storage hardware. HP has reportedly done this before, in other hardware. In response, a number of publications have leapt to claim that “HP is putting back doors into its equipment!”
Part of the problem is the whole term “back door,” which implies something nefarious the vendor put in on purpose to be able to have access to the data on the system. And that’s not what this is. If HP is “guilty” of anything, it’s guilty of something a whole lot of vendors also do: That is, putting in a set of administrative logins, default passwords, or features — typically to allow the administrator, or the vendor, or the support organization, to recover the system from some sort of user screwup. It happens with all sorts of networking hardware, not just storage, and certainly not just HP.
It’s like the way I left a spare house key in the freezer in my garage. If I was stupid and locked myself out, it was a way to get in without having to call a locksmith or break a window.
Now, if burglars found out I did this, that would be bad, because they could all go fishing around in the freezer and find my spare key. Similarly, what makes this issue a problem in computers is when it becomes known that, psst, all of the boxes from Vendor Y ship with an account called “admin” and a default password of “password.” That makes it a security vulnerability, because, you know, this doesn’t always get changed the way it should and, you know, hackers share this sort of information with each other. Then we have a problem.
One of the standard things administrators are supposed to do when they get in a new piece of equipment is to look for these standard admin accounts, and either get rid of them, change the default password they ship with, or whatever. A lot of these details get documented, either in the manual or on the support forums.
Sadly, not every administrator reads the manual and does research on what vulnerabilities are baked into a new piece of equipment. This is why, every few months, there’s a new warning about this kind of thing. This time, it just happened to be storage hardware, and from HP.
As recently as late June, the Computer Emergency Response Team (CERT) issued a warning about default passwords in new equipment. Chances are, before the year is out, there’ll be yet another incident based on the fact that administrators don’t always do the work they should before they connect the new hardware to the network. It’s just one of those Things.
And it’s been going on a long time. If you read any of the “Eek! HP Backdoor!” articles, check out the comments, where the graybeards are rolling their eyes and patiently pointing out all the other systems that have built-in admin accounts and default passwords.
Yes, it’s an issue, but not just for HP, and not just for storage hardware. So go check your equipment — all of it — read the manuals, and make sure all the default passwords are changed, and you can tell your boss you’ve taken care of all the scary “back doors.”
Incidentally, I have a new place to stash my spare house key.
(Geekly aside: Technically, there is a distinction between flash and solid state storage. On a practical level, though, the terms are pretty much interchangeable these days.)
First, Western Digital and sTec announced a merger where sTec will be acquired by HGST, a wholly-owned subsidiary of Western Digital. sTec will be acquired for approximately $340 million in cash, or $6.85 per share, Western Digital said. “STec started its life as Simple Technology in 1990 and went public in 2000, but later sold its consumer flash business to Fabrik to focus solely on the enterprise flash business,” writes Om Malik of GigaOm.
Second, SanDisk said it was paying $307 million for Smart Storage Systems, a developer of enterprise solid-state memory drives that has been owned since 2011 by the investment firm Silver Lake Partners. This is SanDisk’s fourth acquisition in that market, according to the Associated Press, including FlashSoft Corp. in February 2012 and enterprise SSD solutions provider Pliant Technology in May 2011. “Leveraging Smart Storage’s capabilities and intellectual properties, SanDisk will be able to enhance its existing enterprise SSD and software portfolio, gain economies of scale and increase share in the potential $1.6 billion enterprise SATA and SAS space,” writes Zacks Equity Research, while warning that it faces tough competition from companies such as Western Digital and Seagate.
The two acquisitions happened within a week of each other.
“This SSD frenzy is being driven by data centers which are dealing with much heavier demands on the machines, and of course our need to access information quickly,” Malik writes, noting that if the storage is fast enough, it can actually reduce the need for storage in an organization or service – which matters if you’re someone the size of Facebook.
Flash storage startups have sprouted like mushrooms in a summer rain, receiving millions in VC funding, notes Investor’s Business Daily. However, it’s natural for consolidation to take place as the industry matures, winners and losers start shaking out, and VCs start itching to get their payout.
Meanwhile, one of the original big flash startups, Fusion-io – which actually went public about two years ago – is also being eyed as an acquisition target, though its value is high enough that other companies might be a better buy, writes Jordan Novet for GigaOm.
Seagate hasn’t bought anybody this week yet, but invested $40 million in Virident in January, reports Investor’s Business Daily.
Has it been a year already? Gartner has released its third Magic Quadrant for e-discovery vendors, and while some of the names are new, the song is the same: growth and acquisitions.
The Leaders quadrant is now pretty crowded with nine vendors, four of them new to the quadrant this year. The first MQ, after all, had FTI Technology, kCura, Clearwell Systems, Guidance Software, and Autonomy in the leaders spot; a year later, two of those had been purchased and the Leaders quadrant was then Symantec, ZyLAB, AccessData, Guidance Software, Autonomy, and Recommind, with FTI and kCura slipping back to mere Challengers.
This year, the gang’s all here. FTI and kCura are back in the Leaders quadrant. Symantec, Autonomy (now called HP-Autonomy), Recommind, Guidance, and AccessData are all still there, and Kroll Ontrack has managed to creep from Niche in 2011 to Challenger in 2012 to Leader in 2013. The remaining new Leader is Exterro, which was a Visionary in 2011 and 2012.
On the other hand, there’s no clear leader among the Leaders. HP/Autonomy, which has had its own problems, is considered the “most visionary,” while Symantec, which purchased Clearwell before the 2011 MQ was even published, is considered to have the best “ability to execute,” but they’re still pretty darn bunched up.
(This brings up a point that needs to be made: a “leader” isn’t inherently better than a “visionary” or a “challenger” for a particular company, and being relegated to “niche” doesn’t necessarily mean there’s anything wrong with that vendor, if its niche happens to meet a particular organization’s business needs. Everyone gets so gol-durned hung up on who’s in the Leaders quadrant, but many of the vendors in the other quadrants are perfectly serviceable and even preferable for some situations. “Leaders” and “Challengers” are thought to be best in their ability to execute; “Leaders” and “Visionaries” are thought to be best in, well, vision. But just being in the MQ in the first place is a pretty good sign.)
If anything is surprising, it’s that in the past year – after a number of acquisitions in the previous two years, amid Gartner’s prediction in 2011 that “by 2014, consolidation will have eliminated one in every four enterprise e-discovery vendors,” with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors – there hasn’t been much in the way of major acquisitions. Guidance bought visionary Case Central, plus there were a couple of acquisitions of vendors not on either of the MQs, and that’s about it. Gartner continues to predict a 25% reduction in the field in the next two years, but now predicts “most of the attrition among service providers and the legal solution channel, not software vendors” – in other words, the names on the MQ aren’t likely to change much.
Gartner also updated its revenue projections; it now says that “revenue in the enterprise e-discovery software market will grow from $1.7 billion in 2013 to $2.9 billion in 2017,” for a growth rate of 15%, while in 2011 it had predicted a growth rate of 14%, which would result in a total of $1.5 billion in 2013. So it seems like things are right on target in that respect.
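Gartner doesn't spell out how it gets from those endpoints to 15%, so here is a hedged sketch of one common way to do it: compound annual growth over the four years from 2013 to 2017. The function names are illustrative, and the method is an assumption, not Gartner's stated one.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

def project(start, rate, years):
    """Value after compounding at a fixed annual rate."""
    return start * (1 + rate) ** years

implied_rate = cagr(1.7, 2.9, 4)        # $1.7B in 2013 to $2.9B in 2017
at_15_percent = project(1.7, 0.15, 4)   # what a flat 15% would give

print(f"implied growth rate: {implied_rate:.1%}")
print(f"$1.7B compounded at 15% for four years: ${at_15_percent:.2f}B")
```

On that reading the endpoints imply a little over 14% a year, and a flat 15% would land just shy of $3 billion, so the quoted rate looks like a round-up of the same basic picture.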
Surprisingly, Gartner didn’t seem to include much of an overview of the past year, perhaps because, as compared with previous years, not much happened. One would think that with e-discovery playing such a major role in lawsuits such as Apple vs. Samsung (including Google), companies would be paying more attention to e-discovery, but perhaps everyone’s bought it now and it’s all set up perfectly and there’s nothing left to worry about?
C’mon, let’s get some action going here. EMC or Oracle, clean out a desk drawer for spare change and buy somebody. Maybe one of the eight privately held companies out of the 23 in the MQ might look for an exit strategy, either an IPO or a merger with someone else? Maybe someone with a lot of vision and somebody with a lot of ability to execute might get together? Let’s hope next year’s report is more exciting.
“The point is this: No matter how fat a pipe you have to the Internet, at some given amount of data, it’s going to be faster, cheaper, or both to use some manual method to ship data on some storage medium.”
I wrote that in September, 2011, in what remains (if I do say so myself) a fascinating and true look at the challenges in transferring really, really big amounts of data (aka “calculating the bandwidth of a station wagon full of tapes hurtling down the highway,” with a side look at the Avian Carrier Protocol, or sending data by carrier pigeon).
Well, it’s time to update it — and, actually, there was something back from 2009 I failed to include.
Let’s say you’re setting up a cloud storage service. Okay. How do you fill it up in the first place? If you’ve got terabytes or petabytes of data, getting it over to the new cloud storage service is going to be a hassle. And a time-consuming hassle at that — 166 days to move a terabyte of data over DSL, according to Amazon.
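To see where a number like 166 days comes from, and when the mail truck wins, here is a minimal sketch. It assumes a decimal terabyte (10^12 bytes) and a link running flat out with no protocol overhead; the five-day shipping time is purely hypothetical, and none of this is Amazon's own formula.

```python
SECONDS_PER_DAY = 86_400
TERABYTE = 10**12   # bytes, decimal units (assumption)

def transfer_days(num_bytes, bits_per_second):
    """Days to move num_bytes over a link at a sustained bit rate."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY

def implied_bits_per_second(num_bytes, days):
    """Sustained bit rate implied by moving num_bytes in the given days."""
    return num_bytes * 8 / (days * SECONDS_PER_DAY)

def shipping_wins(num_bytes, bits_per_second, shipping_days):
    """True if mailing a disk beats sending the data over the wire."""
    return shipping_days < transfer_days(num_bytes, bits_per_second)

dsl_bps = implied_bits_per_second(TERABYTE, 166)
print(f"166 days per TB implies about {dsl_bps / 1000:.0f} kbps")
# Hypothetical five-day shipping time, compared on two links.
print(shipping_wins(TERABYTE, dsl_bps, shipping_days=5))
print(shipping_wins(TERABYTE, 1_000_000_000, shipping_days=5))
```

Under these assumptions the 166-day figure works out to a sustained rate somewhere in the mid-500 kbps range. At that speed the mailed disk wins easily; on a gigabit link a single terabyte goes over the wire in a couple of hours, and it doesn't.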
It turns out that Amazon announced in 2009, and Google duplicated this week, a service that enables you to burn all your data to a hard disk, and simply mail it to a facility, where elves from Amazon or Google will upload the data for you. (No word on whether Google sends the disk drives back to you afterwards, though the form does ask for a return mailing address; Amazon specifies that it returns the drive.) Amazon’s service is called AWS Import/Export, while Google’s is called Offline Disk Import for Google Cloud Storage.
The companies each charge $80 per disk drive for this service, which means it behooves you to use the biggest disk drive you can rather than a whole lot of commodity disks (a la Backblaze). Amazon has an upper limit of 16 TB, though it indicates it might be able to handle bigger ones on request; Google’s form lists varying total size options, ending at more than 400 TB. Amazon also charges a $2.49 per hour upload fee; Google doesn’t mention such a charge. For the time being, at least, Google is limiting the service to U.S. users; while Amazon supports it in Singapore and Ireland as well as in several U.S. locations, the return address for the Ireland location must be in the European Union.
In addition, Google requires that you encrypt the data before sending. While this adds time and space and money, it adds an interesting wrinkle in terms of security. We now know that online services are regularly being monitored by the Feds, and are required to decrypt data based on court orders and other legal documents. But vendors such as Dropbox, which got busted for this (coincidentally) in 2011, pointed out that if the data was encrypted before it was uploaded to the service, there was nothing they could do if the Feds came calling. So if you’re newly worried about what the NSA might find in your data on the cloud, send it all encrypted on a disk drive, and Bob’s your uncle.
While Amazon’s 2009 announcement didn’t create a stampede of providers offering a similar service, it will be interesting to see, in light of Google’s announcement, whether other providers will follow suit – particularly after the NSA revelations. (One wonders if Google’s timing was a coincidence.)
Google said the feature was experimental and that it might “rapidly evolve the feature,” which might break backwards compatibility, and that it couldn’t guarantee quality of service.
Meanwhile, a number of people want to know whether the shipping-data-on-disk services will also work the other way, for quicker restores.
Whether you believe Edward Snowden is a traitor or a hero, one thing is clear: the federal government is still apparently clueless when it comes to thumb drive security.
Word is that Snowden — as well as Bradley Manning before him, three years ago — downloaded information onto a thumb drive that he’d smuggled in. “Apparently he’s got a thumb drive,” Sen. Saxby Chambliss (R-Ga.) said Tuesday in the New York Daily News. “He’s already exposed part of it and I guess he’s going to expose the rest of it.”
The thing is, what Snowden allegedly did could have been done just as easily by many other people. “You can walk out of a building with a Zip drive or a USB stick on the end of your keychain with all of the information that’s in that building and walk right out without sweating a bit or anybody noticing what you’re doing,” says Joel Brenner, former inspector for the National Security Agency, on NPR.
This is nothing new. As far back as Stuxnet, thumb drives have been implicated in all sorts of security issues, both bringing malware in and taking legitimate data out. A 2011 Ponemon study found all sorts of security issues around thumb drives.
Thumb drives have been banned from the Pentagon, including the NSA, since October, 2008, according to the LA Times. Oh goody. That should have solved the problem, because of course everyone obeys regulations, especially people who are about to blow the whistle on the country’s security agency, right?
Aside from the fact that there were always “exceptions” to the bans, especially for network administrators, look at it this way: there is no way that the NSA, or any other organization (including yours) is going to be able to keep people from smuggling in thumb drives. Even if they set off metal detectors (and I’m not sure they do), someone dedicated enough is going to find a way. (Let’s just say it’s a new meaning for “dark fiber” and leave it at that, shall we?)
Investigators are now saying they know how many documents Snowden allegedly downloaded and what server they came from, according to an official who would not be named while speaking about the ongoing investigation. Well, that’s very nice, but why didn’t they know that before he left the building?
“The federal government uses a variety of tools that could identify the activities of employees,” writes Eric Chabrow in BankInfoSecurity. “Those include keylogging software and computer logs that pinpoint staff members’ whereabouts and actions within federal IT systems and networks, sources familiar with the federal government’s security clearance systems say. But having the tools in place — and not all tools are used by all agencies at all times — doesn’t mean that the proper authorities are alerted in a timely manner to activities that could jeopardize the nation’s security.”
Chabrow went on to quote Robert Bigman, who retired last year after 15 years as the CIA’s CISO, who said the Defense Department and the intelligence community continually rejected the idea of using digital rights management tools to restrict access to specified content in order to secure intelligence reporting. “They need to re-evaluate that decision,” he says in the article. You think?
So the question is, what can you do to keep someone from using these smuggled-in thumb drives?
- Do your computers have functioning USB slots?
- Can someone plug something into one of these USB slots without being detected?
- To what sort of data do people have access?
- Can someone download that data onto a thumb drive without being detected?
- Is that downloaded data unencrypted?
If any of these things are true in your organization, you, too, are vulnerable. And whistleblower, traitor, or run-of-the-mill thief, it won’t make a difference.
Legality and ethics aside (that is something for the law and politics blogs to talk about), the interesting part of the alleged NSA tracking issue is the logistics involved. How much data is there? How did the NSA manage it? Where did they put it?
“The NSA is storing all those Verizon (and, presumably, other carrier records) in a massive database system called Accumulo, which it built itself (on top of Hadoop) a few years ago because there weren’t any other options suitable for its scale and requirements around stability or security,” claims Derrick Harris in GigaOm. “The NSA is currently storing tens of petabytes of data in Accumulo.”
With Accumulo, the NSA has said it can process a 4.4-trillion-node (callers), 70-trillion-edge (connections between two callers) graph, according to the NSA’s own slide presentation, Harris says. “By way of comparison, the graph behind Facebook’s Graph Search feature contains billions of nodes and trillions of edges.”
(And, in keeping with the Obama Administration’s effort to use open source tools, the NSA donated Accumulo to the Apache Foundation in 2011.)
While people like to think that the amount of data required to spy on a country just isn’t realistic, in practice it doesn’t take that much, particularly if the data stored is only traffic-analysis metadata (the date, time, duration, and parties to a call) and not the content of the call itself. Researcher John Villasenor noted last summer that storage costs had dropped by a factor of a million since 1984. And the NSA has always been on the bleeding edge of such research.
“In the 1960s, the National Security Agency used rail cars to store magnetic tapes containing audio recordings and other material that the agency had collected but had never managed to examine, said James Bamford, an author of three books on the agency,” reported Scott Shane in the New York Times, about Villasenor’s work. “In those days, the agency used the I.B.M. 350 disk storage unit, bigger than a full-size refrigerator but with a capacity of 4.4 megabytes of data. Today, some flash drives that are small enough to put on a keychain hold a terabyte of data, about 227,000 times as much.”
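For the curious, the comparison in that quote is easy to check. This is only a rough sketch, and it assumes decimal units (1 megabyte = 10^6 bytes, 1 terabyte = 10^12 bytes); the variable names are mine, not anything from the Times article.

```python
# Ratio of a 1-terabyte flash drive to the 4.4-megabyte capacity quoted
# for the I.B.M. 350. Decimal units are an assumption.
IBM_350_BYTES = 4.4 * 10**6    # 4.4 megabytes, as quoted
FLASH_DRIVE_BYTES = 10**12     # 1 terabyte

ratio = FLASH_DRIVE_BYTES / IBM_350_BYTES
print(f"A 1 TB drive holds about {ratio:,.0f} times as much")
```

Under those assumptions the ratio lands a little above 227,000, which matches the figure in the quote.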
And how many calls are we talking now? AT&T has 107.3 million wireless customers and 31.2 million landline customers; Verizon has 98.9 million wireless customers and 22.2 million landline customers; and Sprint has 55 million customers in total, according to the Wall Street Journal. Another Journal article went into more detail on the requirements.
“The task of storing and processing the metadata for all the calls in the U.S. is actually rather trivial, according to Jack Norris, chief marketing officer at MapR Technologies Inc., a company that provides commercial-grade services based on open source database technology such as Hadoop, originally developed by Google Inc. ‘This amount of data is easily analyzed on a MapR Hadoop cluster,’ Mr. Norris said in an email. He assumed, in his calculation, that there are 250 million teenagers and adults in the U.S., each making an average of 10 calls a day, or 2.5 billion calls in total. He also assumed that the average call data record is 2,000 kilobytes. That means all the calls records take up five terabytes worth of storage.”
That would be per day. The storage costs, the Journal quoted Gartner analyst David Cappuccio as saying, would be $46.8 million, or 20% less if open source technologies were used.
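Norris's arithmetic is worth a quick sanity check. The sketch below uses only the assumptions stated in the quote (250 million people, 10 calls a day each); the function names and the use of decimal units are my own. One thing it turns up: the five-terabyte total works out only if the average record is 2,000 bytes. At 2,000 kilobytes per record, the same calls would come to about five petabytes a day.

```python
# Back-of-the-envelope check of the call-metadata estimate quoted above.
# Inputs come from the quote; decimal units (1 TB = 10**12 bytes) are assumed.
PEOPLE = 250_000_000            # teenagers and adults in the U.S., per the quote
CALLS_PER_PERSON_PER_DAY = 10   # per the quote
TB = 10**12
PB = 10**15

def daily_calls(people=PEOPLE, calls_each=CALLS_PER_PERSON_PER_DAY):
    """Total calls made per day."""
    return people * calls_each

def daily_storage_bytes(record_bytes):
    """Bytes needed to store one day of call records of the given size."""
    return daily_calls() * record_bytes

small = daily_storage_bytes(2_000)           # 2,000 bytes per record
large = daily_storage_bytes(2_000 * 1_000)   # 2,000 kilobytes per record

print(f"calls per day: {daily_calls():,}")
print(f"at 2,000 bytes per record: {small / TB:.1f} TB per day")
print(f"at 2,000 kilobytes per record: {large / PB:.1f} PB per day")
print(f"a year at 2,000 bytes per record: {small * 365 / PB:.3f} PB")
```

Either way, a year of records at the smaller size comes in under two petabytes, which fits the "rather trivial" description.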
Reportedly, the data is being stored and analyzed in a $2 billion NSA data center in Utah, code-named Bumblehive (a double allusion to Utah’s large LDS population, which uses the bee as its symbol – to the extent that state highway road signs are beehive-shaped). The 1 million sq. ft. facility is thought to use flash storage for improved performance. Some reports said the center was due to be completed in March of this year, others this fall, while others said it could be as far away as 2016.
Interestingly, this isn’t the first time the NSA Utah data center has come up. The Salt Lake Tribune reported on the data center as long ago as July 2009. “The enormous building, which will have a footprint about three times the size of the Utah State Capitol building, will be constructed on a 200-acre site near the Utah National Guard facility’s runway.”
Numerous other sites have reported on the progress of the Utah data center over the years since then, along with data centers from vendors such as Oracle and eBay being built there – not to mention Twitter, which would be convenient for NSA monitoring of its data. (And hey! The NSA data center is reportedly compliant with the silver-level Leadership in Energy and Environmental Design specifications in sustainable development!) In fact, many states and small cities love such data center developments for the construction jobs (as many as 10,000 for the NSA site) and other economic development benefits they bring.
According to the Tribune, the Baltimore Sun had reported in 2006 that the NSA was forced to move outside the Beltway area because it had maxed out the local power grid. The Utah data center will reportedly use up to 65 megawatts of power — or as much as the entire city of Salt Lake City itself. The facility also asked to be annexed into the nearby city of Bluffdale to ensure it would have an adequate water supply for cooling the computers.
In fact, the state of Utah passed a law earlier this year that would enable it to add a new 6% tax to the power used, which could raise up to $2.4 million annually on the expected power costs of $40 million. The NSA is apparently quite put out at the potential additional charges.
Courts have ruled both ways in the past few years on whether someone accused of a crime who has encrypted potentially incriminating information must reveal the encryption key to law enforcement. Now we actually have a case where the judge reversed himself.
In April, Judge William Callahan ruled that Jeffrey Feldman — a Wisconsin software engineer accused of possessing child pornography, and who had 16 storage devices, nine of which were encrypted — did not have to reveal his encryption key, saying it would violate his Fifth Amendment right against self-incrimination.
But this week, Judge Callahan reversed himself. His original ruling had been based on there not being enough evidence tying Feldman to child pornography or the disk drives in question. However, prosecutors were able to decrypt one of the drives — out of a total of almost 20 TB of storage — and reportedly found some 700,000 child pornography files, along with enough personal information about Feldman to tie him to the disk. This was enough to persuade the judge to change his ruling.
This is part of a continuing process where courts are trying to figure out what an encryption key is, legally speaking. Is it a physical thing, like a key to a lockbox, which is not protected by the Fifth Amendment? Or is it like a combination to a safe (the “expression of the contents of an individual’s mind”), which is protected? In some countries, people have even been jailed for refusing to reveal an encryption key.
This case, like most of the other ones regarding revealing encryption keys, has to do with child pornography, which adds another nuance to the issue. Are law enforcement and the legal profession more likely to push the envelope of legal search because they so badly want to catch child pornographers? Or because they think people will be less likely to criticize their methods because the crime is so heinous? (Or as Mike Wheatley put it in his blog, Silicon Angle, about the original case, “Data Encryption Makes Perverts Untouchable.”)
“That’s also the whole point of the Bill of Rights: ‘mere suspicion’ is not enough to let the government search your premises and invade your privacy; the government needs actual evidence of wrongdoing before it can interfere with your life,” countered Jennifer Abel, in the Daily Dot, about the April case. “Nowhere in the text of the U.S. Constitution does it say ‘All rights listed herein may be suspended, if cops suspect you did something really really bad.'”
Legal experts expect the issue will eventually be decided by the Supreme Court.