If you’d invested $1,000 in Box’s IPO last Friday, you’d have $1554.29 now.
At least, as I write this. Who knows. It may have gone up another 50 percent by now.
After what seems like years (and it may be almost exactly a year, given that the initial filing was seekrit, though the formal filing was on March 24) and the on-again off-again IPO as the stock market waned and rose, the cloud storage company finally went public. After estimating it would be priced at $11 to $13 a share, the company decided on $14, but never mind; it blew through that on the first day, gaining 66 percent (after opening 44 percent higher at $20.20), and is currently plotzing around in the low $20s.
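The arithmetic behind the numbers above is easy to sketch. Note that the $21.76 "current" price below is not stated anywhere in the post; it's simply the price implied by $1,000 bought at the $14 offer growing to $1,554.29.

```python
# Sketch of the return math. The implied current price is back-solved
# from the quoted $1,554.29 figure; the other prices are from the post.
def stake_value(invested, buy_price, current_price):
    """Market value of a stake bought at buy_price, marked at current_price."""
    shares = invested / buy_price
    return shares * current_price

ipo_price = 14.00    # final offer price
open_price = 20.20   # first trade, about 44 percent above the offer
implied = 21.76      # back-solved from $1,000 -> $1,554.29

print(round(stake_value(1000, ipo_price, implied), 2))  # -> 1554.29
print(round((open_price / ipo_price - 1) * 100, 1))     # -> 44.3 (the opening pop)
```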
It turns out that Box needed to go public sometime this year; a $150 million funding round from last summer would have imposed fines if the company didn’t do so. (Also, TIL that Box was originally funded on poker winnings. Seems appropriate.)
On the other hand, Shawn Tully of Fortune points out that if Box had priced its offering at $20 or so in the first place, it would have made another $120 million; it chose to forgo that in favor of having a big attention-getting pop.
Well, that worked. “Call your broker immediately!” advised Mad Money host Jim Cramer, though he was thinking in terms of $18 or less per share. Still, he thinks it’s going to go higher.
So now there are two questions. The first is, can they keep it up? The second is, what next?
As far as the first question, well, that’s the rub, isn’t it? “Many remain skittish about the company’s precarious financial health and ability to compete in an increasingly crowded pool of rivals,” writes the San Jose Mercury News. “Although it has spent the last year reining in spending, Box is still burning through cash and spending far more to acquire customers — through marketing and other means — than many of those customers initially pay for Box’s services, some experts say.”
Even stock boosters point out things like, “Box is at the forefront of cloud sharing and collaboration, only rivaled by a few products from Google, Microsoft and Cisco.”
Those are big opponents to have.
Analysts say that Box’s ability to keep it up will depend not on the cloud storage service per se, but on the tools it is adding to the service. The problem with depending just on the service itself is that “larger competitors have moved in with offerings that are often significantly cheaper,” writes the New York Times.
And getting cheaper all the time. Microsoft, for example – following the lead of some of its other products, such as Internet Explorer – is offering its OneDrive service for free. And as we all know from Internet Explorer, if it’s free, it doesn’t need to be as good – just good enough.
Others expect Box to be bought. “Despite Box’s initial success on Wall Street, I remain skeptical of its long term viability as an independent company,” writes Forbes’ Kurt Marko, noting that only 10 percent of the company’s customers actually pay for the service. “In the months since my initial column on Box’s IPO, I believe events support my thesis that ‘cloud storage and file sharing isn’t a product, it’s a feature.’”
For now, there’s this: Box was valued at $2.4 billion in July, and had dropped to $1.67 billion at the opening, but after opening day was worth $2.78 billion. The company also raised at least $175 million and as much as $201.3 million if bankers exercise options to sell more shares, the Mercury News writes.
So, what next? Disappointed investors are already asking about the “next Box.” And after the company’s stellar debut, no doubt the “increasingly crowded pool of rivals” is considering its own moves. “That optimism may very well spill over to the entire sector of cloud-computing companies, which have drawn skepticism from investors over their financial vitality,” the Mercury News wrote in its followup story.
“Box is blazing a trail in terms of being the first company in this space to go public,” Anthony Foy, CEO of Workshare, a UK-based file sharing and collaboration company with an office in San Francisco, told the Mercury News. “Box going public … establishes that this is a many multi-million dollar marketplace that we are competing in.”
Ultimately, Box itself might flame out and die, but even so, it could end up having blazed the trail for other companies in the industry.
“If you are looking for good drive at a good value, it’s hard to beat the current crop of 4 TB drives from HGST and Seagate.” On the other hand, if you’ve got a 1.5 TB or 3.0 TB Seagate Barracuda, you might want to stop reading this, go back it up, and replace it.
That’s the conclusion from the most recent Backblaze data about the failure rates of the 17 varieties of disk drives it uses.
Backblaze, in case you’re not aware, is a backup service that, instead of using real real big storage, uses a whole whole lot of commodity storage devices hooked together into “pods,” with as much of the extraneous stuff stripped off as possible. This reduces costs and is more scalable than large storage systems that require forklift upgrades to be expandable. Companies such as Netflix are using the approach as well, and several vendors have started selling storage systems based on the Backblaze designs. While the company occasionally has trouble finding commodity disk drives, in general the system works pretty well.
Because Backblaze uses a whole whole lot of commodity storage, it is in a good position to judge performance and failure rates of these commodity drives, as opposed to those of us who buy one or two every couple of years.
Some commenters pointed out, in various degrees of politeness, that Backblaze is not a typical user. But absent a Consumer Reports study, a company that uses 41,213 of a thing can generally be thought of as having a reasonable idea of the quality of the thing. Or, as one commenter put it, “Much more helpful than a guy/gal saying ‘I used this drive for a week and I give it 5 stars!’”
Plus, Backblaze is pretty good about releasing its data in periodic blog posts. “As far as I know (and please educate me if I’ve missed one), there’s no other mass studies of hard drives that have been released to the public, naming specific brand names and models,” writes one commenter. “Google has a 2007 white paper on the topic, but like Backblaze’s, it’s based off of their data centers, plus they didn’t reveal names and models. While Backblaze’s data center doesn’t directly equate to your home PC’s usage, they have done one thing that’s super useful — gather a statistically significant amount of data in a relatively variable controlled environment.”
All that said, what about the results? At this point, Backblaze has migrated many of its storage pods to 4.0 TB drives, writes distinguished engineer Brian Beach. Part of this migration is due to what the company says is lower reliability of 3.0 TB drives. “The HGST Deskstar 5K3000 3 TB drives have proven to be very reliable, but expensive relative to other models (including similar 4 TB drives by HGST),” he writes. “The Western Digital Red 3 TB drives’ annual failure rate of 7.6% is a bit high but acceptable. The Seagate Barracuda 7200.14 3 TB drives are another story.”
Which gets back to the advice in the first paragraph. While the average failure rate of most of the disk drives the company has is in the single digits, two drives show double-digit failure rates: the 1.5 TB Seagate Barracuda 7200.11, with an average age of 4.7 years, and the 3.0 TB Seagate Barracuda 7200.14, with an average age of 2.2 years.
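Backblaze computes its annual failure rate from drive-days of service rather than from a raw count of drives, so a drive that ran for only half the year counts as half a drive-year of exposure. A minimal sketch of that calculation (the numbers below are illustrative, not drawn from Backblaze’s dataset):

```python
def annual_failure_rate(failures, drive_days):
    """Failures per drive-year, Backblaze-style.

    drive_days is total days of service summed over all drives, so
    drives added or retired mid-period are weighted correctly.
    """
    drive_years = drive_days / 365.0
    return failures / drive_years

# Hypothetical example: 100 drives each running a full year, 9 failures.
print(annual_failure_rate(9, 100 * 365))  # -> 0.09, i.e. 9% per year
```

Dividing by exposure rather than drive count is what lets fleets of different ages and sizes be compared on one chart.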
Frustratingly, Backblaze doesn’t say what the problem is with the Seagate drives, indicating only that it will write about it in a future blog post. The good news is that the company reports it isn’t having the same problem with the Seagate 4.0 TB drives. “The Seagate Desktop HDD.15 has had the best price, and we have a LOT of them,” Beach writes. “Over 12 thousand of them. The failure rate is a nice low 2.6% per year.”
Seagate’s perspective is that Backblaze is using commodity consumer drives for enterprise purposes, so naturally they’re going to fail more often. (Confirmation bias, but commenters went on to largely concur with Backblaze’s experience, noting also that the other drives were running under the same conditions.)
Moreover, Seagate’s 4.0 TB drives appear to be more reliable than the 3.0 TB drives, Beach adds. “You might ask why we think the 4 TB Seagate drives we have now will fare better than the 3 TB Seagate drives we bought a couple years ago. We wondered the same thing,” he writes. “When the 3 TB drives were new and in their first year of service, their annual failure rate was 9.3%. The 4 TB drives, in their first year of service, are showing a failure rate of only 2.6%. I’m quite optimistic that the 4 TB drives will continue to do better over time.”
So how did the 6 TB drives do in terms of reliability? It’s a little early to tell, Beach writes. “Currently we have 270 of the Western Digital Red 6 TB drives. The failure rate is 3.1%, but there have been only 3 failures. The statistics give a 95% confidence that the failure rate is somewhere between 0.1% and 17.1%. We have just 45 of the Seagate 6 TB SATA 3.5 drives, although more are on order. They’ve only been running a few months, and none have failed so far.”
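Those wide error bars follow directly from having only three failures to count. An exact confidence interval for a Poisson count can be computed by bisection in a few lines; this sketch won’t reproduce Backblaze’s exact 0.1%–17.1% band (their exposure figures and method aren’t given in the post), but it shows why a handful of failures leaves so much uncertainty.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

def poisson_ci(k, conf=0.95):
    """Exact two-sided confidence interval for the mean of an observed
    Poisson count k, found by bisection (no scipy needed)."""
    alpha = (1 - conf) / 2

    def solve(target, kk):
        # poisson_cdf(kk, lam) decreases as lam grows; bisect for the crossing.
        lo, hi = 0.0, k + 50.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if poisson_cdf(kk, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(1 - alpha, k - 1)  # where P(X >= k) = alpha/2
    upper = solve(alpha, k)                             # where P(X <= k) = alpha/2
    return lower, upper

# Three observed failures: the 95% interval on the true expected count
# runs from roughly 0.62 to 8.77 -- a fourteen-fold spread.
lo, hi = poisson_ci(3)
print(round(lo, 2), round(hi, 2))  # -> 0.62 8.77
```

Dividing those count bounds by the fleet’s drive-years of exposure converts them into a failure-rate interval, which is why small samples produce such sprawling percentages.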
That said, because Western Digital drives use a little less electricity – “This small difference adds up when you place 45 drives in a Storage Pod and then stack 10 Storage Pods in a cabinet,” notes director of cloud storage Andy Klein – and load a little faster, the company is primarily going to migrate to the Western Digital 6 TB drives. However, it will still buy some of the Seagate ones for diversification purposes, he adds.
Meanwhile, the company is already testing pods made of 8 TB drives.
Disclaimer: I am a Backblaze customer.
A Democratic Senator from Oregon is attempting to prevent government agencies from requiring vendors to build “back doors” into their software and electronic products by playing two kinds of security fears against each other. Sen. Ron Wyden introduced the Secure Data Act earlier this month.
People supporting such “back doors” say they are necessary to help protect Americans from terrorists and other criminals. FBI director James Comey, among other law enforcement officials, called for them after vendors such as Apple and Google implemented encryption on their smartphones by default. But Wyden is saying such “back doors” also make it easier for hackers to break in – an increasingly major issue in the past year.
And Wyden isn’t just speculating about the possibility; he cited an incident in 2005 where “an unknown entity had exploited a ‘lawful intercept’ capability built into Greek cellphone technology and had used it to listen to users’ phone calls” — including those of dozens of senior government officials.
“Unfortunately, there are no magic keys that can be used only by good guys for legitimate reasons,” Wyden wrote in an op-ed supporting the bill. “There is only strong security or weak security.”
“Security is a lot like a ship at sea,” agreed Alan McQuinn, a research assistant with the Information Technology and Innovation Foundation, in a blog post in The Hill. “The more holes you put in the system—government mandated or not—the faster it will sink.” Just a few years ago, the FBI was encouraging Americans to use encryption to better protect their data, he noted.
Another major issue in the past year has been revelations about agencies spying on Americans, which Wyden said is eroding trust in the government. “Strong encryption and sound computer security is the best way to keep Americans’ data safe from hackers and foreign threats. It is the best way to protect our constitutional rights at a time when a person’s whole life can often be found on his or her smartphone. And strong computer security can rebuild consumer trust that has been shaken by years of misstatements by intelligence agencies about mass surveillance of Americans,” he said in a statement.
Requiring back doors would also make U.S. companies less able to sell their products outside the U.S., Wyden noted. This could exacerbate problems that vendors such as cloud storage companies are already having outside the U.S. due to agencies using the courts to claim access to such data, even when it’s outside the U.S.
Wyden isn’t alone. The Hill noted that there was bipartisan opposition to Comey’s proposal, which he said didn’t call for a back door but a “front door with clarity and transparency.” But security experts dismissed that as a semantical difference. “The notion that it’s not a backdoor; it’s a front door — that’s just wordplay,” Bruce Schneier, a computer security expert and fellow at the Berkman Center for Internet & Society at Harvard University, told The Hill. “It just makes no sense.”
Nothing happened with the bill in the lame-duck Congress, but Wyden reportedly expects to introduce it in the new Congress in 2015. Lily Hay Newman notes in Slate, however, that such bills have typically faced an uphill battle. For example, a similar measure passed on the House side earlier this year, but funding for it was stripped from the “cromnibus” bill; that measure, too, is expected to be reintroduced next year.
Moreover, the Secure Data Act doesn’t prohibit back doors—it just prohibits agencies from mandating them, Newman writes. “There are a lot of other types of pressure government groups could still use to influence the creation of backdoors, even if they couldn’t flat-out demand them.” There are other weaknesses in the bill as well, notes the Electronic Frontier Foundation.
On the other hand, this isn’t Wyden’s first cybersecurity rodeo; in the past several years he essentially singlehandedly killed two bills that the computer industry said could give the government too much control over the Internet, and he has worked on other Internet control issues as well.
Microsoft is continuing its fight with the U.S. government regarding access to data located on the company’s servers outside the U.S. And this time, it brought some friends.
Twenty-eight major companies, including Microsoft competitors such as Apple, Amazon, and AT&T (but not Google, surprisingly enough), filed friend-of-the-court briefs on December 15, after Microsoft formally appealed the ruling on December 8. Other organizations filing briefs of support include the U.S. Chamber of Commerce, CNN, ABC, Fox News, the Guardian, and Verizon.
Altogether, ten briefs were signed by 28 leading technology and media companies, 35 leading computer scientists, and 23 trade associations and advocacy organizations “that together represent millions of members on both sides of the Atlantic,” noted Microsoft legal counsel Brad Smith. Signatories also included nonprofit organizations such as the Center for Democracy & Technology, the American Civil Liberties Union, the Electronic Frontier Foundation, the Brennan Center for Justice at New York University School of Law, and the Berkman Center for Internet & Society at Harvard.
If upheld, the decision “allows the government to adopt a ‘seize first, search later’ view of the Fourth Amendment, where the government can seize a computer, copy all of its data, and keep that information indefinitely—without a search warrant at all,” writes the EFF in explaining its support.
Why do news organizations such as CNN and ABC care? Because they want to protect their reporters and sources, Smith writes. “These organizations are concerned that the lower court’s decision, if upheld, will erode the legal protections that have long restricted the government’s ability to search reporters’ email for information without the knowledge of news organizations,” he writes.
In addition, the Irish government also stepped in, saying the ruling violated its sovereignty, as did a German representative to the European Parliament.
In case you’ve missed it, a judge ruled in May that a search warrant with which Microsoft was served also applied to data on servers in data centers in Dublin, Ireland. (The exact person and crime have not been revealed, but the case is reportedly drug-related.) Microsoft is protesting this ruling. Another U.S. judge reiterated the decision in August.
There’s more than just data at stake. The ruling means that the U.S. government lays claim to any data owned by a U.S. company, no matter where in the world it is located — such as in the cloud on servers in another country. This has the potential to conflict with privacy laws in other countries, and it makes it a lot less likely that customers outside the U.S. will be willing to put their trust in U.S.-based cloud companies. In addition, it opens the door for non-U.S. governments to make their own data demands of companies operating within their borders.
Microsoft’s appeal wasn’t a surprise; in fact, the company had said in May that it intended to appeal the decision. Several other U.S. companies had also announced their support of Microsoft in August, since the decision has such wide-ranging effects.
The notion of data sovereignty has been discussed for several years, and in fact Microsoft’s Dublin data center had been specifically cited as an example before this case came up. “Microsoft, like other cloud providers, will need to clarify data sovereignty issues, if Office Live is to be taken seriously,” wrote Computer Weekly presciently in June 2011. “While it does have a datacentre in Dublin – so it can guarantee data resides in the EU – Microsoft is headquartered in the US and will be subject to US legislation, such as Homeland Security, as well as UK and EU law.”
Ireland, in its brief, indicated that it wasn’t unwilling to grant the U.S. government access to the data in question, but that the mechanism for doing so was the Mutual Legal Assistance Treaty (MLAT) between Ireland and the United States, and that it was up to the U.S. to ask first, not Ireland to stop the U.S. from taking the data. “Ireland respectfully asserts that foreign courts are obliged to respect Irish sovereignty (and that of all other sovereign states) whether or not Ireland is a party or intervener in the proceedings before them,” the brief warned, before going on to hint, “Ireland would be pleased to consider, as expeditiously as possible, a request under the treaty, should one be made.”
In addition, Jan Philipp Albrecht, a Member of the European Parliament (“MEP”) from Germany, filed his own brief urging the U.S. to use the MLAT mechanism, and warning that failing to do so could make it more difficult for European and U.S. companies to work together.
“European citizens are highly sensitive to the differences between European and U.S. standards on data protection. Such concerns are frequently raised in relation to the regulation of cross-border data flows and the mass-processing of data by U.S. technology companies,” Albrecht writes. “The successful execution of the warrant at issue in this case would extend the scope of this anxiety to a sizeable majority of the data held in the world’s datacenters outside the U.S. (most of which are controlled by U.S. corporations) and would thus undermine the protections of the EU data protection regime, even for data belonging to an EU citizen and stored in an EU country.”
Legal argument is expected this spring or summer, according to the EFF.
Political wonks got something to salivate over earlier this month when former Florida governor and presumed Republican presidential candidate Jeb Bush announced that in the spring he intended to release some 250,000 email messages from his time as Florida governor, to demonstrate his transparency. Whether it’s actual transparency, or using a firehose to produce a document dump in hopes that the Internet will be so overwhelmed that they won’t find the good stuff, remains to be seen.
Not to mention overwhelmingly bored. Can you imagine being a reporter and having to read 250,000 email messages? Think of how banal most of your email communication is. Snore.
This isn’t a political blog, so we’re not going to go into the political ramifications of why Bush might be doing this and what might be in there. It is, however, a technical blog, so we’re going to talk about some of the technical issues around this that the, perhaps, less-technical reporters don’t appear to have brought up yet and may not be aware of. It’s certainly an interesting announcement, particularly in contrast with some of the other governors who have behaved badly with their email. At the same time, it’s important that we not be so dazzled by the announcement that we fail to give it due diligence.
- Will he really release all of the email? Bush was governor of Florida from 1999 to 2007. Granted, that was a while ago, before everyone and his brother (no pun intended) used email for everything, but even so, 250,000 messages for that many years sounds awfully small. Shoot, I’ve got more than that in my Gmail account. Bush did say “all,” but is he curating them in some way? Is this really going to be every-every-everything? And if not, is there going to be a mechanism for getting every-every-everything? Or is everything else gone? And given that some governors have wiped their systems when they left office – and even bought all the disk drives of the office computers – how is it that the Florida email messages from 1999 are still around?
- Did he ever use any unofficial or personal email address? As we’ve learned, governors from states such as Wisconsin and Alaska – not to mention the current Governor of Florida — have used personal email addresses for state business, potentially to avoid having their statements retrieved later. To what degree did Governor Bush do that?
- What format will it be in? As yet, none of the articles have indicated what email system was in use in the Florida governor’s office at the time. Are we talking about a decade-old .pst file? Bush indicated it would be on a website; what will the interface to it be? Will people be able to look at the entire corpus at once, only one message at a time, or what? Will it be threaded? Several of the articles quoted Bush saying that some of the email was funny, some of it was sad, and some of it was serious; is it going to be available under categories such as Funny, Sad, or Serious rather than by date or other subject? How it’s set up could certainly limit the degree to which people could use it – though presumably there’ll be a crowdsourcing army to read the messages in any way they’re provided and let the world know if there’s anything juicy.
- Will it be full-text searchable? Anybody who’s tried to look at some older documents, such as some .pdfs, has run into times when the document is saved only as an image. That’s going to make it pretty tough to do any sort of reasonable search – not to mention answering questions like, “What word does Jeb Bush use in his email the most?”
- So, where is this email now? Who owns it? Who controls it? It’s not the Governor himself, is it? Presumably he just got a copy? Is there someone who’s going to be able to confirm that yes, the release is an accurate and complete release of what the state has?
- How is it that the Governor has it in the first place? Did he just go, welp, leaving office, let me just burn a copy of eight years of my email to a thumb drive and I’ll be on my merry way? Did the state save it and he asked them a while back – and when was that? – for a copy? In what sort of format does he have it, and what sort of searching mechanisms does he have? (The Chicago Tribune quoted Florida legal experts as saying the state’s public-records law requires the release of Bush’s emails and other correspondence as governor, “with few exceptions,” on which it didn’t elaborate.)
- Is personal information going to be redacted? One of the things that makes it difficult for governments to release email is that it can include personally identifiable information. Is someone going through all 250,000 email messages and putting big black marks through things? If not, isn’t that kind of risky? If so, then how do we know what sort of information we’re going to be missing?
- Is the metadata going to be in there? Are we going to be able to see headers? Email addresses? Datestamps? IP addresses? What are the chances that this is going to get anybody else in trouble?
If nothing else, this certainly sets a high bar for other government officials, particularly gubernatorial Presidential candidates, regarding their release of information in the future, pundits note.
If you’re anything like me, between your various smartphones, cameras, video cameras, and tablets – not to mention all the similar devices owned by various members of your family or your coworkers – you’re collecting quite a little pile of microSD cards. I’m old enough to still be charmed that I can spend $15 and get 32 gigabytes in something the size of my pinky nail. (And if you want to spend more money, you can get 128 GB for $100.) Why, I remember when a 10-megabyte drive was the same size as my desktop computer and cost the same amount.
Sorry. Where was I? Have you seen my cane?
Anyway, I’m not the only person who’s accumulating a pile of the things, and I expect I’m not the only one who lives in terror that the cat will knock over the pile or we’ll accidentally vacuum one up. And if you poke around message boards, it’s pretty easy to find professionals who are dealing with the same problem. For example, filmmakers who are trying to put together a movie end up with a handful of them, especially if they’d like to look at files on more than one at a time. Plus, the teeny tiny cards are too small to label, and sometimes you really don’t want to mix up some pictures with what you’re going to show to the PTA, know what I mean?
It turns out that there are some cases for the things, so you can at least store them in an adorable wee little briefcase rather than having them in an easily-knocked-over saucer or something. There’s also a manufacturer who makes a plastic case the size of a credit card that can hold ten of the things, plus an SD card adapter. Moreover, the cases come in colors, so you can put all the music on ones in a blue case and all the movies on ones in a red case, or whatever.
Now we’re talking.
But why stop there? We can’t label microSD cards the way we used to with 3 ½” floppy disks. So why hasn’t some enterprising manufacturer started making the cards themselves in colors? Green for the CFO and red for the CIO and blue for the CEO, say? In an era when you can order an entire bag of green M&Ms if that’s what floats your boat, why do our microSD cards mostly have to be black?
And while we’re suggesting things, what would it take to be able to read and write data to and from multiple microSD cards at once? There are a couple of devices that give you access to up to four microSD cards at the same time, but there don’t seem to be any that handle more than that. And apparently they work great if you’re trying to copy identical data to four microSD cards at a time. But reports on these devices note that because they share a single input to your computer, reading and writing is slow, and some of them don’t let you swap cards in and out, so if you’ve got data on five or six cards, you’re hosed.
Come on, storage industry. I’m counting on you.
We wrote in February about the notion of legal and discovery issues around wearable technology such as Google Glass and smartwatches, and as an aside joked, “Not to mention the wealth of data preserved by a Fitbit.”
Little did we know.
In what is being billed as the first case of its kind, an attorney is using data from a client’s Fitbit as evidence of the client’s disability in a lawsuit.
For a client’s accident injury claim, the McLeod Law Office will start processing data from their client’s Fitbit to show that her activity levels are now below the baseline for someone of her age and profession, demonstrating her injury, according to Forbes. Lawyers for the office say they’re working on other similar cases as well.
And that’s just the beginning, attorneys say. In the same way that legal teams now comb Facebook pages looking for evidence that a supposedly disabled person is actually secretly swimming or hiking, they expect Fitbits and similar devices to be used to demonstrate that the person isn’t so sick or disabled as all that. The development could see insurance companies, for example, insisting that claimants undergo assessment via fitness tracker, Samuel Gibbs writes in the UK newspaper The Guardian.
“Privacy considerations aside—and there are many—wearables are yet another example of how technology may be a gold mine of potentially relevant ESI [electronically stored information] for use in litigation,” attorney Neda Shakoori wrote in August.
“Wearables data could just as easily be used by insurers to deny disability claims, or by prosecutors seeking a rich source of self-incriminating evidence,” writes Kate Crawford in the Atlantic. But there’s more. “In America, the Fifth Amendment protects the right against self-incrimination and the Sixth Amendment provides the right in criminal prosecutions ‘to be confronted with the witnesses’ against you,” she continues. “Yet with wearables, who is the witness? The device? Your body? The service provider? Or the analytics algorithm operated by a third party? It’s unclear how courts will handle the possibility of quantified self-incrimination.”
Not to mention, will legal firms be able to subpoena your cloud provider if that’s where your fitness data is stored? How much are they going to fight to protect you? If it’s stored in your phone, will you need to provide your password? And how does this all fit into HIPAA and other health information privacy rules? (Apple has already said, for example, that health data can’t be stored on the iCloud and that any health data on an iPhone has to be encrypted.)
“It might seem odd at first (particularly to non-tech savvy judges), but it’s no different than any other type of e-discovery that has come before,” writes attorney Keith Lee in Above the Law. “Another question is how will companies like Fitbit respond. There is no mention of providing access to your data in response to a legal inquiry in Fitbit’s Terms of Service. The client in the above matter is voluntarily providing her data to help her case, but what happens when Fitbit is subpoenaed to provide data? Will they push back, citing user privacy, or immediately comply?”
Two other factors make this issue even more problematic. First, wearables aren’t necessarily consistent in how they track your activity. “The Jawbone UP, Nike Fuelband, Fitbit, and Withings Pulse all have their own peculiarities in how they work: Some will count moving your arms around as walking (which is great if you want writing to count as exercise), others can’t easily register cycling as activity,” Crawford writes. “This ‘chaos of the wearable’ might be merely amusing or frustrating when you’re using the data to reflect on our own lives. But it can be perilous when that data is used to represent objective truth for insurers or courtrooms.”
Second, as in the McLeod case, not just the raw data is being used. “Now that data is being further abstracted by analytics companies that create proprietary algorithms to analyze it and map it against their particular standard of the ‘normal’ healthy person,” Crawford adds.
This also, as we discussed earlier, hypothetically makes Fitbits subject to discovery, meaning you can get in trouble for wiping or otherwise failing to preserve the data on it.
Nobody’s saying that people will be required to wear a Fitbit as proof of fitness, or lack thereof. Yet. But in the same way that some car owners and renters are required to install a GPS unit on their vehicle, can it be far behind?
The Department of Homeland Security announced, in a very low-key way, on November 19 that it was planning to delete “Master files and outputs of an electronic information system which performs information technology infrastructure intrusion detection, analysis, and prevention.” It gave people until December 19 to ask for copies of the plan, following standard National Archives and Records Administration protocol. After requesters receive their copies, they have 30 days to comment.
According to Nextgov, what the agency is looking to delete are records more than three years old from its Einstein network monitoring system, which is intended to help DHS cybersecurity experts look for threats such as Heartbleed in government networks. The plan has split the security community: some welcome it, because they're concerned about the government keeping all these records in the first place, while others wonder whether the government is trying to hide something by deleting them.
“As a general matter, getting rid of data about people’s activities is a pro-privacy, pro-security step,” Nextgov quoted Lee Tien, senior staff attorney with the Electronic Frontier Foundation, as saying. But “if the data relates to something they’re trying to hide, that’s bad,” he continued.
DHS says it wants to delete the data because, at three years old, it’s no longer useful. (The agency still keeps incident reports.) Others disagree. “Some security experts say, to the contrary, DHS would be deleting a treasure chest of historical threat data,” writes Nextgov’s Aliya Sternstein. “And privacy experts, who wish the metadata wasn’t collected at all, say destroying it could eliminate evidence that the governmentwide surveillance system does not perform as intended.”
What’s causing some people to feel suspicious is that the rationale the agency is using to delete the data is the cost, which it estimates at $50 per month per terabyte. Given that you can get a 1-terabyte drive from Staples for less than that these days (yes, we know, there’s more to it than the hardware cost), this seems…excessive. On the other hand, some people are wondering just how much data DHS must have for that to add up to a significant amount of money.
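For a sense of scale, here's a back-of-envelope sketch using the agency's own $50-per-terabyte-per-month figure; the data volumes are illustrative guesses, not anything DHS has disclosed:

```python
# Back-of-envelope storage cost sketch. The $50/TB/month figure is the
# DHS estimate quoted above; the terabyte counts are invented examples.

COST_PER_TB_PER_MONTH = 50  # dollars, per the DHS estimate

def annual_storage_cost(terabytes):
    """Annual cost in dollars to retain `terabytes` at the quoted rate."""
    return terabytes * COST_PER_TB_PER_MONTH * 12

for tb in (10, 100, 1000):
    print(f"{tb:>5} TB -> ${annual_storage_cost(tb):,} per year")
# Even a full petabyte (1,000 TB) works out to $600,000 a year at that rate.
```

In other words, the retention bill only becomes serious money if Einstein is holding onto data at petabyte scale, which is part of why the cost rationale raises eyebrows.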
Data to be deleted includes email, contact and other personal information of federal workers and public citizens who communicate concerns about potential cyber threats to DHS; intrusion detection data; intrusion prevention data; analysis data such as files from the U.S. Computer Emergency Readiness Team (CERT); and a catch-all “information sharing” including data from white papers and conferences, Nextgov reports.
So what is Einstein? It is the result of automated processes that collect, correlate, analyze, and share computer security information across federal U.S. civilian agencies, according to BiometricUpdate. “By collecting information from participating federal government agencies, ‘Einstein’ builds and enhances cyber-related situational awareness,” writes Rawlson King. “The belief is that awareness can assist with identifying and responding to cyber threats and attacks, improve the government’s network security, increase the resiliency of critical, electronically delivered government services, and enhance the survivability of the Internet. The program provides federal civilian agencies with a capability to detect behavioral anomalies within their networks. By analyzing the data and detecting these anomalies, the ability to detect new exploits and attacks in cyberspace are believed to be greatly increased.”
That said, this is all happening against a background of other changes in DHS involving cybersecurity that are making some people nervous.
- Brendan Goode, the director of the Network Security Deployment division in the Office of Cybersecurity and Communications (CS&C) who built the Einstein system, announced earlier in November that he was leaving for the private sector, according to Federal News Radio. While his last day was scheduled to be November 21, he hadn’t yet announced where he was going, nor had he updated his LinkedIn page.
- After its initial setup in 2004, Einstein is now on its third implementation. It has agreements with 15 of the 23 agencies expected to sign up for it (out of nearly 600 agencies, according to RT.com), and has completed implementations at 9 of them, all at a cost of hundreds of millions of dollars.
- Due to incidents such as Heartbleed — where DHS had to wait up to a week for agency approvals, all while news of the vulnerability was out in the wild — the DHS now has the authority, as of October, to proactively monitor federal networks for vulnerabilities without having to wait for agency permission. “Agencies must provide DHS with an authorization for scanning of Internet accessible addresses and systems, as well as provide DHS, on a semiannual basis, with a complete list of all internet accessible addresses and systems, including static IP addresses for external websites, servers and other access points and domain name service names for dynamically provisioned systems. Agencies must give DHS at least five days advanced notice of changes to IP ranges as well. Further, agencies must enter into legal agreements for the deployment of DHS’s EINSTEIN monitoring system, provide DHS with names of vendors who manage, host, or provide security for Internet accessible systems, including external websites and servers, and ensure that those vendors have provided any necessary authorizations for DHS scanning of agency systems,” summarized FedWeek.
- On the other hand, contractor vendors aren’t exactly leaping to be included.
It isn’t clear how much DHS was hoping that this would all be lost in the shuffle around the holidays. Presumably organizations such as the EFF and Nextgov have filed requests for the plans, and will follow up. If it’s the sort of thing you might feel the need to comment on, however, it might be a good idea to make your own request, in case comments are limited to people who requested the documents.
You may recall that in June, the Interwebs were burning up with the story about former director of exempt organizations for the IRS Lois Lerner, and how something like two years’ worth of email messages — conveniently covering a period of time under Congressional investigation — were unavailable because employees could only store 500 MB of email, backup tapes were only saved for six months, and her computer had crashed, wiping out her hard disk drive. While not everyone thought it was a coverup on the order of the missing 18 minutes on the Watergate tapes, few would argue that it was any way to run a railroad.
Now, it turns out that the IRS might have backup copies of the email messages after all — but retrieving them is likely to take a lot of time and money.
We’re not going to get into the politics of the investigation. As before, we’re just interested in this as a government IT problem — and it’s a dilly.
In the fine tradition of Taking Out the Trash Day — and like the original announcement of the missing email messages itself — this news was released on the Friday afternoon before Thanksgiving.
“The U.S. Treasury Inspector General for Tax Administration (TIGTA) informed congressional staffers from several committees on Friday that the emails were found among hundreds of ‘disaster recovery tapes’ that were used to back up the IRS email system,” reports the Washington Examiner. As many as 30,000 email messages could be found.
Finding them might take a while, and technical details of exactly what’s going on are sketchy. Most of the coverage is in the mainstream or right-wing media, which isn’t necessarily all that tech-savvy to begin with. Moreover, they’re also quoting Congressmen and their staffs, who aren’t exactly technical experts either. And while there may be technical people explaining more detail in comments on the stories, finding those comments among the hundreds railing about “libtards” and “Obummer” and “Benghazi” is more difficult than finding Lerner’s messages on the tapes themselves.
So here’s what’s happened since the original story in June.
In August, a representative from a watchdog organization called, appropriately, Judicial Watch told Fox News that it had heard from a Justice Department official that there were backup tapes “in case something terrible happened in Washington” and that Lerner’s email messages might be on those tapes. Congressional representatives wrote to the IRS in September asking about those. But court documents filed in October said there was no such thing beyond the standard disaster recovery tapes that were overwritten every six months, although it did agree there were server backups, which were being examined by TIGTA.
(Lerner also had a Blackberry that was replaced in February 2012, and while Judicial Watch felt that some of the email messages might be on that older Blackberry, it had been destroyed when it was replaced.)
Now, apparently some backups have been found. Where exactly these tapes came from is not clear. Are they different from the tapes that are supposedly recycled every six months? If so, where did they come from? Or did that recycling not occur? If not, why not?
Wherever the tapes themselves came from, here are some of the problems in finding the missing messages.
- The 30,000 email messages are scattered among 250 million email messages on 744 disaster recovery tapes, according to the Washington Examiner.
- Moreover, it could take weeks to learn the messages’ content “because they are encoded,” according to Fox News, quoting Frederick Hill, a spokesman for Republicans on the Oversight committee. Does “encoded” mean “encrypted”? Or is this simply referring to the encoding the email messages need to work with the email program?
- Before the messages can be released, any personally identifiable information in them about individual taxpayers has to be redacted.
- Even when the messages are tracked down, investigators may find that they’re simply duplicates of the 24,000 messages they have already located, such as by getting copies from the people with whom Lerner had exchanged email, reports The Hill.
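The deduplication step in that last bullet is at least conceptually simple, since email carries a unique Message-ID header. Here's a minimal sketch of the idea; the message text and IDs are made up for illustration, and real recovery tooling would be far more involved:

```python
# Sketch of deduplicating recovered email against messages already in
# hand, keyed on the Message-ID header. Sample messages are invented.
from email import message_from_string

# IDs of the messages investigators already have (hypothetical values)
already_have = {"<msg-001@example.test>", "<msg-002@example.test>"}

recovered = [
    "Message-ID: <msg-002@example.test>\nSubject: Re: status\n\nA duplicate.",
    "Message-ID: <msg-003@example.test>\nSubject: New\n\nNot seen before.",
]

new_messages = []
for raw in recovered:
    msg = message_from_string(raw)
    msg_id = msg["Message-ID"]
    if msg_id not in already_have:
        already_have.add(msg_id)
        new_messages.append(msg)

print(len(new_messages))  # only the genuinely new message survives
```

Of course, the hard part isn't the comparison; it's extracting 30,000 candidate messages from 744 tapes in the first place.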
Ironically, what might have saved the messages was budget cuts. The Washington Examiner reported in September that some 760 “exchange servers”[sic; do they mean Microsoft Exchange email servers?] — which were supposed to have been destroyed two years previously — might have been spared due to budgetary constraints. It isn’t clear whether these tapes come from those servers, or if the examination of those servers is complete; there could be further revelations forthcoming.
Anyone who’s had a hard drive fail just as they were about to back it up (honest!) will understand how much we’d all like to know when our hard disks are about to fail.
Some time ago (between 1995 and 2004, depending on how you count), a standard was developed called Self-Monitoring, Analysis and Reporting Technology (SMART, get it?) that was intended to help with this problem.
Unfortunately, like many other technologies, its user experience was not the best. SMART defines — and measures, for those vendors that support it — more than 70 characteristics of a particular disk drive. But while it’s great to know how many High Fly Writes or Free Fall Events a disk has undergone, these figures aren’t necessarily useful in any real sense of being able to predict a hard drive failure.
Part of this is because of the typical problem with standards: Just because two vendors implement a standard, it doesn’t mean they’ve implemented it in the same way. So the way Seagate counts something might not be the same way as Hitachi counts something. In addition, vendors might not implement all of the standard. Finally, in some cases, even the standard itself is…unclear, as with Disk Shift, or the distance the disk has shifted relative to the spindle (usually due to shock or temperature), where Wikipedia notes, “Unit of measure is unknown.”
That’s not going to be helpful if, for example, one vendor is measuring it in microns and one in centimeters.
There have been various attempts at dealing with this problem of figuring out which of these statistics are actually useful. One in particular was a paper presented at 2007 Usenix by three Google engineers, “Failure Trends in a Large Disk Drive Population.” What was interesting about Google is that it used enough hard drives to be able to develop some useful correlations between these 70-odd (and some of them are very odd) measurements and actual failure.
Now there’s sort of an update to that paper, but it uses littler words and is generally more accessible to people. It’s put out by Brian Beach, an engineer at BackBlaze; we’ve written about them before. Like Google, their insights into commodity hard disk drives are useful, simply because they use so darn many of them.
What BackBlaze has done this time is look at all the drives they have that have failed, and then go back and look at all their SMART statistics, and then correlate them. The company also looked at how different vendors measure these different statistics, so they have a good idea about which statistics are relatively common across vendors. This gives us a better idea of which statistics we should actually be paying attention to.
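The kind of correlation BackBlaze describes can be sketched very simply: for a given SMART attribute, compare the failure rate among drives where the attribute is zero against drives where it's nonzero. The drive records below are invented for illustration, not BackBlaze's actual data:

```python
# Illustrative correlation sketch: failure rate of drives grouped by
# whether a SMART attribute's raw value is zero. Data is made up.

drives = [
    # (smart_187_raw_value, failed_during_study)
    (0, False), (0, False), (0, False), (0, True),
    (1, True),  (3, True),  (2, False), (5, True),
]

def failure_rate(records):
    """Fraction of drive records that failed."""
    return sum(1 for _, failed in records if failed) / len(records)

zero    = [d for d in drives if d[0] == 0]
nonzero = [d for d in drives if d[0] > 0]

print(f"attribute == 0: {failure_rate(zero):.0%} failed")     # 25% here
print(f"attribute  > 0: {failure_rate(nonzero):.0%} failed")  # 75% here
```

An attribute is a useful predictor when those two rates differ sharply across a large enough fleet, which is exactly where BackBlaze's 40,000 drives come in.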
As it turns out, there’s really just one: SMART 187 – Reported_Uncorrectable_Errors.
“Number 187 reports the number of reads that could not be corrected using hardware [Error Correcting Code] ECC,” BackBlaze explains. “Drives with 0 uncorrectable errors hardly ever fail. Once SMART 187 goes above 0, we schedule the drive for replacement.”
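In practice, you can read that attribute yourself with a tool like smartmontools' `smartctl -A`. Here's a sketch of applying BackBlaze's replace-when-nonzero rule to that output; the sample table below is trimmed and the values are invented:

```python
# Sketch of flagging a drive when SMART attribute 187
# (Reported_Uncorrectable_Errors) is nonzero, per BackBlaze's rule.
# Parses the attribute-table format printed by `smartctl -A`; the
# sample output is abbreviated and invented for illustration.

SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail 0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age  2
194 Temperature_Celsius     0x0022   031   046   000    Old_age  31
"""

def smart_187_raw(smartctl_output):
    """Return the raw value of attribute 187, or None if not reported."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "187":
            return int(fields[-1])
    return None

errors = smart_187_raw(SAMPLE_SMARTCTL_OUTPUT)
if errors:  # nonzero means: schedule the drive for replacement
    print(f"replace drive: {errors} reported uncorrectable errors")
```

Note that the raw value is the rightmost column; the normalized VALUE/WORST/THRESH columns are vendor-massaged and less directly comparable.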
Interestingly, this particular statistic isn’t even mentioned in the Google paper, nor is it called out in the Wikipedia entry for SMART as being a potential indicator of imminent electromechanical failure.
BackBlaze also discusses its results with several other statistics, and explains why it doesn’t find them useful. Finally, for the statistics wonks among you, the company also published a complete list of SMART results among its 40,000 disk drives. (And for some, that’s still not enough; in the comments section, people are asking BackBlaze to release the raw data in spreadsheet form.)
In addition to giving us one useful stat to look at rather than 70 un-useful ones, this research will hopefully encourage hardware vendors to work together to report their statistics more meaningfully, and for software vendors to develop better, more useful tools to interpret the statistics.
Disclaimer: I am a BackBlaze customer.