Yottabytes: Storage and Disaster Recovery

December 27, 2014  6:33 PM

Microsoft Brings Its Homies to Fight Ruling

Sharon Fisher
Apple, Microsoft

Microsoft is continuing its fight with the U.S. government regarding access to data located on the company’s servers outside the U.S. And this time, it brought some friends.

Twenty-eight major companies, including Microsoft competitors such as Apple, Amazon, and AT&T (but not Google, surprisingly enough), filed friend-of-the-court briefs on December 15, after Microsoft formally appealed the ruling on December 8. Other organizations filing briefs of support include the U.S. Chamber of Commerce, CNN, ABC, Fox News, the Guardian, and Verizon.

Altogether, ten briefs were signed by 28 leading technology and media companies, 35 leading computer scientists, and 23 trade associations and advocacy organizations “that together represent millions of members on both sides of the Atlantic,” noted Microsoft legal counsel Brad Smith. Signatories also included nonprofit organizations such as the Center for Democracy & Technology, the American Civil Liberties Union, the Electronic Frontier Foundation, the Brennan Center for Justice at New York University School of Law, and the Berkman Center for Internet & Society at Harvard.

If upheld, the decision “allows the government to adopt a ‘seize first, search later’ view of the Fourth Amendment, where the government can seize a computer, copy all of its data, and keep that information indefinitely—without a search warrant at all,” writes the EFF in explaining its support.

Why do news organizations such as CNN and ABC care? Because they want to protect their reporters and sources, Smith writes. “These organizations are concerned that the lower court’s decision, if upheld, will erode the legal protections that have long restricted the government’s ability to search reporters’ email for information without the knowledge of news organizations,” he writes.

In addition, the Irish government stepped in, saying the ruling violated its sovereignty, as did a German representative to the European Parliament.

In case you’ve missed it, a judge ruled in May that a search warrant served on Microsoft also applied to data on servers in data centers in Dublin, Ireland. (The exact person and crime have not been revealed, but the case is reportedly drug-related.) Microsoft is protesting this ruling. Another U.S. judge reiterated this decision in August.

There’s more than just data at stake. The ruling means that the U.S. government lays claim to any data owned by a U.S. company, no matter where in the world it is located, such as in the cloud on servers in another country. This has the potential to conflict with privacy laws in other countries, and it makes it a lot less likely that customers outside the U.S. will be willing to put their trust in U.S.-based cloud companies. In addition, it opens the door for non-U.S. governments to make their own data demands of companies operating within their borders.

Microsoft’s appeal wasn’t a surprise; in fact, the company had said in May that it intended to appeal the decision. Several other U.S. companies had also announced their support of Microsoft in August, since the decision has such wide-ranging effects.

The notion of data sovereignty has been discussed for several years, and in fact Microsoft’s Dublin data center had been specifically cited as an example before this case came up. “Microsoft, like other cloud providers, will need to clarify data sovereignty issues, if Office Live is to be taken seriously,” wrote Computer Weekly presciently in June 2011. “While it does have a datacentre in Dublin – so it can guarantee data resides in the EU – Microsoft is headquartered in the US and will be subject to US legislation, such as Homeland Security, as well as UK and EU law.”

Ireland, in its brief, indicated that it wasn’t unwilling to grant the U.S. government access to the data in question, but that the mechanism for doing so was the Mutual Legal Assistance Treaty (MLAT) between Ireland and the United States, and that it was up to the U.S. to ask first, not Ireland to stop the U.S. from taking the data. “Ireland respectfully asserts that foreign courts are obliged to respect Irish sovereignty (and that of all other sovereign states) whether or not Ireland is a party or intervener in the proceedings before them,” the brief warned, before going on to hint, “Ireland would be pleased to consider, as expeditiously as possible, a request under the treaty, should one be made.”

In addition, Jan Philipp Albrecht, a Member of the European Parliament (“MEP”) from Germany, filed his own brief urging the U.S. to use the MLAT mechanism, and warning that failing to do so could make it more difficult for European and U.S. companies to work together.

“European citizens are highly sensitive to the differences between European and U.S. standards on data protection. Such concerns are frequently raised in relation to the regulation of cross-border data flows and the mass-processing of data by U.S. technology companies,” Albrecht writes. “The successful execution of the warrant at issue in this case would extend the scope of this anxiety to a sizeable majority of the data held in the world’s datacenters outside the U.S. (most of which are controlled by U.S. corporations) and would thus undermine the protections of the EU data protection regime, even for data belonging to an EU citizen and stored in an EU country.”

Legal argument is expected this spring or summer, according to the EFF.

December 23, 2014  10:07 PM

8 Geeky Questions About Jeb Bush’s Email Dump

Sharon Fisher
Email, Storage

Political wonks got something to salivate over earlier this month when former Florida governor and presumed Republican presidential candidate Jeb Bush announced that in the spring he intended to release some 250,000 email messages from his time as Florida governor, to demonstrate his transparency. Whether it’s actual transparency, or using a firehose to produce a document dump in hopes that the Internet will be so overwhelmed that they won’t find the good stuff, remains to be seen.

Not to mention overwhelmingly bored. Can you imagine being a reporter and having to read 250,000 email messages? Think of how banal most of your email communication is. Snore.

This isn’t a political blog, so we’re not going to go into the political ramifications of why Bush might be doing this and what might be in there. It is, however, a technical blog, so we’re going to talk about some of the technical issues around this that the, perhaps, less-technical reporters don’t appear to have brought up yet and may not be aware of. It’s certainly an interesting announcement, particularly in contrast with some of the other governors who have behaved badly with their email. At the same time, it’s important that we not be so dazzled by the announcement that we fail to give it due diligence.

  1. Will he really release all of the email? Bush was governor of Florida from 1999 to 2007. Granted, that was a while ago, before everyone and his brother (no pun intended) used email for everything, but even so, 250,000 messages for that many years sounds awfully small. Shoot, I’ve got more than that in my Gmail account. Bush did say “all,” but is he curating them in some way? Is this really going to be every-every-everything? And if not, is there going to be a mechanism for getting every-every-everything? Or is everything else gone? And given that some governors have wiped their systems when they left office – and even bought all the disk drives of the office computers – how is it that the Florida email messages from 1999 are still around?
  2. Did he ever use any unofficial or personal email address? As we’ve learned, governors from states such as Wisconsin and Alaska – not to mention the current Governor of Florida — have used personal email addresses for state business, potentially to avoid having their statements retrieved later. To what degree did Governor Bush do that?
  3. What format will it be in? As of yet, none of the articles have indicated what email system was in use in the Florida governor’s office at the time. Are we talking about a decade-old .pst file? Bush indicated it would be on a website; what will the interface to it be? Will people be able to look at the entire corpus at once, only one message at a time, or what? Will it be threaded? Several of the articles quoted Bush saying that some of the email was funny, some of it was sad, and some of it was serious; is it going to be available under categories such as Funny, Sad, or Serious rather than by date or other subject? How it’s set up could certainly limit the degree to which people could use it – though presumably there’ll be a crowdsourcing army to read the messages in any way they’re provided and let the world know if there’s anything juicy.
  4. Will it be full-text searchable? Anybody who’s tried to look at some older documents, such as some .pdfs, has run into times when the document is saved only as an image. That’s going to make it pretty tough to do any sort of reasonable search – not to mention answering questions like, “What word does Jeb Bush use in his email the most?”
  5. So, where is this email now? Who owns it? Who controls it? It’s not the Governor himself, is it? Presumably he just got a copy? Is there someone who’s going to be able to confirm that yes, the release is an accurate and complete release of what the state has?
  6. How is it that the Governor has it in the first place? Did he just go, welp, leaving office, let me just burn a copy of eight years of my email to a thumb drive and I’ll be on my merry way? Did the state save it and he asked them a while back – and when was that? – for a copy? In what sort of format does he have it, and what sort of searching mechanisms does he have? (The Chicago Tribune quoted Florida legal experts as saying the state’s public-records law requires the release of Bush’s emails and other correspondence as governor, “with few exceptions,” on which it didn’t elaborate.)
  7. Is personal information going to be redacted? One of the things that makes it difficult for governments to release email is that it can include personally identifiable information. Is someone going through all 250,000 email messages and putting big black marks through things? If not, isn’t that kind of risky? If so, then how do we know what sort of information we’re going to be missing?
  8. Is the metadata going to be in there? Are we going to be able to see headers? Email addresses? Datestamps? IP addresses? What are the chances that this is going to get anybody else in trouble?
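
For the geeks following along, here is a rough sketch of what questions 4 and 8 would look like in practice if the release turns out to be plain, machine-readable messages. It uses Python's standard `email` module to pull out a few headers and count words in the body. The sample message, the addresses, and the `summarize` helper are all made up for illustration; nothing is known yet about the real format.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime
import re

# Hypothetical sample message; not taken from any real release.
RAW = """\
From: governor@example.org
To: staff@example.org
Date: Mon, 04 Jan 1999 09:30:00 -0500
Subject: Budget meeting
Received: from mail.example.org (mail.example.org [192.0.2.10])

Thanks for the budget update. The budget meeting is moved to Friday.
"""

def summarize(raw):
    """Return (selected headers, word counts) for one plain-text message."""
    msg = message_from_string(raw)
    headers = {
        "from": msg["From"],
        "date": parsedate_to_datetime(msg["Date"]),
        # A message can carry several Received headers, so collect them all.
        "received": msg.get_all("Received", []),
    }
    # get_payload() returns a string for a simple, non-multipart message.
    body = msg.get_payload()
    words = Counter(re.findall(r"[a-z']+", body.lower()))
    return headers, words

headers, words = summarize(RAW)
print(headers["from"], headers["date"].year)
print(words.most_common(3))
```

If the messages arrive as scanned images instead, none of this works without an OCR step first, which is the whole point of question 4.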

If nothing else, this certainly sets a high bar for other government officials, particularly gubernatorial Presidential candidates, regarding their release of information in the future, pundits note.

December 20, 2014  10:57 PM

Managing a Multitude of Micro SD Cards

Sharon Fisher

If you’re anything like me, between your various smartphones, cameras, video cameras, and tablets – not to mention all the similar devices owned by various members of your family or your coworkers – you’re collecting quite a little pile of micro SD cards. I’m old enough to still be charmed that I can spend $15 and get 32 gigabytes in something the size of my pinky nail. (And if you want to spend more money, you can get 128 GB for $100.) Why, I remember when a 10-megabyte drive was the same size as my desktop computer and cost the same amount.

Sorry. Where was I? Have you seen my cane?

Anyway, I’m not the only person who’s accumulating a pile of the things, and I expect I’m not the only one who lives in terror that the cat will knock over the pile or we’ll accidentally vacuum one up. And if you poke around message boards, it’s pretty easy to find professionals who are dealing with the same problem. For example, filmmakers who are trying to put together a movie end up with a handful of them, especially if they’d like to look at files on more than one at a time. Plus, the teeny tiny cards are too small to label, and sometimes you really don’t want to mix up some pictures with what you’re going to show to the PTA, know what I mean?

It turns out that there are some cases for the things, so you can at least store them in an adorable wee little briefcase rather than having them in an easily-knocked-over saucer or something. There’s also a manufacturer who makes a plastic case the size of a credit card that can hold ten of the things, plus an SD card adapter. Moreover, the cases come in colors, so you can put all the music on ones in a blue case and all the movies on ones in a red case, or whatever.

Now we’re talking.

But why stop there? We can’t label the micro SD cards themselves the way we used to with 3 ½” floppy disks. So why hasn’t some enterprising manufacturer started making the micro SD cards themselves in colors? Green for the CFO and red for the CIO and blue for the CEO, say? In an era when you can order an entire bag of green M&Ms if that’s what floats your boat, why do our micro SD cards mostly have to be black?

And while we’re suggesting things, what would it take to be able to read and write data to and from multiple micro SD cards at once? There are a couple of devices that give you access to up to four micro SD cards at the same time, but there doesn’t seem to be more than that. And apparently they work great if you’re trying to copy identical data to four micro SD cards at a time. But reports on these devices note that because they share a single input to your computer, reading and writing is slow, and some of them don’t let you swap cards in and out, so if you’ve got data on five or six cards, you’re hosed.

Come on, storage industry. I’m counting on you.

November 30, 2014  10:45 PM

Eek! Lawyers are Coming After Your Fitbit!

Sharon Fisher

We wrote in February about the notion of legal and discovery issues around wearable technology such as Google Glass and smartwatches, and as an aside joked, “Not to mention the wealth of data preserved by a Fitbit.”

Little did we know.

In what is being billed as the first case of its kind, an attorney is using data from a client’s Fitbit as evidence of a client disability in a lawsuit.

For a client’s accident injury claim, the McLeod Law Office will start processing data from their client’s Fitbit to show that her activity levels are now under a baseline for someone of her age and profession, demonstrating her injury, according to Forbes. Lawyers for the office say they’re working on other similar cases as well. 

And that’s just the beginning, attorneys say. In the same way that legal teams now comb Facebook pages looking for evidence that a supposedly disabled person is actually secretly swimming or hiking, they expect Fitbits and similar devices to be used to demonstrate that the person isn’t so sick or disabled as all that. The development could see insurance companies, for example, insisting that claimants undergo assessment via fitness tracker, Samuel Gibbs writes in the UK newspaper The Guardian.

“Privacy considerations aside—and there are many—wearables are yet another example of how technology may be a gold mine of potentially relevant ESI [electronically stored information] for use in litigation,” attorney Neda Shakoori wrote in August. 

“Wearables data could just as easily be used by insurers to deny disability claims, or by prosecutors seeking a rich source of self-incriminating evidence,” writes Kate Crawford in the Atlantic. But there’s more. “In America, the Fifth Amendment protects the right against self-incrimination and the Sixth Amendment provides the right in criminal prosecutions ‘to be confronted with the witnesses’ against you,” she continues. “Yet with wearables, who is the witness? The device? Your body? The service provider? Or the analytics algorithm operated by a third party? It’s unclear how courts will handle the possibility of quantified self-incrimination.”

Not to mention, will legal firms be able to subpoena your cloud provider if that’s where your fitness data is stored? How much are they going to fight to protect you? If it’s stored in your phone, will you need to provide your password? And how does this all fit into HIPAA and other health information privacy rules? (Apple has already said, for example, that health data can’t be stored on the iCloud and that any health data on an iPhone has to be encrypted.)

“It might seem odd at first (particularly to non-tech savvy judges), but it’s no different than any other type of e-discovery that has come before,” writes attorney Keith Lee in Above the Law. “Another question is how will companies like Fitbit respond. There is no mention of providing access to your data in response to a legal inquiry in Fitbit’s Terms of Service. The client in the above matter is voluntarily providing her data to help her case, but what happens when Fitbit is subpoenaed to provide data? Will they push back, citing user privacy, or immediately comply?”

Two other factors make this issue even more problematic. First, wearables aren’t necessarily consistent in how they track your activity. “The Jawbone UP, Nike Fuelband, Fitbit, and Withings Pulse all have their own peculiarities in how they work: Some will count moving your arms around as walking (which is great if you want writing to count as exercise), others can’t easily register cycling as activity,” Crawford writes. “This ‘chaos of the wearable’ might be merely amusing or frustrating when you’re using the data to reflect on our own lives. But it can be perilous when that data is used to represent objective truth for insurers or courtrooms.”

Second, as in the McLeod case, not just the raw data is being used. “Now that data is being further abstracted by analytics companies that create proprietary algorithms to analyze it and map it against their particular standard of the ‘normal’ healthy person,” Crawford adds.

This also, as we discussed earlier, hypothetically makes Fitbits subject to discovery, meaning you can get in trouble for wiping or otherwise failing to preserve the data on it.

Nobody’s saying that people will be required to wear a Fitbit as proof of fitness, or lack thereof. Yet. But in the same way that some car owners and renters are required to install a GPS unit on their vehicle, could it be far behind?

November 30, 2014  7:48 PM

DHS Plans to Delete Surveillance Data. Are We Glad or Not?

Sharon Fisher
privacy, Security

The Department of Homeland Security announced, in a very low-key way, on November 19 that it was planning to delete “Master files and outputs of an electronic information system which performs information technology infrastructure intrusion detection, analysis, and prevention.” It gave people until December 19 to ask for copies of the plan, following standard National Archives and Records Administration protocol. After requesters receive their copies, they have 30 days to comment.

According to Nextgov, what the agency is looking to delete are records more than three years old from its Einstein network monitoring system, which is intended to help DHS cybersecurity experts look for malware such as Heartbleed in government networks. This is making some security people happy, because they are concerned about the government keeping all these records. At the same time, it is making some security people sad, because they wonder if the government is trying to hide something by deleting the records.

“As a general matter, getting rid of data about people’s activities is a pro-privacy, pro-security step,” Nextgov quoted Lee Tien, senior staff attorney with the Electronic Frontier Foundation, as saying. But “if the data relates to something they’re trying to hide, that’s bad,” he continued.

DHS says it wants to delete the data because, at three years old, it’s not useful anymore. (The agency still keeps incident reports.) Others disagree. “Some security experts say, to the contrary, DHS would be deleting a treasure chest of historical threat data,” writes Nextgov’s Aliya Sternstein. “And privacy experts, who wish the metadata wasn’t collected at all, say destroying it could eliminate evidence that the governmentwide surveillance system does not perform as intended.”

What’s causing some people to feel suspicious is that the rationale the agency is using to delete the data is the cost, which it estimates at $50 per month per terabyte. Given that you can get a 1-terabyte drive from Staples for less than that these days (yes, we know, there’s more to it than the hardware cost), this seems…excessive. On the other hand, some people are wondering just how much data DHS must have for storage to add up to a significant amount of money.
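
To put the $50 figure in perspective, here is a quick back-of-the-envelope sketch. The only number taken from the story is the $50 per terabyte per month estimate; the data volumes are hypothetical, since DHS hasn't said how much Einstein data it holds, and `annual_cost` is just an illustrative helper.

```python
COST_PER_TB_MONTH = 50  # dollars; the DHS estimate quoted above

def annual_cost(terabytes, cost_per_tb_month=COST_PER_TB_MONTH):
    """Yearly storage cost in dollars for a given volume in terabytes."""
    return terabytes * cost_per_tb_month * 12

# Hypothetical volumes; the actual amount of data has not been disclosed.
for tb in (1, 100, 1000):
    print(f"{tb:>5} TB -> ${annual_cost(tb):,} per year")
```

At that rate a single terabyte runs $600 a year, so it would take on the order of a petabyte before the bill reached the high six figures.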

Data to be deleted includes email, contact and other personal information of federal workers and public citizens who communicate concerns about potential cyber threats to DHS; intrusion detection data; intrusion prevention data; analysis data such as files from the U.S. Computer Emergency Readiness Team (CERT); and a catch-all “information sharing” including data from white papers and conferences, Nextgov reports. 

So what is Einstein? It is the result of automated processes that collect, correlate, analyze, and share computer security information across federal U.S. civilian agencies, according to BiometricUpdate. “By collecting information from participating federal government agencies, ‘Einstein’ builds and enhances cyber-related situational awareness,” writes Rawlson King. “The belief is that awareness can assist with identifying and responding to cyber threats and attacks, improve the government’s network security, increase the resiliency of critical, electronically delivered government services, and enhance the survivability of the Internet. The program provides federal civilian agencies with a capability to detect behavioral anomalies within their networks. By analyzing the data and detecting these anomalies, the ability to detect new exploits and attacks in cyberspace are believed to be greatly increased.”

That said, this is all happening against a background of other changes in DHS involving cybersecurity that are making some people nervous.

  • Brendan Goode, the director of the Network Security Deployment division in the Office of Cybersecurity and Communications (CS&C) who built the Einstein system, announced earlier in November that he was leaving for the private sector, according to Federal News Radio. While his last day was scheduled to be November 21, he hadn’t yet announced where he was going, nor has he updated his LinkedIn page. 
  • After its initial setup in 2004, Einstein is now on its third implementation and has agreements with 15 out of the 23 agencies expected to sign up for it (out of nearly 600 agencies, according to RT.com), and implementations with 9 of them, all at a cost of hundreds of millions of dollars.
  • Due to incidents such as Heartbleed — where DHS had to wait up to a week for agency approvals, all while news of the vulnerability was out in the wild — the DHS now has the authority, as of October, to proactively monitor federal networks for vulnerabilities without having to wait for agency permission. “Agencies must provide DHS with an authorization for scanning of Internet accessible addresses and systems, as well as provide DHS, on a semiannual basis, with a complete list of all internet accessible addresses and systems, including static IP addresses for external websites, servers and other access points and domain name service names for dynamically provisioned systems. Agencies must give DHS at least five days advanced notice of changes to IP ranges as well. Further, agencies must enter into legal agreements for the deployment of DHS’s EINSTEIN monitoring system, provide DHS with names of vendors who manage, host, or provide security for Internet accessible systems, including external websites and servers, and ensure that those vendors have provided any necessary authorizations for DHS scanning of agency systems,” summarized FedWeek.
  • On the other hand, contractor vendors aren’t exactly leaping to be included.

It isn’t clear how much DHS was hoping that this would all be lost in the shuffle around the holidays. Presumably organizations such as the EFF and Nextgov have filed requests for the plans, and will follow up. If it’s the sort of thing you might feel the need to comment on, however, it might be a good idea to make your own request, if comments are limited to people who request the documents.

November 24, 2014  6:44 PM

Were the Missing IRS Email Messages Found? Not Quite

Sharon Fisher
Backup, Email, Storage

You may recall that in June, the Interwebs were burning up with the story about former director of exempt organizations for the IRS Lois Lerner, and how something like two years’ worth of email messages – conveniently covering a period of time under Congressional investigation – were unavailable because employees could only store 500 MB of email, backup tapes were only saved for six months, and her computer had crashed, wiping out her hard disk drive. While not everyone thought it was a coverup on the order of the missing 18 minutes on the Watergate tapes, few would argue that it was no way to run a railroad.

Now, it turns out that the IRS might have backup copies of the email messages after all — but retrieving them is likely to take a lot of time and money.

We’re not going to get into the politics of the investigation. As before, we’re just interested in this as a government IT problem — and it’s a dilly.

In the fine tradition of Taking Out the Trash Day — and like the original announcement of the missing email messages itself — this news was released on the Friday afternoon before Thanksgiving.

“The U.S. Treasury Inspector General for Tax Administration (TIGTA) informed congressional staffers from several committees on Friday that the emails were found among hundreds of ‘disaster recovery tapes’ that were used to back up the IRS email system,” reports the Washington Examiner. As many as 30,000 email messages could be found.

Finding them might take a while, and technical details of exactly what’s going on are sketchy. Most of the coverage is in the mainstream or right-wing media, which isn’t necessarily all that tech-savvy to begin with. Moreover, they’re also quoting Congressmen and their staffs, who aren’t exactly technical experts either. And while there may be technical people explaining more detail in comments on the stories, finding those comments among the hundreds railing about “libtards” and “Obummer” and “Benghazi” is more difficult than finding Lerner’s messages on the tapes themselves.

So here’s what’s happened since the original story in June.

In August, a representative from a watchdog organization called, appropriately, Judicial Watch told Fox News that it had heard from a Justice Department official that there were backup tapes “in case something terrible happened in Washington” and that Lerner’s email messages might be on those tapes. Congressional representatives wrote to the IRS in September asking about those. But court documents filed in October said there was no such thing beyond the standard disaster recovery tapes that were overwritten every six months, although it did agree there were server backups, which were being examined by TIGTA.

(Lerner also had a Blackberry that was replaced in February 2012, and while Judicial Watch felt that some of the email messages might be on that older Blackberry, it had been destroyed when it was replaced.)

Now, apparently some backups have been found. Where exactly these tapes came from is not clear. Are they different from the tapes that are supposedly recycled every six months? If so, where did they come from? Or did that recycling not occur? If not, why not?

Wherever the tapes themselves came from, here are some of the problems in finding the missing messages.

  • The 30,000 email messages are scattered among 250 million email messages on 744 disaster recovery tapes, according to the Washington Examiner.
  • Moreover, it could take weeks to learn the content of the messages “because they are encoded,” according to Fox News, quoting Frederick Hill, a spokesman for Republicans on the Oversight committee. Does “encoded” mean “encrypted”? Or is this simply referring to the encoding the email messages have to work with the email program?
  • Before the messages can be released, any personally identifiable information in them about individual taxpayers has to be redacted.
  • Even when the messages are tracked down, investigators may find that they’re simply duplicates of the 24,000 messages they have already located, such as by getting copies from the people with whom Lerner had exchanged email, reports The Hill.
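
The numbers in the list above are easier to appreciate with a little arithmetic. The sketch below works out the average number of messages per tape and how small a share of the total the wanted messages are, then shows one way duplicates of already-located messages might be weeded out. The per-tape figure is only an average (nothing says the messages are spread evenly), and deduplicating on a Message-ID style identifier via the hypothetical `new_messages` helper is an assumption for illustration, not a description of what TIGTA is doing.

```python
TOTAL_MESSAGES = 250_000_000  # messages on the tapes, per the Washington Examiner
TAPES = 744                   # disaster recovery tapes
WANTED = 30_000               # upper estimate of Lerner messages

per_tape = TOTAL_MESSAGES / TAPES
share = WANTED / TOTAL_MESSAGES
print(f"about {per_tape:,.0f} messages per tape on average")
print(f"wanted messages are {share:.3%} of the total")

def new_messages(recovered_ids, already_located_ids):
    """Return recovered identifiers not already held, preserving order."""
    seen = set(already_located_ids)
    fresh = []
    for message_id in recovered_ids:
        if message_id not in seen:
            seen.add(message_id)  # also drops repeats within the recovered set
            fresh.append(message_id)
    return fresh
```

For example, `new_messages(["a", "b", "a", "c"], ["b"])` keeps only `"a"` and `"c"`, which is the sort of filtering that would be needed to tell genuinely new messages apart from the 24,000 already in hand.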

Ironically, what might have saved the messages was budget cuts. The Washington Examiner reported in September that some 760 “exchange servers” [sic; do they mean Microsoft Exchange email servers?] – which were supposed to have been destroyed two years previously – might have been spared due to budgetary constraints. It isn’t clear whether these tapes come from those servers, or if the examination of those servers is complete; there could be further revelations forthcoming.

November 18, 2014  2:11 PM

Here’s the One Thing to Look At to See If Your Hard Drive Will Fail

Sharon Fisher
Backup, Storage

Anyone who’s had a hard drive fail just as they were about to do a backup on it, honest! will understand how much we’d all like to know when our hard disks are about to fail.

Some time ago (between 1995 and 2004, depending on how you count), a standard was developed called Self-Monitoring, Analysis and Reporting Technology (SMART, get it?) that was intended to help with this problem.

Unfortunately, like many other technologies, its user experience was not the best. SMART defines — and measures, for those vendors that support it — more than 70 characteristics of a particular disk drive. But while it’s great to know how many High Fly Writes or Free Fall Events a disk has undergone, these figures aren’t necessarily useful in any real sense of being able to predict a hard drive failure.

Part of this is because of the typical problem with standards: Just because two vendors implement a standard, it doesn’t mean they’ve implemented it in the same way. So the way Seagate counts something might not be the same way as Hitachi counts something. In addition, vendors might not implement all of the standard. Finally, in some cases, even the standard itself is…unclear, as with Disk Shift, or the distance the disk has shifted relative to the spindle (usually due to shock or temperature), where Wikipedia notes, “Unit of measure is unknown.”

That’s not going to be helpful if, for example, one vendor is measuring it in microns and one in centimeters.

There have been various attempts at dealing with this problem of figuring out which of these statistics are actually useful. One in particular was a paper presented at Usenix in 2007 by three Google engineers, “Failure Trends in a Large Disk Drive Population.” What was interesting about Google is that it used enough hard drives to be able to develop some useful correlations between these 70-odd (and some of them are very odd) measurements and actual failure.

Now there’s sort of an update to that paper, but it uses littler words and is generally more accessible to people. It’s put out by Brian Beach, an engineer at BackBlaze; we’ve written about them before. Like Google, their insights into commodity hard disk drives are useful, simply because they use so darn many of them.

What BackBlaze has done this time is look at all the drives they have that have failed, and then go back and look at all their SMART statistics, and then correlate them. The company also looked at how different vendors measure these different statistics, so they have a good idea about which statistics are relatively common across vendors. This gives us a better idea of which statistics we should actually be paying attention to.

As it turns out, there’s really just one: SMART 187 – Reported_Uncorrectable_Errors.

“Number 187 reports the number of reads that could not be corrected using hardware [Error Correcting Code] ECC,” BackBlaze explains. “Drives with 0 uncorrectable errors hardly ever fail. Once SMART 187 goes above 0, we schedule the drive for replacement.”
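For the tinkerers, here is a minimal sketch of that replacement rule in Python. It is not BackBlaze's actual tooling: the function name, the drive names, and the shape of the input (a dictionary of SMART attribute IDs to raw values per drive) are all assumptions made up for illustration. Only the rule itself, flagging any drive whose SMART 187 raw value is above 0, comes from the quote above.

```python
# Hypothetical sketch of the "SMART 187 above 0 means replace" rule.
# The function name and input format are assumptions, not BackBlaze's code.
REPORTED_UNCORRECTABLE_ERRORS = 187


def drives_to_replace(fleet):
    """Return the names of drives whose SMART 187 raw value is above 0.

    `fleet` maps a drive name to a dict of {SMART attribute ID: raw value}.
    Drives that don't report attribute 187 are skipped rather than flagged,
    since vendors might not implement all of the standard.
    """
    flagged = []
    for name, attrs in fleet.items():
        raw = attrs.get(REPORTED_UNCORRECTABLE_ERRORS)
        if raw is not None and raw > 0:
            flagged.append(name)
    return sorted(flagged)


# Made-up example data: attribute 9 is Power-On Hours.
fleet = {
    "drive-a": {187: 0, 9: 12000},
    "drive-b": {187: 3, 9: 8000},
    "drive-c": {9: 500},  # this vendor doesn't report 187 at all
}
print(drives_to_replace(fleet))
```

With the made-up fleet above, only drive-b gets flagged. Whether a drive that doesn't report 187 should be skipped or treated with suspicion is a judgment call; this sketch skips it.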

Interestingly, this particular statistic isn’t even mentioned in the Google paper, nor is it called out in the Wikipedia entry for SMART as being a potential indicator of imminent electromechanical failure.

BackBlaze also discusses its results with several other statistics, and explains why it doesn’t find them useful. Finally, for the statistics wonks among you, the company also published a complete list of SMART results among its 40,000 disk drives. (And for some, that’s still not enough; in the comments section, people are asking BackBlaze to release the raw data in spreadsheet form.)

In addition to giving us one useful stat to look at rather than 70 un-useful ones, this research will hopefully encourage hardware vendors to work together to report their statistics more meaningfully, and for software vendors to develop better, more useful tools to interpret the statistics.

Disclaimer: I am a BackBlaze customer.

October 31, 2014  10:15 PM

Yes, Cops Can Make You Use Your Fingerprint to Unlock Your Phone

Sharon Fisher
Encryption, privacy, Security, Smartphones

While courts are still arguing back and forth about whether people can be compelled to give up the encryption key for their laptops and other devices, it looks like they may have decided that it’s okay to force you to use your fingerprint to unlock smartphones with that capability.

Judge Steven C. Frucci, of Virginia Beach, Va., ruled that David Baust, who was charged in February with trying to strangle his girlfriend, had to give up his fingerprint so prosecutors could check whether his cellphone had video of the incident. 

The distinction that courts draw in general is that a physical thing, like a key to a lockbox, is not protected by the Fifth Amendment. But the “expression of the contents of an individual’s mind,” such as the combination to a safe, is protected. Courts have been debating for a couple of years now about whether an encryption key is something you have or something you know. A fingerprint, however, is something you have, similar to the way that you can be compelled to give up a blood sample to test for alcohol, ruled the judge.

Phones with fingerprint detectors include the Apple iPhone 5S and the Samsung Galaxy S5, according to the Wall Street Journal. In fact, when phones with fingerprint capability came out last year, organizations such as the Electronic Frontier Foundation and other legal experts warned that this could happen. “It isn’t hard to imagine police also forcing a suspect to put his thumb on his iPhone to take a look inside,” Brian Hayden Pascal, a research fellow at the University of California Hastings Law School’s Institute for Innovation Law, told the Journal last fall. Ironically, fingerprint scanners were supposed to make the phones more secure.

This also fits in with recent moves from companies such as Apple to make encryption the default on smartphones so the companies can’t be compelled to reveal information on the phones. If the phone is protected only by a fingerprint, then police could use the fingerprint to decrypt data on the phone. “One of the major selling points for the recent generation of smartphones has been that many of them don’t save their data in a way accessible to anyone without the phone itself,” writes Eric Hal Schwartz in In the Capital. “It’s something that has annoyed law enforcement like FBI director James Comey, but it chips away at some of that much-touted privacy if police can get into a phone with your fingerprint without your permission.”

Actually, Frucci made a distinction between Baust giving up his fingerprint, which he could be forced to do, and giving up a password for the phone, which the judge said he could not be forced to do. In other words, if the smartphone was protected by both a fingerprint and a password (as it would be if, for example, the phone had been turned off), prosecutors would still be out of luck. If you’re concerned about this, some people are recommending turning off your phone when police approach, or messing up the fingerprint unlocking multiple times, to force the phone to require you to enter a password.

October 30, 2014  6:37 PM

The Breakup of the Great Ottoman Storage and E-Discovery Empires

Sharon Fisher
Autonomy, Documentum, E-discovery, ediscovery, EMC, HP, Storage, Symantec

With this being the centennial of the start of World War I, and with what’s going on in the storage industry lately, it isn’t surprising if you’re also being reminded of the decline and fall of the Ottoman Empire.

Well, okay. Maybe only if you’re a history buff.

In case you were dozing in the back row during world history class in tenth grade (or, if, like me, your history teacher was actually a repurposed Latin teacher and you spent all but the last two weeks of the school year on Greece and Rome, meaning you covered a millennium a day those last two weeks), the Ottoman Empire lasted in one way, shape, or form for more than 500 years. It spanned three continents — Europe, Asia, and Africa — and contained 29 provinces and many other states. But it fell during World War I, and nations such as Britain and France carved up the pieces willy-nilly into ways that made sense to them, without paying much attention to cultural boundaries or what the people in those states might actually want to do. (In fact, some of the current conflict in the Middle East dates directly back to those actions. But I digress.)

Any of this ringing a bell yet?

So at this point, in the storage and e-discovery industry that this blog covers, we have not one but three Ottoman empires potentially in the process of dissolving, with a bunch of people on both the outside and the inside watching and speculating about how the pieces might all eventually fit together.

We’ve already talked about EMC, which is under pressure from shareholders to break itself up so the pieces can be worth more, a case of the whole being worth less than the sum of the parts. It isn’t clear yet exactly what’s going to happen with EMC, though there’s been plenty of speculation. (To further complicate things, EMC and Cisco are breaking up their partnership, which resulted in the converged infrastructure joint venture VCE, with EMC taking control of it. More pieces to juggle.)

In the meantime, both HP and Symantec have announced their intentions to split in two. HP’s pieces are going to be one for its printer and PC business, and one for its corporate computer hardware and software business. Symantec’s pieces are going to be one for its security management products and one for its information management products.

And while the Britains and the Frances of the computer industry are arguing over the bigger pieces and how they will best fit together, other people — especially in e-discovery — are talking about some of the other pieces that haven’t gotten as much love lately and how this could all work out for them.

The HP split, for example, could result in new support for Autonomy, which HP bought for what everyone — including HP — agrees was way too much money. Not only was it not great for HP, but it hasn’t been too great for the Autonomy people either, who are kinda HP’s red-headed stepchildren.

The HP split, in fact, is “probably good news for long-suffering customers of the former Autonomy products,” writes Tony Byrne of Real Story Group. “You know why? Because things couldn’t get much worse for them.”

Meanwhile, Gartner pointed out this summer in its e-discovery Magic Quadrant that although it still positioned Symantec in the Leaders quadrant, its Clearwell product — one of the first big acquisitions in the 2011 e-discovery land grab — had languished under Symantec’s control. Or, as Gartner puts it, “The innovation pipeline for the eDiscovery Platform has slowed during Symantec’s acquisition and integration of Clearwell Systems, resulting in the product’s lack of growth and new releases.”

(Keep in mind that Autonomy and Clearwell had both individually been listed in the Leaders quadrant in the original 2011 e-discovery Magic Quadrant. Almost makes you wish that some company that really had a great vision for e-discovery would buy both pieces, integrate them, and really do it right.)

At the same time, some people are looking at some of the less-loved, neglected pieces of EMC, such as Documentum, and thinking that maybe there’s some way these could get involved, too.

“[Documentum] doesn’t seem to play a role in EMC’s survival,” writes Virginia Backaitis in CMSWire, before going on to suggest that HP buy it and integrate it with Autonomy. “In EMC’s quarterly call with investors last week, neither EMC CEO Joe Tucci nor his lieutenants (David Goulden, CEO of EMC Information Infrastructure and CFO Zane Rowe) uttered the name of its spawn at all.”

It remains to be seen how the various pieces of all three companies will combine (hopefully not in some e-discovery version of Iraq, with different factions battling for control). If nothing else, it could mean that next year’s Gartner e-discovery Magic Quadrant, which has been pretty much of a snore the last couple of years, has the potential to be a lot more interesting.

October 29, 2014  9:57 PM

To Heck With Station Wagons. How Much Data Fits on BART?

Sharon Fisher
privacy, Security, Storage

Periodically, people take the new capacity of storage media (not to mention the ever-increasing size of motor vehicles) and use it to recalculate that lovely statistic, “What is the bandwidth of a station wagon full of tapes speeding down the highway?” So now we have a new one: How much data goes back and forth in major cities, especially on public transit?

We now have that data courtesy of Mozy, a cloud backup service that describes itself as the “most trusted.” (Exactly how they figured out it was the “most trusted,” they don’t say.) According to the company, when you add up laptops, smartphones, personal hard drives, thumb drives, and so on, you end up with a pretty horrendous amount of data leaving the office every day:

  •  The average commuter takes 470GB of company data home with them at the end of every day — 2,500 times the amount of data they’ll move across the Internet in the same timeframe
  •  Every day, 1.4 exabytes of data moves through New York City alone – that’s more data than will cross the entire Internet in a day
  •  As much as 33.5PB of data will travel over the Oakland Bay Bridge every day
  •  As much as 49PB of data will travel through the Lincoln Tunnel each day
  •  Up to 328PB of data travels in the London Tube network every day
  •  Up to 69PB of data leaves Munich’s Hauptbahnhof on a daily basis
  •  The Paris Metro carries as much as 138PB of data every day

(There are also some really cool maps showing where the data is coming from.)

There is, however, one flaw in the Mozy description, which is that it refers to this phenomenon as a “data drain.” That’s not really accurate. A “brain drain,” for example, typically refers to people leaving an area. Their brains are therefore gone from the area. But this data isn’t actually leaving the area, in the context of it being gone. Instead, the data is copied. This leads to its own issues, such as version control, security, and simply taking up much more storage space than is really required. (Good thing storage is so cheap these days, amirite?)

And certainly one could quibble with the figure. Mozy doesn’t explain the methodology, but presumably it’s adding up the storage in each of the devices that people carry back and forth. And who knows, really, how much of it is actually corporate data, and how much of it is cat pictures? That said, it’s certainly a fun back-of-the-envelope statistic to calculate.
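In that back-of-the-envelope spirit, here is a rough Python sketch of what Mozy's own numbers imply. The data figures (470GB, 2,500 times, 33.5PB, 100TB) are the ones Mozy quotes in this post; the decimal units and the 45-minute commute are my assumptions, not Mozy's, and the head counts assume every commuter carries exactly the 470GB average.

```python
# Back-of-the-envelope check on Mozy's figures. Decimal units are assumed.
GB, TB, PB = 10**9, 10**12, 10**15

per_commuter = 470 * GB   # Mozy: company data carried home per commuter per day
internet_ratio = 2500     # Mozy: "2,500 times" what they move over the Internet
bay_bridge = 33.5 * PB    # Mozy: upper bound crossing the bridge per day
subway_car = 100 * TB     # Mozy: "over 100TB" in one rush-hour subway car

# Implied Internet traffic per commuter in the same timeframe, in MB.
internet_mb = per_commuter / internet_ratio / 10**6

# Implied head counts if everyone carries the 470GB average.
bridge_commuters = bay_bridge / per_commuter
car_riders = subway_car / per_commuter

# Station-wagon-style bandwidth, assuming a 45-minute commute (my guess).
commute_seconds = 45 * 60
commute_gbps = per_commuter * 8 / commute_seconds / 10**9

print(f"Internet traffic per commuter: {internet_mb:.0f} MB")
print(f"Commuters implied on the bridge: {bridge_commuters:,.0f}")
print(f"Riders implied in one subway car: {car_riders:.0f}")
print(f"Bandwidth of one commuter: {commute_gbps:.2f} Gbps")
```

Under those assumptions, the figures work out to about 188MB of Internet traffic per commuter, roughly 71,000 commuters' worth of data on the bridge, about 213 riders in that subway car, and a commuter "bandwidth" of around 1.4 Gbps. Change the commute length and the last number moves accordingly.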

Anyway, it’s the security issue that is particularly catching Mozy’s interest. “With 41.33 percent of people having lost a device that stores data in the past 12 months, huge amounts of business data is put at risk every rush hour,” the company writes. “There isn’t a CIO we know who would risk sending massive volumes of data over the internet without protecting it first.”

Well, we have to say, Mozy must not know very many CIOs. That aside, the company has a point: with all the evidence we have of companies and governments behaving badly with personally identifiable data, there’s an awful lot of data at risk every day.

“A thief holding up a New York subway car at rush-hour capacity could walk away with over 100TB of data,” the company notes. (Which actually sounds like an interesting premise for a movie. Starring Denzel Washington? Jeff Goldblum? Sandra Bullock?)

This commuting data is vulnerable in two ways, Mozy notes. First, bad guys could get access to the data. Second, the person with whom the data is riding could lose access to the data, if that data is the only copy. “It’s also the most-critical data; the edits to the contract that we’ve just worked through in today’s meeting, the presentation that we’re giving tomorrow morning, the tax forms that you’re halfway through filling in,” the company writes. “Losing this data can have an immediate impact on a company’s success.”

Mozy, however, doesn’t go far enough. Let’s go to the root cause: Why are people taking so much data home with them? And if this is something we don’t want to have happen, what is the alternative? There’s already been any amount of hand-wringing over the notion of people setting up Dropbox and similar accounts to make copies of corporate data. Is carrying the data on a device more or less secure, or desirable, than saving it to a public cloud service?

Either way, device or cloud, it boils down to the same issue: People are making copies of the corporate data, by and large, because they feel they need to do that to do their jobs. So either there isn’t a reliable way for them to gain access to the corporate data they need any other way, or, if there is, they don’t know about it.

The point being, if people feel they have to do this to do their jobs, then you need to give them a better way. Simply issuing an edict that Thou Shalt Not is not going to work, even if you put teeth in it. Because, ultimately, they’re not as afraid of you as they are of their boss.
