Think your backup job is tough? How about backing up the entire Internet?
You may wonder, what’s the point of archiving the Internet? Do we really need to save all those memes and cat pictures? But the Internet is more than that, insist preservationists.
Jill Lepore leads off her New Yorker article by noting that the Internet Archive’s web preservation service, known as the Wayback Machine, was the only remaining source of evidence that Ukraine separatists had posted that they had shot down Malaysia Airlines Flight 17 on a Russian social media site – a site that the Internet Archive had begun saving just two weeks before. “On July 17th, at 3:22 P.M. G.M.T., the Wayback Machine saved a screenshot of Strelkov’s VKontakte post about downing a plane,” she writes. “Two hours and twenty-two minutes later, Arthur Bright, the Europe editor of the Christian Science Monitor, tweeted a picture of the screenshot, along with the message ‘Grab of Donetsk militant Strelkov’s claim of downing what appears to have been MH17.’ By then, Strelkov’s VKontakte page had already been edited: the claim about shooting down a plane was deleted. The only real evidence of the original claim lies in the Wayback Machine.”
In addition to web pages, the Internet Archive – which, incidentally, is hosted in a former Christian Science church because the building resembles the organization’s logo – also hosts books, videos, “ephemeral” films such as advertising, audio recordings, concert recordings, audio books, television news broadcasts, and historical software (including Oregon Trail and Leisure Suit Larry in the Land of the Lounge Lizards), writes Andy Baio on Medium. Altogether, it includes 500,000 pieces of software, more than 2 million books, 3 million hours of TV, and 430 billion web pages, writes Justin Ellis. “In a single day, they digitize more than 1,000 books. They capture TV 24 hours a day. In a week, they save more than 1 billion URLs.”
So how do pages get saved into the Wayback Machine? There are three ways, Lepore writes:
- There’s a crawler that attempts to make a copy of every Web page it can find every two months or so, though she points out that the New Yorker’s home page gets saved about six times a day
- Librarians choose certain pages to be archived in certain subject areas, through a service called Archive-It, at archive-it.org, which also lets individuals and institutions build their own archives
- Anyone who wants to can preserve a Web page, at any time, by going to archive.org/web, typing in a URL, and clicking “Save Page Now,” which is how five of the twelve screenshots of the Malaysian Airlines post were made
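That last option is just an HTTP request under the hood: the public Save Page Now endpoint lives at web.archive.org/save/ followed by the URL you want captured. A minimal sketch in Python (the helper names here are mine, and real error handling and rate limiting are omitted):

```python
from urllib.parse import quote
from urllib.request import urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"

def save_page_now_url(page_url: str) -> str:
    """Build the Save Page Now request URL for a given page."""
    # Percent-encode the target URL, keeping the scheme separators intact.
    return SAVE_ENDPOINT + quote(page_url, safe=":/")

def archive_page(page_url: str) -> int:
    """Ask the Wayback Machine to capture a page; returns the HTTP status."""
    with urlopen(save_page_now_url(page_url)) as resp:
        return resp.status

print(save_page_now_url("http://example.com/news/story"))
# https://web.archive.org/save/http://example.com/news/story
```

Fetching that URL in a browser is exactly what the “Save Page Now” button does for you.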
At this point, the Wayback Machine has archived more than 430 billion Web pages, comprising 20 petabytes of storage – double its 2012 figure, Lepore writes. Some 600,000 people use it every day, conducting 2,000 searches a second, she adds.
That said, it’s not difficult to keep the Wayback Machine from trawling a site; all it takes is a single text file, Lepore writes – which has the effect of deleting all the archives as well. “Blocking a Web crawler requires adding only a simple text file, ‘robots.txt,’ to the root of a Web site,” she writes. “The Wayback Machine will honor that file and not crawl that site, and it will also, when it comes across a robots.txt, remove all past versions of that site. When the Conservative Party in Britain deleted ten years’ worth of speeches from its Web site, it also added a robots.txt, which meant that, the next time the Wayback Machine tried to crawl the site, all its captures of those speeches went away, too.”
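The file Lepore mentions really is that simple. The Internet Archive’s crawler has historically identified itself as `ia_archiver`, so a robots.txt along these lines, placed at the site root, is enough to block it (a `User-agent: *` rule would block every well-behaved crawler instead):

```
# https://example.com/robots.txt
# Block only the Internet Archive's crawler from the whole site
User-agent: ia_archiver
Disallow: /
```

Per Lepore, it’s this two-line file that also triggers the removal of all of a site’s past captures.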
The biggest problem with the Internet Archive is that it’s so big it’s really difficult to search, Lepore writes, because it lacks the tools. “You can do something more like keyword searching in smaller subject collections, but nothing like Google searching (there is no relevance ranking, for instance), because the tools for doing anything meaningful with Web archives are years behind the tools for creating those archives,” she writes. “Doing research in a paper archive is to doing research in a Web archive as going to a fish market is to being thrown in the middle of an ocean; the only thing they have in common is that both involve fish.”
To this end, the Internet Archive was recently one of 22 organizations to share in $3 million of grants from the Knight Foundation through the Knight News Challenge, toward projects that provide new tools and ideas for making libraries more accessible. “The Internet Archive will get $600,000 to develop new technology to give users more control over how materials are uploaded, categorized, and curated in the archive,” Ellis writes. “What they plan to do with the funding from Knight is create a simpler upload system that works across any browser, a contributor management system that lets one or many people work on collections, expanded search functions, and improved tools for organizing what material can be added to certain collections.”
Including cat pictures, one presumes.
Not with a bang, but with a whimper. After HP’s monstrous $10 billion acquisition of Autonomy in 2011, for which nearly everyone agreed it overpaid, it took an $8 billion writedown on the deal, a whole bunch of people threw lawyers at each other, and some of those proceedings are still dragging on.
First, there was the lawsuit by HP stockholders against HP itself. Turns out that some HP shareholders took exception to the whole sorry incident and sued, claiming current and former HP executives and directors, including CEO Meg Whitman, failed to heed warning signs about problems with Autonomy’s business, writes the Wall Street Journal.
Because that’s the way these things are done, HP is attempting to settle, but keeps being shot down by the courts, because its proposed settlements have been too nice to HP. District Judge Charles Breyer said in December that “the proposed settlement improperly protected the H-P directors, officials and professional firms from a wide swath of potential future shareholder litigation, including some suits that might not be related to the Autonomy deal,” writes the Journal.
This is after a similar decision in August, where Judge Breyer criticized an earlier version of the settlement because of the proposed fees for the shareholders’ lawyers, and a different list of protections from future lawsuits against the H-P officials and others, the Journal continues.
Hoping that the third time’s the charm, HP filed a third settlement attempt last week. If you’re just dying to look it up for yourself, it’s In Re Hewlett-Packard Co. Shareholder Derivative Litigation, 12-cv-06003, U.S. District Court, Northern District of California (San Francisco), according to Bloomberg. Reportedly, it protects the company officers – including those of both of the new companies, too – only from future lawsuits that have to do with Autonomy.
Second, there was the matter of HP suing Autonomy, which was complicated by the fact that HP is based in the U.S. and Autonomy was based in the U.K. Earlier this month, the U.K.’s Serious Fraud Office (no word on whether there’s an Insignificant Fraud Office to go with) ruled that it had closed its investigation, which it began in early 2013 following a referral from HP. “The SFO has concluded that, on the information available to it, there is insufficient evidence for a realistic prospect of conviction,” the organization reports.
Naturally, there’s still an ongoing investigation on the U.S. side, the SFO reports. The U.K. Financial Reporting Council is also still investigating, reports Bloomberg.
And in an amusing sidenote, the SFO (which has come under some criticism of its own) itself uses the Autonomy software, which the office assures us is not a conflict of interest. “Throughout the investigation we have kept the potential for conflict of interest under review,” the organization writes. “Such a conflict of interest does not exist, nor has it ever existed, and the matter played no part in any decision concerning this investigation.”
All righty then.
Heck, Autonomy’s still even listed in the Leaders section in the 2014 Gartner E-discovery Magic Quadrant.
But fear not, attorneys. The lawsuits are ongoing. Your jobs are still safe.
If you’d invested $1,000 in Box’s IPO last Friday, you’d have $1,554.29 now.
At least, as I write this. Who knows. It may have gone up another 50 percent by now.
After what seems like years (and it may be almost exactly a year, given that the initial filing was seekrit, though the formal filing was on March 24) and the on-again off-again IPO as the stock market waned and rose, the cloud storage company finally went public. After estimating it would be priced at $11 to $13 a share, the company decided on $14, but never mind; it blew through that on the first day, gaining 66 percent (after opening 44 percent higher at $20.20), and is currently plotzing around in the low $20s.
It turns out that Box needed to go public sometime this year; a $150 million funding round from last summer would have imposed fines if the company didn’t do so. (Also, TIL that Box was originally funded on poker winnings. Seems appropriate.)
On the other hand, Shawn Tully of Fortune points out that if Box had priced its offering at $20 or so in the first place, it would have made another $120 million; it chose to forgo that in favor of having a big attention-getting pop.
Well, that worked. “Call your broker immediately!” advised Mad Money host Jim Cramer, though he was thinking in terms of $18 or less per share. Still, he thinks it’s going to go higher.
So now there are two questions. The first is, can they keep it up? The second is, what next?
As far as the first question, well, that’s the rub, isn’t it? “Many remain skittish about the company’s precarious financial health and ability to compete in an increasingly crowded pool of rivals,” writes the San Jose Mercury News. “Although it has spent the last year reining in spending, Box is still burning through cash and spending far more to acquire customers — through marketing and other means — than many of those customers initially pay for Box’s services, some experts say.”
Even stock boosters point out things like, “Box is at the forefront of cloud sharing and collaboration, only rivaled by a few products from Google, Microsoft and Cisco.”
Those are big opponents to have.
Analysts say that Box’s ability to keep it up will depend not on the cloud storage service per se, but on the tools it is adding to the service. The problem with depending just on the service itself is that “larger competitors have moved in with offerings that are often significantly cheaper,” writes the New York Times.
And getting cheaper all the time. Microsoft, for example – following the lead of some of its other products, such as Internet Explorer – is offering its OneDrive service for free. And as we all know from Internet Explorer, if it’s free, it doesn’t need to be as good – just good enough.
Others expect Box to be bought. “Despite Box’s initial success on Wall Street, I remain skeptical of its long term viability as an independent company,” writes Forbes’ Kurt Marko, noting that only 10 percent of the company’s customers actually pay for the service. “In the months since my initial column on Box’s IPO, I believe events support my thesis that ‘cloud storage and file sharing isn’t a product, it’s a feature.’”
For now, there’s this: Box was valued at $2.4 billion in July, and had dropped to $1.67 billion at the opening, but after opening day was worth $2.78 billion. The company also raised at least $175 million and as much as $201.3 million if bankers exercise options to sell more shares, the Mercury News writes.
So, what next? Disappointed investors are already asking about the “next Box.” And after the company’s stellar debut, no doubt the “increasingly crowded pool of rivals” is considering its own moves. “That optimism may very well spill over to the entire sector of cloud-computing companies, which have drawn skepticism from investors over their financial vitality,” the Mercury News wrote in its followup story.
“Box is blazing a trail in terms of being the first company in this space to go public,” Anthony Foy, CEO of Workshare, a UK-based file sharing and collaboration company with an office in San Francisco, told the Mercury News. “Box going public … establishes that this is a many multi-million dollar marketplace that we are competing in.”
Ultimately, what could end up happening is that while Box itself might flame out and die, it could do so while being a trailblazer for other companies in the industry.
“If you are looking for good drive at a good value, it’s hard to beat the current crop of 4 TB drives from HGST and Seagate.” On the other hand, if you’ve got a 1.5 TB or 3.0 TB Seagate Barracuda, you might want to stop reading this, go back it up, and replace it.
That’s the conclusion from the most recent BackBlaze data about the failure rates of the 17 varieties of disk drives it uses.
BackBlaze, in case you’re not aware, is a backup service that, instead of using real real big storage, uses a whole whole lot of commodity storage devices hooked together into “pods,” with as much of the extraneous stuff stripped off as possible. This reduces costs and is more scalable than large storage systems that require forklift upgrades to expand. Companies such as Netflix use similar designs as well, and several vendors have started selling storage systems based on the Backblaze designs. While the company occasionally has trouble finding commodity disk drives, in general the system works pretty well.
Because BackBlaze uses a whole whole lot of commodity storage, it is in a good position to judge performance and failure rates of these commodity drives, as opposed to those of us who buy one or two every couple of years.
Some commenters pointed out, in various degrees of politeness, that BackBlaze is not a typical user. But absent a Consumer Reports study, a company that uses 41,213 of a thing can generally be thought of as having a reasonable idea of the quality of the thing. Or, as one commenter puts it, “Much more helpful than a guy/gal saying ‘I used this drive for a week and I give it 5 stars!’”
Plus, BackBlaze is pretty good about releasing its data in periodic blog posts. “As far as I know (and please educate me if I’ve missed one), there’s no other mass studies of hard drives that have been released to the public, naming specific brand names and models,” writes one commenter. “Google has a 2007 white paper on the topic, but like Backblaze’s, it’s based off of their data centers, plus they didn’t reveal names and models. While Backblaze’s data center doesn’t directly equate to your home PC’s usage, they have done one thing that’s super useful — gather a statistically significant amount of data in a relatively variable controlled environment.”
All that said, what about the results? At this point, BackBlaze has migrated many of its storage pods to 4.0 TB drives, writes Brian Beach, distinguished engineer. Part of this migration is due to what the company says is lower reliability of 3.0 TB drives. “The HGST Deskstar 5K3000 3 TB drives have proven to be very reliable, but expensive relative to other models (including similar 4 TB drives by HGST),” he writes. “The Western Digital Red 3 TB drives annual failure rate of 7.6% is a bit high but acceptable. The Seagate Barracuda 7200.14 3 TB drives are another story.”
Which gets back to the advice in the first paragraph. While the average failure rate of most of the disk drives the company has is in the single digits, two drives show double-digit failure rates: the 1.5 TB Seagate Barracuda 7200.11, with an average age of 4.7 years, and the 3.0 TB Seagate Barracuda 7200.14, with an average age of 2.2 years.
Frustratingly, BackBlaze doesn’t say what the problem is with the Seagate drives, indicating only that it will write about it in a future blog post. The good news is that the company reports it isn’t having the same problem with the Seagate 4.0 TB drives. “The Seagate Desktop HDD.15 has had the best price, and we have a LOT of them,” Beach writes. “Over 12 thousand of them. The failure rate is a nice low 2.6% per year.”
Seagate’s perspective is that BackBlaze is using commodity consumer drives for enterprise purposes, so naturally they’re going to fail more often. (Confirmation bias, perhaps, but commenters went on to largely concur with BackBlaze’s experience, noting also that the other drives were running under the same conditions.)
Moreover, Seagate’s 4.0 TB drives appear to be more reliable than the 3.0 TB drives, Beach adds. “You might ask why we think the 4 TB Seagate drives we have now will fare better than the 3 TB Seagate drives we bought a couple years ago. We wondered the same thing,” he writes. “When the 3 TB drives were new and in their first year of service, their annual failure rate was 9.3%. The 4 TB drives, in their first year of service, are showing a failure rate of only 2.6%. I’m quite optimistic that the 4 TB drives will continue to do better over time.”
So how did those 6 TB drives do in terms of reliability? It’s a little early to tell, Beach writes. “Currently we have 270 of the Western Digital Red 6 TB drives. The failure rate is 3.1%, but there have been only 3 failures. The statistics give a 95% confidence that the failure rate is somewhere between 0.1% and 17.1%. We have just 45 of the Seagate 6 TB SATA 3.5 drives, although more are on order. They’ve only been running a few months, and none have failed so far.”
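Backblaze doesn’t say how it computed that wide interval, and the sketch below won’t reproduce its exact numbers, but the general idea is easy to illustrate: with only three failures observed, treat failures as a Poisson process, derive an exact 95% interval for the failure count, and divide by drive-years of service. (The drive-year figure here is back-calculated from the reported 3.1% rate, so this is an illustration of the statistics, not Backblaze’s actual method.)

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def bisect_decreasing(f, target, lo, hi, tol=1e-9):
    """Find lam in [lo, hi] where the decreasing function f(lam) equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_ci(failures: int, conf: float = 0.95):
    """Exact (Garwood) confidence interval for a Poisson-distributed count."""
    alpha = 1 - conf
    # Lower bound: lam where P(X <= failures - 1) = 1 - alpha/2.
    # With zero failures the lower bound is simply 0.
    lower = 0.0
    if failures > 0:
        lower = bisect_decreasing(lambda lam: poisson_cdf(failures - 1, lam),
                                  1 - alpha / 2, 0.0, 10.0 * failures + 10)
    # Upper bound: lam where P(X <= failures) = alpha/2.
    upper = bisect_decreasing(lambda lam: poisson_cdf(failures, lam),
                              alpha / 2, 0.0, 10.0 * failures + 10)
    return lower, upper

# 3 failures at a reported 3.1% annual rate implies roughly
# 3 / 0.031 ~ 97 drive-years of service across the 270 WD Red 6 TB drives.
drive_years = 3 / 0.031
lo, hi = poisson_ci(3)
print(f"95% CI for annual failure rate: {lo / drive_years:.1%} to {hi / drive_years:.1%}")
```

Whatever the exact method, the lesson is the same one Beach is gesturing at: three failures is far too few to pin down a rate, which is why the interval is so wide.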
That said, because Western Digital drives use a little less electricity – “This small difference adds up when you place 45 drives in a Storage Pod and then stack 10 Storage Pods in a cabinet,” notes director of cloud storage Andy Klein – and load a little faster, the company is primarily going to migrate to the Western Digital 6 TB drives. However, it will still buy some of the Seagate ones for diversification purposes, he adds.
Meanwhile, the company is already testing pods made of 8 TB drives.
Disclaimer: I am a BackBlaze customer.
A Democratic Senator from Oregon is attempting to prevent government agencies from requiring vendors to build “back doors” into their software and electronic products by playing two kinds of security fears against each other. Sen. Ron Wyden introduced the Secure Data Act earlier this month.
People supporting such “back doors” say they are necessary to help protect Americans from terrorists and other criminals. FBI director James Comey, among other law enforcement officials, called for them after vendors such as Apple and Google implemented encryption on their smartphones by default. But Wyden is saying such “back doors” also make it easier for hackers to break in – an increasingly major issue in the past year.
And Wyden isn’t just speculating about the possibility; he cited an incident in 2005 where “an unknown entity had exploited a ‘lawful intercept’ capability built into Greek cellphone technology and had used it to listen to users’ phone calls” — including those of dozens of senior government officials.
“Unfortunately, there are no magic keys that can be used only by good guys for legitimate reasons,” Wyden wrote in an op-ed supporting the bill. “There is only strong security or weak security.”
“Security is a lot like a ship at sea,” agreed Alan McQuinn, a research assistant with the Information Technology and Innovation Foundation, in a blog post in The Hill. “The more holes you put in the system—government mandated or not—the faster it will sink.” Just a few years ago, the FBI was encouraging Americans to use encryption to better protect their data, he noted.
Another major issue in the past year has been revelations about agencies spying on Americans, which Wyden said is eroding trust in the government. “Strong encryption and sound computer security is the best way to keep Americans’ data safe from hackers and foreign threats. It is the best way to protect our constitutional rights at a time when a person’s whole life can often be found on his or her smartphone. And strong computer security can rebuild consumer trust that has been shaken by years of misstatements by intelligence agencies about mass surveillance of Americans,” he said in a statement.
Requiring back doors would also make U.S. companies less able to sell their products outside the U.S., Wyden noted. This could exacerbate problems that vendors such as cloud storage companies are already having outside the U.S. due to agencies using the courts to claim access to such data, even when it’s outside the U.S.
Wyden isn’t alone. The Hill noted that there was bipartisan opposition to Comey’s proposal, which he said didn’t call for a back door but a “front door with clarity and transparency.” But security experts dismissed that as a semantic difference. “The notion that it’s not a backdoor; it’s a front door — that’s just wordplay,” Bruce Schneier, a computer security expert and fellow at the Berkman Center for Internet & Society at Harvard University, told The Hill. “It just makes no sense.”
Nothing happened with the bill in the lame duck Congress, but Wyden reportedly expects to introduce it in the new Congress in 2015. Lily Hay Newman notes in Slate, however, that such bills have typically faced an uphill battle. For example, a similar measure was passed on the House side earlier this year, but funding for it was stripped from the “cromnibus” bill. That measure, too, is expected to be reintroduced next year.
Moreover, the Secure Data Act doesn’t prohibit back doors—it just prohibits agencies from mandating them, Newman writes. “There are a lot of other types of pressure government groups could still use to influence the creation of backdoors, even if they couldn’t flat-out demand them.” There are other weaknesses in the bill as well, notes the Electronic Frontier Foundation.
On the other hand, this isn’t Wyden’s first cybersecurity rodeo; he also essentially singlehandedly killed two bills in the past several years that the computer industry said could give the government too much control over the Internet, as well as worked on other Internet control issues.
Microsoft is continuing its fight with the U.S. government regarding access to data located on the company’s servers outside the U.S. And this time, it brought some friends.
Twenty-eight major companies, including Microsoft competitors such as Apple, Amazon, and AT&T (but not Google, surprisingly enough), filed friend of the court briefs on December 15, after Microsoft formally appealed the ruling on December 8. Other organizations filing briefs of support include the U.S. Chamber of Commerce, CNN, ABC, Fox News, the Guardian, and Verizon.
Altogether, ten briefs were signed by 28 leading technology and media companies, 35 leading computer scientists, and 23 trade associations and advocacy organizations “that together represent millions of members on both sides of the Atlantic,” noted Microsoft legal counsel Brad Smith. Signatories also included nonprofit organizations such as the Center for Democracy & Technology, the American Civil Liberties Union, the Electronic Frontier Foundation, the Brennan Center for Justice at New York University School of Law, and the Berkman Center for Internet & Society at Harvard.
If upheld, the decision “allows the government to adopt a ‘seize first, search later’ view of the Fourth Amendment, where the government can seize a computer, copy all of its data, and keep that information indefinitely—without a search warrant at all,” writes the EFF in explaining its support.
Why do news organizations such as CNN and ABC care? Because they want to protect their reporters and sources, Smith writes. “These organizations are concerned that the lower court’s decision, if upheld, will erode the legal protections that have long restricted the government’s ability to search reporters’ email for information without the knowledge of news organizations,” he writes.
In addition, the Irish government also stepped in, saying the ruling violated its sovereignty, as did a German representative to the European Parliament.
In case you’ve missed it, a judge ruled in May that a search warrant served on Microsoft also applied to data on servers in its data center in Dublin, Ireland. (The exact person and crime have not been revealed, but the case is reportedly drug-related.) Microsoft is protesting this ruling. Another U.S. judge reiterated this decision in August.
There’s more than just data at stake. The ruling means that the U.S. government lays claim to any data owned by a U.S. company, no matter where in the world it is located — such as in the cloud on servers in another country. This has the potential to conflict with privacy laws in other countries, and makes it a lot less likely that customers outside the U.S. will be willing to put their trust in U.S.-based cloud companies. In addition, it opens the door for non-U.S. governments to make their own data demands of companies operating within their borders.
Microsoft’s appeal wasn’t a surprise; in fact, the company had said in May that it intended to appeal the decision. Several other U.S. companies had also announced their support of Microsoft in August, since the decision has such wide-ranging effects.
The notion of data sovereignty has been discussed for several years, and in fact Microsoft’s Dublin data center had been specifically cited as an example before this case came up. “Microsoft, like other cloud providers, will need to clarify data sovereignty issues, if Office Live is to be taken seriously,” wrote Computer Weekly presciently in June 2011. “While it does have a datacentre in Dublin – so it can guarantee data resides in the EU – Microsoft is headquartered in the US and will be subject to US legislation, such as Homeland Security, as well as UK and EU law.”
Ireland, in its brief, indicated that it wasn’t unwilling to grant the U.S. government access to the data in question, but that the mechanism for doing so was the Mutual Legal Assistance Treaty (MLAT) between Ireland and the United States, and that it was up to the U.S. to ask first, not Ireland to stop the U.S. from taking the data. “Ireland respectfully asserts that foreign courts are obliged to respect Irish sovereignty (and that of all other sovereign states) whether or not Ireland is a party or intervener in the proceedings before them,” the brief warned, before going on to hint, “Ireland would be pleased to consider, as expeditiously as possible, a request under the treaty, should one be made.”
In addition, Jan Philipp Albrecht, a Member of the European Parliament (“MEP”) from Germany, filed his own brief urging the U.S. to use the MLAT mechanism, and warning that failing to do so could make it more difficult for European and U.S. companies to work together.
“European citizens are highly sensitive to the differences between European and U.S. standards on data protection. Such concerns are frequently raised in relation to the regulation of cross-border data flows and the mass-processing of data by U.S. technology companies,” Albrecht writes. “The successful execution of the warrant at issue in this case would extend the scope of this anxiety to a sizeable majority of the data held in the world’s datacenters outside the U.S. (most of which are controlled by U.S. corporations) and would thus undermine the protections of the EU data protection regime, even for data belonging to an EU citizen and stored in an EU country.”
Legal argument is expected this spring or summer, according to the EFF.
Political wonks got something to salivate over earlier this month when former Florida governor and presumed Republican presidential candidate Jeb Bush announced that in the spring he intended to release some 250,000 email messages from his time as Florida governor, to demonstrate his transparency. Whether it’s actual transparency, or using a firehose to produce a document dump in hopes that the Internet will be so overwhelmed that they won’t find the good stuff, remains to be seen.
Not to mention overwhelmingly bored. Can you imagine being a reporter and having to read 250,000 email messages? Think of how banal most of your email communication is. Snore.
This isn’t a political blog, so we’re not going to go into the political ramifications of why Bush might be doing this and what might be in there. It is, however, a technical blog, so we’re going to talk about some of the technical issues around this that the, perhaps, less-technical reporters don’t appear to have brought up yet and may not be aware of. It’s certainly an interesting announcement, particularly in contrast with some of the other governors who have behaved badly with their email. At the same time, it’s important that we not be so dazzled by the announcement that we fail to give it due diligence.
- Will he really release all of the email? Bush was governor of Florida from 1999 to 2007. Granted, that was a while ago, before everyone and his brother (no pun intended) used email for everything, but even so, 250,000 messages for that many years sounds awfully small. Shoot, I’ve got more than that in my Gmail account. Bush did say “all,” but is he curating them in some way? Is this really going to be every-every-everything? And if not, is there going to be a mechanism for getting every-every-everything? Or is everything else gone? And given that some governors have wiped their systems when they left office – and even bought all the disk drives of the office computers – how is it that the Florida email messages from 1999 are still around?
- Did he ever use any unofficial or personal email address? As we’ve learned, governors from states such as Wisconsin and Alaska – not to mention the current Governor of Florida — have used personal email addresses for state business, potentially to avoid having their statements retrieved later. To what degree did Governor Bush do that?
- What format will it be in? As of yet, none of the articles have indicated what email system was in use in the Florida governor’s office at the time. Are we talking about a decade-old .pst file? Bush indicated it would be on a website; what will the interface to it be? Will people be able to look at the entire corpus at once, only one message at a time, or what? Will it be threaded? Several of the articles quoted Bush saying that some of the email was funny, some of it was sad, and some of it was serious; is it going to be available under categories such as Funny, Sad, or Serious rather than by date or other subject? How it’s set up could certainly limit the degree to which people could use it – though presumably there’ll be a crowdsourcing army to read the messages in any way they’re provided and let the world know if there’s anything juicy.
- Will it be full-text searchable? Anybody who’s tried to look at some older documents, such as some PDFs, has run into times when the document is saved only as an image. That’s going to make it pretty tough to do any sort of reasonable search – not to mention answering questions like, “What word does Jeb Bush use in his email the most?”
- So, where is this email now? Who owns it? Who controls it? It’s not the Governor himself, is it? Presumably he just got a copy? Is there someone who’s going to be able to confirm that yes, the release is an accurate and complete release of what the state has?
- How is it that the Governor has it in the first place? Did he just go, welp, leaving office, let me just burn a copy of eight years of my email to a thumb drive and I’ll be on my merry way? Did the state save it and he asked them a while back – and when was that? – for a copy? In what sort of format does he have it, and what sort of searching mechanisms does he have? (The Chicago Tribune quoted Florida legal experts as saying the state’s public-records law requires the release of Bush’s emails and other correspondence as governor, “with few exceptions,” on which it didn’t elaborate.)
- Is personal information going to be redacted? One of the things that makes it difficult for governments to release email is that it can include personally identifiable information. Is someone going through all 250,000 email messages and putting big black marks through things? If not, isn’t that kind of risky? If so, then how do we know what sort of information we’re going to be missing?
- Is the metadata going to be in there? Are we going to be able to see headers? Email addresses? Datestamps? IP addresses? What are the chances that this is going to get anybody else in trouble?
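To make the metadata question above concrete, here’s a small sketch of what lives in an email’s headers, using Python’s standard-library email parser. The message text is entirely made up for illustration; it is not from the actual release.

```python
# Illustrative sketch: the "metadata" at issue -- sender addresses,
# datestamps, relay IP addresses -- lives in the message headers,
# which any standard email parser can pull out.
from email import message_from_string
from email.utils import parsedate_to_datetime

# A fabricated example message, not real correspondence.
raw = """\
Received: from mail.example.gov (192.0.2.10) by hub.example.gov
Date: Mon, 04 Jan 1999 09:15:00 -0500
From: "Office of the Governor" <governor@example.gov>
To: staff@example.gov
Subject: Budget question

Can we meet this afternoon?
"""

msg = message_from_string(raw)

sender = msg["From"]                           # sender's address
sent_at = parsedate_to_datetime(msg["Date"])   # datestamp, parsed
received = msg["Received"]                     # relay hop, with an IP address

print(sender)
print(sent_at.isoformat())
print(received)
```

Whether a release preserves these headers (or strips them) determines whether questions like "who really sent this, and when?" can be answered at all.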
If nothing else, this certainly sets a high bar for other government officials, particularly Presidential candidates who have served as governors, regarding their release of information in the future, pundits note.
If you’re anything like me, between your various smartphones, cameras, video cameras, and tablets – not to mention all the similar devices owned by various members of your family or your coworkers – you’re collecting quite a little pile of micro SD cards. I’m old enough to still be charmed that I can spend $15 and get 32 gigabytes in something the size of my pinky nail. (And if you want to spend more money, you can get 128 GB for $100.) Why, I remember when a 10-megabyte drive was the same size as my desktop computer and cost the same amount.
Sorry. Where was I? Have you seen my cane?
Anyway, I’m not the only person who’s accumulating a pile of the things, and I expect I’m not the only one who lives in terror that the cat will knock over the pile or we’ll accidentally vacuum one up. And if you poke around message boards, it’s pretty easy to find professionals who are dealing with the same problem. For example, filmmakers who are trying to put together a movie end up with a handful of them, especially if they’d like to look at files on more than one at a time. Plus, the teeny tiny disks are too small to label, and sometimes you really don’t want to mix up some pictures with what you’re going to show to the PTA, know what I mean?
It turns out that there are some cases for the things, so you can at least store them in an adorable wee little briefcase rather than having them in an easily-knocked-over saucer or something. There’s also a manufacturer who makes a plastic case the size of a credit card that can hold ten of the things, plus an SD card adapter. Moreover, the cases come in colors, so you can put all the music on ones in a blue case and all the movies on ones in a red case, or whatever.
Now we’re talking.
But why stop there? We can’t label the micro SD cards themselves the way we used to with 3 ½” floppy disks. So why hasn’t some enterprising manufacturer started making the micro SD cards themselves in colors? Green for the CFO and red for the CIO and blue for the CEO, say? In an era when you can order an entire bag of green M&Ms if that’s what floats your boat, why do our micro SD cards mostly have to be black?
And while we’re suggesting things, what would it take to be able to read and write data to and from multiple micro SD cards at once? There are a couple of devices that give you access to up to four micro SD cards at the same time, but there doesn’t seem to be more than that. And apparently they work great if you’re trying to copy identical data to four micro SD cards at a time. But reports on these devices note that because they share a single input to your computer, reading and writing is slow, and some of them don’t let you swap cards in and out, so if you’ve got data on five or six cards, you’re hosed.
Come on, storage industry. I’m counting on you.
We wrote in February about the notion of legal and discovery issues around wearable technology such as Google Glass and smartwatches, and as an aside joked, “Not to mention the wealth of data preserved by a Fitbit.”
Little did we know.
In what is being billed as the first case of its kind, an attorney is using data from a client’s Fitbit as evidence of the client’s disability in a lawsuit.
For a client’s accident injury claim, the McLeod Law Office will start processing data from their client’s Fitbit to show that her activity levels are now below the baseline for someone of her age and profession, demonstrating her injury, according to Forbes. Lawyers at the firm say they’re working on other similar cases as well.
And that’s just the beginning, attorneys say. In the same way that legal teams now comb Facebook pages looking for evidence that a supposedly disabled person is actually secretly swimming or hiking, they expect Fitbits and similar devices to be used to demonstrate that the person isn’t so sick or disabled as all that. The development could see insurance companies, for example, insisting that claimants undergo assessment via fitness tracker, Samuel Gibbs writes in the UK newspaper The Guardian.
“Privacy considerations aside—and there are many—wearables are yet another example of how technology may be a gold mine of potentially relevant ESI [electronically stored information] for use in litigation,” attorney Neda Shakoori wrote in August.
“Wearables data could just as easily be used by insurers to deny disability claims, or by prosecutors seeking a rich source of self-incriminating evidence,” writes Kate Crawford in the Atlantic. But there’s more. “In America, the Fifth Amendment protects the right against self-incrimination and the Sixth Amendment provides the right in criminal prosecutions ‘to be confronted with the witnesses’ against you,” she continues. “Yet with wearables, who is the witness? The device? Your body? The service provider? Or the analytics algorithm operated by a third party? It’s unclear how courts will handle the possibility of quantified self-incrimination.”
Not to mention, will legal firms be able to subpoena your cloud provider if that’s where your fitness data is stored? How much are they going to fight to protect you? If it’s stored in your phone, will you need to provide your password? And how does this all fit into HIPAA and other health information privacy rules? (Apple has already said, for example, that health data can’t be stored on the iCloud and that any health data on an iPhone has to be encrypted.)
“It might seem odd at first (particularly to non-tech savvy judges), but it’s no different than any other type of e-discovery that has come before,” writes attorney Keith Lee in Above the Law. “Another question is how will companies like Fitbit respond. There is no mention of providing access to your data in response to a legal inquiry in Fitbit’s Terms of Service. The client in the above matter is voluntarily providing her data to help her case, but what happens when Fitbit is subpoenaed to provide data? Will they push back, citing user privacy, or immediately comply?”
Two other factors make this issue even more problematic. First, wearables aren’t necessarily consistent in how they track your activity. “The Jawbone UP, Nike Fuelband, Fitbit, and Withings Pulse all have their own peculiarities in how they work: Some will count moving your arms around as walking (which is great if you want writing to count as exercise), others can’t easily register cycling as activity,” Crawford writes. “This ‘chaos of the wearable’ might be merely amusing or frustrating when you’re using the data to reflect on our own lives. But it can be perilous when that data is used to represent objective truth for insurers or courtrooms.”
Second, as in the McLeod case, not just the raw data is being used. “Now that data is being further abstracted by analytics companies that create proprietary algorithms to analyze it and map it against their particular standard of the ‘normal’ healthy person,” Crawford adds.
This also, as we discussed earlier, hypothetically makes Fitbits subject to discovery, meaning you can get in trouble for wiping or otherwise failing to preserve the data on it.
Nobody’s saying that people will be required to wear a Fitbit as proof of fitness, or lack thereof. Yet. But in the same way that some car owners and renters are required to install a GPS unit on their vehicle, could it be far behind?
The Department of Homeland Security announced, in a very low-key way, on November 19 that it was planning to delete “Master files and outputs of an electronic information system which performs information technology infrastructure intrusion detection, analysis, and prevention.” It gave people until December 19 to ask for copies of the plan, following standard National Archives and Records Administration protocol. After requesters receive their copies, they have 30 days to comment.
According to Nextgov, what the agency is looking to delete are records more than three years old from its Einstein network monitoring system, which is intended to help DHS cybersecurity experts look for malware such as Heartbleed in government networks. This is making some security people happy, because they are concerned about the government keeping all these records – and making other security people sad, because they wonder if the government is trying to hide something by deleting them.
“As a general matter, getting rid of data about people’s activities is a pro-privacy, pro-security step,” Nextgov quoted Lee Tien, senior staff attorney with the Electronic Frontier Foundation, as saying. But “if the data relates to something they’re trying to hide, that’s bad,” he continued.
DHS says it wants to delete the data because since it’s three years old, it’s not useful anymore. (The agency still keeps incident reports.) Others disagree. “Some security experts say, to the contrary, DHS would be deleting a treasure chest of historical threat data,” writes Nextgov’s Aliya Sternstein. “And privacy experts, who wish the metadata wasn’t collected at all, say destroying it could eliminate evidence that the governmentwide surveillance system does not perform as intended.”
What’s causing some people to feel suspicious is that the rationale the agency is using to delete the data is the cost, which it estimates at $50 per month per terabyte. Given that you can get a 1-terabyte drive from Staples for less than that these days (yes, we know, there’s more to it than the hardware cost), this seems…excessive. On the other hand, some people are wondering just how much data DHS must have for that rate to add up to a significant amount of money.
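A quick back-of-envelope calculation shows how the per-terabyte rate could compound. DHS hasn’t disclosed how much data Einstein holds, so the 1-petabyte volume below is purely an assumed illustration:

```python
# Back-of-envelope check on the $50/TB/month figure.
# The actual volume DHS holds is not public; 1 PB is a hypothetical.
rate_per_tb_month = 50    # dollars, from the DHS estimate
assumed_volume_tb = 1024  # 1 petabyte, assumed for illustration
months_retained = 36      # the three-year window being deleted

annual_cost = rate_per_tb_month * assumed_volume_tb * 12
three_year_cost = rate_per_tb_month * assumed_volume_tb * months_retained

print(f"${annual_cost:,} per year")        # $614,400 per year
print(f"${three_year_cost:,} over 3 years")  # $1,843,200 over 3 years
```

At a petabyte, the three-year bill would run to seven figures – real money, though whether that justifies deletion is exactly what the skeptics are questioning.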
Data to be deleted includes email, contact and other personal information of federal workers and public citizens who communicate concerns about potential cyber threats to DHS; intrusion detection data; intrusion prevention data; analysis data such as files from the U.S. Computer Emergency Readiness Team (CERT); and a catch-all “information sharing” including data from white papers and conferences, Nextgov reports.
So what is Einstein? It is the result of automated processes that collect, correlate, analyze, and share computer security information across federal U.S. civilian agencies, according to BiometricUpdate. “By collecting information from participating federal government agencies, ‘Einstein’ builds and enhances cyber-related situational awareness,” writes Rawlson King. “The belief is that awareness can assist with identifying and responding to cyber threats and attacks, improve the government’s network security, increase the resiliency of critical, electronically delivered government services, and enhance the survivability of the Internet. The program provides federal civilian agencies with a capability to detect behavioral anomalies within their networks. By analyzing the data and detecting these anomalies, the ability to detect new exploits and attacks in cyberspace are believed to be greatly increased.”
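Einstein’s actual detection methods aren’t public, but the general idea of “detecting behavioral anomalies” can be sketched generically: establish a baseline of normal activity, then flag observations that deviate sharply from it. The z-score approach and the traffic numbers below are purely illustrative, not how Einstein works:

```python
# Toy illustration of behavioral anomaly detection of the kind described:
# flag traffic volumes far outside an observed baseline.
# Generic z-score sketch with made-up numbers -- not Einstein's method.
from statistics import mean, stdev

baseline = [100, 98, 103, 97, 101, 99, 102, 100]  # bytes/sec, fabricated
mu, sigma = mean(baseline), stdev(baseline)

def is_anomalous(observation, threshold=3.0):
    """True when the observation lies more than `threshold`
    standard deviations from the baseline mean."""
    return abs(observation - mu) / sigma > threshold

print(is_anomalous(101))  # ordinary traffic -> False
print(is_anomalous(500))  # sudden spike    -> True
```

Deleting three years of the baseline data is what has some experts calling it a “treasure chest” of historical threat information: without history, there’s less to compare new activity against.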
That said, this is all happening against a background of other changes in DHS involving cybersecurity that are making some people nervous.
- Brendan Goode, the director of the Network Security Deployment division in the Office of Cybersecurity and Communications (CS&C) who built the Einstein system, announced earlier in November that he was leaving for the private sector, according to Federal News Radio. While his last day was scheduled to be November 21, he hadn’t yet announced where he was going, nor has he updated his LinkedIn page.
- First set up in 2004, Einstein is now on its third implementation; it has agreements with 15 of the 23 agencies expected to sign up for it (out of nearly 600 agencies in total, according to RT.com) and actual implementations at 9 of them, all at a cost of hundreds of millions of dollars.
- Due to incidents such as Heartbleed — where DHS had to wait up to a week for agency approvals, all while news of the vulnerability was out in the wild — the DHS now has the authority, as of October, to proactively monitor federal networks for vulnerabilities without having to wait for agency permission. “Agencies must provide DHS with an authorization for scanning of Internet accessible addresses and systems, as well as provide DHS, on a semiannual basis, with a complete list of all internet accessible addresses and systems, including static IP addresses for external websites, servers and other access points and domain name service names for dynamically provisioned systems. Agencies must give DHS at least five days advanced notice of changes to IP ranges as well. Further, agencies must enter into legal agreements for the deployment of DHS’s EINSTEIN monitoring system, provide DHS with names of vendors who manage, host, or provide security for Internet accessible systems, including external websites and servers, and ensure that those vendors have provided any necessary authorizations for DHS scanning of agency systems,” summarized FedWeek.
- On the other hand, contractor vendors aren’t exactly leaping to be included.
It isn’t clear how much DHS was hoping that this would all be lost in the shuffle around the holidays. Presumably organizations such as the EFF and Nextgov have filed requests for the plans, and will follow up. If it’s the sort of thing you might feel the need to comment on, however, it might be a good idea to make your own request, if comments are limited to people who request the documents.