December 27, 2012 7:04 PM
Posted by: Sharon Fisher
IBM announced earlier this month that it was acquiring StoredIQ, but exactly what the company does isn’t quite obvious. Part big data, part e-discovery, it’s sort of neither fish nor fowl.
As an example, analyst firm IDC included the Austin, Texas-based StoredIQ in its IDC MarketScape: Worldwide Standalone Early Case Assessment Applications 2011 Vendor Analysis, but Gartner hasn’t included it in either of its e-discovery Magic Quadrants — from which a number of larger vendors have plucked other acquisitions. (However, Gartner did name StoredIQ as a “Cool Vendor” in April of this year.)
Instead, IBM is working on creating a family of “information lifecycle management” applications, which are kinda both — big data, because they cover all of an organization’s data, but also e-discovery, because part of the reason for having such applications is litigation support, both to identify data needed in legal situations and to help reduce, in a legally justifiable way, the amount of such data in the first place.
StoredIQ’s advantage is that it manages the data in situ rather than by moving it to a secondary location, which saves the cost of the secondary storage, noted Zacks Equity Research, adding that the company had received $11.4 million in funding in August and had 120 clients — though it warns that IBM faces competition from vendors such as EMC, Oracle, and SAP.
The company has also been working to make its product, which includes software and an appliance, easy enough for even legal professionals to use, rather than requiring IT people to operate. In addition, it has partnered with a wide variety of other vendors over the years, including NetApp, EMC, and NewsGator, and supported a number of formats, including SharePoint and Office 365.
As big data has become more prevalent, companies are interested in saving their data in hopes of being able to analyze it at some point and improve their businesses. But what it calls data hoarding is a problem for two reasons, notes Law Technology News.
First, there’s the cost. Though the price of storage itself has been dropping, it still costs something, plus there’s the cost of managing it, backing it up, and so on — which could amount to $5,000 per terabyte, Law Technology News said.
Second, there’s the legal cost. Should an organization be sued, it not only needs to provide all the pertinent information that the other side asks for, but it has to find it in the first place — and the more data a company has, the more expensive that search is. Also, companies have to balance the value of the data for analysis with what it might cost them should it reveal something in a lawsuit. This cost is on the order of $15,000 per gigabyte, Law Technology News said.
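To see how lopsided those two figures are, here is a rough back-of-the-envelope sketch. The two rates are the ones quoted from Law Technology News; the 1-terabyte data set, the decimal units (1 TB = 1,000 GB), the function names, and the idea that every gigabyte ends up in discovery are all assumptions made purely for illustration.

```python
# Rough cost comparison using the figures quoted from Law Technology News.
# Assumptions (not from the article): decimal units, a hypothetical 1 TB
# data set, and a worst case in which all of it is swept into discovery.

STORAGE_COST_PER_TB = 5_000   # dollars per terabyte to store and manage
LEGAL_COST_PER_GB = 15_000    # dollars per gigabyte in a lawsuit
GB_PER_TB = 1_000             # decimal convention


def storage_cost(terabytes):
    """Cost of simply keeping the data around."""
    return terabytes * STORAGE_COST_PER_TB


def legal_cost(terabytes):
    """Cost if all of that data has to go through e-discovery."""
    return terabytes * GB_PER_TB * LEGAL_COST_PER_GB


if __name__ == "__main__":
    tb = 1  # hypothetical data set size
    print(f"Storage: ${storage_cost(tb):,}")
    print(f"Legal:   ${legal_cost(tb):,}")
    print(f"Ratio:   {legal_cost(tb) // storage_cost(tb):,}x")
```

Under those assumptions the legal exposure dwarfs the storage bill by a factor of thousands, which is the point the lawyers are making when they tell companies to delete what they don't need.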
In fact, legal organizations have been advising companies to look for opportunities to delete data, pointing out how much money they can save. However, they have to do this in a regular fashion, because once a lawsuit is filed, a “legal hold” is put on the data and it can’t be deleted, or a company is subject to large fines.
The acquisition becomes part of IBM’s Information Lifecycle Governance suite, headed by Deidre Paknad, vice president of Information Lifecycle Governance. Paknad had been CEO of PSS Systems Inc. in Mountain View, Calif., a pioneer in the e-discovery space, which itself was acquired by IBM in 2010. The group also includes Vivisimo, which IBM acquired earlier this year.
The acquisition was not a surprise; IBM had partnered with StoredIQ for two years. As is typical for IBM, it did not reveal the cost of the acquisition. It is expected to be finalized in the first calendar quarter of 2013.
December 20, 2012 7:59 PM
Posted by: Sharon Fisher
data storage, law enforcement
It shouldn’t be any surprise in this incident, about which nothing makes any sense, but it isn’t clear what the status is of Adam Lanza’s computer hard drive, which was smashed/damaged/destroyed by a hammer/screwdriver/sharp object that left data on it irretrievable/able to be recovered, according to which publication you read and which data forensics expert they consulted.
Here’s a breakdown of the issues involved.
Was the disk drive solid-state or traditional spinning disk? There has been increasing use of solid-state drives in computers, either due to interest in improved performance or in reaction to last year’s Thai flooding, which damaged a number of hard disk manufacturing plants and made spinning disk storage more scarce and expensive.
What’s the difference? While both kinds of drive are susceptible to damage — as anyone who’s lost a drive by dropping it knows — solid-state drives are even more susceptible to damage.
“Many SSD hard drive failures are in fact unrecoverable,” writes The Inquisitor. “If the remapping tables that keep track of data in memory cells get trashed the data is effectively randomized and mixed up with data blocks which were marked as corrupted and unusable even before the SSD failed. Many SSD models also come with internal encryption that will make the lives of data forensics experts difficult.”
If it was a spinning disk, how was it damaged? For the sake of argument, though, let’s assume it’s a traditional spinning disk drive. Then the question becomes, how was it damaged? Neither reporters nor crime investigators are necessarily computer experts, and the descriptions of the damage have been vague — they don’t even specify whether Lanza had a desktop or a laptop.
Some reports indicate that Lanza removed the hard drive from the computer before damaging it, which would make it more likely that the drive itself would actually have sustained damage.
But because the platters in the hard drives that hold data are so sensitive, manufacturers tend to do what they can to protect them. Consequently, depending on how the hard drive was damaged, the platters inside could have been anything from undamaged to shattered.
How could the data be retrieved from the damaged hard drive? There are all sorts of third-party data recovery services, and chances are the FBI — which has plenty of forensics chops itself — is talking to all of them about the best way to retrieve data from whatever remains of the platters, as well as, more than likely, the manufacturer of the drive itself. Even if the platters were shattered, they could conceivably be reassembled and at least partially read.
“The level of detail they can rip out of systems these days seems incomprehensible to most people,” Rob Lee, a forensic specialist who has examined computers seized from terrorists for the U.S. intelligence community, told the Washington Post, which wrote in detail about the various ways data could be recovered. Even data from the crashed space shuttle Columbia was nearly 100% recoverable, the article noted.
Is the data available anywhere else? Even if all the data on the drive itself is irretrievable, it might be available elsewhere, ranging from a backup, to a synchronization service such as Dropbox, to obtaining copies of data and other information from sources such as Lanza’s Internet service provider, email services such as Google, or his online gaming records.
“Many e-mail providers, such as Yahoo and Google, store data on their servers for a period of time, meaning that police might be able to subpoena Lanza’s provider for access to whatever data they have,” writes the Christian Science Monitor. “Google also stores information about users’ searches and other online activity indefinitely, although it anonymizes IP addresses after 9 months, making it impossible to tell what a given user was doing online prior to that time.”
While there has been increasing concern from civil liberties organizations about the amount of information that services collect and to which law enforcement organizations have access, in this particular case, it may be our best hope in trying to make some sort of sense of this tragedy.
What it takes is enough motivation and the right equipment — and the F.B.I. has both, writes Popular Mechanics.
December 12, 2012 9:13 PM
Posted by: Sharon Fisher
flash drives, memory stick, thumb drives
Somewhere along a long, nondescript brick wall, there’s a little spot that’s different from the rest. Poking out from the rough surface of the wall is the half-inch extension of a USB flash drive. You connect it to your computer, upload or download files, and you’re on your way, with no one the wiser.
We learned about “dead drops” (at least, those who didn’t know about them already) a few weeks ago when General Petraeus got caught exchanging messages with his mistress by leaving messages in draft form in a shared Gmail account. But there’s another kind that offers a lot more possibilities — and risks.
It all started in October, 2010, when Berlin-based media artist Aram Bartholl came up with the idea as an art project: Install a USB flash drive in a wall, and people could freely upload and download art from it. He started out with five USB dead drops in New York, and posted a website with instructions, including an instructional video.
“Dead Drops is an anonymous, offline, peer to peer file-sharing network in public space,” reads the Dead Drop Manifesto. “Anyone can access a Dead Drop and everyone may install a Dead Drop in their neighborhood/city. A Dead Drop must be public accessible. A Dead Drop inside closed buildings or private places with limited or temporary access is not a Dead Drop. A real Dead Drop mounts as read and writeable mass storage drive without any custom software. Dead Drops don’t need to be synced or connected to each other. Each Dead Drop is singular in its existence. A very beautiful Dead Drop shows only the metal sheath enclosed type-A USB plug and is cemented into walls. You would hardly notice it. Dead Drops don’t need any cables or wireless technology. Your knees on the ground or a dirty jacket on the wall is what it takes share files offline. A Dead Drop is a naked piece of passively powered Universal Serial Bus technology embedded into the city, the only true public space. In an era of growing clouds and fancy new devices without access to local files we need to rethink the freedom and distribution of data.”
The idea exploded, and soon there were USB flash drives poking out of walls (and dogs) all over the world. Srsly, there’s more than 1100 of the things out there, according to the most recent map, ranging from New York to Toronto (where it contains porn and recipes) to New Zealand. (And those are just the public ones.) There’s also apps to tell you where Dead Drops are, as well as a Flickr set and a Twitter feed. (In addition, there’s wireless ones and DVD ones being set up as well.)
Certainly the serendipity of these little data glory holes is high. It’s basically superduper high-tech geocaching. Just think of the data, good and bad, that could be exchanged: Pictures, movies, building plans for terrorists, porn, Anonymous plans, Wikileaks data… They’re even being used to generate fiction. Honestly, I’m surprised it hasn’t shown up in a Will Smith movie yet.
Needless to say, the whole process, like any USB stick, is fraught. What keeps people from downloading something like a virus (which was raised as a concern almost immediately) or child porn onto their laptops? (I cringe every time I see a picture of someone with their laptop plugged into one of these things, and hope that at least it’s a junk laptop devoted to the purpose.)
For that matter, what keeps someone from uploading a virus, and from there spreading it around the world? Recall that the Stuxnet virus was spread through USB flash drives enticingly scattered around. Set up something like this at Burning Man with a virus and you could shut down all of Silicon Valley by mid-September.
On the other hand, in a day and age where governments are shutting down the entire Internet in their countries, the notion of a way for rebels to exchange information in this clandestine way sounds pretty darn cool. What a great way for Mr. Phelps to get information — though of course you’d have to make sure that the government hadn’t set up its own USB dead drop to try to catch you. Or for people trapped in a country to get information outside the country — post a code message to Twitter and wait for someone with a tablet and a USB port to come along.
Or maybe I’ve just seen Red Dawn too many times.
December 5, 2012 3:20 PM
Posted by: Sharon Fisher
cloud storage, file sharing
In its list for 2013, IDC has predicted that the cloud file-sharing company Dropbox will be acquired next year.
“Dropbox will be acquired by a major enterprise infrastructure player,” the company wrote. “In another sign that “consumerization” doesn’t mean mimicking consumer technologies in the enterprise but actually acquiring and/or integrating with widely adopted consumer offerings in the enterprise, IDC predicts that Dropbox will be acquired by a major enterprise infrastructure player in 2013. This will certainly be an expensive acquisition, but it will be one that brings an enormous number of consumers (many of whom are also employees), and a growing number of ecosystem partners, along with Dropbox’s technology.”
“Expensive” is putting it mildly; a $250 million Series B funding round last fall gave the company a $4 billion valuation, which is expected to be even higher now (though GigaOm still thinks the market is small). Only a major enterprise infrastructure player would be able to afford it.
Part of what makes this prediction interesting is that a Dropbox IPO has been rumored — and highly anticipated — since last year. Dropbox founder and CEO Drew Houston had reportedly received a nine-figure acquisition offer from Apple early on, Forbes reported last year, but turned it down because he wanted to run a big company — though he sounded at the end of the article as though he might be reconsidering that.
As he walked out of [Facebook founder Mark] Zuckerberg’s relatively modest Palo Alto colonial, clearly enroute to becoming the big company CEO he had told Steve Jobs he would be, Houston noticed the security guard parked outside, presumably all day, every day and pondered the corollaries of the path: “I’m not sure I want to live that life, you know?”
The downside with getting a big funding round is that eventually investors want to see some return on their investment — and typically that means either an IPO or an acquisition. Employees also typically want their big buyout, though Dropbox employee stock has reportedly been available on the secondary market.
The advantage of an acquisition by a major vendor is that it could give Dropbox the credibility and structure it would need to fit into the enterprise. It’s not that people aren’t using Dropbox. Quite the contrary — a recent survey by storage vendor Nasuni found that 20% of corporate users were using Dropbox.
This is despite the security and governance holes inherent with using a system such as Dropbox, the security holes in Dropbox in particular, and rules that corporations have attempted to put into place to keep people from using it. (Nasuni found that 49% of the people whose companies had rules against it were using it anyway.) As long as people have multiple devices — and they show no signs of stopping — and need access to their files, as well as the ability to send large files to other people, there’s going to be a need for the functionality, and all the rules in the world aren’t going to stop it, especially when, as Nasuni’s survey indicated, some of the worst offenders are executives.
“The most blatant offenders are near the top of the corporate heap — VPs and directors are most likely to use Dropbox despite the documented risks and despite corporate edicts,” writes GigaOm’s Barb Darrow. “C-level and other execs are the people who brought their personal iPads and iPhones into the office in the first place and demanded they be supported.”
So being purchased by a major player offers the opportunity to rein in some of these users, while still giving them the functionality they need. The company itself has also indicated that it plans to address the issue to make the product safer for corporate users — which would also make it more attractive to an acquirer.
The other likely aspect is that, as we’ve seen with e-discovery and other emerging markets, when the first big vendor goes, many of the smaller vendors quickly follow like dominoes. A Dropbox acquisition would likely presage a whole round of other ones; Wikipedia lists 17 “notable competitors,” including Box.Net and YouSendIt, and there are others. Acquisitions would also help simplify the complicated market.
Although major players such as Apple, Google, and Microsoft already offer their own cloud storage solutions, the vendors might want to acquire other ones for their technology, their people, or simply to get them off the market, while other vendors (dare I suggest HP, which doesn’t have a great track record on acquisitions these days?) would do so simply to get a toe in the market.
Either way, it seems likely that something will happen to this market next year.
November 28, 2012 4:10 PM
Posted by: Sharon Fisher
There’s been a couple of instances recently where government agencies have been careless with data, losing control of personally identifiable information such as Social Security numbers.
First, a NASA laptop that “contained records of sensitive personally identifiable information for a large number of NASA employees, contractors and others” was stolen from a vehicle, and while the laptop itself was password-protected, the data on it was not encrypted. In its memo about the incident, NASA didn’t say how many staffers might have been affected.
Second, the state of South Carolina’s Department of Revenue determined that hackers had broken into its database, putting the PII of up to 4 million people and 700,000 businesses at risk — again, because data had not been encrypted — in what is said to be the largest breach ever of a state agency. “Hackers also stole 3.3 million bank account numbers and the tax files of 700,000 businesses,” wrote Reuters. The Social Security numbers of 1.9 million children on parents’ returns were also compromised.
Are you detecting a Trend? Like, maybe, that encrypting PII is a Good Idea?
NASA, which had already lost another laptop in March to a similar theft, is actually in the process of implementing encryption on its systems — the stolen laptop just hadn’t gotten through the process yet. However, the agency expects all of its laptops to be encrypted by December 21, a spokeswoman told the New York Times. The agency didn’t say how much the breach would cost.
With South Carolina, its encryption plans are less clear. Gov. Nikki Haley — who had reportedly claimed the breach wasn’t the state’s fault until an investigation by the security company Mandiant proved her wrong — has been blaming the problem on “antiquated state software and outdated IRS security guidelines” that don’t require encryption. But while the state has implemented some security measures, such as increased monitoring, reports haven’t indicated anything yet about South Carolina installing encryption, though the Republican governor wrote the IRS a Strongly Worded Letter encouraging the federal agency to require states to do so.
“Had I known that IRS compliance meant that our Social Security numbers were not encrypted, I would have been shocked,” Haley was quoted as saying on local news.
Haley said the state also hadn’t encrypted the data because it was complicated. “But it’s highly unlikely that anyone on the security team at the Department of Revenue recommended storing millions of SSNs in plaintext because the alternative–deploying an encryption package–was too complicated,” wrote Dennis Fisher of Threatpost in a scathing rebuttal. “More likely, someone looked at his budget, looked at the price of the database encryption package, and made a hard choice. Lots of businesses, government agencies, non-profits and other organizations face the same choice every year and some of them decide that the cost of the encryption outweighs the potential benefit. And that can work out fine. That is, until something like the South Carolina data breach happens. Then things tend to be not fine.”
If the goal was to save money, they chose…poorly. “The cost of the state’s response has exceeded $14 million,” reported the Post. “That includes $12 million to the Experian credit-monitoring agency to cover taxpayers who sign up — half of which is due next month — and nearly $800,000 for the extra security measures ordered last week. The Revenue Department has estimated spending $500,000 for Mandiant, $100,000 for outside attorneys and $150,000 for a public relations firm. But those costs will depend on the total hours those firms eventually spend on the issue. The agency also expects to spend $740,000 to mail letters to an estimated 1.3 million out-of-state taxpayers.”
Plus, there’s the class action lawsuit, which could amount to $4 billion or more.
Meanwhile, other states such as Georgia and Alabama are hastening to point out that they don’t have any problems like this because they encrypt their data. However, most other states don’t, said Larry Ponemon, chairman of The Ponemon Institute, which researches privacy and data protection.
November 21, 2012 2:51 PM
Posted by: Sharon Fisher
Acquisitions are hard.
HP announced this week that it was being forced to write off $8.8 billion of the $9.7 billion cost of its year-old acquisition of Autonomy.
Let the fingerpointing begin.
HP blamed the write-down on what it said were systematic accounting games that made the U.K. company look much more valuable than it was — to which it said it was alerted only when a senior Autonomy official pointed them out.
Autonomy grew through acquisitions, buying everything from storage companies like Iron Mountain to enterprise software firms like Interwoven. They’d then go to customers and offer them a deal they couldn’t refuse. Say a customer had $5 million and four years left on a data-storage contract, or “disk,” in the trade. Autonomy would offer them, say, the same amount of storage for $4 million but structure it as a $3 million purchase of IDOL software, paid for up front, and $1 million worth of disk. The software sales dropped to the bottom line and burnished Autonomy’s reputation for being a fast-growing, cutting-edge software company a la Oracle, while the revenue actually came from the low-margin, commodity storage business.
Mike Lynch, former CEO of Autonomy, who was reportedly fired by HP in April, strenuously denied the allegations, which he said he knew nothing about until the press release came out, and said that HP had mismanaged the company in the year it had run it.
I think what has happened here is that they have got themselves in a mess. They did the acquisition of EDS, they had to write that one down. They had to write Palm down. When Autonomy was acquired it was done by a CEO who wanted to get rid of various divisions of that business and lead with software. He was ousted in an internal coup d’etat. From that point Autonomy was at odds with the divisions that were in power. There was a series of mismanagement steps. They lost hundreds of the talented people at Autonomy. They whole management team basically went out of the door. Sadly they are left with the results of having destroyed all that value.
Analysts blamed HP for spending too much on the acquisition — which they’d said since the initial announcement — and for not doing due diligence.
“Even if Autonomy passed the auditors’ scrutiny the way it did, how can you pay $10 billion for a company that has $1 billion in revenue and growing at 10% a year only,” writes Jean-Baptiste Su in Forbes. “It didn’t make sense then, and now H-P is taking an $8.8 billion charge and blaming everyone else for its mistake.”
(Oracle’s Larry Ellison caused a kerfuffle in September, 2011, when he said that Autonomy had been shopped to Oracle as well — ironically, on April 1 of that year — which Lynch denied at the time until Ellison posted the PowerPoint presentation from the meeting. Ellison said the company was overpriced — but due to the feud between Oracle and HP at the time, this likely sounded like sour grapes.)
HP’s Meg Whitman, who’d been CEO for only a few weeks when the acquisition was finalized but who had been on the board at the time, blamed former CEO Leo Apotheker, who was forced out after the acquisition when he also tried to kill HP’s PC division, and former chief strategy officer Shane Robison, who has also left the company. She also said that accounting firm Deloitte had vetted the company, and that KPMG had vetted Deloitte.
Apotheker declared he was shocked, shocked, and talked about the due diligence he had done.
Deloitte, for its part, had already come under criticism earlier this year for making mistakes.
There. Have we left anyone out?
HP would have much more credibility about its accusations if the company didn’t have such a poor track record of its other acquisitions. Wrote Su:
Unfortunately, Autonomy is just the latest example of the company’s dreadful acquisition track record and value destruction that started, in earnest, 10-years ago with Carly Fiorina’s decision to acquire Compaq for $25 billion, followed then by Mercury Interactive in 2006 ($4.5B), EDS in 2009 ($13.9B) and Palm in 2010 ($1.2B).
In what seems like a prescient story, the Wall Street Journal wrote earlier this month about HP’s troubles.
“Like most big tech companies, H-P has acquired new technologies. But many of its biggest purchases have fizzled, leaving the company with less cash and more debt than its rivals and effectively shutting it out of future deals,” the paper wrote. “Neither H-P’s buybacks nor acquisitions have panned out. H-P wrote off $8 billion of the $13 billion EDS deal earlier this year. In 2010, H-P also spent $1.2 billion to acquire mobile-device maker Palm, but shut down the unit and wrote it off a year later. Last year, H-P paid more than $10 billion to acquire software maker Autonomy, but has already said sales in that business are declining.”
One also wonders, if the tale about HP discovering the issue after performing an internal investigation based on the Autonomy executive’s tip in May is true, what took the company so long to have noticed? If it hadn’t been for the tip, just when would it have come out?
And if “H-P’s internal team was aware of talk about accounting irregularities at the time the deal was struck…[and] was looking for a way to unwind the deal before it closed, but couldn’t find any material accounting issues,” as the Wall Street Journal writes now, wouldn’t the company have studied those irregularities right away? Were Lynch and the other executives supposed to be so diabolically clever that an army of accountants couldn’t find the fraud until the former Autonomy executive pointed it out for them? (U.K. laws reportedly make it difficult and expensive to renege on an acquisition.)
And finally, just what sort of due diligence did HP and Deloitte do? Did they actually research the numbers themselves, or just trust the numbers they got from Autonomy?
HP acquired Autonomy in August, 2011, soon after Gartner did its first e-discovery Magic Quadrant, in May, 2011. At the time, Gartner spoke glowingly about the company, which it placed in the Leaders quadrant.
“Autonomy is a brand and marketing powerhouse that appears on many clients’ shortlists,” Gartner said in its earlier report. “Although we have seen little appetite for ‘full-service e-discovery platforms’ from clients as yet, Autonomy is positioned to seize these opportunities when they do arise — indeed, the overall market may evolve in that direction.”
In that report, Gartner also predicted that consolidation would have eliminated one in four enterprise e-Discovery vendors by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. Autonomy itself acquired Iron Mountain’s archiving, e-discovery and online backup business in May 2011 for US$ 380 million in cash.
At the time, HP was thought to be pursuing a similar strategy to that of IBM, which divested itself of its PC business and moved instead to primarily software and services.
Now, HP’s strategy, as well as that of Whitman — whom some are, fairly or unfairly, blaming for the whole situation — is going to be finding a way to survive.
November 15, 2012 8:10 PM
Posted by: Sharon Fisher
data storage
When I first started this blog almost two years ago, I called it “yottabytes” because that was the term commonly accepted for the biggest size of storage (1000^8, or a 1 followed by 24 zeroes, compared with, say, 1000^4 for a terabyte). But as people are actually starting to refer to petabytes (1000^5) and exabytes (1000^6) of storage, there’s starting to be more discussion of what comes next.
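For anyone who wants to keep the ladder of prefixes straight, here is a minimal sketch that prints each decimal prefix as a power of 1000 and counts the zeroes. It assumes the decimal convention (powers of 1000 rather than 1024); the function names are made up for illustration.

```python
# Decimal storage prefixes as powers of 1000, from kilo up to yotta.
# Assumption: decimal units (1000-based), not binary (1024-based) ones.

PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_in(prefix):
    """Number of bytes in one <prefix>byte, e.g. bytes_in('tera') == 1000**4."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


def zero_count(n):
    """Number of zeroes after the leading 1 in an exact power of ten."""
    return len(str(n)) - 1


if __name__ == "__main__":
    for power, prefix in enumerate(PREFIXES, start=1):
        zeroes = zero_count(bytes_in(prefix))
        print(f"{prefix}byte = 1000^{power} = 1 followed by {zeroes} zeroes")
```

Each step up the list multiplies by 1000, so it adds three zeroes: 12 for a terabyte, 24 for a yottabyte.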
No, I’m not changing the name of the blog.
The proximate cause for the discussion now is a presentation by Shantanu Gupta, director of Connected Intelligent Solutions for Intel’s Data Center and Connected Systems Group, which showed up in GigaOm the other day. According to this presentation, what comes after yottabyte is brontobyte, or 1 followed by 27 zeroes.
This is not definite; as GigaOm’s Stacey Higginbotham points out, it’s not an official prefix, though it has been discussed since at least 1991 — though, that far back, it was 1 followed by 15 zeroes. It does, however, appear to be more accepted for the number than does hella-, which had a brief flurry a couple of years ago as people tried to promote it.
Past bronto, to 1 followed by 30 zeroes, it gets more complicated, partly due to what honestly look like typographical errors.
Gupta refers to “geobyte” in his presentation — but also refers to “bronobyte” as opposed to “brontobyte” for 1 followed by 27 zeroes. Wikianswers also refers to “geobyte.”
Higginbotham refers to “gegobyte” for the figure, as does Seagate in a blog posting riffing on the GigaOm post.
On the other hand, answers.yahoo.com uses “geopbyte” for the figure, as does the Urban Dictionary and whatisabyte.com.
Geo-, gego-, or geop-? It kind of doesn’t matter, because it’s all unofficial anyway, but somebody might want to figure it out at some point.
Beyond what-do-we-call-it, we also have the obligatory how-to-put-it-in-terms-we-puny-humans-can-understand discussion, aka the Flurry of Analogies that came up when IBM announced a 120-petabyte hard drive a year ago. Depending on where you read about it, that drive was:
- 2.4 million Blu-ray disks
- 24 million HD movies
- 24 billion MP3s
- 1 trillion files
- Eight times as large as the biggest disk array available previously
- More than twice the entire written works of mankind from the beginning of recorded history in all languages
- 6,000 Libraries of Congress (a standard unit of data measure)
- Almost as much data as Google processes every week
- Or, four Facebooks
So, how big are bronto- and geo/gego/geop-?
Well, GigaOm wrote, “Cisco estimates we’ll see a 1.3 zettabytes of traffic annually over the internet in 2016.” On the other hand, GigaOm cited a piece with the Cisco estimate being 130 exabytes, which would only be .13 zettabytes if I have my math right. Seagate estimates that total storage capacity demand will reach 7 zettabytes in 2020.
Yottabytes is in the realm of CIA and NSA spy data, noted a piece on Examiner.com, which went on to point out, “As of 2011, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world does not amount to even one yottabyte, but was estimated at approximately 160 exabytes in 2006. As of 2009, the entire Internet was estimated to contain close to 500 exabytes.” A yottabyte would also be 250 trillion DVDs, GigaOm wrote.
For brontobyte, which Gupta said would be used primarily for the “Internet of Things” ubiquitous sensors, there are also somewhat fanciful definitions such as “More than the number of all the cells of the human body in each person living in Indiana and then some,” and “You would need a brontobyte computer to download everything on the Internet” (though, apparently not, according to Examiner.com).
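The exabyte-to-zettabyte math is easy enough to check. A quick sketch, assuming decimal units (1 zettabyte = 1,000 exabytes) and using exact fractions so no floating-point rounding creeps in; the function name is just for illustration:

```python
from fractions import Fraction

# Assumption: decimal units, so each prefix step is a factor of 1000.
EXABYTE = 1000 ** 6
ZETTABYTE = 1000 ** 7


def exabytes_to_zettabytes(exabytes):
    """Convert exabytes to zettabytes as an exact fraction."""
    return Fraction(exabytes) * EXABYTE / ZETTABYTE


if __name__ == "__main__":
    print(float(exabytes_to_zettabytes(130)))
```

That comes out to 0.13 zettabytes, a tenth of the 1.3-zettabyte figure.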
Of course, once we start talking in terms of trillions of DVDs, obviously we’ve got to find another unit of measure. Interestingly, Seagate used geographic area.
“If today’s 4 terabyte 3.5-inch drive is roughly .16 square feet, you can get approximately 24 terabytes per square foot. That’s .0046 square miles of land mass per 4 terabytes. Assuming 1 terabyte per disk was the maximum areal density, and hard drives will not get any thicker than 1 inch:
- An exabyte hard drive would be about the size of Connecticut [or, I would add, Owyhee County in Idaho]
- A zettabyte hard drive would be about the size of Antarctica
- A yottabyte hard drive would cover the earth 23 times
- A brontobyte hard drive would cover the earth 23,000 times
- A gegobyte hard drive would cover the earth 23,000,000 times”
Of course, that would be using today’s technology.
Beyond that? Wikianswers postulated Saganbyte, Jotabyte, and Gatobyte, while Wikipedia referred to a system working backward through the Greek alphabet — though that one wouldn’t include brontobyte or geo/gego/geopbyte.
November 8, 2012 3:01 PM
Posted by: Sharon Fisher
computer assisted review, predictive coding, technology assisted review
When one thinks of e-discovery pioneers, one doesn’t tend to think of Hooters. But a recent legal case with the, ahem, female-oriented restaurant has ramifications for the e-discovery industry, specifically in the area of predictive coding.
Predictive coding, or predictive technology, is the use of computer-assisted review software to help determine the relevance of documents to be potentially used in a legal case. It wasn’t until March that a judge first ruled that predictive coding could be used in a case.
Now, we’ve gone one step further, with a judge actually *requiring* two parties — one of which is Hooters — to use predictive coding in their case. Well, at least, Strongly Suggesting.
“Vice Chancellor J. Travis Laster in Delaware Chancery Court has made e-discovery history, again, with a surprise bench order requiring both sides to use predictive coding and to use the same vendor,” writes Ralph Losey for the E-Discovery Team blog. “This appears to be the first time a judge has required both sides of a dispute to use predictive coding when neither has asked for it. It may also be the first time a judge has ordered parties to use the same vendor.” If the parties could not agree on a vendor, the judge continued, he would select one.
(What’s the case about? It’s kind of hard to tell. “A complex multimillion dollar commercial indemnity dispute involving the sale of Hooters, a very well-known restaurant, famous for its chicken and wings, beer, and other things,” Losey writes.)
The goal appeared to be to use the case as an example of how to save money. “The problem is that these types of indemnification claims can generate a huge amount of documents,” the judge said in his decision. “That’s why I would really encourage you all, instead of burning lots of hours with people reviewing, it seems to me this is the type of non-expedited case where we could all benefit from some new technology use.”
“Predictive coding used correctly, promises to reduce costs and turn over at least as much responsive ESI and less unresponsive ESI than our current eyeballs-on-every-document approach,” agrees Karl Schieneman in E-Discovery Journal.
At this point, the two sides need to determine whether they will go along with the judge’s recommendation, or fight it using the Sedona Principles that, basically, leave it up to the participants to determine the best way to proceed. But that might be fraught. “Who wants to tell the judge to butt-out, no matter how politely you say it, or how many Sedona Principles you cite?” Losey writes. “Better to let the other side be the complainer, even if you do not much like it either. Much will depend on who has the heaviest production burden.”
The two sides might also object to the suggestion that they use the same vendor, and the judge’s motivation for suggesting that isn’t clear, Losey continues. Schieneman agrees, noting that it might be difficult to find a neutral vendor and it also isn’t clear how the software will be paid for. “There is an argument to be made that this is a well-intentioned, but possibly uneducated bench that is forcing parties to use and pay for an undefined, black box marketing label.”
Schieneman went on to write that he hopes judges don’t get too enthusiastic about predictive coding and start requiring it willy-nilly. “It could be an unintended disaster if every judge ordered the use of predictive coding,” he says, both because of the learning curve for legal firms and because predictive coding vendors might not be able to adequately support all these new potential customers. “While judicial encouragement of predictive coding is great and absolutely necessary, blind encouragement could be dangerous,” he adds.
November 1, 2012 10:00 AM
Posted by: Sharon Fisher
disaster recovery
Though there have been a number of data center outages associated with the Sandy megastorm, and it’s not over yet, what may be most surprising is how little disruption it actually caused — particularly in comparison to the outages caused by June’s thunderstorm.
While several data centers were knocked offline due to flooding — most notably Datagram, which hosts Gawker, Gizmodo, Buzzfeed, The Huffington Post, and Media — many stayed on, often through generators running on diesel fuel. (The New York Times – which criticized data centers just last month about their use of diesel backup generators — was strangely silent on the subject this week.)
The problem then switched to getting fuel delivered, since data centers typically keep only about three days’ fuel on-site. Those three days, however, did give users of those data centers time to find alternatives.
Though data centers went through extensive preparation, the ones that were knocked offline typically had either the data center, or the fuel systems, or both, in the basement, which flooded. Some sites went offline after they weren’t able to get fuel delivered to the island of Manhattan.
“The situation shows that in many ways, Lower Manhattan is one terrible place to put a data center,” noted Cade Metz in Wired. On the other hand, he said, data centers need to be near where the business action is to provide low-latency data transmission.
In one case, employees of Fog Creek Software and Squarespace, whose building had fuel pumps in the flooded basement and a generator on the 17th floor, used a bucket brigade to get fuel up the stairs to run the generator.
Other customers were migrated to cloud services such as Amazon Web Services — ironically, since it has suffered a number of outages over the past few months.
The Internet was relatively resilient not only to the hurricane itself, but also to the increased load from all the East Coast people who stayed at home, watched Netflix, and chatted with their loved ones over Skype.
Numerous other posts over the years have described how data centers have handled earthquakes, wildfires, and other disasters. That said, there are a few other lessons to be learned from Sandy:
- Don’t put data centers, or their diesel backup, in the basement. On the other hand, it’s not like you want it up over your head, either — especially if you end up needing to do a bucket brigade to the roof.
- Have data centers, or backup data centers, located in separate geographic regions.
- Plan, plan, plan. And don’t wait until an actual emergency to test the plan. “You can’t wait ’til folks’ hair is on fire to plan these things,” Shannon Snowden, a data center professional who is now senior technical marketing architect for Zerto, a company with technology that helps companies move and failover applications, told GigaOm. “What you should be doing from the data center perspective is [always] make sure the power has been tested, that you can fail over to generators, that those generators are tested to make sure they’re functional and that they have enough fuel,” he said.
Finally, the vendor I lambasted in April for exploiting natural disasters to promote their product was at it again — press release issued right on schedule on Monday afternoon.