When one thinks of E-discovery pioneers, one doesn’t tend to think of Hooters. But a recent legal case involving the, ahem, female-oriented restaurant chain has ramifications for the E-discovery industry, specifically in the area of predictive coding.
Predictive coding, or predictive technology, is the use of computer-assisted review software to help determine whether documents are relevant and could potentially be used in a legal case. It wasn’t until March that a judge first ruled that predictive coding could be used in a case.
Now, we’ve gone one step further, with a judge actually *requiring* two parties (one of which is Hooters) to use predictive coding in their case. Well, at least Strongly Suggesting it.
“Vice Chancellor J. Travis Laster in Delaware Chancery Court has made e-discovery history, again, with a surprise bench order requiring both sides to use predictive coding and to use the same vendor,” writes Ralph Losey for the E-Discovery Team blog. “This appears to be the first time a judge has required both sides of a dispute to use predictive coding when neither has asked for it. It may also be the first time a judge has ordered parties to use the same vendor.” If the parties could not agree on a vendor, the judge continued, he would select one.
(What’s the case about? It’s kind of hard to tell. “A complex multimillion dollar commercial indemnity dispute involving the sale of Hooters, a very well-known restaurant, famous for its chicken and wings, beer, and other things,” Losey writes.)
The goal appeared to be to use the case as an example of how to save money. “The problem is that these types of indemnification claims can generate a huge amount of documents,” the judge said in his decision. “That’s why I would really encourage you all, instead of burning lots of hours with people reviewing, it seems to me this is the type of non-expedited case where we could all benefit from some new technology use.”
“Predictive coding used correctly, promises to reduce costs and turn over at least as much responsive ESI and less unresponsive ESI than our current eyeballs-on-every-document approach,” agrees Karl Schieneman in E-Discovery Journal.
At this point, the two sides need to decide whether to go along with the judge’s recommendation or fight it using the Sedona Principles, which basically leave it up to the parties to determine the best way to proceed. But that might be fraught. “Who wants to tell the judge to butt-out, no matter how politely you say it, or how many Sedona Principles you cite?” Losey writes. “Better to let the other side be the complainer, even if you do not much like it either. Much will depend on who has the heaviest production burden.”
The two sides might also object to the suggestion that they use the same vendor, and the judge’s motivation for suggesting that isn’t clear, Losey continues. Schieneman agrees, noting that it might be difficult to find a neutral vendor and it also isn’t clear how the software will be paid for. “There is an argument to be made that this is a well-intentioned, but possibly uneducated bench that is forcing parties to use and pay for an undefined, black box marketing label.”
Schieneman went on to write that he hopes judges don’t get too enthusiastic about predictive coding and start requiring it willy-nilly. “It could be an unintended disaster if every judge ordered the use of predictive coding,” he says, both because of the learning curve required for law firms and because predictive coding vendors might not be able to adequately support all those new potential customers. “While judicial encouragement of predictive coding is great and absolutely necessary, blind encouragement could be dangerous,” he adds.
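Vendors’ predictive coding engines are proprietary, so what follows is only a minimal sketch of the general idea behind computer-assisted review. It assumes a simple naive Bayes text classifier, a textbook technique swapped in for illustration and not any vendor’s actual method: human reviewers code a small seed set as relevant or not, the software learns word frequencies from it, and the unreviewed documents are ranked so reviewers see the likeliest-relevant ones first. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real review tools use much richer features.
    return text.lower().split()

def train(seed_set):
    """Learn word frequencies from (text, is_relevant) pairs coded by human reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Naive Bayes log-odds that a document is relevant (positive means 'likely relevant').

    Assumes the seed set has at least one relevant and one non-relevant document.
    """
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])  # prior log-odds
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so an unseen word doesn't zero out the probability.
            p = (word_counts[label][word] + 1) / (total + len(vocab))
            score += sign * math.log(p)
    return score

def rank_for_review(model, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda doc: relevance_score(model, doc), reverse=True)

# Hypothetical seed set coded by reviewers.
seeds = [
    ("indemnification claim for the restaurant sale", True),
    ("escrow and indemnity payment under purchase agreement", True),
    ("lunch order for the office party", False),
    ("fantasy football league standings", False),
]
model = train(seeds)
queue = rank_for_review(model, ["office football party", "indemnity claim under the sale agreement"])
print(queue[0])  # indemnity claim under the sale agreement
```

In a real workflow, reviewers would typically code the top-ranked documents, add them to the seed set, and retrain, which is where the promised savings over eyeballs-on-every-document review come from.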
Though there have been a number of data center outages associated with the Sandy megastorm, and it’s not over yet, what may be most surprising is how little disruption it actually caused — particularly in comparison to the outages caused by June’s thunderstorm.
While several data centers were knocked offline by flooding, most notably Datagram (which hosts Gawker, Gizmodo, Buzzfeed, The Huffington Post, and Media), many stayed up, often on generators running on diesel fuel. (The New York Times, which criticized data centers just last month for their use of diesel backup generators, was strangely silent on the subject this week.)
The problem then shifted to getting fuel delivered, since data centers typically keep only three days’ worth of generator fuel on-site. Those three days, however, did give the data centers’ customers time to find alternatives.
Though data centers went through extensive preparation, the ones that were knocked offline typically had either the data center, or the fuel systems, or both, in the basement, which flooded. Some sites went offline after they weren’t able to get fuel delivered to the island of Manhattan.
“The situation shows that in many ways, Lower Manhattan is one terrible place to put a data center,” noted Cade Metz in Wired. On the other hand, he said, data centers need to be near where the business action is to provide low-latency data transmission.
In one case, at a facility used by Fog Creek Software and Squarespace, the fuel pumps were in the flooded basement and the generator was on the 17th floor, so employees formed a bucket brigade to carry fuel up the stairs to keep the generator running.
Other customers were migrated to cloud services such as Amazon Web Services, which is ironic, since AWS has suffered a number of outages over the past few months.
The Internet was relatively resilient not only to the hurricane itself but also to the increased load from all the East Coast residents who stayed home, watched Netflix, and chatted with their loved ones over Skype.
- Don’t put data centers, or their diesel backup, in the basement. On the other hand, it’s not like you want it up over your head, either — especially if you end up needing to do a bucket brigade to the roof.
- Have data centers, or backup data centers, located in separate geographic regions.
- Plan, plan, plan. And don’t wait until an actual emergency to test the plan. “You can’t wait ’til folks’ hair is on fire to plan these things,” Shannon Snowden, a data center professional who is now senior technical marketing architect for Zerto, a company with technology that helps companies move and fail over applications, told GigaOm. “What you should be doing from the data center perspective is [always] make sure the power has been tested, that you can fail over to generators, that those generators are tested to make sure they’re functional and that they have enough fuel,” he said.
Finally, the vendor I lambasted in April for exploiting natural disasters to promote their product was at it again — press release issued right on schedule on Monday afternoon.
It’s been a busy week in the storage market. The biggest news is Microsoft’s acquisition of cloud storage vendor StorSimple, but in addition, Carbonite, a provider of online backup solutions, acquired open source and SMB cloud backup vendor Zmanda, while Persistent Systems acquired Doyenz, which sells disaster recovery as a service.
The nice thing about three of them happening at once is that this makes it a Trend, so instead of addressing each acquisition individually, we can talk about What It All Means.
From the startup side, there are really only three exit strategies. You can die. You can file for an IPO (as Violin also did last week). Or you can get acquired. If either the company or the market isn’t strong enough, an IPO isn’t necessarily a good idea. That leaves acquisition. (We’ll assume no startup plans to die.)
Being acquired doesn’t mean giving up or throwing in the towel. For the acquiring company in particular, it can be a good idea: it’s a quick way to collect a bunch of new people, a new technology, and perhaps some new customers. “The vast majority (over 90 percent) of the successful private company exits in 2011 and 2012 have been through company sale or M&A,” writes Jim Price in Business Insider.
The next thing to look at is who’s doing the acquiring. Is it two small companies hoping that together they’ll be strong enough to survive? I don’t want to pick on Carbonite, but given the sort of year they’ve had, that might be a factor. Or is it a big company looking to add an innovative new technology to its portfolio? Certainly in the case of Microsoft and StorSimple, it’s the latter.
As far as what’s next, keep in mind that acquisitions tend to run in clumps. A new technology comes along, a bunch of little companies start up to use it, and then some of them die, some of them merge with each other, and some of them get acquired by larger companies — typically with the strongest players going first and the later ones being picked up by latecomers in the market who are desperate to own a piece of it, in sort of a high-tech version of Musical Chairs. For a big company, it can be a much safer way to innovate than trying to develop a new technology yourself.
We saw something similar a year ago, when Gartner did its first Magic Quadrant on E-Discovery and predicted that 25% of the companies in it would be acquired by major vendors by 2014. As it happened, Gartner didn’t even get the report published before the first acquisition happened, and the companies have been falling like little dominoes ever since, especially after Gartner conveniently provided a shopping list.
“Probably the prime imperative for Fortune 500 managers is to find areas for revenue and profit growth,” writes Price. “But the challenge is to do so without endangering the existing franchise. Too often, the dilemma from the helm looks like this: You know you need to get into a promising new space, but it’s quite unproven and you suspect running two or three concurrent experiments might bleed cash for years. So in a real sense, you can’t afford – on a quarter-to-quarter income statement basis – to run too many such risky projects. But if you let entrepreneurial startups run the experiments with their energy, time and capital – and let them ring out the technology risk and the market risk – then once a winner appears, you can buy that winner with capital off your balance sheet.”
Certainly Microsoft and StorSimple would qualify.
If so, StorSimple competitors like Nasuni and Panzura, which were speaking with a great deal of bravado about the 800-pound gorilla suddenly in their midst, should expect calls from other large vendors in the next few weeks, and will need to decide which startup exit strategy they plan to follow.
All of a sudden, backing up Oracle databases is big news, with Wikibon and Amazon Web Services each releasing new insights about how to do it.
What makes this a big deal? As Wikibon mentions, nearly 30% of Oracle shops are managing more than 100 TB of data that needs to be backed up. And with ‘big data’ becoming a buzzword, not only is the data getting bigger, but people are paying more attention to it.
Wikibon points out several trends, including increasing virtualization, more space devoted to backups, and the staying power of tape: 45% of customers report that more than half of their backup data resides on tape, Wikibon says.
But one of the newer backup choices that Wikibon mentions is RMAN, Oracle’s Recovery Manager. Its advantage ties into another big recent development in Oracle backup: RMAN’s newer ability to back up to the cloud.
That’s where the Amazon Web Services white paper comes in. It describes how Amazon itself started backing up all its Oracle databases to the cloud using RMAN. While such white papers are often pretty self-serving — and now we’re talking about one where a vendor is using its own product, or what EMC’s Paul Maritz refers to as “eating your own dog food” — this one has some hard numbers behind it.
“The transition to S3-based backup started last year and by summer, 30 percent of backups were on S3; three months later it was 50 percent. The company expects the transition to be done by year’s end — except for databases in regions where Amazon s3 is not available,” writes Barb Darrow for GigaOm. Moreover, the company is saving $1 million per year for backups that take only half as long, she writes.
Whether or not you want to go the AWS route for Oracle backups, the Wikibon report has some interesting information on the subject. Granted, some of its recommendations are pretty mom-and-apple-pie (implement redundancy, test your backups, use dedupe), but others are more nuanced.
For example, the company notes, organizations are increasingly virtualizing their Oracle servers — which could have an impact on the speed of backing them up. “The big initial attraction of server virtualization is that it increased average utilization from 15% to about 85%,” Wikibon writes. “This means that virtualized environments will see a drastic reduction in overall server capacity, some of which was used to run backups.”
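Wikibon’s point can be checked with simple arithmetic. The sketch below assumes the total work is unchanged by consolidation and uses a hypothetical 100-server fleet: the work that kept 100 servers 15% busy fits on roughly 18 servers running at 85%, which leaves far less idle headroom for backup jobs.

```python
import math

def servers_after_consolidation(servers_before, util_before, util_after):
    """Physical servers needed to carry the same work at a higher average utilization."""
    work = servers_before * util_before   # total work, in 'fully busy server' units
    return math.ceil(work / util_after)   # round up: a fraction of a server isn't an option

before = 100  # hypothetical fleet size
after = servers_after_consolidation(before, 0.15, 0.85)
print(after)                              # 18
print(f"{1 - after / before:.0%} fewer")  # 82% fewer
```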
It was just a year ago that the Thailand flooding (only a few months after the Japan earthquake) devastated the storage industry, causing a temporary shortage of disk drives and an increase in prices. But now that it’s all over, a funny story is coming out of BackBlaze, which found itself literally thinking outside the box.
The company, which is known for providing low-cost constant backups for its subscribers, is also known for building its cloud out of a whole lot of teeny (well, 3 TB) commodity disk drives rather than a few great big ones. This saves money and helps the company grow more granularly.
The only problem is if you suddenly run out of teeny commodity disk drives, or find that they’ve tripled in price in a matter of two weeks, as BackBlaze did when it was adding 50 TB of capacity a day. At the same time, the company wasn’t buying in large enough volumes to get deals from the manufacturers.
In an extremely detailed, hysterically funny blog post, the company is now relating how it dealt with the crisis — basically, by buying them as consumer commodities rather than as parts, and turning them into the parts they needed to build the “storage pods” on which their service was based.
“With our normal channels charging usury prices for the hard drives core to our business, we needed a miracle,” writes Andrew Klein, director of product marketing. “We got two: Costco and Best Buy. On Brian [Wilson, CTO]’s whiteboard he listed every Costco and Best Buy in the San Francisco Bay Area and then some. We would go to each location and buy as many 3 TB drives as possible.”
While the company then had to “shuck” the drives from their cases, this saved the company $100 per drive over buying them from its usual suppliers. Problem solved.
For a while.
“The ‘Two Drive Limit’ signs started appearing in retail stores in mid-November,” Klein writes. “At first we didn’t believe them, but we quickly learned otherwise.” So workers started making the circuit around the San Francisco Bay, hitting local Costco and Best Buy stores: 10 stores, 46 disk drives, 212 miles. It put a lot of miles on the cars and took a lot of time, but it solved that problem.
For a while.
Then BackBlaze employees started getting banned from stores.
At that point, they started hitting up friends and family, and not just in the Bay Area, but nationwide. “It was cheaper to buy external drives at a store in Iowa and have Yev’s dad, Boris, ship them to California than it was to buy internal drives through our normal channels,” Klein writes.
(The company also apparently considered renting a moving van to drive across the country, hitting stores along the way — a variation on the “bandwidth of a station wagon of tapes” problem — but decided it wouldn’t be economical.)
By the time internal drive prices returned to normal, the company had bought 5.5 petabytes of storage through retail channels, or more than 1800 disk drives. Finally, it could go back to its normal practices.
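The numbers in Klein’s account hang together, as a quick back-of-the-envelope check shows. The sketch assumes decimal units (1 PB = 1,000 TB, the way drive makers count) and treats every drive as 3 TB of raw capacity, ignoring the redundancy overhead inside the storage pods.

```python
import math

DRIVE_TB = 3             # capacity of each retail drive, per the article
GROWTH_TB_PER_DAY = 50   # BackBlaze's capacity growth at the time, per the article
TB_PER_PB = 1000         # decimal units (an assumption)

def days_of_growth(drives):
    """How many days of capacity growth a batch of drives covers."""
    return drives * DRIVE_TB / GROWTH_TB_PER_DAY

def drives_for(petabytes):
    """Minimum number of 3 TB drives needed to supply a given raw capacity."""
    return math.ceil(petabytes * TB_PER_PB / DRIVE_TB)

print(f"{days_of_growth(46):.2f} days")  # 2.76 days: one 212-mile circuit
print(drives_for(5.5))                   # 1834, consistent with "more than 1800"
```

In other words, one 10-store circuit bought a little under three days of growth, which helps explain why friends and family soon got drafted.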
“On July 25th of this year, Backblaze took $5M in venture funding,” Klein writes. “At the same time, Costco was offering 3TB external drives for $129, about $30 less than we could get for internal drives. The limit was five drives per person. Needless to say, it was a deal we couldn’t refuse.”
Disclosure: I am a BackBlaze customer.
First it was HGST with helium. Now it’s Hitachi itself with glass. The company has announced a technology that enables it to store data for what it says is forever.
The technology works with a 2cm square piece of glass that’s 2mm thick, and is etched in binary with a laser. There are four layers, which results in a density of 40MB per square inch. “That’s better than a CD (which tops out at 35MB per square inch), but not nearly as good as a standard hard disk, which can encode a terabyte in the same space,” writes Sam Grobart in Bloomberg. The company said it could also add more layers for more density.
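To put the quoted density in perspective, here is a rough calculation. It assumes the 40 MB per square inch figure applies across the whole 2 cm square chip (all four layers) and takes a terabyte as 1,000,000 MB. On those assumptions, one chip would hold roughly 25 MB, and a hard disk packs about 25,000 times as much data into the same area.

```python
SQ_CM_PER_SQ_IN = 2.54 ** 2   # 1 inch = 2.54 cm, so 1 sq in = 6.4516 sq cm

def capacity_mb(side_cm, mb_per_sq_in):
    """Capacity of a square medium with the given side length and areal density."""
    area_sq_in = side_cm ** 2 / SQ_CM_PER_SQ_IN
    return area_sq_in * mb_per_sq_in

glass_chip = capacity_mb(2, 40)   # 2 cm x 2 cm at 40 MB per square inch
hdd_advantage = 1_000_000 / 40    # ~1 TB per square inch vs. 40 MB per square inch
print(f"{glass_chip:.1f} MB")     # 24.8 MB
print(f"{hdd_advantage:,.0f}x")   # 25,000x
```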
Of course, the selling point is not how dense it is, but that it will, supposedly, last forever, without the bit rot that degrades magnetic storage and is leading some to fear a “digital dark ages” where we will lose access to large swathes of our history and culture because it’s being stored magnetically.
The technology was developed in 2009 and may be made available as a product by 2015, Hitachi said, according to Broadcast Engineering.
There’s more to the digital dark ages than simply preserving the media, however — there’s also the factor of having the hardware and software that enables people to read the data. Anyone who’s found a perfectly pristine 78-rpm record in their grandparents’ attic is familiar with that problem.
Hitachi says that won’t be a problem because all computers ultimately store data in binary, and the glass could be read using a microscope. But the company didn’t say how the data is encoded, that is, how the binary gets translated back into music or movies or whatever. The microscope could read it, but how would it know what it meant?
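The problem is easy to demonstrate. In the hypothetical sketch below, the same four bytes, the kind of raw bit pattern a microscope could read off the glass, mean entirely different things depending on which format the reader assumes. Without a record of the encoding, the bits alone don’t say which interpretation is right. The byte values are made up for illustration.

```python
import struct

raw = bytes([0x48, 0x69, 0x21, 0x00])   # four bytes read off the medium (made-up values)

as_text = raw[:3].decode("ascii")                # interpreted as ASCII text
as_big_endian = struct.unpack(">I", raw)[0]      # as a big-endian 32-bit unsigned integer
as_little_endian = struct.unpack("<I", raw)[0]   # as a little-endian 32-bit unsigned integer

print(as_text)           # Hi!
print(as_big_endian)     # 1214849280
print(as_little_endian)  # 2189640
```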
The service may work by having organizations with a great deal of data to preserve, such as governments, museums and religious organizations, send their data to Hitachi to be encoded, Broadcast Engineering wrote.
The quartz glass is said to be impervious to heat (in the demonstration, it was baked at 1000 degrees Celsius for two hours to simulate aging), as well as to water, radiation, radio waves and most chemicals, which is why many laboratory containers are made of glass.
On the other hand, the glass is vulnerable to breakage. And as anyone who’s used a microscope has probably experienced, it’s all too easy to turn the focus knob too far. Imagine trying to sharpen the focus while reading the data, and watching in horror as centuries-old data gets crunched.
Virtualization. In talking about how underutilized data center servers are, and in appearing to limit himself to less-than-state-of-the-art facilities, Glanz failed to notice how prevalent virtualization is becoming. Virtualization enables an organization to set up numerous “virtual servers” inside one physical server, which results in much higher utilization. “[V]irtualized systems can be easily run at greater than 50% utilization rates, and cloud systems at greater than 70%,” writes Clive Longbottom in SearchDataCenter.
“[I]n many cases the physical “server” doesn’t even exist since everyone doing web at scale makes extensive use of virtualization, either by virtualizing at the OS level and running multiple virtual machines (in which case, yes, perhaps that one machine is bigger than a desktop, but it runs several actual server processes in it) or distributing the processing and storage at a more fine-grained level,” writes Diego Doval in his critique of the New York Times piece. “There’s no longer a 1-1 correlation between “server” and “machine,” and, increasingly, “servers” are being replaced by services.”
“Although the article mentions virtualization and the cloud as possible solutions to improve power utilization, VMware is not mentioned,” agrees Dan Woods in Forbes’ critique of the piece. “If the reporter talked to VMware or visited their web site, he would have found massive amounts of material that documents how thousands of data centers are using virtualization to increase server utilization.”
Storage. Similarly, Glanz appeared unaware of advances in storage technology, even though some of them are taking place in the very data centers he lambasted in his articles. In Prineville, Ore., for example, not all that far from the Quincy, Wash., data centers he criticized, Facebook is designing its own storage to eliminate unnecessary parts, as well as setting up low-cost, slow-access storage that is spun down most of the time.
Facebook, which does this research precisely because of the economies of scale in its massive data centers, is making similar advances in servers. Moreover, through its Open Compute Project, the company is releasing all these advances to the computer industry in general so that others can take advantage of them, too.
In addition, Glanz focused on the “spinning disks” of storage systems, apparently not realizing that organizations like eBay are increasingly moving to solid-state “flash” storage, which uses much less power.
Also, storage just isn’t as big a deal as it used to be and as the story makes out. “A Mr Burton from EMC lets slip that the NYSE ‘produces up to 2,000 gigabytes of data per day that must be stored for years’,” reports Ian Bitterlin of Data Center Dynamics in its critique of the New York Times piece. “A big deal? No, not really, since a 2TB (2,000 gigabytes) hard-drive costs $200 – less than a Wall Street trader spends on lunch!”
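Bitterlin’s point holds up over a full year, too. The rough sketch below assumes the NYSE’s 2,000 GB a day is stored as a single copy on $200, 2 TB drives, as he describes, and ignores redundancy, power, cooling and management costs, all of which would add to the figure.

```python
def yearly_raw_disk(gb_per_day, drive_gb=2000, drive_price=200, days=365):
    """Drives and dollars needed to store one copy of a year's data (raw capacity only)."""
    drives = gb_per_day * days / drive_gb
    return drives, drives * drive_price

drives, cost = yearly_raw_disk(2000)
print(f"{drives:.0f} drives, ${cost:,.0f}")   # 365 drives, $73,000
```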
Disaster recovery. Glanz also criticized data centers for redundancy — particularly their having diesel generators on-site to deal with power failures — apparently not realizing that such redundancy is necessary to make sure the data centers stay up.
And yet, even with all this redundancy, there have been a number of well-publicized data center failures in recent months caused by events as mundane as a thunderstorm. Such outages can cost up to $200,000 per hour for a single company — and a data center such as Amazon’s can service multiple companies. If anything, one might argue that the costs of downtime require more redundancy, not less.
Of course it’s important to ensure that data centers are making efficient use of power, but it’s also important to understand the context.
The only problem with HGST’s helium-filled disk drive is that any audio ends up sounding like this.
The company (formerly known as Hitachi Global Storage Technologies, and now a Western Digital company) has announced a helium-filled hard disk platform, scheduled to ship next year; its price and specifications are supposed to be announced when it ships. The technology was demonstrated at a recent Western Digital investor event.
Okay, so why helium? Said the company:
The density of helium is one-seventh that of air, delivering significant advantages to HGST’s sealed-drive platform. The lower density means dramatically less drag force acting on the spinning disk stack so that mechanical power into the motor is substantially reduced. The lower helium density also means that the fluid flow forces buffeting the disks and the arms, which position the heads over the data tracks, are substantially reduced allowing for disks to be placed closer together (i.e., seven disks in the same enclosure) and to place data tracks closer together (i.e., allowing continued scaling in data density). The lower shear forces and more efficient thermal conduction of helium also mean the drive will run cooler and will emit less acoustic noise.
That’s seven platters as opposed to the current five, though HGST didn’t specify how much denser the data could be, nor what this could mean in terms of improved disk capacity. However, storage analyst Tom Coughlin wrote in Forbes that this means “HGST could ship close to 6 TB drives in 2013 and even 10 TB drives with 7 platters could be possible within two years after that.”
The company did say, however, that the helium-filled drive used 23 percent less power, for a 45 percent improvement in watts-per-TB. In addition to consuming less power, the drive operates four degrees Celsius cooler, requiring less cooling in the system rack and data center, the company said.
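The two percentages are consistent with each other if capacity scales with platter count. Assuming (since HGST gave no specifications) that each of the seven platters holds as much as those in today’s five-platter drives, capacity rises by a factor of 7/5 while power falls to 77%, and watts per TB drop by about 45%. This is a reconstruction, not HGST’s stated method.

```python
def watts_per_tb_change(power_ratio, capacity_ratio):
    """Fractional drop in watts per TB, given new/old power and new/old capacity."""
    return 1 - power_ratio / capacity_ratio

improvement = watts_per_tb_change(power_ratio=1 - 0.23, capacity_ratio=7 / 5)
print(f"{improvement:.0%}")   # 45%
```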
HGST has been working on the technology, the crux of which is designing a leakproof case, for six years; Western Digital agreed to buy HGST in March 2011 and took possession in March 2012.
What the companies didn’t mention, however, is how they might deal with a worldwide shortage of helium that is causing a ballooning of the price, literally — helium balloons now cost three times as much as they did just six months ago. As it turns out, the gas is heavily used in the computer industry.
“Helium is usually generated as a byproduct of natural gas mining, and we’re currently in the middle of a shortage of helium, due partly because the recession has slowed natural gas production,” wrote Brad Tuttle in Time. “About three-quarters of the world’s helium is produced in the U.S., according to the Kansas City Star, and while production is supposed to be increased by the end of the year in spots ranging from Wyoming to Russia, the element is expected to be in short supply for months, if not years.”
OMG. Hold the presses. In a shocking power grab, EMC CEO Joe Tucci fought off attacks by underlings to maintain his position.
No, not really.
Tucci had announced a year ago that he planned to step down from EMC (as well as VMware, of which it owns a majority) by the end of this year. (In fact, the Boston Globe suggested that he had first announced his retirement in September 2009.) He then announced in January that, never mind, he was going to stay through 2013.
While there has been some executive reshuffling since then, on the whole it appears to be an orderly transition, with several potential competent successors.
Now Tucci says he’s going to stay through at least February 2015, and at some time before that he’s supposed to pick a successor and transition to a purely chairman of the board role.
Roger Cox, vice president of research for Gartner Inc., told the Globe that Tucci’s decision to stay longer is probably more about his unwillingness to let go than about the EMC board’s dissatisfaction with potential successors, of which there are at least three internal ones. While Tucci is 65, he is reportedly in good health and the company is doing well, so well that perhaps the board and stockholders are leery of turning the company over to someone else, no matter how well-groomed that person is for the position. And perhaps he is hoping that one or more of the three will move on and make the decision easier.
EMC’s orderly transition is in sharp contrast to the traumatic ones in other companies such as HP, notes Channelnomics.
Oh, and should Tucci achieve “certain performance targets, including targets relating to total shareholder return, revenue and other metrics” for 2013 and 2014, he also stands to gain $8 million in stock by the February 2015 deadline.
If you needed a reason to implement e-discovery in your company, you now have one. 1.05 billion of them, in fact.
A number of legal experts — as well as e-discovery vendors — have pointed to discovery of electronic documents such as email as an important factor in Apple’s patent victory over Samsung. Writes Doug Austin in E-Discovery Daily:
Interviewed after the trial, some of the jurors cited video testimony from Samsung executives and internal emails as key to the verdict. Jury foreman Velvin Hogan indicated that video testimony from Samsung executives made it “absolutely” clear the infringement was done on purpose. Another juror, Manuel Ilagan, said, “The e-mails that went back and forth from Samsung execs about the Apple features that they should incorporate into their devices was pretty damning to me.”
E-discovery vendors, such as Jeffrey Hartman of EDiscovery Labs, were quick to pounce on the case as an example.
This is yet another clear reminder that otherwise smart people continue to create electronic documents that are both dangerous and discoverable; even as awareness of these pitfalls increases. This is bad news for general counsels and company shareholders…but good news for plaintiff’s attorneys seeking the digital goodies that will help them win lawsuits. A large courtroom display of a blow-up of an emotionally charged internal report or email is often worth even more than technical testimony or other hard evidence.
Another important e-discovery aspect of the case is that first Samsung, and then Apple as well, were hit with “spoliation” charges for failing to preserve electronic evidence; in Samsung’s case, for failing to turn off a function that automatically deletes email more than two weeks old. While a number of e-discovery experts do recommend implementing such an autodelete feature, once a case starts you have to turn it off to preserve potentially relevant evidence, a step known as a “litigation hold.”
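Here is a minimal sketch of how an autodelete job might honor a litigation hold, assuming a hypothetical message store where each message records its custodian and send date. It isn’t how any particular email system implements retention; the point is simply that the two-week purge has to skip everyone under a hold.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)   # the two-week autodelete window described above

def purge_candidates(messages, now, custodians_on_hold):
    """Return IDs of messages the autodelete job may remove.

    messages: iterable of (message_id, custodian, sent_at) tuples (hypothetical schema).
    custodians_on_hold: people under a litigation hold; their mail is never purged.
    """
    return [
        msg_id
        for msg_id, custodian, sent_at in messages
        if now - sent_at > RETENTION and custodian not in custodians_on_hold
    ]

mailbox = [
    ("m1", "alice", datetime(2012, 7, 1)),
    ("m2", "bob", datetime(2012, 7, 1)),     # bob is under a hold, so this survives
    ("m3", "alice", datetime(2012, 7, 30)),  # too recent to purge
]
print(purge_candidates(mailbox, datetime(2012, 8, 1), {"bob"}))  # ['m1']
```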
There’s a compilation of articles about the case if you want to read more — seriously, a lot more — about this.