We’ve all seen it happen. A couple gets together and things seem great for a while. Then they start pulling apart. They might even start seeing other people. By the time the split finally comes, it’s not only anticlimactic but almost a relief.
That’s probably how it feels for EMC now that Dell has finally ended a ten-year relationship (two years early) where Dell sold EMC storage hardware, as Dell increasingly became a storage player in its own right.
“We’ve grown apart,” Dell said. “I needed to feel like I could be my own person.”
Well, okay, not really. “Over the past few years, Dell has grown to become a robust storage technology provider with differentiated capabilities across several product families, including Compellent, EqualLogic, PowerVault, and Dell / EMC,” is the way the company actually put it, on what used to be the Dell/EMC product page.
Dell bought EqualLogic in 2007 and Compellent in 2010, spending a total of $2 billion on storage acquisitions, after starting its partnership with EMC in 2001. Other acquisitions included Exanet for scale-out NAS technology and Ocarina for data compression and optimization; Dell also makes its own DX6000 object storage hardware, partnering with Caringo for the software. The company also reportedly said that its own storage properties provided almost 80 percent of its storage revenues and 90 percent of its profits in the second quarter of this year.
Dell also said it planned to spend an additional $1 billion this fiscal year to strengthen its storage offerings.
Dell promised it would continue to be a good parent for the children — that is, that it would continue to support the EMC hardware. However, when people want to upgrade that hardware, they will be offered Dell storage products.
EMC asked that people respect its privacy during this difficult time. Well, not really. Actually, it had no comment.
Another day, another e-discovery survey. Enterprise Strategy Group has released its report, e-Discovery Market Trends: A View from the Legal Department.
The results aren’t so very different from Symantec’s survey last month — though, frankly, the ESG survey isn’t as statistically rigorous; it surveyed only 48 general counsel.
The following are some of the conclusions ESG came up with.
- E-discovery pain is most acute for high-revenue, serial-litigant enterprise companies, particularly large organizations with revenue exceeding US$1B, serial litigation demands, and high legal expenses.
- Corporate counsel leads internal e-discovery decision making, but process management is interdisciplinary. Internally, corporate counsel bears the responsibility for litigation response, even as other players are involved in execution. Externally, corporate counsel is wielding growing influence with law firms in choosing third-party providers.
- Most organizations are not tracking e-discovery spending, but organizations with at least US$1B in annual revenue were more than three times as likely as their counterparts earning less than US$1B annually to track expenses related to e-discovery activities including document review and technology investments.
- They also don’t track the accuracy and efficiency of document review. Fewer than one-third of respondents have ever tracked the productivity or efficiency of document review. Half of respondents with less than US$1B in annual revenue don’t monitor these processes (with internal or external resources) and have no plans to do so.
- Enterprise litigants are exerting influence (and pricing pressure) on law firms. Corporate counsel increasingly suggest specific tools and technical methods to their law firms. However, they are even more likely to simply request more cost-driven measures such as itemization or alternative fee arrangements.
- Corporate information governance and litigation readiness are a priority, but not yet a widespread reality. Top internal priorities include defensible deletion and data mapping of enterprise ESI inventory for better litigation preparedness.
- Enterprises use diverse technologies for litigation response. E-mail archives and content management systems top current usage for preservation and collection. Future purchasing plans are dominated by content management, e-discovery platforms, and enterprise search.
- Challenges to collection and preservation persist, including collecting data from endpoint devices, over-collection, and supporting staffing requirements and short timelines for conducting litigation response. Diverse data formats and locations present a moving target for e-discovery as newer data sources like SharePoint or the cloud emerge and older data is rendered inaccessible in legacy applications or backups.
- Corporate litigants are complying with court standards for supervising collections and notifying custodians of legal hold, though these are largely still through manual methods. They are less vigilant in documenting chain of custody or proving they’ve physically prevented spoliation in their systems.
IBM announced this week that it had been selected for a 10-year $240 million operations and maintenance contract with the National Archives and Records Administration, but there’s a lot more to the story than that. IBM is actually taking over from Lockheed Martin after several years of a project that’s fallen behind schedule and over budget.
The project is to manage the Electronic Records Archive, and is intended to ensure the transparency of government documents, allowing broader citizen access to public records. The project was started in 2001 to preserve and provide both internal and external electronic access to the records. But it had its problems, noted Elizabeth Montalbano of Information Week:
“NARA began working on the digital archive in 2001 and in 2005 awarded Lockheed Martin a $317 million contract to develop it. However, the project has not been without its troubles along the way. Earlier this year a report by the Government Accountability Office found that the project likely will cost $1.2 billion to $1.4 billion, exceeding its estimated cost of $995 million by 21% to 41%. The report cited poor project management as the reason for the soaring costs.”
In fact, due to its inclusion in the GAO report, NARA cut some of the functionality from the project in February and decided to do no new development past September, which is what enabled IBM to get an O&M contract after the contract with Lockheed ended on September 30, the end of the federal fiscal year, about a year earlier than planned. Originally, NARA had had a sixth option year on the Lockheed Martin deal for development, and a seventh year for operations and maintenance, FederalNewsRadio.com reported.
The project was officially launched in April, particularly with what were called three “pathfinder” agencies, so-called because of the number of requests those agencies received: Justice, Health & Human Services, and State. Twenty-seven other agencies were supposed to start bringing their records online by the end of November, while independent agencies were supposed to start bringing their records online in July, FederalNewsRadio.com noted.
But IBM’s role will be more than just maintenance and operations. An agency spokesman said that IBM would be adding functionality to the system through a series of work orders and other enhancements — in particular, improving the search system, the spokesman said.
One of the most interesting aspects about the announcement this week that EMC CEO Joe Tucci was planning to step down by the end of next year was how blasé everyone was about it. He wasn’t fired. He isn’t dying (so far as we know, existential aren’t-we-all-dying questions aside). He’s not part of a parade of CEOs who have come and gone. It’s just, hey, next year I’ll be 65, time to go.
Part of this, of course, is in contrast to other CEO departures this year where people were fired, dying, part of a parade, and so on. Compared to, say, HP, Apple, or HP again, respectively, the notion of a guy who became CEO ten years ago, did his job, and is leaving at a normal retirement age seems almost quaint.
Part of this, too, is the company culture. EMC may be one of the biggest storage companies out there, but it’s not a rock star consumer-driven company the way Apple is. It’s normal there for the succession to be a relatively gentlemanly affair. Tucci did his time before he became CEO, serving under the previous CEO as executive chair for two years, and will serve as executive chair for the next EMC CEO, whoever he may be (nobody’s suggesting that the next CEO of EMC might be female).
Part of it is also the lack of drama around the succession. Yes, it’s true, nobody was named as the next CEO yet, and of course there’s always the potential of a bunch of little storage Borgias backstabbing and poisoning each other. But EMC is the sort of company where people use the term “deep bench” a lot. Most articles around Tucci’s announcement (which he made to the Wall Street Journal, naturally) named at least four potential successors, any one of whom would be qualified to run the company. Nobody’s wringing their hands suggesting that EMC will have to go outside the company to find someone qualified.
Part of it is that even with his more than one-year notice, this isn’t a surprise; Tucci started talking about succession a year ago — with the same four guys as potential successors. (And nobody’s trying to out any of them, as people are doing with Apple’s Tim Cook.)
The Motley Fool is trying to beat the drum for a shareholder revolt against the fact that the next EMC CEO will continue to be both CEO and chairman, but they’re pretty alone in that.
At this point, about all we can do is wait to see who gets appointed the next EMC CEO — and there’s no timetable for that yet.
The Electronic Frontier Foundation has announced that two vendors, Apple and Dropbox, have signed a pledge to help support its Digital Due Process initiative, which calls for a rewrite of the Electronic Communications Privacy Act to better protect user data.
The initiative has more than 50 members, including Amazon, AT&T, Facebook, Google, Microsoft, Twitter, and Yahoo!, which were called out in April as being major computer vendors that should support the proposal. Steps in the proposal include telling users about data demands, being transparent about government requests, fighting for user privacy in the courts, and fighting for user privacy in Congress. Companies received from one to four stars (including partial stars) depending on how well they are implementing each of these policies.
Dropbox was a particularly interesting addition, because the company has been criticized about its policies regarding protecting user data in its cloud storage service.
Other vendors of the 13 that the EFF called out in April that have not yet responded include Comcast, Myspace, Skype (since purchased by Microsoft, which is a member), and Verizon.
Organizations such as the American Civil Liberties Union and the Center for Democracy & Technology are also members.
It’s typically a good idea to take vendor surveys with a grain of salt; they tend to be slanted and unscientific. Not so with Symantec; they have actual scientific surveys with margins of error and everything.
Not to say, of course, that they’re completely unbiased; recall in this case that Symantec purchased Clearwell earlier this year in an attempt to improve its ranking after a recent Gartner Magic Quadrant on eDiscovery vendors.
That said, its Information Retention and eDiscovery Survey has some interesting points to be made — not the least of which is actual evidence from users that implementing an information retention policy saves money.
- Respondents using best practices reported a 64% faster response time with a 2.3 times higher success rate when responding to eDiscovery requests.
- They were 78% less likely to be sanctioned by the courts and 47% less likely to find themselves in a compromised legal position.
- They were also 20% less likely to have fines levied against them. In addition, they were 45% less likely to disclose too much information.
- Nearly half of respondents do not have an information retention plan in place.
- 30% are only discussing how to do so.
- 14% have no plan to do so.
- When asked why they don’t have information retention programs, respondents indicated the top reasons are: lack of need (41%); too costly (38%); nobody has been chartered with that responsibility (27%); don’t have time (26%); and lack of expertise (21%).
The part about “too costly” is particularly telling in light of the results.
Respondents who said they’d been asked to respond to a legal, compliance or regulatory request for electronically stored information reported the following results:
- Completely failed to fulfill the request: 10%
- Partially failed to fulfill the request: 10%
- Successfully fulfilled the request, but more slowly than the requestor would like: 25%
- Successfully fulfilled the request in a timeframe that is acceptable to the requestor: 35%
- Damage to enterprise reputation or embarrassment: 42%
- Fines: 41%
- Compromised legal position: 38%
- Sanctions by courts: 28%
- Hampered our ability to make decisions in a timely fashion: 26%
- Raised our profile as a potential litigation target: 25%
The thing is, it’s true. Even though Internet speeds continue to increase, the amount of data we want to transmit continues to increase, too.
Which is why the various Internet denizens have developed... workarounds for large file transfers, which also provide the opportunity for the wonderful Internet pastime of geekly arguing.
Which brings us to station wagons, pigeons, and Blu-ray.
The canonical statement, by Andrew Tanenbaum in his 1996 book Computer Networks, is basically “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” And ever since then, there have been numerous websites devoted to how-many-angels-can-dance-on-the-head-of-a-pin discussions about just what that bandwidth would be.
You can tell how old the websites are based on what figures they use for comparable Internet bandwidth, the size of a magnetic tape, and so on. The Wikipedia entry for “Sneakernet” appears to have the most up-to-date calculations.
(The actual calculation using today’s technologies is left as an exercise for the reader.)
The Internet being the Internet, the calculations have been extended, ranging from petabytes in a sailboat to Blu-ray discs in a 747 (which, as it turns out, would actually be too heavy for a 747 to carry), to, more mundanely, the number of SD cards that fit into a FedEx box, as well as the bandwidth of a Netflix movie shipment through the mail.
And then there’s the pigeons.
Really truly, carrier pigeons have been used for a remarkable amount of data transfer in history — not just short messages, and aerial photography predating satellites, but things like blueprints from military installations in the U.S.
In fact, in 1982, Computerworld ran an article about how Lockheed Missile & Space Co. used pigeons to carry microfilm copies of blueprints to a research facility in Santa Cruz, because it was cheaper than printing out and transporting hard copies. And if you have $100 per half hour for someone to dig it up, you can apparently get a copy of Dan Rather introducing a story about it on CBS News.
Consequently, not one but two April Fool’s Internet protocols were developed — Transmission of IP Datagrams on Avian Carriers, and Transmission of IP Datagrams on Avian Carriers with Quality Control — for transmitting Internet data by carrier pigeon. The first one was even demonstrated, and while the experiment left something to be desired, Wikipedia points out that “during the last 20 years, the information density of storage media and thus the bandwidth of an Avian Carrier has increased 3 times faster than the bandwidth of the Internet.”
That’s not all. In various remote areas, such as rural U.K., Australia, and parts of South Africa, people have used carrier pigeons to demonstrate that they’re faster than what passes for high-speed Internet there.
The point is this: No matter how fat a pipe you have to the Internet, at some given amount of data, it’s going to be faster, cheaper, or both to use some manual method to ship data on some storage medium. It makes sense for you to do a back-of-the-envelope calculation to figure out where the data boundaries are for different mediums and different shipping methods, and update them as technology changes.
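Here is one way that back-of-the-envelope calculation might look. It is a minimal sketch that ignores the time needed to copy data onto and off of the storage medium; the function names and the example figures (a 100 Mbit/s link and a 24-hour courier) are assumptions chosen for illustration, not numbers from any of the sites mentioned above.

```python
def transfer_hours_network(data_tb, link_mbps):
    """Hours needed to push data_tb terabytes through a link_mbps (megabit/s) pipe."""
    bits = data_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / 3600


def effective_mbps_shipping(data_tb, shipping_hours):
    """Effective bandwidth, in megabits/s, of physically shipping data_tb terabytes."""
    bits = data_tb * 1e12 * 8
    return bits / (shipping_hours * 3600) / 1e6


def break_even_tb(link_mbps, shipping_hours):
    """Data size (TB) at which shipping and the network take the same time.

    Above this size, the box in the truck wins (copy time is ignored).
    """
    return link_mbps * 1e6 * shipping_hours * 3600 / 8 / 1e12


if __name__ == "__main__":
    # Assumed example: 100 Mbit/s link versus overnight (24-hour) delivery.
    print(break_even_tb(100, 24), "TB is the break-even point")
```

Rerunning it with your own link speed and courier time gives the boundary for your situation, and it is easy to update as either number changes.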
Tape’s not dead. Really. Products supporting the Linear Tape Open (LTO) 5 specification just began shipping this year, but already vendors are starting to make noises about LTO 6, for which there isn’t even an availability date announced yet.
In sort of the tape storage equivalent to Moore’s Law, a consortium of three vendors — Hewlett Packard, IBM, and Quantum, known as the Technology Provider Companies (TPC) — get together every few years and decide upon specifications for tape cartridges with a steady increase in speed and capacity. This helps keep users convinced that there’s still a future for tape.
For example, the specifications for LTO 5 (as well as LTO 6) were announced in December 2004, but it took until January 2010 before licenses for the LTO 5 specification were available, and products supporting it started to be available in the second quarter of that year.
Similarly, the LTO TPCs announced in June of this year that licenses for the LTO 6 specification were available. By extrapolation, one can assume that LTO 6 products could be announced any day.
LTO 6 is defined as having a capacity of 8 TB with a data transfer speed of up to 525 MB/s, assuming 2.5:1 compression. This is in comparison to LTO 5, which has a capacity of 3 TB with a data transfer speed of up to 280 MB/s, assuming 2:1 compression.
Lest people get fidgety about the future of tape after that, the LTO TPC announced this spring the next two generations, LTO 7 and LTO 8, with compressed capacities of 16 TB and 32 TB and data transfer speeds of 788 MB/s and 1180 MB/s, respectively. As with LTO 6, no dates were announced, but one might expect each to follow its predecessor by about two to three years.
The thing to remember, also, is that each LTO generation can typically only read two generations before it, meaning users need to either rewrite their tape library every few years or keep a bunch of old LTO machines around. “By the time LTO 8 is released, organizations will need, at a minimum, LTO 3 drives to read LTO 1 through LTO 3 cartridges; LTO 6 drives to read LTO 4 through LTO 6 cartridges; and LTO 8 drives to read the LTO 7 and LTO 8 cartridges,” wrote Graeme Elliott earlier this year.
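The drive list in that quote can be reproduced with a few lines of code. This is a sketch that assumes the simple rule stated above (a drive reads its own generation and the two before it) and a hypothetical helper name, `drives_needed`; real compatibility can vary by product, so treat it as an illustration of the reasoning rather than a planning tool.

```python
def drives_needed(newest_gen, read_back=2):
    """Return {drive generation: [cartridge generations it covers]}.

    Greedy approach: find the oldest cartridge generation not yet readable,
    then pick the newest drive that can still read it.
    """
    plan = {}
    oldest_uncovered = 1
    while oldest_uncovered <= newest_gen:
        drive = min(oldest_uncovered + read_back, newest_gen)
        plan[drive] = list(range(oldest_uncovered, drive + 1))
        oldest_uncovered = drive + 1
    return plan


if __name__ == "__main__":
    for drive, cartridges in drives_needed(8).items():
        print("LTO", drive, "drive reads cartridges", cartridges)
```

For eight generations this yields LTO 3, LTO 6, and LTO 8 drives, the same minimum set described in the quote.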
The best part about IBM’s experimental 120-petabyte hard drive is reading all the ways that writers try to explain how big it is.
- 2.4 million Blu-ray disks
- 24 million HD movies
- 24 billion MP3s
- 1 trillion files
- Eight times as large as the biggest disk array available previously
- More than twice the entire written works of mankind from the beginning of recorded history in all languages
- 6,000 Libraries of Congress (a standard unit of data measure)
- Almost as much data as Google processes every week
- Or, four Facebooks
It is not one humungo drive; it is, in fact, an array of 200,000 conventional hard drives (not even solid-state disk) hooked together (which would make them an average of 600 GB each).
Unfortunately, you’re not going to be able to trundle down to Fry’s and get one anytime soon. No, this is something being put together by the IBM Almaden research lab in San Jose, Calif., according to MIT Technology Review.
What exactly it’s going to be used for, IBM wouldn’t say, only that it was for “an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena.” Most writers speculated that that meant weather, though Popular Science thought it could be used for seismic monitoring, or by the NSA for spying on people.
Like the Cray supercomputer back in the day, and some high-powered PCs even now, the system is reportedly water-cooled rather than fan-cooled.
Needless to say, it also uses a different file system than a typical PC: IBM’s General Parallel File System (GPFS), which according to Wikipedia has been available on IBM’s AIX since 1998, on Linux since 2001, and on Microsoft Windows Server since 2008, and which some tests have shown can work up to 37 times faster than a typical system. (The Wikipedia entry also has an interesting comparison with the file system used by big data provider Hadoop.)
“GPFS provides higher input/output performance by ‘striping’ blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel.”
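To make the striping idea concrete, here is a toy sketch. It is not how GPFS itself lays out data; it simply deals fixed-size blocks round-robin across a few in-memory "disks" and then reads them back with one worker per disk. The function names, block size, and disk count are all assumptions made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data, num_disks, block_size):
    """Split data into blocks and deal them round-robin across the disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_striped(disks):
    """Read every disk in parallel, then interleave the blocks back in order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        # list() is a stand-in for real disk I/O done by each worker.
        per_disk = list(pool.map(list, disks))
    blocks = []
    for row in range(max(len(d) for d in per_disk)):
        for disk in per_disk:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


if __name__ == "__main__":
    layout = stripe(b"abcdefghij", num_disks=3, block_size=2)
    print(layout)
    print(read_striped(layout))
```

Because consecutive blocks sit on different disks, a large sequential read can keep all of the disks busy at once instead of waiting on a single spindle.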
The system also has a kind of super-mondo RAID that lets dying disks store copies of themselves and then get replaced, which reportedly gives the system a mean time between failure of a million years.
Technology Review didn’t say how much space it took up, but if a typical drive is, say, 4 in. x 5.75 in. x 1 in., we’re talking 4.6 million cubic inches just for the drives themselves, not counting the cooling system and cables and so on. That’s a 20-ft. x 20-ft. square roughly 6.7 feet high, just of drives. (These are all back-of-the-envelope calculations.)
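For anyone who wants to check the envelope, the arithmetic can be sketched as follows. The drive count and total capacity come from the reporting above; the drive dimensions and the 20-ft. x 20-ft. footprint are the same guesses used in the text, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
DRIVE_COUNT = 200_000
TOTAL_PB = 120
DRIVE_DIMS_IN = (4.0, 5.75, 1.0)   # assumed width, depth, height in inches
FOOTPRINT_FT = (20.0, 20.0)        # assumed floor footprint in feet


def average_drive_gb(total_pb, drives):
    """Average capacity per drive in GB, using decimal units."""
    return total_pb * 1_000_000 / drives


def total_volume_cubic_in(drives, dims_in):
    """Combined volume of all the drives, in cubic inches."""
    width, depth, height = dims_in
    return drives * width * depth * height


def stack_height_ft(drives, dims_in, footprint_ft):
    """Height in feet of a solid block of drives on the given footprint."""
    area_sq_in = (footprint_ft[0] * 12) * (footprint_ft[1] * 12)
    return total_volume_cubic_in(drives, dims_in) / area_sq_in / 12


if __name__ == "__main__":
    print(average_drive_gb(TOTAL_PB, DRIVE_COUNT), "GB per drive")
    print(total_volume_cubic_in(DRIVE_COUNT, DRIVE_DIMS_IN), "cubic inches")
    print(round(stack_height_ft(DRIVE_COUNT, DRIVE_DIMS_IN, FOOTPRINT_FT), 2), "feet high")
```

Changing the assumed drive dimensions or footprint shows how sensitive the estimate is to those guesses.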
In fact, the system needs two petabytes of its storage just to keep track of all the index files and metadata, Technology Review reported.
In the winter, I keep my thermostat set to a particular temperature. When I leave the house, or go to bed, I turn the thermostat down, and when I get home or wake up, I turn it back up. This ensures that the house is comfortable when I’m using it, and more energy-efficient when I’m not.
Now, someone is talking about doing the same thing for hard disk drives.
Eran Tal, a hardware engineer at Facebook, is talking about the idea. In case you didn’t know, Facebook has some of the largest data centers in the world, and has begun publicizing some details of their design to help other data center managers leverage what Facebook has learned in the process.
Consequently, earlier this year, Facebook created what it called the Open Compute Project, which is, essentially, to hardware design what open source is to software design. Thus far, the site’s blog has a grand total of two postings, along with a number of comments on them.
And that’s where Tal comes in. A few days ago, he made one of those two posts, musing about what it would be like to have hard disks with a toggle switch between low speed and high speed, so that as the data on them became older and less actively used, the switch could be toggled to put the hard disks on a lower speed — saving energy in the process, without having to do the data migration that active tiering requires.
Reducing HDD RPM by half would save roughly 3-5W per HDD. Data centers today can have up to tens and even hundreds of thousands of cold drives, so the power savings impact at the data center level can be quite significant, on the order of hundreds of kilowatts, maybe even a megawatt. The reduced HDD bandwidth due to lower RPM would likely still be more than sufficient for most cold use cases, as a data rate of several (perhaps several dozen) MBs should still be possible. In most cases a user is requesting less than a few MBs of data, meaning that they will likely not notice the added service time for their request due to the reduced speed HDDs.
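The scale of those savings is easy to sanity-check. The sketch below uses the 3-5W per-drive range quoted above; the fleet sizes in the example are assumptions picked to match "tens and even hundreds of thousands" of cold drives, and the function names are hypothetical.

```python
def fleet_savings_kw(cold_drives, watts_saved_per_drive):
    """Total power saved, in kilowatts, across a fleet of cold drives."""
    return cold_drives * watts_saved_per_drive / 1000


def savings_range_kw(cold_drives, low_w=3.0, high_w=5.0):
    """Low and high estimates in kW, using the quoted 3-5W per-drive range."""
    return (fleet_savings_kw(cold_drives, low_w),
            fleet_savings_kw(cold_drives, high_w))


if __name__ == "__main__":
    # Assumed fleet sizes for illustration only.
    for drives in (50_000, 100_000, 200_000):
        low, high = savings_range_kw(drives)
        print(drives, "cold drives:", low, "to", high, "kW")
```

At 200,000 drives the upper estimate reaches 1,000 kW, which is where the "maybe even a megawatt" figure comes from.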
Once upon a time (seven whole years ago) there was a vendor that did something like this: Copan, with what it called its Massive Array of Idle Disks (MAID) technology, produced disk arrays in which no more than 25% of the drives were powered on at a time. Unfortunately, after getting new funding as recently as February 2009, Copan declared bankruptcy in 2010 and was bought by SGI (yes, it’s still around), which still markets the technology, after a fashion at least.
Several other vendors, including Nexsan with its AutoMAID technology, also have products in this area.
The big trick with any of these systems is ensuring that the data on them really isn’t used very much, because it can take up to 30 seconds for the disk to start from zero, and up to 15 seconds from the slower speed. But as Derrick Harris of GigaOm writes, the savings for a data center the size of Facebook’s can be considerable, and the technology could end up trickling down in the process.