IBM announced this week that it had been selected for a 10-year $240 million operations and maintenance contract with the National Archives and Records Administration, but there’s a lot more to the story than that. IBM is actually taking over from Lockheed Martin after several years of a project that’s fallen behind schedule and over budget.
The project is to manage the Electronic Records Archive, and is intended to ensure the transparency of government documents, allowing broader citizen access to public records. The project was started in 2001 to preserve and provide both internal and external electronic access to the records. But it had its problems, noted Elizabeth Montalbano of Information Week:
“NARA began working on the digital archive in 2001 and in 2005 awarded Lockheed Martin a $317 million contract to develop it. However, the project has not been without its troubles along the way. Earlier this year a report by the Government Accountability Office found that the project likely will cost $1.2 billion to $1.4 billion, exceeding its estimated cost of $995 million by 21% to 41%. The report cited poor project management as the reason for the soaring costs.”
In fact, due to its inclusion in the GAO report, NARA cut some of the functionality from the project in February and decided to do no new development past September, which is what enabled IBM to get an O&M contract after the contract with Lockheed ended on September 30, the end of the federal fiscal year, about a year earlier than planned. Originally, NARA had a sixth option year on the Lockheed Martin deal for development, and a seventh year for operations and maintenance, FederalNewsRadio.com reported.
The project was officially launched in April with three “pathfinder” agencies, so called because of the volume of requests those agencies receive: Justice, Health & Human Services, and State. Twenty-seven other agencies were supposed to start bringing their records online by the end of November, while independent agencies were supposed to start bringing their records online in July, FederalNewsRadio.com noted.
But IBM’s role will be more than just maintenance and operations. An agency spokesman said that IBM would be adding functionality to the system through a series of work orders and other enhancements — in particular, improving the search system, the spokesman said.
One of the most interesting aspects of the announcement this week that EMC CEO Joe Tucci was planning to step down by the end of next year was how blasé everyone was about it. He wasn’t fired. He isn’t dying (so far as we know, existential aren’t-we-all-dying questions aside). He’s not part of a parade of CEOs who have come and gone. It’s just, hey, next year I’ll be 65, time to go.
Part of this, of course, is in contrast to other CEO departures this year where people were fired, dying, part of a parade, and so on. Compared to, say, HP, Apple, or HP again, respectively, the notion of a guy who became CEO ten years ago, did his job, and is leaving at a normal retirement age seems almost quaint.
Part of this, too, is the company culture. EMC may be one of the biggest storage companies out there, but it’s not a rock star consumer-driven company the way Apple is. It’s normal there for the succession to be a relatively gentlemanly affair. Tucci did his time before he became CEO, serving under the previous CEO as executive chair for two years, and will serve as executive chair for the next EMC CEO, whoever he may be (nobody’s suggesting that the next CEO of EMC might be female).
Part of it is also the lack of drama around the succession. Yes, it’s true, nobody has been named as the next CEO yet, and of course there’s always the potential of a bunch of little storage Borgias backstabbing and poisoning each other. But EMC is the sort of company where people use the term “deep bench” a lot. Most articles around Tucci’s announcement (which he made to the Wall Street Journal, naturally) named at least four potential successors, any one of whom would be qualified to run the company. Nobody’s wringing their hands suggesting that EMC will have to go outside the company to find someone qualified.
Part of it is that even with his more than one-year notice, this isn’t a surprise; Tucci started talking about succession a year ago — with the same four guys as potential successors. (And nobody’s trying to out any of them, as people are doing with Apple’s Tim Cook.)
The Motley Fool is trying to beat the drum for a shareholder revolt against the fact that the next EMC CEO will continue to be both CEO and chairman, but they’re pretty alone in that.
At this point, about all we can do is wait to see who gets appointed the next EMC CEO — and there’s no timetable for that yet.
The Electronic Frontier Foundation has announced that two vendors, Apple and Dropbox, have signed a pledge to help support its Digital Due Process initiative, which calls for a rewrite of the Electronic Communications Privacy Act to better protect user data.
The initiative has more than 50 members, including Amazon, AT&T, Facebook, Google, Microsoft, Twitter, and Yahoo!, which were called out in April as being major computer vendors that should support the proposal. Steps included in the proposal include telling users about data demands, being transparent about government requests, fighting for user privacy in the courts, and fighting for user privacy in Congress. Companies received from one to four stars (including partial stars) depending on how well they are implementing each of these policies.
Dropbox was a particularly interesting addition, because the company has been criticized about its policies regarding protecting user data in its cloud storage service.
Of the 13 vendors the EFF called out in April, those that have not yet responded include Comcast, Myspace, Skype (since purchased by Microsoft, which is a member), and Verizon.
Organizations such as the American Civil Liberties Union and the Center for Democracy & Technology are also members.
It’s typically a good idea to take vendor surveys with a grain of salt; they tend to be slanted and unscientific. Not so with Symantec; they have actual scientific surveys with margins of error and everything.
Not to say, of course, that they’re completely unbiased; recall in this case that Symantec purchased Clearwell earlier this year in an attempt to improve its ranking after a recent Gartner Magic Quadrant on eDiscovery vendors.
That said, its Information Retention and eDiscovery Survey has some interesting points to be made — not the least of which is actual evidence from users that implementing an information retention policy saves money.
- Respondents using best practices reported a 64% faster response time with a 2.3 times higher success rate when responding to eDiscovery requests.
- They were 78% less likely to be sanctioned by the courts and 47% less likely to find themselves in a compromised legal position.
- They were also 20% less likely to have fines levied against them. In addition, they were 45% less likely to disclose too much information.
- Nearly half of respondents do not have an information retention plan in place.
- 30% are only discussing how to do so.
- 14% have no plan to do so.
- When asked why they don’t have information retention programs, respondents indicated the top reasons are: lack of need (41%), too costly (38%); nobody has been chartered with that responsibility (27%); don’t have time (26%); and lack of expertise (21%).
The part about “too costly” is particularly telling in light of the results.
Respondents who said they’d been asked to respond to a legal, compliance or regulatory request for electronically stored information reported the following results:
- Completely failed to fulfill the request 10%
- Partially failed to fulfill the request 10%
- Successfully fulfilled the request, but more slowly than the requestor would like 25%
- Successfully fulfilled the request in a timeframe that is acceptable to the requestor 35%
Consequences reported by respondents included:
- Damage to enterprise reputation or embarrassment 42%
- Fines 41%
- Compromised legal position 38%
- Sanctions by courts 28%
- Hampered our ability to make decisions in a timely fashion 26%
- Raised our profile as a potential litigation target 25%
The thing is, it’s true. Even though Internet speeds continue to increase, the amount of data we want to transmit continues to increase, too.
Which is why the various Internet denizens have developed… workarounds for large file transfers, which also provide the opportunity for that wonderful Internet pastime, geekly arguing.
Which brings us to station wagons, pigeons, and Blu-ray.
The canonical statement, from Andrew Tanenbaum’s 1996 book Computer Networks, is basically “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” And ever since then, there have been numerous websites devoted to how-many-angels-can-dance-on-the-head-of-a-pin discussions about just what that bandwidth would be.
You can tell how old the websites are based on what figures they use for comparable Internet bandwidth, the size of a magnetic tape, and so on. The Wikipedia entry for “Sneakernet” appears to have the most up-to-date calculations.
(The actual calculation using today’s technologies is left as an exercise for the reader.)
The Internet being the Internet, the calculations have been extended, ranging from petabytes in a sailboat to Blu-ray discs in a 747 (which, as it turns out, would actually be too heavy for a 747 to carry), to, more mundanely, the number of SD cards that fit into a FedEx box, as well as the bandwidth of a Netflix movie shipment through the mail.
And then there’s the pigeons.
Really truly, carrier pigeons have been used for a remarkable amount of data transfer in history: not just short messages and aerial photography predating satellites, but things like blueprints from military installations in the U.S.
In fact, in 1982, Computerworld ran an article about how Lockheed Missile & Space Co. used pigeons to carry microfilm copies of blueprints to a research facility in Santa Cruz, because it was cheaper than printing out and transporting hard copies. And if you have $100 per half hour for someone to dig it up, you can apparently get a copy of Dan Rather introducing a story about it on CBS News.
Consequently, not one but two April Fools’ Internet protocols were developed for transmitting Internet data by carrier pigeon: A Standard for the Transmission of IP Datagrams on Avian Carriers (RFC 1149) and IP over Avian Carriers with Quality of Service (RFC 2549). The first one was even demonstrated, and while the experiment left something to be desired, Wikipedia points out that “during the last 20 years, the information density of storage media and thus the bandwidth of an Avian Carrier has increased 3 times faster than the bandwidth of the Internet.”
That’s not all. In various remote areas, such as rural U.K., Australia, and parts of South Africa, people have used carrier pigeons to demonstrate that they’re faster than what passes for high-speed Internet there.
The point is this: No matter how fat a pipe you have to the Internet, at some given amount of data, it’s going to be faster, cheaper, or both to use some manual method to ship data on some storage medium. It makes sense for you to do a back-of-the-envelope calculation to figure out where the data boundaries are for different mediums and different shipping methods, and update them as technology changes.
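As a quick sketch of that back-of-the-envelope calculation, here is one version in Python. All the inputs are illustrative assumptions, not measurements: 1,000 LTO-5 cartridges at 3 TB compressed each, a four-hour drive, and a 1 Gbps network link for comparison.

```python
# Back-of-the-envelope "sneakernet" bandwidth estimate.
# All figures below are illustrative assumptions, not measurements.

TB = 10**12                   # bytes per terabyte (decimal, as vendors count)

tapes = 1_000                 # LTO-5 cartridges in the station wagon
capacity_bytes = 3 * TB       # ~3 TB per cartridge at 2:1 compression
drive_seconds = 4 * 3600      # a four-hour drive to the destination

payload = tapes * capacity_bytes
wagon_bw = payload / drive_seconds     # effective bytes/second in transit

link_bw = 1e9 / 8                      # a 1 Gbps link, in bytes/second
link_seconds = payload / link_bw

print(f"Station wagon: {wagon_bw / 1e9:.0f} GB/s effective")
print(f"1 Gbps link: {link_seconds / 86400:.0f} days for the same payload")
```

With those assumptions the wagon delivers an effective 200-plus GB/s, while the network link would need the better part of a year, which is the whole point of the exercise. Swap in your own media, link speed, and transit time to find where the crossover lies.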
Tape’s not dead. Really. Products supporting the Linear Tape Open (LTO) 5 specification just began shipping this year, but already vendors are starting to make noises about LTO 6, for which there isn’t even an availability date announced yet.
In sort of the tape storage equivalent to Moore’s Law, a consortium of three vendors — Hewlett-Packard, IBM, and Quantum, known as the Technology Provider Companies (TPC) — get together every few years and decide upon specifications for tape cartridges with a steady increase in speed and capacity. This helps keep users convinced that there’s still a future for tape.
For example, the specifications for LTO 5 (as well as LTO 6) were announced in December 2004, but it took until January 2010 before licenses for the LTO 5 specification were available, and products supporting it started to be available in the second quarter of that year.
Similarly, the LTO TPCs announced in June of this year that licenses for the LTO 6 specification were available. By extrapolation, one can assume that LTO 6 products could be announced any day.
LTO 6 is defined as having a capacity of 8 TB with a data transfer speed of up to 525 MB/s, assuming 2.5:1 compression. This compares to LTO 5, which has a capacity of 3 TB with a data transfer speed of up to 280 MB/s, assuming 2:1 compression.
Lest people get fidgety about the future of tape after that, the LTO TPC announced this spring the next two generations, LTO 7 and LTO 8, with compressed capacities of 16 TB and 32 TB and data transfer speeds of 788 MB/s and 1180 MB/s, respectively. As with LTO 6, no dates were announced, but one might expect each to come out at roughly two-to-three-year intervals.
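One thing those announced figures make clear is that capacity is growing faster than transfer speed, so each generation takes longer to fill. A quick sketch using the compressed capacities and speeds quoted above:

```python
# Time to stream one full cartridge at its maximum compressed transfer rate.
# Capacities (TB) and speeds (MB/s) are the announced compressed figures.

specs = {
    "LTO 5": (3, 280),
    "LTO 6": (8, 525),
    "LTO 7": (16, 788),
    "LTO 8": (32, 1180),
}

for gen, (tb, mbps) in specs.items():
    seconds = (tb * 1e12) / (mbps * 1e6)
    print(f"{gen}: {seconds / 3600:.1f} hours per full cartridge")
```

That works out to roughly 3 hours for a full LTO 5 cartridge, rising to about 7.5 hours for LTO 8, a trend worth factoring into backup windows.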
The thing to remember, also, is that each LTO generation can typically read only the two generations before it, meaning users need to either rewrite their tape library every few years or keep a bunch of old LTO machines around. “By the time LTO 8 is released, organizations will need, at a minimum, LTO 3 drives to read LTO 1 through LTO 3 cartridges; LTO 6 drives to read LTO 4 through LTO 6 cartridges; and LTO 8 drives to read the LTO 7 and LTO 8 cartridges,” wrote Graeme Elliott earlier this year.
The best part about IBM’s experimental 120-petabyte hard drive is reading all the ways that writers try to explain how big it is.
- 2.4 million Blu-ray discs
- 24 million HD movies
- 24 billion MP3s
- 1 trillion files
- Eight times as large as the biggest disk array previously available
- More than twice the entire written works of mankind from the beginning of recorded history in all languages
- 6,000 Libraries of Congress (a standard unit of data measure)
- Almost as much data as Google processes every week
- Or, four Facebooks
It is not one humungo drive; it is, in fact, an array of 200,000 conventional hard drives (not even solid-state disk) hooked together (which would make them an average of 600 GB each).
Unfortunately, you’re not going to be able to trundle down to Fry’s and get one anytime soon. No, this is something being put together by the IBM Almaden research lab in San Jose, Calif., according to MIT Technology Review.
What exactly it’s going to be used for, IBM wouldn’t say, only that it’s for “an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena.” Most writers speculated that that meant weather, though Popular Science thought it could be used for seismic monitoring, or by the NSA for spying on people.
Like the Cray supercomputers back in the day, and some high-powered PCs even now, the system is reportedly cooled with water rather than fans.
Needless to say, it also uses a different file system than a typical PC: IBM’s General Parallel File System (GPFS), which according to Wikipedia has been available on IBM’s AIX since 1998, on Linux since 2001, and on Microsoft Windows Server since 2008, and which some tests have shown can work up to 37 times faster than a typical file system. (The Wikipedia entry also has an interesting comparison with the file system used by big data provider Hadoop.)
GPFS provides higher input/output performance by “striping” blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel.
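The striping idea itself is simple to sketch. The toy code below is my own illustration of the concept, not GPFS’s actual implementation: consecutive blocks of a file are scattered round-robin across several “disks” (here, plain lists) and fetched back in parallel.

```python
# Toy illustration of block striping across disks (not real GPFS code).
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes per block; real systems use large blocks (256 KB+)

def stripe(data: bytes, n_disks: int):
    """Distribute consecutive blocks round-robin across n_disks."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def read_striped(disks):
    """Read every disk 'in parallel', then reassemble the original order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        per_disk = list(pool.map(lambda d: list(d), disks))
    n_blocks = sum(len(d) for d in per_disk)
    blocks = [per_disk[i % len(disks)][i // len(disks)] for i in range(n_blocks)]
    return b"".join(blocks)

disks = stripe(b"GPFS stripes file blocks over many spindles", 4)
assert read_striped(disks) == b"GPFS stripes file blocks over many spindles"
```

The payoff is that a large sequential read touches every spindle at once, so aggregate throughput scales with the number of disks rather than being limited by any single drive.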
The system also has a kind of super-mondo RAID that lets dying disks store copies of themselves and then get replaced, which reportedly gives the system a mean time between failure of a million years.
Technology Review didn’t say how much space it took up, but if a typical drive is, say, 4 in. x 5.75 in. x 1 in, we’re talking 4.6 million cubic inches just for the drives themselves, not counting the cooling system and cables and so on. That’s a 20-ft. x 20-ft. square almost 7.5 feet high, just of drives. (This is all back-of-the-envelope calculations.)
In fact, the system needs two petabytes of its storage just to keep track of all the index files and metadata, Technology Review reported.
In the winter, I keep my thermostat set to a particular temperature. When I leave the house, or go to bed, I turn the thermostat down, and when I get home or wake up, I turn it back up. This ensures that the house is comfortable when I’m using it, and more energy-efficient when I’m not.
Now, someone is talking about doing the same thing for hard disk drives.
Eran Tal, a hardware engineer at Facebook, is talking about the idea. In case you didn’t know, Facebook has some of the largest data centers in the world, and has begun publicizing some details of their design to help other data center managers leverage what Facebook has learned in the process.
Consequently, earlier this year, Facebook created what it called the Open Compute Project, which is, essentially, to hardware design what open source is to software design. Thus far, the site’s blog has a grand total of two postings, along with a number of comments on them.
And that’s where Tal comes in. A few days ago, he made one of those two posts, musing about what it would be like to have hard disks with a toggle switch between low speed and high speed, so that as the data on them became older and less actively used, the switch could be toggled to put the hard disks on a lower speed — saving energy in the process, without having to do the data migration that active tiering requires.
Reducing HDD RPM by half would save roughly 3-5W per HDD. Data centers today can have up to tens and even hundreds of thousands of cold drives, so the power savings impact at the data center level can be quite significant, on the order of hundreds of kilowatts, maybe even a megawatt. The reduced HDD bandwidth due to lower RPM would likely still be more than sufficient for most cold use cases, as a data rate of several (perhaps several dozen) MBs should still be possible. In most cases a user is requesting less than a few MBs of data, meaning that they will likely not notice the added service time for their request due to the reduced speed HDDs.
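Tal’s numbers check out on the back of an envelope. In the sketch below, the drive count and per-drive savings are assumptions chosen from within the ranges he gives:

```python
# Data-center-level savings from spinning cold drives at half RPM.
# Both inputs are assumptions within the ranges quoted above.

cold_drives = 100_000          # "tens and even hundreds of thousands"
watts_saved_per_drive = 4      # "roughly 3-5W per HDD"

total_kw = cold_drives * watts_saved_per_drive / 1000
print(f"~{total_kw:.0f} kW saved")

# At the top of both ranges (300,000 drives at 5 W each) the savings
# reach 1.5 MW, which is where "maybe even a megawatt" comes from.
```

At the assumed midpoint that is 400 kW, squarely in the “hundreds of kilowatts” Tal describes.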
Once upon a time (seven whole years ago) there was a vendor that did something like this: Copan, with what it called its Massive Array of Idle Disks (MAID) technology, produced disk arrays in which no more than 25% of the drives were spinning at any one time. Unfortunately, after getting new funding as recently as February 2009, Copan declared bankruptcy in 2010 and was bought by SGI (yes, it’s still around), which still markets the technology, after a fashion at least.
Several other vendors, including Nexsan with its AutoMAID technology, also have products in this area.
The big trick with any of these systems is ensuring that the data on them really isn’t used very much, because it can take up to 30 seconds for the disk to start from zero, and up to 15 seconds from the slower speed. But as Derrick Harris of GigaOm writes, the savings for a data center the size of Facebook’s can be considerable, and the technology could end up trickling down in the process.
Another e-Discovery vendor has been purchased: Hewlett-Packard has announced its intent to purchase UK vendor Autonomy, which, like Clearwell (which Symantec purchased earlier this year), was also in the Leaders section of Gartner’s e-Discovery Magic Quadrant released in May.
In that report, Gartner predicted that consolidation would have eliminated one in four enterprise e-Discovery vendors by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. Autonomy itself acquired Iron Mountain’s archiving, e-discovery and online backup business in May for US$ 380 million in cash.
HP offered the US equivalent of $42.11 per share for Autonomy, which it said was a 64% premium over the one-day stock price and a 58% premium over the one-month average stock price. The overall price is on the order of $10 billion.
“Autonomy is a brand and marketing powerhouse that appears on many clients’ shortlists,” Gartner said in its earlier report. “Although we have seen little appetite for ‘full-service e-discovery platforms’ from clients as yet, Autonomy is positioned to seize these opportunities when they do arise — indeed, the overall market may evolve in that direction.”
HP’s chief executive officer, Leo Apotheker, formerly of SAP, has said he wants to focus on higher-margin businesses such as software and de-emphasize the personal computer business, said the New York Times. The company also said it is eliminating its WebOS business and is reportedly considering spinning off its PC business, just a decade after acquiring major PC vendor Compaq.
“[T]he decision to buy Autonomy also marks a change of course for HP, one that makes HP’s trajectory look remarkably similar to rival IBM’s nearly a decade ago. IBM, a key player in building the PC market in the 1980s, sold its PC business in 2004 to focus on software and services, which aren’t as labor- or component-intensive as building computer hardware.”
However, such a transition may not be easy, said an article in the Wall Street Journal, which examined how IBM had made that transition.
The Autonomy deal offered another advantage to HP, noted a different New York Times article. Like Microsoft’s purchase of Skype earlier this year, it gives HP the opportunity to spend money it had earned outside the U.S. — reportedly as much as $12 billion — without having to pay taxes on that money by bringing it into the U.S.
Other e-Discovery vendors include FTI Technology, Guidance Software, and kCura, the remaining vendors in the “Leaders” section of the Gartner Magic Quadrant. Less attractive, but also likely to be less expensive and, maybe, more desperate, will be the other vendors: AccessData Group, CaseCentral, Catalyst Repository Systems, CommVault, Exterro, Recommind, and ZyLab in the “Visionaries” quadrant, and Daegis, Epiq Systems, Integreon, Ipro, and Kroll Ontrack, as well as the e-discovery components of Lexis/Nexis and Xerox Litigation Services, in the “Niche” quadrant.
To anybody who’s run a USB memory stick through the laundry or left one sitting in a remote machine, there’s no surprise in the results from the recent Ponemon Institute study, The State of USB Drive Security.
Ponemon, which performed the study on behalf of Kingston, a manufacturer of encrypted USB thumb drives, did not fully describe its methodology, but said it had surveyed 743 IT and IT security practitioners with an average of 10 years of relevant experience.
Interesting tidbits from the survey include the following:
- More than 70% of respondents say they are absolutely certain (47%) or believe it most likely (23%) that a data breach in the past two years was caused by sensitive or confidential information contained on a missing USB drive.
- The majority of organizations (67%) that had lost data confirmed that they had multiple loss events; in some cases, more than 10 separate events.
- More than 40% of organizations surveyed report having more than 50,000 USB drives in use in their organizations, with nearly 20% having more than 100,000 drives in circulation.
- On average, organizations had lost more than 12,000 records about customers, consumers and employees as a result of missing USBs.
- The average cost per record of a data breach is $214, making the average cost of lost records to an organization $2,568,000.
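The math behind that last figure is just the survey’s two averages multiplied together:

```python
# Average cost of USB-related record loss, from the survey's own figures.

records_lost = 12_000          # average records lost per organization
cost_per_record = 214          # average cost per breached record, in dollars

total_cost = records_lost * cost_per_record
print(f"${total_cost:,}")
```

That comes to $2,568,000 per organization, which is the number the report cites.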
This isn’t new; there’ve been numerous incidents of data loss via USB memory stick, either by losing them or by theft, ever since the handy little things came out. But those have been largely anecdotal reports, while this was a more broadly based survey.
And that’s just data going out. Another issue is that of malware coming in, also via thumb drive. Again, we have heard of anecdotal incidents, but the survey also reported that incoming security was an issue as well.
“The most recent example of how easily rogue USB drives can enter an organization can be seen in a Department of Homeland Security test in which USBs were ‘accidentally’ dropped in government parking lots. Without any identifying markings on the USB stick, 60% of employees plugged the drives into government computers. With a ‘valid’ government seal, the plug-in rate reached 90%.”
For example, the survey found that free USB sticks from conferences/trade shows, business meetings, and similar events are used by 72% of employees, even in organizations that mandate the use of secure USBs. And there aren’t very many of those: only 29% felt that their organizations had adequate policies to prevent USB misuse.
The report went on to list 10 USB security recommendations — which many or most organizations do not practice:
1. Providing employees with approved, quality USB drives for use in the workplace.
2. Creating policies and training programs that define acceptable and unacceptable uses of USB drives.
3. Making sure employees who have access to sensitive and confidential data only use secure USB drives.
4. Determining USB drive reliability and integrity before purchasing by confirming compliance with leading security standards and ensuring that there is no malicious code on these tools.
5. Deploying encryption for data stored on the USB drive.
6. Monitoring and tracking USB drives as part of asset management procedures.
7. Scanning devices for virus or malware infections.
8. Using passwords or locks.
9. Encrypting sensitive data on USB drives.
10. Having procedures in place to recover lost USB drives.