We’re used to hearing stories about politicians erasing things from hard disk drives that they shouldn’t have. Now we’re hearing about a politician allegedly putting things onto a hard disk drive that he shouldn’t have.
Maine legislator Rep. Alex Cornell du Houx (D-Brunswick) was the subject of a temporary protection from abuse order this spring filed by Rep. Erin Herbig (D-Belfast), after their relationship ended, according to the Bangor Daily News. The two people made a private agreement, which was not filed in court but was said to be legally binding, around the relationship and their future contact.
“In addition to having no contact with Herbig, the private agreement, which was obtained by the Bangor Daily News, requires that Cornell du Houx pay all of Herbig’s $9,000 in legal fees and turn over any computer hard drives and data storage devices with pictures of Herbig or ‘any other women sleeping or in a state of undress’ to Herbig’s lawyer.”
After their relationship ended, du Houx allegedly “stalked, harassed and threatened her,” including taking more than 100 photos and videos of Herbig sleeping.
Without addressing the veracity of the charges, let’s look at the data storage issues involved. (The private agreement doesn’t appear to be available online; excerpts that the paper posted didn’t cover these aspects.)
- The agreement would appear to mention only hard drives and data storage devices. Apparently, if du Houx had any pictures of sleeping or undressed women stored in the cloud, those are fair game. (Not to mention, sleeping or undressed men.)
- As Bangor Daily News letter writer Sid Duncan legitimately pointed out (in an otherwise kind of gross and disgusting letter to the editor on July 16 that the paper has since retracted), what if some of the data storage devices are actually state-owned and contain state documents?
- What if the data storage devices contain communication with du Houx’ attorney? Passing them to Herbig’s attorney could violate attorney-client privilege.
- What is Herbig’s attorney intending to do with the data storage devices? Is he a computer forensics expert who could legitimately retrieve any such images, even if deleted, from the devices? Is he intending to do so? Is he going on a fishing expedition to see what other charges could be filed – such as, conceivably, child pornography or stalking? Has he obtained a warrant for this? And what is he planning to do with the images and data storage devices? Is he planning to destroy these images? Or, as Duncan implied, will he be, um, “perusing” them? If he does destroy those images, could he be accused of destroying evidence (as in 2007’s United States vs. Philip D. Russell)? But if there is child porn on any of the devices, could the attorney himself then be charged with possession of child pornography?
- What about the privacy and rights of any other sleeping or undressed women who might also have images – obtained legitimately or otherwise – stored on du Houx’ devices? Especially since he appears to be a professional photographer? Or, in fact, any other personal pictures he might have of other people? Or, for that matter, of himself?
- Will du Houx get his data storage devices and data back, including all the data that isn’t of sleeping or undressed women?
If you’ve been holding your breath waiting for the new Gartner Enterprise Backup/Recovery Magic Quadrant to come out so you could see how vendors had moved around since last year, you can let go. Other than FalconStor moving from Visionary to Niche, and several players in the Niche quadrant changing around primarily due to acquisitions, there wasn’t much change. Commvault, EMC, Symantec, and IBM are still in the Leaders quadrant; HP (now Autonomy, an HP company) and CA are still in the Challengers quadrant; and NetApp is now all alone in the Visionary quadrant. The Niche Players in this Magic Quadrant are Acronis; Asigra; EVault, a Seagate Company; FalconStor Software; Quest Software and Syncsort.
However, Gartner did change some of its strategic planning assumptions (that is, predictions) between last year and this. While it still believes that by 2014, 80% of the market will choose advanced, disk-based appliances and backup software-only solutions over distributed VTLs, it now believes that one-third, not 30%, of organizations will change backup vendors due to frustration over cost, complexity and/or capability, and it will be by 2016, not 2014.
It also now believes that by 2015, at least 25% of large enterprises will have given up on conventional backup/recovery software, and will use snapshot and replication techniques instead, and that by the end of 2016, at least 45% of large enterprises, up from 22% at year-end 2011, will have eliminated tape for operational recovery.
And it is no longer predicting that by 2014 deduplication will cease to be available as a stand-alone product and will instead become a feature of broader data storage and data management solutions. While the company didn’t say so explicitly in this report, product descriptions seemed to indicate this has largely taken place already.
Other interesting tidbits from the report include:
- Gartner end-user inquiry call volume regarding backup has been rising at about 20% each year for the past four years.
- “The rising frustration with backup implies that the data protection approaches of the past may no longer suffice in meeting current, much less future, recovery requirements,” Gartner notes. “As such, companies are willing to adopt new technologies and products from new vendors, and have shown an increased willingness to switch backup/recovery providers to better meet their increasing service levels.”
- Companies are increasingly considering cloud-based recovery systems, predominantly for midsize enterprise servers and branch-office and desktop/laptop data.
- Symantec currently owns 34.1% of the market, which has decreased over the past five years. IBM and EMC have 17.3% and 17.0% market share, respectively.
- No other vendor has more than a 7% market share.
- In 2011, CommVault and EMC increased their market shares.
- Along with Symantec, CA Technologies, IBM and Quest Software slid slightly in market share in 2011.
Gartner also identified five trends that it believes will emerge over the next several years:
- Re-expanding the number of backup solutions and technologies
- Backup application switching
- Decreasing backup data retention
- Backup modernization
- Deployment of new technologies and vendors
Oh, hey, just put all your stuff in the cloud. It has IT people to watch it and take care of it, and if there’s any sort of problem, it’s replicated in other places so you’ll still have access to it whenever you want.
It turns out, not so much.
Major Internet services such as Netflix, Pinterest, and Instagram were taken offline on Friday night due to thunderstorms that took out power to the Amazon Web Services site in Virginia.
“Amazon’s Cloud services status page was full of power-related error messages,” wrote MSNBC’s Bob Sullivan. “Amazon’s ElastiCache, for example, indicated that starting at 8:43 p.m., the service was “affected by a power event.” At 9:25 p.m., this message was posted: “We can confirm that a large number of cache clusters are impaired. We are actively working on recovering them.””
But, oh, Sullivan continued, this wasn’t really Amazon’s fault, because the storm was just so very bad. Why Amazon didn’t have backups or replicated copies elsewhere on the network, he didn’t say. (Apparently the problem may have been a routing issue.)
“Outages like this morning’s are a reminder of how fragile, still, our digital architectures actually are,” intoned The Atlantic’s Megan Garber. “As much as we try to bolster them against the elements, they are made of sand, not stone. Buildings can be brought down by storms; but so, today reminds us, can their digital counterparts. Even the structures that lack structures can be torn by nature’s whims. That is, in its way, terrifying. And yet — here’s the other sliver — it is also, just a tiny bit, reassuring. No matter how advanced we get, today reminds us, nature will always be one step ahead.”
This is probably not a great consolation to those companies that have moved their operations to the cloud because they were assured it would still be there in a disaster. And this is a disaster? A thunderstorm (albeit one that has caused at least a dozen deaths)? Are those companies feeling “reassured” today by discovering that their disaster recovery systems are, in fact, vulnerable to an outage that might be in a completely different part of the country?
What happens if there’s an earthquake, hurricane, or some more severe natural disaster? If the Red Cross or FEMA loses its connectivity because it depended on the cloud, will its managers philosophically fold their hands and talk about how in the great scheme of things this shows just how little we all are? Will the people asking to be saved or helped see it this way?
Instead, hopefully Amazon and other cloud providers will take this as a wake-up call before hurricane season gets going, and ensure that the virtual cloud can stand up to the real thing.
In a move that has been expected for the past month — and desired for much longer than that — Google has made it possible for Google Docs users to edit documents offline and then synchronize them when the user logs back in again. Google called offline editing “one of its most requested features” for Google Drive, which the company said has 10 million users.
At present, the function works only with Google Docs — that is to say, document files using Google Drive — but is expected to be available for spreadsheets and presentations at some point in the future, Google said in its instructions. It is also only available for the Chrome browser, and Google didn’t say whether it expected to make the feature available to other browsers.
Pundits are claiming this functionality will negatively affect other consumer cloud storage systems, such as Dropbox. “Google Kneecapped Dropbox,” proclaimed Business Insider.
The one other point that’s worth noting is that changes made to the online file while the user is offline take precedence over whatever changes the offline user makes.
“If an online collaborator deletes the text you edit while offline, their changes will override yours. If a collaborator deletes the document you’re editing offline, your changes will be lost when you come back online because the document will no longer exist. Try to use offline editing for documents that you own and that won’t be deleted without your knowledge.”
Well, yeah, but that sort of defeats the purpose of using Google Drive for collaboration in the first place, doesn’t it?
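The precedence rule in that help text can be pictured with a toy merge function. This is only a sketch under stated assumptions, not Google’s actual sync algorithm: the paragraph-keyed document model and the `reconcile` function are hypothetical, invented here purely for illustration.

```python
from typing import Optional

# Hypothetical document model: paragraph id -> paragraph text.
Doc = dict[str, str]


def reconcile(base: Doc, online: Optional[Doc], offline: Doc) -> Optional[Doc]:
    """Merge offline edits into the online copy, letting online changes win.

    base    -- the document as it was when the user went offline
    online  -- the current online document, or None if it was deleted
    offline -- the user's offline copy
    """
    if online is None:
        # A collaborator deleted the document: offline changes are lost.
        return None

    merged = dict(online)
    for pid, text in offline.items():
        if pid not in base and pid not in online:
            # Paragraph written entirely offline: nothing to conflict with.
            merged[pid] = text
        elif pid in online and online[pid] == base.get(pid):
            # Nobody touched this paragraph online, so the offline edit applies.
            merged[pid] = text
        # Otherwise the paragraph was edited or deleted online: online wins.
    return merged


base = {"p1": "hello", "p2": "world"}
online = {"p2": "world"}                       # a collaborator deleted p1
offline = {"p1": "hello there", "p2": "world!"}
print(reconcile(base, online, offline))        # the offline edit to p1 is dropped
print(reconcile(base, None, offline))          # document deleted online
```

Under this model, the only offline work that is guaranteed to survive is work on text nobody else touched, which is the practical meaning of Google’s advice to stick to documents you own.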
Another point — only enable Google Drive on your own computers.
“Enabling offline access on public or shared computers can put your data at risk, since others may be able to view your synced Google documents and spreadsheets.”
Which also sort of seems to defeat the purpose — wouldn’t a great use case for this feature be “I’m traveling without my computer or it broke, and so I’m using somebody else’s”? Might be nice to have a “Mr. Phelps” feature that automatically deleted itself after syncing a file, or on command.
Needless to say, features that require the Internet (sharing, publishing, reporting a problem, etc.) won’t be available offline. Surprisingly, however, neither is inserting an image or a picture.
Google made the announcement at its sold-out I/O conference in San Francisco.
As you may recall, a few months back I did a piece on Massachusetts Governor and candidate (now Republican presidential nominee) Mitt Romney, talking about how he not only deleted all his email messages when he left office, but had his staff members buy 17 hard disk drives used in his office (reportedly spending $100,000 of government funds to do so).
Turns out, he wasn’t quite thorough enough. The Wall Street Journal, upon discovering that email of one cabinet member, then-Administration and Finance Secretary Thomas Trimarco, had been accidentally retained, made a public records request for copies of emails between Trimarco and top Romney officials, and reportedly got 73 pages’ worth.
Some of them are available here but it doesn’t really matter. They’re about the implementation of his health care plan (aka “Romneycare”), and honestly, I don’t care what they’re about. What’s interesting is the process of finding them, and how even someone who went to as much care as Romney to scrub his past was still tripped up by missing one guy’s email cache.
The Journal article, by Mark Maremont, also mentioned in passing that Romney occasionally used a private email account for discussing official business, which is non-optimal on a business basis, let alone politically. Moreover, such a tactic doesn’t protect a person from an electronic discovery request; such requests typically cover all email accounts that a person might use, not just the business one. Using a private account just makes complying with the request more of a challenge for the IT department.
This wasn’t the only recent news on the Romney disk drive front. A few days later, while leaving a note to journalists following his campaign teasing them about the cushy bus they got to travel in, he added, “PS — erased your hard drives.”
While he obviously didn’t do that, the note gave all the reporters the opportunity to rehash the erased disk drive story from his gubernatorial days, as well as gave President Obama’s campaign the opportunity to criticize him.
“Mitt Romney may joke about how his staff erased government hard drives to keep his records secret, but what we do know about his record as Governor is anything but funny — he left the state 47th in job creation and number one in per capita debt in the nation,” Obama spokesman Danny Kanner said in a statement.
With data centers increasingly being built in less-urban areas, and with the increasing number of wildfires in recent years, this sort of disaster needs to be added to the panoply of hurricane/tornado/earthquake for disaster recovery.
Last summer, the data center at the Los Alamos National Lab in New Mexico was surrounded on two sides by a 60,000-acre fire, while in 2007, the data center at Pepperdine University was threatened by a 1,200-acre fire in Malibu, Calif., that came within 100 feet of it.
More recently, we’ve had the fires in Colorado, which led one data center manager to post to Slashdot asking for advice. While there were the usual number of jokes, tangents, and speculation about his motives, there was also useful advice for data center managers as fire season approaches. (And most of this advice is useful for disasters in general.)
- Have a disaster recovery plan and make sure it’s updated — for example, are all the contacts and their phone numbers correct? “DR plans are a living document that should be updated for every significant change to your infrastructure,” noted Slashdot user Macgrrl. “They should have an annual ‘trial run’ to see if they work. The worst time to find out your DR plan doesn’t work is in an actual disaster event.”
- Priorities are people, data, equipment.
- This is one advantage of using the cloud — data is by definition offsite.
- Perform regular offsite backups.
- Make sure the network is documented and up-to-date, with the documentation available electronically and offsite. Save configuration settings to a text file and store it both electronically and on paper.
- Label everything — including AC adapters to keep from zapping things afterwards.
- Take pictures of the cabling for documentation purposes.
- If you have to save equipment, focus on disk drives and servers first. And keep in mind that insurance that reimburses for equipment lost in a fire might not reimburse for equipment damaged in a bugout.
- To save time, use wire cutters to disconnect cables (*not* power cables!).
- Cover things you’ve left behind with plastic or trash bags to help protect them from water and smoke.
- Consider setting up your data center to be portable in the first place — set it up in a shipping container, put racks on wheels (and make sure doors are wide enough to move them through and you have a forklift if necessary), use quick-disconnect hard drive enclosures, buy a truck or van to store onsite, etc.
“Any disaster plan should be able to cope with ‘and then a giant foot appeared above the building and squished it flat,’” noted Slashdot user GirlInTraining. “Yours should be no different. It might not be a wild fire that threatens your servers… it could be a UPS that shorts out, or a tornado, flood, a failed fire suppression unit, or simple human incompetence.”
When you have a whole lot of stuff, you have two choices. You can get one really, really big box. Or, you can get a whole lot of little boxes, and find ways to use them efficiently, like having them all be the same size so they’re interchangeable, and finding a good way of indexing the stuff in them so you can find it. And if you can solve the latter problem, little boxes tend to be a lot cheaper than big ones, and a lot more versatile.
This works for anything, whether you’re talking about logistics shipping to the Gulf War, moving cross-country, or organizing the pantry. It’s also the same theory behind virtualization — if you get a whole bunch of little processors working together well enough, they’re at least as good as one big processor, because you can keep adding little processors to them.
Traditionally, storage companies have worked by making bigger and bigger boxes; it’s part of what has kept companies like EMC and IBM in business, because really big boxes cost a lot of money.
However, we’re increasingly seeing cases where users are, instead, getting a whole bunch of little boxes to work together. It’s only worth the effort if you are, yourself, a great big company, so that a) you have the expertise around to hire people to get the little boxes to work together better and b) buying all the big boxes you would need just costs too darn much and it actually does save you money to find a way to let little boxes do it.
This is where companies like Facebook, Backblaze, and now Netflix come in. (And, likely, companies such as Google, but they don’t talk about it — though if you Google YouTube and “content delivery network” on the Google site, you sure end up with a lot of interesting patents.)
Backblaze has been patting itself on the, well, back for being the inspiration behind Netflix’ move, but really, the credit goes to the moving companies that figured out that, instead of sending gigantic trucks to all sorts of places to pick up stuff to move it, instead they should send a bunch of storage containers to the people who are moving, let the people fill them up, and then drive around and pick up all the storage containers. This was called a pod, and Backblaze called its similar system — a standardized bunch of storage and hardware and software to manage it — a Storage Pod.
(When you think about it, we’re even moving that way with coffee, with those little K-Cup things. And, really, it’s how the Internet itself works: instead of trying to send one large message, it breaks all the messages up into packets of the same size and then reassembles them at the other end, because the simplicity of only having to transport a single size of packets is worth the effort to break the message up and reassemble it.)
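The little-boxes idea behind packets can be sketched in a few lines. This is a toy illustration only, not how any real network stack works: the function names are made up here, real packets carry headers and vary in size, and in this sketch the last packet is simply allowed to be shorter than the rest.

```python
def split_into_packets(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut a message into numbered chunks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("packet size must be positive")
    return [
        (seq, message[offset:offset + size])
        for seq, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message, whatever order the packets arrived in."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"little boxes all the same"
packets = split_into_packets(message, 8)
# Deliver the packets in reverse order to show that the sequence numbers
# are what make reassembly possible.
assert reassemble(list(reversed(packets))) == message
```

The sequence number plays the same role as the index on the little boxes: it is the small extra cost that lets identical containers be shipped in any order.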
So, what Netflix decided was, rather than building a centralized gigundo data center with a ton of storage in it to hold all the movies, it would build a whole bunch of standardized pods, which it is calling the Open Connect Architecture, and place the pods all over the country so the data doesn’t have to go as far.
True, if you’re renting something esoteric they’ll probably have it in some main office somewhere, but it’s a pretty safe bet that you’re renting one of the ten most popular movies of the past six months. It’s basically the same method behind Redbox — take care of the 80% of movie watchers and then figure out how to deal with the other 20%. So far Netflix is only taking care of 5% of its data this way, but it expects to ship most of its data this way in the future.
This sort of system only works if you’re really big; in the case of Netflix, that means streaming a billion hours of programming a month. The announcement hammered the stock of the vendors Netflix has been using to deliver its content, and there are some dire warnings about what it means to big storage vendors like EMC, but, practically speaking, most companies just don’t operate on the economies of scale to make it worthwhile to do on their own.
A year after releasing its first Magic Quadrant in e-discovery, Gartner has released a new one with big changes — and it has only itself to blame.
In that MQ, Gartner predicted that a quarter of all e-discovery companies would be consolidated by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. It also helpfully produced a list of vendors that could be acquired.
Consequently, this year’s report noted a number of acquisitions, including CaseCentral and Clearwell. The Clearwell acquisition, by Symantec, also pushed Symantec into the head position in the Leaders quadrant, from its position in the Challengers quadrant the year before.
Another big acquisition in the past year was the admittedly criticized purchase by HP of Autonomy. The company is considered independent enough from HP that it is still referred to as Autonomy in the report, and it appears to have improved its position since last year, with Gartner noting it is now being sold through the channel as well as direct.
And the acquisitions aren’t over, Gartner says.
Big vendors — such as HP, Symantec, IBM and EMC — have made acquisitions in this space and we expect that other big players will do the same, or build offerings of their own within the next 12 to 24 months. The next big round of acquisitions will be of legal review tools, with the capacity to perform the review, analysis and production functions carried out by lawyers and paralegals, in service firms, law firms, corporations and government agencies.
The functionality of the existing products is also expected to change, Gartner says.
This year, we expect to see a consolidation of functionality to deal with electronic information across a spectrum that includes identification, preservation, collection, ECA or early data assessment, processing, review, analysis and production of data. The market will contain software pure-plays (e-discovery only), as well as product groups or divisions in large well-known IT providers.
It’s slated to be a fast-growing market, though, with Gartner estimating that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010, with the five-year CAGR to 2015 to be approximately 16%.
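To put that growth rate in perspective, here is a minimal sketch of what compounding those two figures implies. It assumes the 16% rate applies evenly to each of the five years from 2010 to 2015; Gartner gives the rate only as approximate, so the result is a rough projection, not a figure from the report.

```python
def project_revenue(base: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return base * (1 + cagr) ** years


# $1 billion in 2010, growing at about 16% a year for five years.
revenue_2015 = project_revenue(1.0, 0.16, 5)
print(f"Implied 2015 market: ${revenue_2015:.2f} billion")  # about $2.10 billion
```

In other words, the estimate amounts to the market roughly doubling over five years.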
The industry is also likely to see growth in e-discovery outside the U.S., where it has primarily been based. While the U.S. accounted for 85% of market revenue in 2010, vendor revenues outside the U.S. almost doubled between 2009 and 2010, Gartner noted, adding that many vendors will realize up to a third of their revenue outside North America during the next three years. Gartner also expects vendors in other areas, including enterprise information archiving, enterprise content management, enterprise search and content analytics, to start adding e-discovery functionality.
Gartner also emphasized that the E-Discovery Reference Model was playing more of a role in e-discovery, with users increasingly wanting vendors to support it.
Finally, e-discovery and the costs around it may end up encouraging users to delete outdated data — with the benefit of saving money on storage, Gartner said.
While the White House released its digital government plan last week, it appears to have left out one major factor: just where the heck all that data is going to be stored, especially when storage already appears to be an issue for federal agencies, according to a recent survey.
The Digital Government plan doesn’t even mention the word “storage,” even though open data accessible to everyone is one of the linchpins of the plan.
But a recent survey by MeriTalk of 151 federal government IT professionals about big data found that storage was already an issue.
Factors found in the survey indicate the following:
- 87% of IT professionals say their stored data has grown in the last two years (by an average of 61%)
- 96% expect their data to grow in the next two years (by an average of 64%)
- 31% of data is unstructured, and that amount is increasing
- Agencies estimate they have just 49% of the data storage/access they need to leverage big data and drive mission results
- 40% of respondents pointed to storage capacity as one of the most significant challenges their agency faced when it came to managing large amounts of data
- Agencies currently store an average of 1.61 petabytes of data, but expect to get to 2.63 petabytes in just the next two years
- 57% of agencies say they have at least one dataset that’s grown too big to work with using their current data management tools and/or infrastructure
- While 64% of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand, they estimate 10 months as the average time it would take to double their short- to medium-term capacity
- The #1 step that agencies say they are taking to improve their ability to manage and make decisions with big data is to invest in IT infrastructure to optimize data storage (39%)
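The survey’s storage figures can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper names are made up here, and it assumes smooth year-over-year compounding, which the survey does not claim.

```python
def total_growth(start: float, end: float) -> float:
    """Fractional growth from start to end (0.5 means 50% growth)."""
    return end / start - 1


def annualized_growth(start: float, end: float, years: float) -> float:
    """Constant yearly rate that turns start into end over `years` years."""
    return (end / start) ** (1 / years) - 1


# Average agency storage: 1.61 PB today, 2.63 PB expected in two years.
print(f"Two-year growth: {total_growth(1.61, 2.63):.1%}")
print(f"Per year:        {annualized_growth(1.61, 2.63, 2):.1%}")
```

The two-year figure comes to a little over 63%, which lines up with the 64% average growth respondents said they expect, and works out to roughly 28% a year.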
A few weeks back, I picked on disaster recovery vendors that wait until there’s been a natural disaster and then use that to promote their services. “If one wants to offer such a service to one’s clients, how about issuing a generic press release at the beginning of the disaster seasons so that it looks less like a vendor exploiting a particular tragedy?” I suggested.
Well, someone did that, so I need to encourage the behavior I want to see.
The Games of the XXX Olympiad (that’s 30th, for those of you who don’t do Roman numerals) are scheduled for July 27 through August 12 this year in London. But that’s not the only event London is hosting this summer, or, arguably, even the biggest; it’s also the year of mayoral elections, as well as Queen Elizabeth II’s Diamond Jubilee next month, marking the 60th anniversary of her reign.
To tell you something about how special this is, the Queen is the only British monarch to celebrate a Diamond Jubilee, other than Queen Victoria in 1897.
Bad time for a data center outage.
The city department in charge of the data center is the Greater London Authority (GLA), a strategic and delivery authority with the role of designing a better future for the capital. The GLA’s Technology Team, based in City Hall, provides IT support for the Mayor’s Office, the London Assembly, and the GLA’s staff.
Four years ago, just before a previous mayoral election, a burst water main in a nearby street cut power supplies to City Hall and caused major disruption. The GLA now manages six times as much data as it did then, making a potential outage that much more devastating.
To prepare for this, the GLA — working with Cristie Data, a Stroud, Gloucestershire provider — implemented a disaster recovery infrastructure that will enable it to get its IT systems up and running in four hours, compared with the more than three days that it had previously taken when it was based on backup magnetic tapes.
The new infrastructure incorporates FalconStor storage-management and replication technology with Nexsan E60 disk-storage systems. The FalconStor product virtualizes the GLA’s storage environment and replicates it across a shared metropolitan area network to London’s data center in Woking, Surrey, about 20 miles away. That’s where the Nexsan E60 storage hosts the GLA’s replicated environment. Virtualization means that the storage network can be managed from one interface. If there’s a disaster, FalconStor’s RecoverTrac technology automates recovery of the IT environment.
On top of providing improved disaster recovery ability, it is estimated that the new infrastructure will save the GLA £90,000 a year, providing payback within four years. Of course, should the GLA have to invoke its disaster-recovery systems for real, the cost would be recovered a lot quicker — like, immediately.
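The payback claim is simple enough to check with back-of-the-envelope arithmetic. The project’s actual cost isn’t given here, so this sketch only works out the ceiling implied by the two stated numbers; the function names are made up for illustration.

```python
def payback_years(cost: float, annual_savings: float) -> float:
    """Years of savings needed to recover an up-front cost."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return cost / annual_savings


def max_cost_for_payback(annual_savings: float, years: float) -> float:
    """Largest up-front cost that still pays back within `years` years."""
    return annual_savings * years


# Savings of 90,000 pounds a year with payback within four years.
ceiling = max_cost_for_payback(90_000, 4)
print(f"Implied cost ceiling: {ceiling:,.0f} pounds")       # 360,000 pounds
print(f"Payback at that cost: {payback_years(ceiling, 90_000):.1f} years")
```

So the stated savings and payback period imply a project costing no more than about £360,000.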