Oh, hey, just put all your stuff in the cloud. It has IT people to watch it and take care of it, and if there’s any sort of problem, it’s replicated in other places so you’ll still have access to it whenever you want.
It turns out, not so much.
Major Internet services such as Netflix, Pinterest, and Instagram were taken offline on Friday night due to thunderstorms that took out power to the Amazon Web Services site in Virginia.
“Amazon’s Cloud services status page was full of power-related error messages,” wrote MSNBC’s Bob Sullivan. “Amazon’s ElastiCache, for example, indicated that starting at 8:43 p.m., the service was ‘affected by a power event.’ At 9:25 p.m., this message was posted: ‘We can confirm that a large number of cache clusters are impaired. We are actively working on recovering them.’”
But, oh, Sullivan continued, this wasn’t really Amazon’s fault, because the storm was just so very bad. Why Amazon didn’t have backups or replicated copies elsewhere on the network, he didn’t say. (The problem may actually have been a routing issue.)
“Outages like this morning’s are a reminder of how fragile, still, our digital architectures actually are,” intoned The Atlantic’s Megan Garber. “As much as we try to bolster them against the elements, they are made of sand, not stone. Buildings can be brought down by storms; but so, today reminds us, can their digital counterparts. Even the structures that lack structures can be torn by nature’s whims. That is, in its way, terrifying. And yet — here’s the other sliver — it is also, just a tiny bit, reassuring. No matter how advanced we get, today reminds us, nature will always be one step ahead.”
This is probably not a great consolation to those companies that have moved their operations to the cloud because they were assured it would still be there in a disaster. And this is a disaster? A thunderstorm (albeit one that has caused at least a dozen deaths)? Are those companies feeling “reassured” today by discovering that their disaster recovery systems are, in fact, vulnerable to an outage that might be in a completely different part of the country?
What happens if there’s an earthquake, hurricane, or some more severe natural disaster? If the Red Cross or FEMA loses its connectivity because it depended on the cloud, will its managers philosophically fold their hands and talk about how in the great scheme of things this shows just how little we all are? Will the people asking to be saved or helped see it this way?
Instead, hopefully Amazon and other cloud providers will take this as a wake-up call before hurricane season gets going, and ensure that the virtual cloud can stand up to the real thing.
In a move that has been expected for the past month — and desired for much longer than that — Google has made it possible for Google Docs users to edit documents offline and then synchronize them when the user logs back in again. Google called offline editing “one of its most requested features” for Google Drive, which the company said has 10 million users.
At present, the function works only with Google Docs — that is to say, document files using Google Drive — but is expected to be available for spreadsheets and presentations at some point in the future, Google said in its instructions. It is also only available for the Chrome browser, and Google didn’t say whether it expected to make the feature available to other browsers.
Pundits are claiming this functionality will negatively affect other consumer cloud storage systems, such as Dropbox. “Google Kneecapped Dropbox,” proclaimed Business Insider.
One other point worth noting: changes made to the online file while the user is offline take precedence over whatever changes the offline user makes.
“If an online collaborator deletes the text you edit while offline, their changes will override yours. If a collaborator deletes the document you’re editing offline, your changes will be lost when you come back online because the document will no longer exist. Try to use offline editing for documents that you own and that won’t be deleted without your knowledge.”
Well, yeah, but that sort of defeats the purpose of using Google Drive for collaboration in the first place, doesn’t it?
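Google hasn’t published how its sync actually works, but the rule in its help text (online changes win, and a deleted document takes your offline edits with it) can be sketched as a toy paragraph-level merge. This is a minimal Python illustration under my own assumptions: the `Doc` type, the paragraph-id model, and `sync_offline_edits` are hypothetical, not anything from Google’s code or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Toy document model (hypothetical): paragraph id -> paragraph text."""
    paragraphs: dict[str, str]


def sync_offline_edits(server: Optional[Doc], base: Doc, offline: Doc) -> Optional[Doc]:
    """Fold offline edits into the online copy, letting online changes win.

    server  -- the current online copy, or None if a collaborator deleted it
    base    -- the document as it was when the user went offline
    offline -- the user's locally edited copy
    """
    if server is None:
        # The document no longer exists online, so the offline edits are lost.
        return None
    merged = dict(server.paragraphs)
    for pid, text in offline.paragraphs.items():
        changed_offline = base.paragraphs.get(pid) != text
        changed_online = server.paragraphs.get(pid) != base.paragraphs.get(pid)
        if changed_offline and not changed_online:
            # Only the offline user touched this paragraph: keep their edit.
            merged[pid] = text
        # Otherwise the online version (possibly a deletion) stands.
    # Paragraphs deleted offline are not modeled; they simply remain online.
    return Doc(merged)
```

In this toy version, an offline edit survives only if nobody touched the same paragraph online; anything both sides changed, including text deleted online, comes out the online way. Hence Google’s advice to stick to documents you own.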
Another point: only enable offline access on your own computers.
“Enabling offline access on public or shared computers can put your data at risk, since others may be able to view your synced Google documents and spreadsheets.”
Which also sort of seems to defeat the purpose — wouldn’t a great use case for this feature be “I’m traveling without my computer or it broke, and so I’m using somebody else’s”? It might be nice to have a “Mr. Phelps” feature that automatically deleted the local copy after syncing a file, or on command.
Needless to say, features that require the Internet won’t be available offline — sharing, publishing, reporting a problem, etc. Surprisingly, however, neither is inserting an image.
Google made the announcement at its sold-out I/O conference in San Francisco.
As you may recall, a few months back I did a piece on former Massachusetts Governor and candidate (now Republican presidential nominee) Mitt Romney, talking about how he not only deleted all his email messages when he left office, but also had his staff members buy the 17 hard disk drives used in his office (reportedly spending $100,000 of government funds to do so).
Turns out, he wasn’t quite thorough enough. The Wall Street Journal, upon discovering that the email of one cabinet member, then-Administration and Finance Secretary Thomas Trimarco, had been accidentally retained, made a public records request for copies of emails between Trimarco and top Romney officials, and reportedly got 73 pages’ worth.
Some of them have been published, but it doesn’t really matter. They’re about the implementation of his health care plan (aka “Romneycare”), and honestly, I don’t care what they’re about. What’s interesting is the process of finding them, and how even someone who took as much care as Romney to scrub his past was still tripped up by missing one guy’s email cache.
The Journal article, by Mark Maremont, also mentioned in passing that Romney occasionally used a private email account for discussing official business, which is non-optimal on a business basis, let alone politically. Moreover, such a tactic doesn’t protect a person from an electronic discovery request; typically they cover all email accounts that a person might use, not just the business one — it just makes complying with the request more of a challenge for the IT department.
This wasn’t the only recent news on the Romney disk drive front. A few days later, in a note teasing the journalists following his campaign about the cushy bus they got to travel in, he added, “PS — erased your hard drives.”
While he obviously didn’t do that, the note gave reporters the opportunity to rehash the erased disk drive story from his gubernatorial days, and gave President Obama’s campaign the opportunity to criticize him.
“Mitt Romney may joke about how his staff erased government hard drives to keep his records secret, but what we do know about his record as Governor is anything but funny — he left the state 47th in job creation and number one in per capita debt in the nation,” Obama spokesman Danny Kanner said in a statement.
With data centers increasingly being built in less-urban areas, and with wildfires becoming more common in recent years, wildfires need to be added to the panoply of hurricanes, tornadoes, and earthquakes that disaster recovery plans account for.
Last summer, the data center at the Los Alamos National Lab in New Mexico was surrounded on two sides by a 60,000-acre fire, while in 2007, the data center at Pepperdine University was threatened by a 1,200-acre fire in Malibu, Calif., that came within 100 feet of it.
More recently, we’ve had the fires in Colorado, which led one data center manager to post to Slashdot asking for advice. While there were the usual number of jokes, tangents, and speculation about his motives, there was also useful advice for data center managers as fire season approaches. (And most of this advice is useful for disasters in general.)
- Have a disaster recovery plan and make sure it’s updated — for example, are all the contacts and their phone numbers correct? “DR plans are a living document that should be updated for every significant change to your infrastructure,” noted Slashdot user Macgrrl. “They should have an annual ‘trial run’ to see if they work. The worst time to find out your DR plan doesn’t work is in an actual disaster event.”
- Priorities are people, data, equipment.
- This is one advantage of using the cloud — data is by definition offsite.
- Perform regular offsite backups.
- Make sure the network is documented and up-to-date, with the documentation available electronically and offsite. Save configuration settings to a text file and store it both electronically and on paper.
- Label everything, including AC adapters, so you don’t zap equipment with the wrong one afterwards.
- Take pictures of the cabling for documentation purposes.
- If you have to save equipment, focus on disk drives and servers first. And keep in mind that insurance that reimburses for equipment lost in a fire might not reimburse for equipment damaged in a bugout.
- To save time, use wire cutters to disconnect cables (*not* power cables!).
- Cover things you’ve left behind with plastic or trash bags to help protect them from water and smoke.
- Consider setting up your data center to be portable in the first place — set it up in a shipping container, put racks on wheels (and make sure doors are wide enough to move them through and you have a forklift if necessary), use quick-disconnect hard drive enclosures, buy a truck or van to store onsite, etc.
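The advice about regular offsite backups and annual trial runs has a corollary: a backup nobody has checked may not actually restore. One simple way to check is to record a checksum for every file and compare the backup copy against it. What follows is a minimal Python sketch under that assumption; the function names are hypothetical, and this isn’t a tool anyone in the Slashdot thread recommended.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], dest: Path) -> None:
    """Write the manifest as plain-text JSON, which can be kept offsite or printed."""
    dest.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_backup(source_manifest: dict[str, str], backup_root: Path) -> list[str]:
    """Return the files that are missing from, or differ in, the backup copy."""
    backup_manifest = build_manifest(backup_root)
    return [
        rel for rel, digest in source_manifest.items()
        if backup_manifest.get(rel) != digest
    ]
```

An empty list from `verify_backup` means every file in the manifest came back byte-for-byte; anything else is a backup you would not want to discover during an actual evacuation.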
“Any disaster plan should be able to cope with ‘and then a giant foot appeared above the building and squished it flat,’” noted Slashdot user GirlInTraining. “Yours should be no different. It might not be a wild fire that threatens your servers… it could be a UPS that shorts out, or a tornado, flood, a failed fire suppression unit, or simple human incompetence.”
When you have a whole lot of stuff, you have two choices. You can get one really, really big box. Or, you can get a whole lot of little boxes, and find ways to use them efficiently — like having them all be the same size so they’re interchangeable, and finding a good way of indexing the stuff in them so you can find it. And if you can solve the latter problem, little boxes tend to be a lot cheaper than big ones, and a lot more versatile.
This works for anything, whether you’re talking about logistics shipping to the Gulf War, moving cross-country, or organizing the pantry. It’s also the same theory behind virtualization – if you get a whole bunch of little processors working together well enough, they’re at least as good as one big processor, because you can keep adding little processors to them.
Traditionally, storage companies have worked by making bigger and bigger boxes; it’s part of what has kept companies like EMC and IBM in business, because really big boxes cost a lot of money.
However, we’re increasingly seeing cases where users are, instead, getting a whole bunch of little boxes to work together. It’s only worth the effort if you are, yourself, a great big company, so that a) you can afford to hire people with the expertise to get the little boxes to work together better and b) buying all the big boxes you would need just costs too darn much, so it actually does save you money to find a way to let little boxes do it.
This is where companies like Facebook, Backblaze, and now Netflix come in. (And, likely, companies such as Google, but they don’t talk about it — though if you Google YouTube and “content delivery network” on the Google site, you sure end up with a lot of interesting patents.)
Backblaze has been patting itself on the, well, back for being the inspiration behind Netflix’s move, but really, the credit goes to the moving companies that figured out that, instead of sending gigantic trucks to all sorts of places to pick up stuff to move it, they should send a bunch of storage containers to the people who are moving, let the people fill them up, and then drive around and pick up all the storage containers. These were called pods, and Backblaze called its similar system — a standardized bunch of storage and hardware and software to manage it — a Storage Pod.
(When you think about it, we’re even moving that way with coffee, with those little K-Cup things. And, really, it’s how the Internet itself works — instead of trying to send one large message, it breaks every message up into small packets with a standard maximum size and then reassembles them at the other end, because the simplicity of only having to transport small, uniform units is worth the effort to break the message up and reassemble it.)
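For the curious, here is the packet idea reduced to a toy: chop a message into numbered chunks of at most a fixed size, let them arrive in any order, and put them back together. It’s a minimal Python sketch, not how TCP/IP actually does it (real protocols number bytes rather than chunks and have to cope with loss, retransmission, and varying packet sizes); `packetize` and `reassemble` are illustrative names.

```python
def packetize(message: bytes, payload_size: int) -> list[tuple[int, int, bytes]]:
    """Split a message into (sequence number, total count, payload) packets.

    Every payload is payload_size bytes except possibly the last one.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    chunks = [message[i:i + payload_size] for i in range(0, len(message), payload_size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[tuple[int, int, bytes]]) -> bytes:
    """Rebuild the message from packets that may have arrived in any order."""
    if not packets:
        return b""
    total = packets[0][1]
    by_seq = {seq: payload for seq, _, payload in packets}
    if len(by_seq) != total:
        raise ValueError(f"missing packets: have {len(by_seq)} of {total}")
    # Put the payloads back in sequence order and glue them together.
    return b"".join(by_seq[i] for i in range(total))
```

The sequence numbers are what make the uniform little boxes workable: they are the "index" from the moving-boxes analogy, telling the receiver where each piece goes.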
So, what Netflix decided was that, rather than building a centralized gigundo data center with a ton of storage in it to hold all the movies, it would build a whole bunch of standardized pods — which it is calling the Open Connect Architecture — and place the pods all over the country so the data doesn’t have to go as far.
True, if you’re renting something esoteric they’ll probably have it in some main office somewhere, but it’s a pretty safe bet that you’re renting one of the ten most popular movies of the past six months. It’s basically the same method behind Redbox — take care of the 80% of movie watchers and then figure out how to deal with the other 20%. So far Netflix delivers only 5% of its data this way, but it expects to ship most of its data this way in the future.
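The popularity trick behind the pods can be sketched as a cache that keeps the most-requested titles locally and sends everything else back to a central origin. This is a hypothetical Python illustration of the general idea, not Netflix’s actual Open Connect logic; `EdgePod`, its `capacity` parameter, and `edge_hit_rate` are made up for the example.

```python
from collections import Counter


class EdgePod:
    """Hypothetical edge appliance: serves the most popular titles locally
    and sends requests for everything else back to a central origin."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # how many titles fit on the pod
        self.stored: set[str] = set()

    def fill_from_popularity(self, recent_requests: list[str]) -> None:
        """Preload the `capacity` most-requested titles from recent viewing history."""
        counts = Counter(recent_requests)
        self.stored = {title for title, _ in counts.most_common(self.capacity)}

    def serve(self, title: str) -> str:
        """Report where a request would be served from."""
        return "edge" if title in self.stored else "origin"


def edge_hit_rate(pod: EdgePod, requests: list[str]) -> float:
    """Fraction of requests the pod can answer without going back to the origin."""
    if not requests:
        return 0.0
    return sum(pod.serve(t) == "edge" for t in requests) / len(requests)


if __name__ == "__main__":
    # Skewed demand: two blockbusters account for most of the viewing.
    history = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5
    pod = EdgePod(capacity=2)
    pod.fill_from_popularity(history)
    print(edge_hit_rate(pod, history))  # 0.8
```

With demand skewed the way Redbox assumes, a pod holding only two of the four titles answers 80% of the requests, and the long tail goes back to the origin.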
This sort of system only works if you’re really big — in the case of Netflix, streaming a billion hours of programming a month. The announcement hammered the stock of the vendors Netflix has been using to deliver its content, and there have been some dire warnings about what it means for big storage vendors like EMC, but, practically speaking, most companies just don’t operate at the scale that would make it worthwhile to do this on their own.
A year after releasing its first Magic Quadrant in e-discovery, Gartner has released a new one with big changes — and it has only itself to blame.
In that MQ, Gartner predicted that a quarter of all e-discovery companies would be consolidated by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. It also helpfully produced a list of vendors that could be acquired.
Consequently, this year’s report noted a number of acquisitions, including CaseCentral and Clearwell. The Clearwell acquisition, by Symantec, also pushed Symantec into the head position in the Leaders quadrant, from its position in the Challengers quadrant the year before.
Another big acquisition in the past year was HP’s much-criticized purchase of Autonomy. The company is considered independent enough from HP that it is still referred to as Autonomy in the report, and it appears to have improved its position since last year, with Gartner noting it is now being sold through the channel as well as directly.
And the acquisitions aren’t over, Gartner says.
Big vendors — such as HP, Symantec, IBM and EMC — have made acquisitions in this space and we expect that other big players will do the same, or build offerings of their own within the next 12 to 24 months. The next big round of acquisitions will be of legal review tools, with the capacity to perform the review, analysis and production functions carried out by lawyers and paralegals, in service firms, law firms, corporations and government agencies.
The functionality of the existing products is also expected to change, Gartner says.
This year, we expect to see a consolidation of functionality to deal with electronic information across a spectrum that includes identification, preservation, collection, ECA or early data assessment, processing, review, analysis and production of data. The market will contain software pure-plays (e-discovery only), as well as product groups or divisions in large well-known IT providers.
It’s slated to be a fast-growing market, too: Gartner estimates that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010, and expects a five-year compound annual growth rate (CAGR) of approximately 16% through 2015.
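Put those two Gartner numbers together and you get a rough sense of the market’s size by 2015. This is back-of-the-envelope compounding, assuming a constant 16% annual growth rate, not a figure Gartner reported:

```latex
R_{2015} \approx R_{2010}\,(1 + g)^{5}
  = \$1.0\ \text{billion} \times 1.16^{5}
  \approx \$1.0\ \text{billion} \times 2.10
  \approx \$2.1\ \text{billion}
```

In other words, a 16% CAGR roughly doubles the market over the five years.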
The industry is also likely to see growth in e-discovery outside the U.S., where it has primarily been based. While the U.S. accounted for 85% of market revenue in 2010, vendor revenues outside the U.S. almost doubled between 2009 and 2010, Gartner noted, adding that many vendors will realize up to a third of their revenue outside North America during the next three years. Gartner also expects vendors in other areas, including enterprise information archiving, enterprise content management, enterprise search and content analytics, to start adding e-discovery functionality.
Gartner also emphasized that the E-Discovery Reference Model was playing more of a role in e-discovery, with users increasingly wanting vendors to support it.
Finally, e-discovery and the costs around it may end up encouraging users to delete outdated data — with the benefit of saving money on storage, Gartner said.
When the White House released its digital government plan last week, it appears to have left out one major factor: just where the heck all that data is going to be stored, especially when storage already appears to be an issue for federal agencies, according to a recent survey.
The Digital Government plan doesn’t even mention the word “storage,” even though open data accessible to everyone is one of the linchpins of the plan.
But a recent survey by MeriTalk of 151 federal government IT professionals about big data found that storage was already an issue.
Among the survey’s findings:
- 87% of IT professionals say their stored data has grown in the last two years (by an average of 61%)
- 96% expect their data to grow in the next two years (by an average of 64%)
- 31% of data is unstructured, and that amount is increasing
- Agencies estimate they have just 49% of the data storage/access they need to leverage big data and drive mission results
- 40% of respondents pointed to storage capacity as one of the most significant challenges their agency faced when it came to managing large amounts of data
- Agencies currently store an average of 1.61 petabytes of data, but expect to get to 2.63 petabytes in just the next two years
- 57% of agencies say they have at least one dataset that’s grown too big to work with using their current data management tools and/or infrastructure
- While 64% of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand, they estimate it would take an average of 10 months to double their short- to medium-term capacity
- The #1 step that agencies say they are taking to improve their ability to manage and make decisions with big data is to invest in IT infrastructure to optimize data storage (39%)
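The petabyte figures line up with the growth percentages in the survey: going from 1.61 to 2.63 petabytes is about 63% growth over two years, close to the 64% average respondents expect. Assuming steady compound growth over those two years (my assumption, not something the survey states), that works out to roughly 28% a year:

```latex
\frac{2.63\ \text{PB}}{1.61\ \text{PB}} \approx 1.63
\quad\Rightarrow\quad
\text{annual growth} \approx \sqrt{1.63} - 1 \approx 0.28 = 28\%
```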
A few weeks back, I picked on disaster recovery vendors that wait until there’s been a natural disaster and then use that to promote their services. “If one wants to offer such a service to one’s clients, how about issuing a generic press release at the beginning of the disaster seasons so that it looks less like a vendor exploiting a particular tragedy?” I suggested.
Well, someone did that, so I need to encourage the behavior I want to see.
The Games of the XXX Olympiad (that’s 30th, for those of you who don’t do Roman numerals) are scheduled for July 27 through August 12 this year in London. But that’s not the only event London is hosting this summer, or, arguably, even the biggest; it’s also the year of mayoral elections, as well as Queen Elizabeth II’s Diamond Jubilee next month, marking the 60th anniversary of her reign.
To give you an idea of how rare this is, the only other British monarch to celebrate a Diamond Jubilee was Queen Victoria, in 1897.
Bad time for a data center outage.
The city department in charge of the data center is the Greater London Authority (GLA), a strategic and delivery authority with the role of designing a better future for the capital. The GLA’s Technology Team, based in City Hall, provides IT support for the Mayor’s Office, the London Assembly, and the GLA’s staff.
Four years ago, just before a previous mayoral election, a burst water main in a nearby street cut power supplies to City Hall and caused major disruption. Since then, the amount of data the GLA manages has grown sixfold, making a potential outage that much more devastating.
To prepare for this, the GLA — working with Cristie Data, a provider based in Stroud, Gloucestershire — implemented a disaster recovery infrastructure that will enable it to get its IT systems up and running in four hours, compared with the more than three days recovery took when the GLA relied on magnetic tape backups.
The new infrastructure incorporates FalconStor storage-management and replication technology with Nexsan E60 disk-storage systems. The FalconStor product virtualizes the GLA’s storage environment and replicates it across a shared metropolitan area network to London’s data center in Woking, Surrey, about 20 miles away. That’s where the Nexsan E60 storage hosts the GLA’s replicated environment. Virtualization means that the storage network can be managed from one interface. If there’s a disaster, FalconStor’s RecoverTrac technology automates recovery of the IT environment.
On top of providing improved disaster recovery ability, it is estimated that the new infrastructure will save the GLA £90,000 a year, providing payback within four years. Of course, should the GLA have to invoke its disaster-recovery systems for real, the cost would be recovered a lot quicker — like, immediately.
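The announcement doesn’t say what the system cost, but the two figures put a ceiling on it if you assume simple (undiscounted) payback and count only the £90,000 in annual savings. This is a rough inference, not a number from the GLA or its vendors:

```latex
\text{payback period} = \frac{\text{up-front cost}}{\text{annual savings}} \le 4\ \text{years}
\quad\Rightarrow\quad
\text{up-front cost} \le 4 \times \pounds 90{,}000 = \pounds 360{,}000
```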
Postscript to the “Prince of Persia” backup recovery story from a month ago — in the process of researching it, I ran into a similar story about the time the people making Toy Story 2 almost lost the entire film due to lack of a backup. Interesting, I thought, but since it was tangential to the story I was writing, I didn’t include it.
Guess I should have. Slate, in an article about a copy of The Avengers almost being deleted, mentioned the Toy Story 2 episode in passing — it was actually included as a special feature on the DVD — and suddenly it’s all over the place, though the story goes back to at least 2010.
It’s a teaser for a longer story on the Toy Story 2 DVD, which I watched with my son this weekend. It starts:
“When making a film like Toy Story 2, we use a bunch of UNIX and Linux machines. On those kinds of machines there’s a command, RM*, that removes everything on the filesystem as fast as it can.”
“Somebody had run RM* on the drive where all the Toy Story 2 files were kept, and things just started to disappear.”
In the process of trying to recover the two years’ worth of work on the film, the company discovered that two months of backups were corrupt and that it had no viable backups, a loss that might have delayed the film by as much as a year.
Fortunately, Galyn Susman, visual arts director at Pixar, had just had a baby, and because she had set up a system so she could work from home, she had a copy of the film.
(Slate also has a copy of the 2 1/2-minute film from the DVD, which has since been deleted from the 2010 story.)
Happy ending, but the whole story is quite a comedy of errors.
“[I]f you do enter a mistaken rm *, DON’T UNPLUG THE COMPUTER, YOU IDIOT!! That will just damage the file system and won’t be quick enough to save any files. Hit Control-C. It’s much faster and safer, though even that will probably be too late.
But it took 20 seconds to delete all the files. That says there were a lot of files. It also says they were all in a flat structure with no subdirectories, since rm * doesn’t remove subdirectories. OK, maybe the command was really rm -r *, but the makers of the video were trying to keep things simple and dramatic. If you type rm -r *, think four times [before hitting Enter]. If it’s rm -rf *, make it at least six.
Then, instead of bringing a drive to Galyn’s house and copying the files onto it, they wrapped her computer — the one with the only copy in the world of a year’s worth of work — in blankets and drove it in a car to the studio…But at least they had an offsite backup, even if it was by chance.”
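The point about rm * and subdirectories is easy to check without risking anything. Here is a minimal Python dry run, assuming default shell settings (for example, bash without the `dotglob` option), that lists what `rm *` and `rm -r *` would delete in a directory. `rm_star_targets` is a hypothetical helper, not a real tool, and it only looks; it never deletes anything.

```python
import os
from pathlib import Path


def rm_star_targets(directory: Path, recursive: bool = False) -> tuple[list[str], list[str]]:
    """Dry run of `rm *` (or `rm -r *` when recursive=True) inside `directory`.

    Returns (would_delete, would_skip) as paths relative to `directory`.
    Models default shell behavior: `*` does not match names starting with '.',
    plain `rm` reports an error for directories and leaves them alone, and
    `rm -r` removes a directory with everything beneath it, hidden files included.
    """
    would_delete: list[str] = []
    would_skip: list[str] = []
    for entry in sorted(directory.iterdir()):
        rel = entry.relative_to(directory).as_posix()
        if entry.name.startswith("."):
            continue  # never matched by the glob, so rm never sees it
        if entry.is_dir() and not entry.is_symlink():
            if not recursive:
                would_skip.append(rel)  # plain rm refuses to remove directories
                continue
            # os.walk does not follow symlinked directories, and neither does rm -r.
            for root, dirs, files in os.walk(entry):
                for name in dirs + files:
                    would_delete.append((Path(root) / name).relative_to(directory).as_posix())
            would_delete.append(rel)
        else:
            would_delete.append(rel)  # regular files and symlinks
    return sorted(would_delete), would_skip
```

Run against a directory with subfolders, the non-recursive version skips every directory, which is why the commenter concluded that either the files sat in one flat directory or the command was really rm -r *.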
NoFilmSchool.com, a website devoted to digital filmmaking, goes into great detail about how to prevent this sort of problem. While some of it is geared specifically toward digital filmmaking, much of it applies to the average enterprise as well.
You know, there’s nothing like going to the movies to be reminded of how fast things advance in storage technology.
As part of seeing The Avengers at midnight on Friday, I attended a Marvel superhero marathon of Iron Man I and II, Thor, and Captain America. (Hulk came from another studio and was not included.) Over twelve hours in all of geekish wankery, and oddly enough I was the only middle-aged woman in the theatre.
Now, one of the truisms about science fiction — which is, essentially, what this is — is that no matter how hard they try to make everything look Futuristic, there are always telltale signs of the era in which it was actually made, like the miniskirts in Star Trek (not to mention the attitudes about women, but let’s not go there). And Iron Man, made four years ago — an eternity in storage years — is no exception. There were times when it was as anachronistic as Captain America.
Let’s start with Tony Stark’s home lab where he’s designing the Iron Man suit Mark 2. He may have a snazzy 3D CAD setup with a Siri-like voice interface, but when it comes to storing the files, he has to stop to decide whether to put them on the office server or his home server — apparently this state-of-the-art facility wasn’t prescient enough to have thought of the cloud.
On the other hand, if Stark had been able to use the cloud, there would have been no need for the dramatic scene where Pepper slips into Tony’s office in corporate headquarters, attaches a thumb drive (with some sort of cryptographic thing that breaks into his system — but if it’s his system, why is it needed?) to his computer (because all weapons developers have unguarded USB slots on their CEO’s computer), and downloads the entire unencrypted contents of his hard drive onto the thumb drive, including the conveniently marked, easily discovered “ghost drive.”
This scene, of course, does point out the value of setting up proper security systems in your organization, as well as the inherent security flaws in thumb drives. And yes, I am an incredibly annoying person to watch movies with; why do you ask?
(There was, by the way, an Iron Man 4 GB thumb drive, unfortunately now sold out. Perfect for corporate espionage. And SanDisk sold a 4GB microSDHC in 2011 that included a copy of Iron Man 2 that you could watch from your Samsung Android smartphone.)
Though Stark obviously has a swell home office setup, including access to corporate databases, for some reason he doesn’t have access to the files on his own computer, either through the cloud or through a backup or replicated storage elsewhere. Nope, he’s got to resort to Sneakernet — or, in the case of Pepper Potts, High-Heelnet.
And let’s not even get into the scene where the computerized Jarvis is telling him, “But sir! We still have terabytes of data to download!” Heavens! Reminds me of watching the first episode of the original Battlestar Galactica where we were told that the threatening Cylon ships must be two microns long.
At least by Thor, released just last year, we have a scientist lamenting that government agents not only took her data, and her backups, but the backups of her backups.
Remind me in a couple of years; I’ll come back and take a look at The Avengers’ data technology.