With data centers increasingly being built in less-urban areas, and with the increasing number of wildfires in recent years, this sort of disaster needs to be added to the panoply of hurricane/tornado/earthquake for disaster recovery.
Last summer, the data center at the Los Alamos National Lab in New Mexico was surrounded on two sides by a 60,000-acre fire, while in 2007, the data center at Pepperdine University was threatened by a 1,200-acre fire in Malibu, Calif., that came within 100 feet of it.
More recently, we’ve had the fires in Colorado, which led one data center manager to post to Slashdot asking for advice. While there were the usual number of jokes, tangents, and speculation about his motives, there was also useful advice for data center managers as fire season approaches. (And most of this advice is useful for disasters in general.)
- Have a disaster recovery plan and make sure it’s updated — for example, are all the contacts and their phone numbers correct? “DR plans are a living document that should be updated for every significant change to your infrastructure,” noted Slashdot user Macgrrl. “They should have an annual ‘trial run’ to see if they work. The worst time to find out your DR plan doesn’t work is in an actual disaster event.”
- Priorities are people, data, equipment.
- This is one advantage of using the cloud — data is by definition offsite.
- Perform regular offsite backups.
- Make sure the network is documented and up-to-date, with the documentation available electronically and offsite. Save configuration settings to a text file and store it both electronically and on paper.
- Label everything — including AC adapters to keep from zapping things afterwards.
- Take pictures of the cabling for documentation purposes.
- If you have to save equipment, focus on disk drives and servers first. And keep in mind that insurance that reimburses for equipment lost in a fire might not reimburse for equipment damaged in a bugout.
- To save time, use wire cutters to disconnect cables (*not* power cables!).
- Cover things you’ve left behind with plastic or trash bags to help protect them from water and smoke.
- Consider setting up your data center to be portable in the first place — set it up in a shipping container, put racks on wheels (and make sure doors are wide enough to move them through and you have a forklift if necessary), use quick-disconnect hard drive enclosures, buy a truck or van to store onsite, etc.
“Any disaster plan should be able to cope with ‘and then a giant foot appeared above the building and squished it flat,'” noted Slashdot user GirlInTraining. “Yours should be no different. It might not be a wild fire that threatens your servers… it could be a UPS that shorts out, or a tornado, flood, a failed fire suppression unit, or simple human incompetence.”
When you have a whole lot of stuff, you have two choices. You can get one really, really big box. Or, you can get a whole lot of little boxes, and find ways to use them efficiently, like having them all be the same size so they’re interchangeable, and finding a good way of indexing the stuff in them so you can find it. And if you can solve the latter problem, little boxes tend to be a lot cheaper than big ones, and a lot more versatile.
This works for anything, whether you’re talking about logistics shipping to the Gulf War, moving cross-country, or organizing the pantry. It’s also the same theory behind virtualization — if you get a whole bunch of little processors working together well enough, they’re at least as good as one big processor, because you can keep adding little processors to them.
Traditionally, storage companies have worked by making bigger and bigger boxes; it’s part of what has kept companies like EMC and IBM in business, because really big boxes cost a lot of money.
However, we’re increasingly seeing cases where users are, instead, getting a whole bunch of little boxes to work together. It’s only worth the effort if you are, yourself, a great big company, so that a) you have the expertise around to hire people to get the little boxes to work together better and b) buying all the big boxes you would need just costs too darn much and it actually does save you money to find a way to let little boxes do it.
This is where companies like Facebook, Backblaze, and now Netflix come in. (And, likely, companies such as Google, but they don’t talk about it — though if you Google YouTube and “content delivery network” on the Google site, you sure end up with a lot of interesting patents.)
Backblaze has been patting itself on the, well, back for being the inspiration behind Netflix’ move, but really, the credit goes to the moving companies that figured out that, instead of sending gigantic trucks to all sorts of places to pick up stuff to move it, instead they should send a bunch of storage containers to the people who are moving, let the people fill them up, and then drive around and pick up all the storage containers. This was called a pod, and Backblaze called its similar system — a standardized bunch of storage and hardware and software to manage it — a Storage Pod.
(When you think about it, we’re even moving that way with coffee, with those little K-Cup things. And, really, it’s how the Internet itself works: instead of trying to send one large message, it breaks all the messages up into packets of the same size and then reassembles them at the other end, because the simplicity of only having to transport a single size of packets is worth the effort to break the message up and reassemble it.)
So, what Netflix decided was, rather than building a centralized gigundo data center with a ton of storage in it to hold all the movies, it would instead build a whole bunch of standardized pods (which it is calling the Open Connect Architecture) and place the pods all over the country so the data doesn’t have to go as far.
True, if you’re renting something esoteric they’ll probably have it in some main office somewhere, but it’s a pretty safe bet that you’re renting one of the ten most popular movies of the past six months. It’s basically the same method behind Redbox — take care of the 80% of movie watchers and then figure out how to deal with the other 20%. So far Netflix is only taking care of 5% of its data this way, but it expects to ship most of its data this way in the future.
This sort of system only works if you’re really big; Netflix, for instance, streams a billion hours of programming a month. The announcement hammered the stock of the vendors Netflix has been using to deliver its content, and there are some dire warnings about what it means to big storage vendors like EMC, but, practically speaking, most companies just don’t operate on the economies of scale to make it worthwhile to do on their own.
A year after releasing its first Magic Quadrant in e-discovery, Gartner has released a new one with big changes — and it has only itself to blame.
In that MQ, Gartner predicted that a quarter of all e-discovery companies would be consolidated by 2014, with the acquirers likely to be mainstream companies such as Hewlett-Packard, Oracle, Microsoft, and storage vendors. It also helpfully produced a list of vendors that could be acquired.
Consequently, this year’s report noted a number of acquisitions, including CaseCentral and Clearwell. The Clearwell acquisition, by Symantec, also pushed Symantec into the head position in the Leaders quadrant, from its position in the Challengers quadrant the year before.
Another big acquisition in the past year was the admittedly criticized purchase by HP of Autonomy. The company is considered independent enough from HP that it is still referred to as Autonomy in the report, and it appears to have improved its position since last year, with Gartner noting it is now being sold through the channel as well as direct.
And the acquisitions aren’t over, Gartner says.
Big vendors — such as HP, Symantec, IBM and EMC — have made acquisitions in this space and we expect that other big players will do the same, or build offerings of their own within the next 12 to 24 months. The next big round of acquisitions will be of legal review tools, with the capacity to perform the review, analysis and production functions carried out by lawyers and paralegals, in service firms, law firms, corporations and government agencies.
The functionality of the existing products is also expected to change, Gartner says.
This year, we expect to see a consolidation of functionality to deal with electronic information across a spectrum that includes identification, preservation, collection, ECA or early data assessment, processing, review, analysis and production of data. The market will contain software pure-plays (e-discovery only), as well as product groups or divisions in large well-known IT providers.
It’s slated to be a fast-growing market, though, with Gartner estimating that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010, with the five-year CAGR to 2015 to be approximately 16%.
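To put that growth rate in perspective, a quick compound-growth calculation shows where a $1 billion market ends up after five years at 16%. This is only a back-of-the-envelope sketch: the function name `project_revenue` is made up for illustration, and it assumes the 16% rate applies evenly to each of the five years.

```python
def project_revenue(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward by `cagr` (0.16 means 16%) for `years` years."""
    return base * (1 + cagr) ** years


# Gartner's figures: $1 billion in 2010, roughly 16% CAGR through 2015.
for year in range(2010, 2016):
    revenue = project_revenue(1.0, 0.16, year - 2010)
    print(f"{year}: ${revenue:.2f} billion")
```

By that arithmetic, the market would roughly double, to a little over $2 billion, by 2015.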
The industry is also likely to see growth in e-discovery outside the U.S., where it has primarily been based. While the U.S. accounted for 85% of market revenue in 2010, vendor revenues outside the U.S. almost doubled between 2009 and 2010, Gartner noted, adding that many vendors will realize up to a third of their revenue outside North America during the next three years. Gartner also expects vendors in other areas, including enterprise information archiving, enterprise content management, enterprise search and content analytics, to start adding e-discovery functionality.
Gartner also emphasized that the E-Discovery Reference Model was playing more of a role in e-discovery, with users increasingly wanting vendors to support it.
Finally, e-discovery and the costs around it may end up encouraging users to delete outdated data — with the benefit of saving money on storage, Gartner said.
While the White House released its digital government plan last week, it appears to have left out one major factor: just where the heck all that data is going to be stored, especially when storage already appears to be an issue for federal agencies, according to a recent survey.
The Digital Government plan doesn’t even mention the word “storage,” even though open data accessible to everyone is one of the linchpins of the plan.
But a recent survey by MeriTalk of 151 federal government IT professionals about big data found that storage was already an issue.
Factors found in the survey indicate the following:
- 87% of IT professionals say their stored data has grown in the last two years (by an average of 61%)
- 96% expect their data to grow in the next two years (by an average of 64%)
- 31% of data is unstructured, and that amount is increasing
- Agencies estimate they have just 49% of the data storage/access they need to leverage big data and drive mission results
- 40% of respondents pointed to storage capacity as one of the most significant challenges their agency faced when it came to managing large amounts of data
- Agencies currently store an average of 1.61 petabytes of data, but expect to get to 2.63 petabytes in just the next two years
- 57% of agencies say they have at least one dataset that’s grown too big to work with using their current data management tools and/or infrastructure
- While 64% of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand, they estimate 10 months as the average time it would take to double their short- to medium-term capacity
- The #1 step that agencies say they are taking to improve their ability to manage and make decisions with big data is to invest in IT infrastructure to optimize data storage (39%)
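Those storage numbers hang together, as a bit of arithmetic shows. The sketch below works out the growth implied by going from 1.61 to 2.63 petabytes and the equivalent yearly rate. The helper names are invented for this example, and it assumes growth compounds evenly over the two years, which the survey does not say.

```python
def total_growth(start: float, end: float) -> float:
    """Fractional growth from start to end (0.5 means 50%)."""
    return end / start - 1


def annualized_growth(start: float, end: float, years: float) -> float:
    """Constant yearly rate that would take start to end in `years` years."""
    return (end / start) ** (1 / years) - 1


current_pb, expected_pb = 1.61, 2.63  # average petabytes per agency, per the survey

print(f"Two-year growth: {total_growth(current_pb, expected_pb):.1%}")
print(f"Per year: {annualized_growth(current_pb, expected_pb, 2):.1%}")
```

That comes to about 63% over two years, in line with the 64% average growth respondents expect, or roughly 28% a year.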
A few weeks back, I picked on disaster recovery vendors that wait until there’s been a natural disaster and then use that to promote their services. “If one wants to offer such a service to one’s clients, how about issuing a generic press release at the beginning of the disaster seasons so that it looks less like a vendor exploiting a particular tragedy?” I suggested.
Well, someone did that, so I need to encourage the behavior I want to see.
The Games of the XXX Olympiad (that’s 30th, for those of you who don’t do Roman numerals) are scheduled for July 27 through August 12 this year in London. But that’s not the only event London is hosting this summer, or, arguably, even the biggest; it’s also the year of mayoral elections, as well as Queen Elizabeth II’s Diamond Jubilee next month, marking the 60th anniversary of her reign.
To tell you something about how special this is, the Queen is the only British monarch to celebrate a Diamond Jubilee, other than Queen Victoria in 1897.
Bad time for a data center outage.
The city department in charge of the data center is the Greater London Authority (GLA), a strategic and delivery authority with the role of designing a better future for the capital. The GLA’s Technology Team, based in City Hall, provides IT support for the Mayor’s Office, the London Assembly, and the GLA’s staff.
Four years ago, just before a previous mayoral election, a burst water main in a nearby street cut power supplies to City Hall and caused major disruption. Since then, the amount of data the GLA manages has grown sixfold, making a potential outage that much more devastating.
To prepare for this, the GLA — working with Cristie Data, a Stroud, Gloucestershire provider — implemented a disaster recovery infrastructure that will enable it to get its IT systems up and running in four hours, compared with the more than three days that it had previously taken when it was based on backup magnetic tapes.
The new infrastructure incorporates FalconStor storage-management and replication technology with Nexsan E60 disk-storage systems. The FalconStor product virtualizes the GLA’s storage environment and replicates it across a shared metropolitan area network to London’s data center in Woking, Surrey, about 20 miles away. That’s where the Nexsan E60 storage hosts the GLA’s replicated environment. Virtualization means that the storage network can be managed from one interface. If there’s a disaster, FalconStor’s RecoverTrac technology automates recovery of the IT environment.
On top of providing improved disaster recovery ability, it is estimated that the new infrastructure will save the GLA £90,000 a year, providing payback within four years. Of course, should the GLA have to invoke its disaster-recovery systems for real, the cost would be recovered a lot quicker — like, immediately.
Postscript to the “Prince of Persia” backup recovery story from a month ago: in the process of researching it, I ran into a similar story about the time the people making Toy Story 2 almost lost the entire film due to lack of a backup. Interesting, I thought, but since it was tangential to the story I was writing, I didn’t include it.
Guess I should have. Slate, in writing an article about a copy of The Avengers almost being deleted, mentioned the Toy Story 2 episode in passing in the process — it was actually included as a special feature on the DVD — and suddenly it’s all over the place, though the story goes back to at least 2010.
It’s a teaser for a longer story on the Toy Story 2 DVD, which I watched with my son this weekend. It starts:
“When making a film like Toy Story 2, we use a bunch of UNIX and Linux machines. On those kinds of machines there’s a command, RM*, that removes everything on the filesystem as fast as it can.”
“Somebody had run RM* on the drive where all the Toy Story 2 files were kept, and things just started to disappear.”
In the process of trying to recover the two years’ worth of work on the film, the company discovered that two months of backups were corrupt, and it had no viable backups — which might have delayed the film by as much as a year.
Fortunately, Galyn Susman, visual arts director at Pixar, had just had a baby, and in setting up a system she could work on from home, had a copy of the film.
(Slate also has a copy of the 2 1/2-minute film from the DVD, which has since been deleted from the 2010 story.)
Happy ending, but the whole story is quite a comedy of errors.
“[I]f you do enter a mistaken rm *, DON’T UNPLUG THE COMPUTER, YOU IDIOT!! That will just damage the file system and won’t be quick enough to save any files. Hit Control-C. It’s much faster and safer, though even that will probably be too late.
But it took 20 seconds to delete all the files. That says there were a lot of files. It also says they were all in a flat structure with no subdirectories, since rm * doesn’t remove subdirectories. OK, maybe the command was really rm -r *, but the makers of the video were trying to keep things simple and dramatic. If you type rm -r *, think four times [before hitting Enter]. If it’s rm -rf *, make it at least six.
Then, instead of bringing a drive to Galyn’s house and copying the files onto it, they wrapped her computer — the one with the only copy in the world of a year’s worth of work — in blankets and drove it in a car to the studio…But at least they had an offsite backup, even if it was by chance.”
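For anyone who wants those extra few seconds of thought enforced, here is a minimal sketch of a “look before you delete” check in Python. It only reports what a plain rm * and an rm -r * would touch in a directory; it never deletes anything. The function name `preview_rm` is hypothetical, and the sketch assumes default shell behavior, where * skips dotfiles and rm without -r refuses to remove directories.

```python
import os


def preview_rm(directory: str) -> dict:
    """Report what `rm *` and `rm -r *` would remove in `directory`.

    Nothing is deleted; this only counts entries.
    """
    files, subdirs, nested = [], [], 0
    for entry in sorted(os.listdir(directory)):
        if entry.startswith("."):
            continue  # the shell's * does not match dotfiles by default
        path = os.path.join(directory, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            subdirs.append(entry)
            # Count everything underneath, which only rm -r would take out.
            for _root, dirnames, filenames in os.walk(path):
                nested += len(dirnames) + len(filenames)
        else:
            files.append(entry)
    return {
        "rm *": len(files),  # directories are left alone (rm reports an error)
        "rm -r *": len(files) + len(subdirs) + nested,
        "subdirectories spared by rm *": subdirs,
    }


if __name__ == "__main__":
    for label, value in preview_rm(".").items():
        print(f"{label}: {value}")
```

If the second number is much bigger than the first, that is the cue to think four times, or six.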
NoFilmSchool.com, a website devoted to digital filmmaking, goes into great detail about how to prevent this sort of problem. While some of it is geared specifically toward digital filmmaking, much of it applies to the average enterprise as well.
You know, there’s nothing like going to the movies to be reminded of how fast things advance in storage technology.
As part of seeing The Avengers at midnight on Friday, I attended a Marvel superhero marathon of Iron Man I and II, Thor, and Captain America. (Hulk came from another studio and was not included.) Over twelve hours in all of geekish wankery, and oddly enough I was the only middle-aged woman in the theatre.
Now, one of the truisms about science fiction (which is, essentially, what this is) is that no matter how hard they try to make everything look Futuristic, there are always telltale signs of the era when it was actually made, like the miniskirts in Star Trek (not to mention the attitudes about women, but let’s not go there). And Iron Man, made four years ago (an eternity in storage years), is no exception. There were times when it was as anachronistic as Captain America.
Let’s start with Tony Stark’s home lab where he’s designing the Iron Man suit Mark 2. He may have a snazzy 3D CAD setup with a Siri-like voice interface, but when it comes to storing the files, he has to stop to decide whether to put them on the office server or his home server — apparently this state-of-the-art facility wasn’t prescient enough to have thought of the cloud.
On the other hand, if Stark had been able to use the cloud, there would have been no need for the dramatic scene where Pepper slips into Tony’s office in corporate headquarters, attaches a thumb drive (with some sort of cryptographic thing that breaks into his system — but if it’s his system, why is it needed?) to his computer (because all weapons developers have unguarded USB slots on their CEO’s computer), and downloads the entire unencrypted contents of his hard drive onto the thumb drive, including the conveniently marked, easily discovered “ghost drive.”
This scene, of course, does point out the value of setting up proper security systems in your organization, as well as the inherent security flaws in thumb drives. And yes, I am an incredibly annoying person to watch movies with; why do you ask?
(There was, by the way, an Iron Man 4 GB thumb drive, unfortunately now sold out. Perfect for corporate espionage. And SanDisk sold a 4GB microSDHC in 2011 that included a copy of Iron Man 2 that you could watch from your Samsung Android smartphone.)
Though Stark obviously has a swell home office setup, including access to corporate databases, for some reason he doesn’t have access to the files on his own computer, either through the cloud or through a backup or replicated storage elsewhere. Nope, he’s got to resort to Sneakernet — or, in the case of Pepper Potts, High-Heelnet.
And let’s not even get into the scene where the computerized Jarvis is telling him, “But sir! We still have terabytes of data to download!” Heavens! Reminds me of watching the first episode of the original Battlestar Galactica where we were told that the threatening Cylon ships must be two microns long.
At least by Thor, released just last year, we have a scientist lamenting that government agents not only took her data, and her backups, but the backups of her backups.
Remind me in a couple of years; I’ll come back and take a look at The Avengers’ data technology.
Back in the day, any person who was a reasonable risk used to get a fistful of credit card offers in the mail every month — 0% interest for six months, no fee for the first year, $25 credit if you charged one thing, etc. And while I know people into leverage and arbitrage who found ways to make money on these deals by having one credit card pay off another, I never did — mainly because I couldn’t trust myself to keep track of all the various offers in a way that wouldn’t come back to bite me.
So here we are today, and we have a batch of cloud storage and cloud synchronization services — Box, Dropbox, Drive, SkyDrive, and so on, not to mention my venerable Qwest Digital Vault, which magically changed its name to the CenturyLink Digital Vault when CenturyLink bought Qwest.
I have quite an assortment of space kicking around — 25 GB with Google Drive, 2 GB with Dropbox, 5 GB with Box, and I think 7 GB with SkyDrive; I already missed out on a 25-GB offer there, and if there’s an easy way to find out what my capacity is there, I’m not finding it. (I can consider myself lucky I don’t have iCloud. I don’t think.)
My Digital Vault is a whole other kettle of fish. I thought I had 25 GB there — but when I try to log in, it doesn’t recognize my password. Then again, unless my mother has come back from the dead and changed her maiden name, it doesn’t recognize that, either, so perhaps I’m actually using the wrong ID — but the site doesn’t offer me a way to be reminded what that is, and so far I’ve gone through two separate kinds of chat sessions and neither of them can tell me, either. In any event, its website says it only offers 2 GB free, so that’s probably what I have.
(Plus I have BackBlaze for my actual backups, but I’m not counting that.)
That all totals up to 41 GB, which sounds pretty impressive.
The problem is, it’s not really enough to do anything with. It’d be great to store all my pictures in the cloud, so I could always retrieve them and have plenty of copies to keep them safe, but the picture folder on my NAS (aka “The Big Brick,” 2 terabytes) is 57 GB all by itself. Yes, some of those are duplicates — remember the part about copies to keep them safe? — and some of those are videos, now that I have a camera that can take both still photos and videos. Plus every few months or so I collect all my pictures from all the various sources and save those to the big brick, so they’re all together.
Yes, I should delete all the picture copies sometime — but I’m petrified about making a mistake, and who’s got time?
But for the sake of argument, let’s pretend that I’ve found something I can store in the cloud, something that fits. So then I have to try to keep track of which of my fistful of services I’ve stored it in. I also have to keep track of how to get into each one — something I’ve already demonstrated I have trouble with.
I can sign up for Spanning Stats for Google Drive, but that’s yet another site I have to remember to go check. Plus it turns out that it’s actually a sales tool to encourage you to sign up for its Google Drive backup service. Great. Google doesn’t back up its own cloud service?
That brings up my next level of fear — how do I make sure that what I put into the cloud is still there the next time I look for it? I certainly wouldn’t put my *only* copy of my pictures up there — what if there was a problem? What if the service went out of business? If I have to keep copies and worry about backups and synchronization with the cloud, too, then what’s the point?
(I’m not even worrying yet about the different levels of privacy that the different products offer — and I probably should.)
Spanning Stats also helps me only for Google Drive — I would need an application for Box, Dropbox, SkyDrive, and so on. So that’d be at least five applications and websites (each with their own user ID and password) that I’d have to remember.
What I really want is one thing that would check all my cloud storage systems and tell me what’s in each of them. And maybe while it’s at it, it could also keep track of all the various special offers I get for more free cloud storage space — and when they’re going to expire, and how to move the files around so I don’t get charged for anything. Because you know that’s coming, if it’s not already here. We’ve seen it with the credit cards.
Speaking of which, maybe someone could write an app like that for credit card offers, too.
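In the meantime, the bookkeeping half of that wish is easy enough to sketch. The Python below is a toy tracker, not a real product: it talks to no actual cloud service, the `CloudAccount` class and its field names are invented, and the offer expiry date is a made-up placeholder. The capacities are the ones listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CloudAccount:
    name: str
    free_gb: float
    offer_expires: Optional[date] = None  # None means no known expiry


def total_gb(accounts: List[CloudAccount]) -> float:
    """Add up the free space across every account."""
    return sum(a.free_gb for a in accounts)


def where_it_fits(accounts: List[CloudAccount], size_gb: float) -> List[str]:
    """Names of the accounts big enough to hold size_gb in one place."""
    return [a.name for a in accounts if a.free_gb >= size_gb]


def expiring_by(accounts: List[CloudAccount], cutoff: date) -> List[str]:
    """Names of accounts whose promotional space lapses on or before cutoff."""
    return [a.name for a in accounts
            if a.offer_expires is not None and a.offer_expires <= cutoff]


accounts = [
    CloudAccount("Google Drive", 25, offer_expires=date(2013, 1, 1)),  # hypothetical date
    CloudAccount("Dropbox", 2),
    CloudAccount("Box", 5),
    CloudAccount("SkyDrive", 7),
    CloudAccount("Digital Vault", 2),
]

print(total_gb(accounts))           # 41
print(where_it_fits(accounts, 57))  # [] (the photo folder fits nowhere)
```

The hard part, of course, is getting five different services to report those numbers in the first place, which is exactly what is missing today.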
Earlier this month at the National Association of Broadcasters show in Las Vegas a company called hVault announced that, Real Soon Now, it was going to ship a holographic storage system with some pretty amazing capabilities:
“Holographic media has an archival lifetime in excess of 50 years, which eliminates the 2-5 year cycle of replacing magnetic media. Holographic storage systems consume about 1/100th the power of equivalent disk storage and can operate without any special power conditioning or cooling. Holographic media is totally impervious to magnetic fields, static electricity, extremes of temperature and humidity, atmospheric dust or water damage.”
If all this sounds familiar, it should. It was at another show — another NAB, in fact, in 2005 — when we heard about another company, InPhase, that was demonstrating and about to ship a holographic storage product where the discs would also last for….wait for it….50 years. We continued to hear about it on and off over the next few years, always on the verge of shipping, in March 2006, April 2006, November 2006, February 2007, April 2008…
And, in an amazing coincidence, what the Boulder, Colo.-based hVault is selling actually is the InPhase technology — which it bought after InPhase went bankrupt, after spending $100 million.
Meanwhile, no product details, no price, no ship date beyond “spring.” (In 2007, the drives were supposed to be $18,000 and each 300-GB disc was supposed to cost $300.) “Soon after will be followed by a family of drives ranging from 800GB to 1.6TB in capacity,” gushed an April 2006 article.
While some publications posted the press release and went along with it hook, line, and sinker, some readers were less sanguine. “Vendor: ‘So archive your digital data, forget it, and read it back in 50 years.’ #ihaveabridgeforyou,” tweeted one cynic.
As someone who wrote about it in April 2006 and thought it sounded sketchy then, let’s just say I’m….dubious.
Jordan Mechner, the designer of the game Prince of Persia (which went on to be a movie), recently wrote a blog post describing the day-long ordeal he and at least three other guys had trying to get copies of the original source code for his game from some Apple ][ disks.
Mechner and his team had to deal with multiple possible problems:
- Finding a drive to read the disk
- Finding software to read the disk
- Dealing with whatever forms of copy protection the disk might have had
- Finding software to run the software on the disk
- Dealing with whatever damage the disk itself might have suffered during its 22 years in his dad’s garage
- Dealing with whatever “bit rot” the data might have suffered
Try popping your old 1980s VHS and Hi-8 home movies into a player (if you can find one). Odds are at least some of them will be visibly degraded or downright unplayable. Digital photos I burned onto DVD or backed up onto Zip disks or external hard drives just ten years ago are hit and miss, assuming I still have the hardware to read them.
Whereas my parents’ Super 8 home movies from the 1960s, and my grandparents’ photos from the 1930s, are still completely usable and will probably remain so fifty years from now.
Pretty much anything on paper or film, if you pop it in a cardboard box and forget about it for a few decades, the people of the future will still be able to figure out what it is, or was. Not so with digital media. Operating systems and data formats change every few years, along with the size and shape of the thingy and the thing you need to plug it into. Skip a few updates in a row, and you’re quickly in the territory where special equipment and expertise are needed to recover your data. Add to that the fact that magnetic media degrade with time, a single hard knock or scratch can render a hard drive or floppy disk unreadable, and suddenly the analog media of the past start to look remarkably durable.
As an example, writes Science Daily, “Magnetic tape, which stores most of the world’s computer backups, can degrade within a decade. According to the National Archives Web site by the mid-1970s, only two machines could read the data from the 1960 U.S. Census: One was in Japan, the other in the Smithsonian Institution. Some of the data collected from NASA’s 1976 Viking landing on Mars is unreadable and lost forever.”
And that’s just accidental damage. There’s also the issue of potentially embarrassing data deliberately being destroyed.
Similarly, though companies such as Microsoft are working with organizations such as Britain’s National Archives to help preserve their data, it’s the proprietary nature of software from exactly such companies — Word and Outlook, for example — that is contributing to the problem, critics say.
Think of how many early movies and television programs are no longer available because the film deteriorated (in some cases actually spontaneously combusting) or was thrown out.
Organizations such as the Internet Archive, the Library of Congress, and the Long Now are working to help preserve data access, but that doesn’t necessarily help us as individuals. For that, digital archivist Jason Scott, who helped Mechner with his project, recommends the following: “If you have data you want to keep for posterity, follow the Russian doll approach. Back up your old 20GB hard drives into a folder on your new 200GB hard drive. Next year, back up your 200GB hard drive into a folder on your new 1TB hard drive. And so on into the future.”
That won’t necessarily solve the problem of having software that can read the data, but at least the data itself will be intact. (This is something I did a few months back when I reorganized my office — collected all the random CDs, DVDs, Zip drives, thumb drives, and 3 1/2-inch floppies cluttering up my office, and put them on my new 2-TB NAS drive.)
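For the curious, here is a rough Python sketch of one Russian-doll step: copy everything from the old drive into a folder on the new one, then compare checksums to confirm the copy matches. The function names and paths are placeholders made up for this example, the checksum verification is an addition rather than part of Scott’s advice, and it assumes both drives are mounted as ordinary directories.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def nest_backup(old_drive: Path, new_drive: Path, folder_name: str) -> list:
    """Copy old_drive into new_drive/folder_name; return files that fail to verify."""
    target = new_drive / folder_name
    shutil.copytree(old_drive, target)  # raises if the target folder already exists
    mismatches = []
    for source in old_drive.rglob("*"):
        if source.is_file():
            copy = target / source.relative_to(old_drive)
            if not copy.is_file() or file_digest(source) != file_digest(copy):
                mismatches.append(str(source))
    return mismatches


# Example call with placeholder paths:
# problems = nest_backup(Path("/mnt/old_20gb"), Path("/mnt/new_200gb"), "old-20gb-drive")
```

An empty list back means every file made the trip intact. Next year, the whole new drive, nested folder and all, gets the same treatment.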
Mechner ends with a warning. “From a preservationist point of view, the POP source code slipped through a window that is rapidly closing. Anyone who turns up a 1980s disk archive 20 or 30 years from now may be out of luck. Even if it’s something valuable that the world really cares about and is willing to invest time and money into extracting, it will probably be too late.”