While the White House released its digital government plan last week, it appears to have left out one major factor: just where the heck all that data is going to be stored, especially when storage already appears to be an issue for federal agencies, according to a recent survey.
The Digital Government plan doesn’t even mention the word “storage,” even though open data accessible to everyone is one of the linchpins of the plan.
But a recent survey by MeriTalk of 151 federal government IT professionals about big data found that storage was already an issue.
Factors found in the survey indicate the following:
- 87% of IT professionals say their stored data has grown in the last two years (by an average of 61%)
- 96% expect their data to grow in the next two years (by an average of 64%)
- 31% of data is unstructured, and that amount is increasing
- Agencies estimate they have just 49% of the data storage/access they need to leverage big data and drive mission results
- 40% of respondents pointed to storage capacity as one of the most significant challenges their agency faced when it came to managing large amounts of data
- Agencies currently store an average of 1.61 petabytes of data, but expect to get to 2.63 petabytes in just the next two years
- 57% of agencies say they have at least one dataset that’s grown too big to work with using their current data management tools and/or infrastructure
- While 64% of IT professionals say their agency’s data management system can be easily expanded/upgraded on demand, they estimate 10 months as the average time it would take to double their short- to medium-term capacity
- The #1 step that agencies say they are taking to improve their ability to manage and make decisions with big data is to invest in IT infrastructure to optimize data storage (39%)
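As a quick sanity check on those figures, the projected jump from 1.61 to 2.63 petabytes can be compared with the 64% growth respondents say they expect. This is only a sketch of the arithmetic; the function name `percent_growth` is made up for illustration and is not part of the survey.

```python
def percent_growth(current, projected):
    """Return the growth from current to projected as a percentage."""
    return (projected - current) / current * 100


# Average storage figures from the MeriTalk survey, in petabytes.
current_pb = 1.61
projected_pb = 2.63

growth = percent_growth(current_pb, projected_pb)
print(f"Projected two-year growth: {growth:.0f}%")
```

The result comes out at roughly 63%, so the petabyte projection and the expected growth rate are at least consistent with each other.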
A few weeks back, I picked on disaster recovery vendors that wait until there’s been a natural disaster and then use that to promote their services. “If one wants to offer such a service to one’s clients, how about issuing a generic press release at the beginning of the disaster seasons so that it looks less like a vendor exploiting a particular tragedy?” I suggested.
Well, someone did that, so I need to encourage the behavior I want to see.
The Games of the XXX Olympiad (that’s 30th, for those of you who don’t do Roman numerals) are scheduled for July 27 through August 12 this year in London. But that’s not the only event London is hosting this summer, or, arguably, even the biggest; it’s also the year of mayoral elections, as well as Queen Elizabeth II’s Diamond Jubilee next month, marking the 60th anniversary of her reign.
To tell you something about how special this is, the Queen is the only British monarch to celebrate a Diamond Jubilee, other than Queen Victoria in 1897.
Bad time for a data center outage.
The city department in charge of the data center is the Greater London Authority (GLA), a strategic and delivery authority with the role of designing a better future for the capital. The GLA’s Technology Team, based in City Hall, provides IT support for the Mayor’s Office, the London Assembly, and the GLA’s staff.
Four years ago, just before a previous mayoral election, a burst water main in a nearby street cut power supplies to City Hall and caused major disruption. Since then, the GLA has come to manage six times as much data, making a potential outage that much more devastating.
To prepare for this, the GLA — working with Cristie Data, a Stroud, Gloucestershire provider — implemented a disaster recovery infrastructure that will enable it to get its IT systems up and running in four hours, compared with the more than three days that it had previously taken when it was based on backup magnetic tapes.
The new infrastructure incorporates FalconStor storage-management and replication technology with Nexsan E60 disk-storage systems. The FalconStor product virtualizes the GLA’s storage environment and replicates it across a shared metropolitan area network to London’s data center in Woking, Surrey, about 20 miles away. That’s where the Nexsan E60 storage hosts the GLA’s replicated environment. Virtualization means that the storage network can be managed from one interface. If there’s a disaster, FalconStor’s RecoverTrac technology automates recovery of the IT environment.
On top of providing improved disaster recovery ability, it is estimated that the new infrastructure will save the GLA £90,000 a year, providing payback within four years. Of course, should the GLA have to invoke its disaster-recovery systems for real, the cost would be recovered a lot quicker — like, immediately.
Postscript to the “Prince of Persia” backup recovery story from a month ago — in the process of researching it, I ran into a similar story about the time the people making Toy Story 2 almost lost the entire film due to lack of a backup. Interesting, I thought, but since it was tangential to the story I was writing, I didn’t include it.
Guess I should have. Slate, in writing an article about a copy of The Avengers almost being deleted, mentioned the Toy Story 2 episode in passing in the process — it was actually included as a special feature on the DVD — and suddenly it’s all over the place, though the story goes back to at least 2010.
It’s a teaser for a longer story on the Toy Story 2 DVD, which I watched with my son this weekend. It starts:
“When making a film like Toy Story 2, we use a bunch of UNIX and Linux machines. On those kinds of machines there’s a command, RM*, that removes everything on the filesystem as fast as it can.”
“Somebody had run RM* on the drive where all the Toy Story 2 files were kept, and things just started to disappear.”
In the process of trying to recover the two years’ worth of work on the film, the company discovered that two months of backups were corrupt, and it had no viable backups — which might have delayed the film by as much as a year.
Fortunately, Galyn Susman, visual arts director at Pixar, had just had a baby, and in setting up a system she could work on from home, had a copy of the film.
(Slate also has a copy of the 2 1/2-minute film from the DVD, which has since been deleted from the 2010 story.)
Happy ending, but the whole story is quite a comedy of errors.
“[I]f you do enter a mistaken rm *, DON’T UNPLUG THE COMPUTER, YOU IDIOT!! That will just damage the file system and won’t be quick enough to save any files. Hit Control-C. It’s much faster and safer, though even that will probably be too late.
“But it took 20 seconds to delete all the files. That says there were a lot of files. It also says they were all in a flat structure with no subdirectories, since rm * doesn’t remove subdirectories. OK, maybe the command was really rm -r *, but the makers of the video were trying to keep things simple and dramatic. If you type rm -r *, think four times [before hitting Enter]. If it’s rm -rf *, make it at least six.
“Then, instead of bringing a drive to Galyn’s house and copying the files onto it, they wrapped her computer — the one with the only copy in the world of a year’s worth of work — in blankets and drove it in a car to the studio…But at least they had an offsite backup, even if it was by chance.”
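The commenter’s point that rm * leaves subdirectories alone while rm -r * does not can be tried safely in a scratch directory. What follows is a minimal Python sketch, not the commands actually run at Pixar: `remove_flat`, `remove_recursive`, and `make_demo_tree` are hypothetical helpers that only approximate the shell’s behavior (they ignore options, symlinks, and permission prompts), and the file names are invented.

```python
import shutil
import tempfile
from pathlib import Path


def remove_flat(directory):
    """Approximate plain `rm *`: delete top-level files, leave directories."""
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        # The shell's * does not match dotfiles by default.
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed.append(entry.name)
    return removed


def remove_recursive(directory):
    """Approximate `rm -r *`: delete top-level files and whole subtrees."""
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.name.startswith("."):
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed


def make_demo_tree():
    """Build a throwaway tree: one top-level file, one file in a subdirectory."""
    root = Path(tempfile.mkdtemp())
    (root / "scene1.dat").write_text("frame data")
    (root / "shots").mkdir()
    (root / "shots" / "scene2.dat").write_text("more frame data")
    return root


if __name__ == "__main__":
    root = make_demo_tree()
    print("flat removed:", remove_flat(root))
    print("still there:", sorted(p.name for p in root.rglob("*")))
    print("recursive removed:", remove_recursive(root))
    print("still there:", sorted(p.name for p in root.rglob("*")))
    shutil.rmtree(root)  # clean up the scratch directory
```

After the flat pass, the subdirectory and the file inside it survive; only the recursive pass empties the tree, which is why the recursive form deserves the extra four (or six) thinks.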
NoFilmSchool.com, a website devoted to digital filmmaking, goes into great detail about how to prevent this sort of problem. While some of it is geared specifically toward digital filmmaking, much of it applies to the average enterprise as well.
You know, there’s nothing like going to the movies to be reminded of how fast things advance in storage technology.
As part of seeing The Avengers at midnight on Friday, I attended a Marvel superhero marathon of Iron Man I and II, Thor, and Captain America. (Hulk came from another studio and was not included.) Over twelve hours in all of geekish wankery, and oddly enough I was the only middle-aged woman in the theatre.
Now, one of the truisms about science fiction — which is, essentially, what this is — is that no matter how hard they try to make everything look Futuristic, there are always tell-tale signs of the era when it was actually made, like the miniskirts in Star Trek (not to mention the attitudes about women, but let’s not go there). And Iron Man, made four years ago — an eternity in storage years — is no exception. There were times when it was as anachronistic as Captain America.
Let’s start with Tony Stark’s home lab where he’s designing the Iron Man suit Mark 2. He may have a snazzy 3D CAD setup with a Siri-like voice interface, but when it comes to storing the files, he has to stop to decide whether to put them on the office server or his home server — apparently this state-of-the-art facility wasn’t prescient enough to have thought of the cloud.
On the other hand, if Stark had been able to use the cloud, there would have been no need for the dramatic scene where Pepper slips into Tony’s office in corporate headquarters, attaches a thumb drive (with some sort of cryptographic thing that breaks into his system — but if it’s his system, why is it needed?) to his computer (because all weapons developers have unguarded USB slots on their CEO’s computer), and downloads the entire unencrypted contents of his hard drive onto the thumb drive, including the conveniently marked, easily discovered “ghost drive.”
This scene, of course, does point out the value of setting up proper security systems in your organization, as well as the inherent security flaws in thumb drives. And yes, I am an incredibly annoying person to watch movies with; why do you ask?
(There was, by the way, an Iron Man 4 GB thumb drive, unfortunately now sold out. Perfect for corporate espionage. And SanDisk sold a 4GB microSDHC in 2011 that included a copy of Iron Man 2 that you could watch from your Samsung Android smartphone.)
Though Stark obviously has a swell home office setup, including access to corporate databases, for some reason he doesn’t have access to the files on his own computer, either through the cloud or through a backup or replicated storage elsewhere. Nope, he’s got to resort to Sneakernet — or, in the case of Pepper Potts, High-Heelnet.
And let’s not even get into the scene where the computerized Jarvis is telling him, “But sir! We still have terabytes of data to download!” Heavens! Reminds me of watching the first episode of the original Battlestar Galactica where we were told that the threatening Cylon ships must be two microns long.
At least by Thor, released just last year, we have a scientist lamenting that government agents not only took her data, and her backups, but the backups of her backups.
Remind me in a couple of years; I’ll come back and take a look at The Avengers’ data technology.
Back in the day, any person who was a reasonable risk used to get a fistful of credit card offers in the mail every month — 0% interest for six months, no fee for the first year, $25 credit if you charged one thing, etc. And while I know people into leverage and arbitrage who found ways to make money on these deals by having one credit card pay off another, I never did — mainly because I couldn’t trust myself to keep track of all the various offers in a way that wouldn’t come back to bite me.
So here we are today, and we have a batch of cloud storage and cloud synchronization services — Box, Dropbox, Drive, SkyDrive, and so on, not to mention my venerable Qwest Digital Vault, which magically changed its name to the CenturyLink Digital Vault when CenturyLink bought Qwest.
I have quite an assortment of space kicking around — 25 GB with Google Drive, 2 GB with Dropbox, 5 GB with Box, and I think 7 GB with SkyDrive; I already missed out on a 25-GB offer there, and if there’s an easy way to find out what my capacity is there, I’m not finding it. (I can consider myself lucky I don’t have iCloud. I don’t think.)
My Digital Vault is a whole other kettle of fish. I thought I had 25 GB there — but when I try to log in, it doesn’t recognize my password. Then again, unless my mother has come back from the dead and changed her maiden name, it doesn’t recognize that, either, so perhaps I’m actually using the wrong ID — but the site doesn’t offer me a way to be reminded what that is, and so far I’ve gone through two separate kinds of chat sessions and neither of them can tell me, either. In any event, its website says it only offers 2 GB free, so that’s probably what I have.
(Plus I have BackBlaze for my actual backups, but I’m not counting that.)
That all totals up to 41 GB, which sounds pretty impressive.
The problem is, it’s not really enough to do anything with. It’d be great to store all my pictures in the cloud, so I could always retrieve them and have plenty of copies to keep them safe, but the picture folder on my NAS (aka “The Big Brick,” 2 terabytes) is 57 GB all by itself. Yes, some of those are duplicates — remember the part about copies to keep them safe? — and some of those are videos, now that I have a camera that can take both still photos and videos. Plus every few months or so I collect all my pictures from all the various sources and save those to the big brick, so they’re all together.
Yes, I should delete all the picture copies sometime — but I’m petrified about making a mistake, and who’s got time?
But for the sake of argument, let’s pretend that I’ve found something I can store in the cloud, something that fits. So then I have to try to keep track of which of my fistful of services I’ve stored it in. I also have to keep track of how to get into each one — something I’ve already demonstrated I have trouble with.
I can sign up for Spanning Stats for Google Drive, but that’s yet another site I have to remember to go check. Plus it turns out that it’s actually a sales tool to encourage you to sign up for its Google Drive backup service. Great. Google doesn’t back up its own cloud service?
That brings up my next level of fear — how do I make sure that what I put into the cloud is still there the next time I look for it? I certainly wouldn’t put my *only* copy of my pictures up there — what if there was a problem? What if the service went out of business? If I have to keep copies and worry about backups and synchronization with the cloud, too, then what’s the point?
(I’m not even worrying yet about the different levels of privacy that the different products offer — and I probably should.)
Spanning Stats also helps me only for Google Drive — I would need an application for Box, Dropbox, SkyDrive, and so on. So that’d be at least five applications and websites (each with their own user ID and password) that I’d have to remember.
What I really want is one thing that would check all my cloud storage systems and tell me what’s in each of them. And maybe while it’s at it, it could also keep track of all the various special offers I get for more free cloud storage space — and when they’re going to expire, and how to move the files around so I don’t get charged for anything. Because you know that’s coming, if it’s not already here. We’ve seen it with the credit cards.
Speaking of which, maybe someone could write an app like that for credit card offers, too.
Earlier this month at the National Association of Broadcasters show in Las Vegas a company called hVault announced that, Real Soon Now, it was going to ship a holographic storage system with some pretty amazing capabilities:
“Holographic media has an archival lifetime in excess of 50 years, which eliminates the 2-5 year cycle of replacing magnetic media. Holographic storage systems consume about 1/100th the power of equivalent disk storage and can operate without any special power conditioning or cooling. Holographic media is totally impervious to magnetic fields, static electricity, extremes of temperature and humidity, atmospheric dust or water damage.”
If all this sounds familiar, it should. It was at another show — another NAB, in fact, in 2005 — when we heard about another company, InPhase, that was demonstrating and about to ship a holographic storage product where the discs would also last for….wait for it….50 years. We continued to hear about it on and off over the next few years, always on the verge of shipping, in March 2006, April 2006, November 2006, February 2007, April 2008…
And, in an amazing coincidence, what the Boulder, Colo.-based hVault is selling actually is the InPhase technology — which it bought after InPhase went bankrupt, after spending $100 million.
Meanwhile, no product details, no price, no ship date beyond “spring.” (In 2007, the drives were supposed to be $18,000 and each 300-GB disc was supposed to cost $300.) “Soon after will be followed by a family of drives ranging from 800GB to 1.6TB in capacity,” gushed an April 2006 article.
While some publications posted the press release and went along with it hook, line, and sinker, some readers were less sanguine. “Vendor: ‘So archive your digital data, forget it, and read it back in 50 years.’ #ihaveabridgeforyou,” tweeted one cynic.
As someone who wrote about it in April 2006 and thought it sounded sketchy then, let’s just say I’m….dubious.
Jordan Mechner, the designer of the game Prince of Persia (which went on to be a movie), recently wrote a blog post describing the day-long ordeal he and at least three other guys had trying to get copies of the original source code for his game from some Apple ][ disks.
Mechner and his team had to deal with multiple possible problems:
- Finding a drive to read the disk
- Finding software to read the disk
- Dealing with whatever forms of copy protection the disk might have had
- Finding software to run the software on the disk
- Dealing with whatever damage the disk itself might have suffered during its 22 years in his dad’s garage
- Dealing with whatever “bit rot” the data might have suffered
Try popping your old 1980s VHS and Hi-8 home movies into a player (if you can find one). Odds are at least some of them will be visibly degraded or downright unplayable. Digital photos I burned onto DVD or backed up onto Zip disks or external hard drives just ten years ago are hit and miss — assuming I still have the hardware to read them.
Whereas my parents’ Super 8 home movies from the 1960s, and my grandparents’ photos from the 1930s, are still completely usable and will probably remain so fifty years from now.
Pretty much anything on paper or film, if you pop it in a cardboard box and forget about it for a few decades, the people of the future will still be able to figure out what it is, or was. Not so with digital media. Operating systems and data formats change every few years, along with the size and shape of the thingy and the thing you need to plug it into. Skip a few updates in a row, and you’re quickly in the territory where special equipment and expertise are needed to recover your data. Add to that the fact that magnetic media degrade with time, a single hard knock or scratch can render a hard drive or floppy disk unreadable, and suddenly the analog media of the past start to look remarkably durable.
As an example, writes Science Daily, “Magnetic tape, which stores most of the world’s computer backups, can degrade within a decade. According to the National Archives Web site by the mid-1970s, only two machines could read the data from the 1960 U.S. Census: One was in Japan, the other in the Smithsonian Institution. Some of the data collected from NASA’s 1976 Viking landing on Mars is unreadable and lost forever.”
And that’s just accidental damage. There’s also the issue of potentially embarrassing data deliberately being destroyed.
Similarly, though companies such as Microsoft are working with organizations such as Britain’s National Archives to help preserve their data, it’s the proprietary nature of software from exactly such companies — Word and Outlook, for example — that is contributing to the problem, critics say.
Think of how many early movies and television programs are no longer available because the film deteriorated (in some cases actually spontaneously combusting) or was thrown out.
Organizations such as the Internet Archive, the Library of Congress, and the Long Now are working to help preserve data access, but that doesn’t necessarily help us as individuals. For that, digital archivist Jason Scott, who helped Mechner with his project, recommends the following: “If you have data you want to keep for posterity, follow the Russian doll approach. Back up your old 20GB hard drives into a folder on your new 200GB hard drive. Next year, back up your 200GB hard drive into a folder on your new 1TB hard drive. And so on into the future.”
That won’t necessarily solve the problem of having software that can read the data, but at least the data itself will be intact. (This is something I did a few months back when I reorganized my office — collected all the random CDs, DVDs, Zip drives, thumb drives, and 3 1/2-inch floppies cluttering up my office, and put them on my new 2-TB NAS drive.)
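For anyone who wants to script the Russian doll step, here is a minimal Python sketch, assuming both drives are mounted as ordinary directories. `nest_backup` and `file_digest` are hypothetical helper names, and the checksum comparison is my own addition for peace of mind, not something Scott prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def nest_backup(old_drive, new_drive, folder_name):
    """Copy everything on old_drive into new_drive/folder_name, then verify.

    Returns the new folder and a list of files whose copies are missing
    or do not match the originals (empty if all is well).
    """
    source = Path(old_drive)
    target = Path(new_drive) / folder_name
    shutil.copytree(source, target)  # raises if the target folder already exists
    mismatches = []
    for original in source.rglob("*"):
        if original.is_file():
            relative = original.relative_to(source)
            copy = target / relative
            if not copy.is_file() or file_digest(original) != file_digest(copy):
                mismatches.append(str(relative))
    return target, mismatches
```

A call might look like `nest_backup("/mnt/old20gb", "/mnt/new200gb", "old-20GB-drive")`, where both mount points are placeholders. Next year the whole 200GB drive, nested folder and all, goes into a folder on the drive after that.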
Mechner ends with a warning. “From a preservationist point of view, the POP source code slipped through a window that is rapidly closing. Anyone who turns up a 1980s disk archive 20 or 30 years from now may be out of luck. Even if it’s something valuable that the world really cares about and is willing to invest time and money into extracting, it will probably be too late.”
One of the toughest jobs out there must be marketing director for a disaster recovery product or service. There’s no better time to promote one’s product or service than when there’s just been a disaster, yet doing so makes you look like you’re exploiting people’s tragedy and can backfire.
Take for example Microsoft, which came under a barrage of criticism during the Japan earthquake last year for offering to donate a dollar for every retweet of its message promoting its Bing search engine; after being attacked, the company swiftly backpedaled and just made a straight donation, no retweeting required.
That’s why I hesitated at posting about this upcoming vendor — my initial reaction was negative, and it’s only several days later, after checking out the coverage elsewhere, that I can look at its announcement more objectively.
With Tornado Alley this spring looking more like Tornado Interstate, and numerous regions and businesses affected, it’s not surprising that some vendors would want to use it as a news hook — though perhaps waiting until the twisters had actually stopped forming might have been more tasteful timing.
“In response to the severe damage caused by tornadoes touching down in the Dallas area, Nirvanix, the leading provider of enterprise-class cloud storage services, today announced that it is expanding its Disaster Avoidance Program to customers currently storing data in its Node 3 data center in Dallas enabling them to exercise the option of moving their data to other locations in the Nirvanix Cloud Storage Network—either on a temporary or full-time basis—free of charge.”
(Ironically, the Storage Networking World show was being held in Dallas at the same time and was itself disrupted by the severe weather, although Nirvanix did not appear to attend that event.)
On its face, this is a reasonable offer. Users in a disaster area can store their data outside the area. Great. So what’s the problem?
Perhaps it’s the use of the phrase “the leading provider of enterprise-class cloud storage services” in the first sentence. Really, did Dallas people need to have that pointed out to them just then?
Perhaps it’s the Johnny-on-the-spot nature of the announcement, which was issued the same day the tornadoes actually occurred. By Googling, one can ascertain that such an announcement is not unusual for Nirvanix, with the company making similar offers during disasters such as the Japan earthquake and Hurricane Irene. Pull out the boilerplate press release, drop in the name of the disaster and its location, and you’re good to go.
One does wonder at what point the trigger occurs to send out such a release. After a certain amount of property damage occurs or a certain number of people are killed? Does it depend on how many customers Nirvanix has in the affected area? Will Nirvanix issue a similar press release and offer this week regarding the Midwest tornadoes, or did they not come up to snuff?
While some disasters, such as the Japan earthquake, are unpredictable, it’s no secret to anybody that we have tornadoes in the spring and hurricanes in the fall. If one wants to offer such a service to one’s clients, how about issuing a generic press release at the beginning of the disaster seasons so that it looks less like a vendor exploiting a particular tragedy? The anniversary of the Great San Francisco earthquake is coming up, too; that might be a marketing opportunity as well that is far enough removed from actual events and tragedies that it won’t appear so opportunistic.
I know I’ll never forget the heartwarming family traditions or the look on my daughter’s little face on the morning of World Backup Day.
Just kidding. Actually, it was last Saturday, and I didn’t even hear about it til a day or so afterwards. It was, in fact, only the second time the holiday had been celebrated.
As it happens, World Backup Day came into being from a reddit discussion a year ago.
I just think it would be for the good of everyone to have a reminder to save all your cherished pictures, videos and other important data to somewhere secure.
Companies should also get involved, making sure that their customers and their own data is secure and safe. Maybe even the back-up providers could offer discounts and rates based on the date to encourage sales and participation.
Why March 31? The theory was to have your computer all backed up in case there were tricks or viruses associated with April Fool’s Day. There’s now a web page and a Facebook page, as well as a Twitter feed that seems to look for people mentioning hard drive failures and then asks brightly whether they’d remembered to do a backup first — safe out of punching range.
Not surprisingly, backup vendors have jumped on the notion of World Backup Day, with — just as the original poster suggested — discounts and suchlike to encourage people to back up their data, as well as several helpful infographics and even Pinterest sites talking about the scourge of data loss. The holiday is also starting to make it to the mainstream media, and user organizations such as Lawrence Berkeley National Laboratory picked it up as well.
All kidding aside, it’s not a bad mnemonic idea, on the order of changing the batteries in your smoke detector during the switches to and from Daylight Saving Time. (By the way, when do people in Indiana and Arizona change their smoke alarm batteries, if those states don’t observe Daylight Saving Time?) Anything that encourages consumers to do backups is probably a good thing, though an annual backup probably isn’t that much help.
Unlike some holidays such as National Telework Week, which asks people to pledge to work at home and then calculates the hours they worked and the savings they made, World Backup Day doesn’t do any followup, so we don’t actually know how many people observed World Backup Day and from how many data losses we were saved. Perhaps that’s an idea for World Backup Day #3.
It hasn’t gotten a lot of play in the news media, but a recent U.S. District Court decision may at least weaken a policy that theoretically gives the Department of Homeland Security the right to search laptop storage of more than two-thirds of Americans.
In case you’ve forgotten, in August 2009, the U.S. government implemented a new policy for the Department of Homeland Security giving the department the right to search laptops in border areas. The problem was, according to Udi Ofer, Advocacy Director for the New York Civil Liberties Union, in a letter he wrote to the New York Times in August, 2010, Border Patrol agents have the right to conduct such seizures within 100 miles of the U.S. border, which covers much more of the United States than it sounds. In fact, two-thirds of the population of the U.S. lives in one of those areas, he wrote — and people in those areas could be subject to losing their laptops. (Indeed, the Ninth Circuit Court ruled that such laptops could be transported more than 100 miles away to do a more thorough search.)
In a particular case filed last May, the U.S. government was charged with targeting David House, a Massachusetts programmer, due to his association with Bradley Manning, the soldier accused of leaking material to WikiLeaks, for one of these searches. The American Civil Liberties Union and ACLU of Massachusetts had filed suit against the government for this, which the government moved to dismiss.
The 27-page court decision this week denied the government’s motion, meaning that the lawsuit against the government can continue to take place. Moreover, although the judge supported the government’s right to search laptops at the border, he did put some sideboards on that right, such as:
- Not allowing laptops and other equipment to be seized for an indefinite period of time (House’s were seized for seven weeks)
- Not allowing people to be targeted for First Amendment-protected political speech (it has been suggested that House was targeted due to his association with Manning)
This doesn’t eliminate the searches — which also have criminal defense attorneys concerned, due to loss of attorney-client privilege, not to mention students with majors in Islamic Studies — but this and some other lawsuits challenging the policy give hope that it may be modified in the future.