Posted by: JoMaitland
Universities, cultural heritage organizations and libraries around the world: there’s a cloud service for you now too. It’s called DuraCloud, an open source offering developed by the not-for-profit organization DuraSpace, and it is focused on preserving important documents.
The service runs on top of cloud storage providers Amazon S3 and Rackspace Cloud Files, with Microsoft Azure to follow eventually. Users can store documents, images, video or just about any other content, in as many copies as they like, across these providers, and it’s all accessible from a single portal. Try moving content across different cloud providers today without this kind of service. It’s a royal pain. DuraCloud automatically synchronizes your copies across providers and offers a health check service to verify the integrity of your files.
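The article doesn’t say how DuraCloud’s health check works under the hood, but the general idea of verifying that copies still match can be sketched with checksums. The snippet below is a minimal illustration, assuming local files stand in for the copies held at each provider; the function names and the choice of SHA-256 are assumptions for the example, not DuraCloud’s actual mechanism.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_copies(reference_path, copy_paths):
    """Return the copies whose checksum differs from the reference file.

    Hypothetical helper: each path stands in for one provider's copy.
    """
    expected = file_checksum(reference_path)
    return [str(p) for p in copy_paths if file_checksum(p) != expected]
```

An empty list from `find_mismatched_copies` means every copy matched the reference; any path it returns is a copy that has drifted or been corrupted and would need to be re-synchronized.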
There are no requirements on how your content must be structured for ingest into DuraCloud. In terms of content, DuraCloud is essentially a blob store. You can upload any bitstream, in any format. DuraCloud is also capable of storing any type of package (e.g., AIP, ZIP, TAR). And since there are no requirements, you can easily transfer data to DuraCloud yourself. There are three options for uploading content to DuraCloud: via the web interface, the client-side synchronization utility, or the REST API.
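To give a feel for what uploading a blob over a REST API generally looks like, here is a minimal sketch that builds an HTTP PUT request carrying raw bytes. It is deliberately generic: the article doesn’t describe DuraCloud’s endpoint layout or authentication, so the URL in the usage note is a placeholder and `build_upload_request` is a hypothetical helper, not part of any DuraCloud client.

```python
import urllib.request


def build_upload_request(url, payload, content_type="application/octet-stream"):
    """Build (but do not send) an HTTP PUT request that carries raw bytes.

    The URL is supplied by the caller; nothing here is specific to DuraCloud.
    """
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )
```

A call such as `build_upload_request("https://example.org/store/my-space/report.pdf", data)` returns a request object that could then be passed to `urllib.request.urlopen`, once the real endpoint and credentials are known. Since a blob store doesn’t care about format, the generic `application/octet-stream` content type is a reasonable default.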
DuraSpace started the project in 2009 and initially built it on EMC’s Atmos Online and Sun’s Cloud storage services, both of which went poof in 2010. It was a good test of the software, according to Michele Kimpton, CEO of DuraSpace, who said they were easily able to move DuraCloud to Amazon and Rackspace.
“It proves the model, you can’t rely on just one provider … Users need flexibility of providers and their data in multiple geographies,” Kimpton said.
The service is geared to the 1,200 or so academic institutions and cultural heritage organizations already using DuraSpace’s Fedora, a framework for building an archive, and DSpace, a repository application. These hook directly into DuraCloud, although you don’t need them to use DuraCloud. The service doesn’t offer any kind of security capability today, such as encryption, which is a definite downside for anyone thinking of using it for sensitive information.
And it’s not especially cheap. DuraSpace charges a subscription fee of $375 per month for running the service, which includes 500 GB of storage and access to all services in the platform. Additional storage is charged at the rate of the underlying cloud provider.
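For a rough sense of what a monthly bill looks like under this pricing, here is a back-of-the-envelope sketch. The $375 fee and 500 GB allowance come from the figures above; the per-GB provider rate is something you would have to look up for your provider, and the function name is made up for the example.

```python
BASE_FEE_USD = 375.0       # monthly subscription fee quoted in the article
INCLUDED_STORAGE_GB = 500  # storage covered by the subscription


def monthly_cost(stored_gb, provider_rate_per_gb):
    """Estimate the monthly bill in US dollars.

    provider_rate_per_gb is an assumption supplied by the caller: the
    underlying cloud provider's monthly price per GB of storage.
    """
    # Only storage beyond the included allowance is billed at the provider rate.
    extra_gb = max(0, stored_gb - INCLUDED_STORAGE_GB)
    return BASE_FEE_USD + extra_gb * provider_rate_per_gb
```

With a purely hypothetical provider rate of $0.10 per GB per month, storing 2,000 GB would mean 1,500 GB of overage, or $150 on top of the $375 subscription, for $525 a month. Anything at or under 500 GB costs the flat $375.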
There are other preservation services out there, but so far none have taken advantage of the cloud. Chronopolis is a digital preservation service developed by the San Diego Supercomputer Center (SDSC) at UC San Diego. It takes a copy of your content and stores it offline, so you can’t see it or easily access it, but they will keep it “forever” for you. Stanford University has a service called LOCKSS (Lots of Copies Keep Stuff Safe), but you have to be a member and run a server called a LOCKSS box in your IT environment. Your box joins others in a peer-to-peer network, and if any one box goes down, you can pull your content from another LOCKSS box. Kimpton claims it doesn’t scale well and you need specialist skills to use it.
Eventually DuraCloud will offer data mining and data analytics services for the content in its stores, and Kimpton expects someone will probably want to license it at some point for commercial purposes. “We’ll decide if we want to do that down the line,” she said. “We’re not trying to make a profit, that’s why there is trust within our community.”