Yottabytes: Storage and Disaster Recovery

Sep 29 2016   12:39PM GMT

Archivists Dealing With Born-Digital Data

Sharon Fisher

Tags: government, history, museum, storage

When former Texas Governor Rick Perry left office in 2014, the records he left to the State Library and Archives Commission included more than 7 terabytes of data, writes Isabelle Taft in the Texas Tribune. But there wasn’t any digital archive to put it in.

“While the governor’s 4,000 cubic feet of paper could be sorted, itemized, boxed and shelved alongside other state records dating back centuries, [state archivist Jelain] Chubb and her staff had no system in place to store thousands of gigabytes of photos, emails and audio and video recordings, much less make them available to the public,” Taft writes. “The Perry collection presented a Texas-sized challenge for a commission that had no capacity to manage the ‘born-digital’ records — those with no paper or analog footprint.”

Consequently, the commission needed to set up the Texas Digital Archive for the material, a project that so far has cost $700,000. The data is stored in the Amazon cloud.

In many ways, electronic records are safer and more durable than paper ones: they can easily be backed up, replicated, and sent elsewhere to prevent a Library of Alexandria-type conflagration. Moreover, electronic records are typically easier to gain access to than paper records because people don’t have to travel to where the records are. That also helps make government more transparent. But that’s all predicated on the data still being readable electronically.

“Ancient hieroglyphics and scrolls have survived centuries, but digital storage is fragile, the files easily swept away or locked up in encryption,” writes Arielle Pardes in Vice.com. “The technology we use to store things today might not be around tomorrow, and many of the platforms we use to store information are owned by private companies, which makes it harder for archival institutions to save them. And how much of what we upload online is worth saving at all?”

There are three main problems with maintaining archives of born-digital material, Pardes writes:

  1. It requires the hardware to read it.
  2. It requires power for the hardware.
  3. It requires software—often proprietary, and sometimes copyrighted—to read it.

This is particularly true as data storage goes to private companies in the cloud—such as Facebook—rather than on software that we own, Pardes warns. “Many of the sites we use that are free, or that you rent space on, like a wedding site, they’re private companies,” she quotes historian Abby Smith Rumsey as saying. “You don’t have ownership of it.”

That was the problem with some of Perry’s data, which dated back to 2000 and in some cases used formats and media that are no longer in common use, such as WordPerfect, Adobe PageMaker, VHS tapes, CDs and raw camera files, Taft writes. “Many files had to be reformatted so the public could view them with contemporary software.”

To help in this effort, the Library of Congress has created a list of recommended formats for archiving digital data, and has an ongoing discussion about the sustainability of digital formats.

After the Perry project, the Texas data archive staff is now working on a similar project for Texas state agencies. “Arguably the most important function of the digital archive, however, is still under development: the ability to ingest and display the born-digital archives of state agencies,” Taft writes. “Archivists are currently working with three pilot agencies — the Texas Historical Commission, the Office of the Attorney General and the Railroad Commission — to get their electronic records from the late 1990s and early 2000s on the digital archive.”

Unfortunately, the group is running into the same problem as with Perry’s data. “Texas requires state agencies to preserve records if the state archives can’t yet take them,” Taft writes. “But floppy disks loaded with files can decay until they’re unreadable. Emails are often deleted to free up expensive storage space. And some formats are already obsolete.”
