My partner is working on a project in which he's writing a data processing library in Python that's going to read data from different sources into memory, manipulate it, and export it into different formats. I'm helping him load the data, but some of the datasets are really large (over 4 GB).
So we're looking for an open-source library for data storage that can handle these large datasets. It also needs to support modifying the structure of the data (adding or renaming columns, etc.), fast iteration over the rows, and filling in missing values. Does anyone have a good suggestion?
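To make the requirements concrete, here's a rough sketch of the kind of operations we need. I'm using pandas here purely as a stand-in (the file names and column names are made up), since plain pandas tends to struggle once data outgrows RAM, which is why we're asking:

```python
import pandas as pd

# Read in chunks because a 4+ GB file may not fit in memory all at once
for i, chunk in enumerate(pd.read_csv("big_dataset.csv", chunksize=100_000)):
    # Restructure the data: rename an existing column, add a derived one
    chunk = chunk.rename(columns={"old_name": "new_name"})
    chunk["doubled"] = chunk["new_name"] * 2

    # Fill in missing values
    chunk["new_name"] = chunk["new_name"].fillna(0)

    # Fast iteration over rows (itertuples is much faster than iterrows)
    for row in chunk.itertuples(index=False):
        pass  # per-row processing would go here

    # Export to a different format, appending chunk by chunk
    chunk.to_csv("out.csv", mode="a", header=(i == 0), index=False)
```

Something with this flavor of API, but designed for datasets bigger than memory, is what we're after.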