We need your suggestions on the following design.
We have a file with 100 million records. This file is updated daily by more than 50 jobs. We would like to obtain the set of records, out of those 100 million, that were either updated or created on a given day. Since the file has no audit fields, locating those records is a real problem.
Proposed solution #1
Keep an exact replica of the file in another file. At the end of the day, read each record in the actual file and its corresponding record in the replica, and calculate two checksums (A: the checksum of the record in the actual file; B: the checksum of the equivalent record in the replica). If the checksums don't match, we can infer that the record was updated; a record with no counterpart in the replica was created.
Drawbacks:
1. We need to calculate checksums for all 100 million records every day.
2. The replica has to be refreshed daily.
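A minimal sketch of solution #1 in Python, assuming both the actual file and the replica can be read in ascending key order as (key, raw record bytes) pairs with unique keys. The function names `digest` and `diff_sorted` and the record layout are hypothetical. Reading both files in key order as a merge avoids holding 100 million checksums in memory at once.

```python
import hashlib
from typing import Iterable, List, Tuple

# Hypothetical record layout: (unique key, raw record bytes)
Record = Tuple[str, bytes]


def digest(data: bytes) -> str:
    """Checksum of one record's contents."""
    return hashlib.sha256(data).hexdigest()


def diff_sorted(current: Iterable[Record],
                replica: Iterable[Record]) -> Tuple[List[str], List[str]]:
    """Merge two key-sorted streams and return (created, updated) keys.

    Both inputs must be sorted ascending by key with no duplicate keys.
    """
    created: List[str] = []
    updated: List[str] = []
    old_iter = iter(replica)
    old = next(old_iter, None)
    for key, data in current:
        # Skip replica records whose keys are gone from today's file (deleted records).
        while old is not None and old[0] < key:
            old = next(old_iter, None)
        if old is None or old[0] != key:
            created.append(key)      # no counterpart in the replica
        elif digest(old[1]) != digest(data):
            updated.append(key)      # checksum A differs from checksum B
    return created, updated
```

If the replica holds full records, a direct byte comparison would work just as well; checksums pay off when you keep only a (key, checksum) table instead of a full copy of the file.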
Proposed solution #2
Journal the file. At the end of the day, run an automated process that deciphers the journal receiver to obtain the updated/created records.
Drawbacks:
1. The journal receiver requires storage.
2. An automated process to decipher the journal receiver has to be defined.
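A rough sketch of the "decipher" step for solution #2 in Python. It assumes the day's journal entries have already been exported (for example with `DSPJRN ... OUTPUT(*OUTFILE)`) and parsed into the hypothetical `JournalEntry` tuples below. It also assumes the usual record-level entry types: journal code `R` with `PT`/`PX` for added records and `UP` for the after-image of an update. Check these against your system before relying on them.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class JournalEntry(NamedTuple):
    """Hypothetical parsed journal entry (field names are illustrative)."""
    code: str        # journal code; "R" = record-level operation
    entry_type: str  # e.g. "PT"/"PX" (added), "UP" (update after-image), "UB", "DL"
    rrn: int         # relative record number of the affected record


CREATED_TYPES = {"PT", "PX"}
UPDATED_TYPES = {"UP"}


def touched_records(entries: Iterable[JournalEntry]) -> Tuple[Set[int], Set[int]]:
    """Return (created, updated) relative record numbers for the day."""
    created: Set[int] = set()
    updated: Set[int] = set()
    for e in entries:
        if e.code != "R":
            continue  # ignore file-level and other non-record entries
        if e.entry_type in CREATED_TYPES:
            created.add(e.rrn)
        elif e.entry_type in UPDATED_TYPES:
            updated.add(e.rrn)
    # A record added and then updated on the same day is reported as created.
    updated -= created
    return created, updated
```

Deletes (`DL`) and before-images (`UB`) are simply skipped here. If the file reuses deleted record slots, the same RRN can refer to different records within a day, so a real implementation would need to handle that case.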
Please let us know the best way to obtain the subset of records.