Need a process to obtain the set of records updated/added out of 100 million records

Hi,

We need your suggestion on the following design. We have a file with 100 million records. This file is updated daily by more than 50 jobs. We would like to obtain the set of records, out of those 100 million, that were either updated or created on a given day. Since the file has no audit fields, locating those records is a real problem.

Proposed solution #1: Keep an exact replica of the file in another file. At the end of the day, fetch each actual record and its corresponding record in the replica, and calculate a checksum for each (A - checksum of the actual record, B - checksum of the equivalent record in the replica). If the checksums don't match, we can infer that the record was updated.

Disadvantages:
1. We need to calculate checksums for all 100 million records.
2. The replica has to be updated daily.

Proposed solution #2: Journal the file and, at the end of the day, run an automated process to decipher the journal receiver and obtain the updated/created records.

Disadvantages:
1. Journal receiver storage.
2. An automated process to decipher the journal receiver has to be defined.

Please let us know the best way to obtain the subset of records.

Thanks,
Maven
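Editor's note: if the file is accessible as a SQL table, solution #1 can be sketched without hand-rolled checksums by letting the database compare whole rows. A minimal sketch, assuming hypothetical names BIGFILE (the live file) and BIGFILE_COPY (yesterday's replica); note it still reads both files end to end, so the first disadvantage remains:

```sql
-- Rows present in BIGFILE but absent from yesterday's replica are the
-- records added or changed today.  Both tables must have identical layouts.
SELECT *
FROM   BIGFILE
EXCEPT
SELECT *
FROM   BIGFILE_COPY;
```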

Answer Wiki


If you can add a record to the table (nothing uses the table without specifying fields), you can add a BEFORE UPDATE trigger to set a new last-modified field to the current date. The end-of-day program can use WHERE lastModified=CURRENT DATE.

If that won’t work, you can add an AFTER UPDATE trigger to write record IDs into a new file. The end-of-day program can read that new file, process the records logged there, then clear the log file.
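The first suggestion might look like this in SQL, assuming the file is accessible as a table named BIGFILE and that a LASTMOD date column can be added (both names are hypothetical; exact trigger syntax varies slightly by DB2 for i release):

```sql
-- BEFORE triggers stamp every added or updated row with today's date,
-- so no program changes are needed in the 50 updating jobs.
CREATE TRIGGER BIGFILE_INS
  BEFORE INSERT ON BIGFILE
  REFERENCING NEW AS N
  FOR EACH ROW
  SET N.LASTMOD = CURRENT DATE;

CREATE TRIGGER BIGFILE_UPD
  BEFORE UPDATE ON BIGFILE
  REFERENCING NEW AS N
  FOR EACH ROW
  SET N.LASTMOD = CURRENT DATE;

-- End-of-day extract of everything touched today:
SELECT * FROM BIGFILE WHERE LASTMOD = CURRENT DATE;
```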

Sheldon Linker
Linker Systems, Inc.
800-315-1174 (+1-949-552-1904)

Discuss This Question: 6  Replies

  • astradyne
    I think I would have to agree with the other poster that you should look at using a trigger to capture the records that are added or updated, the difference being that I would capture the whole record into a duplicate file for analysis and then clear it down on a daily basis. That way, you're not worrying about another job having to access a file with 100,000,000 records in it.

    Having said that, I'm intrigued as to what sort of data the file is storing. I've seen JDE transaction files with 500,000+ records in, and they've been purged fairly soon afterwards.

    From a design perspective I would recommend that you look into either adding timestamp fields to the file, or using a trigger (as suggested by the previous poster) to create a secondary file with the timestamp details in, linked by a key to the original file. You could then use the timestamp fields to purge the file before it gets too large...

    All the best

    Jonathan
    370 pointsBadges:
  • SheldonLinker
    In my message above, in the first suggestion, I meant add a COLUMN, not a record. Sorry about that.
    30 pointsBadges:
    The idea of using a physical file trigger is a good one. As an alternative to writing the whole record to another file, you can write only the key (if the original file has a unique key) or the relative record number. It's just a way to save some storage, since you can later use the saved data to access the full record. A stored procedure, instead of a trigger, can be another way to get it done. In any case, you'll need a way to identify whether a record in the log is a new one or an existing one that has been updated. And what about deleted records? Will there ever be any?
    0 pointsBadges:
  • Thelark1
    Sheldon,

    Instead of using a trigger file, which would work perfectly well, why not set up journaling on the required file? You can journal creates, updates and deletions, with the changes recorded in the journal receiver. At the end of the day you can run a program/query over the journal receiver for all of the updated records.

    This is also a good way of recovering your large file if for some reason data gets deleted by mistake or the file corrupts in the middle of the day.

    Regards

    Darren
    0 pointsBadges:
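On recent IBM i releases the journal can be queried directly from SQL; the classic alternative is the DSPJRN command with OUTPUT(*OUTFILE). A sketch, assuming the hypothetical names BIGFILE, MYLIB and MYJRN, and that the QSYS2.DISPLAY_JOURNAL table function is available on your release (parameter and column names here are from memory; verify them against your system's documentation):

```sql
-- Journal code 'R' is record-level activity: entry types PT/PX are
-- adds, UP is an update, DL is a delete.  COUNT_OR_RRN gives the
-- relative record number for record-level entries.
SELECT ENTRY_TIMESTAMP, JOURNAL_ENTRY_TYPE, COUNT_OR_RRN
FROM   TABLE(QSYS2.DISPLAY_JOURNAL('MYLIB', 'MYJRN',
               STARTING_TIMESTAMP => CURRENT DATE)) AS J
WHERE  JOURNAL_CODE = 'R'
  AND  OBJECT = 'BIGFILE';
```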
  • DanTheDane
    Hi,

    We had a similar challenge, for which we built the following solution. First of all, many programs updating the same file can create poor overall performance. We therefore decided to create only one program that inserts/updates the big file. This program gets its input from a data queue, to which the many processes send their transactions. The result: superb performance, as there are no waits to retrieve/write a record to a physical file with an index.

    The record layout for the data queue entry may, on top of your current record layout in the big file, contain timestamps, user ID, program name and so on. The new program that processes records from the data queue writes/updates your big file AND writes the record to a new file, 'todaystransactions'. If needed, you can also include a process-control instruction in the form of a field containing the action to take with the record (e.g. 'DEL', 'UPD', 'ADD').

    In our solution we also update a file containing summarized data, and an inquiry program is available to our users, so they can follow the business through the day.

    Pros of the solution:
    - superb performance
    - the 'todaystransactions' file is accessible all day through
    - 'todaystransactions' may contain additional data for analysis (timestamp, program name and so on)

    Cons:
    - output from your 50 programs has to be written to the data queue, so (depending on your coding techniques) you may have many programs to modify.

    rgds

    Dan
    2,555 pointsBadges:
  • TomLiotta
    There should be no difference between one and fifty programs updating a file. If one program performs better, it indicates a problem in how the fifty programs were designed.

    As for the question, a trigger program is a good idea. The thought to output keys is also good, though it's unclear how useful this will be on its own. An updated record's key tells nothing about what changed, and a key by itself doesn't tell whether the transaction was an update or an add. Multiple updates, or an add plus an update, to a single key don't provide much useful information either.

    A journal receiver is listed with 'storage (size?)' as a disadvantage, but receivers are given as an alternative to a copy of the file, which would be ten times larger. It's hard to see how a 90% reduction would be a size disadvantage.

    Tom
    125,585 pointsBadges:
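One way to address the add-versus-update ambiguity Tom raises is to log the operation type and a timestamp alongside the key, then collapse multiple events per key at end of day. A sketch with hypothetical names (BIGFILE with unique key ORDNO, a new CHANGELOG file):

```sql
-- Each trigger records the key, what happened, and when.
CREATE TABLE CHANGELOG (
  ORDNO  DECIMAL(10,0) NOT NULL,
  OP     CHAR(1)       NOT NULL,               -- 'A' add, 'U' update, 'D' delete
  CHGTS  TIMESTAMP     NOT NULL DEFAULT CURRENT TIMESTAMP
);

CREATE TRIGGER BIGFILE_AI AFTER INSERT ON BIGFILE
  REFERENCING NEW AS N FOR EACH ROW
  INSERT INTO CHANGELOG (ORDNO, OP) VALUES (N.ORDNO, 'A');

CREATE TRIGGER BIGFILE_AU AFTER UPDATE ON BIGFILE
  REFERENCING NEW AS N FOR EACH ROW
  INSERT INTO CHANGELOG (ORDNO, OP) VALUES (N.ORDNO, 'U');

CREATE TRIGGER BIGFILE_AD AFTER DELETE ON BIGFILE
  REFERENCING OLD AS O FOR EACH ROW
  INSERT INTO CHANGELOG (ORDNO, OP) VALUES (O.ORDNO, 'D');

-- End-of-day: keep only the latest event per key, then clear the log.
SELECT ORDNO, OP
FROM   CHANGELOG C
WHERE  CHGTS = (SELECT MAX(CHGTS)
                FROM   CHANGELOG
                WHERE  ORDNO = C.ORDNO);
```

An add followed by updates collapses to the add's key with its final state still readable from BIGFILE, and a delete is visible as 'D' even though the record itself is gone.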
