Hi, I need to compare 2 large database files on the AS400, they have 20 million + records each. They are both keyed uniquely, exact same file layout. I need to identify inserted, updated and deleted records. I tried using cobol program reading first file with the unique logical, and matching with the second file and checking for differences etc. The problem is that it's taking way too long. Would anyone have any suggestions please? It's a batch process, the Cobol is called from a CL. I can embed SQL if it helps....
Software/Hardware used:
ASKED:
September 5, 2011 5:30 PM
UPDATED:
March 31, 2012 5:08 PM
thanks for reply Yorkshireman. yes it a nightly job which is to extract daily changes to a file for autiting etc purposes. the file is updated in numerous places. An end of day copy is taken the previous day. do you mean whenever the file is updated to keep track of the key of updated records? If so, I dont think this would be sufficient, as for example, it wouldn’t catch manual updates to the file.
I beleive the best way would be to use SQL.
Use the the NOT EXISTS option
Here is a sample of one of mine.
INSERT INTO WRKENCRPT/DELPF (SYS, PRN, AGT, ACCTNO, CLIENT, STRTDAT, ENDDAT, CHGDAT, USRPRF) SELECT SYS, PRN, AGT, ACCTNO, CLIENT, STRTDAT, CURDATE(), CHGDAT, USRPRF FROM ICULIB/ACCTMSTR A WHERE CLIENT <> '1379' AND NOT EXISTS (SELECT * FROM WRKENCRPT/ACCTMSTR B WHERE A.ACCTNO = B.ACCTNO AND A.SYS = B.SYS AND A.PRN = B.PRN AND A.AGT = B.AGT)If you put the SQL statment in a source member, you can then just use the RUNSQLSTM command and you do not have to do any coding.
If you need to do multiple SQL statements, you can put them all in the same source member. Just add a semi-colon at the end of each statement.
Good question ARTZONE
YorkShireMan has an excellent approach using Triggers.
The trigger program is called by the system when there is an update/delete/add to the file by any means. It is called in “real” time as the events occur.
This trigger program could write directly to the audit file. It receives the before and after image of the record for an update, before image for a delete, after image for an add,. Program/user/job date & time of change.
Phil
Thanks everyone for all the sugestions… I will try out an SQL tomorrow and see how it goes… and update this discussion
If it can be stated in an SQL statement, it’s almost guaranteed that SQL will be faster. Any HLL is going to be calling into DB2 routines for every individual I/O statement and then coming back up into program code until the next I/O. A single SQL statement will basically only make a single call into DB2 routines and stay in there.
But if you want to track adds/changes/deletes, journaling will be far faster than alternatives.
Tom
By using triggers you are efectively spreading the processing around the system to the point of origin. In overall terms it will consume more resource, but by adding a couple of milliseconds at each change point, you – and users – won’t notice it.
Your audit file will be up to date, all the time, and a simple display via change time woudl enable a ‘real time plus milliseconds’ view.
If you haven’t written a trigger program then a quick net search will bring up lots of code examles, into which all you need do is substiture your origin file and add your audit file, and it will be working.
As an alternative, you could write a journal extractor – as each journal is detached, a function reads the entries and sifts out the ones relevant to the file under study. You have the choice of writing something to an audit file, or merely making the extracted journal entries your audit – that’s what a journal does, after all. If you journalled this file to its own journal you would retain the functionality of a system journal – accurate, comprehensive, timely, and have the audit trail you need – just save the journal receiver to tape and place it, day by day or whatever, into your audit log filing cabinet.
- No programming needed ! All your batch time released – Guaranteed accurate !
Your audit file will be up to date…
But you shouldn’t call it an “audit” file. It’s an application file like all the others. It’s subject to the same DFU or SQL or other modifications as any database file, so it’s inappropriate for audits.
Of course, you could journal the ‘audit’ file, but that just adds an additional layer.
Tom
But you shouldn’t call it an “audit” file. It’s an application file like all the others
Tom’s correct of course, but recording system activity within an application is a common enough solution that ‘audit file’ generally implies this kind of approach. A well designed application will have it secured in an appropriate manner against interference – so, as you say, probably many will not.
My preferred solution would be to use the system journal as the source of information about changes. It is guaranteed to be complete, you have it running anyway (you *do* have these files journalled?) and it’s quick. If your auditors need something on a tape somewhere then you could rewrite the journal entries to an audit file (application data, Tom) and secure it to a suitable user group profile.
Any number of ways of cracking this particular nut. What’s important – speed (your start point) – efficiency (= future costs) simplicity (= reduced maintenance costs) – security (= regulatory costs)
bonne chance !