Oracle 10g process each record exactly on time multiple processes

Oracle 10g
Oracle 10g administration
SELECT statement
UPDATE statement
I have over 400 records added to a table in a minute. A processed flag is set in each record as they are processed. I want to process each record exactly one time. I want to process the records in small groups with multiple processes; that may run on disparate machines. It is desirable to process them in the order they were inserted into the table. I was thinking I could use rownum to limit the number of records selected for processing. Also thinking I could use select for update skip locked so that each process would select a different set of records to process. But either I'm wrong or clueless. Can anyone offer a suggestion that will work? I'm also lazy and don't want to write or use some server to read the records and dole out the work to connecting processes. Is there some oracle feature or collection of features that will provide a solution? I'm running oracle 10g.

Answer Wiki

Thanks. We'll let you know when a new response is added.

We probably need a little more info to fully answer this question, but here are some notes/suggestions:

1. If you are thinking of querying recent records based on the “processed flag”, the flag should be an integer or date/time value that is changed for every processed record – preferably a non-null column. If you use some form of true/false, y/n, 0/1 valued column, Oracle’s query optimizer will quickly decide the column isn’t very useful as an index and your query to get recent records will become a full table scan. Simple solutions:
a) Make it a date/time field, set to null for new records and then set to the date/time processed. This works, but I prefer my indexed columns to not contain null values, so I would probably use:
b) Make it an integer field, initially set to zero, and then set to a non-zero sequence number when it is processed.
(n.b. Both of these have advantages and disadvantages. You could also use the date/time field and set the “new” records to a fixed old date – like “1/1/1900” to avoid the null problem – I am not thrilled with that, because then the query to get the records has to be hard coded for an exact match with this arbitrary date.)

2. You said “desirable to process them in the order” they were inserted. Is this merely desirable, or a requirement?
If you <b>have </b>to process the records in the order they were inserted, then you will probably have to implement the (non-lazy) approach of a table reader that dispatches the records in order to the other processes. You would need to do this if one of the records in a batch could be an update to an earlier record in the same time window – running chunks of records in parallel could result in the “update” being run prior to the “original” data.

If it is merely that you want to have them run more or less in the time order that they were received, that is much simpler. Include a field containing a 0/1 flag to indicate the record has been “taken”, and a date/time field containing the date added to the table (note that this would need to be separate from the “date processed” field in 1 above).

cursor get_rec is select <primary key fields> from <table>
where Processed_Flag = 0 and Taken_Flag = 0
order by Date_Inserted;
while true
open get_rec;
fetch get_rec into <fields>;
if get_rec%found then
update <table> set Taken_Flag = 1
where <primary_key_fields = cursor fields> and Processed_Flag = 0 and Taken_Flag = 0;
— if the record update succeeded, then this process now “owns” this record, and any other thread
— will fail and drop to the exception below
<process record here>
update <table> set Processed_Flag = MySequence.NextVal;
where <primary_key_fields = cursor fields> and Processed_Flag = 0;
when NO_DATA_FOUND then
end if;
close get_rec;
end loop;

Obviously, there is a lot missing from the above – error handling and a way to start/stop the loop, for example. But this could be a framework for each of the individual processes on the different systems.

Note that this will “tend” to fetch records in order, but there is no guarantee.

I use a separate Processed_Flag (non-zero after processing is done) and Taken_Flag (selected for processing) to allow for recovery should a process fail and leave a record in an intermediate state. If you use the same flag for both purposes, you need to beware that a process failure after a record is “taken” could be mishandled as though the processing had completed. In this case, simply resetting the Taken_Flag to zero would enable re-processing of the “lost” record(s). You could also set the Taken_Flag to the Oracle process SID, which could help diagnosis if/when problems occur.

Discuss This Question: 3  Replies

There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.
  • carlosdl
    "It is desirable to process them in the order they were inserted into the table" Just a couple of comments about the above text: -If you want to process the records in small groups, in parallel, then the above might not be possible. For example, if you had 10 records to be processed in two groups, one from record 1 to 5 and the other from record 6 to 10, then records 1 and 6 would be the first to be processed. -The order 'in which they were inserted into the table' is not really an order from the database point of view. The only way to ensure that the records will be queried in some order is specifying that order in the query (i.e. ORDER BY <some_columns>). On the other hand, I would not use the 'SKIP LOCKED' feature, because it is undocumented, and it could change or disappear in future releases, and moreover it doesn't always work as one could expect. And about the solution.... ... ... I'm still trying to figure it out. :-)
    85,865 pointsBadges:
  • Wileecanislatrans
    First I want to say thanks for your input. 1) I said it is desirable to process them in order because I realize if I select in groups there will be the issue where record 1 is processed first and 6 goes next. It should take less than a second to process each record. One reason for many processes and small groups. If I run 40 processes and each one selects 10 records I should be done in 10 seconds. If record 391 runs in the same second as record 1 and records 10 and 400 both run 10 seconds later that isn't a problem. However, if record 1 is processed 15 minutes after it is inserted into the database that is a problem. I could have picked a better choice of words. 2) I realize 'skip locked' is undocumented or at least not blessed and could go away. Mostly put it here as something I tried and to help explain what I'm trying to do. 3) Each record has an insertion time stamp which I had plan to use to order the records. 4) Record processing is done in java. 5) I too am trying to digest the answer 6) Is this not a situation encountered by others? I was hoping someone would say yeah "select .........." quick and easily done. Guess I'm gonna have to work for a living.
    15 pointsBadges:
  • carlosdl
    I think the approach suggested by Kccrosser could work. Haven't you taken a look at it ?
    85,865 pointsBadges:

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

Thanks! We'll email you when relevant content is added and updated.


Share this item with your network: