The mentioned sub part:
(EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617
Can be set up in a more optimised way via:
EVHYY = 2004 and EVHMM = 06 and EVHDD = 17
In case the first condition is not true, rest of condition(s) is/are not tested anymore…..
Another option to optimise is to introduce a logical on the fields EVHYY, EVHMM and EVHDD for table INVEHH
The problem is the subquery. It must be run for every row in the table since the rows in INVEHH can change at any time. Any change to INVEHH by any other user or job in the system could change the result set. Are the tables journaled? Do jobs on the system run under commitment control?
If you want the subquery to return a static result, you should be able to use a Common Table Expression:
WITH HH as (Select EVENT
where (EVHYY*10000 + EVHMM*100 + EVHDD)=
select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
where EVENT in (select EVENT from HH)
The CTE should produce a result set that the full-select can rely upon.
Of course, the date calculation should also go. There seems to be no point in executing the multiplication and summation of date values when they could be directly compared.
The subquery is indeed the problem. Generally, when you do a computation in the Where clause, you prevent the query optimizer from using available indexes. Direct comparisons with the individual fields is MUCH more efficient.
However, you also need to make sure you have an effective index on the table.
In this case, the optimized index would be:
create unique index ui_myindex on INVEHH ( EVHDD, EVHMM, EVHYY, EVENT );
Then, when you run your subquery, the lookup in the index will be very efficient. This kind of “covering” index allows the query engine to completely avoid doing a fetch against the real table – when it finds an index record with the corresponding YY/MM/DD values, it has the EVENT data in the index itself.
Why do I suggest the index column order be DD, then MM, then YY, then EVENT?
Depending on the database engine and version, the order of the columns in the index can be quite important. In this case, the DD values will range from 01 to 31, the MM values from 01 to 12, and the YY values probably from 2000 to 2009. Thus, the DD values are the most “discriminant” (the widest range of values) and the YY values the least “discriminant”. Ordering the columns from most to least discriminant can significantly boost performance, as the query engine is dividing up the available values “quicker” by using the more discriminant values first.
I would bet that recoding the subquery and adding the above index will solve your performance problems.