Tip:
The mentioned sub part:
(EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617
Can be set up in a more optimised way via:
EVHYY = 2004 and EVHMM = 06 and EVHDD = 17
In case the first condition is not true, rest of condition(s) is/are not tested anymore.....
Another option to optimise is to introduce a logical on the fields EVHYY, EVHMM and EVHDD for table INVEHH
=======================================================================
The problem is the subquery. It must be run for every row in the table since the rows in INVEHH can change at any time. Any change to INVEHH by any other user or job in the system could change the result set. Are the tables journaled? Do jobs on the system run under commitment control?
If you want the subquery to return a static result, you should be able to use a Common Table Expression:
<pre>
WITH HH as (Select EVENT
from INVEHH
where (EVHYY*10000 + EVHMM*100 + EVHDD)=
20040617 )
select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
from INVEHD
where EVENT in (select EVENT from HH)
</pre>
The CTE should produce a result set that the full-select can rely upon.
Of course, the date calculation should also go. There seems to be no point in executing the multiplication and summation of date values when they could be directly compared.
Tom
----------------------------- kccrosser
The subquery is indeed the problem. Generally, when you do a computation in the Where clause, you prevent the query optimizer from using available indexes. Direct comparisons with the individual fields is MUCH more efficient.
However, you also need to make sure you have an effective index on the table.
In this case, the optimized index would be:
create unique index ui_myindex on INVEHH ( EVHDD, EVHMM, EVHYY, EVENT );
Then, when you run your subquery, the lookup in the index will be very efficient. This kind of "covering" index allows the query engine to completely avoid doing a fetch against the real table - when it finds an index record with the corresponding YY/MM/DD values, it has the EVENT data in the index itself.
Why do I suggest the index column order be DD, then MM, then YY, then EVENT?
Depending on the database engine and version, the order of the columns in the index can be quite important. In this case, the DD values will range from 01 to 31, the MM values from 01 to 12, and the YY values probably from 2000 to 2009. Thus, the DD values are the most "discriminant" (the widest range of values) and the YY values the least "discriminant". Ordering the columns from most to least discriminant can significantly boost performance, as the query engine is dividing up the available values "quicker" by using the more discriminant values first.
I would bet that recoding the subquery and adding the above index will solve your performance problems.
The problem is not the subquery, that’s running over a table that has only about 1000 records. My question really is about how DB2/400 optimizes the query.
This is how I think the query optimizer should do:
Step 1) Run the subquery over the table with 500 records – should run really fast.
Step 2) Now run the main query using the result set – The table is indeed indexed by the field (element) in the where clouse.
What seems to be happenning is:
DB2/400 runs the subquery for each one of the records in the big table with 22 million records.
I know in SQL Server and Oracle, we can look at the query execution path and make changes to it, if RDBMS did not pick the best route.
So my question really is, is there any such facility in DB2/400? or Does DB2/400 expect the queries to be structured in some other fashion that is specific to the platform?
SQL on the iSeries seems to have a couple of quirks from other platforms. It seems that in a statement using “where EVENT in (Select EVENT from INVEHH…” syntax, the “in” select is run for EVERY record in the outer statement.
Try using an inner join syntax. I’ve found it to help most of the time!
select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
from INVEHD as a inner join (Select distinct EVENT from INVEHH where (EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617) as b on a.EVENT = b.event
even better is to use EVHYY = 2004 and EVHMM = 06 and EVHDD = 17 instead of (EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617 because SQL can make use of a logical over the date fields if they exist
if you can say that there can NEVER be duplicate EVENTs selected from the INVEHH file, you can eliminate the distinct from the statement. Not knowing the data relationships for sure I added the distinct just in case.