Question

  Asked: Oct 20 2004   8:40 AM GMT
  Asked by: apothen


DB2/400 SQL Sub Query Performance?


Database Management Systems, DB2, Data warehousing/Business intelligence, UDB for iSeries/i5, DB2 universal database

Here is the problem. Table INVEHH is a (Events header table) with about 500 records and INVEHD (Event Details) with about 22 million records.
These tables are on a DB2/400 database.

1) Original Query that takes forever:

select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
from INVEHD
where EVENT in (Select EVENT
from INVEHH
where (EVHYY*10000 + EVHMM*100 + EVHDD)=
20040617)


2) If I run the subquery first then I get the following.

Select EVENT
from INVEHH
where (EVHYY*10000 + EVHMM*100 + EVHDD)=
20040617


Event Number
19
967
970
972
986
990
991
993
994
998
******** End of data ********

3) And then if I re-run the first query with the results from the subquery like this, I get instentaneous results.

select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
from INVEHD
where EVENT in ( 19,
967,
970,
972,
986,
990,
991,
993,
994,
998)


As far as I know query one should perform just as well as query 3. But one takes forever and 3 has instentaneous response.

Let me know, if you can tell me what is it that I am doing wrong. I had discussed this with my colleagues and we came up with some alternatives that executes pretty fast. But theoretically query one should run just as fast.

Ajish

Subscribe to Alerts! Get questions and answers delivered to your Inbox.


E-mail me updates on this question



   SUBSCRIBE

hidden modal window

Answer Wiki (Improve, edit or add to this answer)


 RATE THIS ANSWER
0
Click to Vote:
  •   0
  •  0



Tip:

The mentioned sub part:
(EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617

Can be set up in a more optimised way via:
EVHYY = 2004 and EVHMM = 06 and EVHDD = 17

In case the first condition is not true, rest of condition(s) is/are not tested anymore.....

Another option to optimise is to introduce a logical on the fields EVHYY, EVHMM and EVHDD for table INVEHH
  • AddThis Social Bookmark Button

Browse more Questions and Answers on Database, DataManagement and AS/400.

Looking for relevant Database Whitepapers? Visit the SearchOracle.com Research Library.


Discuss This Answer


You must be logged-in to discuss a question. Log-in/Register

apothen  |   Oct 21 2004  10:21AM GMT

The problem is not the subquery, that’s running over a table that has only about 1000 records. My question really is about how DB2/400 optimizes the query.

This is how I think the query optimizer should do:

Step 1) Run the subquery over the table with 500 records - should run really fast.
Step 2) Now run the main query using the result set - The table is indeed indexed by the field (element) in the where clouse.

What seems to be happenning is:

DB2/400 runs the subquery for each one of the records in the big table with 22 million records.

I know in SQL Server and Oracle, we can look at the query execution path and make changes to it, if RDBMS did not pick the best route.

So my question really is, is there any such facility in DB2/400? or Does DB2/400 expect the queries to be structured in some other fashion that is specific to the platform?

 

BigKat  |   Oct 21 2004  11:59AM GMT

SQL on the iSeries seems to have a couple of quirks from other platforms. It seems that in a statement using “where EVENT in (Select EVENT from INVEHH…” syntax, the “in” select is run for EVERY record in the outer statement.

Try using an inner join syntax. I’ve found it to help most of the time!

select EVENT, INUMBR, ISTORE, EVHYY, EVHMM, EVHDD
from INVEHD as a inner join (Select distinct EVENT from INVEHH where (EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617) as b on a.EVENT = b.event

even better is to use EVHYY = 2004 and EVHMM = 06 and EVHDD = 17 instead of (EVHYY*10000 + EVHMM*100 + EVHDD)= 20040617 because SQL can make use of a logical over the date fields if they exist

if you can say that there can NEVER be duplicate EVENTs selected from the INVEHH file, you can eliminate the distinct from the statement. Not knowing the data relationships for sure I added the distinct just in case.