2,555 pts.
 SQL EXCEPTION JOIN
I have three files (orderinfo, deliveryinfo and itemlines). The three files entries shall exist as 1 to 1 to many.

The files contains orders from 10+ years. Thus data has been migrated several times by various people.

I have discovered that the one-to-one-to-many rule has been broken for many orders. I wrote a pgm to check the file's ordernumbers against the other files to find the ordernumbers with missing records. It turned (as expected) out to be a very time-consuming task.

I challenged myself to write an sql to do the job, assuming that it will perform better, but... Alas, I must admit that I have not yet been able to write the sql. Can anybody help ?



Software/Hardware used:
IBM Power i, V6R1
ASKED: May 2, 2010  1:01 AM
UPDATED: May 4, 2010  6:21 PM

Answer Wiki:
A key bit of information is missing (no pun intended) - how are the tables related? Does each table have a single column "ordernumber"? Assuming that all the files are linked by an "ordernumber" in each table, then there are the following types of data relationship problems which could exist: 1. OrderInfo with no DeliveryInfo associated select orderid from orderinfo oi where not exists (select 1 from deliveryinfo di where di.ordernumber = oi.ordernumber); 2. OrderInfo with no ItemLines associated select orderid from orderinfo oi where not exists (select 1 from itemline il where il.ordernumber = oi.ordernumber); 3. DeliveryInfo with no ItemLines associated select deliveryid from deliveryinfo di where not exists (select 1 from itemlines il where il.ordernumber = di.ordernumber); 4. DeliveryInfo with an OrderNumber which doesn't match any OrderInfo record select deliveryid from deliveryinfo di where not exists (select 1 from orderinfo oi where oi.ordernumber = di.ordernumber); 5. ItemLines with an OrderNumber which doesn't match any OrderInfo record select itemid from itemlines il where not exists (select 1 from orderinfo oi where oi.ordernumber = il.ordernumber) There can also be itemlines records with ordernumbers that don't match deliveryinfo records, but that set would be a subset of the records already found above. Doing one giant query that finds all the broken links would be a challenge, and probably cause lots of table scans. If you have some nice unique indexes on the tables, like: create unique index oin on OrderInfo(ordernumber, orderid); create unique index din on DeliveryInfo(ordernumber, deliveryid); create unique index iln on ItemLines(ordernumber, itemid); then the queries will be relatively efficient. (This assumes you have some column, or small set of columns that is a primary key that can be used for the "orderid", "deliveryid", and "itemid" columns, respectively.)
Last Wiki Answer Submitted:  May 4, 2010  6:21 pm  by  Kccrosser   3,830 pts.
All Answer Wiki Contributors:  Kccrosser   3,830 pts.
To see all answers submitted to the Answer Wiki: View Answer History.


Discuss This Question:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


 

How about something like this ?

SELECT *
FROM orderinfo oi
WHERE NOT EXISTS 
	(SELECT 1 
	FROM deliveryinfo
	WHERE order_id = oi.order_id)
OR NOT EXISTS
	(SELECT 1 
	FROM itemlines
	WHERE order_id = oi.order_id)

This kind of query works fast on indexed columns, but I don’t know if it will be usable on your platform.

 63,580 pts.