Here's the structure of the db I'm using: * Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred). * Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered).
Here's what I want to do: I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product).
Here's the challenge for the query at hand: There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY. I need to get the latest product data (in case there is more than one season) and the total number of seasons.
Here's the query I've designed: SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as Permanent FROM gin_Mouvements M LEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_ID LEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_ID LEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_ID LEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_ID LEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID) OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison, Rank() over (order by T.RefREFART ASC) as Pos FROM frs_ArticlesBDFournisseurTaille T LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS) ORDER BY RefREFART DESC) B
Query explained: All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here. I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by.
Performances: This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon.
Advice ?: If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know.
Stats: Execution plan sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered)[sp2k_staging].[dbo].[frs_ArticlesBDFournisseurTaille].