Challenge improving query performance on large tables

25 pts.
Tags:
Query optimization
Query performance
SQL query optimization
SQL Server 2008
SQL Server 2008 R2
Hi,

Here's the structure of the db I'm using: * Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred). * Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered).

Here's what I want to do: I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product).

Here's the challenge for the query at hand: There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY. I need to get the latest product data (in case there is more than one season) and the total number of seasons.

Here's the query I've designed: SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as Permanent FROM gin_Mouvements M LEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_ID LEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_ID LEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_ID LEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_ID LEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID) OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison, Rank() over (order by T.RefREFART ASC) as Pos FROM frs_ArticlesBDFournisseurTaille T LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS) ORDER BY RefREFART DESC) B

Query explained: All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here. I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by.

Performances: This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon.

Advice ?: If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know.

Stats: Execution plan sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered)[sp2k_staging].[dbo].[frs_ArticlesBDFournisseurTaille].
Thank you.



Software/Hardware used:
SQL server 2008 R2

Answer Wiki

Thanks. We'll let you know when a new response is added.

Start by getting rid in the index scan by adding a non-clustered index. If you can post the sqlplan file somewhere I can take a look at it.

Discuss This Question: 3  Replies

 
There was an error processing your information. Please try again later.
Thanks. We'll let you know when a new response is added.
Send me notifications when members answer or reply to this question.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy
  • carlosdl
    You might want to re-post your query using the editor's code tool in order to make it more readable.
    68,495 pointsBadges:
    report
  • Sport2000
    Hi, Here's the structure of the db I'm using: * Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred). * Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered). Here's what I want to do: I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product). Here's the challenge for the query at hand: There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY. I need to get the latest product data (in case there is more than one season) and the total number of seasons. Here's the query I've designed: SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as Permanent FROM gin_Mouvements M LEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_ID LEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_ID LEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_ID LEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_ID LEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID) OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison, Rank() over (order by T.RefREFART ASC) as Pos FROM frs_ArticlesBDFournisseurTaille T LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS) ORDER BY RefREFART DESC) B Query explained: All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here. I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by. Performances: This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon. Advice ?: If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know. Stats: Execution plan sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered)[sp2k_staging].[dbo].[frs_ArticlesBDFournisseurTaille]. Thank you.
    25 pointsBadges:
    report
  • Sport2000
    Hi, I've split into 2 indexes codeean INCLUDE (refrefart,taille) and codeeanfrs INCLUDE (refrefart,taille). It's now working wonders. Thanks
    25 pointsBadges:
    report

Forgot Password

No problem! Submit your e-mail address below. We'll send you an e-mail containing your password.

Your password has been sent to:

To follow this tag...

There was an error processing your information. Please try again later.

REGISTER or login:

Forgot Password?
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy

Thanks! We'll email you when relevant content is added and updated.

Following