Best practice of implementing sql "JOIN" returning policy in snowflake

Question

Say we have two tables performing a left join:

Table 1

Joint Key  || Attribute 1 || Attribute 2 || Attribute 3
   A             1               11           21        
   B             2               12           22
   C             3               13           23

Table 2

Joint Key  || Attribute 4 || Attribute 5 
   A             31               41      
   A             32               42      
   C             33               43

by performing a table 1 left join table 2 on "Joint Key" it will return two records having Joint Key = 'A'

Joint Key  || Attribute 1 || Attribute 2 || Attribute 3 || Attribute 4 || Attribute 5 
   A             1               11           21              31               41    
   A             1               11           21              32               42

What's the best practice of defining the return police, specifically in snowflake, that can return me the same row count as table 1.

Taking the above example, I want the the record has the MAX(Attribute 4). Two initial ideas come to my mind

Option 1: use "GROUP BY" clause -- need list columns explicitly, cumbersome when dealing with table has many columns.

Option 2: something like

select * from (
  select 
    Tabel1.*
    max(Table2.Attribute_4) as mx_Attribute_4,
    Table2.Attribute_5
 from Table1
 left join Table2
 on Joint_Key
) as temp
where temp.Attribute_4 = temp.mx_Attribute_4

it's quite complicated and time-consuming too.

Any other suggestions?

David Garrison · Accepted Answer · 2021-09-28 14:37:42Z

2

you could use QUALIFY

Something like:

select
    t1.Joint_key, t1.Attribute_1, t1.Attribute_2, t1.Attribute_3, t2.Attribute_4, t2.Attribute_5
from Table1 t1
left join Table2 t2
    on t1.Joint_key = t2.Joint_key
qualify row_number() over(partition by Joint_Key order by Attribute_4 desc) = 1

This is certainly more clean, and should be more efficient than a group by. It does still require the query to sort records by Attribute_4, but I don't see a way of avoiding that unless you are ok with using any of the sets of values instead of the one with MAX(Attribute_4). In that case you could be more efficient by using order by 1 in the row_number() window function.

edited Sep 28, 2021 at 14:37

answered Sep 28, 2021 at 14:25

David Garrison

2,8101 gold badge18 silver badges26 bronze badges

Thank you! I tried something like this with correlated subquery but snowflake doesn't support it. This is the answer I am after!
– QPeiran
Commented Sep 28, 2021 at 20:43

Add a comment |

NickW · Accepted Answer · 2021-09-28 07:22:14Z

0

You seem to have some confused ideas about how joins work. If you have Table1 left join Table2 then it will return all the records from Table1 with any data from matching records in Table2 - so in your case you would normally get the 3 records from Table1.

However, in your case you have 2 records in table2 that matches 1 record in table 1 so this will duplicate your results and you will get 4 records: 2 with key A and then 1 with B and 1 with C.

Anyway, given the example data you’ve provided, please update your question with the result you want to achieve so that someone can help you

answered Sep 28, 2021 at 7:22

NickW

9,5822 gold badges8 silver badges22 bronze badges

I interpret the question as having a pretty good understanding of left join. The results in the question are only including the problem rows, which is the "duplicated" row with key 'A'. And the requested output is to have those two output rows consolidated into one row.
– David Garrison
Commented Sep 28, 2021 at 14:30

Add a comment |

Collectives™ on Stack Overflow

Best practice of implementing sql "JOIN" returning policy in snowflake

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
sql
t-sql
snowflake-cloud-data-platform
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Not the answer you're looking for? Browse other questions tagged sqlt-sqlsnowflake-cloud-data-platform or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
sql
t-sql
snowflake-cloud-data-platform
or ask your own question.