> If distinct is used in any of the above, then the question “why?” naturally arises.
Not if distinct is the default.
> Select distinct student.student_name, parent.parent_name from student join parent on student.parent_id = parent.parent_id -- silently discards rows where, by accident, the same student/parent name combo occurs several times.
Either with or without distinct can be a bug depending on what you are doing it for.
There are actually 4 variations on what you might want, and you can get all of them with distinct:
select distinct student.student_id, student.student_name, parent.parent_id, parent.parent_name from ...
select distinct student.student_name, parent.parent_id, parent.parent_name from ...
select distinct student.student_id, student.student_name, parent.parent_name from ...
select distinct student.student_name, parent.parent_name from ...
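To make the difference between the four concrete, here is a minimal sketch using Python's built-in sqlite3. The table layout and the sample rows are my own assumptions (three students all named Sam, two parents both named Alex); only the column lists and the join condition come from the queries above.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT,
        parent_id    INTEGER REFERENCES parent(parent_id)
    );
    INSERT INTO parent VALUES (1, 'Alex'), (2, 'Alex');
    INSERT INTO student VALUES (10, 'Sam', 1), (11, 'Sam', 1), (12, 'Sam', 2);
""")

JOIN = "FROM student JOIN parent ON student.parent_id = parent.parent_id"

# The four column lists from the comment above, in the same order.
column_lists = [
    "student.student_id, student.student_name, parent.parent_id, parent.parent_name",
    "student.student_name, parent.parent_id, parent.parent_name",
    "student.student_id, student.student_name, parent.parent_name",
    "student.student_name, parent.parent_name",
]

# Count the rows each SELECT DISTINCT variation returns.
row_counts = [
    len(conn.execute(f"SELECT DISTINCT {cols} {JOIN}").fetchall())
    for cols in column_lists
]
print(row_counts)  # [3, 2, 3, 1]
```

With this particular schema each student has exactly one parent, so the first and third variations always return the same number of rows. The second and fourth are the ones that collapse, and whether that is a bug or the intended answer depends on the question being asked.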
Our main application at work is essentially a CRUD application, and I've worked on it for over 10 years now. I'm fairly confident I can count on one hand the number of cases where a join returned unexpected duplicates which DISTINCT would "fix".
Sometimes I wonder if we're just weird, somehow avoiding this issue.
These examples remind me of one more issue: with distinct, a change in the column selection might change the number of rows,
which makes adding or removing a column so much riskier, afair.
> Not if distinct is the default.
If that works for you, great, but let’s agree to disagree here.
Your mental model, if you will forgive the straw man, is that SELECT over multiple tables is conceptually equivalent to nested for-loops over each table, and the WHERE condition is an if-statement.
My mental model is that I'm working with sets. If yesterday I asked for the set of CITY,COUNTRY, and today I've changed that to the set of COUNTRY, then obviously the result set today is going to be much smaller. This is not a risk to me -- asking for a different set gives me a different set, I can't imagine being surprised by that.
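A small sketch of the two mental models side by side, again with sqlite3. The `address` table and its rows are hypothetical, made up purely to have something to query.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE address (city TEXT, country TEXT);
    INSERT INTO address VALUES
        ('Paris',  'France'),
        ('Paris',  'France'),
        ('Lyon',   'France'),
        ('Berlin', 'Germany');
""")

def count_rows(sql):
    """Return how many rows the given query produces."""
    return len(db.execute(sql).fetchall())

# Set model: asking for a different set gives a different set.
city_country_set = count_rows("SELECT DISTINCT city, country FROM address")
country_set = count_rows("SELECT DISTINCT country FROM address")

# Loop model: one output row per input row, whatever columns are selected.
city_country_rows = count_rows("SELECT city, country FROM address")
country_rows = count_rows("SELECT country FROM address")

print(city_country_set, country_set)    # 3 2
print(city_country_rows, country_rows)  # 4 4
```

Without distinct, the row count tracks the rows that were scanned, so dropping a column never changes it. With distinct, the row count tracks the set that was asked for, which is exactly the behavior one side of this thread finds natural and the other finds risky.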