Performance implications exist, but they are secondary.
Primary reason:
distinct on every select shows either a lack of knowledge of the schema, in particular which columns make rows unique, or an unfortunate schema design.
(Apart from niche cases, the schema should be somewhat normalized, i.e. the column parent_name belongs in the table parent, not in the table student.)
Select a from x where myuniquekey=1; -- guaranteed to return one or zero rows, if myuniquekey is actually unique.
Select a from x join y on x.parent_id = y.y_id -- guaranteed to return at most as many rows as exist in x, never more, and never duplicates x rows, because each x row matches at most one y row. (N-to-1 relation)
If distinct is used in any of the above, then the question “why?” naturally arises.
In more severe cases, it leads to bugs:
Select distinct student.student_name, parent.parent_name from student join parent on student.parent_id = parent.parent_id -- silently discards rows where, by accident, the student/parent name combo matches several times.
Technically SQL allows comparing unrelated columns (colour=last_name), but in the vast majority of cases, when joining, one side should be joined using its unique key and the other side using its foreign key, which ensures that duplicates don’t appear randomly, and thus distinct is not needed.
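To make the row-dropping case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema and the sample names are made up for illustration (assumptions, not taken from any real application); the join itself is the foreign-key-to-unique-key join described above.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT,
    parent_id    INTEGER REFERENCES parent(parent_id)
);
-- Two different parents who happen to share a name.
INSERT INTO parent VALUES (1, 'Ann Smith'), (2, 'Ann Smith');
-- Two different students who happen to share a name, plus a third.
INSERT INTO student VALUES
    (10, 'John Smith', 1), (11, 'John Smith', 2), (12, 'Mary Smith', 1);
""")

join = "FROM student JOIN parent ON student.parent_id = parent.parent_id"

# Joining foreign key to unique key: one output row per student, no duplicates.
plain_rows = conn.execute(
    f"SELECT student.student_name, parent.parent_name {join}"
).fetchall()

# DISTINCT on the name columns merges the two different John Smiths into one row.
distinct_rows = conn.execute(
    f"SELECT DISTINCT student.student_name, parent.parent_name {join}"
).fetchall()

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(student_count, len(plain_rows), len(distinct_rows))
```

With this sample data the plain join returns one row per student, while the distinct version returns one row fewer, with no error or warning. Whether that is a bug depends on whether you wanted students or name pairs.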
> If distinct is used in any of the above, then the question “why?” naturally arises.
Not if distinct is the default.
> Select distinct student.student_name, parent.parent_name from student join parent on student.parent_id = parent.parent_id -- silently discards rows where, by accident, the student/parent name combo matches several times.
Either with or without distinct can be a bug depending on what you are doing it for.
There are actually 4 variations on what you might want, and you can get all of them with distinct:
select distinct student.student_id, student.student_name, parent.parent_id, parent.parent_name from ...
select distinct student.student_name, parent.parent_id, parent.parent_name from ...
select distinct student.student_id, student.student_name, parent.parent_name from ...
select distinct student.student_name, parent.parent_name from ...
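As a rough illustration of how those four column lists behave, here is a small sketch with Python's built-in sqlite3 module. The tables and sample rows are hypothetical, and the schema assumes each student has a single parent_id foreign key, as in the earlier example in this thread.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT,
    parent_id    INTEGER REFERENCES parent(parent_id)
);
INSERT INTO parent VALUES (1, 'Ann'), (2, 'Ann');
INSERT INTO student VALUES
    (10, 'John', 1), (11, 'John', 1), (12, 'John', 2), (13, 'Mary', 2);
""")

join = "FROM student JOIN parent ON student.parent_id = parent.parent_id"

# The four column lists from the comment above, in the same order.
variants = {
    "student_id, parent_id": "student.student_id, student.student_name, "
                             "parent.parent_id, parent.parent_name",
    "parent_id only": "student.student_name, parent.parent_id, parent.parent_name",
    "student_id only": "student.student_id, student.student_name, parent.parent_name",
    "names only": "student.student_name, parent.parent_name",
}

# Count the rows each SELECT DISTINCT returns.
row_counts = {
    label: len(conn.execute(f"SELECT DISTINCT {cols} {join}").fetchall())
    for label, cols in variants.items()
}
print(row_counts)
```

In this particular schema student_id is already unique per joined row, so the two variants that include it return every row and distinct changes nothing. Only the variants without student_id collapse rows, and they collapse different amounts depending on whether parent_id is selected.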
Our main application at work is essentially a CRUD application, and I've worked on it for over 10 years now. I'm fairly confident I can count on one hand the number of cases where a join returned unexpected duplicates which DISTINCT would "fix".
Sometimes I wonder if we're just weird, somehow avoiding this issue.
These examples reminded me of one more issue: a change in column selection might change the number of rows, which means column addition/removal is so much riskier, afair.
> Not if distinct is the default.
If that works for you, great, but let’s agree to disagree here.
Your mental model, if you will forgive the straw man, is that SELECT over multiple tables is conceptually equivalent to nested for-loops over each table, and the WHERE condition is an if-statement.
My mental model is that I'm working with sets. If yesterday I asked for the set of CITY,COUNTRY, and today I've changed that to the set of COUNTRY, then obviously the result set today is going to be much smaller. This is not a risk to me -- asking for a different set gives me a different set, I can't imagine being surprised by that.
A great idea in principle, but having a local server doesn't necessarily mean said server will continue functioning once severed from the rest of the internet. Things like DNS and timing signals are an issue. Until it is tested, and retested after every update, I wouldn't trust any financial server to keep ticking along once so disconnected.
> you could get time signals from GPS (or GLONASS)
You forgot to mention Galileo ... GLONASS may not be politically attractive :)
But beyond the GNSS ecosystem, there are of course other interesting options:
Safran STL[1] which uses LEO sats. Apparently it works well in places where you can't get a good GNSS signal (i.e. indoors without an external antenna). (This was previously Orolia STL, but they were acquired by Safran).
Most national time labs also offer a leased-line service, e.g. NPL (UK)[2]
There is also a very niche (read: VERY expensive) commercial timing-as-a-service product from a company called Hoptroff[3]. You license both the service/connection plus their software clients, so the $$$$$$ add up. Definitely one of those "if you have to ask you can't afford it" vendors.
Sure, it's just a matter of what the servers are configured to do ahead of time and what happens when assumptions fail: are the servers tested against a situation where they can't get the time as expected, etc.?
What happens if/when your GNSS constellation operator intentionally degrades quality (e.g. GPS and selective availability)? Do you fall back to a local atomic clock, assuming one is available?
The problem is that despots want power; whatever system you have, you will get despots wanting to be at the top and doing what is necessary to get there. Democracy at least moves them around a bit.
One interesting fact in the article is that for the top spots at least, you did it once for a year and then you were barred from that office for like 10.
I’ve read something like: there are a bunch of observations of extremely dense masses in various surroundings. Talented people can find explanations other than black holes in each case.
But that feels ad hoc, and a black hole fits quite nicely for the whole class of observations. The current consensus is that black holes exist.
My read from looking at the literature is that the feeling is more like "black holes are a mathematical theory that has not been falsified by observational evidence, and for which we have some compelling, but inconclusive data, reconstructed from intensely noisy and poor resolution sources. Seems likely, but more data needed."
If you're sufficiently cynical, you can describe almost all astronomy results that way. Lots of alternate explanations for the observations have been proposed, and fail to match the full evidence.
Just how much evidence are you going to demand of phenomena hundreds of lightyears away before you say, "yeah, that's probably what it is"? Anyway, do be sure to read the whole list of observations linked above by csours, as there are quite a few.