The SQL standard includes an interesting feature: you can project any functional dependency of a primary (or unique) key that is listed in the GROUP BY clause without having to add that functional dependency to the GROUP BY clause explicitly.
What does this mean? Consider this simple schema:
CREATE TABLE author (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL
);
CREATE TABLE book (
  id INT NOT NULL PRIMARY KEY,
  author_id INT NOT NULL REFERENCES author,
  title TEXT NOT NULL
);
In order to count the number of books by author, we tend to write:
SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY
  a.id,  -- Required, because names aren't unique
  a.name -- Required in some dialects, but not in others
We have to group by something unique in this case, because if two authors are called John Doe, we still want them to produce separate groups. So GROUP BY a.id is a given.
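To see why the unique column matters, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The sample rows (two distinct authors who are both called John Doe) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
      id INT NOT NULL PRIMARY KEY,
      name TEXT NOT NULL
    );
    CREATE TABLE book (
      id INT NOT NULL PRIMARY KEY,
      author_id INT NOT NULL REFERENCES author,
      title TEXT NOT NULL
    );
    -- Hypothetical sample data: two distinct authors share a name
    INSERT INTO author VALUES (1, 'John Doe'), (2, 'John Doe');
    INSERT INTO book VALUES (1, 1, 'Book A'), (2, 1, 'Book B'), (3, 2, 'Book C');
""")

# Grouping by name only merges both authors into a single group
by_name = conn.execute("""
    SELECT a.name, count(b.id)
    FROM author a
    LEFT JOIN book b ON a.id = b.author_id
    GROUP BY a.name
""").fetchall()

# Grouping by the primary key keeps the two authors apart
by_id = conn.execute("""
    SELECT a.name, count(b.id)
    FROM author a
    LEFT JOIN book b ON a.id = b.author_id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

print(by_name)
print(by_id)
```

Grouping by the name alone reports a single John Doe with three books, while grouping by the key reports two authors with two books and one book respectively.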
We’re used to also GROUP BY a.name, especially in those dialects that require this, since we list a.name in the SELECT clause:
- Db2
- Derby
- Exasol
- Firebird
- HANA
- Informix
- Oracle
- SQL Server
But is it really required? It isn’t as per the SQL standard, because there is a functional dependency between author.id and author.name. In other words, for each value of author.id, there is exactly one possible value of author.name, or author.name is a function of author.id.
This means that it doesn’t matter if we GROUP BY both columns, or only the primary key. The result must be the same in both cases, hence this is possible:
SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a.id
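The equivalence of the two forms can be checked with a small sketch, again using Python's sqlite3 module with made-up sample rows. One caveat: SQLite accepts ungrouped columns in general rather than verifying the functional dependency, so this only illustrates that both queries return the same result, not that the dialect validates the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
      id INT NOT NULL PRIMARY KEY,
      name TEXT NOT NULL
    );
    CREATE TABLE book (
      id INT NOT NULL PRIMARY KEY,
      author_id INT NOT NULL REFERENCES author,
      title TEXT NOT NULL
    );
    -- Hypothetical sample data, including an author without books
    INSERT INTO author VALUES (1, 'John Doe'), (2, 'John Doe'), (3, 'Jane Roe');
    INSERT INTO book VALUES (1, 1, 'Book A'), (2, 1, 'Book B'), (3, 2, 'Book C');
""")

QUERY = """
    SELECT a.name, count(b.id)
    FROM author a
    LEFT JOIN book b ON a.id = b.author_id
    GROUP BY {group_by}
    ORDER BY a.id
"""

# Same query, once with both columns and once with the primary key only
both_columns = conn.execute(QUERY.format(group_by="a.id, a.name")).fetchall()
key_only = conn.execute(QUERY.format(group_by="a.id")).fetchall()

print(both_columns)
print(key_only)
```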
Which SQL dialects support this?
At least the following SQL dialects support this language feature:
- CockroachDB
- H2
- HSQLDB
- MariaDB
- MySQL
- PostgreSQL
- SQLite
- Yugabyte
It’s noteworthy that MySQL used to simply ignore whether a column could be projected unambiguously or not, in the presence of GROUP BY. While the following query was rejected in most dialects, it was not rejected in MySQL, prior to the introduction of the ONLY_FULL_GROUP_BY mode:
SELECT author_id, title, count(*)
FROM book
GROUP BY author_id
What should we display for book.title, if an author has written more than one book? It doesn’t make sense, yet MySQL still used to allow it, and would just project any arbitrary value from the group.
Today, MySQL only allows for projecting columns with a functional dependency on the GROUP BY clause, as is allowed by the SQL standard.
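The ambiguity is easy to reproduce. SQLite happens to accept such queries as well, so the sketch below uses Python's sqlite3 module as a stand-in for the old MySQL behaviour. This is an analogy with made-up rows, not actual MySQL output, and the foreign key is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
      id INT NOT NULL PRIMARY KEY,
      author_id INT NOT NULL,
      title TEXT NOT NULL
    );
    -- Hypothetical sample data: one author, two books
    INSERT INTO book VALUES (1, 1, 'Book A'), (2, 1, 'Book B');
""")

# title is neither grouped nor functionally dependent on author_id,
# so the engine has to pick a value from some row of the group
rows = conn.execute("""
    SELECT author_id, title, count(*)
    FROM book
    GROUP BY author_id
""").fetchall()

author_id, title, book_count = rows[0]
print(author_id, title, book_count)
```

The group contains two books, but only one title can be shown. Which one appears is up to the engine, so the query result should not be relied upon.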
Pros & Cons
While the shorter syntax that avoids the extra columns might be easier to maintain (it is easy to project additional columns, if required), there is some risk of queries breaking in production, specifically when underlying constraints are disabled, e.g. for a migration. While it’s unlikely that a primary key is disabled in a live system, it could still be the case, and without the key, a previously valid query will no longer be valid for the same reason why MySQL’s old interpretation was invalid: There’s no longer a guarantee of functional dependency.
Alternative syntax
Starting from jOOQ 3.16, and #11834, it will be possible to reference tables directly in the GROUP BY clause, instead of individual columns. For example:
SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a
The semantics will be:
- If the table has a primary key (composite or not), use that in the GROUP BY clause, instead
- If the table doesn’t have a primary key, list all the columns from the table instead.
Since none of the RDBMS supported by jOOQ currently supports this syntax, it’s a purely synthetic jOOQ feature.
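The two rules above can be sketched in a few lines. This is not jOOQ's implementation, only an illustration of the expansion logic in Python, reading key metadata from SQLite's PRAGMA table_info. The helper group_by_columns and the tables book_to_store and audit_log are hypothetical names introduced for this example.

```python
import sqlite3


def group_by_columns(conn, table):
    """Hypothetical helper: expand a table reference into GROUP BY columns."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    # pk is 0 for non-key columns, else the 1-based position within the key.
    # The table name is interpolated directly, so it must be trusted input.
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    key = sorted((row for row in info if row[5] > 0), key=lambda row: row[5])
    if key:
        # Rule 1: the table has a (possibly composite) primary key
        return [row[1] for row in key]
    # Rule 2: no primary key, so fall back to all columns
    return [row[1] for row in info]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
      id INT NOT NULL PRIMARY KEY,
      name TEXT NOT NULL
    );
    -- Hypothetical tables, only here to exercise the other cases
    CREATE TABLE book_to_store (
      book_id INT NOT NULL,
      store_name TEXT NOT NULL,
      stock INT,
      PRIMARY KEY (book_id, store_name)
    );
    CREATE TABLE audit_log (
      message TEXT,
      created_at TEXT
    );
""")

print(group_by_columns(conn, "author"))
print(group_by_columns(conn, "book_to_store"))
print(group_by_columns(conn, "audit_log"))
```

For author this yields only the id column, for the composite key both key columns, and for the table without a key every column.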