So, @rotnroll666 nerd sniped me once more. Apparently, the Neo4j Cypher question language helps arbitrary reductions, identical to any useful assortment API, oh say, the JDK Stream API:
Stream.of(2, 4, 3, 1, 6, 5)
.cut back((i, j) -> i * j)
.ifPresent(System.out::println); // Prints 720
SQL doesn’t have this, but it could be very helpful to have the ability to often do this. An arbitrary discount might be carried out “simply” in SQL. Let’s have a look at the above multiplication discount. In PostgreSQL, you’d write it like this:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
(
with recursive
u(i, o) as (
choose i, o
from unnest(array_agg(t.i)) with ordinality as u(i, o)
),
r(i, o) as (
choose u.i, u.o from u the place o = 1
union all
choose r.i * u.i, u.o from u be a part of r on u.o = r.o + 1
-- ^^^^^^^^^ discount
)
choose i from r
order by o desc
restrict 1
)
from t;
Woah. That’s a little bit of a syntactic beast. Let’s decompose it.
The mixture perform
First off, if we had been summing the values, we’d use the built-in SUM
perform, like this:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose sum(i)
from t;
That might produce 21. In case you’re keen to lose precision, you possibly can emulate PRODUCT()
utilizing logarithms. However we wrote REDUCE()
, a hypothetical one, like this:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose cut back(
t1.i * t2.i referencing gathered as t1, accumulating as t2
)
from t;
That is SQL, so the lambda expression would clearly use a ton of key phrases, utterly novel and distinctive to this explicit perform, and also you’d want jOOQ to make it composable 😁. Primarily, we’d have some kind of discount expression based mostly on two pseudo tables:
- The gathered desk containing the end result
- The accumulating desk (or fairly row)
A discount is a generic mixture perform that operates on teams. So, we should re-use some SQL mixture perform mechanism to realize the specified behaviour.
Utilizing ARRAY_AGG() to get the aggregation impact
First off, let’s do some aggregation. PostgreSQL’s ARRAY_AGG()
is ideal for this job, as a result of it
- Aggregates
- But kinda leaves the information untouched, not like e.g.
SUM()
In a means, it’s a set like Stream.acquire()
, not a discount.
If we use ARRAY_AGG()
in a correlated subquery, we’ll nonetheless get the aggregation impact, however we will unnest the array once more to a desk, with a view to function on it. You’ll be able to see this within the following instance:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
(
choose string_agg(i::textual content, ', ')
from unnest(array_agg(t.i)) as u(i)
)
from t;
This yields:
2, 4, 3, 1, 6, 5
Not a really helpful factor to do, mixture, unnest, and mixture once more, however it exhibits the facility of nesting an mixture perform in a correlated subquery’s FROM
clause. In case your RDBMS doesn’t have arrays, perhaps you are able to do the identical factor utilizing JSON_ARRAYAGG
and JSON_TABLE
, or XMLAGG
and XMLTABLE
.
Disclaimer: PostgreSQL usually Does The Proper Factor™. I believe you’d be extra onerous pressed to juggle with SQL syntax as elegantly in most different RDBMS, so this strategy isn’t transportable. However as Lætitia Avrot so elegantly put it:
Subsequent step, generate row numbers
There are primarily 2 methods how we will generate row numbers in our instance:
Adapting our earlier instance for some visualisation:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
(
choose string_agg(row(i, o)::textual content, ', ')
from unnest(array_agg(t.i)) with ordinality as u(i, o)
)
from t;
(Superior, that row constructor!)
This produces:
(2,1), (4,2), (3,3), (1,4), (6,5), (5,6)
Doesn’t look fancy, however think about we group by even numbers:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
i % 2,
(
choose string_agg(row(i, o)::textual content, ', ')
from unnest(array_agg(t.i)) with ordinality as u(i, o)
)
from t
group by i % 2;
The result’s now:
i % 2 | string_agg |
---|---|
0 | (2,1), (4,2), (6,3) |
1 | (3,1), (1,2), (5,3) |
It’s a bit bizarre, proper? We GROUP BY
within the outer question, and your complete correlated subquery is the combination perform based mostly on the truth that its FROM
clause incorporates ARRAY_AGG()
. This isn’t a lot totally different from this question:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose 1 + sum(i) + 2
from t;
We’re used to constructing scalar expressions from mixture features on a regular basis. That is nothing fancy. We are able to simply additionally simply wrap the perform in one other subquery:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose (choose 1 + sum(i) + 2)
from t;
From right here, it’s not far fetched to increase the aggregate-function-in-scalar-subquery strategy to the FROM
clause, after which unnesting the aggregation once more. This may occasionally not “click on” instantly. The GROUP BY
clause in SQL is a bit bizarre, syntactically.
Comment: Regrettably, PostgreSQL doesn’t enable utilizing mixture features within the
FROM
clause on the identical question stage like in a correlated subquery. I used to be going to indicate a elaborateLATERAL
model, however this doesn’t work (but).
Now, recurse
The ultimate bit is the recursion with the r
desk:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
(
with recursive
u(i, o) as (
choose i, o
from unnest(array_agg(t.i)) with ordinality as u(i, o)
),
r(i, o) as (
choose u.i, u.o from u the place o = 1
union all
choose r.i * u.i, u.o from u be a part of r on u.o = r.o + 1
-- ^^^^^^^^^ discount
)
choose i from r
order by o desc
restrict 1
)
from t;
We merely recurse on the ordinality. The primary subquery of UNION ALL
produces the primary row of our information, particularly (1, 1)
. The subsequent iterations simply at all times multiply the results of r.i
by the worth of u.i
from the following row by ordinality. That is in all probability finest proven visually:
r.i | r.o | u.i |
2 = u.i (first iteration) | 1 | 2 |
8 = prev r.i * u.i | 2 | 4 |
24 = prev r.i * u.i | 3 | 3 |
24 = prev r.i * u.i | 4 | 1 |
144 = prev r.i * u.i | 5 | 6 |
720 = prev r.i * u.i | 6 | 5 |
Lastly, we don’t care about SQL’s set-based means of working. I.e. we don’t care about the entire set of multiplications which can be proven within the desk above. We solely care in regards to the final row, ordered by the ordinality, which incorporates our end in r.i
Achieved!
Utilizing group by
Simply as proven earlier than, we will simply add a GROUP BY
clause to the outer question. E.g. let’s multiply odd and even numbers individually:
with t(i) as (values (2), (4), (3), (1), (6), (5))
choose
i % 2,
(
with recursive
u(i, o) as (
choose i, o
from unnest(array_agg(t.i)) with ordinality as u(i, o)
),
r(i, o) as (
choose u.i, u.o from u the place o = 1
union all
choose r.i * u.i, u.o from u be a part of r on u.o = r.o + 1
)
choose i from r
order by o desc
restrict 1
),
string_agg(i::textual content, ' * ')
from t
group by i % 2
I’ve added one other mixture perform STRING_AGG()
for good measure to get:
i % 2 | i | string_agg |
---|---|---|
0 | 48 | 2 * 4 * 6 |
1 | 15 | 3 * 1 * 5 |
Great, isn’t it? Now, I wasn’t in a position to simply add an OVER()
clause proper there. That produced
SQL Error [42P20]: ERROR: window features aren’t allowed in features in FROM
Possibly that may work as properly, within the close to future? Or, I’d give you one other hack to make it work, in case of which I’ll replace this put up.
jOOQ help
Clearly, this shall be supported in jOOQ quickly: https://github.com/jOOQ/jOOQ/points/11385. The syntax shall be once more rather more bearable:
ctx.choose(T.I.mod(inline(2)), cut back(T.I, (i1, i2) -> i1.occasions(i2)))
.from(T.I)
.groupBy(T.I.mod(inline(2)))
.fetch();
Different emulations utilizing precise CREATE AGGREGATE FUNCTION
shall be investigated as properly, within the close to future.