I’ve recently discovered a nice new addition to PostgreSQL 14: the new enable_memoize flag, which improves the performance of some nested loop joins where statistics hint at this being appropriate. I mean, who can resist this temptation:
Improving query speed by 1000x hints at something very suboptimal having been going on before, and a tool like memoization can be of great help. But will it also help with an “ordinary” join? I wanted to try it myself.
What’s memoization?
In a perfect world free of side effects (and SQL is such a perfect world, in theory), memoization means that we can substitute y for f(x) in any computation, given that y = f(x). For example, no matter how many times you calculate UPPER('x'), you’ll always get 'X'. If the calculation of such a function is costly, and there are only few possible input values, then why not just maintain a hash map that maps all previous input values, and use it to look up known (or at least frequent) values instead of computing them again?
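As an aside, the idea is easy to illustrate outside of SQL. Here’s a minimal Python sketch of such a hash map backed cache, where expensive_upper is just a hypothetical stand-in for any costly, side-effect free computation:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)  # memoize: cache results keyed by the input value
def expensive_upper(x):
    global calls
    calls += 1  # track how often the "expensive" work actually runs
    return x.upper()

# 100 invocations, but only 1 distinct input value ...
results = [expensive_upper('x') for _ in range(100)]

# ... so the function body ran exactly once
print(calls)       # 1
print(results[0])  # X
```

This is the same trade-off the optimiser makes: a bit of memory for the cache, in exchange for skipping repeated computations.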
As I’ve shown previously on this blog, Oracle 11g introduced a feature called scalar subquery caching, a feature which you can activate in jOOQ to avoid costly PL/SQL context switches.
In the case of PostgreSQL’s enable_memoize, this can be particularly useful for nested loop joins in SQL, and to reference the above tweet, a lateral join is often executed via a nested loop join.
Turning the feature on and off
I’ve created a schema like this:
CREATE TABLE t AS
SELECT i, i % 5 AS j
FROM generate_series(1, 100000) AS t(i);
CREATE TABLE u AS
SELECT i, i % 20000 AS j
FROM generate_series(1, 100000) AS t(i);
CREATE INDEX uj ON u(j);
In summary:
- Both tables t and u have 100000 rows
- t.j has only 5 distinct values; each value appears 20000 times
- u.j has 20000 distinct values; each value appears 5 times
When running this on PostgreSQL 14:
SELECT current_setting('enable_memoize');
I get:
|current_setting|
|---------------|
|on             |
So, the feature is active, which I can also see in an EXPLAIN of the following query:
EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;
The plan is:
|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       |
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |
Without memoization, when joining the two tables like that, then, for the 100000 rows in t, I have to look up the 5 matching rows in u 100000 times. But if memoization kicks in, I have to perform the lookup only 5 times, because there are only 5 distinct values of t.j.
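To make that arithmetic tangible, here’s a hedged Python simulation of the nested loop with a memoize cache in front of the inner index scan. Here, index_scan_u is a hypothetical stand-in for the actual index scan on u(j), and the data merely mirrors the shape of the schema above:

```python
# t.j cycles through only 5 distinct values, as in the schema above
t_j = [i % 5 for i in range(100000)]

index_lookups = 0

def index_scan_u(j):
    # Stand-in for the index scan on u(j): each value matches 5 rows
    global index_lookups
    index_lookups += 1
    return [(j + 20000 * k, j) for k in range(5)]

cache = {}  # the Memoize node; its cache key is t.j
joined_rows = 0
for j in t_j:
    if j not in cache:  # only perform the inner lookup on a cache miss
        cache[j] = index_scan_u(j)
    joined_rows += len(cache[j])

print(index_lookups)  # 5 lookups instead of 100000
print(joined_rows)    # 500000 joined rows, as in the real join
```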
We can play around with execution plans by turning the feature on or off:
SET enable_memoize = ON;
SET enable_memoize = OFF;
When turned off, PostgreSQL seems to choose a hash join or merge join instead, on my machine (between multiple executions, the plan may switch):
|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|

|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |
Let’s Benchmark
We’re using the usual benchmark technique described here:
- We repeat an operation 25x in mode A and mode B and compare (or more than 25x, if it’s a fast operation)
- We repeat the above 5x to mitigate any warmup and other caching effects
You can run the following benchmark on the above schema yourself, to verify:
DO $$
DECLARE
v_ts TIMESTAMP;
v_repeat CONSTANT INT := 25;
rec RECORD;
BEGIN
-- Repeat the whole benchmark several times to avoid warmup penalty
FOR r IN 1..5 LOOP
v_ts := clock_timestamp();
SET enable_memoize = OFF;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT t.*
FROM t JOIN u ON t.j = u.j
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
v_ts := clock_timestamp();
SET enable_memoize = ON;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT t.*
FROM t JOIN u ON t.j = u.j
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
RAISE INFO '';
END LOOP;
END$$;
On my machine, the results are consistent, decent, not too impressive, but still significant:
Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346
Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677
Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413
Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734
Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272
I.e. a 10% speedup. Across an entire system, that alone would already be worth it.
Optimising LATERAL
Let’s try optimising LATERAL instead. We could run a query like this:
SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
The EXPLAIN of the above is:
|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)                                        |
So, we can again cache the computation of the COUNT(*) value for each of the 5 distinct t.j input values, rather than re-calculating it every time. Surely, this must be even better than before?
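Conceptually, the Memoize node treats the correlated subquery as a pure function of t.j. Here’s a hedged Python sketch of that idea, where count_u is a hypothetical stand-in for the Aggregate sub-plan over the index-only scan:

```python
from collections import Counter
from functools import lru_cache

# Data shaped like the schema above
t_j = [i % 5 for i in range(100000)]
u_counts = Counter(i % 20000 for i in range(100000))

aggregate_runs = 0

@lru_cache(maxsize=None)  # the Memoize node, caching on t.j
def count_u(j):
    # Stand-in for the Aggregate over the index-only scan on u(j)
    global aggregate_runs
    aggregate_runs += 1
    return u_counts[j]

results = [count_u(j) for j in t_j]

print(aggregate_runs)  # 5: the aggregate runs once per distinct t.j value
print(set(results))    # {5}: each t.j value matches 5 rows in u
```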
Benchmark time!
DO $$
DECLARE
v_ts TIMESTAMP;
v_repeat CONSTANT INT := 25;
rec RECORD;
BEGIN
-- Repeat the whole benchmark several times to avoid warmup penalty
FOR r IN 1..5 LOOP
v_ts := clock_timestamp();
SET enable_memoize = OFF;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
v_ts := clock_timestamp();
SET enable_memoize = ON;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT *
FROM
t,
LATERAL (
SELECT count(*)
FROM u
WHERE t.j = u.j
) AS u(j)
) LOOP
NULL;
END LOOP;
END LOOP;
RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
RAISE INFO '';
END LOOP;
END$$;
And this time, we can see a significant speedup!
Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941
Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404
Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883
Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837
Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943
That’s great news! But wait, does this also work for ordinary correlated subqueries? Because the above LATERAL correlated subquery could be rewritten as:
SELECT
t.*,
(
SELECT count(*)
FROM u
WHERE t.j = u.j
) j
FROM t;
Regrettably, the plan doesn’t show memoization:
|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)                                        |
And the benchmark (you can paste the query into the benchmark logic yourself to reproduce) confirms there’s no memoization effect:
Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765
Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064
Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622
Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612
Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717
It seems that with this new feature, correlated subqueries could be rewritten to nested loop outer joins in the future? Other optimisers already do this, and then we’d effectively have the same feature here as Oracle’s scalar subquery caching.
Conclusion
The feature is turned on in PostgreSQL 14. Apart from some additional memory consumption, which might be a small problem if the optimiser is wrong and statistics are off, I don’t see any downside to this new feature. SQL is a side-effect free (in theory) 4GL, meaning the optimiser can replace any computation with a cached value that depends only on the computation’s input values.
A correlated subquery is a function whose input parameters are the predicates and other references to the outer query’s columns. As such, the result of a correlated subquery can be cached, or memoized. As shown above, this can have drastic effects on your present-day SQL queries, from which you can profit just by upgrading to PostgreSQL 14.