Friday, June 3, 2022

PostgreSQL 14’s enable_memoize For Improved Efficiency of Nested Loop Joins – Java, SQL and jOOQ.


I’ve recently discovered a nice new addition to PostgreSQL 14, the new enable_memoize flag that improves the performance of some nested loop joins where statistics hint at this being appropriate. I mean, who can resist this temptation:

Improving query speed by 1000x hints at something very suboptimal having been going on before, and a tool like memoization can be of great help. But will it also help with an “ordinary” join? I wanted to try myself.

What’s memoization?

In a perfect world free of side effects (and SQL is such a perfect world, in theory), memoization means that we can substitute y for f(x) in any computation, given that y = f(x). For example, no matter how many times you calculate UPPER('x'), you’ll always get 'X'. If the calculation of such a function is costly, and there are only few possible input values, then why not just maintain a hash map that maps all previous input values and use that to look up known (or at least frequent) values instead of computing them again?
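As a plain illustration of that idea (my own sketch, not how PostgreSQL implements it), here is the hash map approach in a few lines of Python:

```python
# Minimal memoization sketch: cache results of a pure function,
# keyed by its input, so repeated inputs skip the computation.
cache = {}
calls = 0

def upper_memoized(s):
    global calls
    if s not in cache:
        calls += 1            # count actual computations
        cache[s] = s.upper()  # compute once per distinct input
    return cache[s]

results = [upper_memoized('x') for _ in range(1000)]
# 1000 results are produced, but the computation ran only once
```

The cache is only correct because the function is pure: the same input always yields the same output, which is exactly the guarantee SQL expressions give the optimiser in theory.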

As I’ve shown previously on this blog, Oracle 11g has introduced a feature called scalar subquery caching, a feature which you can activate in jOOQ to avoid costly PL/SQL context switches.

In the case of PostgreSQL’s enable_memoize, this can be particularly useful for nested loop joins in SQL, and to reference the above tweet, a lateral join is often executed via a nested loop join.

Turning the feature on and off

I’ve created a schema like this:

CREATE TABLE t AS
SELECT i, i % 5 AS j 
FROM generate_series(1, 100000) AS t(i);

CREATE TABLE u AS
SELECT i, i % 20000 as j 
FROM generate_series(1, 100000) AS t(i);

CREATE INDEX uj ON u(j);

In summary:

  • Both tables t and u have 100000 rows.
  • t.j has only 5 distinct values; each value appears 20000 times.
  • u.j has 20000 distinct values; each value appears 5 times.

When running this on PostgreSQL 14:

SELECT current_setting('enable_memoize');

I get:

|current_setting|
|---------------|
|on             |

So, the feature is active, which I can also see in an EXPLAIN of the following query:

EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;

The plan is:

|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       |
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |

Without memoization, when joining the two tables like that, then, for the 100000 rows in t, I have to look up the 5 matching rows in u 100000 times. But if memoization kicks in, then I have to perform the lookup only 5 times, because there are only 5 distinct values of t.j.
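To make that arithmetic tangible, here is a small Python simulation (my own sketch of the concept, not PostgreSQL internals) of a nested loop join over a scaled-down version of the above schema, counting inner index lookups with and without a cache keyed on t.j:

```python
from collections import defaultdict

# Scaled-down schema: t.j has 5 distinct values,
# and a dict from j to matching rows stands in for index uj.
t = [(i, i % 5) for i in range(1, 101)]
u_index = defaultdict(list)
for i in range(1, 101):
    u_index[i % 20].append(i)

def nested_loop(memoize):
    lookups = 0   # number of inner index probes
    cache = {}
    rows = 0      # join result size (same either way)
    for _, j in t:
        if memoize and j in cache:
            matches = cache[j]            # served from cache
        else:
            lookups += 1                  # actual index probe
            matches = u_index[j]
            if memoize:
                cache[j] = matches
        rows += len(matches)
    return lookups, rows

plain = nested_loop(memoize=False)  # one probe per row of t
memo = nested_loop(memoize=True)    # one probe per distinct t.j
```

With 100 rows in this miniature t, the plain nested loop probes the index 100 times, while the memoized one probes it only 5 times, both producing the identical join result.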

We can play around with execution plans by turning the feature on or off:

SET enable_memoize = ON;
SET enable_memoize = OFF;

When turned off, PostgreSQL seems to choose a hash join or merge join instead, on my machine (between several executions, the plan might switch):

|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|


|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |

Let’s Benchmark

We’re using the usual benchmark technique described here:

  • We repeat an operation 25x in mode A and mode B and compare (or more than 25, if it’s a fast operation)
  • We repeat the above 5x to mitigate any warmup and other caching effects

You can run the following benchmark on the above schema yourself, to verify:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

On my machine, the results are consistent, decent, not too impressive, but still significant:

Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346

Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677

Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413

Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734

Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272

I.e. a 10% speedup. Across an entire system, that alone would already be worth it.

Optimising LATERAL

Let’s try optimising LATERAL instead. We could run a query like this:

SELECT *
FROM 
  t, 
  LATERAL (
    SELECT count(*) 
    FROM u 
    WHERE t.j = u.j
  ) AS u(j)

The EXPLAIN of the above is:

|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)                                        |

So, we can again cache the computation of the COUNT(*) value for each of the 5 distinct t.j input values, rather than re-calculating this every time. Surely, this must be even better than before?
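Conceptually (again my own sketch, not the planner's actual machinery), the Memoize node here caches an entire aggregate result per distinct correlation key, so the expensive count runs once per key:

```python
# Memoizing an aggregate per correlation key: the count over u
# is computed once per distinct t.j, then served from cache.
t_j_values = [i % 5 for i in range(1, 101)]   # scaled-down t.j column
u_j_values = [i % 20 for i in range(1, 101)]  # scaled-down u.j column

aggregate_runs = 0
count_cache = {}

def correlated_count(j):
    """COUNT(*) FROM u WHERE u.j = j, memoized on j."""
    global aggregate_runs
    if j not in count_cache:
        aggregate_runs += 1  # the expensive scan-and-count
        count_cache[j] = sum(1 for v in u_j_values if v == j)
    return count_cache[j]

counts = [correlated_count(j) for j in t_j_values]
# 100 output rows, but only 5 aggregate computations
```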

Benchmark time!

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

And this time, we can see a significant speedup!

Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941

Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404

Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883

Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837

Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943

That’s great news! Wait, does this also work for ordinary correlated subqueries? Because the above LATERAL correlated subquery could be rewritten as:

SELECT 
  t.*, 
  (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) j
FROM t;

Regrettably, the plan doesn’t show memoization:

|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)                                        |

And the benchmark (you can paste the query into the benchmark logic yourself to reproduce) confirms there’s no memoization effect:

Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765

Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064

Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622

Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612

Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717

It seems that with this new feature, correlated subqueries could be rewritten to nested loop outer joins in the future? Other optimisers already do this, and we’d effectively have the same feature here as Oracle’s scalar subquery caching.

Conclusion

The feature is turned on by default in PostgreSQL 14. Apart from some additional memory consumption, which might be a small problem if the optimiser is wrong and statistics are off, I don’t see any downside of this new feature. SQL is a side-effect free (in theory) 4GL, meaning the optimiser can replace any computation by a cached value that depends only on the computation’s input values.

A correlated subquery is a function, whose input parameters are the predicates and other references to the outer query’s columns. As such, the result of a correlated subquery can be cached, or memoized. As shown above, this can have drastic effects on your present-day SQL queries, from which you can profit just by upgrading to PostgreSQL 14.


