It’s been virtually 1 12 months now since jOOQ 3.14 was launched in October 19, 2020 with SQL/JSON (and SQL/XML) assist. Half a 12 months later, we’ve launched jOOQ 3.15 with MULTISET
assist, which builds on prime of those options to supply type-safe nested collections, the best way each ORDBMS ought to implement them.
Constructing (dogfooding) on prime of our personal SQL/JSON API has revealed a whole lot of caveats of the assorted SQL/JSON implementations throughout distributors, and to be frank, it’s been a little bit of a sobering expertise. Regardless of there now being the ISO/IEC TR 19075:6 normal (largely pushed by Oracle this time), many distributors have already carried out some sort of JSON assist, and it seems in a different way in all dialects – to the extent the place writing vendor agnostic SQL/JSON is nearly not possible with hand written native SQL. You’ll want an API like jOOQ or every other abstraction to standardise the totally different dialects.
On this article, I’d wish to share just a few of the most important caveats encountered over the previous 12 months. There are lots of extra, simply attempt to translate some normal SQL/JSON to numerous dialects with jOOQ.
JSON Sort or String?
JSON paperwork might be seen as easy strings with formatted content material. Certain, I imply you’ll be able to parse a string right into a JSON doc, so why complicate the SQL sort system with new sorts?
Sadly, this proves to be a poor choice by most distributors. PostgreSQL obtained this proper by providing the JSON
and JSONB
sorts, the latter being fairly performant even when storing and indexing JSON paperwork.
First off, and this involves no shock to anybody working with virtually every other language than SQL: Varieties are semantic. Stringly typed environments are unhealthy. Or because the quote goes:
The Bitterness of Poor High quality Stays Lengthy After the Sweetness of Low Worth is Forgotten — Benjamin Franklin
With “Low Worth” being the low worth of the fast and soiled stringly-typed function addition with no formal sort. We already had this again and again with BOOLEAN
sorts, and dialects like MySQL that fake that supporting this can be a good thought:
SELECT *
FROM t
WHERE 1 OR 0 -- TRUE OR FALSE
Let’s take a look at examples of what I imply.
MariaDB and MySQL
Let’s take a look at MariaDB’s and MySQL’s syntax first. Right here’s create a JSON array:
Producing
[1, 2]
That’s nice! And it even conforms to plain SQL syntax. A JSON array may also be nested simply:
choose json_array(json_array(1, 2), 3)
To provide:
[[1, 2], 3]
Now, what if that nested array originates from a derived desk?
choose json_array(nested, 3)
from (choose json_array(1, 2) nested) as t
It ends in:
-- MySQL, MariaDB 10.6 [[1, 2], 3] -- MariaDB 10.5 ["[1, 2]", 3]
Bummer. In MariaDB 10.5, the nested array misplaced its “JSON sort annotation” and went again to the stringly typed model of itself. As a string, it must be quoted. This appears to be a bug https://jira.mariadb.org/browse/MDEV-26134 which apparently has been fastened already beneath a distinct subject in MariaDB 10.6. It’s not the one one, although. The bug tracker is stuffed with related points: https://jira.mariadb.org/browse/MDEV-13701.
MySQL appears to fare a bit higher by now, although there are caveats when aggregating (see additional down).
The one workaround I’ve discovered to this point for the above bugs is extraordinarily laborious:
choose json_array(json_merge_preserve('[]', nested), 3)
from (choose json_array(1, 2) nested) as t
Think about doing that each time you nest JSON. It really works, but it surely actually must be automated (e.g. by means of jOOQ).
Oracle
Oracle outlined a lot of the SQL normal, and I actually like how SQL-idiomatic its syntax feels, identical to SQL/XML. Sadly, they waited with the introduction of the brand new JSON sort till Oracle 21c (can’t wait to play with it). As such, we’ve to decide on whether or not to characterize JSON as VARCHAR2
, by default (restricted to 4000 bytes on some techniques or 32kb at most!) or CLOB
. For those who’re doing severe JSON, you’ll be able to most likely all the time add the RETURNING CLOB
clause to each single JSON operate name of yours. Successfully:
choose json_array(1, 2 returning clob)
from twin
The nested model seems like this:
choose json_array(nested, 3 returning clob)
from (choose json_array(1, 2 returning clob) nested from twin) t;
This works a lot better in Oracle, however there are nonetheless many edge circumstances that it doesn’t get proper. Do that on Oracle 18c:
choose json_arrayagg(coalesce(json_object(), json_object()))
from twin
And it produces one other occasion of erroneously stringly-typed JSON:
["{}"]
The workaround is so as to add FORMAT JSON
nearly all over the place, simply to make certain, e.g.
choose json_arrayagg(
coalesce(json_object(), json_object()) format json
)
from twin
Now, the result’s as anticipated:
[{}]
To play it secure, you’ll most likely have to put in writing FORMAT JSON
nearly all over the place once more, like RETURNING CLOB
Truncation
Truncation might be probably the most sobering half when utilizing JSON. Why will we nonetheless have size-limited information sorts in SQL? It hardly each is sensible from an utility perspective. But, right here we’re.
Oracle
Run this question in Oracle:
choose json_arrayagg(proprietor) from all_objects;
And also you’ll get:
SQL Error [40478] [99999]: ORA-40478: output worth too giant (most: 4000)
Growing the utmost VARCHAR2
measurement to 32kb will solely postpone the issue. There isn’t any “cheap” measurement restrict to such paperwork, so once more, you’ll should RETURNING CLOB
on a regular basis.
choose json_arrayagg(proprietor returning clob) from all_objects;
The worth is similar as all the time. CLOB
are only a bit extra annoying to work with than VARCHAR2
, each inside Oracle in addition to within the consumer utility (e.g. JDBC primarily based), as a result of you must work with one more useful resource while you simply needed a string. jOOQ will simply add the clause all over the place for you, there’s hardly a motive not to try this with jOOQ, because the fetching of CLOB
values is totally clear to jOOQ customers.
MySQL
Up till not too long ago, you couldn’t name JSON_ARRAYAGG()
in MySQL, and the MariaDB model crashed the server (https://jira.mariadb.org/browse/MDEV-21912). When this weblog was written, neither implementation supported the ORDER BY
clause in JSON_ARRAYAGG()
, which I discover fairly important, so the workaround is to make use of GROUP_CONCAT
:
choose concat('[', group_concat(id), ']')
from t_book
In fact, that could be very incorrect if the concatenated values should not numbers, so we have to additionally use JSON_QUOTE
, e.g.
choose concat('[', group_concat(json_quote(title)), ']')
from t_book
And, in the event you’re embedding that stuff in different JSON buildings, you must flip what actually is a string now, again into JSON utilizing JSON_MERGE
(till not too long ago, however now deprecated), or JSON_MERGE_PRESERVE
, e.g.
choose json_object(
'titles',
json_merge_preserve(
'[]',
concat('[', group_concat(json_quote(title)), ']')
)
)
from t_book
To provide a doc like this:
{"titles": ["1984", "Animal Farm", "O Alquimista", "Brida"]}
With out that JSON_MERGE_PRESERVE
, you’d be getting:
{"titles": "["1984","Animal Farm","O Alquimista","Brida"]"}
Positively not one thing you’ll be able to ever bear in mind.
Anyway. This part was about truncation! What occurs with giant, aggregated JSON paperwork in MySQL? Do that:
choose
concat('[', group_concat(json_quote(table_name)), ']')
from information_schema.tables
It produces (on my machine):
["innodb_table_stats","innodb_index_stats","CHARACTER_SETS","CHECK_CONSTRAINTS","COLLATIONS","COLLATION_CHARACTER_SET_APPLICABILITY","COLUMNS","COLUMNS_EXTENSIONS","COLUMN_STATISTICS","EVENTS","FILES","INNODB_DATAFILES","INNODB_FOREIGN","INNODB_FOREIGN_COLS","INNODB_FIELDS","INNODB_TABLESPACES_BRIEF","KEY_COLUMN_USAGE","KEYWORDS","PARAMETERS","PARTITIONS","REFERENTIAL_CONSTRAINTS","RESOURCE_GROUPS","ROUTINES","SCHEMATA","SCHEMATA_EXTENSIONS","ST_SPATIAL_REFERENCE_SYSTEMS","ST_UNITS_OF_MEASURE","ST_GEOMETRY_COLUMNS","STATISTICS","TABLE_CONSTRAINTS","TABLE_CONSTRAINTS_EXTENSIONS","TABLES","TABLES_EXTENSIONS","TABLESPACES_EXTENSIONS","TRIGGERS","VIEW_ROUTINE_USAGE","VIEW_TABLE_USAGE","VIEWS","COLUMN_PRIVILEGES","ENGINES","OPTIMIZER_TRACE","PLUGINS","PROCESSLIST","PROFILING","SCHEMA_PRIVILEGES","TABLESPACES","TABLE_PRIVILEGES","USER_PRIVILEGES","cond_instances","error_log","events_waits_current","events_waits_history","events_waits_history_long","events_waits_summary_by_host_by_event_name","events_waits_summary_by]
Wait, what’s that on the finish
,"events_waits_summary_by]
Invalid JSON. As a result of the GROUP_CONCAT
string obtained truncated. We might set the next
set @@group_concat_max_len = 4294967295;
And now the output is right, and for much longer:
["innodb_table_stats",...,"t_identity_pk","t_triggers"]
An API like jOOQ will have the ability to set this session variable robotically for you, however you most likely don’t need to take into consideration this on a regular basis for native SQL?
Information Sort Assist
JSON is aware of just a few information sorts. Specifically:
- string
- quantity
- boolean
- null
- object
- array
That’s lower than SQL, however typically adequate (something might be encoded as a string, in spite of everything). However while you don’t have BOOLEAN
sorts in SQL (e.g. MySQL, Oracle), then you must manually encode your already handbook encoding to JSON BOOLEAN
.
MySQL
MySQL makes you imagine that this works:
choose json_array(true, false)
As a result of the above produces
[true, false]
But it surely doesn’t actually work. That appears to be arduous coded within the parser. As quickly as your true
and false
values are expressions as an alternative of literals, e.g. originating from a derived desk:
choose json_array(t, f)
from (choose true as t, false as f) as t
You’ll get:
[1, 0]
There are other ways to emulate this. One being:
choose json_array(
json_extract(case when t = 1 then 'true' when t = 0 then 'false' finish, '$'),
json_extract(case when f = 1 then 'true' when f = 0 then 'false' finish, '$')
)
from (choose true as t, false as f) as t
Now, we’re getting once more:
[true, false]
Oracle
In contrast to MySQL / MariaDB, Oracle SQL doesn’t fake it has a BOOLEAN
sort. As an alternative, folks encode it as a NUMBER(1)
or CHAR(1)
or another factor. No matter the encoding, that is the answer:
choose json_array(
case when t = 1 then 'true' when t = 0 then 'false' finish format json,
case when f = 1 then 'true' when f = 0 then 'false' finish format json
)
from (
choose 1 as t, 0 as f, null as n
from twin
) t
Producing
[true,false]
NULL dealing with
When utilizing NULL
with SQL/JSON, there are numerous caveats. First off, SQL NULL
is just not the identical factor as JSON NULL
. Utilizing PostgreSQL:
choose
a,
b,
a = b as equal,
a is null as a_is_null,
b is null as b_is_null
from (
choose null::jsonb as a, 'null'::jsonb as b
) as t
Produces:
|a |b |equal|a_is_null|b_is_null| |---|----|-----|---------|---------| | |null| |true |false |
The place the empty cells are SQL NULL
values, and the null
worth is a JSON null
worth that isn’t “SQL NULL
“. It’s the one cheap factor to do, actually. Do dialects agree?
MySQL
choose
a,
b,
a = b as equal,
a is null as a_is_null,
b is null as b_is_null
from (
choose null as a, json_extract('null', '$') as b
) as t
Producing additionally:
|a |b |equal|a_is_null|b_is_null| |---|----|-----|---------|---------| | |null| |1 |0 |
So, sure!
Oracle
Let’s see, this may very well be how I can create a JSON NULL
worth:
choose
a,
b,
case when a = b then 1 else 0 finish as equal,
case when a is null then 1 else 0 finish as a_is_null,
case when b is null then 1 else 0 finish as b_is_null
from (
choose null as a, json_value('[null]', '$[0]') as b
from twin
) t
But, Oracle is thought to have a cool relationship with NULL
strings. That is the consequence:
|A |B |EQUAL|A_IS_NULL|B_IS_NULL| |---|---|-----|---------|---------| | | |0 |1 |1 |
There doesn’t appear to be an precise JSON NULL
illustration! I couldn’t discover a workaround for this but. In all probability I’ll. However that is very unlucky, resulting in many conversion edge circumstances.
Then again, Oracle is the one dialect among the many ones offered on this article that launched a really helpful NULL
dealing with clause for mixture features. Test this out:
choose
json_arrayagg(a),
json_arrayagg(a absent on null),
json_arrayagg(a null on null)
from (
choose 1 as a from twin union all
choose null from twin
) t
Producing
|A |B |C | |---|---|--------| |[1]|[1]|[2,null]|
Word the above question doesn’t produce the proper consequence on Oracle 18c but, due to a parser / optimiser bug. Use this to work across the bug for this instance:
choose
json_arrayagg(a) a,
json_arrayagg(a + 0 absent on null) b,
json_arrayagg(a - 0 null on null) c
from (
choose 1 as a from twin union all
choose null from twin
) t
Different dialects have totally different opinions on mixture NULL
values to JSON paperwork. In SQL, mixture features are likely to ignore NULL
values, reminiscent of Oracle above, however with JSON, it’s typically important to incorporate the worth, particularly when making a JSON_OBJECT
, the place an absent key is just not strictly the identical factor as an absent worth.
Db2
Db2 has a really restricted implementation of SQL/JSON. It is vitally requirements compliant by way of syntax, however suffers from extreme bugs like these:
Which make it fairly unusable for now. Absolutely, this may enhance within the close to future.
SQL Server
I’ve additionally neglected SQL Server on this article. SQL Server had JSON and XML assist for some time, however carried out it utterly in a different way. You possibly can’t simply type arbitrary JSON objects or arrays simply, however you’ll be able to rework consequence units into JSON in an intuitive manner.
This provides a fast win when streaming outcomes, however doesn’t compose very effectively. E.g. you’ll be able to’t create a JSON_ARRAY
with scalar, non-object content material, although a JSON_OBJECT
might be created like this:
choose 1 as a, 2 as b
for json path, without_array_wrapper
Producing
{"a":1,"b":2}
A restricted variety of JSON options might be emulated additionally in SQL Server, and as with all of the dialects that don’t assist an precise JSON sort, fixed escaping / unescaping could also be required.
Conclusion
SQL/JSON was standardised comparatively late, largely by Oracle. The usual could be very sound. However regrettably, a whole lot of dialects disagree on syntax, behaviour, and a few are nonetheless fairly buggy.
The soundest of all implementations is that of PostgreSQL, which launched correct information sorts and provides a wealthy set of vendor-specific features to control JSON instantly in SQL. Within the close to future, PostgreSQL will embrace normal SQL/JSON, identical to Oracle, and improve its personal implementation. I don’t assume there might be something basically new, simply extra transportable normal syntax.
See this speak for a preview of “what’s subsequent”:
There may be nothing insurmountable for a library like jOOQ, which abstracts over all of the dialects and provides the SQL/JSON (and SQL/XML) performance utilizing a uniform, standards-inspired API. As dialects enhance, jOOQ will undertake new, higher syntax by particular person distributors, so you’ll be able to implement your SQL/JSON queries forwards compatibly.
So, as all the time, use jOOQ to work across the delicate and boring variations of the assorted vendor syntaxes and begin utilizing SQL/JSON already at this time. Regardless of the caveats offered right here, SQL/JSON is among the most enjoyable issues about trendy SQL!
Learn extra: