Wednesday, July 3, 2024
HomeProgrammingUtilizing JDK Collectors to De-duplicate father or mother/baby nested collections – Java,...

Utilizing JDK Collectors to De-duplicate father or mother/baby nested collections – Java, SQL and jOOQ.


In basic SQL (i.e. earlier than jOOQ’s superior MULTISET operator), nested collections have been fetched utilizing strange (outer) joins. An instance of such a question can be a question operating towards the sakila database to fetch actors and their movies. Utilizing jOOQ:

Consequence<?> outcome =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .fetch();

The outcome from the jOOQ debug log would look one thing like this:

+--------+----------+---------+-------+---------------------+
|actor_id|first_name|last_name|film_id|title                |
+--------+----------+---------+-------+---------------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR     |
|       1|PENELOPE  |GUINESS  |     23|ANACONDA CONFESSIONS |
|       1|PENELOPE  |GUINESS  |     25|ANGELS LIFE          |
|       1|PENELOPE  |GUINESS  |    106|BULWORTH COMMANDMENTS|
|       1|PENELOPE  |GUINESS  |    140|CHEAPER CLYDE        |
+--------+----------+---------+-------+---------------------+

As anticipated for a SQL be part of operation, this denormalises the information, resulting in duplicate entries for every actor, or for those who type otherwise, e.g. by FILM_ID, you then’d additionally see the duplicate entries per movie:

+--------+----------+---------+-------+----------------+
|actor_id|first_name|last_name|film_id|title           |
+--------+----------+---------+-------+----------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR|
|      10|CHRISTIAN |GABLE    |      1|ACADEMY DINOSAUR|
|      20|LUCILLE   |TRACY    |      1|ACADEMY DINOSAUR|
|      30|SANDRA    |PECK     |      1|ACADEMY DINOSAUR|
|      40|JOHNNY    |CAGE     |      1|ACADEMY DINOSAUR|
+--------+----------+---------+-------+----------------+

That is simply how a be part of operation works. It creates a cartesian product after which filters by major key / international key matches, which JPA customers know underneath the title MultipleBagFetchException.

De-duplicating and nesting the collections with fetchGroups()

What we often need is a few type of nested knowledge construction, e.g. fetch the movies per actor. A easy utility in jOOQ is to simply use fetchGroups():

Map<ActorRecord, Consequence<FilmRecord>> outcome =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .fetchGroups(ACTOR, FILM);

That is tremendous handy, it’s order preserving, however it has a flaw. It’s not “sensible” sufficient to recollect the LEFT JOIN semantics and do The Proper Factor, which is produce an empty checklist of FilmRecord in case an actor doesn’t have any movies. If that’s the case, then there’s a NULL movie within the outcome set in SQL:

+--------+----------+---------+-------+---------------------+
|actor_id|first_name|last_name|film_id|title                |
+--------+----------+---------+-------+---------------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR     |
|       1|PENELOPE  |GUINESS  |     23|ANACONDA CONFESSIONS |
|       1|PENELOPE  |GUINESS  |     25|ANGELS LIFE          |
|       1|PENELOPE  |GUINESS  |    106|BULWORTH COMMANDMENTS|
|       1|PENELOPE  |GUINESS  |    140|CHEAPER CLYDE        |
|     ...|...       |...      |    ...|...                  |
|     201|UNKNOWN   |ACTOR    | {null}|{null}               |
+--------+----------+---------+-------+---------------------+

So, we received’t get an empty checklist, however an inventory containing an empty FilmRecord, identical to once you GROUP BY actor and COUNT(*) the variety of movies:

var r =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        rely(),
        rely(FILM.FILM_ID))
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .groupBy(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME)
   .orderBy(
       ACTOR.ACTOR_ID)
   .fetch();

The results of this question would possibly appear to be this:

+--------+----------+------------+-----+-----+
|actor_id|first_name|last_name   |rely|rely|
+--------+----------+------------+-----+-----+
|       1|PENELOPE  |GUINESS     |   19|   19|
|       2|NICK      |WAHLBERG    |   25|   25|
|       3|ED        |CHASE       |   22|   22|
|       4|JENNIFER  |DAVIS       |   22|   22|
|       5|JOHNNY    |LOLLOBRIGIDA|   29|   29|
|     ...|...       |...         |  ...|  ...|
|     201|UNKNOWN   |ACTOR       |    1|    0|
+--------+----------+------------+-----+-----+

Observe how the specified rely worth of 0 can solely be achieved once we go the nullable FILM.FILM_ID column as an argument. So, what can be the equal fetchGroups() name that exposes this behaviour?

Deduplicating and nesting the collections with JDK Collectors

A really a lot underestimated JDK function are Collectors. Whereas they have been launched particularly for utilization with the Stream API, they may very well be used with any sort of Iterable, in precept, and I’m nonetheless hoping {that a} future JDK will supply Iterable.acquire(), amongst different issues.

With jOOQ, you may acquire the outcomes of any question by calling ResultQuery.acquire(). To translate the above fetchGroups() instance, we will write this, producing virtually the identical outcome:

Map<ActorRecord, Listing<FilmRecord>> outcome =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .acquire(groupingBy(
       r -> r.into(ACTOR), filtering(
           r -> r.get(FILM.FILM_ID) != null, mapping(
               r -> r.into(FILM), toList()
           )
       )
   ));

The above acquire() name nests these operations:

  • It teams by actor (identical to fetchGroups())
  • It filters group contents by these movies whose ID isn’t NULL (this will’t be achieved with fetchGroups()).
  • It maps the group contents to comprise solely FILM content material, not all the projection.

So, clearly extra verbose, but additionally way more poweful and simple to introduce customized shopper facet aggregation behaviour, in case you may’t transfer the aggregation logic to your SQL assertion.

Extra on highly effective collectors on this article right here:

Nesting collections instantly in SQL

No article on this weblog can be full with out plugging the superior MULTISET various to nest collections. In spite of everything, the above deduplication algorithm solely actually works for those who’re becoming a member of down a single parent-child relationship path, and it’s fairly inefficient when there are a whole lot of duplicate knowledge units.

Assuming these auxiliary knowledge varieties:

file Movie(String title) {}
file Actor(String firstName, String lastName) {}
file Class(String title) {}

You possibly can write a question like this:

// We're importing the brand new Information::mapping methodology for comfort
import static org.jooq.Information.mapping;

Consequence<Record3<Movie, Listing<Actor>, Listing<Class>>> outcome = ctx
    .choose(
        FILM.TITLE.convertFrom(Movie::new),
        multiset(
            choose(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(Actor::new))),
        multiset(
            choose(FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(Class::new)))
    )
    .from(FILM)
    .orderBy(FILM.TITLE)
    .fetch();

Discover that we’re now getting a outcome set containing

  • All of the movies
  • The actors per movie as a nested assortment
  • The classes per movie as one other nested assortment

The instance is utilizing implicit joins to keep away from a few of the extra verbose be part of syntaxes within the MULTISET expressions, however that’s not strictly related for this instance.

The outcome being one thing like:

+----------------------------+--------------------------------------------------+----------------------------+
|title                       |multiset                                          |multiset                    |
+----------------------------+--------------------------------------------------+----------------------------+
|Movie[title=ACADEMY DINOSAUR]|[Actor[firstName=PENELOPE, lastName=GUINESS], A...|[Category[name=Documentary]]|
|Movie[title=ACE GOLDFINGER]  |[Actor[firstName=BOB, lastName=FAWCETT], Actor[...|[Category[name=Horror]]     |
|Movie[title=ADAPTATION HOLES]|[Actor[firstName=NICK, lastName=WAHLBERG], Acto...|[Category[name=Documentary]]|
|Movie[title=AFFAIR PREJUDICE]|[Actor[firstName=JODIE, lastName=DEGENERES], Ac...|[Category[name=Horror]]     |
|Movie[title=AFRICAN EGG]     |[Actor[firstName=GARY, lastName=PHOENIX], Actor...|[Category[name=Family]]     |
+----------------------------+--------------------------------------------------+----------------------------+

Conclusion

There are a lot of ways in which result in Rome. Basic SQL based mostly approaches to nest collections used some type of deduplication method within the shopper. In jOOQ, you possibly can all the time do that with fetchGroups(), and since not too long ago additionally with acquire() instantly (acquire() was all the time accessible by way of an middleman stream() name).

For really highly effective nesting of collections, nonetheless, our advice is all the time to maneuver your logic into SQL instantly, utilizing native array performance, or utilizing MULTISET

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments