Metrics and universal semantic layers enable semantic-free BI
A semantic layer is a business-friendly representation of data that allows complex business logic to be explained in simpler terms. In Business Intelligence (BI), it has been called the metadata layer, semantic model, business view, or BI model.
When the semantic layer was first introduced to BI tools ~30 years ago, it defined table joins, metric aggregation, user-friendly names and more, allowing BI end-users to simply drag-and-drop fields like Product Name and Sales onto a report. Wham, there’s your data! Yes, “no-code” BI has been around for at least 30 years. This allowed early data teams to start thinking more strategically about where to put business logic, but it also opened up a lot of complex issues. As a BI consultant for 20+ years and founder of FlexIt Analytics, I have had these issues on my mind for a long time.
For a very simplified explanation, business logic resides in one of three places in the “BI Layers”: the data warehouse at the bottom, the BI tool’s semantic layer in the middle, and the report at the top.
In an ideal world, you’re putting as much business logic as possible at the lowest level, in the data warehouse through transformations (the “T” in ETL/ELT). This reduces duplication (DRY: don’t repeat yourself), allows for a “single source of truth”, reduces vendor lock-in, and simplifies consumption at higher levels. However, this isn’t always feasible.
Data warehouse development can be slow. If a business definition needs to be changed, the business user may not have access to the data engineering team, or that team may be backlogged. So they reach out to the BI team and have them put the logic into the BI semantic layer. What if the BI team is also backlogged, or not willing to make changes? Then, the business user puts the logic in the report. What happens if they don’t have “author” access to the BI tool? Then they run the report, export to Excel, and put their business logic there (this is another article, or a thousand articles).
We know that stuffing business logic in reports gets messy very quickly. But we also know that there are limits to the amount of business logic that you can apply to the data warehouse. For example, you cannot define complex joins very well (dimensional modeling vs de-normalized “one-big-table” is another article). Also, you typically cannot assign column attributes like label, format, aggregation, and description. Therefore, the natural place to put a majority of your business logic became the BI tool semantic layer. How much business logic is put in the semantic layer then determines whether it is thin (very little) or thick (a lot).
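To make the column attributes above concrete, here is a minimal Python sketch of what a semantic layer stores per column and how a BI tool might use it to render a value. The class name `ColumnSemantics`, the `render` helper, the `sales_amt` column, and the use of Python format specs are all assumptions made for illustration, not the API of any particular BI tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnSemantics:
    """Hypothetical per-column metadata a semantic layer might hold."""
    name: str          # physical column name in the warehouse
    label: str         # user-friendly name shown in reports
    format: str        # display format, here a Python format spec
    aggregation: str   # default aggregation: sum, avg, min, max, count
    description: str = ""

# Supported default aggregations (illustrative subset).
AGGREGATORS = {
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "count": len,
}

def render(column, values):
    """Aggregate raw values and format them using the column's metadata."""
    result = AGGREGATORS[column.aggregation](values)
    return f"{column.label}: {format(result, column.format)}"

sales = ColumnSemantics(
    name="sales_amt",
    label="Sales",
    format=",.2f",
    aggregation="sum",
    description="Gross sales in USD",
)

print(render(sales, [1200.5, 300.0, 49.5]))  # Sales: 1,550.00
```

A thin layer would hold only a handful of such attributes in the BI tool; a thick one would add joins, calculated fields, and scripting on top.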
BI tool semantic layers started out medium thick, allowing you to define a lot of complex business logic, but they were also somewhat limited in their capabilities. Early tools (Business Objects, Cognos) certainly fell into this category, and even successors like Tableau got into the game with the ability to add joins in the past few years. Looker, however, came in big with LookML to create a very thick semantic layer with advanced code and scripting capabilities. You could say a similar thing of Power BI’s DAX.
The thick semantic layer was a huge improvement, allowing for both complex and re-usable business logic. However, it was also isolated to that tool, not available to other tools or to users who don’t have access. If you’re a small team with one tool and everyone loves that tool, then this may not be a problem. However, what happens when 1) you grow and add more tools to your stack, or 2) you want to move to a different tool (i.e. vendor lock-in)?
To that point, the Business Intelligence Trends 2020 study revealed that 67% of employees have access to more than one BI tool, with an average of 3.8 BI tools per company. There are many other reasons to avoid the thick semantic layer, some of them detailed in the post below:
Thus, usher in the next phase in modern BI, the thin semantic layer.
The idea of a thin semantic layer in the BI tool is to leverage other tools that build a semantic layer between the BI tool and the database. Some of these are metric layers (aka Headless BI, metric store), like Transform, Cube, and Metriql. Others, like dbt (data build tool), are data transformation tools that offer support for metrics, as well as other semantic layer functionality.
BI tools that have a thin semantic layer sync or pull most of the metadata from the headless semantic layer and then define some additional metadata on top of that. A growing number of BI tools adopt the thin semantic layer approach. Here is an article about Superset, detailing the ideas behind a thin semantic layer:
The thin semantic layer is clearly a huge step forward for BI. But now, being a go-getter, you’re probably thinking “why not take it further?” In addition to metrics and some other metadata, why not push more metadata like names/labels, descriptions, formatting, and synonyms down to the headless semantic layer?
The concept of semantic-free BI is not new. It dates back to early BI tools, and was first thought of as using a universal (unified) semantic layer, recently termed “headless”. The idea is that all consumers of data (BI, ML, and other tools) can “speak the same language” by accessing a “single source of truth” where common metadata semantics are applied. Unlike the thin semantic BI tools that synchronize some metadata from the headless semantic layer, the semantic-free BI tool simply holds a reference to metadata in the headless layer. There is no metadata detail held in the BI tool. Technically, you could change an attribute at the report layer and call this a semantic layer, but it’s not “the” semantic layer that we’re talking about.
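The difference between syncing metadata and holding a reference to it can be sketched in a few lines of Python. Everything here (`HeadlessLayer`, `ThinBITool`, `SemanticFreeBITool`, the `sales_amt` column) is a hypothetical in-memory stand-in, not the interface of any real headless layer or BI tool; it only illustrates why a synced copy can go stale while a reference cannot.

```python
class HeadlessLayer:
    """Hypothetical stand-in for a headless (universal) semantic layer."""

    def __init__(self):
        self._columns = {}

    def define(self, name, **attrs):
        # Create or overwrite the metadata for one column.
        self._columns[name] = dict(attrs)

    def lookup(self, name):
        # Return a copy so callers cannot mutate the source of truth.
        return dict(self._columns[name])

    def all_columns(self):
        return {name: dict(attrs) for name, attrs in self._columns.items()}


class ThinBITool:
    """Keeps a local copy of the metadata, refreshed only on sync()."""

    def __init__(self, layer):
        self._layer = layer
        self._local = {}

    def sync(self):
        self._local = self._layer.all_columns()

    def label(self, name):
        return self._local[name]["label"]


class SemanticFreeBITool:
    """Holds no metadata; resolves every attribute from the headless layer."""

    def __init__(self, layer):
        self._layer = layer

    def label(self, name):
        return self._layer.lookup(name)["label"]


layer = HeadlessLayer()
layer.define("sales_amt", label="Sales")

thin = ThinBITool(layer)
thin.sync()
free = SemanticFreeBITool(layer)

# The business renames the field in the headless layer.
layer.define("sales_amt", label="Gross Sales")

print(thin.label("sales_amt"))  # Sales (stale until the next sync)
print(free.label("sales_amt"))  # Gross Sales
thin.sync()
print(thin.label("sales_amt"))  # Gross Sales
```

The trade-off in this sketch is that the semantic-free tool depends on the headless layer being reachable for every lookup, while the thin tool can keep working from its copy at the cost of possible drift.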
Like all great ideas (hoverboards, flying cars, inside-out Oreos), a universal semantic layer that actually works and is worth the investment remains elusive. The metrics layer solutions mentioned earlier (Transform, Cube, dbt, Metriql) are gaining major traction, but are somewhat singularly focused on metrics. That is for good reason: metrics are the most important component of the universal semantic layer. On the other end of the spectrum, there are full universal semantic layer offerings like AtScale and Kyligence. But they are not focused metrics layers, and it also remains to be seen if they will gain traction. Will BI and other tools put in the effort to integrate with them? Unlike the open-source metrics solutions, AtScale and Kyligence are neither open source nor transparent. Neither has a pricing page, and both list only the largest companies as their customers, so I think it’s fair to say that they are not “universal” unified semantic layer offerings.
With data teams of all shapes and sizes, current offerings likely work very well for a small percentage of organizations. Perhaps smaller, nimbler companies find the headless BI offerings fit perfectly. Additionally, “semantic-free” BI is probably not that critical for those organizations. On the other end of the spectrum, large mega-corporations may be having success with offerings like AtScale or Kyligence. That’s great! However, this article is really for the 90% that are in between.
In order to get there, I see this as a three-step approach:
- Refine and merge concepts of the metrics and universal semantic layer
- Define standards for BI tools to talk to the headless layers
- Reach a critical mass of BI tools that can support semantic-free BI
The metrics and universal semantic layers cover nearly everything, but they need to both come together and mature. Once that happens, it needs to be easy for BI tools to integrate with this headless semantic layer. Without a set of standards for talking to the headless layer, each BI tool must create a custom connector for each headless layer, which will likely lead to failure. If the framework is both open and easy, then BI tools will naturally adopt a semantic-free model. Then, we can start to reach critical mass for semantic-free BI.
Currently, I see dbt in the lead towards enabling semantic-free BI for a few reasons:
- It’s not another tool in your stack. As the fastest growing solution for data transformation, it already holds a majority of your complex business logic. Companies are already moving their LookML down a layer, into dbt.
- By default, a lot of metadata is naturally built into the dbt models.
- dbt’s documentation and support for meta already enable full data cataloging capabilities. Now, it’s a matter of enhancing the functionality and making it more active/alive.
- dbt-core is open source and free, with a vibrant community
- Extras like data lineage and data freshness are huge for BI
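As a rough illustration of how a BI tool could read that metadata, the sketch below walks a hand-written dictionary shaped like dbt’s manifest.json artifact and builds a small column catalog. The layout used (`nodes`, `resource_type`, `columns`, `description`, `meta`) reflects my understanding of the manifest and should be checked against the dbt version in use. The keys inside `meta` (`label`, `format`, `aggregation`), the `shop` project, and the `column_catalog` helper are hypothetical conventions for this example, not something dbt defines.

```python
# A hand-written fragment shaped like dbt's manifest.json artifact.
# The keys under "meta" are a hypothetical convention, not defined by dbt.
manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "columns": {
                "sales_amt": {
                    "name": "sales_amt",
                    "description": "Gross sales in USD",
                    "meta": {
                        "label": "Sales",
                        "format": ",.2f",
                        "aggregation": "sum",
                    },
                },
                "order_id": {"name": "order_id", "description": "", "meta": {}},
            },
        },
        "test.shop.not_null_orders_order_id": {
            "resource_type": "test",
            "name": "not_null_orders_order_id",
            "columns": {},
        },
    }
}

def column_catalog(manifest):
    """Collect display metadata for every documented model column."""
    catalog = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        for col in node.get("columns", {}).values():
            meta = col.get("meta") or {}
            catalog[(node["name"], col["name"])] = {
                # Fall back to the physical name when no label is set.
                "label": meta.get("label", col["name"]),
                "description": col.get("description", ""),
                "format": meta.get("format"),
                "aggregation": meta.get("aggregation"),
            }
    return catalog

catalog = column_catalog(manifest)
print(catalog[("orders", "sales_amt")]["label"])  # Sales
print(catalog[("orders", "order_id")]["label"])   # order_id
```

In a real project the dictionary would come from loading the generated artifact with `json.load` rather than being written by hand.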
Although dbt is not a true headless server, they are currently working on their headless metrics offering. Additionally, dbt recognizes the need for improvement and has a laser focus on both the metrics and semantic layer:
Right now, BI tools can integrate, and are integrating, with dbt to provide thin and semantic-free BI experiences. FlexIt Analytics and Lightdash already have semantic-free capabilities through integration with dbt. Others like Superset and Metabase have sync tools that allow for manual syncing of dbt models to support a thin semantic layer. Given dbt’s popularity, many others, like ThoughtSpot (Aug 2022) and Holistics (beta access available now), are coming soon, so we’ll see how they integrate some time in 2022. Finally, some BI tools like Mode give you a little bit (dbt source freshness), but not much in the way of the semantic model.
Things are progressing rapidly, making it imperative to get some standards in place. Here is an article that both details how to set standards and provides a working GitHub dbt project showing how to integrate with dbt to enable semantic-free BI:
We’ve come a long way towards enabling data teams and data consumers of all stripes. In some regards, we’re just at the beginning, but the current sense of urgency and rapid pace of both headless semantic offerings and thin or semantic-free BI tools gives me hope that we will get there soon.
I’d love to hear your thoughts. Feel free to reach out to Andrew Taft.