Sunday, September 11, 2022
HomeData ScienceJinja/DBT SQL Templating in JupyterLab/VSCode | by Jean-Claude Cote | Jul, 2022

Jinja/DBT SQL Templating in JupyterLab/VSCode | by Jean-Claude Cote | Jul, 2022


Rapidly prototype your SQL templates

Picture by Joanna Kosinska on Unsplash

SQL itself doesn’t lend itself effectively to reusability. To realize reusability SQL is commonly templated utilizing libraries like Jinja. For instance Apache Superset leverages Jinja templating in its Dataset definitions and naturally DBT is a device constructed totally round Jinja templates. Jupyterlab-sql-editor natively helps Jinja templating making it potential to prototype jinja-sql templates for Superset and DBT proper inside JupyterLab and VSCode.

Create reusable SQL utilizing Jinja templates

The --jinja possibility permits Jinja templating assist making any variable or operate declared in your pocket book out there for substitution inside Jinja tags {{}}

For instance you need to use a variable to symbolize the precise desk identify utilized in your SQL statements. This may be helpful when you will have database schema for growth and manufacturing tables.

In our earlier article we noticed that the %%sparksqlmagic helps many output codecs; textual content, interactive grid, interactive JSON tree, html, skip. The %%sparksql magic has another trick up its sleeve! Forgive my pun.

The --output sqlpossibility renders the ensuing SQL assertion with syntax highlighting. It’s a handy technique to validate your Jinja templated code. On this instance we use a python record variable to automate the era of a SELECT assertion.

As we are able to see the Jinja {% for merchandise in record %}has produced the specified columns.

Tapping into the ability of DBT

Utilizing Jinja templating to automate the era of SQL assertion could be very highly effective and in reality a whole venture is constructed round this very thought. DBT is a framework to develop modular SQL fashions with SELECT statements and features a ref()Jinja macro which handles dependency administration (which fashions to construct first).

DBT features a plethora of Jinja macro aimed toward simplifying the elaboration of SQL statements. DBT has a package deal administration system permitting third celebration developer to contribute libraries of macros. There are many them on the DBT Hub

%%sparksql magic lets you faucet right into a wealth of macros by merely changing the--jinjawith the --dbt possibility and by configuring your pocket book with the situation of your DBT venture. We’ll illustrate using DBT utilizing the instance venture jaffle_shop.

Within the jaffle_shop there’s a mannequin referred to as orders.sql. We are able to place the content material of this file inside a Jupyter cell and use the --dbt to course of the template utilizing DBT. We are able to use the —-output sqlto render the assertion and the--output htmlto execute the question and show the outcomes.

Discover using the DBT ref() macro. This macro refers to current DBT fashions inside your DBT venture. %%sparksql really makes use of the DBT framework to render the SQL template. All of the DBT fashions and macros are made out there to %%sparksql magic.

We are able to render the SQL produced by DBT. The output is fairly lengthy, we solely present part of it. Discover the payment_method loop producing the anticipated columns. Additionally discover choose * from _dbt__cte__stg_orders which is that they means DBT handles the materialization of ephemeral fashions. Extra particulars right here.

As seen earlier than we are able to execute and show the outcomes. Discover the DBT log outputs earlier than the precise outcomes are displayed.

Utilizing %%sparksql is a handy technique to prototype your DBT code. We have now illustrated using DBT utilizing a pre-existing mannequin. To create a DBT mannequin from scratch you may need to merely show the information of current tables (DBT sources) or pre-existing DBT fashions. Suppose you need to create a brand new mannequin primarily based on the stg_payments . You can begin by exhibiting the information in stg_payments

%%sparksql --dbt --output htmlchoose * from {{ ref('stg_payments') }}

Then begin reworking this dataset utilizing DBT’s Jinja macros. All of the whereas seeing the ensuing output or rendering the SQL.

%%sparksql --dbt{% set payment_methods =
['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
choose
order_id,
{% for payment_method in payment_methods -%}
sum(case when payment_method = '{{ payment_method }}'
then quantity else 0 finish) as {{ payment_method }}_amount,
{% endfor -%}
sum(quantity) as total_amount
from {{ ref('stg_payments') }}
group by order_id

After you have a terrific templated question you’ll be able to transfer it to manufacturing by merely copying the template right into a DBT .sqlmannequin file and be assured that it’s going to really do what you count on it to do.

Visible Studio Code DBT Energy Person

DBT Energy Person is a well-liked VSCode extension for DBT tasks. It helps many DBT autocompletion options in your .sql mannequin information.

Visible Studio Code helps working with Jupyter Notebooks natively and thus the %%sparksql magic works inside VSCode. As proven earlier than all it is advisable to do to leverage a DBT venture in a pocket book is to provide the situation of your DBT to %%sparksql

As soon as a pocket book cell’s language is ready to SQL the autocompletion of DBT Energy Person will kick in and also you’ll get the identical advantages as when modifying an .sql file. Discover that for those who set up the sql-language-server in your VSCode it’ll mechanically change the language to SQL when it detects a cell with a %%sparksql magic.

Keep in mind that VSCode autocompletion is triggered by <CTRL-SPACE> slightly than <TAB> in JupyterLab.

For instance if we choose the is_incremental suggestion proven above we get the next code inserted into our pocket book cell. The identical behaviour as DBT Energy Person in a .sql mannequin file.

The underside line is that %%sparksqlmagic works the identical means in VSCode and JupyterLab notebooks. You possibly can render DBT fashions into SQL, execute queries and look at the outcomes. Equally VSCode notebooks are an effective way to prototype DBT fashions. As soon as you might be happy along with your mannequin you’ll be able to copy it into an .sqlmannequin file.

On this article we confirmed how one can leverage %%sparksql to simply prototype templated Spark SQL in JupyterLab and VSCode. We targeted on Spark, nonetheless jupyterlab-sql-editor additionally features a %%trino magic! Extra SQL engines is likely to be added sooner or later. Contributions are welcomed! Right here’s our git repo CybercentreCanada/jupyterlab-sql-editor.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments