Rapidly prototype your SQL templates
SQL by itself doesn’t lend itself well to reusability. To achieve reusability, SQL is commonly templated using libraries like Jinja. For instance, Apache Superset leverages Jinja templating in its dataset definitions, and of course DBT is a tool built entirely around Jinja templates. Jupyterlab-sql-editor natively supports Jinja templating, making it possible to prototype jinja-sql templates for Superset and DBT right inside JupyterLab and VSCode.
Create reusable SQL using Jinja templates
The --jinja option enables Jinja templating support, making any variable or function declared in your notebook available for substitution inside Jinja tags {{ }}.
For example, you can use a variable to represent the actual table name used in your SQL statements. This can be useful when you have separate database schemas for development and production tables.
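The substitution itself can be reproduced with plain jinja2 outside the notebook. This is a minimal sketch of what --jinja does with a notebook variable; the table and variable names are made up for illustration:

```python
from jinja2 import Template

# Hypothetical variable declared in a notebook cell; with --jinja,
# %%sparksql substitutes it inside {{ }} tags the same way.
table_name = "dev_db.orders"  # could be "prod_db.orders" in production

sql = Template("select * from {{ table_name }} limit 10").render(table_name=table_name)
print(sql)
```

Switching between development and production is then just a matter of changing the variable in one place.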
In our previous article we saw that the %%sparksql magic supports many output formats: text, interactive grid, interactive JSON tree, HTML, skip. The %%sparksql magic has one more trick up its sleeve! Forgive my pun.
The --output sql option renders the resulting SQL statement with syntax highlighting. It’s a convenient way to validate your Jinja templated code. In this example we use a Python list variable to automate the generation of a SELECT statement.
As we can see, the Jinja {% for item in list %} loop has produced the desired columns.
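The same pattern can be sketched with plain jinja2; the column list and table name below are made up for illustration:

```python
from jinja2 import Template

# Hypothetical column list; with --jinja, a notebook variable like this
# becomes available inside the template.
columns = ["order_id", "status", "amount"]

template = Template(
    "select\n"
    "{% for item in columns %}"
    "  {{ item }}{{ ',' if not loop.last }}\n"
    "{% endfor %}"
    "from orders"
)
rendered = template.render(columns=columns)
print(rendered)
```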
Tapping into the power of DBT
Using Jinja templating to automate the generation of SQL statements is very powerful, and in fact an entire project is built around this very idea. DBT is a framework to develop modular SQL models with SELECT statements, and it includes a ref() Jinja macro which handles dependency management (which models to build first).
DBT includes a plethora of Jinja macros aimed at simplifying the elaboration of SQL statements. DBT also has a package management system allowing third-party developers to contribute libraries of macros. There are plenty of them on the DBT Hub.
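As a minimal sketch (the model and file names are hypothetical), a DBT model file can combine the ref() macro with a packaged macro such as dbt_utils.star from the dbt-utils package on the DBT Hub:

```sql
-- models/order_summary.sql (hypothetical model)
select
    {{ dbt_utils.star(from=ref('stg_orders')) }}
from {{ ref('stg_orders') }}
```

Here ref('stg_orders') both resolves to the model’s relation name and tells DBT to build stg_orders first.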
The %%sparksql magic lets you tap into this wealth of macros by simply replacing the --jinja option with the --dbt option and by configuring your notebook with the location of your DBT project. We’ll illustrate the use of DBT with the example project jaffle_shop.
In the jaffle_shop project there is a model called orders.sql. We can place the contents of this file inside a Jupyter cell and use the --dbt option to process the template using DBT. We can use --output sql to render the statement and --output html to execute the query and display the results.
Notice the use of the DBT ref() macro. This macro refers to existing DBT models within your DBT project. %%sparksql actually uses the DBT framework to render the SQL template. All the DBT models and macros are made available to the %%sparksql magic.
We can render the SQL produced by DBT. The output is quite long, so we only show part of it. Notice the payment_method loop producing the expected columns. Also notice select * from __dbt__cte__stg_orders, which is the way DBT handles the materialization of ephemeral models. More details here.
As seen before, we can execute the query and display the results. Notice the DBT log output before the actual results are displayed.
Using %%sparksql is a convenient way to prototype your DBT code. We have illustrated the use of DBT with a pre-existing model. To create a DBT model from scratch, you might want to simply display the data of existing tables (DBT sources) or pre-existing DBT models. Suppose you want to create a new model based on stg_payments. You can start by showing the data in stg_payments:
%%sparksql --dbt --output html
select * from {{ ref('stg_payments') }}
Then start transforming this dataset using DBT’s Jinja macros, all the while viewing the resulting output or rendering the SQL.
%%sparksql --dbt

{% set payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for payment_method in payment_methods -%}
    sum(case when payment_method = '{{ payment_method }}'
        then amount else 0 end) as {{ payment_method }}_amount,
    {% endfor -%}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
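Outside the notebook, the same template can be sanity-checked with plain jinja2 by stubbing out the ref() macro; the lambda below is only a stand-in for DBT’s real dependency resolution:

```python
from jinja2 import Template

payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card']

template = Template("""\
select
    order_id,
    {% for payment_method in payment_methods -%}
    sum(case when payment_method = '{{ payment_method }}'
        then amount else 0 end) as {{ payment_method }}_amount,
    {% endfor -%}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
""")

# Stub: DBT would resolve ref() to the model's actual relation name.
sql = template.render(payment_methods=payment_methods, ref=lambda model: model)
print(sql)
```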
Once you have a great templated query, you can move it to production by simply copying the template into a DBT .sql model file, confident that it will actually do what you expect it to do.
Visual Studio Code DBT Power User
DBT Power User is a popular VSCode extension for DBT projects. It supports many DBT autocompletion features in your .sql model files.
Visual Studio Code supports working with Jupyter notebooks natively, and thus the %%sparksql magic works inside VSCode. As shown before, all you need to do to leverage a DBT project in a notebook is to provide the location of your DBT project to %%sparksql.
Once a notebook cell’s language is set to SQL, the autocompletion of DBT Power User kicks in and you get the same benefits as when editing an .sql file. Note that if you install the sql-language-server in your VSCode, it will automatically switch the language to SQL when it detects a cell with a %%sparksql magic.
Remember that in VSCode autocompletion is triggered by <CTRL-SPACE>, rather than <TAB> as in JupyterLab.
For example, if we select the is_incremental suggestion shown above, we get the following code inserted into our notebook cell: the same behaviour as DBT Power User in an .sql model file.
The bottom line is that the %%sparksql magic works the same way in VSCode and JupyterLab notebooks. You can render DBT models into SQL, execute queries and view the results. VSCode notebooks are likewise a great way to prototype DBT models. Once you are satisfied with your model, you can copy it into an .sql model file.
In this article we showed how you can leverage %%sparksql to easily prototype templated Spark SQL in JupyterLab and VSCode. We focused on Spark; however, jupyterlab-sql-editor also includes a %%trino magic! More SQL engines might be added in the future. Contributions are welcome! Here’s our git repo: CybercentreCanada/jupyterlab-sql-editor.