
Create a Local dbt Project


Photo by Daniel K Cheung on Unsplash

dbt (data build tool) is one of the most popular technologies in the data engineering and analytics space. Recently, I've been working on a task that performs some post-processing over dbt artifacts and wanted to write up some tests. In order to do so, I had to create an example project that could run locally (or in a Docker container), so that I wouldn't have to interact with the actual Data Warehouse.

In this article we will go through a step-by-step process you can follow in order to create a dbt project and connect it to a containerized Postgres instance. You can use such projects for testing purposes, or even for experimenting with dbt itself in order to try out features or practise your skills.

Step 1: Create a dbt project

We will be populating some data in a Postgres database, therefore we first need to install the dbt Postgres adapter from PyPI:

pip install dbt-postgres==1.3.1

Note that the command will also install the dbt-core package as well as other dependencies that are required for running dbt.
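You can quickly verify that the installation worked by checking the version reported by the command-line tool:

dbt --version

The output should list dbt-core as well as the postgres plugin, both at version 1.3.1.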

Now let's go ahead and create a dbt project. To do so, we can initialise a new dbt project by running the dbt init command in the terminal:

dbt init test_dbt_project

You'll then be prompted to select which database you would like to use (depending on the adapters you have installed locally, you may see different options):

16:43:08  Running with dbt=1.3.1
Which database would you like to use?
[1] postgres

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

Enter a number: 1

Make sure to enter the number that corresponds to the Postgres adapter, as shown in the output list. The init command should now have created the following basic structure in the directory where you executed it:

dbt project structure created by the `dbt init` command (Source: Author)
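Since the screenshot may not render here, the generated layout looks roughly like the following (exact contents can vary slightly between dbt versions):

test_dbt_project
├── README.md
├── analyses
├── dbt_project.yml
├── macros
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
├── seeds
├── snapshots
└── tests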

Step 2: Create a Docker Compose file

Now let's create a docker-compose.yml file (place the file at the same level as the test_dbt_project directory) in which we will specify two services: one corresponding to a ready-made Postgres image, and a second for a dbt image that we will define in a Dockerfile in the next step:

version: "3.9"

services:
  postgres:
    container_name: postgres
    image: frantiseks/postgres-sakila
    ports:
      - '5432:5432'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dbt:
    container_name: dbt
    build: .
    image: dbt-dummy
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy

As you can tell, for the Postgres container we will be using an image called frantiseks/postgres-sakila, which is publicly available on Docker Hub. This image populates the Sakila database on the Postgres instance. The database models a DVD rental store and consists of several normalised tables corresponding to entities such as films, actors, customers and payments. In the next few sections we'll make use of this data in order to build some example dbt data models.

The second service, called dbt, will be the one that creates an environment where we will build our data models. Note that we mount the current directory into the Docker container. This lets the container pick up any changes we make to the data models without having to re-build the image. Additionally, any metadata generated by dbt commands (such as manifest.json) will appear instantly on the host machine.
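For example, once the dbt container has built the models, the artifacts should be visible directly on the host under the project's target directory (a sketch; this assumes the project directory on the host matches the WORKDIR set in the Dockerfile of the next step):

# On the host machine, after dbt has run inside the container
ls dbt_project/target/
# should include manifest.json, run_results.json and other dbt artifacts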

Step 3: Create a Dockerfile

Now let's specify a Dockerfile that will be used to build an image, on top of which the running container will then build the models specified in our example dbt project.

FROM python:3.10-slim-buster

RUN apt-get update \
  && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt/dbt_project

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.3.1

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir profiles && sleep infinity

Note that in the final CMD command, we deliberately added an extra && sleep infinity command so that the container won't exit after running the steps specified in the Dockerfile; this allows us to access the container afterwards and run additional dbt commands (if needed).
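As a side note, if you'd rather run one-off commands than keep a long-lived container around, Docker Compose lets you override the command at run time (a sketch based on the service names defined earlier):

# Spin up a fresh dbt container, run the tests, then remove the container
docker-compose run --rm dbt dbt test --profiles-dir profiles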

Step 4: Create a dbt profile for the Postgres database

Now that we have created the required infrastructure to bring up a Postgres database on our host machine, populate it with some dummy data, and build an image for our dbt environment, let's focus on the dbt side.

We first have to create a dbt profile that will be used when interacting with the target Postgres database. Within the test_dbt_project directory, create another directory called profiles, and inside it a file called profiles.yml with the following content:

test_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1
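Once the containers are up (Step 6), you can confirm that this profile resolves correctly and that the database is reachable by running dbt's built-in connection check from within the dbt container:

dbt debug --profiles-dir profiles

Also note that the profile name (test_profile here) must match the profile key declared in the project's dbt_project.yml file.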

Step 5: Define some data models

The next step is to create some data models based on the Sakila data populated by the Postgres container. If you are planning to use this project for testing purposes, I would advise creating at least one seed, one model and one snapshot (with tests if possible), so that you have full coverage of all dbt entities (macros excluded).

I've created some data models, seeds and snapshots already; you can access them in this repository.
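To give a flavour of what these look like, here's a minimal sketch of a staging model over Sakila's payment table (illustrative only; it assumes a source named sakila has been declared for the raw tables, and the actual models in the repository may differ):

-- models/staging/stg_payment.sql
SELECT
    payment_id,
    customer_id,
    staff_id,
    rental_id,
    amount,
    payment_date
FROM {{ source('sakila', 'payment') }}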

Step 6: Run the Docker containers

We now have everything we need in order to spin up the two Docker containers we specified in the docker-compose.yml file earlier, and build the data models defined in our example dbt project.

First, let's build the images:

docker-compose construct

And now let's spin up the running containers:

docker-compose up

This command should initialise a Postgres database populated with the Sakila data, and create the dbt models specified. For now, let's make sure you have two running containers:

docker ps

should give an output that includes one container named dbt and another one named postgres.

Step 7: Query the models on the Postgres database

In order to access the Postgres container, you'll first need to infer the container id:

docker ps

And then run:

docker exec -it <container-id> /bin/bash

We will then need to use psql, a command-line interface that gives us access to the Postgres instance:

psql -U postgres
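Once inside psql, it can be useful to list the relations dbt has created before querying them: the \dt meta-command lists tables and \dv lists views.

\dt
\dv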

If you have used the data models I've shared in the previous sections, you can now query each of the models created on Postgres using the queries below.

-- Query seed tables
SELECT * FROM customer_base;

-- Query staging views
SELECT * FROM stg_payment;

-- Query intermediate views
SELECT * FROM int_customers_per_store;
SELECT * FROM int_revenue_by_date;

-- Query mart tables
SELECT * FROM cumulative_revenue;

-- Query snapshot tables
SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot;

Step 8: Creating additional or modifying existing models

As mentioned already, the Dockerfile and docker-compose.yml files were written in such a way that the dbt container remains up and running. Therefore, whenever you modify or create data models, you can still use that container to re-build seeds, models, snapshots and/or tests.

To do so, first infer the container id of the dbt container:

docker ps

Then enter the running container by running:

docker exec -it <container-id> /bin/bash

And finally run any dbt command you wish, depending on the modifications you've made to the example dbt project. Here's a quick reference of the most commonly used commands for these purposes:

# Install dbt deps
dbt deps

# Build seeds
dbt seed --profiles-dir profiles

# Build data models
dbt run --profiles-dir profiles

# Build snapshots
dbt snapshot --profiles-dir profiles

# Run tests
dbt test --profiles-dir profiles
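Alternatively, the dbt build command (the one used in the Dockerfile's CMD earlier) performs all of the above in a single invocation:

# Build seeds, models and snapshots, and run tests, in one go
dbt build --profiles-dir profiles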

Get the full code of this tutorial

I've created a GitHub repository called dbt-dummy that contains all the pieces you need in order to quickly create a containerized dbt project that uses Postgres. You can access it in the link below.

This project is also available in the example projects section of the official dbt documentation!

Final Thoughts

In today's tutorial we went through a step-by-step process for creating a dbt project on a local machine using Docker. We built two images, one for the Postgres database that also populates the Sakila data, and another one for our dbt environment.

It's important to be able to quickly build example projects with data build tool that can then be used as a testing environment, or even as a playground for experimenting and learning.

Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You'll also get full access to every story on Medium.
