Blog Refactoring legacy SQL to dbt

Refactoring legacy SQL to dbt

Christine Berger

Oct 13, 2021

As an analytics engineer on the professional services team at dbt Labs, I've helped a dozen or so data teams refactor from a legacy SQL transformation workflow to modular data modeling in dbt.

I've noticed a clear pattern: rare is the team that starts with a completely clean slate. More often than not, teams have accumulated years of SQL code powering business-critical, often complex data transformations.

These SQL transformations may run in a stored procedure, in an Airflow SQL operator, or some other scripting environment---but regardless of where it runs, that legacy SQL code must be migrated!

How can that migration be done with a) minimal headache and b) maximum quality?

Announcing a new (free!) on-demand course

Over the last 18 months, the dbt Labs professional services team has evolved a process to help teams migrate legacy SQL queries to dbt SQL.

And as of today, you can now learn + implement that same refactoring process, with a new on-demand course.

If you're completely new to dbt, you may want to start with the dbt Fundamentals course before diving into this one.

All the links

Everyone learns a bit differently, so you can pick up the course materials in a few different formats:

The video course, Refactoring SQL for Modularity, which includes a practice query + dataset to test your refactoring skills.
A pull request in a GitHub repo, where you can follow commit-by-commit how a monolithic query (as from a stored procedure) breaks down modular dbt code (note: these commits are covered in-depth in the video course).
A written tutorial on legacy SQL -> dbt SQL refactoring, in case you prefer reading or bookmarking the short version.

Our SQL refactoring process, in short

For a bit of background, the migration workflow that we follow internally is:

Migrate your legacy SQL code, unchanged---make sure it runs and produces an identical output.
Implement dbt sources (as opposed to raw table references).
Choose a refactoring strategy.
Implement clean CTEs (we ❤️ CTEs).
Separate transformation logic into standardized data model layers (often the longest step).
Audit the output (often using the audit_helper dbt package).

There is, of course, no one true way to refactor legacy SQL into dbt models!

I'd love to hear from you (on Twitter or in #learn-on-demand in dbt Community Slack) how your approach varies from the one we lay out in the course.

Last modified on: Feb 27, 2024

Accelerate speed to insight

Democratize data responsibly

Build trust in data across business

Book a demo

Achieve a 194% ROI with dbt Cloud. Access the Total Economic Impact™️ study to learn how. Download now ›

Refactoring legacy SQL to dbt

Announcing a new (free!) on-demand course

All the links

Our SQL refactoring process, in short

Recent Posts