As an analytics engineer on the professional services team at dbt Labs, I've helped a dozen or so data teams refactor from a legacy SQL transformation workflow to modular data modeling in dbt.

I've noticed a clear pattern: rare is the team that starts with a completely clean slate. More often than not, teams have accumulated years of SQL code powering business-critical, often complex data transformations.

These SQL transformations may run in a stored procedure, in an Airflow SQL operator, or in some other scripting environment. Regardless of where they run, that legacy SQL code must be migrated!

How can that migration be done with a) minimal headache and b) maximum quality?

Announcing a new (free!) on-demand course

Over the last 18 months, the rest of the dbt Labs professional services team and I have evolved a process to help teams migrate legacy SQL queries to dbt SQL.

And as of today, you can learn and implement that same refactoring process with a new on-demand course.

If you're completely new to dbt, you may want to start with the dbt Fundamentals course before diving into this one.

Everyone learns a bit differently, so you can pick up the course materials in a few different formats:

Our SQL refactoring process, in short

For a bit of background, the migration workflow that we follow internally is:

  1. Migrate your legacy SQL code, unchanged—make sure it runs and produces an identical output.
  2. Implement dbt sources (as opposed to raw table references).
  3. Choose a refactoring strategy.
  4. Implement clean CTEs (we ❤️ CTEs).
  5. Separate transformation logic into standardized data model layers (often the longest step).
  6. Audit the output (often using the audit_helper dbt package).
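To make steps 2 and 4 concrete, here's a minimal sketch of what a refactored model might look like. The source name, table, and columns (`jaffle_shop`, `customers`, etc.) are hypothetical stand-ins, not part of the course materials:

```sql
-- models/staging/stg_customers.sql

-- Step 2: reference the raw table through a dbt source,
-- not a hard-coded database.schema.table name
with source as (

    select * from {{ source('jaffle_shop', 'customers') }}

),

-- Step 4: each logical transformation lives in its own
-- clearly named CTE, so the model reads top to bottom
renamed as (

    select
        id as customer_id,
        first_name,
        last_name
    from source

)

select * from renamed
```

And for step 6, the audit_helper package's `compare_relations` macro can diff the legacy output against the refactored model row by row, along these lines (relation and column names again hypothetical):

```sql
-- analyses/audit_customers.sql
{{ audit_helper.compare_relations(
    a_relation=ref('legacy_customers'),
    b_relation=ref('stg_customers'),
    primary_key="customer_id"
) }}
```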

There is, of course, no one true way to refactor legacy SQL into dbt models!

I'd love to hear from you (on Twitter or in #learn-on-demand in dbt Community Slack) how your approach varies from the one we lay out in the course.