At Fishtown Analytics we believe that analysts should function more like software engineers — building and maintaining code that can be used across dbt projects in packages is one way we enable this.
Source data from third parties typically comes through an ETL tool like Fivetran or Stitch in the same structure for all companies. This means that once one analyst has done the work to model the data for that source, that code can (and should!) be shared with other analysts.
TL;DR: want to go from syncing new data to having models built in minutes? Packages are going to be your friend.
A Walkthrough: Installing the Zendesk Package
Once you have connected Zendesk in your ETL tool of choice, allow some time for the data to replicate (possibly a day if there is a lot of data). Once you see the data in your data warehouse, you can implement the package.
Configuring your dbt_project.yml file
- Open your dbt project in your text editor and go to the
- In the GitHub repository for the package you are looking to install, click on `dbt_project.yml` (here’s the one for Zendesk).
- Copy the “Zendesk” model information from the bottom section. If you already have `models` specified in your project, you do not need to include it again; simply add the Zendesk section and the rows below it (highlighted in red below). Note: the spacing has to match the existing fields in your `dbt_project.yml` file. YAML is very sensitive to indentation, so if you get any errors after this step, check your spacing!
- The “vars” (variables) tell the package where in your data warehouse to find the relevant data. Remove the `# schema.table` placeholder and fill in your own tables (e.g.
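As a rough sketch of what the finished section of `dbt_project.yml` might look like: the project name `my_project`, the variable names, and the `schema.table` values below are hypothetical placeholders, not taken from the Zendesk package. Use the variable names that appear in the package’s own `dbt_project.yml`.

```yaml
# dbt_project.yml (excerpt): illustrative sketch only
models:
  my_project:            # hypothetical: your existing project's models stay as they are
    materialized: view

  zendesk:               # section copied from the package's dbt_project.yml
    enabled: true
    vars:
      # Hypothetical variable names and values. Keep the names the package
      # defines, and replace each `# schema.table` placeholder with the
      # schema and table your ETL tool loaded.
      tickets_table: zendesk.tickets
      users_table: zendesk.users
```

Note that `zendesk` sits at the same indentation level as the existing project entry under `models`; a mismatch here is the most likely cause of a YAML parsing error.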
Pulling from the package repository
- Now that you have told the package where to get the data, you need to tell your dbt project where to pull the models from to build the package.
- For dbt versions 0.10.0 and after: Add a `packages.yml` file to your project. Use the format listed below to paste the https URL from the package repository, with the release or branch listed as the revision.
- For dbt versions prior to 0.10.0: Add a “repositories” section at the bottom of your `dbt_project.yml`. Beneath it, copy and paste the https URL from the package repository. For our example, that’s https://github.com/fishtown-analytics/zendesk.git.
- You may notice that the above images have either `revision: 0.1.0` or “@0.1.0” appended to the URL we copied from the package. This references a specific release of the package. As package authors make changes and improvements to a package, it’s generally better for you to manually upgrade to these new versions instead of upgrading automatically. New versions could introduce incompatibilities with the rest of your project, so testing these upgrades by hand is recommended.
- To figure out what the latest version is when you install, click on “release”.
- This will bring you to a list of all releases. In Zendesk’s case, there is currently only one release, but if there were multiple, the latest release would be listed at the top of the page.
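In text form, the two variants described above might look roughly like the following. This is a minimal sketch using the URL and the 0.1.0 release from this walkthrough; check the package’s README for the exact form it recommends.

```yaml
# packages.yml (dbt 0.10.0 and later)
packages:
  - git: "https://github.com/fishtown-analytics/zendesk.git"
    revision: 0.1.0   # release tag or branch to pin to

# For dbt versions prior to 0.10.0, the equivalent goes at the bottom of
# dbt_project.yml instead:
#
# repositories:
#   - "https://github.com/fishtown-analytics/zendesk.git@0.1.0"
```

Pinning to a release rather than a branch keeps the package code from changing underneath you until you choose to upgrade.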
OK, so now your project knows where to look for the models, and the models know which tables to build from. Now you need to actually download the code so that dbt can run it.
- From the command line, run `dbt deps`. This will fetch the code from the URL you specified.
- Then do a `dbt run` to build the models.
Can I have multiple packages in my project?
Yes! Just follow the above steps and add additional URLs.
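For instance, a `packages.yml` with two packages might look like the sketch below. The second URL and its revision are hypothetical placeholders, not a real package.

```yaml
# packages.yml: one list entry per package
packages:
  - git: "https://github.com/fishtown-analytics/zendesk.git"
    revision: 0.1.0
  - git: "https://github.com/your-org/another-package.git"   # hypothetical URL
    revision: 0.1.0                                           # hypothetical release
```

Run `dbt deps` again after adding an entry so the new package is downloaded.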
Wait — I’m not seeing the models in my project where they normally would be…
All downloaded packages show up in a gitignored folder called
Can I make changes to the models for customizations?
You can’t directly customize the code within your `dbt_modules` folder, because the next time you run `dbt deps` all of your changes will be overwritten. But it’s very common to want to modify or build on top of code in a package! Here are some options for how to do that:
- Build on top of the models in the package. Everything from the package you referenced is available in your graph, so just call `ref()` and start building like you would on top of any other model!
- Override a model in the package with one that you build yourself. Create a model in your local project with the same name as one from the package, then set `enabled: false` for the model in the package.
- Best of all: submit a PR to the repository to get your change added to the package!
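As a sketch of the second option, disabling a single package model in your own `dbt_project.yml` could look something like this. The model name `zendesk_tickets` is hypothetical; substitute the name (and any folder path) of the model you are actually overriding.

```yaml
# dbt_project.yml (excerpt)
models:
  zendesk:
    zendesk_tickets:     # hypothetical model name from the package
      enabled: false     # the package's version is skipped
```

Your replacement model, saved under the same name in your local models folder, is then the one that other models pick up through `ref()`.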
How do I know when updates have come out to a package I use and how do I upgrade?
You should “watch” any package repository that you add so that GitHub notifies you when new releases come out (there is a red box around the “watch” button in the above image). When a new release comes out, read the release notes to understand what has changed (maybe it isn’t something you want!), then update the “@0.1.0” pin to the newest version.