Many folks may not know it, but George (Fivetran’s Co-Founder and CEO) and I go way back. We started out as mortal enemies when I was helping to launch Stitch while Fivetran was in its still-nascent days back in 2015 😄 Since I founded Fishtown Analytics in 2016, though, we’ve been close partners and have chatted several times a year. I still remember the days when the Fivetran office was so small that I couldn’t avoid also getting Taylor in frame when we were Zooming! And of course, as of April 2020 we’re now both A16Z portfolio companies and share a board member.
We figured that, with Fivetran’s big launch of its dbt integration, we’d do a bit of an unusual blog post: a Slack conversation rather than a monologue. The topic? Why we both believe that Fivetran’s launch of a dbt scheduler is a great thing for the long-term health of the dbt ecosystem, and why we at Fishtown Analytics are excited about it.
Let’s dive in.
George: Thanks for inviting me onto the dbt blog to talk about our new product that is directly competitive with Fishtown’s commercial software offering! What a wacky world open source is 😄
Tristan: Hah! My pleasure. I love that you went right there, because it’s something that’s on a lot of folks’ minds. I know that you and I don’t feel that way at all, but there was a great thread in dbt Slack recently that voiced exactly this sentiment. I’m sure there are others out there thinking the same thing.
George: Yeah, for sure. And it’s a legitimate thing to wonder about. So let me start by saying that Fivetran is not fundamentally a company that is focused on data transformation. Our mission is to make access to data as simple and reliable as electricity*. We provide the bottom layer of the modern data stack. The vast majority of our product innovation focuses on growing our number of connectors, forever increasing their reliability, and forever decreasing their latency. The question “how do I get my data from point A to point B reliably, quickly, and with as little effort as possible?” is a shockingly large problem, and we plan to stay laser-focused on it for a very long time.
Tristan: I love that. And just to be clear, it’s the existence of tools like Fivetran that enabled us to build dbt in the first place. If Fivetran didn’t exist, the utility of dbt would be far lower.
We’re incredibly aligned on the fundamental shift from ETL to ELT that’s going on in the market: customers should load their data into a high-performance, cloud-based datastore in its most granular form and then transform it using the highly scalable resources of that datastore. It’s this separation of extraction and loading from transformation that has always made our products such strong complements.
George: Absolutely. But this really starts getting at the heart of why it was important to us to have our own native dbt integration: many companies haven’t made this shift yet. We talk to companies every day who currently use a single vendor for the E, the T, and the L. If Fivetran didn’t offer our own data transformation functionality, we’d be at a disadvantage when selling to companies who look at the world in this way.
The way we see it, officially adopting dbt as a standard for data transformation lets us leverage the strength of (and invest in!) the product and the open source community instead of attempting to compete with it. We truly are making a bet on the long-term future of dbt, both as a product and as a community. And of course, that also implies a strong forward-looking bet on Fishtown Analytics as the maintainer of both.
Tristan: Aw shucks 😳
…no, seriously, I appreciate that! Fivetran has made a meaningful commitment and I see a tremendous amount of potential in pushing the modern data stack towards an open source standard for expressing data transformation workloads. Analysts spend literally years of their lives authoring dbt code, and making sure that they can take these skills with them to future jobs is hugely valuable for the community.
We’re especially excited about the work that y’all have done with the dbt packages that you’ve built for Fivetran connectors. 24 distinct packages today! That’s awesome.
George: Yeah! You may not realize this, but we’ve spent way more human hours building out our open source dbt packages than we’ve spent building our native dbt job scheduler. We now have two full-timers dedicated to this effort, so expect to see more and more coverage in the coming months.
Tristan: Nice. Is the goal for there to be a package for every connector?
George: Yes, for every connector that delivers a predictable schema. We consider these dbt packages to be the "second layer" of Fivetran, which will ultimately be just as valuable as the connectors.
Tristan: Love that.
Taking a turn: it’s great that we’re both so positive on dbt and its community, but how do you see the commercial relationship evolving between Fivetran and dbt Cloud?
George: It's important for Fivetran to offer a great transformation solution to our users "out of the box" when they set up Fivetran, and dbt orchestration is going to be an important part of our product that we will continue to develop over time. However, we expect that dbt Cloud will always be the premier dbt experience, and we're perfectly happy to see customers start with Fivetran dbt Transformation and later upgrade to dbt Cloud. And of course, we’re also very happy to see customers who need the more advanced features of dbt Cloud go straight there if that’s what makes the most sense for them.
Tristan: That 100% makes sense, and I appreciate your willingness to go on the record saying that—it certainly makes this collaboration so much more straightforward.
Can we talk for a sec about where you think the ecosystem is going? I think you and I both believe that over the next several years a bunch of companies will likely incorporate some version of dbt job scheduling into their products. Is that right?
George: Absolutely. dbt is complementary to each layer of the modern data stack and it’s hard to imagine that some of the cloud providers and other ecosystem players won’t offer some level of dbt functionality inside their own products. We won’t be the only ones to see the value in doing this.
I think this kind of story has played out in very negative, value-destroying ways in the past (think: Cloudera/Hortonworks, MongoDB/DocumentDB). IMO we have an opportunity to get it right this time in this ecosystem and make collaboration between vendors actually win-win.
Tristan: We actually thought about this a lot in the early days. Our goal has always been to build dbt into an open source standard for how data transformation workloads are expressed, so we fully anticipated the question, “How do you compete with other vendors hosting your open source product?”
Other open source products have responded to this challenge by migrating to non-OSS licenses. We considered doing the same, but it never sat right with us. Instead, we opted for a different approach. We decided that dbt Cloud does (and charges for) two things:
- Harden the platform
dbt Cloud gives companies of all sizes access to an operational environment for dbt Core that would be hard to replicate themselves. This means: distributed, fault-tolerant, well-monitored, and highly reliable. These characteristics are costly and hard to achieve, and when you’re running critical workflows they’re extremely valuable. We back them up with guaranteed SLAs.
- Innovate on top of the platform
dbt Cloud provides brand new user interfaces that make dbt both more accessible and more powerful. This includes development experiences like the dbt Cloud IDE, slim CI, and a soon-to-be-launched metadata API we’re calling Codex. These features are all built on top of and leverage dbt Core, but they greatly extend its reach and usefulness.
Hardening the platform is a fairly standard approach for OSS maintainers, but innovating on top of it is much less common. We feel that this “innovate on top of the platform” strategy has the potential to create a lot more value in the ecosystem and to more closely align the interests of many vendors. It’s why we fundamentally see Fivetran’s launch here as a good thing: ultimately, we aren’t trying to sell a hosted dbt scheduler... we’re trying to sell brand new user experiences that are additive to dbt.
George: In fact, we’re already brainstorming on ways that Fivetran and dbt Cloud could be directly integrated, further strengthening this “value added” story!
Tristan: Hah, yes! It’s too early to share more now, but this is an area I know we both want to spend more time on :D
Just to complete the story: we’re very early on in the product lifecycle of dbt Cloud overall—we had a grand total of four engineers at the start of 2020! But we’re growing the team quickly: we’re at 13 engineers today and are on pace to have more than 30 by EOY 2021. You’ll start to see more and more rapid launches from us in the coming months.
George: You obviously know your product way better than I do, so I wonder if you could do a quick brain dump. Let’s say I’m evaluating the two products against each other today—what are the differences in capabilities?
Tristan: Yeah! Give me a minute on this and I’ll ping you on Slack when I’m done…
Ok... that’s what I have. I messaged some folks on your team to make sure I had the right info in there, so I think we should be good to go. I hope you don’t feel like I’m piling on…!
George: Hah, no, that’s fair, that was exactly the point I was trying to make. I think this chart speaks louder than my high-level assertions from before: Fivetran is primarily focused on data movement, not data transformation. We’re incredibly excited about dbt as a solution to data transformation for our customers specifically because it allows us to focus on what we’re already good at and leverage all of the work done by dbt and the dbt community as an accelerator.
Will we build some of the things in your spreadsheet over time? Absolutely. Is it a priority for us to close this gap? Not at all.
One last thing before we put this already-long post out into the world. Something I’m always telling folks is that the ecosystem we all operate in today is absolutely tiny relative to the size it will be in, say, ten years. There is just so much data at rest that needs to be moved. There is so much computation that needs to happen on top of it to make it useful and actionable. Commercially, the dollars today just pale in comparison to the dollars in the future, as all of us in the ecosystem grow the pie together. You can see this promise of future growth in the explosive multiple Snowflake commanded at its IPO: everyone realizes there is a tremendous amount of demand for what we’re jointly building.
Where we sit today, getting lost in “what features product A has vs. product B” just isn’t the point. This release, for us, was about long-term alignment with the dbt community.
Tristan: Love it. Thanks for taking the time to hang out 🙂