Every year on Fishtown Analytics’ birthday (July 1), I take a moment to reflect back on the past 12 months. After one year in business, I was relieved that Fishtown Analytics was still alive and showing early signs of promise. After two years, I was excited that our way of thinking about analytics was starting to see some traction. After three years, I was energized by the size, the cohesiveness, and the sheer goodwill of the dbt community.
This year has been something else. Year four saw dbt grow from a tool used by forward-thinking early adopters into a burgeoning standard in the modern analytics stack. As of last week, more than 2,100 companies were actively using dbt. Wow. dbt is no longer a niche product and dbt Slack is no longer a community of misfits...we’ve gone mainstream, folks.
This chart evokes a lot of feelings for me. Terror, for instance. As in: “holy shit there are a lot of people using the stuff we build, I hope we don’t let them down!” There is also pride: I genuinely believe that dbt and its community are helping data professionals achieve more personally and make a positive impact on their organizations. It feels good to be creating a force for good in the world! There is also a lot of excitement about the future: the growth of the community has allowed us to bring in resources that we can use to do amazing new things!
Amidst this swirl of emotions and the day-to-day triage of tasks, writing this post once a year forces me to stop and take stock. Where are we as a community? Where is dbt as a product? What does the future hold?
Here are some reflections.
Where is dbt as a product?
dbt Core is increasingly mature. The primary constructs it uses (SQL + YAML + Jinja, materializations, ref) aren’t going anywhere. This year, we’ve devoted engineering bandwidth to solving gnarly issues that have been around for a very long time but only manifested themselves in edge cases and latent user confusion. Things like having a clear context wherever you might want to write code, and the mess that was dbt_project.yml, have been revisited and (at long last) rationalized. We’re also pushing into more sophisticated operational use cases like blue-green deployments that give power users more options as they manage increasingly complex dbt footprints.
It feels good to be able to both clean up some of our historical inconsistencies and push further into advanced use cases. These are the kinds of things you can only do when the product is fundamentally working. We’re excited to continue to harden dbt in these ways.
Other aspects of both dbt Core and dbt Cloud are much more nascent, though.
- We’re in the very early days of improving the developer experience of writing dbt code. The dbt Cloud IDE is still in its infancy, and it is only one of the many ways we ultimately expect users to write dbt code, all of which we want to facilitate. This push towards improved developer experience remains critical as the community grows, since new users are often less tolerant of poor DX than early adopters were.
- dbt Docs is still a fledgling product that shows a tremendous amount of promise. Its usage is growing steadily, and despite how little we’ve invested in it in the two years since its launch, it continues to be one of the most widely loved parts of the dbt experience. We need to double down on dbt Docs. The analytics engineering workflow doesn’t stop with publishing a dataset. Users throughout the business need to know the context of that dataset in order to effectively discover, use, and trust it.
- Data quality! Gosh, there is so much exciting work happening in this space, and we believe it is only getting more critical as organizations’ data footprints expand. There are meaningful ways in which we believe dbt’s current testing functionality doesn’t give analytics engineers all of the tools that they need, and we care about closing that gap.
- We need to continue to improve the contributor experience. We’ve made it possible (finally!) to submit a PR to dbt Core and have it run through all CI checks without requiring any manual intervention from someone at Fishtown Analytics. Great! And we’ve grown our product and engineering organizations and so will have more bandwidth to work with contributors on PRs. Growing the contributor community around dbt Core is very important to us in the coming year.
Where are we as a community?
The dbt Slack community continues to be the largest, most engaged, and most authentic community of professionals working in the modern data stack. The group has 3x’ed since this time last year, from roughly 2,000 to roughly 6,000 humans around the world. That growth is fantastic! It means that the dbt viewpoint is reaching more people than ever, and it also means that we each benefit from the collective wisdom of an increasingly large, diverse group.
But growth comes with challenges. At a community size of 6,000, it’s impossible for any of us to know even a small percentage of the community personally, which means that the interpersonal bonds that characterized the community of 2016-2018 are more attenuated. It’s harder to encourage norms of behavior like reciprocity, trust, and goodwill without these personal connections. How do we scale the kind of goodwill and esprit de corps that any strong community requires to be self-sustaining?
The past year has seen us:
- ...create an onboarding process that encourages identifying yourself as a real human (one of our values) and forces participants to accept the community guidelines prior to joining.
- ...embrace a city-first community strategy that encourages local communities inside the bigger, global community. We believe that community is formed first and foremost via a group of close-knit personal relationships that are best formed at the local level.
- ...invest in documentation. We switched to Docusaurus, which enabled us to finally build out a reference section, improve how our docs are structured, and make it easier for community members to contribute changes (over 30 community contributions to dbt documentation to date!).
- ...invest in learning. We ramped up dbt Learn, teaching the analytics engineering workflow to over 200 people. This is the next generation of dbt power users and we love seeing these folks show up in dbt Slack and start confidently answering questions.
We’re taking the “scaling the community while keeping it great” challenge very seriously, and still have a lot of work to do. In the coming year, we are planning to moderate the growth of the dbt Slack group in order to maintain it as a productive space for idea exchange. At the same time, we’ll be thinking about other ways beyond Slack for the community to take shape that may scale more effectively for specific use cases. Sometimes the best way to create an effective conversation is not to invite everyone to a dinner party and suggest they all talk at once :)
What does the future hold?
Both dbt and the analytics engineering workflow are a product of the modern data stack. And that entire stack is going mainstream, fast. Four years ago:
- ...Snowflake was not yet a mature platform nor was it widely deployed. Today, Snowflake has filed to go public at a valuation of $20B and has the product and customer base to match.
- ...Fivetran and Stitch had a combined headcount of < 50 people and a combined connector count of maybe 25. Today, Fivetran is worth > $1B and has 340 employees and over 100 connectors. (Data from Stitch is hard to get since the Talend acquisition.)
Wow! Four years ago we knew that this world was coming, but as of today it’s clear that companies from large to small now have (or could have!) access to all of their data in a high-performance, modern data warehouse. dbt’s entire view of the world assumes that this is true, and the traction dbt is seeing is highly correlated with more and more companies entering this modern ecosystem.
As data warehouse technology and data ingestion technology have matured, the primary front for innovation has changed during this same four-year period. Four years ago:
- ...almost no one was thinking about data discovery and cataloging; today most hyperscale tech companies have built internal solutions and the rest of the market is waking up to the problem as well.
- ...data quality was an issue that few teams had time to focus on. Today, it’s a major focus for forward-thinking data engineering teams and there is more attention than ever on tooling and methodology.
- ...a crop of innovators in BI tooling seemed ascendant. Today, most of these companies (led by Looker) have been acquired and innovation from that generation of tools has slowed. Instead, we are starting to see innovation in both open source BI and in novel UI paradigms.
Sitting back and processing all of this, I’m left with only one thought: it’s a great time to be an analyst. Data analysts are leveling up, and this trend is being driven by improvement in tools and corresponding improvement in workflows.
Ten years ago I spent my time solving dumb technical problems with my shitty tools or repeating the same work I had done a month prior. Today, when I still get to put on my analyst hat, I get to think about how to design great metrics, or implement processes to improve data quality, or push self-service to hundreds of data consumers. I’m more valuable to the organizations that I touch than I was a decade ago, and working with data is just more fun. Hopefully you agree.
Thanks for a great year four. I remain incredibly excited about working together to invent the future.
Speaking of which—what changes do you think the next four years will bring? If you have any thoughts, DM me in Slack.
See you in there :)