Today I’m incredibly excited to announce the launch of an Integrated Development Environment (IDE) for dbt. With the IDE, you can build, run, test, and version control dbt projects from your browser. There’s no wrestling with pip, homebrew, hidden files in your home directory, or coordinating upgrades across large teams. Instead, you can orchestrate the entire analytics engineering workflow in your browser.
In this post I want to talk about the thinking behind this feature. Why build an IDE at all? The answer to this question speaks to our foundational beliefs about analytics engineering and the long-term future for our industry.
The central dysfunction of data
In the very early days of Fishtown Analytics, I was contacted by the CFO of a Fortune 100 business. Given that we had exactly four people on the team at that point and had worked with exactly zero companies with revenue > $1B, it was a little outside of our area of expertise and I told him as much. But we ended up setting up a meeting where he, the CTO, and their head of data engineering hung out for an entire afternoon and talked data.
I’ll never forget that experience. It was so perplexing to me. The company had built a phenomenal set of data pipelines, all loading data into a single data lake, with Elasticsearch (??) on top. This was a huge investment in data engineering, but strangely, no one in the entire business had actually used the data to do anything. They were stuck and hoping to get our help to know what to do next.
I remember asking “Well, have you written any queries on the data yet?” It seemed like such a stupid question, almost insulting…hey you idiots, try actually writing a fucking query! I remember feeling a flush of embarrassment as I asked it, because I thought either they were going to look dumb or I was going to look dumb.
The answer I got back was “We don’t have anyone who can do that.” In this global enterprise, you don’t have anyone who can write SQL?? I just couldn’t grok this. As our meeting wrapped up, the head of data engineering walked me to his office to show me a whiteboard full of diagrams of their data architecture. On the way, we passed rows and rows of cubicles. In each one of them, a single monitor with a single Excel sheet stared back at me. There was an entire matrix of humans whose job was information processing, but these humans couldn’t process the data that this engineer was serving up.
Why was this the case? And how was this situation allowed to persist!?
Data projects fail at the chasm between IT and the business
This data project was a failure because it couldn’t cross the chasm from IT to the business. The technologists didn’t know how to make their data infrastructure consumable to the hordes of Excel users, and it languished.
This is just one way data projects can fail. There are a bunch more, and each of them draws its origin story from the chasm between IT and the business.
- Crossing the chasm from business to IT. Projects initiated by the business with no support from technology typically see traction on the business side for a little while but run into data quality / reliability / scalability issues as their underlying technical issues emerge. Technical teams don’t want to touch these systems, which slowly die out as users grow disillusioned.
- Data bread lines. Lloyd at Looker uses the term data bread lines to describe when data consumers have to queue up to get support from the technology team before they can perform certain analyses. This delay absolutely crushes analytical velocity and buy-in from data consumers.
- Reporting without insight. Ticket-based reporting requests made by business users and fulfilled by technologists provide only a patina of data, not real insight. The people with knowledge about the business are not involved in the exploratory process, which is where all of the discoveries are to be had. You’ll get KPIs but no “ah-ha” moments.
Every single one of these failure modes is pointing at the same underlying challenge: building a bridge between people who understand the business and people who understand data technology. This inability to bridge the divide is the core dysfunction in modern data, and it’s also the biggest opportunity for progress.
Why is this gap so wide?
It didn’t use to be.
In a prior generation of data, a single person could acquire all of the relevant skills to become an expert, both in a domain area and in the relevant data technology. Data wasn't that big or diverse, and people typically acquired the necessary data skills (Excel, simple SQL, SAS) as their problems required. The data was small in size and scope, but the process worked fine. Just grab a CSV and get to work.
This broke down with the advent of the modern data stack. Today, the possibilities for analysis have grown dramatically, but the breadth and depth of skills required have also grown. This means that it’s no longer enough for a single domain expert to sit down at a computer and geek out for a while to get an answer. Getting answers to questions in the modern data ecosystem requires teams working together in lockstep, using a well-thought-out workflow, with tooling built to enable this collaboration.
This is why we exist
Since the very outset of Fishtown Analytics, we've been building the process, the tooling, and the community needed to bridge this gap in modern organizations.
We championed a new function on data engineering teams—Analytics Engineering—specifically focused on collaborative knowledge creation. We've built the core tool in the analytics engineering toolkit, dbt. And we've built a vibrant community of analytics engineers who push the boundaries of the practice.
I want to take just a second to say a bit more about each of these things and how they help solve data's core dysfunction.
We decided years ago that the thing that we were doing was not data engineering, it was something unique. We had already spent thousands of hours consulting with clients and knew just how different the work we were doing was from the work being done by data engineers.
Analytics engineering is fundamentally about building a bridge between technology and the business. Analytics engineers are knowledge specialists: like librarians, they curate an organization’s knowledge. They acquire and codify new knowledge, make sure that all knowledge is reliable and current, and train users on how to access that knowledge.
These are the humans that we’ve been missing on our data projects, and this is the practice that we need to perfect if we’re to solve the core dysfunction of data. With a mature analytics engineering practice, knowledge is built up incrementally by many people in many small pieces.
Analytics engineering is practiced by people on the boundary between technology and the business, and so its primary tool must be both simple to pick up and incredibly deep. This has been a design goal for dbt from the outset.
While dbt’s heritage as a command-line tool has been empowering for a large group of users, it’s also acted as a barrier to entry to many others. Learning the command line shouldn’t be a prerequisite to building a mature analytics engineering function, nor should you have to learn to manage Python virtual environments on Windows.
And while source control is a critical part of the analytics engineering workflow, new analytics engineers don’t need to be dropped right into the deep end. Often, a simple branching flow is more than adequate and can be accomplished in a straightforward UI.
We want as many people to practice this nascent data function as possible, and this release today is a step in that direction. With the launch of the dbt IDE (and its inclusion in the forever-free Developer Plan), we’re lowering the barrier to entry to creating and disseminating organizational knowledge.
The analytics engineering community
Analytics engineering is something that we’re all figuring out together in real time. dbt Slack is the world’s single largest collective of analytics engineers, and the conversation in those channels is how we learn from each other, how we find and train new analytics engineers, and how we invent the state of the art together.
More to come
Our focus in 2020 is to expand the group of people practicing analytics engineering. The IDE is the first major step in doing that and you’re going to continue to see improvements to the developer experience over the coming months.
We’re also going to expand this group by significantly ramping up our training program (dbt Learn), holding our very first global analytics engineering conference (more on this soon!), and continuing to support the growing community around dbt.
I hope this sounds like something you’d like to be a part of. If you have any thoughts you’d like to share, find me on dbt Slack; I’m @tristan.
For more details on the IDE features, read: Announcing the dbt IDE