The bug funnel isn't a practice or a tool; it's a perspective that applies across a range of development tasks. The concept is a drastic oversimplification of what Steve McConnell calls the "Cone of Uncertainty" in Software Estimation: Demystifying the Black Art (we'd both be well served if you paused here and read it before returning to my storytelling - it's one of the few books I've seen multiple startups go out of their way to buy for their Tech Leads.)

The main point is that the cost of fixing a bug increases dramatically with its "distance" from the point where the bug was introduced. This directly implies that it's worth spending money early in the process, so you're not forced to spend even more further down the line. Even calling it "down the line" reveals that this is an assembly-line manufacturing metaphor, and you should be wary of any analogy between physical systems and software systems, so let's look at some actual software situations to make the case more clearly. These are organized by increasing time to resolve and number of parties involved.

Seconds to Minutes, One person
A developer mis-types a keyword and their IDE highlights it, the developer fixes it directly, ideally without even breaking their train of thought.
Minutes to Hours, One person
A developer uses an obsolete API and a quick-check pass (linter[1], unit tests) flags it immediately after they've written that chunk of code, before they've moved on to the next piece.[2]
Hours to Days, Several people
A low-level feature gets implemented, with tests, and merged; then, when another team goes to use the feature, it turns out that something about the design was unsuitable.[3]
Days to Weeks, Many people
A developer makes a simple change to something that interacts with specialized hardware they don't have available for testing, and the bug isn't found until much later, by whole-system testing from the dedicated QA team - leading to an open issue long after the developer has moved on from the change itself. At this point it has also derailed the QA team, and possibly the release engineering team if your QA team is under-resourced and only running pre-shipment approval tests - and yet, you still consider yourself lucky that you caught it before it shipped to anyone.
Weeks to Months, Many people and Multiple companies
A third-party dependency update introduces an incompatibility that isn't noticed in QA but causes a bad interaction or a crash on a customer network. Even identifying the problem here is very expensive, both because of the distance between cause and effect, and because the problem surfaces on a customer system - causing commercial embarrassment and being harder to diagnose in the field than in the lab. This is the level where, if you (or your customer) are interesting enough, there will be public post-mortem articles, possibly hostile ones...
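The "quick-check pass" from the minutes-to-hours level above is the easiest one to automate, for example as a git pre-commit hook. Here's a minimal sketch in POSIX shell; the check commands are placeholders (shown as `true`) standing in for whatever linter and fast unit-test runner your project actually uses:

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit quick-check pass.
# The commands are placeholders: substitute your own linter
# and a fast unit-test run.
set -e

run_check() {
    # Run one named check; with set -e, a failure aborts the commit.
    name=$1
    shift
    echo "checking: $name"
    "$@"
}

run_check "lint"       true   # placeholder for your linter
run_check "unit tests" true   # placeholder for your fast test suite
echo "quick checks passed"
```

Because `set -e` is active, the first failing check stops the hook with a non-zero exit status and git refuses the commit, so the bug never leaves the developer's machine.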

There are ways to mitigate these individually - better field introspection tools to make the customer-visible case less expensive to resolve, for example, and there is always a market for developer tools that claim to reduce errors. These tools tend to reduce the costs within one level, and that is valuable, but they don't change the overall shape of the funnel: catching problems earlier is still cheaper, and moving problems closer to the developer still has a huge benefit.

Why this matters

Recognizing the structure of the "funnel" is a good way to see the value of release engineering and infrastructure work. All of it helps find problems faster, and pushes them "closer" to the developers (who, after all, implemented the problems in the first place.) The "funnel" model should make it clearer why you want to push in that direction.

Opposing forces

"Move fast and break things" is a prototyping technique, when you're trying to see what shape something is - so you're building it as cheaply as possible and have exceptionally high chances of throwing it away. It's generally not customer-facing - most of your users should be on your team or otherwise dedicated to evaluating the proof of concept.

The main problem with this approach is that not everyone involved understands that that's what you're doing - you're expecting "OK, this gave us an understanding of the problem, now let's design something real," and what you get instead is "everyone was happy with that, so let's just ship it." The bug funnel can serve as a formal structure to explain why "fixing it later" is extraordinarily expensive.


  1. Traditionally, lint was a tool that picked at little syntactic concerns in C code, but the name has mostly stuck for the category of fast ad-hoc code-checkers. Do not underestimate the power of a simple grep for scanning code for "APIs we shouldn't call any more" that gets run as part of your automated tests... 

  2. What about code review? Anything that can be automatically tested before you submit something for review should be, so you don't waste reviewer time on things you could already have discovered don't work. 

  3. This is often an example of how unit tests aren't enough; while they are important, they don't replace practical use-case tests that show off the business logic.
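The "grep as lint" idea from footnote 1 can be sketched in a few lines of shell. The banned API name (`old_connect`) and the fixture file are made up for illustration; the point is that the scan exits non-zero, and so fails the test suite, whenever a forbidden call shows up:

```shell
#!/bin/sh
# Sketch of a banned-API scan run from an automated test suite.
# "old_connect" is a made-up deprecated call; point the scan at your
# real source tree instead of this throwaway fixture.
set -e
src=$(mktemp -d)
printf 'connect_v2(host);\n' > "$src/net.c"   # clean fixture file

banned='old_connect'
if grep -rn "$banned" "$src"; then
    # grep prints file:line:match, which doubles as the error report
    echo "FAIL: banned API '$banned' still referenced" >&2
    exit 1
fi
echo "PASS: no references to $banned"
rm -r "$src"
```

Run it alongside the unit tests and the check costs nothing to maintain: removing an old API from the allowed set is a one-line change to the pattern.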