Why is release engineering a big deal? How does it help your developers instead of getting in their way?
Can't we just do Continuous Delivery?
No.
Continuous Delivery is about minimizing the lag between any development work being "done" and all customers being able to use the result. While there is significant value in thinking in those terms and figuring out what you're doing in that gap - actually eliminating the gap can often result in feeding your customers into a buzzsaw of poorly considered reactions to changing requirements, rather than handing them the polished results of well-considered design.
Teams that haven't worked in this mode before may find that there are manual review processes, hard-to-automate system tests, or even external legal approvals that currently occupy those gaps. These are not as unusual as it might seem - even a trivial website, if it's delivered to the government of Canada, must comply with the Official Languages Act and provide multiple translations, which you might not be able to update continuously along with other code changes.
Also, while ideally you should be able to run every possible test on every change, this is often not possible in practice (and it may not be merely expensive.)
Examples (from past personal experience):
- Some validation that a public search engine worked "at scale" involved spinning up $20k of cloud compute and feeding it synthetic load for at least half a day. (There was corporate enthusiasm for doing this before release rollouts - and for firefighting - but not for every change on every branch.)
- Robot systems where low-fidelity simulations were possible, but literally no one had a convincing simulation of gripper "fingertips" against actual consumer packaging over wildly varying temperatures without actually doing it in physical space (and a specialized team that helped work out what items were worth testing in the face of particular types of change was an amazing asset to have - but again, they scaled to "releases and firefighting", not to every change.)
- Sometimes you don't "own" the customers - you're delivering a service to someone else to integrate and they have their own constraints. A performance accelerator component delivered to a major global website got validated on their schedule, by installing it on staging hardware and feeding it a fraction of their real login load, then "turning the knob" until it handled a convincing amount (or didn't, and all traffic got reverted to the previous system - because our experiments didn't justify their downtime.)
- Sometimes, while your customers want new features, they don't actually want the disruption, IT time, or data reprocessing effort that comes with an upgrade, and you've negotiated that they get quarterly releases but aren't actually required to take them more than annually. (Unlike modern consumers, these customers are actually paying you money so they do in fact have a say in this kind of thing.) Here you want as much pre-release testing as possible because you really don't want the reputation hit of having the product fail once the customer finally gets around to rolling it out.
What can we do, then?
The first pushback you'll get from adding any process at all will be "why are we even doing this?" It helps to start with principles, and examples of problems that are solved by sticking to those principles. (While it's also educational to look at disasters caused by failing to adhere to principles1 that doesn't give much software-specific actionable guidance.)
Principle: Traceability (and Provenance)
Traceability is simply the idea that there should be no question about what you shipped - a product name and version number should be enough to get you back to all of your inputs. Even if you're building bespoke one-off things for individual customers - as often happens in a startup while it's still trying to converge on a product - you should still be able to identify everything that goes into each of those things. It can save you a lot of time to see that a customer complaint is on release X and see that release X+1 has a fix for it; you need clearly labeled releases to be able to do that. Even if you're delivering bespoke fixes, knowing what "future" release to extract them from makes this kind of traceability worth the trouble.
Keep in mind that you can get a lot of traceability "for free" from process automation - not that the automation itself is free, but it's a good way to forestall "excess creativity" in the build process, and then you can let that process do the labelling automatically.
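Not tied to any particular CI system - the field names and values here are invented for illustration - the automated labelling can be as simple as emitting a small manifest that ties the product name and version back to every input:

```python
import json

def build_manifest(product, version, inputs):
    """Record what went into a build, so a product name and version
    are enough to get back to all of the inputs."""
    return json.dumps(
        {"product": product, "version": version, "inputs": inputs},
        indent=2,
        sort_keys=True,
    )

manifest = build_manifest(
    "searchd", "2.4.1",
    {
        "source": "git:0f3c9a1",        # the commit your CI checked out
        "libfoo": "1.8.2",              # third-party dependency versions
        "ranking-model": "2024-11-03",  # bulk data inputs count too
    },
)
print(manifest)
```

Archiving one of these per build, keyed by the version number, is often all the "labelling" you need.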
Provenance is the rest of the story - for every input you use to build a product (third-party libraries or your own code, including data) you should know where those inputs come from - if you track a customer problem down to third party code, that doesn't absolve you of the problem, that just means you've narrowed down where to keep digging. Sometimes that means "knowing which support contract to invoke", sometimes it means "fixing the problem locally and posting the fix on their github", or other things in between; if you're using a component that's actively developed, you need to know what version you shipped so you can see if they've already fixed it upstream. (Ideally you want to be able to say the precise version as they publish it, but even knowing "latest, but downloaded on this date" can be workable.)
Provenance also lets you "connect the dots" between the public security announcements your customer is pestering you about - CVEs, USNs, RHSAs, etc. are generally quite specific about which version numbers a given advisory applies to. (If your software is public-facing you might find yourself having to provide these identifications - or if you have certain large customers, they may insist even if you're not otherwise obligated.)
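As a sketch of that "connect the dots" step - real distribution version schemes have epochs and suffixes that this deliberately ignores - comparing an installed version against an advisory's "fixed in" version needs numeric, not string, comparison:

```python
def version_tuple(v):
    """Turn '1.2.10' into (1, 2, 10) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in v.split("."))

def affected_by(installed, fixed_in):
    """An advisory typically names the version that fixes the issue;
    anything older than that is affected."""
    return version_tuple(installed) < version_tuple(fixed_in)

print(affected_by("1.2.9", "1.2.10"))   # vulnerable
print(affected_by("1.10.0", "1.2.10"))  # safe - 1.10 is newer than 1.2
```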
Finally, if your company is being acquired or taking on significant investment, one part of modern Due Diligence is a License Audit - a review of your third party components to see if there are any license compliance issues. If that's in your future, it's a lot easier to produce that inventory if you just keep track of licenses as you add dependencies - at very least it gives you a hook to find problems before they become load-bearing. (At one company with a product with a large number of open-source dependencies, the board of directors wanted a regular report of what new licenses we had to worry about - this turned out to be easiest to integrate by just comparing our package inventories at release time and reporting on the difference.)
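That release-time comparison can be sketched in a few lines; the inventories here are hypothetical stand-ins for whatever package/license listing your build produces:

```python
def new_license_report(previous, current):
    """Compare package->license inventories from two releases and report
    packages whose license didn't appear anywhere in the last release."""
    already_seen = set(previous.values())
    return sorted((pkg, lic) for pkg, lic in current.items()
                  if lic not in already_seen)

last_release = {"libfoo": "MIT", "libbar": "BSD-3-Clause"}
this_release = {"libfoo": "MIT", "libbar": "BSD-3-Clause",
                "libbaz": "LGPL-2.1"}
print(new_license_report(last_release, this_release))
# [('libbaz', 'LGPL-2.1')]
```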
(See the Traceability article for more depth.)
Principle: Repeatability → Debuggability
Repeatability is just the idea that if you built something once, you can revisit that and build it again. The most practical reason for this is that if you see problems in the field, you should be able to easily build the exact same fielded version with
- extra debugging (though if you do this a lot, you should think about what quality level your product is actually at and maybe start shipping builds with a lot of debugging turned on by default)
- small "trial" workarounds - while your fix should be based on an understanding of the problem, it can sometimes be worth seeing if a speculative fix works by trying it.
It should also be clear to everyone working on the project how "what you intended to build" turns into "what you actually shipped" - not necessarily how all of the mechanics work, but how to operate them.
(You should also do the "easy version" and actually keep copies of what got shipped - but that's usually not enough to support even basic debugging, unless you're actually shipping fully debug-capable builds to customers, which is rare.)
Principle: Delivery reflects Intent
There should be no question about what you shipped. While this sounds obvious, be aware that there are red flags like "didn't we fix that last release? Why is this customer still seeing it?" - if the problem was subtle in some way that your fix didn't capture, that's one thing, but if the code got written, reviewed, and landed, but still didn't end up in the next release, that should be a matter of great concern.
Relatedly, if you did fix something, and then two releases later it's broken again - that looks really bad from the customer's perspective and you should be putting more effort into preventing it. Usually that means "customer-visible fixes get more test coverage than would seem reasonable", but if the code "just got lost", that's another red flag that keeps your customer from taking your ability to meet your commitments seriously.
Principle: Automate software, not people
There is often pushback about doing serious automation around release engineering, like "it's not the product focus", "we can just have some checklists", or "do we have to review and test the automation too?" In practice, you don't need that much code (and the simpler you can keep it, the better.) Checklists2 are critically important, but they're also hard for humans - anything you can reduce to mechanical automation will be more reliable and eliminate human tedium and fatigue (which lead to errors.)
That said, starting with a recipe or detailed checklist can help work out the initial process and figure out what parts are worth automating; they also make it easier to hand the chore around, especially on a small team that's reluctant to dedicate resources to process work. Ultimately you want the release process to be "boring" - it's not where you should be innovating, and you probably don't want to spend creativity on it either.
Principle: Release schedules support planning.
You need to get working software out the door to your customers. Software estimation is hard, and some things really are "researchy" enough that they're done when they're done - but as you accumulate experience in a particular problem domain and product environment you'll get an idea of what parts you can make predictions about. Predictions can help you allocate resources all along the delivery pipeline - testing, marketing, customer delivery and support, and even with the customer themselves.
It also helps with planning across subsystems. In the extreme, Mark Shuttleworth gave a PyCon 2010 Keynote about the "Release Cadence" that Ubuntu was using - and gently imposing on the open source community - you didn't have to pay attention to it, but related projects naturally synchronized around shipping features in time to hit the Ubuntu 6-month release cycle, or the 2-year "Long Term Support" cycle. This was especially helpful for foundational projects like Gnome and KDE to be able to say "we'll have these features by this target, so applications can start using them in the same target" instead of waiting for a later cycle.
This is the opposite of the model used by classic systems like XP "Extreme Programming" which still had a cadence - a release every 3 or 6 weeks - but the release was "only things that are completely ready" with no implied pressure to "fit" a deadline. The lack of pressure was considered important for quality (basically, "we will ship no code before its time") and while it did encourage having very clear terms for what it meant for code to be ready, it did nothing to help related projects plan for future deliveries.
Principle: Release versioning supports planning.
In addition to having a schedule, you can communicate (and commit to) more detail by properly "naming" (with version numbers) the software you release.
The best practice here is Semantic Versioning where you always have a three part version number - changing the major version for incompatible changes, the minor version for "new features that don't break compatibility", and a third "patch" version for compatible bug-fixes only. While this is completely clear (and mostly mechanical) for simple things like library dependencies, for an entire product you'll end up using it to communicate to people (customers) where it gets more complicated. You may choose to define support contracts based on these versions; is a major version change just a continuation of an existing contract, or a new purchase? How long can you run a given version before you are required to upgrade? Sometimes there's a "lighter" version where support is allowed to push back on customer complaints if there's a newer version available, and it's a matter of service quality whether you even try to confirm that the problem is fixed on the newer version.
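The mechanical part of Semantic Versioning is small enough to sketch directly (the version numbers here are arbitrary examples):

```python
def bump(version, change):
    """Semantic Versioning mechanics: incompatible changes bump the major
    version, backward-compatible features the minor, bug fixes the patch."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(bump("2.4.1", "breaking"))  # 3.0.0
print(bump("2.4.1", "feature"))   # 2.5.0
print(bump("2.4.1", "fix"))       # 2.4.2
```

The hard part, as described above, is the human judgment call of which change category a release actually falls into.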
Sometimes you'll have sales and marketing interest in how these are presented; for one interesting version of this, the first number was controlled by marketing and tied to high-effort customer presentations on all of the new things "System 4" would get them - even if they were already partly rolled out and customer-tested before that. (Engineering controlled the rest of the version number, primarily mechanically.)
Principle: Version what you Test
With all of the emphasis on versioning and support code - for my last three companies, the version of the data3 accompanying the code was just as important. Since we generally tested the latest data with the latest code, rather than a grid of code and data, we didn't actually need a different version number for the data, just enough information to trace what data was current for that version of the code. (Attaching direct references in version control, using git lfs or something like it, is an obvious way to handle this, but you'll need to watch out for the data getting updated and the cross-reference falling behind; it's often worth including some basic acceptance tests to defend against that.)
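One such acceptance test can be as simple as a checksum comparison; this sketch assumes you've recorded a SHA-256 of the current data alongside the code in version control:

```python
import hashlib

def data_matches_manifest(data_bytes, recorded_sha256):
    """Acceptance check: does the data we're about to ship match the
    checksum recorded alongside the code in version control?"""
    return hashlib.sha256(data_bytes).hexdigest() == recorded_sha256

model = b"...bulk model weights..."
recorded = hashlib.sha256(model).hexdigest()  # committed with the code
print(data_matches_manifest(model, recorded))            # current data
print(data_matches_manifest(b"stale copy", recorded))    # fell behind
```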
Conclusion
While a lot of release engineering looks from the outside like bookkeeping for bookkeeping's sake, it all comes from principles, which support avoiding problems (past and future.) There are arguments for and against - but "it takes time" usually isn't a good one; knowing the tradeoffs, and knowing what you're asking for, are a good way to show your team that using release engineering processes is actually in their interests.
1. For example, any NTSB or USCSB investigation - but specifically the containership Dali taking out a bridge because a poorly installed label interfered with a single wire connection - is practically begging for a certain kind of attention to detail; might be a bit much for a typical software project, though. ↩
2. Worth reading "The Checklist Manifesto", Atul Gawande, if you have any repeated process that needs to be done right; the book is far more significant than it sounds. ↩
3. Data in this case meant bulk machine learning models and map data, product-specific but not generally customer-specific. ↩
"Provenance" is a term more common in the art world, where it means the origin or source of a thing - and can more specifically mean the history of ownership, as distinct from the thing itself. In art, provenance is a mixture of giving credit to the artist and imputing value to the artifact (the difference in value between a true original and a perfect forgery is bounded entirely by the provenance of the piece.) In software, the primary use of provenance is in software licensing with some secondary value in identifying security issues (CVEs).
Open Source
Open Source software specifically has licenses (usually from a well-known set) that allow a developer to use it within some generous constraints. On one end of the generosity scale (within Open Source) you have BSD and MIT, which are basically "do whatever, just don't file off the serial numbers and claim it as yours"; at the other end are messier licenses like LGPL, where you have to enable the end-user to update components of the deliverable; some of these licenses require you to clearly identify any changes you've made. Whatever actions they require, the important thing is that you be able to identify (for things that you ship1) what bits have what licenses, and that you're conforming to them.
Proprietary
Proprietary licenses are usually much more restricted in specific ways2, since they're often direct contracts and don't rely on copyright. Since they are bespoke, you don't have the shortcut of finding interpretations from the community for what you can do - either you need to do a close reading with your desired use in mind, or to come up with formal questions and ask your own IP lawyer about them.3
Again, the important thing for you to understand is where the pieces of your code come from, what licenses apply to them, and what actions those licenses require.
The Code Itself
While licenses and ownership are important, you also need to be able to trace where the actual code comes from. Sometimes that's just a matter of being able to map out your software supply chain risks, but it's also useful when debugging - not only can you confirm you have the correct upstream code, often the upstream bug tracking system will be adjacent to the code repository and you can find out if someone has already reported something related to your issue, possibly even a proposed solution branch. Ubuntu Launchpad acts as a "switchboard" for this, with links to further-upstream sources and ticketing systems (including CVEs) and some information about tracking the progress of fixes within Ubuntu's own releases. This is especially useful if you're trying to decide between building a local version of a package and waiting for it to show up as an official "backport": Launchpad will have public policy discussion and decisions for each package, as well as being a place to vote on impact or contribute your own test results.
Stackoverflow
There are a number of online sources of problem-solving code snippets to read and learn from. If they don't have an explicit license, that's all you can do - the author has copyright by default. Very popular systems like stackoverflow do have specific licenses - in this case, part of the Creative Commons, with requirements for including credit and not restricting further copying.
In practice, any given stack overflow answer isn't an exact fit anyway, so a pretty good practice is to learn from it, confirm it with more specific documentation (one of the best results of reading stackoverflow answers is that they get you to say "oh that's what that man page meant!"), apply that new understanding locally, and leave a note (in a comment or version control commit log) about where you got the idea - not for compliance, but so the next time someone comes along to debug that bit of code, they can look and see if the original answer has been updated or refined, so you don't have to rediscover it.
Mechanisms
Debian/Ubuntu copyright and license information
Debian packaging is a combination of tooling and policy/discipline. One important piece of this is that every installed package contains a /usr/share/doc/$package/copyright file with information about upstream source, licenses, contributors, and copyright holders. Though it was originally freeform, about 80% of packages in a typical install use a strictly parseable format that is easy to build auditing tools around.
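A toy parser for that strict format (the machine-readable debian/copyright layout, sometimes called DEP-5) might just collect the License: fields; this deliberately ignores continuation lines and full license-text stanzas, which a real audit tool would handle:

```python
def licenses_in_copyright(text):
    """Pull the License: field names out of a machine-readable
    debian/copyright file."""
    found = set()
    for line in text.splitlines():
        if line.startswith("License:"):
            found.add(line.split(":", 1)[1].strip())
    return sorted(found)

sample = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: *
Copyright: 2019 Example Upstream Authors
License: Expat

Files: debian/*
Copyright: 2020 A. Maintainer
License: GPL-2+
"""
print(licenses_in_copyright(sample))  # ['Expat', 'GPL-2+']
```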
If you're starting from scratch, you'll probably want to do a bulk "look at every package" review; given your specific requirements, you can come up with a few simple questions and have your whole team chip in on reviewing. After that, you should simply have a license review every time you add a new package - for the package itself and the new dependencies it brings in. (You want to keep this simple and do it early to reduce sunk costs in building on top of something with an inappropriate license - or even with a "probably but not quite sure" license that needs lawyer time.) One traditional way of handling this is to have a short list of licenses that are "automatically OK for dependencies" - MIT, BSD, ISC, and Apache 2 usually head the list - with others requiring escalation to someone who can talk to (and spend money on) the company's IP counsel, or take responsibility on behalf of the company for using that license. (You'll quickly find engineers taking the initiative to choose alternative solutions that fall in the approved license list just so they can keep going and not get blocked on license review; depending on what problem space you're working with, that might actually be a reasonable accidental result.)
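The "automatically OK" list reduces to a set-membership check; the package names and the list contents below are illustrative, not a recommendation:

```python
# Licenses pre-approved for dependencies; anything else gets escalated.
APPROVED = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC", "Apache-2.0"}

def needs_escalation(inventory):
    """Flag packages whose license isn't automatically OK, so someone who
    can talk to (and spend money on) IP counsel gets to decide."""
    return sorted(pkg for pkg, lic in inventory.items()
                  if lic not in APPROVED)

deps = {"requests": "Apache-2.0", "leftpad": "MIT", "readline": "GPL-3.0"}
print(needs_escalation(deps))  # ['readline']
```

Run at dependency-add time, this keeps the review early and cheap, before anything gets load-bearing.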
This author has successfully used the debian copyright files to satisfy acquirer and investor due-diligence, mostly of the form "is there any use of the GPL that would require releasing our proprietary source code, and thus reducing its value", for two startups. One of the acquirers was impressed at getting an obviously-exhaustive spreadsheet of dependencies and license categories, instead of the "umm, we have some code?" that they'd gotten from previous startups, saving months of review. Open source dependency management is sufficiently mainstream these days that there are companies4 that will do this auditing as a paid service, usually with value-added features like notifying you when security advisories are issued for anything in your "software supply chain".
Debian/Ubuntu file-by-file packaging information
The entire point of a packaging system is that you know exactly what files in your install come from what packages; a big part of this is just that you need to know where they are so you can remove them cleanly - a step that "just run make install" leaves out - but it also means that if you find a problem in a particular file at the operating system level, you can trace it all the way back to the author, even if it isn't your code.
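On a Debian system, `dpkg -S /path/to/file` answers exactly this question; here is a toy sketch of the same reverse lookup, with a made-up inventory:

```python
def file_owner_index(packages):
    """Invert a package->files inventory so any installed file can be
    traced back to the package (and thus the upstream) that shipped it."""
    index = {}
    for pkg, files in packages.items():
        for path in files:
            index[path] = pkg
    return index

inventory = {
    "openssl": ["/usr/bin/openssl"],
    "libssl3": ["/usr/lib/x86_64-linux-gnu/libssl.so.3"],
}
index = file_owner_index(inventory)
print(index["/usr/bin/openssl"])  # openssl
```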
Other distributions
Redhat's rpm (also used by SUSE) has a parallel history with Debian's dpkg; while it has made different choices, it has support for the mechanical aspects of provenance tracing.
Language-specific packaging
NPM (for Javascript), Cargo (rust), CPAN (perl), PyPI (python), Gems (ruby) are all community packaging systems oriented around a specific programming language, rather than an operating system. This means they focus more on multiplatform support and community openness, but often less on package quality or malware prevention5. The upside is that peer pressure (and ease of copying) usually means that one license will usually dominate - for example, most NPM packages are MIT licensed, with ISC and BSD filling in most of the rest.
(Go is a little odd in that while it has a packaging mechanism the format is "a URL pointing at a git repository", making Github into the accidental package repository for the industry; this means it completely lacks any kind of policy enforcement beyond "does the package actually build". You probably want to at least add a layer of your own git mirrors to have a little bit of control, here, even if you'd use upstream package repositories for other languages.)
You'll sometimes hear that the language-specific package system for C and C++ is dpkg (or rpm). That's got a kernel of truth (in that nothing else is) but those also have tools for automatically extracting metadata and build options from language-based packages directly into operating-system-based packages, so they really are a "higher level" system. They also have more policy information - the Debian Policy Manual has details on handling packaging subtleties and upgrade issues that have come up across a thirty-year lifespan, that most language-specific packaging can't handle (arguably they don't need to, if you're regularly doing clean builds into freshly created environments, but that's a relatively recent innovation.)
Conclusion
If your product is an entire platform shipped on an operating system, you'll probably need all of this - but even if you're working with simpler architectures (microservices on AWS, web-frontend-only tooling) you're still likely to need to figure out what you're building on top of - even if you're not aiming for acquisition, you'll still need to be able to identify which upstream security vulnerabilities have put you at risk, or at least have caused you to need upgrades. Provenance - knowing where all of your pieces come from - is just a matter of having (and keeping) the data you need to keep track of it all; this doesn't require a lot of new work, just a little discipline.
1. Key to copyright is copy - you can generally screw around as much as you want locally/personally; copyright-based licenses don't even apply until you give a thing to someone else. ↩
2. Sometimes they are only restricted in those specific ways and otherwise are very broad; one commercial linux IDE licenses individual people but the software can be installed anywhere at all - so we could ship a pseudo-embedded system with that IDE as one of the local editors, and as long as the individual developer had a license for it, they could run it on any fielded system. In another case, we could use a set of embedded libraries from a CPU-vendor SDK and ship those object files anywhere - but we weren't allowed to run them on similar but competing chips (again, not complicated, just very specific.) ↩
3. In some circumstances you want to talk to an IP lawyer about your open source use too - but it's 2025, if they have any experience in software licensing at all this should be a relatively short conversation because (for example) the GPL is older than they are and they have no excuse for not already being familiar with it. ↩
4. Examples include Black Duck, which has been around for ages; Snyk has more recently joined the field; their services are shaped differently but they can both provide the kind of "open source audit" that investors are looking for. ↩
5. Aside from simply "selling off" packages, NPM and PyPI (by virtue of being popular and thus high-value targets, not through any particular flaws) have long been victims of typo-squatting, and more recently "vibe-squatting" attacks on similarly-named packages. ↩
This essay is part of a series on "Pretty Good Practices", aimed at development teams that would like to improve their craft, and thus improve their results.
Code Review is the process of having peer developers look at changes early, before they become part of the product, in order to try and find problems before customers do. (This is narrower than "Change Control" in that parts of it are very specific to software, rather than hardware or content.) The broadest modern version of this is Given enough eyeballs, all bugs are shallow but on a more conventionally sized development team you can expect most of the benefit from the first couple of reviewers.1
Code Review is not a new concept - it has roots as far back as IBM in the 1970's - though it is modern enough that it's not part of the "surgical team" model from The Mythical Man Month. It didn't really get popular (outside of environments where it was legally mandated) until free tools like Rietveld (2008) and Gerrit (2009) started becoming available to particularly motivated teams. The popularity of the code review process within large (FAANG) organizations was a source of inspiration for the industry at large, even if the full-blown tools themselves weren't available. (The Github "Reviews" feature wasn't generally available until the end of 2016.)
Before the tooling made the practice a widespread default, a lot of review only happened after problems were found - even at places where senior engineers read all of the changes that went by, looking for red flags, changes were often "integrated by default" and complex changes could get insufficient attention because there wasn't an enforced time budget. The popular exception, and one of the big "examples of what could be done but you're not realistically doing it" was Linux itself - changes went out over email and were only integrated when Linus (or later, his "lieutenants") accepted them, in a refreshingly public process. Git, Bitkeeper, and other tooling adapted to this, rather than supporting or encouraging it, and the process has remained consistent over a surprisingly long timeline.
It's 2025, not 1995 - doesn't everyone already do this?
Unfortunately, no. In the extreme case, you have your brilliant technical founder who has pumped out a bunch of code that impresses the seed investors or an initial customer or two - and works far better than it has any right to - but now you're expanding the team and adding the first people for whom software is a Real Thing and not just some details that distract from the thing they're actually brilliant about. After all, startups are rarely interesting in a purely software way - instead, software provides some novel leverage in an existing space, and serves the problem rather than being the problem.
This isn't unique to startups - sometimes a new project "falls out" of some other field - maybe a mechanical engineer coded up some automation, maybe an admin intern got lucky with getting a Fiverr contractor or vibe coder to build something - and then existing developers get stuck with a "demo that a bunch of people already rely on" dumped in their laps.2 Rarely do you jump into a project in a new space that followed established software engineering principles from the start - instead, you usually stumble into the new space and have to retrofit software engineering principles onto it.
Whatever path the software takes as it matures, it usually progresses from "this did something useful on one computer, at least once" to "multiple people need to understand what's going on here well enough to improve it further". This is the ultimate point of a bunch of software engineering techniques and tools, even if it often isn't expressed that way.
So what are they doing instead?
I've been involved in projects where
- the code was just some files, version control was a New Thing3
- the code was checked in to a free private account, but everyone was working on the same problem on the main branch and branching wasn't really a thing4
- "this function works fine if I type it in but not if I put it in a file"5
... and many other points on a spectrum of how much the craft of software development was taken seriously. (Not out of disdain or disrespect, just inexperience or lack of exposure to Real Software Practices.) While these examples are primarily about version control, it turns out that reasonably sophisticated version control is what code review is built on top of - both mechanically (review tooling itself) and in terms of process (the review acts as a point of gatekeeping, so you need strict definitions of what the change is being applied to.)
Why Review
We've now established that a world without code review is a bit of a jungle, where software development is possible but more through heroism than technique. Why is code review part of this complete breakfast?
The primary motivation is just getting more eyes on the code you're producing, early in the development process. The obvious benefit is giving more than one developer a chance to say "that's odd" or "what's up with that" - but it's also a chance for the team members who, in the future, are going to be fixing or updating this code, to check on how readable (and thus, maintainable) it is, as long as "this was difficult to figure out" is feedback that you'll actually use to improve the code.6 To put it another way: review time is a chance to move "reviewability" issues further up the funnel, so you're not discovering mid-crisis that no one can figure out what the code is doing.
It also gets more eyes on the requirements and gives you another chance to catch a misinterpretation of what was intended, where the code appears perfectly good but is doing a different thing. Clearly you want to have already found such things sooner than this but it's better than finding them later.
It's a chance to emphasize team effort. "Peers pointing out details to each other from our coding standards" gets a lot less reflexive resistance than "coding standards imposed from On High" does, even if you have senior developers driving the standards effort.
It's also a chance to spread historical context from developers who have had longer exposure to the code base - though that can reasonably turn into "that's good to know, can we get it into developer documentation"?78 This is a useful question to prime junior reviewers to ask, when someone responds with a detailed explanation - "great, what documentation should I have found that answer in, and can we put it there" recognizes the value of the explanation while also taking the opportunity to move the comprehensibility of the code base forward.
Ultimately, code review is an early demonstration that the code successfully communicates with other developers. You really don't want whoever makes the next change six months down the line to go "what were you thinking" - especially if that ends up being you, because six months is plenty of time to turn it into "what was I thinking"; it's just much easier to test for that with someone else's eyes while there's still a chance to clarify things.
Review of Public Code
This discussion is mainly about in-team review of proprietary code. If you have a tightly-knit development team for an open source project, it would also apply there, but open source has some additional complexity: one of your sources of code to review is "complete strangers from the Internet." Nurturing a community around a piece of open source code tends to require generosity or at least starting from a non-adversarial stance - but at the same time, some contributors are malicious and you need diligence (and process!) to defend yourself from them.
It's a difficult line to walk, and all I'm trying to do here is acknowledge the difficulty in the hope that you also acknowledge it. It's worth familiarizing yourself with the XZ Utils backdoor (also known as CVE-2024-3094) which is a combination of supply chain threats, the difficulty of dependency proliferation (even with reasonable dependency management) and the kind of things your review process might catch.
This isn't advice on running a public project at scale - dealing with thousands of developers from a starting point of what is honestly outright paranoia isn't healthy or productive. Instead, I'd suggest looking at things like curl's contributor guide for what a good "public face" looks like, and how you can still strive for quality with many strangers instead of a closely knit team - commercial development almost never has the structures (or need) to wrangle that many contributors, but it's still a useful perspective, and you'll find many good ideas on how to treat other developers humanely (that you wouldn't necessarily come up with on your own for coworkers.)
How Code Review Works
Mechanically, code review is part of the process of making changes to code9. You start with what you believe is a working codebase, take a proposed change, and try to analyze the change to determine if it
- leaves you with a working codebase afterwards
- actually achieves the goal it was intended to accomplish.
This means you need reviewers to understand what "working codebase" means - "does it conform to our coding standards" contributes to that, and having coding standards include things like "we assume that code that isn't tested doesn't actually work" is a particularly strong one - but ultimately a reviewer needs to know what is changing and understand what the change is actually doing.
They also need to understand the intent of the change - not just as expressed in the code, but expressed externally by what the planned change was - which means you need some shared system that expresses those plans. How you do that is a project management question and out of scope of this article, but it should be clear that you have to have some record, even if it's a simple prose explanation at the top of the review request.
This is somewhat biased towards small changes, but that's the way to make them comprehensible - large changes can generally be broken down (and large mechanical changes, like coding-style consistency improvements, can be made more reviewable by keeping them isolated - it's easier for a reviewer to convincingly affirm that a change is, as intended, only mechanical, and thus has much lower risk to the codebase because it really isn't changing anything else.)
If you're reviewing an entire codebase, perhaps because your team is seeing it for the first time and needs to bring it up to standard so that they can start doing new work on it, you likely want to instead do an audit which needs a different approach.
Gatekeeping
Review is implicitly a gatekeeping action - "yes, this can go in" or "not yet, these improvements are needed." (In a functioning team you rarely need an outright rejection - that should be handled at the tasking level - any developer should be able to start from a position of confidence that the change is actually desired, code review does not substitute for team management.) That means it needs to happen at a point in the process where it can actually be used for that, though the details depend on your branching strategy.
Feature Delivery Cadence
Unless you're doing early "exploratory" development, you probably have some sort of time pressure10 for delivering features. Since review is in between creating features and landing them, deadlines put certain pressures on the review process.
For reviewers, it's important not to delay review until too close to the deadline - we've already established that you want to do it promptly while things are still top-of-mind for the developer, but you also want to allow the developer time to turn around thoughtful responses to feedback. It's also important to point out right away if you have any planned reasons that you will not be available to perform your review duties, like vacations, to allow the developer (or team lead) to find an alternate reviewer.
For developers, once your team has a normal review cadence established, it will become important to explicitly communicate about unusual review deadlines that legitimately need faster response (this should be rare, and is not just "hurry up so I can move on to the next project".)
Complex projects that are expected to interfere with other development (particularly by making things hard to merge) such as "updating to a new coding standard all-at-once" or "interface renaming" or "upgrading a major non-compatible dependency" need to be coordinated across the team, but you want to avoid having everyone stop work during the process. One approach is to do the work on a branch, aiming for a point where most other work can synchronize so there are few outstanding branches, then do a "final burst" of pulling those changes back in to the complex branch and reviewing and landing it, before freeing everyone up to continue.
Even simple features can collide, so while you don't usually want everyone on the team keeping track of everything that's going on, it's still valuable for anyone who's about to land something to know what else recently landed, to help understand where to look first for unusual behaviours. (The better your testing, the less of a problem this is!)
Feature Branches
One common strategy is the use of "feature branches" - you manage changes to the code base in units of features, and each new feature starts by branching the main codebase at a specific point (basically a managed copy, so you don't have to deal with competing changes from other developers in an uncontrolled manner. After all, they'll also be working on feature branches.)
In the straightforward case, you've implemented your new code on this branch, you've added new tests and they pass, existing tests also pass, and it's ready to review. The mechanics of setting up the review will usually involve selecting users and providing a description of the change (which may already be available as part of a higher level ticketing system that drives all of this at the project management level - it's still worth adding specific implementation commentary here, to aim your reviewers in the right direction.)
The primary thing keeping things from being straightforward is when some other feature branch has landed and has overlapping work with yours. (In the most obvious cases this will show up as conflicting code directly in the version control system11 but it's more likely to be something like an API or calling convention change where, if you're lucky, tests will catch it.) If there's any other work going on in parallel, you'll want to do an update from master immediately before review, so the reviewers are seeing what's actually landing (rather than what you started from) and so you can show that "if you hit Merge right now, the tests would pass" because your branch matches what the post-merge main line would be. The primary defense here is communication, rather than tooling - your team members should have at least a rough idea of what related work is going on, and if the project is too big for that, it needs to be broken up into smaller components with boundaries/contracts that make it possible to have enough context most of the time.
Depending on how long your feature work takes, other features may land that change areas you rely on - in some cases, even features that you need to finish your work. This isn't just about avoiding merge conflicts at the end12, it's about work that you specifically need to absorb into your feature branch to continue making progress. Ideally, for things like this you'd also be a reviewer on the other changes as part of making sure they support what you're trying to do. This also applies pressure towards keeping features isolated and small - or at least breaking larger features into smaller components.
(As your team gets bigger and there's more churn, you may have to do this again after approvals; you may need to go as far as having a merge schedule, especially as you're approaching a formal release point where you want the option of delaying a feature rather than delaying a release, based on what external commitments are driving your deadlines.)
Multi-layer feature branches
When you are programming in a more exploratory mode (particularly common when bug fixing) you might not know at the start how the changes will break down; one feature might need another feature, or cleaning up some technical debt, or some unanticipated restructuring - the change may just be shaped such that breaking it into several pieces simplifies the review - "adding a system-wide feature" and "using the new system-wide feature" have different kinds of complexity and possibly different reviewer audiences. Perhaps you're the first one touching some old code and need to do a "cosmetic changes only" whitespace cleanup or style pass separate from your own less neutral changes.
There are a few approaches to this. One is to just split the additional work off as new blocking features, so that they start from trunk as their own independent feature branches; then you can just update your branch once they land (or depending on the version control system, you can repoint your branch at them until they land, then point it back at the main branch.) You can describe them as blocking in the review description, but it's an entirely social construct; you're peeling off some self-contained work that you'll use when it lands on the main line.
Slightly more advanced is to develop the "sub-feature" as an independent feature but pull the development branch into your primary feature branch as it evolves. This lets you use it as it develops, but leaves your branch with all of the complexity from both parts, until the sub-feature gets merged and it "disappears" from your feature branch relative to the main line.
Most expressive (but also usually requiring the most sophisticated use of the version control system) is the version where you start with the same feature/sub-feature branches, but "repoint" your feature branch to be based on the sub-feature. This means that your feature branch doesn't reference the main line directly, just indirectly "through" the sub-feature; likewise the mechanics of merging the sub-feature and then "losing" it (until you re-point your branch to the main line post-sub-feature-merge) have more moving parts. It does let the version control system represent more accurately the way you're actually thinking about the code; it can also reduce confusion about which changes are part of the sub-feature vs. the primary feature, though if that confusion is likely this might not be the right way to split them up.
Choice of Reviewers
Depending on the size of your team and the distribution of skills, you may have a bunch of options on reviewers. At the simple end of the spectrum, you may only have three developers and one worked on the branch, so your policy will basically be "one other reviewer" (or depending on your growth plans, start with a two-reviewer policy even if that means "everybody".) "Minimum two-reviewer signoff" is a popular starting policy - at worst, it's inspired by the two-key launch authorization policy; at best, it is an opportunity for reviewer mentoring, or for getting review coverage from different perspectives or areas of experience.
If you have a variety of experience levels (or areas) on your team, you might instead tag reviewers with particular areas of specialization - "John needs to be a reviewer for any embedded code", "Mark needs to sign off on anything with security implications". Ultimately you don't want to block on any single reviewer (people take vacations, after all!) so even if you start out with specialist reviewers you should consider pairing them with other developers specifically tasked with "levelling up" in those areas.
Another way to handle specialist reviewers is to have generally distributed "required" reviewers and add the specialists as "optional" reviewers, who are expected to chime in on anything specific to their expertise but aren't necessarily expected to review the entire change. (Review tooling rarely has good support for "partial" review like this, but policy can be enough.) You do have to watch out for getting specialist reviewers "stuck in a rut" where an area of the project can only be worked on and reviewed by two or three people - there's a tendency to develop tunnel vision or even shared but unrecorded knowledge of how the code works, which reduces the effectiveness of the review, especially the quality of the "can future readers comprehend this" measurement.
Depending on how explicit you are about scheduling your team's time, another option is to have a handful of reviewers on a change but count the first two approvals as enough. (If you do this, you generally want formal policy on reviewer response time, otherwise you will run into unfair effort-sharing issues that will harken back to college team projects and chore sharing - it can still work, it's just likely to cause people-management issues comparable to just being explicit about the assignments in the first place.) If your team is already comfortable with a top-down management hierarchy, then you can just assign reviewers. If your team is more self-organizing and team leadership is more about reducing friction and interfacing with other systems in the company, then review assignments can also be self organizing (though you'll want to keep an eye on spreading review experience and helping keep reviewers from getting too specialized.)
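For the self-organizing variants, even a tiny bit of tooling can help keep review load fair. A minimal sketch, assuming you track how many reviews each person has taken on recently (the function name, team names, and load-tracking structure here are all hypothetical, not from any particular review system):

```python
from collections import Counter

def pick_reviewers(candidates, recent_load, author, n=2):
    """Pick the n candidates with the lightest recent review load,
    never assigning the change's own author."""
    eligible = [c for c in candidates if c != author]
    # Sort by recent review count (ascending), breaking ties
    # alphabetically so the assignment is deterministic.
    eligible.sort(key=lambda c: (recent_load.get(c, 0), c))
    return eligible[:n]

team = ["alice", "bob", "carol", "dave"]
load = Counter({"alice": 5, "bob": 1, "carol": 3})
print(pick_reviewers(team, load, author="bob"))  # ['dave', 'carol']
```

Even a toy like this addresses the unfair effort-sharing problem directly, while leaving room for a human to override the suggestion when specialist knowledge is needed.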
What Should Reviewers Do
The main thing a reviewer is expected to do is understand the change that's being made. Given that understanding, it is an oversimplification to say that they then need to "approve" or "reject" it - your team needs to have an understanding of what that means in the context of your project.
The main thing to do with the understanding is to compare it to what the change is supposed to be doing. This usually comes from your project management tooling, and should have been established before even starting the feature branch. It should also be reflected in any internal project documentation - which would ideally also be managed in the feature branch, but might be in a parallel external system instead (if it is, that should be part of the scope of the review as well.)
After the basic "it does what needed doing" review, the next question is "does it do it well". Does it meet your team's standards? Does it use your preferred coding idioms? Does it include sufficient testing (and does the review request include evidence of those tests working?) How about related documentation? If it has a UI component, have the translations been updated? Does it meet accessibility standards? If the change incorporates code from elsewhere, have the licenses and CVE history (or other security reputation) been reviewed? If user data is involved, does the GDPR apply? If the code involves new network connections, has there been an "attack surface" review, and has deployment documentation been updated?
If enough of these apply, you probably need a formal checklist - with as much automation as you can arrange. (One team built a literal can-i-merge tool that did a bunch of automated tests and quality checks on a trial merge, which was great except it took over half an hour to run, and still covered less than half of the suggestions above.) In some organizations this is (somewhat mistakenly) called "CI" tooling; it is integration-adjacent, but unless you explicitly manage a sequence of merges (which would let you test each feature as it is about to merge) you end up needing effectively simultaneous pipelines for each of your developers.
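The core of such a checklist runner is small; what grows is the list of checks. A hedged sketch (the check commands are placeholders - substitute your project's actual linters and test suites, and run them against a trial merge rather than the raw branch):

```python
import subprocess
import sys

# Example checklist entries; these command names are assumptions -
# replace with whatever your project actually runs.
CHECKS = [
    ("tests pass", ["pytest", "-q"]),
    ("style clean", ["black", "--check", "."]),
]

def can_i_merge(checks):
    """Run each named check command; return the names of any that failed
    (i.e. exited nonzero). An empty list means the merge is unblocked."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures

# Demonstration with trivially passing/failing commands:
demo = [
    ("always ok", [sys.executable, "-c", "pass"]),
    ("always fails", [sys.executable, "-c", "raise SystemExit(1)"]),
]
print(can_i_merge(demo))  # ['always fails']
```

Reporting which checks failed (rather than just a yes/no) matters in practice - it's what lets a developer fix the cheap problems without waiting for the half-hour run to be repeated.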
What Should Reviewers NOT Do
One important reason to formalize coding standards (even if it's as simple as "Run black on all python code") is that experienced developers will often have preferences that are not written down, "I know it when I see it", or just biases, and it's unfair to other developers to get review pushback for things that there's no reason for them to have known. Put another way, "I wouldn't have written it in that particular way" is not a valid reviewer comment; if the reviewer actually has concrete beliefs behind the opinion, they get to write them down and build consensus around them first. Critique isn't an effective way to get elegance - you need to actually teach it.
There will also be issues around "how much additional testing can a reviewer insist on". One common dividing line is "if we have a test like that, insisting on using that technique in this area is fine; if we have to invent a new way to test this thing, there's a reasonable discussion to have about building that additional test infrastructure as a separate task" (which may still block the higher level feature.)
Reviewers should also not need to be running tests - the developer should be running them all along during their development, locally (as long as they include some convincing record of the test run) or as part of per-branch CI (which can just be a link to a result page.) Developers shouldn't be submitting changes that don't pass, the reviewers should be looking at "do these new tests represent the feature well, do they cover the interesting parts of the change" and not "do they actually pass if run". (If you're very early in adopting the testing that supports this kind of review, it may make sense for reviewers to run the test as a cross-check that you're getting reproducibility right but it should be as additional checks of your process, not a replacement for developer-reported test results.)
Coding Standards as Crystallized Arguments
Coding standards are simultaneously trivial and vital. The common feature of most coding standards is that they apply to things where there isn't an obviously superior choice - or at least there's enough ambiguity that your team is wasting time arguing about it. At the same time, developers typically have the kind of independence and self-perceived authority that they tend to reflexively reject arbitrary choices handed down from on high. For new team members, having a standard to point at reduces friction and simplifies integration with the team - and if it seems arbitrary to them, you can point out that there was an argument phase and arguments got distilled down into the final resulting policy. It may help to have developers sign off on the standards explicitly, so you can show new team members "look, this standard has five of your peers behind it already, do you have a strong enough case that you think would convince all of them to change their minds?" Being able to point to past efforts to reach consensus helps it look less authoritarian, and makes it easier to convince newer team members that they'll be included in similar decisions in the future.
For current members when you're trying to establish a standard, it gives them a time and place to bring up their opinions and preferences, with the understanding that once you have a decision, it's time to move on to more important things.13
(It isn't necessary to completely suppress further discussion of the topic - just to get buy-in on the idea that the status quo is based on battles already fought and there needs to be significant new information to revisit the question.)
Mechanically, it's also good to produce worked examples of what the standard actually means - your project isn't public enough or big enough to have its own StackOverflow section, but an internal library of examples is a great introduction to how to apply a standard. Ideally these examples will look more like "use this interface" or "configure your editor this way" than "copy this boilerplate" - you will find a lot less resistance to standards if you put effort into making it straightforward to conform to them.
Writing for Review
One reason to have everyone on the team involved in review is that it gives them perspective from "the other side" on what kind of things are harder or easier to review.
The common pressure here is towards minimalism - smaller changes are easier to review, and well-scoped changes - where everything that changed is clearly and directly related to the problem at hand - are more reviewable even when they are larger. This pressure is strong enough that you'll need some explicit back pressure on things like "you actually need more tests than that" or "this really does need to be better commented or simpler" or "yes, that's very Clever, but no one will appreciate that when they look at it again six months from now."
In particular, this gets in the way of "casually" cleaning up technical debt, even if said cleanup might make the feature change simpler. The sensible thing to do is to acknowledge those explicitly and turn those cleanups into tasks that block the features - in theory you can show that these paths get the features done sooner than without the cleanup, but that is a significant communication challenge (a worthwhile one - but systems like Agile, for example, make this particular benefit harder to express in conventional backlog management, especially if you discover the cleanup opportunity while doing the feature work.)
Depending on the history (and "maturity") of the code base, you can get some value out of "deliberate scope creep" - team-wide recognition that when a new feature exposes the need for broader changes to enable continuing work and reduce technical debt, it's worthwhile to take the opportunity to make time for the broader change as part of "paying down" the technical debt, or "recognizing that the technical debt has come due". This is basically another explicit anti-minimalism force - and you can develop a consensus in the team about how much pain technical debt is causing vs. tightening up feature work.
Keep Function-Neutral Changes Separate
As an example of how trivial details can cause disproportionate amounts of friction: ideally you'd start with a common coding style and never have to transition. And by "ideally" I mean "that's never actually happened" - so you're going to have to make the existing code base conform. Ideally you'd do that all-at-once, do a testing pass, and be done with it, but, well, "ideally" - in practice you're not going to make a global change because you don't trust your testing enough, even in the face of a self-evidently neutral change. Instead, you're going to encourage people to make these style changes when they touch old code for the first time - and then their branches will be cluttered with style work (which often touches most of the file) and end up harder to reason about and review, possibly causing more friction than giving up and not changing the style in the first place would.
You can compensate for that by requiring that neutral changes get their own reviewed branches, where you only change style and nothing else; you need to do these before your real changes, so you probably also want to fast-track them. You can do that reasonably safely - the reviewer isn't trying to confirm correctness or requirements, just "this really is only changing style and does nothing to what the code does, and is a neutral change" which only really requires language familiarity/competence and not deep familiarity with that part of the codebase. This also means that you can safely have junior devs do the review, and you can have senior devs do them rapidly without derailing them much. (You may be able to encourage senior devs to take on these reviews because they have a better appreciation for the benefits of the change, and may have even driven the adoption of the policy in the first place.)
Neutral changes do still have intent - primarily, to make the code easier to work on (by becoming consistent with more modern parts of the code base, and aligning with the standards the team has come to expect) but also to make those changes complete so you don't need to revisit them again in the same area. It's just that all changes of this sort also share a common intent - "don't change the behaviour of the code" - that is unlike any feature change. This does make these kind of reviews simpler, since there's nothing to interpret.
With some languages and environments you can use tooling to prove that the changes are neutral - for example, in python you can use the ast module to turn the before and after versions of the file into an abstract syntax tree and show that those haven't changed. Not a perfect test, but for this kind of thing, a fast cheap test that leaves you with a couple of bits that need manual explanation gives a better return on investment than full scale tooling - if you're going to build that much tooling you can probably build the tool that makes the style change directly, which you can then review/audit and then trust directly. You probably don't want to go too far down that path unless your product already has you building software-modification tools, though.
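A minimal sketch of that ast comparison, using only the standard library (the function name, and the assumption that you're comparing whole source files, are mine):

```python
import ast

def is_ast_neutral(before_src: str, after_src: str) -> bool:
    """Return True if two versions of a Python source file parse to the
    same abstract syntax tree.

    Whitespace, comments, and most pure-formatting changes vanish at the
    AST level, so matching dumps are strong evidence of a style-only
    change. It's not perfect - a reworded docstring changes the AST even
    though it's behaviour-neutral - which is where the "couple of bits
    that need manual explanation" come in.
    """
    return ast.dump(ast.parse(before_src)) == ast.dump(ast.parse(after_src))

# A formatting-only change parses identically...
assert is_ast_neutral("def add(a,b):\n  return a+b",
                      "def add(a, b):\n    return a + b")
# ...while a behavioural change does not.
assert not is_ast_neutral("def add(a, b):\n    return a + b",
                          "def add(a, b):\n    return a - b")
```

Hooking a check like this into the review request for a "neutral changes only" branch gives the reviewer a mechanical starting point, leaving only the genuinely AST-visible edits (like docstrings) for human judgment.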
Social aspects of Code Review
Code Review is ultimately a structured form of communication among developers, which means that there are ways to explicitly manage that structure.
Senior reviewer vs Junior
Specifically encourage juniors14 to ask about testing to get them engaged; it can serve as a "safe" question that they can ask when reviewing code from senior devs, since all of you are trying to reinforce a culture in which everyone is expected to respect the question wherever it comes from (and you don't need that much familiarity with the code base to be able to tell if tests are present.) Also, a test suite often serves double-duty by explaining the code from a different angle - the test calls are also "worked examples" of the interface.
Another way to keep juniors from falling into "this code is from a senior developer, I barely understand it, it must be good" trap is to encourage the idea that a review is a reasonable place for asking good questions about the code. The review process should still emphasize finding problems, but readability and clarity are core components of code being reviewable in the first place. A reviewer who is still becoming familiar with the code can often have clearer judgement of the quality of an explanation, since they felt the need for it in the first place.
In the other direction - remind seniors that the clever idea they came up with during the review is not usually superior enough to derail the review. It's one thing if the code under review is wrong - even if that's merely of the form "we have a tested and working API over here that already does this, can you use it instead of duplicating it here?" - but if it's just a matter of style, and the style isn't codified and agreed on by the whole team, then that's not necessarily enough to push for it. (Using this to launch a conversation about codifying that detail as a shared standard is fine - but it probably belongs in some other forum, not this particular feature review.) An explicit mentoring relationship can change where this bar is - but then the discussion can and should happen in private pre-review, not in "public" (in front of the whole team.)
Tone
Whether or not you're a stereotypical "cranky developer" it's worth considering tone extra carefully in review discussions - because being judgmental about the code is part of the task, it's easy to slip over the line into being judgmental about the developer which is harmful to team relationships.
It's also useful to consider that review comments are usually preserved, not just the code - and if they aren't, that's a tooling issue that's worth fixing - when doing "archaeology" (or "spelunking") on the code, especially when debugging a regression, it's pretty valuable to be able to resurface the discussions around when a problematic change was made. Did this weird case come up at the time, and was dismissed? Great, we were almost sufficiently foresightful, maybe recalibrate that reflex a little - especially if "it would be inconvenient to write a test for this" was in the arguments, turning it into a concrete "Test Better!" argument can be valuable. Or maybe no one thought of it - possibly that means you need to "review a little harder", or maybe it was Just That Strange; either point is interesting to include in a post-mortem.
Interactivity
Respond promptly, in both directions. The whole point is keeping things "early in the funnel" while the developer still has things in their head and can answer questions; at the same time, it's important for the developer to come back with either "good question, here's why that's not a problem" or "good question, it's going to take me a day or two to improve this and have you look at it again." Most code review systems don't have good affordances for this, so encourage explicit communication that makes it clear whose court the ball is in.
You want your team to understand that reviewing the work of others is as important as their own work - especially if you care about feature "velocity". While this does require balance, review is happening at the point where something is "almost done" and giving it prompt attention means that it can land that much sooner, and you can both move on to the next thing (or return to your current thing) more quickly.
One of the workflow challenges in being responsive is that it means review is an interruption - not as severe as being on pager duty, but still something difficult to plan your time around as a developer. One conventional approach is to just observe how much time you spend on review and adjust the amount of review you take on based on that - estimating review time has some of the same problems as estimating software in the first place, but you can generally develop the ability. This is one flaw with the common approach of assigning reviewers at review time - the reviewer can't plan as much as they might like. If you have a review cadence, being explicit about expecting developers to be spending a standard amount of time on review at the end of the cycle can help by providing some structure. If you have particularly specialized reviewers, making sure they know up front (when a new feature is started) that a change is coming up in their specialty can also help - even if they aren't explicitly engaged with the work (though if you can get early design feedback from them, that's probably a good tradeoff against doing their own work in that area) just having a "heads up" can help them keep context in mind and make jumping in to review a little bit less disruptive.
Another approach (sometimes inspired by having a "review problem") is to actually schedule reviews as meetings. This makes the time more explicit and high-bandwidth, since the reviewers and the developers are mostly prevented from engaging in other distractions; it can backfire if the reviewers quickly turn up a handful of things that need developer work before continuing, especially if there's pressure to Fix Them Right Now during the meeting. It's important to be aware that "coding with people looking over your shoulder" is a distinct skill that's rarely explicitly developed - it's more of a teaching or rehearsed presentation activity - and unless it's already a specific mentoring situation, you'll usually get better results from letting the developer go off and actually think about the revisions, particularly if the review has turned up additional (hopefully small) problems to solve. It's also important to understand that meetings have a lot of overhead regardless of what they're for - the "review meeting" approach won't be efficient but it might serve to improve shared alignment, or to focus extra attention on particularly critical cases where the expense is justified. (There's also likely to be a lot less resistance to additional meetings if everyone is clear that the goal is to stop having them once communication has improved.)
The Review Lifecycle
The review process doesn't end with the first round of feedback - while it's important to converge quickly, there's still the step of making changes based on that feedback and then getting those changes reviewed. Part of this is making sure the developers involved can communicate efficiently - but part of that communication is the changes to the code itself.
As the team and the code matures (and practices like Writing for Review take hold) you may find yourself actually needing very few changes in response to the initial review feedback. This goes much more smoothly if you can see that those changes are incremental. Most version control systems support this in straight-line development, but among modern git users there is a popular development pattern of "cleaning up" your changeset using rebase and force push. Effectively, this makes the changes look like the final form is what you intended all along - on the one hand, this is rewriting or "retconning"15 history; on the other hand, it does mean that when you look back at the code there isn't any distraction from the incremental problem solving, and only the final reviewed change is preserved as history. When applied to a change under review, though, this creates an immediate problem - the reviewers have to look at the whole changeset from scratch, instead of just seeing the incremental change as incremental. Not only is that inconvenient, it leads to lower quality review, since "oh, I've seen that bit before, it was ok last time" is a far less rigorous (and possibly untrue) judgment than "that code is literally unchanged because this incremental diff doesn't touch it".
There are ways to mitigate this16 with tooling improvements, but the simpler approach is to just declare that clean up - and force push in particular - is only allowed on private branches, never shared branches; and once review has begun, your branch is not private. Having to use different workflows before and after a branch becomes shared can be error prone, or at least tedious, so it can be useful to be able to point to "review friction" as a team-wide motivation to put up with it.
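The "never on shared branches" declaration can also be given a mechanical backstop: git itself can refuse history rewrites on the server side. Here is a minimal sketch, using a local bare repository to stand in for your shared host (real hosting platforms expose the same idea as per-branch protection settings; names like `dev@example.com` are invented for the demo):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# A bare repository standing in for the shared server: refuse any push
# that isn't a fast-forward (which is exactly what a post-rebase or
# post-amend force push is).
git init -q --bare shared.git
git -C shared.git config receive.denyNonFastForwards true

# A developer clone with one pushed commit.
git clone -q shared.git work && cd work
git config user.email dev@example.com
git config user.name Dev
echo one > file && git add file && git commit -qm "first draft"
git push -q origin HEAD:main

# "Cleaning up" rewrites history; the shared repo now rejects the
# rewrite, even with --force.
git commit -q --amend -am "final form"
git push --force origin HEAD:main 2>&1 || echo "rewrite rejected by shared repo"
```

Note that this setting is repository-wide; if you want force pushes allowed on private branches but not shared ones in the same repository, you need the finer-grained branch rules your hosting platform provides.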
In open source projects one occasionally finds a pattern of keeping all the commits for review but doing a squash merge. This keeps the development visible at review time but doesn't keep it long term - the theory is that this gives novice contributors more freedom to flail, since it won't reflect on them at all after the contribution is merged. (It's sometimes suggested that this makes git bisect easier to use, but using --first-parent is enough if you're keeping the usual discipline of the main line always passing all tests.)
Conclusions
Code Review is a process that helps you get more eyes on your team's code, but there are lots of details that go into a good workflow for this. It's an extreme case of the belief that code is as much about communicating with people as it is about communicating with machines, which means you need to be attentive to social friction while developing it. While it requires a reasonable version control workflow, it doesn't need all that much additional plumbing - it mainly needs more communication.
-
Jakob Nielsen's 2000 research was specific to qualitative usability testing, but the model intuitively fits code review as well. ↩
-
The flexibility for non-developers to express things in code is often a valuable business practice - but it's also a hint that overall, software development perhaps isn't as much of an engineering discipline as we insist it is. ↩
-
There wasn't even .old or date-based history; just one set of files that got changed by two or three people (occasionally yelling at each other.) RCS let people at least "lock" the file they were changing, which meant people yelling at each other before losing conflicting changes. Full disclosure: this was a 1990 problem, not a 21st century one - but inexperience sometimes looks like time travel. ↩
-
For a short time around 2015, Atlassian Bitbucket had a big advantage over Github in the not-yet-off-the-ground startup market - a free account could have a private repo. What it didn't have was Github's forced bias for making changes via pull request, so everyone was just checking out master and resolving conflicts rather than avoiding them - which, honestly, isn't too much of a distraction when you have three people sharing a desk. ↩
-
A much-ignored point of consistency is tabs vs. spaces. While developers tend to be the sort that want personal flexibility, this is one of those areas where you must make a choice (and that choice is spaces) and enforce it (VC hooks if you can, plus shared default configs for the editors and IDEs your team prefers), because trying to use both will hurt your team - and no one really prefers one to the other that strongly, though they're likely to have been burned in the past by having not nailed it down. ↩
-
Google famously goes as far as to have a company-wide "language skills and standards" level among reviewers, called "readability", specifically to encourage raising the quality bar for all internal Google code. I'm using the term more generally, but it's interesting to see where Google puts emphasis. ↩
-
See also Code Retirement Pages. ↩
-
A common conversation on teams that use something basic like Confluence or Google Docs, without someone tasked as "information architect" or "gardener", is for discussions about documenting something to go "well, where did you look for it? I'll put it there" / "I just searched for these keywords" - which doesn't help to discover structure. It turns out to still be a useful conversation - just make sure to add those keywords explicitly in the document. You could think of it as "old-school SEO", but it's just making sure you include literal keywords relevant to your topic even when they're not actually part of how you're discussing the solution. ↩
-
This sounds like "Change Review" but that's a broader concept applied to systems and configuration rather than simply code (often hiding the "code review" part as "change the system to use this specific release" that was itself reviewed before getting released from the software organization.) ↩
-
Time pressure can be real ("we need to be in compliance with new regulations by 1 January"), fictional (Agile "sprints"), or self-inflicted ("our marketing campaign promises these features this quarter, but they never talked to engineering about it"). In rare cases (XP's "Release Train" concept from the late 1990s) they can be entirely decoupled, with features delivered as they qualify, but that usually requires very tight coupling with a single user rather than having a product. Even without direct customer pressure on timing, if you have a product with expensive tests (like certification audits, or mechanical performance tests of robot systems) you might reasonably have a group performing those tests on a limited schedule that needs a stream of releases that are "worth testing" - which provides the same kind of back-pressure on the review and merge process. ↩
-
Atlassian Bitbucket used to aggressively render "this won't merge back cleanly into the target branch" as part of the review mechanism, but they dropped it at some point and you're probably left to your own discipline - likely pulling your main branch into your feature branch and re-testing before landing it. ↩
-
You also never force push to the mainline, but that's because ideally you don't commit to the mainline at all: you get review approvals on your branch and merge it and that's all you ever do to the mainline. (Most professional grade systems will let you outright forbid this by policy, or even only allow a specific "repo owner" to actually ever perform merges - but the fact that it is mechanically disruptive to everyone else sharing your repo and they will come after you with torches and pitchforks will usually encourage self-policing.) ↩
-
This is related to the military concept of being "under orders" - a well run team will support giving a lot of direct feedback during the planning stage, but when a plan is executed, it's executed pursuant to authority - not inflexibly, but without pushback about preferences or committed decisions. (Not that software teams can be run to military standards - "herding cats" has been the descriptive term since at least the 1990s - but it is helpful to see that arguing is friction, and there's a time and place for it.) ↩
-
While "Junior Developer" usually means someone who has relatively less career experience, in this context it also includes someone who has only recently had their first exposure to this team or this product. ↩
-
Retroactive Continuity - that is, failing entirely at preserving continuity but producing something that looks (from the present) as if it were a coherent story all along - is a staple of the comic book and comic book movie industries. The term is usually pejorative. ↩
-
This author hasn't experienced any beyond "don't do that" but is eager to hear of them. ↩
Code Review is a fundamentally incremental process: you start with working code, and analyze proposed changes with the goal of continuing to have working code. What if instead you've been handed an entirely unfamiliar chunk of code - from a subcontractor, a vendor, a founder, or "the Internet" - and it has grown and evolved without ever having been subjected to your review processes?
In the extreme case you have only the code - perhaps the only test suite it has is that "it's working in production already", and it's now your problem to keep it that way.
Testing
Obviously1 your first task is going to be to start coming up with tests that represent "it works in production" in a more fine-grained way - even if that's just "stand up a staging system, run these commands against it" (in a repeatable, and preferably automated, way.) This leans into the idea that "what the system does now is Correct because someone is paying for it", which isn't the most sophisticated expression of intent but it's a fine starting point.
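A minimal "golden master" sketch of that starting point - here `run_system` is an invented stand-in for whatever actually exercises your staging system (a curl against an endpoint, a CLI invocation, a batch job):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in for the real system under test - replace with e.g.
# a curl against your staging endpoint.
run_system() { printf 'beta\nalpha\n' | sort; }

golden=golden.txt
if [ ! -f "$golden" ]; then
  # First run: record current production behavior as "Correct,
  # because someone is paying for it".
  run_system > "$golden"
  echo "golden output recorded"
else
  # Later runs: any drift from the recorded behavior is a failure
  # worth investigating, even if it turns out to be an improvement.
  run_system | diff -u "$golden" - && echo "still matches recorded behavior"
fi
```

The recorded outputs belong in version control alongside the code, so "what the system does now" becomes something you can diff and review like everything else.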
Documentation
If you have documentation you're in better than average shape, as it's usually a higher level expression of what features people cared about enough to write down - it can also capture "what we thought it should do" which can be clearer guidance than "what it happens to have ended up doing". (This is also why documentation is more important than you think it is for highly technical software products - trying to explain what's going on can lead to "huh, that is stupid when you say it out loud" Ah-Hah!-moments that can feed directly back into valuable customer-facing improvements. This is also why many documentation people have an inner product manager that you'd be well served to take advantage of...)
The Code
Once you're ready to look at the code, you'll probably want to go back and forth between identifying the structure and figuring out where things go, and examining the details of how this is accomplished. Ideally you're recording this analysis for your coworkers, and producing internal documentation for future use - while you may get a "good enough/can't use this" result, it's not the most important part, especially if you do end up actually accepting the code. Also, recorded decisions are good draft material for a more formal auditing policy guide, if you end up doing a lot of this work.
Code Structure
Depending on the kind of code you're working with, the structure may be more discoverable "outside-in" than "top-down", but in either case you want to start looking for
- interfaces and data flow
- component structure (internal - how it's built, external - dependencies)
- how "features" are expressed (since those are likely the first things you're going to be asked to change)
This structure should provide guidance for what kinds of tests you should be writing and which parts of the system you want to be able to demonstrate still work after future changes.
(In a system where all changes go through code review, you're generally showing that "all tests still pass" - for an initial audit like this, you're instead likely to write tests that seem entirely reasonable and should pass but don't. You need some way of tracking these "expected" failures, so you can put pressure on fixing them soon even if you can't do it immediately.)
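One low-tech way to track those "should pass but doesn't yet" tests is an explicit allowlist in the runner - sketched here with invented test names; the point is that a known failure is reported loudly without failing the run, while a new failure still does:

```shell
# Placeholder tests: each test_* function returns 0 on pass.
test_parses_config()     { true;  }   # passes today
test_rejects_bad_input() { false; }   # reasonable test; the code fails it

# The "fix me soon" pressure list of known gaps.
expected_failures="test_rejects_bad_input"

status=0
for t in test_parses_config test_rejects_bad_input; do
  if "$t"; then
    echo "PASS  $t"
  elif echo "$expected_failures" | grep -qw "$t"; then
    echo "XFAIL $t (known gap, tracked)"   # expected: reported, not fatal
  else
    echo "FAIL  $t"                        # unexpected: fails the audit
    status=1
  fi
done
echo "suite status: $status"
```

Test frameworks generally have a native version of this (expected-failure markers), but the important part is the same: the list of known gaps is visible, versioned, and embarrassing enough to shrink over time.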
Code Details
If you have fully automated tooling around your coding standards (formatting, cyclomatic complexity, comment density, type annotations) it's worth giving those a shot to "see what happens". If they don't fail outright, they can give you an initial view of what parts of the system are likely to need attention soonest; if they do, then at least you have a clear early cleanup task to schedule.
Security is usually a system design issue, but there are things that can stand out directly in code - and that tools can find - and there are often domain-specific checklists like OWASP that can suggest areas where even shallow examination can expose problems. These are often of the form "simple industry-wide problems we should have gotten beyond some time in the last three decades", like "examine all SQL queries for injection attacks" or "be suspicious of all explicit memory allocation".
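A deliberately crude first pass at the SQL case can be a single grep - it will over-report, which is fine for an audit where a human follows up on every hit (the sample file and its contents are invented):

```shell
set -e
tmp=$(mktemp -d)

# Invented sample: one string-concatenated query, one parameterized one.
cat > "$tmp/orders.py" <<'EOF'
query = "SELECT * FROM orders WHERE id = " + order_id
cur.execute("SELECT * FROM orders WHERE id = %s", (order_id,))
EOF

# Flag SQL string literals immediately followed by concatenation;
# the parameterized query on the second line is (correctly) not flagged.
grep -rnE '"(SELECT|INSERT|UPDATE|DELETE)[^"]*"[[:space:]]*\+' "$tmp" || true
```

Purpose-built scanners do this far better, but a pattern you wrote yourself is easy to adapt to the specific bad habits you discover in this particular codebase.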
-
Michael Feathers, "Working with Legacy Code" specifically coins the term "Legacy code" to mean "any code that doesn't have sufficient tests for you to safely make real changes to it." ↩