Release Engineering

Why is release engineering a big deal? How does it help your developers instead of getting in their way?

Can't we just do Continuous Delivery?

No.

Continuous Delivery is about minimizing the lag between any development work being "done" and all customers being able to use the result. While there is significant value in thinking in those terms and figuring out what you're doing in that gap - actually eliminating the gap can often result in feeding your customers into a buzzsaw of poorly considered reactions to changing requirements, rather than handing them the polished results of well-considered design.

Teams that haven't worked in this mode before may find that there are manual review processes, hard-to-automate system tests, or even external legal approvals that currently occupy those gaps. These are not as unusual as they might seem - even a trivial website, if it's delivered to the government of Canada, must comply with the Official Languages Act and provide multiple translations, which you might not be able to update continuously along with other code changes.

Also, while ideally you should be able to run every possible test on every change, this is often not possible in practice (and the obstacle may not be merely expense.)

Examples (from past personal experience):

What can we do, then?

The first pushback you'll get from adding any process at all will be "why are we even doing this?" It helps to start with principles, and examples of problems that are solved by sticking to those principles. (While it's also educational to look at disasters caused by failing to adhere to principles1, that doesn't give much software-specific actionable guidance.)

Principle: Traceability (and Provenance)

Traceability is simply the idea that there should be no question about what you shipped - a product name and version number should be enough to get you back to all of your inputs. Even if you're building bespoke one-off things for individual customers - as often happens in a startup while it's still trying to converge on a product - you should still be able to identify everything that goes into each of those things. It can save you a lot of time to see that a customer complaint is against release X and that release X+1 has a fix for it; you need clearly labeled releases to be able to do that. Even if you're delivering bespoke fixes, knowing what "future" release to extract them from makes this kind of traceability worth the trouble.

Keep in mind that you can get a lot of traceability "for free" from process automation - not that the automation itself is free, but it's a good way to forestall "excess creativity" in the build process, and then you can let that process do the labelling automatically.
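As a minimal sketch of automation doing the labelling - the label format, and the use of git to identify the commit being built, are illustrative assumptions, not a standard:

```python
# Sketch: let the build process generate a traceable label automatically.
# Assumes the build runs inside a git checkout; the label format is made up.
import subprocess

def current_commit() -> str:
    # Ask git which exact commit is being built.
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def is_dirty() -> bool:
    # Any uncommitted changes mean the build is not reproducible from history.
    return subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout.strip() != ""

def build_label(product: str, version: str, commit: str, dirty: bool) -> str:
    # Pure formatting step: a label that traces straight back to the inputs;
    # ".dirty" flags builds from an unclean working tree so nobody mistakes
    # them for a real release.
    suffix = ".dirty" if dirty else ""
    return f"{product}-{version}+{commit}{suffix}"
```

Because the mechanical parts (commit lookup, dirty check) are separated from the formatting, the label itself stays trivial to test and hard to get creative with.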

Provenance is the rest of the story: for every input you use to build a product (third-party libraries, your own code, and data) you should know where that input came from. If you track a customer problem down to third-party code, that doesn't absolve you of the problem - it just means you've narrowed down where to keep digging. Sometimes that means "knowing which support contract to invoke", sometimes "fixing the problem locally and posting the fix on their github", or something in between; if you're using a component that's actively developed, you need to know which version you shipped so you can see whether they've already fixed it upstream. (Ideally you want to be able to name the precise version as they published it, but even "latest, but downloaded on this date" can be workable.)

Provenance also lets you "connect the dots" between the public security announcements your customer is pestering you about and the versions you actually shipped - CVEs, USNs, RHSAs, etc. are generally quite specific about which version numbers a given advisory applies to. (If your software is public-facing you might find yourself having to provide these identifications - or if you have certain large customers, they may insist even if you're not otherwise obligated.)
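That matching can be sketched as a simple range check. The dotted-integer version format and the single introduced/fixed range are simplifying assumptions; real advisories may list multiple ranges or non-numeric versions:

```python
# Sketch: does a shipped component version fall inside an advisory's
# affected range? Assumes plain "major.minor.patch" integer versions.

def version_tuple(v: str) -> tuple:
    # "1.4.2" -> (1, 4, 2); tuples compare element-by-element, which is
    # exactly the ordering dotted versions need.
    return tuple(int(part) for part in v.split("."))

def is_affected(shipped: str, introduced: str, fixed: str) -> bool:
    # Affected range is [introduced, fixed): the "fixed" version itself
    # is safe.
    return version_tuple(introduced) <= version_tuple(shipped) < version_tuple(fixed)
```

With a provenance inventory of exact shipped versions, answering "does this advisory apply to us?" reduces to running this check over the inventory.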

Finally, if your company is being acquired or taking on significant investment, one part of modern Due Diligence is a License Audit - a review of your third-party components to see if there are any license compliance issues. If that's in your future, it's a lot easier to produce that inventory if you just keep track of licenses as you add dependencies - at the very least it gives you a hook to find problems before they become load-bearing. (At one company whose product had a large number of open-source dependencies, the board of directors wanted a regular report of what new licenses we had to worry about - this turned out to be easiest to produce by just comparing our package inventories at release time and reporting on the difference.)
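That release-time report can be as simple as a diff of two package-to-license inventories. This sketch assumes you already capture such an inventory per release; the dict format is illustrative:

```python
# Sketch: report licenses that are new since the previous release.
# Inventories map package name -> license identifier (format assumed).

def new_licenses(previous: dict, current: dict) -> dict:
    # Licenses already present in the previous release are considered
    # "known"; anything else is new and worth a look before it becomes
    # load-bearing.
    seen = set(previous.values())
    report: dict = {}
    for package, license_id in sorted(current.items()):
        if license_id not in seen:
            report.setdefault(license_id, []).append(package)
    return report
```

For example, with a hypothetical new GPL dependency:

```python
prev = {"requests": "Apache-2.0", "flask": "BSD-3-Clause"}
curr = {"requests": "Apache-2.0", "flask": "BSD-3-Clause", "linenoise": "GPL-3.0"}
new_licenses(prev, curr)  # {"GPL-3.0": ["linenoise"]}
```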

(See the Traceability article for more depth.)

Principle: Repeatability → Debuggability

Repeatability is just the idea that if you chose to build something once, you can revisit that and build it again. The most practical reason for this is that if you see problems in the field, you should be able to easily rebuild the exact same fielded version - with debugging aids added, if necessary - so you can actually investigate the problem.

It should also be clear to everyone working on the project how "what you intended to build" turns into "what you actually shipped" - not necessarily how all of the mechanics work, but how to operate them.

(You should also do the "easy version" and actually keep copies of what got shipped - but that's usually not enough to support even basic debugging, unless you're actually shipping fully debug-capable builds to customers, which is rare.)

Principle: Delivery reflects Intent

There should be no question about what you shipped. While this sounds obvious, be aware that there are red flags like "didn't we fix that last release? Why is this customer still seeing it?" - if the problem was subtle in some way that your fix didn't capture, that's one thing, but if the code got written, reviewed, and landed, but still didn't end up in the next release, that should be a matter of great concern.

Relatedly, if you did fix something, and then two releases later it's broken again - that looks really bad from the customer's perspective and you should be putting more effort into preventing it. Usually that means "customer-visible fixes get more test coverage than would seem reasonable", but if the code "just got lost", that's another red flag that keeps your customer from trusting your ability to meet your commitments.

Principle: Automate software, not people

There is often pushback about doing serious automation around release engineering, like "it's not the product focus", "we can just have some checklists", or "do we have to review and test the automation too?" In practice, you don't need that much code (and the simpler you can keep it, the better.) Checklists2 are critically important, but they're also hard for humans - anything you can reduce to mechanical automation will be more reliable and eliminate human tedium and fatigue (which lead to errors.)

That said, starting with a recipe or detailed checklist can help work out the initial process and figure out what parts are worth automating; they also make it easier to hand the chore around, especially on a small team that's reluctant to dedicate resources to process work. Ultimately you want the release process to be "boring" - it's not where you should be innovating, and you probably don't want to spend creativity on it either.

Principle: Release schedules support planning

You need to get working software out the door to your customers. Software estimation is hard, and some things really are "researchy" enough that they're done when they're done - but as you accumulate experience in a particular problem domain and product environment you'll get an idea of what parts you can make predictions about. Predictions can help you allocate resources all along the delivery pipeline - testing, marketing, customer delivery and support, and even with the customer themselves.

It also helps with planning across subsystems. In the extreme, Mark Shuttleworth gave a PyCon 2010 Keynote about the "Release Cadence" that Ubuntu was using - and gently imposing on the open source community - you didn't have to pay attention to it, but related projects naturally synchronized around shipping features in time to hit the Ubuntu 6-month release cycle, or the 2-year "Long Term Support" cycle. This was especially helpful for foundational projects like Gnome and KDE to be able to say "we'll have these features by this target, so applications can start using them in the same target" instead of waiting for a later cycle.

This is the opposite of the model used by classic systems like XP ("Extreme Programming"), which still had a cadence - a release every 3 or 6 weeks - but the release contained "only things that are completely ready", with no implied pressure to "fit" a deadline. The lack of pressure was considered important for quality (basically, "we will ship no code before its time") and while it did encourage having very clear terms for what it meant for code to be ready, it did nothing to help related projects plan for future deliveries.

Principle: Release versioning supports planning

In addition to having a schedule, you can communicate (and commit to) more detail by properly "naming" (with version numbers) the software you release.

The best practice here is Semantic Versioning, where you always have a three-part version number - changing the major version for incompatible changes, the minor version for new features that don't break compatibility, and a third "patch" version for compatible bug fixes only. While this is completely clear (and mostly mechanical) for simple things like library dependencies, for an entire product you'll end up using it to communicate with people (customers), which is where it gets more complicated. You may choose to define support contracts based on these versions: is a major version change just a continuation of an existing contract, or a new purchase? How long can you run a given version before you are required to upgrade? Sometimes there's a "lighter" version where support is allowed to push back on customer complaints if there's a newer version available, and it's a matter of service quality whether you even try to confirm that the problem is fixed on the newer version.
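To show how mechanical the simple case is, here's a hypothetical classifier for a version bump under semantic versioning - assuming plain `major.minor.patch` integers, with no pre-release or build tags:

```python
# Sketch: classify an upgrade under semantic versioning rules.
# "major" signals incompatible changes, "minor" new-but-compatible
# features, "patch" compatible bug fixes only.

def upgrade_kind(old: str, new: str) -> str:
    o = tuple(int(x) for x in old.split("."))
    n = tuple(int(x) for x in new.split("."))
    if n <= o:
        raise ValueError(f"{new} is not an upgrade from {old}")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"
```

A dependency resolver can act on this output automatically; the harder, human questions in the paragraph above start exactly where this function ends.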

Sometimes you'll have sales and marketing interest in how these are presented; for one interesting version of this, the first number was controlled by marketing and tied to high-effort customer presentations on all of the new things "System 4" would get them - even if they were already partly rolled out and customer-tested before that. (Engineering controlled the rest of the version number, primarily mechanically.)

Principle: Version what you Test

With all of the emphasis on versioning the code - at my last three companies, the version of the data3 accompanying the code was just as important. Since we generally tested the latest data with the latest code, rather than a grid of code and data, we didn't actually need a different version number for the data, just enough information to trace what data was current for that version of the code. (Attaching direct references in version control, using git lfs or something like it, is an obvious way to handle this, but you'll need to watch out for the data getting updated and the cross-reference falling behind; it's often worth including some basic acceptance tests to defend against that.)
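One way to implement that acceptance test, as a sketch: keep a small manifest of data-file hashes in version control next to the code, and fail the release when the actual data has drifted from it. The manifest format here is hypothetical:

```python
# Sketch: detect the data and its version-controlled cross-reference
# falling out of sync. The manifest (committed alongside the code) maps
# each data file to the sha256 it had when last updated.
import hashlib
import json
import pathlib

def verify_data_manifest(manifest_path, data_dir) -> list:
    # Returns the names of files whose current contents no longer match
    # the recorded hash; an empty list means code and data still agree.
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    stale = []
    for name, expected in manifest.items():
        blob = (pathlib.Path(data_dir) / name).read_bytes()
        if hashlib.sha256(blob).hexdigest() != expected:
            stale.append(name)
    return stale
```

Run as part of the release checks, a non-empty result means someone updated the data (or the manifest) without updating the other half of the cross-reference.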

Conclusion

While a lot of release engineering looks from the outside like bookkeeping for bookkeeping's sake, it all comes from principles, and those principles are about avoiding problems (past and future.) There are arguments for and against - but "it takes time" usually isn't a good one; knowing the tradeoffs, and knowing what you're asking for, are a good way to show your team that using release engineering processes is actually in their interests.


  1. For example, any NTSB or USCSB investigation - but specifically the containership Dali taking out a bridge because a poorly installed label interfered with a single wire connection - practically begs for a certain kind of attention to detail; it might be a bit much for a typical software project, though. 

  2. Worth reading "The Checklist Manifesto", Atul Gawande, if you have any repeated process that needs to be done right; the book is far more significant than it sounds. 

  3. Data in this case meant bulk machine learning models and map data, product-specific but not generally customer-specific.