Archaeology isn't just an excuse to wear a cool hat and carry a whip - it's about recognizing that "today" is built upon layer after layer of "the past" and sometimes you need to peel back those layers to explain a problem.
In the software development context, it primarily comes up when fixing bugs and when modifying long-stable code to meet changing requirements. This breaks down into answering the near-term question of "how does this code work now" and the longer-term question of "how did we get here."
Digging into the present (forensics)
Outside of some modern-art contexts, archaeology is not about the present; "digging into the present" more properly lands somewhere between journalism and forensics. In software we aim to approach it from the forensic side: separating reports and hearsay from concrete evidence, and keeping meticulous records of any discoveries. (This is simplified by being able to perfectly copy memory states, program inputs, and program outputs, and store them at terabyte scale - criminal forensics would be greatly simplified by being able to make hundreds of exact copies of a crime scene and poke at them with impunity, without disrupting the original!)
This is sort of the reverse of the bug funnel, and at the same time a good part of the motivation for that model: ultimately a bug report is "something happened that shouldn't have happened," and the "shouldn't" part is based on some model of how the system works. Your customers have a model of the system that is arguably more important than yours, but also vastly less precise. To usefully pass a complaint along the chain from user to developer, you need to make sure the complaint continues to fit the model, and that involves making it more concrete without losing sight of the end user's perspective.
A powerful way of making the complaint more concrete is to turn it into a test case.2 This test case can serve several purposes:
- Others, particularly at review time, can look at the test case and agree (or disagree) that it represents the actual problem.
- The test case can be applied to multiple past versions of the code (see "archaeology" below) to narrow down what versions express the undesired behaviour.
- The test case can be applied to future versions of the code - as a "regression test" to make sure the problem doesn't come back.1
- Finally the test case can be used on the present version of the code to confirm that the change does the desired thing - serving as a "free" first intent-based test.3
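The shape of such a test case can be tiny. Here's a minimal sketch in shell, where `sum_lines` is an invented stand-in for the real program under test and the "bug" (wrong output for empty input) is equally invented - the point is only the shape: a report made concrete as a re-runnable check that fails today and passes after the fix.

```shell
# Invented stand-in for the program under test; this version has the
# reported bug: it prints nothing (rather than 0) for empty input.
sum_lines() {
  awk '{ s += $1 } END { if (NR > 0) print s }'
}

# The bug report, made concrete: "an empty list should total to 0".
repro_test() {
  [ "$(printf '' | sum_lines)" = "0" ]
}

if repro_test; then echo PASS; else echo FAIL; fi   # prints FAIL today...

# ...then after the fix,
sum_lines() {
  awk '{ s += $1 } END { print s + 0 }'
}

if repro_test; then echo PASS; else echo FAIL; fi   # ...the same check prints PASS.
```

The same `repro_test` serves every purpose on the list above: reviewers can read it, it can be replayed against old versions, and it stays behind as a regression guardrail.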
Once you have a (failing) test to work from, you can apply the usual tools for figuring out what's going on - tracing execution, single-stepping, measuring performance. The test should give you an "expected path" through the code, and you can see where things deviate. Sometimes that will be enough - especially if you're responding to an environmental change or a new business requirement, since the difference between "code that doesn't try to do that at all" and "code that successfully does that" is usually straightforward and direct.
It's when the code already almost does that - and the change isn't driven by some external requirement - that you have to start digging more deeply…
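A concrete failing test also unlocks the "apply it to past versions" purpose mentioned above: `git bisect run` will binary-search the history for the first commit where the test fails. The sketch below builds a throwaway repo with an invented bug so the commands are runnable as-is; on a real project you'd point bisect at your actual repro script.

```shell
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
repo=$(mktemp -d); cd "$repo"; git init -q

# Three good commits, then the bug sneaks in, then an unrelated change.
for i in 1 2 3; do
  echo "ok $i" > app.txt
  git add app.txt && git commit -qm "good change $i"
done
echo "broken" > app.txt && git commit -qam 'refactor app'
echo "more work" >> app.txt && git commit -qam 'unrelated change'

# The repro script lives outside the repo so bisect's checkouts don't
# disturb it; exit 0 = behaves correctly, non-zero = bug present.
repro=$(mktemp)
printf '#!/bin/sh\n! grep -q broken app.txt\n' > "$repro"
chmod +x "$repro"

git bisect start HEAD HEAD~4       # bad now, known good four commits back
git bisect run "$repro"            # binary-searches for the first bad commit
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset
git show -s --format='%h %s' "$first_bad"   # the 'refactor app' commit
```

This is why it pays to make the repro script cheap to run: bisect will execute it O(log n) times without any human judgment in the loop.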
Digging into the past (archaeology)
Once you've identified where the relevant code is, you may have questions about how it got that way. Unless you're really aggressive about narratively documenting your project timeline, you're going to be digging into version control history.4
Once you've found the area of concern, the first question to try to answer is "what were the last changes to this, specifically?" At the file level, a simple `git log` is enough, showing every change made to the file; you can then `git show` each change one by one (or see everything with `git log --patch`, which is noisy, but useful if you want to see the whole flow of the development process, or if you just want to search for keywords and don't know whether they're in code, comments, or commit messages.)
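That loop looks like this in practice. The sketch uses a two-commit throwaway repo so the commands work as-is; the file name and commit messages are invented.

```shell
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
repo=$(mktemp -d); cd "$repo"; git init -q

echo 'version one' > widget.c && git add widget.c && git commit -qm 'add widget'
echo 'version two' > widget.c && git commit -qam 'fix widget overflow'

# Every change that touched this file, newest first:
git log --oneline -- widget.c

# Inspect a single change in full (commit message plus diff):
git show HEAD

# Or the whole flow at once: every commit with its patch - noisy,
# but greppable for keywords in code, comments, or commit messages:
git log --patch -- widget.c
```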
In practice, you've probably narrowed your problem down more precisely than "entire file" - maybe to a single function or group of functions, or even to a few lines of code. This is where `git blame` (formerly `git annotate`) comes in: since git can reconstruct any version of a file from its recorded history, `blame` walks through that history, keeps track of which commit last touched each line, and presents the entire file with those annotations. At that point, you can examine the relevant revision directly (using `git show`) and look at the discussion in the commit message, what the previous code was, or even just which branch it was merged from (which should be enough to let you trace it back to the Pull Request that inspired it, and see the review discussion.)
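Here's that narrowing step as runnable commands, again in a throwaway repo with invented names - going from "this line looks wrong" to the full commit that last touched it:

```shell
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
repo=$(mktemp -d); cd "$repo"; git init -q

echo 'int limit = 10;' > config.c && git add config.c && git commit -qm 'initial config'
echo 'int limit = 255;' > config.c && git commit -qam 'raise limit for batch jobs'

# Annotate just the line(s) you care about, not the whole file:
git blame -L 1,1 config.c

# Grab the commit hash for that line and examine it in full -
# commit message, previous code, the works:
culprit=$(git blame -L 1,1 --porcelain config.c | head -n 1 | cut -d' ' -f1)
git show "$culprit"
```

The `-L` flag keeps the output focused when you only care about a few lines, and `--porcelain` gives a stable format for scripting.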
While it is very often true that "what you just changed is what broke things," that's more about pushing back on the idea that software "just breaks" without some human intervention - it doesn't mean that the most recent change is guaranteed to be the problem. Sometimes the flaw has been there for a long time (see also "how did this ever work") and is only revealed by new external circumstances.5 In these cases you probably need to do deeper archaeology - the same thing you did to find the recent change, but digging another layer back. While continuing with `git log` and `git blame` works fine, `git blameall` gets you the entire history of the file (including deletions) with all changes from the first commit visible at one time. This can be a bit overwhelmingly noisy, but can give you quick answers to "did anything in this file ever call this function?" if your concern happens to be shaped that way.
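Git's built-in "pickaxe" search answers the same shape of question - it finds every commit that added or removed occurrences of a string, including code deleted long ago. And `git blame` itself can be pointed at an older revision to dig below the most recent change. Throwaway repo and invented names again:

```shell
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
repo=$(mktemp -d); cd "$repo"; git init -q

echo 'frobnicate();' > main.c && git add main.c && git commit -qm 'call frobnicate'
echo '/* call removed */' > main.c && git commit -qam 'drop frobnicate call'

# -S finds commits that changed the number of occurrences of the
# string: both the commit that introduced the call and the one that
# removed it show up, even though the current file has neither.
git log --oneline -S 'frobnicate' -- main.c

# Blame as of an older revision, one layer below the latest change:
git blame HEAD~1 -- main.c
```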
A note about `git blame`
The "blame" framing is a bit of tech toxicity that can distract from how useful the feature is. Perforce and Subversion had `annotate` commands; Subversion got the `blame` alias (and then a `praise` alias in reaction to that.) Git documents `blame` as the primary command (it doesn't have `praise` at all), and `git annotate` exists only for similarity with older version control systems like the ones mentioned above.
The toxicity is that you can only blame people for something; the code doesn't change itself. At the same time, the recorded code doesn't encapsulate what was going on when the code was written - it only contains the actual code itself. The thing to remember (that gets short-circuited by even calling it "blame") is that you're not (at least on a healthy team) looking for someone to blame - you're only trying to take code that doesn't work today and turn it into code that does work. You're doing this digging to see how the code has been shaped over time because other more direct approaches haven't worked and you need more context.
That doesn't mean that "who worked on this particular bit of code" can't be important - perhaps you'll learn that it was some code one of the founders wrote five years ago after three all-nighters and no one has dared touch it since - and you can probably ask them about it and get a good story about how the company tried to get a Red Bull™ sponsorship before the VC money came in. Or perhaps the change was from a senior engineer who was experimenting with something new... or they only touched it incidentally as part of a global cleanup chore. These are weird - culturally interesting, but still weird - corner cases that don't really help you solve the problem. Most of the time you're going to find code written by one of your peers, with similar experience and constraints, and similar habits - who makes similar mistakes to those you'd have made if the task had been in your hands instead. This is why you're going to be able to build your own mental model of what's going on, and fix the thing. That is the essence of Rule 3 itself - patterns in code are human patterns of thought, and treating your peers as human isn't just empathetic, it's accurate.
Chesterton's Fence
The principle that one should not tear down a fence until one knows why it was erected is fundamentally about acting on principle in the absence of research; while this applies as much in software as it does anywhere else, good practices in software (version control, code review) mean that the information should exist - it's just a matter of digging it up. Archaeology is about having the tools in place to do that digging.
Conclusion
Archaeology is a tool for understanding code. It's not your only tool, and you will probably have more immediate options for most bugs - "what did you just change, that's probably what broke" is a far more potent investigative tool - but as you work on systems of greater age and complexity, it's more likely to be what you need to understand how the system got where it is.
-
Expressing it as "the problem coming back" is somewhat magical thinking; a regression test is really a "guardrail" to make sure a developer doesn't reintroduce the problem, without requiring that they have perfect awareness of all past problems. As such, it needs to be easy and convenient for all developers to run all such tests as part of development - the guardrail doesn't serve its purpose if someone has to go digging for it. ↩
-
Open source projects often push the work of generating reproduction cases back to the reporting user, since the whole point is that end users have full access to the source code; commercial projects more often have one or more tiers of internal support organization to handle this. (In both cases it may still end up in the developer's lap.) ↩
-
TDD or "Test Driven Development" is the logical conclusion of this - development starts with "writing a test that fails"; it fails for the more specific reason that "the code implementing the feature doesn't exist yet," but it still gives you a very clear stepping stone from "wanting the code to do this" to "showing that (with these changes) it does this." ↩
-
Concrete examples here all use `git` - not because you have to, but `git` is so widespread that any system you do use will likely have a detailed comparison list just to explain why you'd even consider using it. (You can even find these lists for much older systems like Subversion, Perforce, or CVS; they just describe the glorious future instead of the glorious present.) ↩
-
This is also why it's important to turn a bug report into a concrete failing test - not just so that you can prevent the problem from recurring in the future, but so that you can resurrect past versions of the code and see if it previously worked or never did. ↩