In the course of my career, I’ve jumped headfirst into dozens of projects. There was a period when people would hire me for two-hour slots, and in that time I would have to learn enough about their project and their code to not just understand it, but offer useful insights. And as a consultant, I’ve often needed to come up to speed quickly and make useful changes to a codebase within days.
What not to do
Here’s what I can tell you about what not to do:
Don’t start with code reading. You can get a lot of goofy unvalidated ideas about a codebase when you just start reading code. Usually you’ll miss the most important files, which are in a subdirectory of `lib`, in a locked file cabinet with a sign reading “beware of the leopard”.
Don’t start by reading the tests. Tests are historical documents of pain, and of mortal combat with persistent bugs. Or, sometimes, they are obligatory reiterations of the code itself, generated or written by rote in order to satisfy the dictums of a coding standard.
Occasionally, they are beautiful. But even then tests rarely offer any insight into what is important in a project, vs what is merely hard to get right.
What to do
The best way I know to get acquainted with an unknown codebase is to fire it up locally (this alone may be very hard). Then approach softly, humbly, as a mere user. Start making theories about how it works, and what parts of the code are responsible for the behaviors you see.
Then deliberately start breaking it.
Confirm (or, more often, disprove) your conjectures. Add `raise "hello"` exceptions or `printf("*** YEP WE GOT TO THIS POINT\n");`. Make hypotheses, and then document and validate those hypotheses by adding assertions to the code: assertions that will visibly crash the code if they fail. Or if you can get the project working under a debugger, add breakpoints.
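Here is a minimal sketch of what that can look like in Ruby. Everything in it is made up for illustration: `InvoiceTotaler`, the `:cents` and `:quantity` keys, and the hypothesis itself are assumptions, not code from any real project. The tracer line proves the method is reached, and the assertion records a guess about the data and crashes loudly if the guess is wrong.

```ruby
# Hypothetical class standing in for "some code you are trying to understand".
class InvoiceTotaler
  def initialize(line_items)
    @line_items = line_items
  end

  def total
    # Tracer: written to stderr so it shows up even when stdout is captured.
    warn "*** YEP WE GOT TO #{self.class}##{__method__}"

    # Hypothesis: every line item reaching this point carries an Integer price in cents.
    # If the guess is wrong, fail visibly instead of silently computing nonsense.
    unless @line_items.all? { |item| item[:cents].is_a?(Integer) }
      raise "hypothesis failed: non-integer price in #{@line_items.inspect}"
    end

    @line_items.sum { |item| item[:cents] * item[:quantity] }
  end
end

# Exercise the code path the way a user action would.
puts InvoiceTotaler.new([{ cents: 250, quantity: 2 }]).total
```

Once a hypothesis has survived a few runs, I would either delete the check or turn it into a note, so the probes don’t leak into a commit.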
Do all this in an editor with GitLens integration or something like it, so you can immediately see the story behind the code you’re looking at.
And notes. Take oh so many notes.
This article was adapted from a SIGAVDI letter originally sent out in January 2017.
Understanding the business rules first usually works better for me. Then I look at the models and how they relate to each other. In a data-intensive application, that means looking into the DB and _then_ at the queries in the repositories/services. So I start at the bottom of the pyramid and work my way to the top (usually some requests in Postman), and that uncovers everything I need for a head start.