Hey Avdi, should we use a coding standard?
Yes.
Which standard should we use?
I don’t think there’s one “right” coding standard. I think every project should select its own standard, based on its unique needs and its team members’ preferences.
I think a diversity of styles across a programming language community is a good thing. I think it can be a good thing even across a large organization.
When I worked at a very large corporation, there were one or two recommended “site-wide” coding standards for any given language. But the ultimate choice of a standard for any given new project was left to the discretion of the team lead.
I think per-project is a good granularity for choosing a coding standard. This usually corresponds to a single source code repository, and a single, roughly consistent team of people.
How should we choose a coding standard?
Most programming language communities eventually evolve a few widely-embraced standards guides. For instance, if you’re working on C code you might choose the GNU coding standard. Or if you’re using Ruby, you might select Bozhidar Batsov’s community-influenced standard.
These standards have had a lot of community thought and debate put into them. They have clearly-stated rationales for their choices. That makes them a good place to start.
But some of the choices in that standard are stupid and wrong!
True. That’s why I say a community standard is a good starting point. Where your team has clear objections to recommendations in the standard, you should modify your project style guide to fit your team better.
But half of our team thinks one way is best, and half insists the other way is best! How can I convince them they are wrong?
First off, let’s be honest. It’s probably not a 50/50 split. On most teams, for any given style choice, there are usually one or two people who feel strongly about it one way, and one or two who feel strongly the other way. The rest of the team doesn’t really care much either way.
Fine, how do I convince those one or two other people they are wrong??
Some well-reasoned, realistic examples of how your preference is more revealing of intent, or more efficient to type, or prevents misleading corner cases can sometimes help sway others.
It’s not that; my style is just more beautiful. And anyway, it’s the idiomatic way to do it.
After many years of feeling very, very strongly about coding style choices, I can tell you this: the biggest factor in code looking “right” or “more beautiful” or “more idiomatic” is familiarity. I can’t tell you how many times I’ve gone into projects that had style preferences I simply hated, and by the time I’d been working on it for six weeks I found I just didn’t care any more. Sometimes, I even adopted the new style for my own projects.
For instance, when I got started coding in C++, I learned to line up my curly braces so you could always look up from a closing curly brace and see the matching opening one on the same column:
#include <iostream>

int main(int argc, char** argv)
{
    std::cout << "Hello, world!";
}
Later, I worked more with code in the GNU/GNOME style, where the opening curly follows the function name on the same line. Augh! New! Different! Wrong! Bad!
#include <iostream>

int main(int argc, char** argv) {
    std::cout << "Hello, world!";
}
My feelings these days can be summed up as: "Meh". I have a mild preference for GNU-style.
I've also seen numerous "A vs. B" examples of code presented, where B was supposed to be objectively more beautiful than A. And most of the time, I've been hard-pressed to see the night-and-day difference that the author clearly believed to exist.
In Ruby code, I think the Rails-popularized convention of choosing curly braces or do … end for blocks based on the number of lines of code in the block is a bit silly. I've been told that one reason for this convention is that having do … end on a single line is objectively appalling:
h.fetch(:some_key) do raise ":some_key is required" end
Personally, I don't see anything bothersome about that line. As a rule, objective aesthetic judgments… aren't.
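To make the comparison concrete, here is a minimal sketch of the three spellings in play. The hash `h` and its contents are hypothetical. `Hash#fetch` only calls its block when the key is missing, so each line below simply returns 42. Braces do bind more tightly than do … end, which matters when the call is itself passed to an unparenthesized method call. In a plain assignment like these, though, the three forms are interchangeable, and the choice between them comes down to layout.

```ruby
# Hypothetical hash standing in for the `h` in the example above.
h = { some_key: 42 }

# Rails-popularized convention: curly braces for a one-line block...
a = h.fetch(:some_key) { raise ":some_key is required" }

# ...and do...end when the block spans multiple lines.
b = h.fetch(:some_key) do
  raise ":some_key is required"
end

# The form the convention discourages: do...end on a single line.
c = h.fetch(:some_key) do raise ":some_key is required" end

p([a, b, c]) # prints [42, 42, 42]
```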
It's worth keeping in mind, too, that "standard" programming syntax is basically pants-on-head crazy to begin with. For instance, in every C-influenced programming language, we indicate "sending a message" or "requesting an attribute" something like this:
farm_boy.fetch_me_that_pitcher()
Since the words are in English, let's consider this from an English-language standpoint for a moment.
The period or full-stop character usually indicates either the end of a sentence, or the radix mark between whole numbers and fractional ones. Here, it seems to be standing in the place where we would usually put a comma:
"Farm boy, fetch me that pitcher"
Or perhaps a colon:
"Farm boy: Fetch me that pitcher!"
Meanwhile, the words are separated from each other by a character (the underscore) that most computer beginners don't even know exists. And don't get me started on the use of parentheses!
OK, fine, style is subjective. But some choices really do make code easier to change, or make certain mistakes less likely!
Yes, I think that's true.
And my project lead is making the wrong choices!!
It's possible you may have good, solid, code-quality-oriented arguments for a certain style. And you may find yourself at odds with other team members, or your team leadership, over those points of style.
There is one very important question you need to keep in mind when advocating for a particular coding style choice. It's so important I'm going to put it on a separate line for emphasis.
What is the dollar value of being right?
Look, let's face it, you're totally right about this style decision or coding practice. And your opponents are wrong and stupid. But ask yourself this:
Will getting this right save you enough money to make up for:
- The cost of team-hours spent in style guide meetings that fail to end with any resolution?
- Damaged relationships and lessened communication as a result of butting heads in those meetings?
- The cost of hundreds-of-email-long threads debating the pros and cons of different styles?
- The rancor your teammates feel when they use the style they dislike but for which you fought tooth-and-nail?
There are some coding standard choices whose advantages might be worth all that. But I'm hard-pressed to think of any off the top of my head.
So if the coding standard might wind up enshrining bad choices, what's the point of even having one?
While coding standards can give programmers, especially less-experienced ones, a nudge away from poor choices, I don't see that as their principal advantage. To me, the biggest win from a coding standard is consistency.
Consider the following two blocks of Ruby code:
# Block #1
output = []
while word = input.shift
  unless STOP_WORDS.include?(word.downcase)
    output << word
  end
end

# Block #2
output = input.map(&:downcase) - STOP_WORDS
Was it obvious at first glance that block #1 and block #2 do substantially the same thing?
More importantly: was it obvious that despite having similar functionality, block #1 and block #2 behave differently in two subtle but significant ways?
The real value of a coding standard is in making same things look the same, and different things look different. When team members can rely on this kind of consistency, it enables them to more quickly get up to speed on what a given stanza of code is saying. And that in turn means they can operate at a higher level of abstraction for more of the time.
What does this mean for legacy code? Do we have to update every file in the project to the new standard?
In a word, no. I can't see any business case for devoting hours to going through every file in your project and updating them to be in line with a newly-minted coding standard. It's not worth the time and effort spent, and you're likely to introduce inadvertent bugs into the legacy code.
But what if I have to make a change in a pre-standard file? Should I update the whole file while I'm in there?
Here's where my answer may differ from that of others. It's an answer that has surprised some people.
The answer is: no. Updating the style of a legacy code file in the process of making a change takes up time, clutters up diffs, and increases the chance you'll unintentionally introduce new regressions.
So does that mean I should use the new style just in the code that I add or change?
Let's make the question concrete. Consider this code from a legacy file:
STOP_WORDS = [ 'i', 'we', 'if', 'and', 'the' ]
Suppose your task is to add the word "they" to this list of stop words. And suppose your coding standard now says that all strings should be double-quoted.
Should you update the whole list in the process of adding the new word?
STOP_WORDS = [ "i", "we", "if", "and", "the", "they" ]
Or should you use the new standard just for your additions?
STOP_WORDS = [ 'i', 'we', 'if', 'and', 'the', "they" ]
As I said earlier, to me the value of a coding standard is in establishing consistency. Consistency is important at a project level. But it's also important at a file level.
For that reason, in the case where you are updating a pre-coding-standard file, I don't believe you should obey the coding standard. Instead, if there is any consistent style whatsoever to the legacy code, you should try to match it.
That means, if a list of words uses single quotes, you should add new words in single quotes too.
STOP_WORDS = [ 'i', 'we', 'if', 'and', 'the', 'they' ]
If you and other team members find yourselves updating a particular legacy source file frequently, then I think it's a good idea to queue up a task to go in and update that file to match your current standards. But I don't think you should pre-emptively update files you hardly ever touch. And I can't support making locally inconsistent changes just to stay in line with the style guide.
So, those are my present thoughts on coding standards. If you have ideas or questions, feel free to leave a comment.
One thing I’ve found helps with avoiding arguing every point in the coding standard is to get everyone to agree to just try it unchanged for a week or two before opening it up for debate. That gets people past the reflexive negative reaction to unfamiliar styles. Once that time is up it’s OK to start saying “Hey, I’ve tried it the rubocop way for a while and I still think we need to bump the line length warning out. 80 columns isn’t working for our deeply nested class hierarchies.” I’ve seen that work 3 different times on different teams, so I think it’s pretty repeatable.
Excellent advice.
Avdi, how do you feel about auto-formatting? I feel like the Go community has succeeded with its experiment of mandating use of gofmt. I’ve been programming for enough decades now, in dozens of languages and with different coding styles, that I don’t care about my arbitrary, petty personal preferences any more as long as I can understand code, rely on its consistent format, and have my editor/IDE enforce it. I would be completely OK with every language community adopting a (perhaps customizable) auto-formatter.
Of course, coding style isn’t just about formatting. But again, Go has led the way with enforcing semantic concerns as well. I think all this is part of a trend which is good: hooking up analysis tools to the development process in order to verify or mandate agreed-upon standards in an automated way.
I think Go is a bit of a special case. Auto-formatting works reasonably well there because the language evolved hand-in-hand with its auto-formatter, and because it’s a language with deliberately few ways to do things.
For most languages I think a pretty-printer stage can be a helpful way to augment a standards policy, if it’s sufficiently flexible.
Finally, I also believe there are some cases where it is a huge win to be able to format constant data in a tabular way, with some columns left-justified and others right-justified. Dave Thomas has a great example in one of his talks of how this can reveal bugs. In cases like this, if a pretty-printer breaks my alignment and justification I will drop-kick it out the nearest window 😉
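As a rough illustration of the tabular-data point, here is some constant data with the names left-justified and the numbers right-justified. This is a hypothetical table, not the example from Dave Thomas's talk. Because digits of the same magnitude line up, a row with a dropped or extra digit would stick out of its column. The `contiguous?` helper is an assumed extra safety net and was not part of the original discussion.

```ruby
# Hypothetical discount tiers: [name, min_order, max_order, discount_rate].
# Names are left-justified; the numeric columns are right-justified.
TIERS = [
  # name      min    max  rate
  [:bronze,     0,   999, 0.00],
  [:silver,  1000,  9999, 0.05],
  [:gold,   10000, 99999, 0.10],
].freeze

# Assumed check: each tier should start one unit above the previous tier's max.
def contiguous?(tiers)
  tiers.each_cons(2).all? { |prev, cur| cur[1] == prev[2] + 1 }
end

p contiguous?(TIERS) # prints true
```

The alignment catches typos by eye. The check catches them in a test run. Neither needs the other, but a pretty-printer that reflows the table would throw away the first.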
Sometimes cleaning up a file is warranted when adding new code, but I would suggest committing the cleanup in its own git commit with a message explaining as much, then making your edit and committing it with a message related to the change. There’s nothing worse than a one-character change hidden in a 200-line patch of mostly whitespace. In the end, police your own commits so that the contents of each patch relate only to what the patch is about. Style cleanups are OK, as long as they are the only thing in the commit.
Yes, 100% agreed.
To answer your question: “was it obvious that despite having similar functionality, block #1 and block #2 behave differently in two subtle but significant ways?”
Yes and no. It was immediately obvious to me at a glance that the big one would mutate the input (since it used shift) while the small one wouldn’t (since it used map, a non-destructive method). I didn’t bother looking for other differences until you asked… and then I had to go back, overcome my assumption that they would be as similar as possible, and look a bit closer to see that the larger one put the original (not downcased) version of the word into the output, while it was extremely obvious that the small one produced purely downcased output.
Sure, it was probably a rhetorical question… but you got me wondering whether others’ experience was like mine.
Sneaky, isn’t it? If you never tested it with capital letters, it could go undetected for a long time.
“Undetected” implies a bug to me. Either one may well be the desired behavior. Sounds like a reasonable case (so to speak) for a comment — re the why, not the what or how.
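You can check the two differences discussed in this thread directly. In this sketch, the article's blocks are wrapped in hypothetical methods `block_one` and `block_two`. `STOP_WORDS` is an assumed lowercase list modeled on the stop-word example earlier in the article.

```ruby
# Assumed lowercase stop-word list, modeled on the article's later example.
STOP_WORDS = ['i', 'we', 'if', 'and', 'the']

# Block #1, wrapped in a method: consumes `input` and keeps original casing.
def block_one(input)
  output = []
  while (word = input.shift)
    unless STOP_WORDS.include?(word.downcase)
      output << word
    end
  end
  output
end

# Block #2, wrapped in a method: leaves `input` untouched, downcases output.
def block_two(input)
  input.map(&:downcase) - STOP_WORDS
end

words = ['The', 'Cat', 'and', 'THE', 'Hat']

first = words.dup
p block_one(first) # prints ["Cat", "Hat"]
p first            # prints [] because shift consumed the input

second = words.dup
p block_two(second) # prints ["cat", "hat"]
p second            # prints ["The", "Cat", "and", "THE", "Hat"]
```

Either behavior may be the intended one. The point is that the two blocks look nothing alike, so their shape gives the reader no hint of which differences are deliberate.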
My “go or no go” test for a code style is if the linter for that language can be configured to handle that style. If so I’m fine with it. If not I’ll fight it.
I disagree with leaving legacy files alone. I think it is important for the health of a project to break the fear around touching legacy code. If someone is afraid to change a single quote to a double quote they’ll never have the courage to do any more significant refactoring. I update styles as I come across them and often find existing bugs in the process.
There are plenty of great arguments for using plain text files for source code, but I think many of the points you make with respect to style guides emphasize the one huge downside: when you edit code, you don’t care about the text, you care about the form. Yes, style guides are good because they bring every edit closer to being only about the form of the code, and not the text. For the same reason, it’s not useful to update old files, because while the text is changing significantly, the form is not (or you’d better hope so).
But really, wouldn’t it be better if we stored the form of the source code we write? Imagine, if you will, a language whose source is stored on disk as an AST but can be opened in any text editor in whatever form the individual programmer prefers. Imagine a version control system that operates on forms instead of text, so you can see clearly the important bits of any change. For my money, that would be worth more than all the perfect style guides for all the projects in the world.
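Ruby's standard library can already hint at this split between text and form. The sketch below uses `Ripper.sexp` to parse two snippets, then compares the resulting trees after stripping out the [line, column] positions. `strip_positions` and `same_form?` are hypothetical helpers written for this illustration. The comparison is rough: `Ripper.sexp` drops comments and returns `nil` on a syntax error.

```ruby
require 'ripper'

# Recursively remove [line, column] pairs so only the structure remains.
def strip_positions(node)
  return node unless node.is_a?(Array)

  node.reject { |child| child.is_a?(Array) && child.size == 2 && child.all?(Integer) }
      .map { |child| strip_positions(child) }
end

# Two snippets have the "same form" if their position-free parse trees match.
def same_form?(source_a, source_b)
  tree_a = Ripper.sexp(source_a)
  tree_b = Ripper.sexp(source_b)
  return false if tree_a.nil? || tree_b.nil? # nil signals a syntax error

  strip_positions(tree_a) == strip_positions(tree_b)
end

single = "STOP_WORDS = [ 'i', 'we', 'if', 'and', 'the' ]"
double = 'STOP_WORDS = ["i", "we", "if", "and", "the"]'
added  = "STOP_WORDS = [ 'i', 'we', 'if', 'and', 'the', 'they' ]"

p same_form?(single, double) # prints true: only the text changed
p same_form?(single, added)  # prints false: the form changed too
```

This is nowhere near an AST-based version control system. It does show that a quote-style change, which clutters a text diff, disappears at the level of form.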
The most important thing, perhaps the ONLY important thing, about a coding standard is that the team use it consistently. It is this consistency that allows the code to fade into the background, all but unnoticed, while the reader concentrates on the meaning of the code and the intent of the original coder.
Great article!
In our team we try to refactor a whole line to the new style (if we touch it), as long as that doesn’t take a lot of time. We have more or less adopted Rubocop and disabled only a few things we didn’t like or couldn’t agree on. The project has been running for 6+ years now and has a lot of legacy code. After some time doing these small refactorings, we found that even the darkest parts of the system are now pretty readable 🙂