
What NOT to fix in a Legacy Codebase

Nicolas Carlo


Maintaining a legacy codebase can feel like a daunting task. There is so much code to refactor and so little time to do it. When do you refactor, and when do you let it go? You can’t spend most of your time cleaning up the mess: you need to deliver features and fix bugs. But if you don’t spend some time cleaning it up, things get messier. That slows you down and creates more bugs in the process…

Answers like “measure the cost of technical debt” are not so helpful if they don’t tell you how. In real life, you can’t compare the cost of refactoring vs. NOT refactoring a particular piece of code. Will it be changed often? For how long will this be maintained? Who will be working on it? Will it cause bugs? The answers often are gut-check estimations. But it’s hard to stand by the “gut-check” when your coworkers ask you whether a refactoring should be made.

I don’t have a simple answer to these questions. As often in software development, It Depends™.

However, I suggest we look at the problem from a different angle! What if we could list the things we know are NOT worth fixing? Not all messes have an equal impact on your work. When you deal with a legacy system, you should let some of it go so you can focus your energy on what’s left.

Don’t rewrite the whole thing

I have seen a bunch of projects which follow this pattern:

The codebase grows as the business requests features and bug fixes
Old developers progressively leave the project, new developers join
The codebase ages and creates friction for the team to work with
The team (mostly recent devs now) wants to rewrite the codebase using more modern tools

At this point, the plan is to replace the existing “legacy codebase” with a new, shiny one. If the team manages to get the Rewrite Project™ accepted, the rest goes like so:

The team starts rewriting the project somewhere else, using different tools (greenfield again, yay!)
Because the rewrite takes a long time, the team has to provide some support for the old system (urgent bug fixes, critical features). These should be implemented in the new codebase too.
After a while, the team tries to swap the old system for the new one. Quickly, a lot of regressions are reported. More time is needed before the replacement can be made.
Old developers progressively leave the project, new developers join
The new codebase ages and creates friction for the team to work with. It may finally get deployed to replace the old one… maybe not 🤷
The team (mostly recent devs now) wants to rewrite the codebase, using more modern tools 🙃

I’m not kidding! I’ve been there, maintaining a legacy codebase AND its rewrite attempt, both running in production. The idea to kickstart a new rewrite “that would fix it all” was frequently raised during meetings.

When it comes to legacy systems that are serving customers in production, a Big Bang Rewrite is seldom a good idea. That does not mean we should not modernize the code. But it’s naive to think we can replace a whole system we struggle to understand without the business noticing the difference.

What to do instead

In my experience, a better approach is to reduce the scope of the rewrite. Identify a chunk of the legacy system, and rewrite that. Just that.

Incremental rewrites can be delivered to production faster.

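I like to use the Ship of Theseus technique to do this. It’s also called “the Strangler Fig Pattern”: you put a facade in front of the legacy system, route one piece of functionality at a time to its new implementation, and retire the old code once nothing calls it anymore.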

If you are looking for an actual example from the trenches, I recommend this talk from Adrianna Chang who applied this at Shopify 🪖
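To make the incremental approach concrete, here is a minimal sketch of a facade that routes each feature either to the legacy code or to its rewrite. Every name in it (`StranglerFacade`, `legacy_invoice_total`, `new_invoice_total`, the `"invoice_total"` feature key) is hypothetical and only serves the illustration. A real system would more likely route at the HTTP or module boundary.

```python
from typing import Callable, Dict, List


def legacy_invoice_total(amounts_in_cents: List[int]) -> int:
    """Hypothetical stand-in for old code nobody fully understands."""
    total = 0
    for amount in amounts_in_cents:
        total = total + amount
    return total


def new_invoice_total(amounts_in_cents: List[int]) -> int:
    """Hypothetical rewrite of that single chunk, and nothing more."""
    return sum(amounts_in_cents)


class StranglerFacade:
    """Single entry point: callers never know which implementation answers."""

    def __init__(self, legacy: Dict[str, Callable]):
        self._legacy = dict(legacy)
        self._migrated: Dict[str, Callable] = {}

    def migrate(self, feature: str, new_impl: Callable) -> None:
        # Ship one rewritten feature at a time.
        self._migrated[feature] = new_impl

    def rollback(self, feature: str) -> None:
        # If regressions are reported, fall back to legacy for this feature only.
        self._migrated.pop(feature, None)

    def is_migrated(self, feature: str) -> bool:
        return feature in self._migrated

    def call(self, feature: str, *args, **kwargs):
        if feature in self._migrated:
            return self._migrated[feature](*args, **kwargs)
        return self._legacy[feature](*args, **kwargs)


facade = StranglerFacade(legacy={"invoice_total": legacy_invoice_total})
before = facade.call("invoice_total", [1000, 250])  # served by legacy code

facade.migrate("invoice_total", new_invoice_total)
after = facade.call("invoice_total", [1000, 250])  # served by the rewrite
```

The design choice worth noting is the per-feature rollback: when the swap causes trouble, only one small chunk goes back to the old code, not the whole system.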

Don’t fix what’s not in your way

There is a common saying that goes like this:

If it ain’t broke, don’t fix it.

To be honest, I don’t think it’s a great saying. It’s easy to misunderstand and is often used to block change 😬

The point is not to prevent our fellows from changing the way things are done today. I believe progress should trump consistency, but the new standards should be explicit. Thus, I think it is fine to change something that is not “broken” if it’s a step towards the desired state.

However, it can quickly become a distraction if you are not mindful.

Maybe you have been in this situation before:

1. You see some old syntax/pattern while reading code
2. You decide you will modernize it as you are passing by
3. Later, when you ship, a regression is reported
4. You have to revert this change and fix the bug asap
5. Finally, you spend much longer on this code than you thought, just because of that little detour you took

I have been there too. Many times, actually. In general, that’s because I’m not careful in step 2. For example, I forget to test whether my change introduced a regression. Maybe I was not even aware of the actual behavior #oops 🤦

However, this little adventure should not lead us to conclude that we must avoid modernizing code.

Modernizing the code you have to maintain is desirable. If you need to change some code, it is wise to spend some time making the change easy first. But, as a professional, you should be aware of the changes you are making. Are you introducing a regression? What’s the current behavior? Automated tests are usually the fastest way to get this feedback continuously. They are often missing, too! Without tests, you should either write them (that takes a while the first time), test manually (that takes a while every time), or let someone else test the change (that’s how you get disrupted with urgent bug fixes in prod).
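One cheap way to get that feedback before touching old code is to record what it does today and compare the modernized version against that record. The sketch below is only an illustration of the idea. `legacy_format_name`, the two "modernized" variants, and the sample cases are all made up for the example.

```python
def legacy_format_name(first, last):
    """Hypothetical old code, written with old-style string formatting."""
    if last == "":
        return first  # quirk: mononyms are returned as-is
    return "%s, %s" % (last.upper(), first)


def hasty_format_name(first: str, last: str) -> str:
    """A 'quick modernization' made in passing, unaware of the quirk."""
    return f"{last.upper()}, {first}"


def careful_format_name(first: str, last: str) -> str:
    """A modernization that preserves the behavior observed today."""
    if not last:
        return first
    return f"{last.upper()}, {first}"


CASES = [("Ada", "Lovelace"), ("Cher", ""), ("", "Turing")]


def characterize(func, cases):
    """Record what the code actually does today for each input."""
    return {case: func(*case) for case in cases}


def find_regressions(old, new, cases):
    """Return {input: (old output, new output)} wherever behavior changed."""
    expected = characterize(old, cases)
    actual = characterize(new, cases)
    return {
        case: (expected[case], actual[case])
        for case in cases
        if expected[case] != actual[case]
    }


hasty = find_regressions(legacy_format_name, hasty_format_name, CASES)
careful = find_regressions(legacy_format_name, careful_format_name, CASES)
```

Here `hasty` flags the `("Cher", "")` input, which is exactly the kind of behavior you may not have known about, while `careful` comes back empty.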

Therefore, my twist on this advice is:

If it ain’t in your way to deliver business value, don’t fix it.

We don’t refactor code for the sake of it. We refactor code so we can keep adding value to the software without introducing bugs.

Some old code may be ugly, but it doesn’t matter unless you have to read or update it. Don’t waste time refactoring code just because it’s not clean. It’s a distraction that can cost you a lot of time. Worse: it may give “refactoring” a bad name, leading your team to stop doing the necessary ones.

What to do instead

My point is that it takes time to make safe changes on a legacy codebase. Thus, you should reduce the number of changes you make. When time is limited, trade off the scope, not the quality.

Stay focused on the target. Do one thing at a time. That gives you more time to do that thing, and do it well.

Your time is better spent adding missing tests and refactoring code that’s actually in your way. Don’t be lured by the seemingly low-hanging fruit that you see nearby. Stay focused.

If you struggle to do that, I recommend you try the Mikado Method. It provides a clear recipe to follow as you dive into unknown unknowns, so you stay on track.

Doing smaller, more frequent commits can help too.

Don’t waste time with easy refactorings

This one is more subtle. It’s about prioritizing which kind of refactoring to tackle first.

Some refactorings are easier to perform than others. Some are more impactful than others. The Difficulty and the Impact of refactorings don’t necessarily correlate. The trap is to only go for the easy ones and not consider the impact of these.

Again: your time is limited. There is only so much you can do before the upcoming deadline. When there is no time left, you will tend to take risky shortcuts. Typically, tests get dropped when time is running out. That’s why you should not waste too much time on insignificant refactorings that could be tackled later.

Rather, spend time on the necessary refactorings that will make the code testable. That will pay off, as you will be able to move more safely and quickly in the code as requirements change and the deadline approaches!

It’s better to ship one impactful refactoring rather than 50 small ones that have little-to-no impact on delivering business value.

What to do instead

To help you identify which parts of the legacy system you’d better spend your energy on, I recommend performing a Hotspots Analysis.

You need 2 things to run such an analysis:

A way to measure code complexity. If your language has no such tool, you can use the indentation level as a good-enough metric.
Version control metadata. If your project is using git, you are good to go.

With this, you can quickly draft a profile of the codebase you are dealing with. This will help you prioritize the code that you should refactor first, based on the impact this refactoring will have on your velocity.

[Figure: Refactoring based on the impact it will have on your velocity]

In short: focus on painful code that is changed often.
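As a rough sketch of such an analysis, the code below combines the two ingredients listed earlier: indentation as a complexity proxy, and change frequency taken from `git log --format=format: --name-only`. Multiplying the two numbers into a single score is a simplifying assumption of this example, not an established formula, and the function names are hypothetical.

```python
import subprocess
from collections import Counter
from typing import Dict, List, Tuple


def indentation_complexity(source: str, spaces_per_level: int = 4) -> int:
    """Sum the indentation level of every non-blank line (good-enough metric)."""
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue
        expanded = line.expandtabs(spaces_per_level)
        leading_spaces = len(expanded) - len(expanded.lstrip(" "))
        total += leading_spaces // spaces_per_level
    return total


def change_counts(git_log_output: str) -> Counter:
    """Count how many commits touched each file path."""
    return Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )


def hotspots(
    complexity: Dict[str, int], changes: Counter, top: int = 10
) -> List[Tuple[str, int]]:
    """Rank files by complexity x change frequency (assumed scoring)."""
    scores = {path: value * changes.get(path, 0) for path, value in complexity.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(path, score) for path, score in ranked if score > 0][:top]


def read_git_log(repo_dir: str = ".") -> str:
    """List the files touched by each commit, one path per line."""
    result = subprocess.run(
        ["git", "log", "--format=format:", "--name-only"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a real repository you would feed `read_git_log()` into `change_counts` and compute `indentation_complexity` for each tracked source file. Complex files that nobody touches score zero and drop out of the ranking, which matches the advice above.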

Refactorings that generally help are the following:

Extract logic into pure functions you can name, test, and refactor
Make side effects visible. In particular, separate code that returns a value from code that performs side-effects (the CQS principle).
Improve names. This one is usually an easy refactoring since it can be automated by your code editor (e.g. F2 in VS Code). Yet, it’s an important one.
Remove duplication of concepts. Duplication is cheaper to maintain than getting the wrong abstraction. But, when you notice things always change together, then you have a good candidate for refactoring.
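The first two refactorings in this list can be sketched on a small, made-up example. `Cart`, the 10% member discount, and the 10,000-cent threshold are all hypothetical. The point is only the shape of the change: a pure function you can name and test, and a command kept separate from the queries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cart:
    prices_in_cents: List[int] = field(default_factory=list)
    discount_in_cents: int = 0


# Before: one function computes a rule, mutates the cart, AND returns a value.
def apply_discount_legacy(cart: Cart, is_member: bool) -> int:
    total = sum(cart.prices_in_cents)
    if is_member and total >= 10_000:
        cart.discount_in_cents = total // 10
    else:
        cart.discount_in_cents = 0
    return total - cart.discount_in_cents


# After, step 1: the business rule becomes a pure function with a name.
def discount_for(total_in_cents: int, is_member: bool) -> int:
    """Pure: same inputs always give the same output, nothing is modified."""
    if is_member and total_in_cents >= 10_000:
        return total_in_cents // 10
    return 0


# After, step 2 (CQS): the command changes state and returns nothing...
def apply_discount(cart: Cart, is_member: bool) -> None:
    cart.discount_in_cents = discount_for(sum(cart.prices_in_cents), is_member)


# ...and the query returns a value without changing anything.
def cart_total(cart: Cart) -> int:
    return sum(cart.prices_in_cents) - cart.discount_in_cents
```

The discount rule can now be tested without building a cart at all, and a reader can tell from the signature alone which calls have side effects.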

Finally, I’d like to add a nuance to this advice. Sometimes, it’s OK to start with a few easy refactorings to warm up. Something easy can build momentum, and that is a valid reason to do it. But, if you are looking to actually make that codebase easier to work with, don’t waste your time on these refactorings for too long.

If you want to discover more tips and tricks to improve your productivity, take a look at this section of our blog.

Thanks to Nicolas Carlo for this collaboration. You can read more of his articles on his blog, Understand Legacy Code, and follow him on Twitter @nicoespeon.

Photo credit: Markus Spiske.