Tech Debt and the Pragmatic Middle Ground

Blissful unawareness, denial, then acceptance, then resistance. And finally, a pragmatic middle ground. This is the typical journey engineers go through in their relationship with tech debt.

It's tempting to get straight to the point: how to remove tech debt, and how to keep it at bay. But that would be missing the journey, without which you can't really appreciate the destination. So let's start with the time when you were not aware of this thing called "tech debt."

Blissful unawareness: tech debt?

The blissful unawareness stage doesn't last long once you start writing software professionally. For some, it takes only months on the job to stumble across tech debt. For others, it takes years to notice. By the time you've got enough experience under your belt to be called a senior, you will know - and accept - tech debt all too well.

It took me a year and a half to stumble on it - even though I didn't know what to call it at the time. It came in the form of spaghetti code that was impossible to unravel. I was a junior and got a short-term contract to improve the data mining software that my university professor was building. This code was written by a mathematician who left the project behind after building the v1 of the software. The work was estimated to take a few weeks.

I dived in. However, there was no documentation, no tests, and the naming was hard to make sense of. I tried to make changes here and there, but the code just did not work as I expected and kept on breaking. In the end, I threw in the towel and walked away. I thought I must be missing basic engineering skills, since I could not understand the code. It wasn't me, though: the professor later told me I was the third person they hired who could not make sane modifications to the code. In the end, they rewrote the whole thing from scratch. It was tech debt - poor coding and a lack of any engineering practices - that suffocated this project.

Denial: tech debt?!

Product Manager: "Alright, let's add a button to this page. When people push it, they should be able to start chatting directly with customer service. How long will this take?"
Developer: "Uh... about two weeks."
PM: "Two weeks? But we have the chat functionality. It's just a button. What exactly is taking two weeks?"
Dev: "Uh... so making modifications isn't really that easy. We didn't really prepare for changing UI elements or directly opening the chat flow..."
PM: "You know what? I don't believe you. You have four days. Or else."
Dev: (Welp... thinking of ways to hack it...)

Strange and not-so-funny things happen when you're working as a developer and your boss is someone who's never done software before. It's even worse if they have done development at a basic level, but never got to understand tech debt.

Earlier in my career, there were many times when my teammates or I tried to explain what I would call tech debt today. We just didn't know this was the term for it. All we knew was that things were hard to do in the codebase. That the code clearly wasn't designed for those changes. That spending time to improve the architecture would be a smart thing to do. That we needed more time to get it right.

We never got that time. We were also accused of going slow and of not being competent enough. We were told the role model was the "rockstar" developer on the team who got everything done much quicker than us. Yeah - I knew the person and their code. They introduced the insane hacks, then conveniently stepped away from them, saying they got the main use case working, and asked someone else to finish it up. And finishing up was never straightforward. It would frequently mean rewriting their previous code, as it needed many fixes and was thrown together in ways that made little sense.

Working at a place that is in denial about tech debt usually goes hand in hand with a grim engineering environment. It doesn't matter if management just doesn't understand or doesn't care about tech debt. Management rewards hacks, short-term solutions, and the people who introduce these hacks. People who want to do more thorough work - thinking about maintenance, longer-term - are not valued. They might even be called out as slowing the team down, not focusing on the important things, or living in a bubble.

It's stressful to work at places like this. You can still do good work - but you'll probably have to do a lot of the improvements in "secret," or outside normal hours, to not be accused of not moving fast enough. On the flip side, the best engineers leave places like this for companies and teams that understand and accept tech debt.

Acceptance: tech debt...

The better the engineering culture of a company, the more aware and conscious you become of tech debt. It took me years to get here. Microsoft / Skype was the first company where we would ponder over tech debt, collect the different types of debt we had, and discuss how to pay it off. This was also the point where I finally understood tech debt enough to be able to call it what it is and explain it to others.

Tech debt is the incremental cost of doing software development. Tech debt is what happens when more code builds up, and things become more complex.

For a new codebase and a greenfield project, this incremental cost is zero. But the more complex the code gets, the more effort it takes to change the code while keeping things working. Codebases that are hacked together and have little to no automated testing or documentation become very time-consuming to change. They accumulate a lot of tech debt. On the other hand, codebases where developers regularly invest time in maintainability will have lower tech debt and are less expensive to change. Investing in maintainability includes investing in readability, testability, automation, and tooling.

Tech debt is a fitting term for describing this additional cost to change code. Debt indicates that it accumulates over time. With real-life debt, if you owe money, you have options on how to repay it. You could pay interest only for a while, then pay off the principal at the end of the loan. You could pay off the interest and the principal in parallel. Or you could pay the whole loan off at once. If you delay paying off debt for too long, the interest goes up, and various fees might kick in. In extreme cases, the debt can grow so large that it could bankrupt you.

Tech debt has similar characteristics in all regards. Debt used smartly can accelerate progress. When used poorly, it can become expensive to maintain. And bankruptcy through tech debt is also a thing: this happens when it's cheaper to delete and rewrite the codebase than it is to maintain or fix it.

Resistance: tech debt! Not on my watch!

Once you accept that tech debt is a given with any codebase that grows in complexity, you start to think: how can we keep tech debt to a minimum?

While tech debt is a given, it accumulates much more slowly when you follow certain best practices around maintainability and ease of code modification. Things like readable code. Testing. Code reviews. CI / CD. Documentation. Sane architecture decisions.

Let's say you are lucky enough to work on a codebase that is light in tech debt and follows many of the best practices. To prevent tech debt from growing, make sure to keep it this way. Beware of broken windows.

Broken windows: where tech debt sneaks in

A lot of tech debt I see in otherwise good codebases starts relatively small. It is often left behind when someone isn't paying enough attention and the change slips through code review. These pieces are usually easy enough to fix. They start to become a problem when they keep growing without anyone addressing them. The codebase could easily fall victim to broken windows. The exception becomes the norm. People think, "let me follow this pattern," and the existing tech debt sets a pattern that spreads throughout the codebase.

As a rule of thumb, if you can see simple ways to clean up bits of tech debt, just do it. The change should be fast enough, and you'll leave the code in a better state than it was before.

Removing tech debt: where to start?

Chances are, you don't have the luxury of a tech debt-light codebase. Let's take the more frequent case: the codebase you're working on right now has a bunch of issues. How do you go about addressing them?

For small tech debt, just fix it as you go. Follow the boy scout rule of leaving the code cleaner than you found it - similar to how scouts leave the campground cleaner than they found it.

For larger pieces of tech debt, take inventory of tech debt and quantify the impact it has, and the effort it would take to remove it. When there's a lot of something - like tech debt - you won't be able to tackle it all. Without gathering data on larger pieces of tech debt, it's hard to make good decisions on how to deal with it. When we're talking about things that take weeks or months to fix, the team has to prioritize. How does this work compare to work that has business-facing impact?

Sure, there is duplication across the code. What would the impact be if you moved things to a shared library? And what is the cost? The impact will be far higher with a codebase that's frequently used. On the other hand, a soon-to-be deprecated codebase might mean a large effort, and a small reward.

Slow build times? If the build is run frequently by many people, the impact could be large. Heck, it can be large enough to justify a dedicated team spending months making the build faster - which actually happens at places like Uber, Facebook, or Google. Flaky tests? The impact is likely high, the effort hopefully low. Verbose boilerplate code? Perhaps lower impact, some work. Naming you personally don't like? Probably small impact, and could be lots of effort. All of this will depend on your environment.
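One way to turn such an inventory into a rough priority list is to score each item by impact relative to effort. The sketch below is only an illustration: the `DebtItem` class, the 1-to-5 scales, and the scores themselves are assumptions I made up for the examples above, and your own numbers will depend on your environment.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    impact: int  # 1 (low) to 5 (high), the team's own estimate
    effort: int  # 1 (low) to 5 (high), the team's own estimate

    @property
    def score(self) -> float:
        # Higher impact and lower effort both push an item up the list.
        return self.impact / self.effort


def prioritize(items: list[DebtItem]) -> list[DebtItem]:
    """Return items sorted from most to least worthwhile."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Illustrative scores only, loosely following the examples in the text.
inventory = [
    DebtItem("naming I don't like", impact=1, effort=4),
    DebtItem("slow builds", impact=4, effort=4),
    DebtItem("flaky tests", impact=5, effort=2),
    DebtItem("verbose boilerplate", impact=2, effort=3),
]

for item in prioritize(inventory):
    print(f"{item.name}: impact={item.impact}, effort={item.effort}, score={item.score:.2f}")
```

A ratio like this is crude, but it forces the conversation about impact and effort to happen with numbers on the table instead of gut feelings.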

Propose projects with clear impact to tackle tech debt with dedicated efforts. There might be parts of tech debt that are begging to be done: the impact is so clear. Say the team ships lots of bugs in production, and you don't have automated testing or CI in place. The impact of cutting down bugs significantly and needing less manual testing makes this investment a no-brainer. Something like proposing to write a new system that will have 99.9% reliability over the existing 98% reliability might translate to millions of dollars saved per year. If it does, you've just made your business case. Reliability, cost savings, faster development cycles, and fewer bugs are the most common impact factors I've seen people pitch to get larger tech debt removal or migration projects funded from the top.
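To make the reliability example concrete, here is a back-of-the-envelope sketch. It assumes "reliability" means availability over a full year, and the cost per hour of downtime is a made-up figure, not a number from any real system. Plug in your own.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours(reliability: float) -> float:
    """Hours per year the system is unavailable at a given reliability (0..1)."""
    return (1 - reliability) * HOURS_PER_YEAR


def yearly_savings(old: float, new: float, cost_per_hour: float) -> float:
    """Money saved per year by moving from `old` to `new` reliability."""
    return (downtime_hours(old) - downtime_hours(new)) * cost_per_hour


# Hypothetical figure: each hour of downtime costs the business $10,000.
COST_PER_DOWNTIME_HOUR = 10_000

print(f"98% reliability:   {downtime_hours(0.98):.1f} hours down per year")
print(f"99.9% reliability: {downtime_hours(0.999):.1f} hours down per year")
print(f"Estimated savings: ${yearly_savings(0.98, 0.999, COST_PER_DOWNTIME_HOUR):,.0f} per year")
```

With these assumed inputs, roughly 175 hours of yearly downtime drop to under 9, which works out to about $1.66M saved per year. The point is not the exact figure but that a proposal framed this way is easy for the business to evaluate.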

Pair tech debt removal with high-impact projects. Unfortunately, most of the time, it will be hard to make a case for an only-tech-debt-removal project. Why is this? It's because teams always aim to work on the most impactful project - the one delivering the most business value. Business value often being revenue, user metrics, and the like.

These projects are usually ambitious. They are high-visibility. And to ship them, the systems that have the most tech debt need to be touched. Systems with high tech debt are already much slower to change. So if the team will already need to make several changes to a tech debt-heavy system, why not spend a little more time and reduce the debt?

Here's the dirty secret of teams who ship impact and remove tech debt at the same time: they rarely ask for permission to remove tech debt. Instead, they bundle the removal of tech debt with the high-impact project and just do it.

Dev: "We'll ship this feature that generates $5M/year additional revenue and we'll also introduce integration testing on the way."
PM: "Can we just ship without the integration testing?"
Dev: "Sure. But it will probably take longer then, and we might lose revenue on the way. We're seeing lots of bugs go out in recent features and we'd need to do more manual testing, and more releases. By doing automated integration testing, we'll be done faster. We're estimating it would take 4 months this way and 6 months without the integration tests."

Why bother bundling with projects, though? It's to make sure they get done. While projects that only reduce tech debt but deliver no business value will often be de-prioritized, high-impact projects almost never are. If you want to see your large tech debt-reducing proposals through, couple them with high-impact projects.

The pragmatic middle ground: just enough tech debt

Is there such a thing as too little tech debt? If you pay off enough tech debt, at some point, you realize that there is. The name of too little tech debt is premature optimization - and it can slow down teams and companies at critical times.

Take the example of a startup. When the company launches, speed and quick iteration are key to surviving and winning. Do you worry about clean APIs and nice data models or just dump everything in an unstructured JSON that any developer can modify? The startups I've worked at that grew to be successful all went for the tech debt-heavy approach in the early days.

Uber was one of these startups. When I joined, there were still frequent reminders of the early tech debt, and the short-term decisions were haunting parts of the codebase. But that tech debt served its purpose. It allowed the company to move fast, when speed mattered the most, in getting product-market fit. After getting there, Uber invested in clearing all of it up.

Tech debt is something you want to have for early-stage projects: for throwaway prototypes, for MVPs, and when validating the business model of a startup. Tech debt is something that can be fixed by throwing time and developers at it, later on - the same way Uber did. If a late-stage startup that is growing fast is not busy paying off early tech debt on the side, something smells fishy. If a team owning a mature product is not keeping their tech debt in check - investing here and there to keep it at bay - something is also probably off.

Pragmatic engineers don't see tech debt as a bad thing: they see it as a tradeoff between speed and quality. They see it as the characteristic of a system. They put tech debt in the context of the goals of the project and don't try to pay off more than they need to. They also keep track of the debt and step in to reduce it before it gets too high - getting creative on the way when needed.

So now, how much tech debt should your project have? And what are you going to do to reduce - or perhaps increase - that tech debt?
