Refactoring a Go service to 90% coverage
I spent a chunk of last quarter refactoring a Go service that backs our ads. It had grown the way these services do: one big package, functions that did three things each, and almost no tests. I wanted to make it easier to change without being scared every time. Along the way the test coverage went from somewhere low to about 90%, and the tests caught more real bugs than I expected.
Why I bothered with tests at all
The honest answer is that I did not trust myself to refactor it safely without them. The service handled real traffic and real money. If I broke something quietly, I would not find out until it showed up in a graph.
So before pulling functions apart, I wrote tests around the current behavior. Not perfect tests, just enough to pin down what the code did right now. Then I refactored, and if a test went red I knew the refactor had changed behavior, not just shape. This is not really TDD, since I am not writing tests for code that does not exist yet. It is closer to what is usually called characterization testing: tests written to lock down code I am about to move around.
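Here is a minimal sketch of what a pinning test of that kind can look like. It is not code from the service: clampBid and its cap of 500 are invented stand-ins for whatever legacy function is about to be moved. The expected values in the table are meant to be copied from what the code returns today, not from a spec.

```go
package main

import "testing"

// clampBid is a hypothetical stand-in for legacy code about to be refactored.
// The name and the cap of 500 are invented for this sketch.
func clampBid(cents int) int {
	if cents < 0 {
		return 0
	}
	if cents > 500 {
		return 500
	}
	return cents
}

// TestClampBid_PinsCurrentBehavior records what the code does today,
// without judging whether each output is what anyone intended.
func TestClampBid_PinsCurrentBehavior(t *testing.T) {
	cases := []struct {
		name string
		in   int
		want int // copied from observed output, not from a spec
	}{
		{"negative", -10, 0},
		{"zero", 0, 0},
		{"typical", 120, 120},
		{"at cap", 500, 500},
		{"over cap", 9000, 500},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			if got := clampBid(c.in); got != c.want {
				t.Errorf("clampBid(%d) = %d, want %d", c.in, got, c.want)
			}
		})
	}
}
```

Saved in a file ending in _test.go, this runs with go test. If a later refactor changes any of those outputs, the matching subtest goes red and names the input that changed.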
What the tests caught
Three things stand out.
First, nil and empty edge cases. A function took a slice of ad slots and picked the best one. When the slice was empty, the function indexed into it anyway and panicked. In production this almost never happened because there was usually at least one slot, but “almost never” is not never. The test that passed an empty slice failed loudly, and the fix was two lines.
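A fix of that shape might look like the sketch below. The Slot type, its Score field, the ErrNoSlots error, and the idea that "best" means highest score are all assumptions made for illustration. Only the name PickSlot and the pointer-plus-error return come from the test shown later in this post.

```go
package main

import "errors"

// Slot is a hypothetical ad slot; the real type is not shown in this post.
type Slot struct {
	ID    string
	Score float64
}

// ErrNoSlots is an assumed sentinel error for the empty case.
var ErrNoSlots = errors.New("no slots to pick from")

// PickSlot returns a pointer to the highest-scoring slot.
// "Highest score wins" is an assumption about what "best" means.
func PickSlot(slots []Slot) (*Slot, error) {
	// The guard: len of a nil slice is 0, so this covers nil and empty alike.
	if len(slots) == 0 {
		return nil, ErrNoSlots
	}
	best := 0
	for i := range slots {
		if slots[i].Score > slots[best].Score {
			best = i
		}
	}
	return &slots[best], nil
}
```

Returning an error instead of panicking pushes the decision to the caller, which can skip the request or fall back to a default.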
Second, a config path nobody had ever tested. The service read a config value that switched between two pricing modes. One branch was exercised constantly. The other branch was only used for a specific kind of campaign, and it had a bug that had probably been there for a year. Nobody noticed because no test ever hit it and very little traffic did. Writing a test for that branch surfaced it immediately.
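A table-driven test is one way to make sure every branch of a switch like that gets at least one case. The sketch below is illustrative only: PricingMode, the two mode names, and the Charge formulas are invented, since the post does not say what the real modes were.

```go
package main

import (
	"fmt"
	"testing"
)

// PricingMode and both constants are hypothetical names for the config switch.
type PricingMode string

const (
	ModeCPM PricingMode = "cpm" // assumed: rate applies per thousand impressions
	ModeCPC PricingMode = "cpc" // assumed: rate applies per click
)

// Charge computes the cost for a batch of traffic under the given mode.
func Charge(mode PricingMode, rate, impressions, clicks int64) (int64, error) {
	switch mode {
	case ModeCPM:
		return rate * impressions / 1000, nil
	case ModeCPC:
		return rate * clicks, nil
	default:
		return 0, fmt.Errorf("unknown pricing mode %q", mode)
	}
}

// TestCharge_BothModes has one row per mode, so the rarely used
// branch is exercised as often as the common one.
func TestCharge_BothModes(t *testing.T) {
	cases := []struct {
		name        string
		mode        PricingMode
		rate        int64
		impressions int64
		clicks      int64
		want        int64
	}{
		{"cpm", ModeCPM, 2000, 5000, 40, 10000},
		{"cpc", ModeCPC, 300, 5000, 40, 12000},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			got, err := Charge(c.mode, c.rate, c.impressions, c.clicks)
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
			if got != c.want {
				t.Errorf("got %d, want %d", got, c.want)
			}
		})
	}
	// An unrecognized mode should fail loudly rather than price at zero.
	if _, err := Charge("bogus", 1, 1, 1); err == nil {
		t.Error("expected an error for an unknown mode")
	}
}
```

The useful property is that adding a third mode without adding a row to the table is easy to spot in review.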
Third, a crashloop right after a refactor. I had split a struct into two, and one of the new constructors left a field as its zero value when it should have been initialized. The service booted fine in most environments but crashlooped in one because that field was dereferenced during startup. A startup test that just constructed the service and ran its init caught it before it ever shipped. Without the test it would have been a deploy, a crashloop, a rollback, and an afternoon of confusion.
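A startup test of that kind can be very small. In the sketch below, Service, Cache, NewService, and Init are hypothetical names, and the nil check inside Init is one possible design, not necessarily what the real service does.

```go
package main

import (
	"errors"
	"testing"
)

// Cache is a hypothetical dependency that startup code touches.
type Cache struct {
	entries map[string]string
}

// Warm writes to the map, so it needs a non-nil Cache with a made map.
func (c *Cache) Warm() { c.entries["ready"] = "true" }

// Service is a hypothetical stand-in for one half of the split struct.
type Service struct {
	cache *Cache // left unset, this is nil, the zero value for a pointer
}

// NewService initializes every field that Init dereferences.
func NewService() *Service {
	return &Service{cache: &Cache{entries: make(map[string]string)}}
}

// Init runs startup work and reports a missing dependency as an error
// instead of panicking on a nil pointer.
func (s *Service) Init() error {
	if s.cache == nil {
		return errors.New("service: cache not initialized")
	}
	s.cache.Warm()
	return nil
}

// TestServiceStarts only constructs the service and runs Init.
// It fails if a constructor forgets a field that startup needs.
func TestServiceStarts(t *testing.T) {
	svc := NewService()
	if err := svc.Init(); err != nil {
		t.Fatalf("Init failed: %v", err)
	}
}
```

Even without the nil check, a test like this would fail, because the go test runner reports a panic inside a test as a failure. The check just makes the message easier to read.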
How tests changed the refactor
The thing I did not expect was how much the tests changed the way I worked, not just the result.
With a test suite I trusted, I could stop checking every change by hand and let the tests do the checking. I could rename things, move functions between files, change a signature, and run the tests. Green meant keep going. Red meant look closer. I did dozens of small refactors in a row that I would never have risked on the old untested code, because each one was cheap to verify.
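That changed the pace. Refactoring untested code feels like walking on ice. Refactoring code with good coverage feels like normal walking.
Most of the tests that made this possible were plain. The one for the empty-slice case looked roughly like this: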
func TestPickSlot_Empty(t *testing.T) {
	// A nil slice has length 0, so a len check in PickSlot treats it like an empty one.
	got, err := PickSlot(nil)
	if err == nil {
		t.Fatal("expected error for empty slots, got nil")
	}
	if got != nil {
		t.Fatalf("expected nil slot, got %v", got)
	}
}
A test that boring is the kind that catches the panic at 2am.
The payoff during peak traffic
The payoff showed up during a high-traffic stretch. The refactored service ran through it without the kind of small errors we used to see. Not because the code was magically better, but because the obvious holes (the nil case, the untested branch, the bad constructor) were already closed before traffic ever reached them.
I am not going to claim 90% is some target everyone should hit. The number is not the point. The point is that writing tests around behavior before changing it let me change it a lot, and the bugs those tests caught were the kind that hide in the paths you never look at.