By Tim Varelmann
This case study is shared with permission from Lucky Data.
Lucky Data is a German IT services company. They also develop logistics dispatch software that modernizes how dispatchers plan logistics in the construction industry and at inland ports. Bluebird Optimization supports this development.
This article describes a refactor of heavily pointer-based legacy C++ to modern value semantics: a change that eliminated a whole category of bugs, cut the size of the most bug-prone code almost in half, and laid the groundwork for a faster, more responsive user experience. For a business whose software makes high-stakes operational decisions every day, that combination (fewer defects, faster to change, faster for the user) shortens the path from new customer requirements to shipped features.
--
For months, development had been humming along. New features went out in regular releases. Nothing dramatic, nothing on fire. The kind of stretch where you start thinking about what's next rather than what's broken.
Then came early March.
It showed up in code that was still under development: a stubborn bug with a reproducible symptom and no obvious cause. I spent a full day chasing it, and by evening had nothing but a patch: a piece of code that eliminated the symptom without explaining it. With a release on the horizon and other things still undone, I decided to postpone the investigation.
That evening was uneasy. A full day had disappeared with nothing to show except a fix I couldn't fully justify. When you can't explain why a bug happened, you can't really be sure it's gone.
A week later, another one. Different surface, same feel: hard to pinpoint, symptoms that didn't line up cleanly with the code supposedly producing them. Almost another full day, another patch, another symptom silenced without understanding. The release was imminent. The patch went in, and the release shipped.
Once it was out, there was finally room to breathe, and to actually investigate.
With the pressure off, the picture came together quickly. The two bugs weren't independent. They shared a family resemblance, and that resemblance pointed at the architecture itself: a design built around pointers, with shared access to data spread across large parts of the code. Each bug had its own immediate trigger, but all of them were drawing from the same well.
The uncomfortable conclusion: this architecture had no real justification for what the software actually does. It was a carelessly chosen default in the past, never reconsidered. And it was producing bugs that would keep coming back under different names until the underlying structure changed.
To explain what changed, a quick detour through how programs store data:
Computer programs keep data in memory, and memory has two main regions: the stack and the heap.
Data on the stack belongs to one specific piece of code: the function that created it. By default, only that function can read or modify it.
Data on the heap lives on its own. Any piece of code with a pointer (essentially an address telling the program where the data lives) can read or modify it. The same piece of heap data can have many pointers aiming at it from many parts of the program.
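For readers who prefer code, here is a minimal sketch of that difference. The `Truck` type and both function names are made up for illustration and are not taken from the Lucky Data codebase.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type for illustration only.
struct Truck {
    int id;
    std::string status;
};

// A local variable lives on the stack: it belongs to this call and is
// gone when the function returns.
std::string status_of_local_truck() {
    Truck local{42, "free"};
    local.status = "reserved";  // only this function touches `local`
    return local.status;
}

// A heap object can be reached through any number of pointers.
std::string status_seen_through_second_pointer() {
    auto first = std::make_shared<Truck>(Truck{42, "free"});
    std::shared_ptr<Truck> second = first;  // same object, second pointer
    second->status = "maintenance";         // a change made through one pointer...
    return first->status;                   // ...is visible through the other
}
```

In the second function, `first` never touched the status, yet it reads "maintenance". That is the sharing the rest of this article is about.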
Why does that matter? At first glance it sounds fine: if two parts of the program both change the same piece of data, surely each has a good reason. And individually, yes, each change usually does. The problem isn't individual intent. The problem is that developers can no longer reason locally.
Here's an example: Imagine code that decides whether truck 42 is available at 3pm. It reads the data: "truck 42, free", and starts assembling a dispatch order. Between the moment it reads and the moment it commits, another part of the program, also for a perfectly valid reason, marks truck 42 as under maintenance. The dispatch code has no way of knowing. The truck gets assigned anyway.
In isolation, both pieces of code are correct. The bug lives in the space between them. And debugging it means tracing every part of the program that might hold a pointer to truck 42, which in a mature codebase can be dozens or hundreds of places.
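The truck 42 scenario can be reproduced in a few lines, even on a single thread. This is only a sketch under assumed names (`Truck`, `MaintenancePlanner`, `on_inspection_due`); the real code looks different, but the shape of the problem is the same.

```cpp
#include <cassert>
#include <memory>

// Hypothetical names for illustration only.
struct Truck {
    int id;
    bool available;
};

// Some other part of the program, holding its own pointer to the same truck.
struct MaintenancePlanner {
    std::shared_ptr<Truck> truck;
    void on_inspection_due() { truck->available = false; }
};

// Returns true if the truck was dispatched although it is no longer available.
bool dispatch_conflict_with_shared_pointer() {
    auto truck42 = std::make_shared<Truck>(Truck{42, true});
    MaintenancePlanner planner{truck42};

    bool was_free = truck42->available;  // dispatch code reads "truck 42, free"
    planner.on_inspection_due();         // unrelated, valid code runs in between
    bool dispatched = was_free;          // dispatch commits based on the earlier read

    return dispatched && !truck42->available;
}
```

Neither the dispatch lines nor the planner contain a mistake when read on their own. The conflict only exists because both reach the same object.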
That doesn't mean pointers are bad. Anyone who has installed a browser extension, a Microsoft Office add-in, or a game mod has benefited from them: that whole category of "extend the running program with something it didn't originally know about" depends on pointer-like mechanisms. Pointers are the right choice in the right place.
For the central data store in question, though (the piece that holds whatever data is currently on screen or in its temporal neighborhood), there was no such reason. The complexity of pointers was pure cost, no benefit.
Three things stood out.
The codebase reached for pointers as the default tool, in a part of the code where there was no case for them. The result was exactly the cost described above: shared access to data with no clear owner, lifetimes that had to be reconstructed piece by piece, and the space between any two accesses as a potential bug.
The code also contained mutexes that no longer served a purpose. Mutexes are coordination mechanisms for code that runs in parallel: they prevent two threads of execution from stepping on each other's data. But currently, this part of the software runs on a single thread. Every mutex in it had been added, at some point in the past, as a patch to a bug that looked like a race condition. They were leftovers from old firefighting sessions, still slowing the code down.
The code used an abstract superclass as the only way to reach the data for the two kinds of things the software tracks: assignments (the scheduled events of a dispatch plan) and entities (the things that participate in those events: trucks, concrete mixers, skilled workers, helpers, products to be delivered). Inheritance is a mechanism that lets different kinds of things share an overarching category. It's useful when you genuinely need that unified treatment, but in C++ it practically forces developers toward pointers: treating different types through one shared category requires them. It also adds runtime cost and makes the code harder to follow.
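A rough reconstruction of that shape, to show why inheritance pulls pointers in with it. The class names (`PlanItem`, `Assignment`, `Entity`) and the helper functions are my own placeholders, not the original identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reconstruction for illustration only.
struct PlanItem {
    virtual ~PlanItem() = default;
    virtual std::string kind() const = 0;
};

struct Assignment : PlanItem {
    std::string kind() const override { return "assignment"; }
};

struct Entity : PlanItem {
    std::string kind() const override { return "entity"; }
};

// std::vector<PlanItem> does not compile because PlanItem is abstract,
// so a mixed container has to hold pointers to heap objects.
using Store = std::vector<std::unique_ptr<PlanItem>>;

// Code that only cares about assignments still has to walk everything
// and filter with a runtime type check.
std::size_t count_assignments(const Store& store) {
    std::size_t n = 0;
    for (const auto& item : store) {
        if (dynamic_cast<const Assignment*>(item.get()) != nullptr) ++n;
    }
    return n;
}

Store make_example_store() {
    Store s;
    s.push_back(std::make_unique<Assignment>());
    s.push_back(std::make_unique<Entity>());
    s.push_back(std::make_unique<Assignment>());
    return s;
}
```

The sketch uses `std::unique_ptr` to keep it short. With shared or raw pointers, the aliasing problems described earlier come on top.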
Two decisions did most of the work.
When a part of the application needs an assignment or an entity, it asks by unique ID and receives a copy. The copy belongs to the caller (and is stored on the caller's stack). Nobody else can reach in and change it. When the caller is done, the copy simply disappears: no bookkeeping, no leaks, no surprises.
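As a minimal sketch of this idea, assuming a store keyed by integer IDs: `AssignmentStore`, `put`, and `find` are illustrative names, and the real interface may well differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical names for illustration only.
struct Assignment {
    int id;
    std::string description;
};

class AssignmentStore {
public:
    void put(const Assignment& a) { items_.insert_or_assign(a.id, a); }

    // Returns a copy by value, or an empty optional if the ID is unknown.
    std::optional<Assignment> find(int id) const {
        auto it = items_.find(id);
        if (it == items_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<int, Assignment> items_;
};

// Returns true if editing the caller's copy leaves the store untouched.
bool caller_copy_is_isolated() {
    AssignmentStore store;
    store.put(Assignment{7, "deliver concrete"});

    std::optional<Assignment> mine = store.find(7);  // the caller owns this copy
    if (!mine) return false;
    mine->description = "changed locally";           // affects only the copy

    return store.find(7)->description == "deliver concrete";
}
```

Changes go back into the store only through an explicit call such as `put`, so every write is visible at the place where it happens.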
There's a well-worn piece of guidance in software design: prefer composition over inheritance. Instead of assignments and entities sharing a common ancestor, they now each contain a small common piece (the properties they genuinely share) and are otherwise independent types.
The overarching category that used to be the only way to access this data is still available for code that genuinely needs to treat both kinds uniformly: it's now implemented with the visitor pattern, built on modern C++17 facilities, and developers don't even have to know about it in order to use it. But crucially, it's no longer the only door. Parts of the application that know they're dealing only with assignments, or only with entities, can now ask for exactly that. Less iterating over everything and filtering afterwards. More direct, more readable.
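Here is a sketch of how composition and the visitor can fit together. I am assuming `std::variant` and `std::visit` as the C++17 building blocks, and all type and function names (`CommonData`, `PlanItem`, `label_of`, `latest_start`) are placeholders for illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical names for illustration only.
// The small common piece that both kinds contain (composition).
struct CommonData {
    int id;
    std::string label;
};

struct Assignment {
    CommonData common;
    int start_hour;
};

struct Entity {
    CommonData common;
    std::string kind;
};

// The "overarching category": a value that holds either kind, no base class.
using PlanItem = std::variant<Assignment, Entity>;

// Uniform access: std::visit calls the lambda with whichever type is stored.
std::string label_of(const PlanItem& item) {
    return std::visit([](const auto& x) { return x.common.label; }, item);
}

// Direct access: code that only deals with assignments works on plain values.
int latest_start(const std::vector<Assignment>& assignments) {
    int latest = 0;
    for (const auto& a : assignments) {
        if (a.start_hour > latest) latest = a.start_hour;
    }
    return latest;
}
```

Callers of `label_of` never see the visitor, which matches the point above: the mechanism stays behind the function boundary.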
To put it briefly: bugs fixed and structurally prevented going forward, the software easier and cheaper to maintain and extend, and features that looked out of reach are suddenly almost ready. All achieved in less than two weeks of focused work.
Rolf Ruß/Patrick Wolff, CEO of Lucky Data, assesses this development as follows:
"Quote from Patrick/Rolf"
Two lost days while a release was close were expensive. They also turned out to be the cheapest possible price for the lesson. Patching a bug without understanding it feels productive: the symptom goes away, the release ships, the open-items list gets shorter. What it actually does is borrow time against the next incident. The second patch, a week after the first, was the receipt.
The real work was taking the architecture seriously as a source of bugs. Identifying the unsuitable aspects of the architecture and giving their replacement the time it needs: that's where the leverage is.
If you recognize some of this in your own codebase (bugs that keep coming back under different names, parts of the code everyone is a little afraid to touch), I'd be glad to talk. Reach out directly, or subscribe to Bluebird Briefings to stay on top of optimization and engineering topics like this one.

