Legacy systems don’t just run services—they store decisions. Extracting decades of that knowledge, and judging whether it still reflects what the service needs to do, is the work that determines whether a modernization project succeeds or fails. In our previous article we made the case that AI can’t modernize a legacy system on its own. Here we look more closely at what that work actually involves, and why it’s the work that most projects compress or skip entirely.
The pull is always toward the most visible output: translating old code into new code. The less visible work—understanding what the system actually does and questioning whether it should keep doing it—gets deferred. Not because organizations make bad decisions, but because the pressure to deliver makes it easy to treat that work as something to sort out later.
Why institutional knowledge is hard to replace
Government systems carry a particular kind of institutional weight. The child benefit system, the court case management platform, the licensing database: these persist across governments, across decades, and across generations of staff who understood them deeply and then moved on.
What leaves with those people is rarely in any handover document. It’s the reasoning behind a business rule that nobody questions because it’s always been there. The policy exception encoded in a conditional statement that everyone’s afraid to touch. The workflow that was designed around constraints that no longer exist but has since become how things are done. That unwritten, accumulated knowledge is what makes legacy modernization genuinely difficult, and what makes a purely technical approach to it genuinely risky.
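To make this concrete, here is a hypothetical sketch of the kind of conditional that encodes a forgotten policy exception. Everything here is invented for illustration: the field names, the 1997 cutoff, the region code, and the uplift are stand-ins for the sort of detail that only a long-tenured colleague could explain.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    enrollment_year: int
    region_code: int

BASE_RATE = 400.0  # monthly base amount (invented)

def monthly_benefit(applicant: Applicant) -> float:
    amount = BASE_RATE
    # Why 1997? Why region 7? The staff who knew have moved on,
    # and the handover documents never recorded the reason.
    if applicant.enrollment_year < 1997 and applicant.region_code == 7:
        amount *= 1.15
    return round(amount, 2)
```

The code tells you exactly *what* happens; it is silent on whether the exception reflects a repealed supplement, a one-off incident fix, or a rule that still has legal force.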
The developers who built and maintained many government systems through the 1980s and 1990s are retiring. Specialized skills for older platforms are harder to find. The systems themselves are accumulating risk as dependencies go unsupported and integrations become more fragile.
The risk isn’t that the new system will fail testing. It’s that it will pass testing and still fail users, because it faithfully reproduced a workflow that was already serving people poorly, or preserved requirements that should have been retired years ago.
Knowledge extraction is two questions, not one
Effective knowledge gathering starts with separating two questions that are easy to conflate: what does this system do, and should it keep doing those things?
What does this system do?
In practice, AI analysis tends to be most useful for two things during knowledge extraction. The first is structural: identifying where business logic is concentrated, where dependencies are tightly coupled, and where the same rule has been implemented inconsistently across different parts of the system. Our Labs work found that graph-based analysis surfaced these structural issues, such as tangled dependencies and oversized modules carrying too much responsibility, in ways that traditional line-by-line review tends to miss, and far more quickly in an unfamiliar codebase. The second is speed: in a documented case from Thoughtworks, AI-assisted analysis of a 15-million-line legacy codebase cut reverse engineering time from six weeks to two weeks per module by combining structural code analysis with the ability to query the system in plain language. That freed the team to spend more time on the contextual work that AI cannot do: understanding what the code was actually meant to accomplish and whether it still did. Even the best analysis tools describe what the code does, not why it was written that way. The why requires people.
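The graph-based idea can be pictured with a small sketch: represent module dependencies as edges, then count fan-in to flag modules that many others depend on, which are likely concentrations of business logic. The module names, edge list, and threshold below are invented for illustration, not drawn from any real system or tool.

```python
from collections import defaultdict

# Hypothetical module-level dependency edges (src depends on dst),
# as might be extracted from static analysis of a codebase.
edges = [
    ("billing", "core_rules"), ("claims", "core_rules"),
    ("portal", "core_rules"), ("reports", "core_rules"),
    ("core_rules", "db"), ("portal", "claims"),
]

fan_in = defaultdict(int)   # how many modules depend on each module
fan_out = defaultdict(int)  # how many modules each module depends on

for src, dst in edges:
    fan_out[src] += 1
    fan_in[dst] += 1

# Modules with high fan-in are candidates for closer human review:
# they carry responsibility for many callers.
hotspots = [module for module, count in fan_in.items() if count >= 3]
```

Real tooling does far more than count edges, but even this toy version shows why the graph view finds oversized modules faster than reading files one by one.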
What good knowledge extraction produces is something more than documentation. It’s a connected picture of the system: business rules linked to the policy intent behind them, workflows traced to the decisions that shaped them, edge cases recorded alongside the context that made them necessary. That picture is what makes the second question answerable. Without it, decisions about what to preserve, simplify, or retire are based on assumption rather than evidence.
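One lightweight way to sketch that connected picture is a record linking each rule’s observed behavior to its policy intent and the decision taken about it. The field names and the example entry below are invented for illustration; any real project would shape this around its own documentation practice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedRule:
    rule_id: str
    behavior: str       # what the system does (from code analysis)
    policy_intent: str  # why it exists (from people, not code)
    decision: str       # preserve, simplify, or retire, with rationale

# Invented example entry: a rule whose legal basis no longer exists.
rule = ExtractedRule(
    rule_id="CB-014",
    behavior="Applications from region 7 receive a 15% uplift",
    policy_intent="1996 rural supplement; enabling legislation repealed",
    decision="retire: no legal basis remains",
)
```

The point is the linkage, not the format: once behavior, intent, and decision live in one record, choices about what to preserve or retire can be traced to evidence.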

Should it keep doing those things?
This is where service design comes in, and where most technically framed modernization efforts fall short. Understanding a system thoroughly isn’t the same as knowing which parts of it deserve to survive. In practice this often means discovering that the scope of the service is larger than the system suggests. What looks like “schedule an appointment and complete an assessment” turns out to be “help someone find funding, understand their eligibility, and complete an assessment along the way.” The legacy system may handle the narrow workflow correctly while missing the broader need entirely.
The way this works in practice is that knowledge extraction and user research run in parallel rather than in sequence. As business rules and workflows are being documented, they become the agenda for conversations with the people who actually use the service. Those conversations don’t happen after the technical picture is complete; by then the scope has already been set. They happen while the extraction is underway, so that what gets documented is already being questioned.
In government, this step is more consequential than it might appear. Some workflows exist because of legislative requirements or privacy constraints, not operational preference. Others exist because of neither: they’re just how things have always been done. Knowing which is which before you start redesigning is the difference between a defensible decision and an expensive one.
Our work on Alberta’s Child Care Subsidy modernization shows what this looks like. The existing application was only accessible on decade-old browser versions or as a printable PDF. Families who needed to apply were tracking down old browsers through personal contacts, or being directed to libraries. Those who managed to apply online received a tentative approval email with no way to track what came next, which drove a high volume of calls to Alberta Supports staff for basic status updates. The forms used legal language that was technically accurate but practically unusable. Mandatory fields weren’t marked. Error messages didn’t explain how to fix mistakes.
None of that was visible from the code. It emerged from talking to assessors who processed applications and to families across rural and urban Alberta who submitted them. That research shaped what was preserved, what was simplified, and what was redesigned. The back-end system stayed in place. The citizen-facing service was rebuilt around how people actually needed to use it.
What good knowledge gathering actually involves
Getting the knowledge gathering phase right comes down to a few things we’ve seen make a consistent difference.
Treat everything AI extracts as a hypothesis, not a finding. AI tools are useful for reading unfamiliar code quickly—surfacing where key rules live, mapping how data moves, generating a first draft of what the system appears to do. But an AI tool doesn’t know what it doesn’t know. It will confidently describe a workflow that hasn’t reflected operational reality for years, because the code still says so, even if nobody follows it anymore. Every candidate requirement an AI surfaces needs a human to either confirm it or retire it. Until that happens it’s a hypothesis, not a requirement.
The people who can answer your questions are leaving. Structured conversations with long-tenured staff are the primary source for the contextual knowledge that no tool can recover—why a rule was added, what incident produced an exception, what a field means in practice versus what the data dictionary says. These conversations have a shelf life, and once the people who can answer them are gone, the answers go with them. Capturing that knowledge systematically, with explicit decisions about what to preserve and why, is work that has value beyond the immediate project.
Knowledge gathering is also when you build your research agenda. The questions worth asking aren’t abstract: which business rules still reflect how the work actually happens? Which workflows create friction that staff or citizens have learned to tolerate? Which requirements were added to handle scenarios that no longer occur? These questions only emerge once you understand what the system actually does—which is why user research needs to happen before any redesign decisions get made. That connection between technical understanding and service design is where most technically framed modernization efforts break down, and where the most costly assumptions get locked in.
Service design needs to be in the room while the technical picture is being built. By the time the technical documentation is complete, decisions have already been made about scope and structure that are hard to undo. Involving service designers and frontline staff while the extraction is happening ensures that every business rule and workflow being documented is simultaneously being asked a harder question: does this still serve anyone, and if not, what would? Without that, knowledge gathering produces a better picture of the current system. It doesn’t tell you what to do differently.
Extract knowledge early or pay later
The projects that get this right aren’t necessarily the ones with the most resources or the best tools—they’re the ones that ask the harder questions early enough to act on the answers, before the pressure to deliver makes scope feel fixed and assumptions feel like facts.