I inherited a vibe-coded MVP to scale: the honest diagnosis

TL;DR

Over the past few months the profile of who reaches out to me has changed. It used to be “I have an idea, build it for me.” Now it’s “I already have the app, it sort of works, but it stalls the moment it has to grow, and I don’t know whether to fix it or throw it out.” The product came out of vibe coding: generating software by asking an AI and accepting the result without reading or understanding the code that came out. It works well enough to have paying customers. And well enough to be scary.

The first decision in front of one of these isn’t technical. It’s triage. And the most expensive temptation, the one almost everyone has on day one, is to order a full rewrite from scratch. It’s almost always a mistake. In most cases it’s not a transplant: it’s a test suite first, then a scalpel on what actually rotted. You save far more than the panic suggests.

The numbers explain the scare and why it misleads. 45% of AI-generated code carries an OWASP Top 10 flaw (Veracode), and only 10.5% of vibe-coded code passes a decent security review, versus 61% that simply “works” (Carnegie Mellon). It looks like a demolition order. It isn’t. It’s the measure of the distance between working and holding up, and distance is something you measure BEFORE you demolish, not after.

Working isn’t the same as done. But it isn’t the same as junk either.

Rewriting from scratch is the most expensive mistake there is

Everyone who calls me with a stuck app shows up with the same line on the tip of their tongue:

“This code is a mess. It’s faster to redo it from scratch than to understand this dumpster fire.”

Hold on. That line is the most expensive siren song in software engineering, and I’m not the one who discovered it. Joel Spolsky called rewriting from scratch “the single worst strategic mistake that any software company can make”, and that was in 2000, long before AI made the rewrite even more tempting and even more expensive.

Why a mistake? Because you’re about to throw out the one thing this MVP actually proved: that people want it. The code is ugly, but it carries months of learning baked in. Every weird hack is, half the time, a real edge case some customer hit, one that the “clean” rewrite will rediscover the hard way, in production, all over again.

Think about Twitter. It was born as a Rails monolith because that’s what two guys could move fast enough to find product-market fit. The scaling problems came later. Because it worked. If they had started “the right way,” beefy and distributed, there probably wouldn’t be a Twitter to have scaling problems. The speed of vibe coding is real and it’s valuable for validating. The mistake isn’t having validated that way. It’s not shifting gears once the validation is over.

A rewrite from scratch is the vanity of whoever just showed up on the project.

Before you touch a single line, the test suite

Here’s the step almost everyone skips, and it’s the one that separates a rescue from a second disaster: you don’t fix what you can’t test, nor safely refactor what you didn’t cover first. Tests aren’t the last step of the rescue. They’re the first.

And it’s exactly what vibe-coded code doesn’t have. The AI generates with a brutal happy-path bias: it covers the flow you described and ignores the rest of the universe. Network error, invalid input, two clicks on the same button, the user who does it in the wrong order. None of that exists in the code, so none of it breaks visibly. Until it breaks in front of the customer.

“Write tests before shipping a new feature? I’ll spend two weeks with nothing to show the board.”

I get the anxiety, and it’s backwards. The test suite isn’t time lost before delivering value. It’s what gives you permission to touch the code without praying. Without it, every refactor is a blind bet: you fix one bug and find out, three days later, that you broke two others nobody could see. With it, you refactor with your eyes open. It’s the difference between operating with the lights on and operating in the dark.

In practice, I don’t even ask for full coverage up front. I ask for tests on the flows that make money and the ones that lose money: the checkout, the login, anything that touches a balance. It’s the minimum safety net so everything that comes next is surgery, not roulette. We already broke down that same discipline of validating before trusting in our piece on local validation with real quality, only there it applies to the flow of whoever is building, not rescuing.
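A minimal sketch of what that first safety net can look like, assuming a Python codebase. The checkout_total function is a hypothetical stand-in for whatever the inherited app actually does at checkout; the point is that the tests pin down today's behavior, including the unhappy paths the AI skipped, before anyone refactors.

```python
import unittest
from decimal import Decimal


def checkout_total(prices, coupon_percent=0):
    """Hypothetical stand-in for the inherited checkout logic."""
    if not prices:
        raise ValueError("cart is empty")
    if not 0 <= coupon_percent <= 100:
        raise ValueError("coupon out of range")
    subtotal = sum(Decimal(p) for p in prices)
    discount = subtotal * Decimal(coupon_percent) / Decimal(100)
    return (subtotal - discount).quantize(Decimal("0.01"))


class CheckoutSafetyNet(unittest.TestCase):
    """Pins down current behavior of a money flow before any refactor."""

    def test_happy_path(self):
        self.assertEqual(checkout_total(["10.00", "5.50"]), Decimal("15.50"))

    def test_coupon_is_applied(self):
        self.assertEqual(
            checkout_total(["20.00"], coupon_percent=25), Decimal("15.00")
        )

    def test_empty_cart_is_rejected(self):
        with self.assertRaises(ValueError):
            checkout_total([])

    def test_invalid_coupon_is_rejected(self):
        with self.assertRaises(ValueError):
            checkout_total(["20.00"], coupon_percent=150)
```

Saved in a test file, this runs with python -m unittest. Two of the four tests cover inputs nobody described in the original prompt, which is exactly where this kind of code tends to be silent.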

The diagnosis: reading the debt statement

With a safety net in place, you can open the hood without fear. The diagnosis of a vibe-coded MVP almost always finds the same four holes. I call it reading the debt statement, because that’s literally what it is: finding out how much you owe and to whom.

Coupled architecture. The business logic is glued to the infrastructure, the API was designed without thinking about load, the schema can’t grow sideways. It works with fifty users because everything works with fifty users. The paper that formalized this calls it the flow-debt trade-off: the fluidity of generating code masks the debt piling up in parallel.
Missing observability. Logging, tracing, and metrics came in as an afterthought, or didn’t come in at all. When it falls over at 3 a.m., you have nowhere to look. The OneUptime line stays on the wall: “observability isn’t a nice to have, it’s your only safety net”. In vibe-coded code it’s the stand-in for the human review that never happened.
Decorative security. API key hardcoded in the repo, auth with the logic inverted, a database exposed with no access rule. It’s not the exception: Apiiro measured AI-generated code racking up 10x more security findings within six months, with privilege-escalation paths up 322%.
Fragile deploy and CI. The classic case is preview, test, and production sharing the same database. That’s how Replit’s AI wiped the production database during an explicit code freeze, written in ALL CAPS in the prompt, and then lied, saying the rollback was impossible. Separating environments is the cheapest and most ignored lesson on the list.
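Of the four holes, missing observability is often the quickest to start closing. The sketch below is one possible shape, assuming Python and only the standard library: a hypothetical traced helper that emits one JSON log line per operation with its duration and outcome. The names traced, op, and request_id are illustrative, not taken from any particular tool.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("app")


@contextmanager
def traced(operation, **fields):
    """Log one JSON line per operation with its duration and outcome."""
    event = {"op": operation, "request_id": str(uuid.uuid4()), **fields}
    started = time.perf_counter()
    try:
        yield event  # the caller may attach extra fields to the event
        event["outcome"] = "ok"
    except Exception as exc:
        event["outcome"] = "error"
        event["error"] = type(exc).__name__
        raise  # never swallow the failure, only record it
    finally:
        event["duration_ms"] = round((time.perf_counter() - started) * 1000, 2)
        logger.info(json.dumps(event))
```

Wrapping a money flow then looks like "with traced('checkout', user_id=42) as event:", and when it falls over at 3 a.m. there is at least one line per request saying what ran, how long it took, and how it ended.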

The statement is scary on purpose. But a statement isn’t an eviction notice. It tells you where the expensive debt is and where the debt you can roll over is, and that distinction is the whole rescue.

How do I know whether the architecture can be saved or not?

The practical criterion has two parts: how coupled the business logic is to the infrastructure, and whether the data model can grow. If the logic that matters is buried in the middle of the controller, dependent on a database detail that won’t scale sideways, that piece is a localized rewrite: there’s no saving the foundation without redoing it. If the business logic is at least isolated, even if ugly, it’s remediation: you improve it from the inside without demolishing. Most of an MVP falls in the second case. That’s why a full rewrite is almost never the right answer: you condemn the whole building over two rooms.
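To make "isolated" concrete, here is a small sketch under assumed names (Account, AccountStore, withdraw, and handle_withdraw are all hypothetical). The business rule is a pure function that knows nothing about the database, and the handler only fetches, applies the rule, and persists. Code shaped like this is remediation territory; when the rule only exists inline inside the handler, tangled with queries, that piece is a candidate for a localized rewrite.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Account:
    id: str
    balance_cents: int


class AccountStore(Protocol):
    """Whatever persistence the app uses, hidden behind two methods."""

    def get(self, account_id: str) -> Account: ...
    def save(self, account: Account) -> None: ...


def withdraw(account: Account, amount_cents: int) -> Account:
    """Pure business rule: no database, no HTTP, testable in isolation."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if amount_cents > account.balance_cents:
        raise ValueError("insufficient funds")
    return Account(account.id, account.balance_cents - amount_cents)


def handle_withdraw(store: AccountStore, account_id: str, amount_cents: int) -> Account:
    """Thin handler: fetch, apply the rule, persist."""
    updated = withdraw(store.get(account_id), amount_cents)
    store.save(updated)
    return updated


class InMemoryStore:
    """Test double that satisfies AccountStore without a real database."""

    def __init__(self, accounts):
        self._accounts = {a.id: a for a in accounts}

    def get(self, account_id: str) -> Account:
        return self._accounts[account_id]

    def save(self, account: Account) -> None:
        self._accounts[account.id] = account
```

A quick diagnostic follows from this: if you can exercise the rule with something like InMemoryStore, the logic is isolated enough to improve from the inside.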

What to save and what to rewrite without mercy

Triage is deciding what goes into the operating room and what gets discharged. After doing this on plenty of apps, the pattern is pretty stable.

Save almost always: the domain model that reflects the actual business (the names of things and how they connect), the flows the user already validated in practice, and a good chunk of the interface. That’s the knowledge that cost months and that a rewrite throws in the trash for free. We opened up the code-structure version of this discipline in our piece on how your codebase is the new prompt: what decides whether it scales isn’t the stack, it’s the repository staying navigable.

Rewrite without mercy: the auth and permissions layer (it’s where getting it wrong hurts most, because it touches every request), the business logic the AI duplicated across eight, ten, twelve places, the schema that can’t take load, and the integrations with no fallback at all. The duplication isn’t a detail: in 2024, for the first time in history, copy-pasted code overtook refactored code, with duplicate blocks growing 8x (GitClear). Each copy is one more place for a bug to hide and never get fixed everywhere.

What guides the cut is understanding why the app stalls where it stalls. Addy Osmani dubbed it the 70% problem: AI gets you fast to 70%, except it’s 70% of the volume of code, not 70% of the path to a finished product. The missing 30% is exactly what you can’t generate in a rush: edge cases, maintainability, performance, security. It’s the expensive part. It’s the part triage isolates.

Is vibe coding fit for production?

It’s fit to get you to the door, not through it. Vibe coding is a spectacular validation accelerator: it proves the hypothesis, wins the first customers, shows there’s a business. The mistake isn’t using it. It’s mistaking the prototype that validated for the product that scales, and going on stacking features on top of a foundation that was never built to carry weight. We discussed the mechanics of this, of why “prompt, accept, deploy” stalls when it’s time to grow, in our piece on getting out of vibe coding. Vibe coding gets you to the MVP. Method gets the MVP to a product.

Stabilizing without stopping the business: surgery with the patient awake

The last piece, and the one that most sets a competent rescue apart, is the “without stopping the business” part. Because the app is live, it has customers using it, it has revenue coming in. You can’t shut everything down for two months to tidy the house. You have to operate with the patient awake.

The way to do it has a name, the Strangler pattern: replacing the old system from the outside, module by module, while it keeps running, until the new one strangles the old. Instead of the big bang (“flip the switch to the new one on a Sunday and pray”), you pick a piece (auth, say), build the new version alongside, send a fraction of the traffic to it with a feature flag, confirm it holds, and only then retire the old one. Got it wrong? Rollback in one click, nobody notices. Repeat for the next module. The risk gets sliced into pieces that fit in your pocket, instead of a single bet that can take the company down.
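The routing step can be sketched in a few lines. This is an illustration under assumptions, not a full feature-flag system: legacy_login and new_login are hypothetical placeholders for the old and new auth modules, and new_auth_percent stands for whatever flag value your configuration exposes. Hashing the user id gives each user a stable bucket, so the same person always lands on the same side while the percentage is unchanged.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def legacy_login(user_id: str) -> str:
    """Placeholder for the existing auth module."""
    return f"legacy:{user_id}"


def new_login(user_id: str) -> str:
    """Placeholder for the rewritten auth module."""
    return f"new:{user_id}"


def login(user_id: str, new_auth_percent: int) -> str:
    """Send roughly new_auth_percent of users to the new module."""
    if rollout_bucket(user_id) < new_auth_percent:
        return new_login(user_id)
    return legacy_login(user_id)
```

Setting new_auth_percent to 0 is the one-click rollback: every request goes back to the legacy path without a deploy. Raising it step by step, for example 5, then 25, then 100, is how the new module takes over.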

And the first slice, almost always, is the cheapest and the most forgotten: separating the environments. The production database is sacred, it has a tested automatic backup, and nobody, human or AI, touches it without a net. It’s the fix that would have prevented the entire Replit disaster, and it costs a day.
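One cheap way to enforce that separation is a startup check that refuses to boot when a non-production environment points at the production database. The sketch below assumes the environment name and the connection URL are passed in by the caller, and PRODUCTION_DB_HOSTS is a hypothetical list you would maintain yourself; the host names are invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical host name, for illustration only.
PRODUCTION_DB_HOSTS = {"db.prod.internal"}


def assert_isolated_database(app_env: str, database_url: str) -> None:
    """Fail fast if preview or test is wired to the production database."""
    host = urlsplit(database_url).hostname
    if app_env != "production" and host in PRODUCTION_DB_HOSTS:
        raise RuntimeError(
            f"environment '{app_env}' must not connect to production host '{host}'"
        )
```

Called once at application startup, this turns the shared-database mistake into a crash on boot in preview instead of a wiped table in production. It complements, and does not replace, separate credentials and tested backups.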

There’s a trade-off, and I don’t sell miracles. On paper, the Strangler is slower than rewriting from scratch: you keep two systems alive at the same time for a while, you pay the cost of keeping the two talking. It’s annoying. But it’s the price of not stopping the revenue while you operate, and it’s incomparably cheaper than the rewrite that freezes the product for a quarter and still ships late. It’s the kind of independent diagnosis, with no strings attached, that we deliver in an Audit before a single line gets touched: the map of what to save, what to rewrite, and in what order to move. How to decide the scope of that rebuild is something we opened up in our piece on what to cut and what to keep in an MVP.

Working isn’t done. But it isn’t junk either.

The instinct in front of a stuck vibe-coded MVP is binary: either it’s wonderful because it’s live, or it’s junk because the code is ugly. Both are wrong. It’s exactly what it looks like: a prototype that validated a business and now needs to become a product, with a debt you can read, line item by line item, and pay off in the right order.

The distance between the demo that wowed and the system that holds up under load is measurable. It’s not faith, it’s diagnosis: tests to turn the lights on, the statement to know what you owe, triage to separate what you save from what you rewrite, and the Strangler to operate without switching the patient off. Almost always you can do it without demolishing the house.

Done is a state you prove, not one you feel.