The It-Worked-Once Deployment
The code ran locally once, so everyone pretends it’s production-ready.
What it is
The PR is approved on the strength of “I tested it
locally.” The CI is passing on tests AI wrote against
code AI wrote. The deployment is scp and a
pm2 restart. The service comes up, runs for thirty
seconds, is declared shipped — and falls over at 2am for
reasons that “couldn’t have happened locally.”
How it happens
AI made writing code dramatically faster without changing the cost of anything around it — building in a realistic environment, deploying with rollback, monitoring, alerting, on-call runbooks, capacity planning. The ratio of code-shipped-per-week to operational-discipline-applied collapses. The thing that used to be a third of the work (coding) is now a sliver; everything else stays the same in absolute terms and gets ignored in relative terms because it’s suddenly the bulk of the cost.
The “I tested it locally” claim used to be backed by hours of effort — setting up the data, configuring the env, exercising the flow. Now it’s backed by a five-minute AI interaction, and the developer’s mental model of what “tested” means hasn’t updated. The phrase points at the same words, the same confidence; the underlying work behind it has evaporated.
Why it’s dangerous
Production isn’t local. It has real load, real concurrency, real data shapes, real network failures, real cold starts, real noisy neighbors, real version mismatches between the service and its dependencies. Things that worked once locally fail predictably in production — not because they were buggy, but because they were never exercised against production conditions. The incident retrospective always reads “this came up because we never tested under realistic load,” and the cycle repeats next sprint.
The AI-era hinge: the gap between “code exists” and “code is in production safely” used to be obvious because writing the code was slow enough to draw attention to everything around it. Now writing is fast, the gap is wide, and the team keeps walking into it because the velocity at the front of the pipeline papers over the lack of investment at the back.
How to prevent it
“I ran it locally” isn’t a deployment claim — it’s a development checkpoint. The deployment claim is: tested in an environment that resembles production, deployed via a reproducible pipeline, monitored after release, rollback path known and exercised. AI can produce all of that — pipelines, infrastructure-as-code, monitoring config, dashboards, runbooks — faster than the team can review it. The leverage is real and underused; the work isn’t to skip it, it’s to use AI to do it cheaply enough to actually do it.
Scale to stakes: a typo fix on a static page isn’t the same as a new background job that talks to billing. The friction signal is concrete — if the answer to “how do we roll this back?” is “we’d ship a fix,” you don’t have a deployment, you have a hope.
The serious team fix
Three things, reinforcing each other:
- A team definition of done that means deployed and observed, not merged. A change isn’t shipped until it’s running in production under real load and monitoring confirms the expected behavior. The author owns this end-to-end. Closing the ticket on merge is the failure pattern the definition is designed to prevent — the team agrees what done actually means and holds the line.
- AI generates the operational scaffolding from the diff itself. Pipelines, infrastructure-as-code, dashboards, alerts, runbooks — AI drafts each one from the change being shipped, the human reviews and tightens. The cost of doing it right used to be the bottleneck; with AI generating the boilerplate, the bottleneck becomes “do we care enough to invest” — which is a question the team can answer, instead of a budget it can’t afford.
- A real staging environment and an automated deployment pipeline with rollback. Staging mirrors production in the ways that matter (data volume, dependencies, concurrency). Deployments are scripted, reproducible, and reversible. Rollback is a single command or button, exercised regularly so it works the night you actually need it. The infrastructure makes safe deployment the default, not an act of heroism.
The shift is: code that worked once on a laptop is a demo. The deployment is everything between the demo and the customer — and AI just made all of it dramatically cheaper to build, if the team decides to build it.