Screenshot-Driven Product Spec
“Make it look like this.” No behavior, data, permissions, or errors.
What it is
The product spec is a screenshot, a Figma frame, or a competitor’s page handed to AI with “build this.” The AI builds what it sees. What it can’t see — the data model behind the UI, the permission model, the empty state, the error states, the loading semantics, the validation rules, the analytics, the accessibility — gets invented from defaults or skipped entirely.
How it happens
Multimodal AI made “build from a picture” suddenly viable. The UI comes up in minutes. It looks right. But a screenshot is a snapshot of one state of one user’s session — it doesn’t show what happens when the list is empty, what happens when the form submission fails, who’s allowed to see this page (or this row in this page), what the data model is when the UI displays aggregates, what the keyboard nav and screen reader experience look like, or what analytics event fires when the user clicks the button. All of those get filled in with median defaults — the generic empty-state copy, the hardcoded permission check, the form that accepts anything, the spinner that never times out.
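One way to make the unseen states concrete is to write them down as a type before any rendering code exists. The sketch below is illustrative only: the names `Item`, `ListViewState`, and `renderMessage`, the copy strings, and the choice of five states are assumptions, not part of any particular product. It models a list screen as a TypeScript discriminated union, so the compiler flags any state the UI forgets to handle.

```typescript
// Hypothetical data shape for one row of the list in the screenshot.
type Item = { id: string; title: string };

// Every state the screen can be in. A screenshot usually shows only "ready".
type ListViewState =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string; retryable: boolean }
  | { kind: "forbidden" }
  | { kind: "ready"; items: Item[] };

// Returns the user-facing message for a state (copy is placeholder text).
function renderMessage(state: ListViewState): string {
  switch (state.kind) {
    case "loading":
      return "Loading items...";
    case "empty":
      return "No items yet. Create your first one.";
    case "error":
      return state.retryable
        ? `${state.message} Try again.`
        : `${state.message} Contact support.`;
    case "forbidden":
      return "You do not have access to this list.";
    case "ready":
      return `${state.items.length} item(s)`;
    default: {
      // Compile-time exhaustiveness check: adding a new state to the union
      // without handling it above makes this assignment a type error.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Each branch is a question the screenshot did not answer. Filling in the real copy and behavior for each one is the spec work.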
The AI-era mechanism is specific to multimodal models. Before them, “build this UI” required someone to translate the picture into a written spec, and that translation step forced the questions. Now the picture goes straight to code, and the questions never come up. The cost of skipping the translation dropped to zero, so it gets skipped, every time.
Why it’s dangerous
The product looks done and isn’t. The visible part is maybe 30% of the work; the rest is what makes it correct, safe, accessible, and operable. Shipping the visible part means shipping a product whose data model is whatever AI guessed, whose permissions are whatever AI assumed, whose empty state says “no items” everywhere, whose error path is a console log. The bugs land on real users, not on the developer’s localhost — the demo lied because the demo only had to render one frame.
The AI-era hinge: the screenshot used to be the start of the spec. Now it’s offered as the entire spec, because AI can build from it. The compounding piece is that the mid-fidelity UI is so convincing that the missing pieces don’t feel missing until production hits a state nobody designed for.
How to prevent it
A screenshot is a goal, not a spec. Before the UI gets built, the spec has to answer the unanswered states — data shape, permissions, empty, loading, error, validation, accessibility, analytics. AI is excellent at producing the question list: point it at the screenshot and ask “what does this design not tell me?” The resulting checklist is the spec the screenshot didn’t come with.
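As a rough illustration, the checklist can live next to the screenshot as data, so that “is the spec ready?” becomes something a script can answer. The eight question keys mirror the list above; the `ScreenSpec` shape, the function name, the file path, and the sample answers are all hypothetical.

```typescript
// The questions a screenshot leaves open, taken from the list above.
const SPEC_QUESTIONS = [
  "dataShape",
  "permissions",
  "empty",
  "loading",
  "error",
  "validation",
  "accessibility",
  "analytics",
] as const;

type SpecQuestion = (typeof SPEC_QUESTIONS)[number];

// Hypothetical spec record: the screenshot on top, the written answers below.
type ScreenSpec = {
  screenshot: string; // path or URL to the image (placeholder)
  answers: Partial<Record<SpecQuestion, string>>;
};

// Returns the questions that have no answer, or only whitespace.
function unansweredQuestions(spec: ScreenSpec): SpecQuestion[] {
  return SPEC_QUESTIONS.filter((q) => (spec.answers[q] ?? "").trim() === "");
}

// Example: a draft spec that has only settled two of the eight questions.
const draft: ScreenSpec = {
  screenshot: "designs/orders-list.png",
  answers: {
    empty: "Show an illustration and a 'Create order' button.",
    error: "Inline banner with a retry action; log to the error tracker.",
  },
};

// Six questions remain: dataShape, permissions, loading, validation,
// accessibility, analytics.
console.log(unansweredQuestions(draft));
```

A non-empty result is a reasonable signal that the build should not start yet, though which questions are mandatory depends on the stakes of the page.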
Scale to stakes: a static marketing page isn’t a checkout flow. The friction signal is concrete: you’re about to start building, and you can’t say what the page does when the list is empty, or who can see it, or what error the form returns on a duplicate submission. When that happens, the spec isn’t ready and the build isn’t either.
The serious team fix
Three things, reinforcing each other:
- A team habit of treating the screenshot as the question, not the answer. Before any UI work begins, the team converts the visual into a written spec that answers the unanswered states — empty, error, loading, permission-denied, validation, accessibility, analytics. The screenshot lives at the top of the spec; the answers live below it. Designers and PMs participate; the engineer isn’t left to guess.
- An AI-leveraged “what’s missing” agent. A slash command or skill that takes the screenshot or design and produces the list of states, behaviors, and edge cases it doesn’t specify. The AI is excellent at exhaustively enumerating the missing pieces; a human (the engineer, designer, or PM) decides which ones matter for this product and what the answers are. The AI does the cataloguing nobody has time for.
- A UI testing harness that exercises non-happy states. Storybook (or similar) with explicit stories for each state: empty, loading, error, permission-denied, very long content, RTL, high contrast. Visual regression catches drift. Accessibility lint catches contrast and labels. The infrastructure makes the non-happy states first-class artifacts the team can’t forget — because they’re in the file tree, not in someone’s memory.
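
The “can’t forget” property of the testing harness can also be enforced mechanically. The sketch below does not use Storybook’s API; it is a plain-TypeScript stand-in, with a hypothetical `StoryRegistry` that maps story names to the state each one exercises, which a CI step could use to fail the build when a required state has no story. The state names come from the list above; the registry shape, function name, and sample stories are assumptions.

```typescript
// Non-happy states the harness is expected to cover, from the list above.
const REQUIRED_STATES = [
  "empty",
  "loading",
  "error",
  "permission-denied",
  "long-content",
  "rtl",
  "high-contrast",
] as const;

type RequiredState = (typeof REQUIRED_STATES)[number];

// Hypothetical registry: story name -> the state that story renders.
type StoryRegistry = Record<string, RequiredState | "happy-path">;

// Returns the required states that no registered story exercises.
function missingStates(stories: StoryRegistry): RequiredState[] {
  const covered = Object.keys(stories).map((name) => stories[name]);
  return REQUIRED_STATES.filter((state) => covered.indexOf(state) === -1);
}

// Example registry for one component (story names are placeholders).
const orderListStories: StoryRegistry = {
  Default: "happy-path",
  Empty: "empty",
  Loading: "loading",
  LoadFailed: "error",
};

const gaps = missingStates(orderListStories);
if (gaps.length > 0) {
  // In CI this could be a thrown error or a non-zero exit instead of a log.
  console.log(`Stories missing for: ${gaps.join(", ")}`);
}
```

In a real setup the registry would more likely be derived from the story files themselves than maintained by hand; the point is only that coverage of non-happy states can be checked rather than remembered.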
The shift is: a screenshot is one frame. A product is every frame nobody drew — and the team’s job is to draw them, deliberately, before AI builds the wrong defaults into the codebase.