Evals Are Powerful, Not The Starting Line
At a product conference last week, someone presented the idea that evals are the new PRDs. That in the AI era, PMs don’t build with specs, they build with evals. Evals define “good,” catch regressions, and keep model iteration grounded in reality instead of debating requirements.
The charitable read is that writing good evals requires the same rigor as a good PRD — clear success criteria, edge cases, and user intent. If that’s the argument, I’m with it. Good evals are hard to write, and they force the same precision we should have been bringing to PRDs all along.
But that’s not what was said. The claim was that evals replace PRDs. And that’s where the framing falls apart.
These are not the same cognitive job
Evals measure model behavior against defined criteria. PRDs define the problem worth solving, who it’s for, and what success looks like for the user. Conflating them doesn’t elevate evals; it just misunderstands PRDs.
An eval can tell us whether the model’s response was accurate, concise, and free of hallucinations. It cannot tell us whether we’re solving a problem worth solving. It cannot tell us who the user is or what their struggling moment looks like. It cannot tell us whether the product should exist at all.
Last week, I wrote about product mindset as a function of clarity — clarity of purpose, users, and impact. Evals measure whether the thing we built is working. But they can’t do the upstream work of defining purpose or identifying users. If any of those are missing, it doesn’t matter how good the eval suite is; the product may still lack purpose or solve a problem for the wrong users.
I’ve built evals. They’re not the starting line.
We built a rubric to evaluate every conversation customers had with our chatbot — measuring accuracy, relevance, conciseness, hallucinations, and toxicity. As LLMs evolved, we updated the rubric based on how customers were interacting with the product, and then worked on automating the evaluation to review conversations at scale.
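A rubric like the one described above can be sketched as a small scoring harness. This is a minimal, illustrative sketch, not the actual system: the dimension names mirror the ones in the text, but `Turn`, `score_turn`, and the pluggable `judge` callable are assumptions I’m introducing for illustration. In practice each dimension would be scored by a human reviewer or an LLM-as-judge prompt; swapping the judge function is what makes the move from manual review to automated evaluation at scale possible.

```python
# Illustrative eval-rubric harness (hypothetical names throughout).
from dataclasses import dataclass

@dataclass
class Turn:
    """One question/answer exchange from a chatbot conversation."""
    question: str
    answer: str

# Rubric dimensions from the text.
RUBRIC = ["accuracy", "relevance", "conciseness", "hallucination", "toxicity"]

def score_turn(turn: Turn, judge) -> dict[str, float]:
    """Score one turn on each rubric dimension (0.0-1.0).

    `judge` is any callable (dimension, question, answer) -> float,
    so the same harness accepts human labels, heuristics, or an
    LLM-as-judge once the rubric is automated at scale.
    """
    return {dim: judge(dim, turn.question, turn.answer) for dim in RUBRIC}

def aggregate(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension across a batch of scored turns."""
    return {dim: sum(s[dim] for s in scores) / len(scores) for dim in RUBRIC}

if __name__ == "__main__":
    # Trivial stand-in judge that scores everything 1.0.
    turns = [Turn("What is your return policy?", "30 days, with receipt.")]
    batch = [score_turn(t, lambda dim, q, a: 1.0) for t in turns]
    print(aggregate(batch))
```

Note what the harness leaves undefined: what “accurate” or “relevant” means for this product. That definition is exactly the upstream clarity work the rest of this piece is about.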
That eval work was valuable. But we could only write those evals because we had already done the harder work — defining what the chatbot was for, who it served, and what a good experience looked like for those users. The eval didn’t replace that thinking. It depended on it.
The real problem with “X is dead” framing
This is the same type of rhetoric that says PRDs are dead because you can show (not tell) your idea in Lovable. Or that product management itself is dead. These headlines make good clickbait. But the message they send the product management community is blunt: your area of expertise is dead; your artifacts are no longer needed.
AI has changed how we work and the artifacts we produce. We no longer need to spend hours on manual documentation. But the judgment underneath — what to build, for whom, and why — hasn’t been automated. The decisions we make from empathy for our customers, understanding of our cross-functional teams, and relationships with stakeholders are the PM job that will remain human. The documents were never the point; they were the output.
Evals are powerful. They’re not the starting line.
Evals belong in every AI PM’s toolkit. That’s how we validate that the product is doing what it’s supposed to do. But they sit downstream of the clarity work — the purpose, the users, the definition of impact. Just like documents, evals are output. The thinking is still the job. Without that, we’re measuring precisely and building aimlessly.
