Where does human judgment live when AI can build the software?

Scott.Romack · May 30, 2026, 4:43pm

A question keeps surfacing in every AI thread here: if AI can generate the implementation, where does human judgment actually live?

I’ve been building a project, Grace Commons, that takes a hard position on it.

Today the authoritative description of how a system behaves is the code. That’s fine until the people accountable for the behavior — clinicians, compliance officers, regulators, designers, execs — can’t read it. The source of truth becomes an artifact most stakeholders are locked out of. That’s a wall.

Grace Commons inverts it. The canonical artifact is a behavioral specification in structured, human-readable language — permissions, approvals, retention, audit, workflows, business behavior. Code, tests, and audit trails are generated from it, not the other way around. Code becomes a build artifact. If the generated code is clever, the spec has failed.

The interesting part isn’t code generation — plenty of tools do that. It’s where authority sits. If AI writes the implementation, the implementation is no longer where judgment lives; judgment lives in the spec humans review, challenge, and approve. And it’s deterministic in a boring, verifiable way: the spec generates the tests and the code; if the code passes the tests, it ships; if not, the AI iterates. You’re not trusting the model — you’re trusting the spec and the check.
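A minimal sketch of that generate, check, iterate loop, in TypeScript. Everything named here is an assumption for illustration and not taken from Grace Commons: the spec is reduced to a list of named checks, `GenerateFn` stands in for the model, and `buildFromSpec` is a hypothetical helper.

```typescript
// Hypothetical types: the "implementation" is a single permission function,
// and the spec is reduced to named checks that an implementation must pass.
type Impl = (role: string) => boolean;
type Check = { name: string; passes: (impl: Impl) => boolean };
type Spec = { checks: Check[] };

// Stand-in for the model: returns a candidate implementation for each attempt.
type GenerateFn = (spec: Spec, attempt: number) => Impl;

type Result =
  | { shipped: true; attempts: number; impl: Impl }
  | { shipped: false; attempts: number; failing: string[] };

// Ship the first candidate that passes every check; otherwise report what failed.
function buildFromSpec(spec: Spec, generate: GenerateFn, maxAttempts: number): Result {
  let failing: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const impl = generate(spec, attempt);
    failing = spec.checks.filter((c) => !c.passes(impl)).map((c) => c.name);
    if (failing.length === 0) {
      return { shipped: true, attempts: attempt, impl };
    }
  }
  return { shipped: false, attempts: maxAttempts, failing };
}

// Toy spec: clinicians may view a record, guests may not.
const recordAccessSpec: Spec = {
  checks: [
    { name: "clinician may view", passes: (canView) => canView("clinician") === true },
    { name: "guest may not view", passes: (canView) => canView("guest") === false },
  ],
};

// Fake generator: the first candidate allows everyone, the second is correct.
const fakeGenerate: GenerateFn = (_spec, attempt) =>
  attempt === 1
    ? (_role: string) => true
    : (role: string) => role === "clinician";

const outcome = buildFromSpec(recordAccessSpec, fakeGenerate, 3);
console.log(outcome.shipped, outcome.attempts); // true 2
```

The loop never inspects how a candidate was produced. It only asks whether the checks derived from the spec pass, which is the sense in which trust sits in the spec and the check rather than in the model.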

We also assume specs are incomplete. Before anything is built, they go through adversarial pressure-testing meant to surface hidden assumptions, governance gaps, and missing decisions — to drag decisions into the open before they disappear into code.
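One small piece of that idea can be sketched as a completeness check that lists decisions a draft spec has not made yet. This is a much simpler technique than adversarial pressure-testing, and the `Behavior` shape, the `REQUIRED` fields, and `findGaps` are all hypothetical, not the Grace Commons format.

```typescript
// Hypothetical shape: each behavior carries governance decisions,
// any of which may still be undecided in a draft spec.
type Behavior = {
  name: string;
  permissions?: string[];
  approval?: string;
  retentionDays?: number;
  audited?: boolean;
};

// Decisions this sketch assumes every behavior must answer explicitly.
const REQUIRED = ["permissions", "approval", "retentionDays", "audited"] as const;

type Gap = { behavior: string; missing: string };

// List every required decision that a behavior leaves unanswered.
function findGaps(behaviors: Behavior[]): Gap[] {
  const gaps: Gap[] = [];
  for (const b of behaviors) {
    for (const key of REQUIRED) {
      if (b[key] === undefined) {
        gaps.push({ behavior: b.name, missing: key });
      }
    }
  }
  return gaps;
}

const draft: Behavior[] = [
  { name: "view patient record", permissions: ["clinician"], audited: true },
  {
    name: "export report",
    permissions: ["compliance"],
    approval: "manager",
    retentionDays: 90,
    audited: true,
  },
];

const gaps = findGaps(draft);
console.log(gaps);
```

Here the draft never says who approves viewing a patient record or how long that access is retained, so both show up as open decisions before any code exists.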

Where this should interest designers specifically: it’s the natural home for design as more than pixels. Every interface sits at the intersection of three models — the business model, the user’s mental model, and the system’s model — and design’s real job is aligning them. That alignment is a contract for human understanding, and it belongs in the spec right next to the behavioral rules. The visual tool becomes a view of that contract, not the source of truth. Buttons don’t fix conceptual mismatch — the spec is where the mismatch gets found and named.

So the division settles out as:
Humans own intent, judgment, and governance.
AI owns translation and execution.

I’d love this room’s read on two things. Is moving authority from code to specification a real shift — or does the implementation inevitably stay the true source of truth no matter how much structure sits above it? And for the designers: if the interface is a contract for human understanding, does authoring that become the design deliverable, ahead of the layout?

Live demo: https://beacon-clinical.fly.dev/
Pattern library and docs: https://gracecommons.dev/

Bryan · May 30, 2026, 6:00pm

This is such an interesting new way to think about how we generate and edit outputs. Most people perhaps haven’t considered what goes into a 15-second output.

If we think about this… the trajectory is more like thinking about making a YouTube video than writing an app… so what does that look like?

This is compelling. As I’ve been working to align the Skills in Glare, the part that’s emerging as a hole is the agent agreement.

With our concepts, like onboarding, the user needs and business goals are considered, but the system’s model not so much. We’re trying to rectify that by thinking about the Agent Operations.

The idea of a system’s model is even more important if we are to consider collaborating with agents.

Answering your questions:

Is moving authority from code to specification a real shift — or does the implementation inevitably stay the true source of truth no matter how much structure sits above it?

I think you are thinking about this the right way, though my question becomes: who continues to evolve coding languages, and what is their role?

And for the designers: if the interface is a contract for human understanding, does authoring that become the design deliverable, ahead of the layout?

Man, these are good questions. The reality is that I don’t think people think like this now. Or have the patience? Creative translation is a real sticking point right now; you sorta have to own the stack yourself. So what happens when teams work together?

Scott.Romack · June 2, 2026, 2:16pm

Yes, it is a paradigm shift, so it will take time.
There is strong correlation with what Glare is doing; we just have to find the seams.

Grace inherits a lot from https://essenceofsoftware.com/ and is trying to be its fulfillment.

ben · June 2, 2026, 4:06pm

I really like this split @Scott.Romack

One thing that I’m thinking about is how AI could go both ways. It can translate code into intent, and intent into code (but, to your point, the plain code might lose a lot of the original intent).

But there could also be a place where the code contains the intent. Properly written comments and PRDs that live alongside the code might be a way to bridge the two worlds.

Just some food for thought!

Scott.Romack · June 2, 2026, 4:07pm

Bryan, good questions. Direct answers.

Does implementation stay the true source of truth?
Until the spec is executable, yes. We’re closing that gap deliberately — formal layer underneath the natural language, codegen becomes deterministic from it. Not there yet. Architecture points there. We’re doing the work.

Does authoring the contract become the design deliverable?
Designers already do this. They just have nowhere to put it. Every pushback on a flow that “doesn’t match how users think” is spec authorship. Grace Commons makes it a first-class artifact instead of a Figma comment that dies in review.

Your agent agreement gap in Glare — that’s exactly the missing third leg.
User needs, business goals — covered. System model — nowhere. A Grace Commons atom is that place. The behavioral contract the agent has to honor. Onboarding isn’t just a concept, it’s a contract.

On coding languages and frameworks —
You’re asking about target portability. Right now we say “render to Deno/Hono.” Next experiment: same spec, different framework, compare output. The spec stays stable. The render target is a parameter.

We call this Evergreen Modernization. Traditional modernization moves you from stack A to stack B. Once. We’re targeting continuous replaceability — better stack shows up, repoint, regenerate. The spec is the permanent asset. The implementation is always provisional.

You’re not paying to move to the new stack. You’re paying to never be locked to any stack again. The only slight danger here is that it is fairly disruptive to the powers that be.
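The "render target is a parameter" idea can be sketched as follows. This is a toy under stated assumptions: the `Spec` and `Route` shapes and the `render` function are hypothetical, and the emitted strings only illustrate what route registration looks like in Hono and Express, not what Grace Commons generates.

```typescript
// Hypothetical spec: a list of routes with a fixed text reply.
type Route = { method: "get" | "post"; path: string; reply: string };
type Spec = { routes: Route[] };

// The render target is just a value chosen at generation time.
type Target = "hono" | "express";

// One renderer per target; each turns a route into source text for that framework.
const renderers: Record<Target, (r: Route) => string> = {
  hono: (r) =>
    `app.${r.method}(${JSON.stringify(r.path)}, (c) => c.text(${JSON.stringify(r.reply)}));`,
  express: (r) =>
    `app.${r.method}(${JSON.stringify(r.path)}, (req, res) => res.send(${JSON.stringify(r.reply)}));`,
};

// Same spec in, different source text out, depending only on the target.
function render(spec: Spec, target: Target): string {
  return spec.routes.map((r) => renderers[target](r)).join("\n");
}

const healthSpec: Spec = {
  routes: [{ method: "get", path: "/health", reply: "ok" }],
};

console.log(render(healthSpec, "hono"));
console.log(render(healthSpec, "express"));
```

Adding a new stack in this sketch means adding one entry to `renderers`; the spec value itself does not change, which is the property the comparison experiment would be testing.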

Scott.Romack · June 2, 2026, 4:12pm

Exactly the direction we’re pushing. Single source of evolving truth, but the input can be whatever the human is comfortable with. Code (we used pseudocode in places already), comments, PRDs, napkin sketches, conversation. The spec is the destination, not the starting format.

The reason we land on spec-as-canonical rather than code-as-canonical is your own observation — code loses intent in translation. Comments drift, PRDs go stale, and nothing enforces coherence between them. The spec is the place where intent is formally captured and stays live. Everything else derives from it, including the code that currently holds the comments you’re describing.

Bryan · June 12, 2026, 4:38am

Would love to know more about this, as we’re exploring how we use MCP with our skills to surface more value.
