Until We Reach a Shared Understanding — The Foundational AI Skill

April 17, 2026

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one by one.

That’s the entire skill. Three sentences. I read them and something in my chest went quiet.

“Until we reach a shared understanding.” So plain. So romantic. So touching. Most of the wrong things I’ve ever shipped weren’t bugs. They were mismatches — between what I thought I asked for, and what actually got built. And here was a phrase that named the thing. Not “write better specs.” Not “clarify requirements.” Shared understanding. Between two minds. Mine and the model’s.

I stole it immediately.

Where it came from

The phrase is from Matt Pocock’s video on the 5 Claude Code skills he uses every day. The one he opens with is grill-me. Three sentences. No templates, no scaffolding. Just an instruction to interview him until the design tree — an idea he credits to Frederick P. Brooks’ The Design of Design — has been walked branch by branch.

What hit me is that the phrase isn’t about the AI producing something. It’s about a state two parties arrive at together. The AI hasn’t completed its job when the plan is written. It has completed its job when we both see the same thing.

That reframing is the whole article.

Plan mode alone doesn’t do it

Before I found grill-me, I lived in Claude Code’s plan mode. My prompts looked like this:

implement landing page using this design.md — start with hero, then…

Plan mode does what it says: it plans. It jumps straight to how. What it does not do — and what I was missing — is push back on the premise. It will happily produce a plan for the wrong feature. That is the shape of the failure. It looks like progress.

grill-me forces a different phase before planning. You don’t get to plan until the two of you agree on what’s being built.

The loop I actually run

I’ve been running this for weeks. Not just for PRDs — for anything where the cost of building the wrong thing is higher than the cost of talking for 20 more minutes. Which, it turns out, is almost everything.

I propose. A goal, a feature, a refactor, a direction. Rough.
The AI grills me. Usually 10–16 questions. Sometimes two rounds — it’ll answer what it can from the codebase, ask a batch, digest my answers, then come back with more.
I reflect. About a third of those questions hit corners I hadn’t considered.
I propose again. Sharper. With the new angles folded in.

Back and forth. Push and pull. Until the design tree is walked.
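
One way to picture "walking the design tree" is as an ordering problem: a decision can only be settled once the decisions it depends on are settled. The TypeScript sketch below is an illustration only and is not part of the grill-me skill. The Decision type, the walkOrder function, and the landing-page decisions are hypothetical names made up for this example.

```typescript
// Hypothetical model of a design tree: each decision may depend on others.
type Decision = {
  id: string;
  question: string;
  dependsOn: string[]; // ids of decisions that must be resolved first
};

// Return the decisions in an order where every dependency comes before
// the decision that needs it (a depth-first topological sort).
function walkOrder(decisions: Decision[]): Decision[] {
  const byId = new Map<string, Decision>();
  for (const d of decisions) byId.set(d.id, d);

  const done = new Set<string>();
  const inProgress = new Set<string>();
  const ordered: Decision[] = [];

  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (inProgress.has(id)) throw new Error(`Circular dependency at ${id}`);
    const decision = byId.get(id);
    if (!decision) throw new Error(`Unknown decision: ${id}`);

    inProgress.add(id);
    for (const dep of decision.dependsOn) visit(dep); // resolve prerequisites first
    inProgress.delete(id);

    done.add(id);
    ordered.push(decision);
  };

  for (const d of decisions) visit(d.id);
  return ordered;
}

// Made-up decisions for a landing page, listed in the "wrong" order on purpose.
const exampleTree: Decision[] = [
  { id: "cta", question: "What should the call to action say?", dependsOn: ["hero-copy"] },
  { id: "hero-copy", question: "What does the hero promise?", dependsOn: ["audience"] },
  { id: "audience", question: "Who is this page for?", dependsOn: [] },
];

console.log(walkOrder(exampleTree).map((d) => d.id).join(" -> "));
// prints "audience -> hero-copy -> cta"
```

The cycle check matters more than it looks: two decisions that each wait on the other are exactly the kind of knot a grilling session tends to surface.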

A real session (abbreviated)

Here’s the shape, from a recent session on a private client project — a chat platform with hundreds of thousands of messages. I wanted to add message-content search so staff could type msg:พาร์ทไทม์ and find rooms that had actually talked about part-time work, not just rooms tagged with someone named “พาร์ทไทม์.”

The AI explored the codebase first, then asked 14 questions over the next hour. A sample of the ones that made me stop typing:

Q1: Should message search be part of free text, or a separate qualifier? If we fold it in, every name search scans 165K search_text rows. Or we introduce msg: as an explicit qualifier.

Fine. I had this one in my head already. (8-out-of-12 territory: the questions I’d already answered for myself.)
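
To make the explicit-qualifier option concrete, here is a minimal TypeScript sketch of splitting a search string into free text and a msg: term, so that a plain name search never has to touch message content. It is an assumption about how such parsing could look, not the project's actual code. ParsedQuery, parseQuery, and the rule that only the first msg: term counts are all hypothetical.

```typescript
// Hypothetical parsed shape: free-text terms plus an optional message-content term.
type ParsedQuery = {
  freeText: string;
  messageTerm: string | null;
};

const MSG_PREFIX = "msg:";

function parseQuery(input: string): ParsedQuery {
  // Split on whitespace and drop empty tokens (covers an empty input string).
  const tokens = input.trim().split(/\s+/).filter((t) => t.length > 0);
  const freeTokens: string[] = [];
  let messageTerm: string | null = null;

  for (const token of tokens) {
    const isQualifier =
      token.toLowerCase().startsWith(MSG_PREFIX) && token.length > MSG_PREFIX.length;
    if (isQualifier) {
      // Only the first msg: term is kept in this sketch; later ones are ignored.
      if (messageTerm === null) messageTerm = token.slice(MSG_PREFIX.length);
    } else {
      // Everything else, including a bare "msg:" with no term, stays free text.
      freeTokens.push(token);
    }
  }

  return { freeText: freeTokens.join(" "), messageTerm };
}

console.log(parseQuery("somchai msg:พาร์ทไทม์"));
```

With this split, the expensive message scan would run only when messageTerm is not null.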

Q2: The matching message won’t be the latest message shown in the card. The user sees a room but has no idea which message matched or where it is in the conversation. Three options: (A) return the room, user scrolls; (B) return the room + matched message snippet; (C) A now, B later.

I hadn’t thought about this at all. The card UI shows the latest message — so msg: would hand users a room full of thousands of messages with no cursor. Design tree branch I missed. (4-out-of-12.)

Q12: Snippet length. The matched message could be very long. Should the RPC return the full search_text and let the frontend CSS-truncate, or return a 100-char window from the DB centered on the match?

A database-performance question I would not have asked myself until production. (Also 4-out-of-12.)
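
The "window centered on the match" option can be sketched in a few lines, wherever it ends up running. The TypeScript below is a hedged illustration of the windowing logic only. The function name snippetAround, the ellipsis markers, and the choice to return short text whole are assumptions, and it does not show which option the session settled on.

```typescript
// Return a window of roughly `windowSize` characters around the first match of `term`.
// Returns null when the term is empty or does not occur in the text.
// Note: lengths are counted in UTF-16 code units, so a window edge can split
// a combined character in scripts such as Thai. A production version would
// likely need to cut on grapheme boundaries instead.
function snippetAround(text: string, term: string, windowSize = 100): string | null {
  if (term.length === 0) return null;
  const matchStart = text.indexOf(term);
  if (matchStart === -1) return null;

  // Text that already fits in the window is returned whole.
  if (text.length <= windowSize) return text;

  // Center the window on the middle of the match, then clamp it to the text bounds.
  const matchCenter = matchStart + Math.floor(term.length / 2);
  let start = matchCenter - Math.floor(windowSize / 2);
  start = Math.max(0, Math.min(start, text.length - windowSize));
  const end = start + windowSize;

  // Mark truncated sides so the reader can tell the message continues.
  const prefix = start > 0 ? "…" : "";
  const suffix = end < text.length ? "…" : "";
  return prefix + text.slice(start, end) + suffix;
}

const longMessage = "a".repeat(200) + "NEEDLE" + "b".repeat(200);
console.log(snippetAround(longMessage, "NEEDLE"));
```

Doing this in the database instead of the frontend trades a little query complexity for a much smaller payload per result, which is the trade-off the question was pointing at.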

By question 14, the AI summarized the whole plan back — every branch resolved, every decision recorded. When it said “Anything you want to change, or should I start implementing?” I actually felt calm. That feeling is the stopping criterion I’ll come back to.

The variance

That 4-out-of-12 number, the share of questions that hit something I hadn’t considered, is a rough average. It’s not stable.

Domains I know cold: sometimes I come out of a grilling session with 0/12. Everything the AI asks, I’ve already resolved. That’s still useful. It’s confirmation. It means I can go implement without doubt.
Unfamiliar territory: sometimes it’s 8/12. I’ll be halfway through answering a question and realize I don’t actually understand what I’m building. That’s not a failure of the skill. It’s the skill working exactly as intended.

The 4/12 is not a target. It’s an observation that there’s almost always something.

Works both ways

grill-me looks, on the surface, like the AI extracting requirements from you. What actually happens is different: the questions push back on you. They make you reconsider your own position.

This shows up hardest in non-engineering work. I recently asked an AI how to get into a particular real estate market. I expected a cheerful playbook. What I got was a grilling — questions about my capital horizon, my risk comfort, whether I wanted yield or appreciation, whether I’d operate or delegate. By question six I had changed my mind about what I was even asking for.

It is less “the AI interviews me” and more “two minds at a whiteboard, one with perfect recall of the internet.” Neither of us is in charge. We’re both responsible for the shared understanding.

When not to use it

Honest section. grill-me is not free.

Small, well-defined changes: fixing a typo, bumping a dependency, adding a log line. Just prompt normally.
Areas where I’m confident there is exactly one right way to do it: I skip straight to plan mode and spend the time on implementation detail instead.
When it over-asks: sometimes the AI keeps drilling into tiny details I’ve already dismissed. 20, 27 questions, 12 per round. When that happens I cut it off and say “that’s enough, write the plan.” It listens.

The overhead is maybe 15–30 minutes for a meaningful feature. That is a lot. It’s also a lot less than a week of building the wrong thing.

The stopping criterion

How do you know you’ve reached shared understanding?

The honest answer: the AI usually has a sense of it. Most of the time it stops on its own. When the tree is walked, the questions stop coming, and it offers to summarize the plan back. The summary is the tell. If I can read it and feel calm, we’re done.

If I can read it and feel uneasy — wait, what about… — that’s another round.

Why AI is specifically good at this

Pair programming and rubber ducks exist. I’ve used both. Pair programming is real but one-directional for me — as the more senior engineer, I don’t get much push-back. The duck doesn’t ask anything.

The AI has what neither has: it will ask all 12 questions, not just the three a tired human would rank as highest-priority. There is no social cost for it to ask something that sounds obvious. It won’t judge me for not having thought about it. It is infinitely patient. That coverage, the commitment to actually walking the whole tree, is why there is almost always a blind spot to find, even though the exact rate moves around. A human collaborator would have stopped at 3.

The chain downstream

After grill-me, the rest of Matt’s workflow kicks in — write-a-prd, prd-to-issues, tdd. I’ve run the full chain end-to-end now. The thing I didn’t expect: once shared understanding is locked in upstream, I can let agents run unsupervised overnight. I wake up to finished issues.

That courage — to step away from the keyboard while something is being built — is not technical. It’s upstream. It’s the shared understanding.

Short, simple, framework-agnostic

One more thing. If you open Matt’s public skills repo, you notice something: the skills are tiny. grill-me is three sentences. The others aren’t much longer. No framework coupling, no language-specific patterns.

That’s the virtue. These skills encode process, not tech. They work in a SvelteKit repo, a Go service, a Rails app, a notebook, a PRD doc, a real estate decision. Process is the thing AI agents consistently miss — no memory, no taste, no intuition for when to stop. Encoding it explicitly is what makes them usable.

Skills don’t need to be long to be impactful. As Matt says in the video — pick the right words for the LLM at the right time. The shortness is the feature.

And they’re portable across tools. Vercel’s skills CLI supports a range of coding agents:

npx skills@latest add mattpocock/skills/grill-me

That’s it. One line, and you have the foundational skill in any agent-backed setup.

The phrase, one more time

until we reach a shared understanding

Put it in a skill. Put it in a system prompt. Put it at the top of a PRD template. Put it on a sticky note.

Most of the AI collaboration advice I’ve seen is about getting better output. This one is about getting to a better state — one that sits between you and the model, before any output exists. Once you’re in that state, the output writes itself.

Thanks, Matt. I’ll be using this one for a long time.