---
name: verification-tax
version: 2.0.0
description: |
  Keep verification cheaper than generation when an agent does real work, and
  produce the task contract that makes delegation auditable. Use this to scope a
  task before handing it to an agent (it emits a required block: expected change,
  cheap verifier, evidence artifact, named residual risk) and to judge whether
  finished agent work is safe to trust. Trigger it whenever you are about to
  delegate a substantial task (code changes, refactors, migrations, multi-file
  edits, UI work, research, or written deliverables), whenever you are reviewing
  or accepting agent output, whenever a diff or output is large, before opening a
  pull request or shipping, and any time you are deciding whether something is
  actually "done", even when the user never says "verify," "review," or
  "checklist." If substantial agent output has to be trusted, this skill applies.
compatibility: claude-code opencode cowork
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
---

# The Verification Tax

An operating discipline and a task contract. The agent can produce more than you can responsibly inspect, so the job is to keep verification cheaper than generation and to make every delegated task auditable, before and after it runs. The scarce resource is no longer the agent's output. It is the human's ability to check that output cheaply enough for the loop to be worth running.

## The core rule

Before asking "can the agent do this?", answer "how will I know it did?" Frontier models can do an absurd amount, so capability is rarely the real question. The law under everything below: **the check must be smaller than the work.** If verifying the result is as hard as doing the task yourself, the agent is a drafter or a critic, not leverage, and its output is not finished.

## Before you delegate: the task block

Every substantial task gets four fields, filled concretely, before the agent starts. If any field cannot be filled, the task is not ready as finished work. Reframe it as exploration, drafting, or research.

- **Expected change.** What should change, what must stay the same, what is out of scope, and what counts as done.
- **Cheap verifier.** The fastest reliable check: a test command, a typecheck or build, a browser flow, a screenshot comparison, a link checker, a query, or a manual check under two minutes. If the only check is reading a wide diff and hoping your attention holds, the task carries a high verification tax. Narrow it.
- **Evidence artifact.** The proof the run must return: commands and their output, test results, screenshots, logs, before-and-after behavior, a diff summary, or known limitations. A completion claim is not evidence.
- **Named residual risk.** What could still be wrong after the checks pass, which environments or adjacent behavior went untested, and who owns accepting that risk.

Emit the block in this shape:

```text
Agent task

Expected change:
- Change:
- Do not change:
- Done means:

Cheap verifier:
- I will verify by:

Evidence required:
- Commands / results:
- Screenshots / logs / diff summary:
- Before / after behavior:

Residual risk:
- Remaining risk:
- Untested areas:
- Owner:
```

A task is ready to delegate only when you can finish this sentence: **I can verify this cheaply by checking ______, and the remaining risk is ______.**
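
The block can also be held as data so that an unfilled field is caught mechanically rather than by memory. A minimal Python sketch, assuming a hypothetical `TaskBlock` class whose field names mirror the template above; none of these names come from an existing tool:

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class TaskBlock:
    """Hypothetical container for the four-field task block."""

    # Expected change
    change: str = ""
    do_not_change: str = ""
    done_means: str = ""
    # Cheap verifier
    verify_by: str = ""
    # Evidence required
    evidence_required: str = ""
    # Residual risk
    remaining_risk: str = ""
    untested_areas: str = ""
    owner: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def mode(self) -> str:
        """'delivery' only when every field is filled; otherwise 'exploration'."""
        return "exploration" if self.missing_fields() else "delivery"
```

A block with any gap reports `exploration`, which matches the rule that an unfillable field means the task is not ready as finished work.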

## How the pieces fit

The four fields are the contract; the rest of this skill is the same structure turned into an audit, so treat the block, the taxes, and the ledgers as one system rather than three. After the run, the four taxes are how you check the result and the three ledgers are where the evidence and risk live. Expected change is checked by the Behavior and Substrate taxes. Cheap verifier and evidence artifact become the Evidence ledger. Residual risk and its owner become the Risk ledger.

## Audit the result against four taxes

Most weak runs hide a failure in at least one of these.

1. **Behavior.** Does the change do the thing, or did it only satisfy the text of the task? An agent optimizes for completion unless evidence is part of completion.
2. **Substrate.** Did it edit the right files, preserve the real source of truth, and leave the system in a state the next person can reason about?
3. **Price.** Was the value larger than the tokens, wall-clock time, retries, tool calls, and human review spent to get it? Output can be ten times larger and still net negative if it creates fifteen times the review debt.
4. **Ownership.** Who is willing to be named when this reaches production, the customer, or the on-call rotation?
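
The Price tax is plain arithmetic. The sketch below uses made-up units and a hypothetical `net_value` helper, and it assumes value scales linearly with output size, which is generous to the agent:

```python
def net_value(value: float, agent_cost: float, review_cost: float) -> float:
    """Value delivered minus run cost and review cost, all in the same arbitrary units."""
    return value - agent_cost - review_cost


# Illustrative numbers only, not measurements.
small_run = net_value(value=10, agent_cost=1, review_cost=8)

# Ten times the output, fifteen times the review debt.
large_run = net_value(value=100, agent_cost=10, review_cost=120)

print(small_run, large_run)  # 1 -30
```

Under these assumed numbers the small run is barely positive and the large run is net negative, even though it produced ten times as much.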

## Pick a cheap check, matched to the work

Choose the smallest check that actually backs the claim.

- **Code.** A failing test made green, the smallest diff that does the job, and the command whose output proves the behavior changed.
- **UI.** The viewport, a before-and-after screenshot, the interaction exercised, and the element that must not shift.
- **Research.** Source classes and dates, contrary evidence, and an explicit list of claims that still need checking.
- **Prose.** State the one thing the piece must make the reader believe, then check every section against that job.

General checks that are usually cheaper than the work: a typecheck, a link checker, a smoke test, a query, a line count, or a diff against regenerated output. A human opening exactly three pages to look for exactly two promises is a valid check.
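
One way to make "a failing test made green" checkable is to record the verifier's exit code and real output before and after the change. A hedged sketch using only the Python standard library; `Evidence`, `run_verifier`, and `discriminates` are hypothetical names, and the 120-second timeout is an arbitrary default:

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    """What was run, how it exited, and what it actually printed."""

    command: list[str]
    exit_code: int
    output: str


def run_verifier(command: list[str], timeout: float = 120.0) -> Evidence:
    """Run the check and keep its real output, not a description of it."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def discriminates(before: Evidence, after: Evidence) -> bool:
    """True only if the check failed before the change and passes after it."""
    return before.exit_code != 0 and after.exit_code == 0


if __name__ == "__main__":
    # Stand-in commands; a real run would use the project's own test command.
    before = run_verifier([sys.executable, "-c", "import sys; sys.exit(1)"])
    after = run_verifier([sys.executable, "-c", "print('ok')"])
    print(discriminates(before, after))
```

Run the same command once on the base revision and once on the changed tree. A check that passes both times says nothing about the change.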

## Leave three ledgers

Do not slow the agent back to human speed. Make the work auditable as it moves by leaving three small ledgers behind. They can be a pull-request description, a scratch file, a CI job, a screenshot folder, or a few final lines. The point is that the evidence survives the moment.

- **Work ledger.** What was asked, what changed, and which files or systems were touched.
- **Evidence ledger.** What was run, what passed, what was opened, and the specific proof behind each claim.
- **Risk ledger.** What was not checked, what is still speculative, and where the change can still bite.
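
The three ledgers can be as small as one rendered text block. A minimal sketch, assuming a hypothetical `Ledgers` class; the entries in the usage note are illustrative placeholders, not real results:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ledgers:
    """Hypothetical holder for the Work, Evidence, and Risk ledgers."""

    work: list[str] = field(default_factory=list)
    evidence: list[tuple[str, str]] = field(default_factory=list)  # (command, result)
    risk: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render all three ledgers as plain text for a PR description or scratch file."""
        lines = ["Work ledger:"]
        lines += [f"- {item}" for item in self.work]
        lines.append("Evidence ledger:")
        lines += [f"- `{cmd}` -> {result}" for cmd, result in self.evidence]
        lines.append("Risk ledger:")
        # An empty risk ledger is surfaced rather than silently omitted.
        lines += [f"- {item}" for item in self.risk] or [
            "- (nothing named as untested: treat as a red flag)"
        ]
        return "\n".join(lines)
```

For example, `Ledgers(work=["Renamed helper"], evidence=[("pytest -q", "3 passed")]).render()` pairs each claim with the command behind it and still prints a Risk ledger line, so a missing risk entry is visible instead of absent.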

"All set" is not evidence. A command, its result, and a remaining risk are evidence.

## The check itself is attackable

A capable agent can fabricate the proof as fluently as the work: a plausible log, a confident summary, a test that is green only because it was rewritten. If the same run produced both the change and its verifier, record that fact in the Risk ledger and give the verifier a second look. Never accept evidence the worker could have authored to please you unless that fact is flagged. A refactor that passes only because the agent edited the tests is not verified work. It is unverified work with a green light on top.

## Stop at the danger zone

Stop and add structure, or send the work back, when you see any of these. They are all signs of high verification tax.

- a diff too wide to hold in your head, or changes across broad areas unrelated to the task;
- the agent changed both the code and its tests, or touched fixtures, snapshots, generated files, or lockfiles without saying why;
- a completion claim with no commands, logs, screenshots, or diff summary;
- a summary whose claims cannot be mapped to a command;
- a verifier that depends on vague manual review, or a result you cannot check without reconstructing the agent's reasoning;
- a page that renders, but nobody checked the smaller viewport;
- a script that passes, but passed before the change too;
- nothing named as untested.

The agent does not have to be wrong to create risk. It only has to be faster than your ability to know what changed.
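
Some of these flags can be raised from the list of changed paths alone. A rough sketch, assuming hypothetical naming conventions (`tests/`, `fixtures/`, `__snapshots__/`, `generated/`, a `test_` filename prefix) and a short sample of common lockfile names; adjust both sets to the repository under review:

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed conventions, not a complete list.
LOCKFILES = {"package-lock.json", "yarn.lock", "poetry.lock", "Cargo.lock"}
SENSITIVE_DIRS = {"tests", "test", "fixtures", "__snapshots__", "generated"}


def flag_paths(changed: list[str]) -> dict[str, list[str]]:
    """Split changed paths into 'sensitive' (tests, fixtures, lockfiles...) and 'source'."""
    flags: dict[str, list[str]] = {"sensitive": [], "source": []}
    for path in changed:
        p = PurePosixPath(path)
        is_sensitive = (
            p.name in LOCKFILES
            or p.name.startswith("test_")
            or any(part in SENSITIVE_DIRS for part in p.parts[:-1])
        )
        flags["sensitive" if is_sensitive else "source"].append(path)
    return flags


def code_and_checks_both_changed(changed: list[str]) -> bool:
    """True when one diff touches both source files and the files that verify them."""
    flags = flag_paths(changed)
    return bool(flags["sensitive"]) and bool(flags["source"])
```

A `True` result does not mean the run is wrong. It means the agent owes an explanation for each sensitive path before the evidence is accepted.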

## Review completed work

When work comes back, check: did the output match the expected change; was the cheap verifier actually run; is the evidence concrete enough to trust; are the residual risks named honestly; did the agent change tests, fixtures, generated files, or unrelated code in a way that moves the substrate? If evidence is missing, ask for the smallest additional check that would make the result trustworthy, not a fresh round of work.

## The limit: what counts as done

Finished work has an owner and a check. Everything else is draft material, no matter how confident or well-structured it looks. These are useful intermediate states, not finished work:

- a migration with no rollback path;
- a policy or config with no owner;
- a refactor that is green only because its tests were changed;
- a polished document with unchecked claims.

If a task has no credible check, it does not leave the queue as "done." Reserve that word for work whose result you can say yes to, for a stated reason.

## When a rendered page helps

Render an artifact as a page only when that lowers the verification tax: when the result needs to be reviewed rendered, shared outside the editor, or kept as a small piece of evidence. A designed page that makes a claim easier to check earns its place. One that only adds polish raises the tax, so keep it plain.

## Operating contract

On any substantial run, in order:

1. **Write the task block** before generating. If the four fields cannot be filled, mark the task exploration, not delivery.
2. **Keep the output reviewable.** Prefer the smallest diff; split wide work before it outruns review.
3. **Run the cheap verifier** and capture its real output, not a description of it.
4. **Emit the three ledgers:** Work, Evidence, Risk.
5. **Name an owner** for anything headed to production.
6. **Withhold "done"** until a check has passed and an owner exists. Generation is delegated; accountability is not.

Close every task by completing the sentence: **I can verify this cheaply by checking ______, and the remaining risk is ______.**

---

**Source.** Operationalizes "The Verification Tax," tcash.ca, Field report 011 (29 May 2026), with the four-field task block, review checklist, and red-flag list folded in. Generation is no longer the scarce part of the loop; verification is.
