Most developers meet a coding agent as a thing that edits a repo and opens a pull request. The interesting shift is what happens once the same system can hold context across sessions, reach past the repo, and keep working while you’re away from the desk.
Much of the work on a computer is already mediated by code: shell commands, web pages, API calls, exported documents, triggered automations. When those surfaces open up, the agent stops feeling like a narrow coding assistant and starts feeling like a system for getting computer work done.
The first thing that changes is memory. Instead of resetting after each exchange, a thread can keep context, use tools, surface artifacts, and continue across prompts. That continuity is the substrate everything else in this reading is built on: without it, every other capability would have to rebuild its understanding from scratch each time.
Pinning is how you keep the useful ones close. The recurring work streams are the obvious candidates: a Chief-of-Staff thread, a release thread, a documentation-review thread, a thread that does nothing but watch the outside world. Each preserves prior decisions, preferences, and context that would otherwise need rebuilding from zero.
Pinned-thread shortcuts make this practical rather than aspirational: the saved threads sit one keystroke away.
Voice input is valuable because it captures a thought before it’s compressed into polished prose. It works for the vague starting points that are natural to say but awkward to type.
The canonical example is the half-remembered lead: “I think someone named Ben mentioned this in Slack. I don’t remember the details. Please go look.” For an agent that can search, gather context, and report back, that’s often enough to start. A two- or three-minute thought dump works the same way, and so does a raw transcript: a dictated planning note often beats a tidy summary precisely because it keeps the uncertainty, emphasis, and unfinished lines of thought intact.
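Voice becomes far more useful once it’s paired with explicit control over a task that’s already running. There are two such controls, steering and queuing, and the whole point is that they do opposite things: steering interrupts the work in progress, while queuing lines up the next task.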
Both keep you close to the work while it’s unfolding, which is the defining feature of this end of the spectrum. You haven’t handed the task off; you’re shaping it in real time.
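The opposite effects of the two controls can be sketched as a toy model. Everything here is hypothetical naming for illustration (the `Thread` class and its `steer`, `queue`, and `finish_current` methods), not a product API: steering redirects the task in progress, while queuing leaves it alone and appends behind it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thread:
    """Toy model of a running thread; an assumption, not a real agent API."""
    current: Optional[str] = None
    pending: deque = field(default_factory=deque)
    log: list = field(default_factory=list)

    def steer(self, instruction: str) -> None:
        # Steering interrupts: the in-progress task is redirected right away.
        if self.current is not None:
            self.log.append(f"interrupted: {self.current}")
        self.current = instruction

    def queue(self, task: str) -> None:
        # Queuing never touches the current task; it lines up the next one.
        if self.current is None:
            self.current = task
        else:
            self.pending.append(task)

    def finish_current(self) -> None:
        # When the current task completes, the next queued task starts.
        if self.current is not None:
            self.log.append(f"done: {self.current}")
        self.current = self.pending.popleft() if self.pending else None
```

In this sketch a steered thread keeps its queue intact, so redirecting the current task does not discard what was lined up behind it.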
Once a thread has continuity, the next question is what it can touch. The browser surfaces widen by how much of your real environment they assume, from a sandboxed page in the side panel out to your full signed-in desktop.
Reach also unbinds the task from the desk. A job can start on the Mac where the files, permissions, and local setup already live, then continue while you check in from a phone. That matters in small moments: leave the desk while a longer task runs, answer a question from outside, approve the next step, or redirect the thread before you’re back. The local environment stays put; you don’t have to.
At the far end of the spectrum, the human steps away entirely. Two mechanisms cover this: a recurring wake-up that returns to a running thread, and a long task with a real finish line the agent can keep pushing toward.
Automations run work on a schedule, and they come in two kinds; the choice between them is about where the work resumes. A scheduled automation starts fresh from a workspace, which is right for a daily report or a regular repository check. A thread automation returns to an active conversation with its running context still intact.
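The difference in where work resumes can be made concrete with a small model. This is a sketch under assumptions: `Conversation`, `run_scheduled`, and `run_thread_automation` are hypothetical names invented for illustration, and a list of strings stands in for a thread's accumulated context.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical stand-in for a thread's accumulated context."""
    context: list = field(default_factory=list)


def run_scheduled(workspace: str, task: str) -> Conversation:
    # A scheduled automation starts fresh: a new conversation that knows
    # only the workspace it was launched from.
    convo = Conversation()
    convo.context.append(f"[{workspace}] {task}")
    return convo


def run_thread_automation(convo: Conversation, task: str) -> Conversation:
    # A thread automation wakes the same conversation, so earlier
    # decisions and preferences are still there when the task runs.
    convo.context.append(task)
    return convo
```

Two scheduled runs never see each other's context, while repeated thread wake-ups keep adding to one history. That is the trade: a clean slate for routine reports, continuity for work that depends on prior decisions.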
A Chief-of-Staff thread is the clean illustration. It wakes every half hour, does the expensive context-gathering, and leaves the irreversible decision to you.
When you return, the costly part, assembling context, is usually done; you still decide what actually goes out. The same shape fits feedback loops: a thread automation watches PR comments, Google Docs comments, or Slack replies and keeps the surrounding work moving. In an animation workflow, a reviewer drops a video in Slack, the automation renders an updated version when comments arrive, replies in the same thread tagging the reviewer, and, if one integration can’t finish the upload, desktop automation closes the last step through the GUI. The loop spans Slack for feedback, the codebase for rendering, and the desktop for the final hand-off.
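The Chief-of-Staff shape, where the thread does the gathering and you keep the irreversible step, can be sketched as a draft-and-hold outbox. The names `Draft`, `wake`, and `send_approved` are assumptions made up for this illustration; the `gather` and `send` callables stand in for whatever integrations a real setup would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Draft:
    body: str
    approved: bool = False  # only a human flips this to True


def wake(gather: Callable[[], Iterable[str]], outbox: list) -> None:
    # The expensive part: collect context and turn each item into a
    # held draft. Nothing is sent at this stage.
    for item in gather():
        outbox.append(Draft(body=f"Proposed reply: {item}"))


def send_approved(outbox: list, send: Callable[[Draft], None]) -> int:
    # The irreversible part: only drafts a human approved go out.
    sent = 0
    for draft in outbox:
        if draft.approved:
            send(draft)
            sent += 1
    # Unapproved drafts stay in the outbox for the next review.
    outbox[:] = [d for d in outbox if not d.approved]
    return sent
```

A scheduler would call `wake` on each interval; `send_approved` runs only after you have reviewed the outbox.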
The other away-from-desk mechanism is the goal: a longer-running task with a stopping condition the agent can keep working toward. Goals are only as good as their verifier: the difference between a wish and a finish line is whether something other than vibes can tell you it’s done.
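A goal with a verifier reduces to a loop that stops on a checkable condition instead of a feeling. The sketch below is a hand-rolled illustration, not how any goal feature is implemented: `run_goal`, `step`, and `verify` are hypothetical names, and the step budget is an added assumption so the loop always terminates.

```python
from typing import Callable


def run_goal(
    step: Callable[[int], None],
    verify: Callable[[], bool],
    max_steps: int,
) -> bool:
    """Work until the verifier passes or the step budget runs out."""
    for attempt in range(max_steps):
        # Check first, so an already-satisfied goal does no extra work.
        if verify():
            return True
        step(attempt)
    # One last check after the final step.
    return verify()
```

A strong goal supplies a `verify` that something mechanical can answer, such as a test suite reporting zero failures. A weak goal has nothing to put in that slot.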
Goal mode gets its own full treatment elsewhere in this library: see the editor’s note below for the cross-reference, and No. 04 for the markdown-file scaffolding that makes a long goal run auditable.
Steering interrupts the work in progress. Queuing lines up the next task. Thread automations keep a thread active when you step away. Goals add a concrete finish line the agent can keep working toward.
Jason Liu · the control model, in four moves
Instead of exporting an artifact and switching contexts, you review it in place. The output might be code, but it might equally be a deck, a PDF, a browser page, a table, or anything else created along the way.
A few surfaces work especially well here: index.html for lightweight static artifacts, Storybook for UI review, Remotion Studio for programmatic animation, browser-based decks for presentations, and data apps for analysis. A single index.html can become a durable interactive artifact with no server required, and a thread automation can refresh it over time, so the thread has something new waiting when you come back.
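A single-file artifact of this kind can be produced with nothing but the standard library. This is a minimal sketch: `render_artifact` and its status-table layout are assumptions chosen for illustration, and the point is only that the page carries its own styling and needs no server.

```python
import html


def render_artifact(title: str, rows: list[tuple[str, str]]) -> str:
    """Return one self-contained HTML page: inline CSS, no external assets."""
    body = "\n".join(
        # Escape every value so arbitrary text cannot break the markup.
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in rows
    )
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>\n'
        "<style>td{padding:4px 12px;border-bottom:1px solid #ccc}</style></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n<table>\n{body}\n</table></body></html>\n"
    )
```

Writing the result with `pathlib.Path("index.html").write_text(page, encoding="utf-8")` gives a file that opens directly in a browser; a recurring run that rewrites it is one way to keep the artifact fresh.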
Long-running threads get more useful when they share memory outside any one conversation. Important context shouldn’t live only inside a transcript; it should live somewhere the next thread can pick back up.
One durable pattern anchors persistent threads in a plain folder of files (an Obsidian vault, say) that stays easy to inspect, edit, move, and keep for a long time. Store it wherever your workflow already syncs: Git, Dropbox, Drive, cloud storage. The repository holds code; the vault holds the rolling context: who’s involved, what changed, what’s blocked, what needs follow-up, and what would otherwise disappear between sessions.
~/vault as durable work memory.
Don’t copy one exact structure. Teach the agent where durable context should live, what to preserve, and when not to create churn. First-party memory features add a local recall layer for preferences, recurring workflows, and known pitfalls; they complement the written context rather than replacing it. The written vault is the part you can read, diff, and hand to the next thread.
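One way to make "preserve context without creating churn" concrete is a note writer that appends dated entries and skips exact repeats. This is a sketch under assumptions: the one-file-per-topic layout and the helper names `format_entry`, `merge_note`, and `append_note` are invented for illustration, not a prescribed vault structure.

```python
from datetime import date
from pathlib import Path


def format_entry(day: date, text: str) -> str:
    # One dated bullet per fact keeps the file easy to read and diff.
    return f"- {day.isoformat()}: {text.strip()}"


def merge_note(existing: str, entry: str) -> str:
    # Skip exact duplicates so repeated wake-ups do not churn the file.
    if entry in existing.splitlines():
        return existing
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + entry + "\n"


def append_note(vault: Path, topic: str, day: date, text: str) -> Path:
    # Hypothetical layout: one markdown file per topic inside the vault.
    vault.mkdir(parents=True, exist_ok=True)
    note = vault / f"{topic}.md"
    existing = note.read_text(encoding="utf-8") if note.exists() else ""
    note.write_text(merge_note(existing, format_entry(day, text)), encoding="utf-8")
    return note
```

Because the output is plain markdown, the same file can be opened in an editor, diffed in Git, or handed to the next thread as starting context.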
Codex still starts from code. But more of the work around code is now reachable through the same system: browser surfaces, desktop control, MCP, automations, and reviewable artifacts. What changes is the control model, and that’s the whole reading.
Liu’s essay is the map; Hayduk’s goal-mode piece, archived here as codex-goals.html (No. 04), is the close-up of one region of it. Where this reading covers Goals in a single panel (weak versus strong, plus the list of verifiers), No. 04 expands exactly that material into three rules: a measurable goal, a tight feedback loop, and the three markdown files (PLAN.md, EXPERIMENTS.md, EXPERIMENT_NOTES.md) the agent thinks in while a long goal grinds for hours. That same trio links onward to Karpathy’s autoresearch (autoresearch.html): the hand-built version of goal mode, with its program.md / results.tsv split. Read together, the three describe one idea at three zoom levels: the spectrum (Liu), the product feature (Hayduk), and the bare-metal loop (Karpathy).
See § 04 (Automations & goals) above for the seam where the two readings meet.