Format as HTML · A Reading
From code outward
No. 05 · MMXXVI
A reading · on coding agents

Getting the
most out of Codex

Most developers meet a coding agent as a thing that edits a repo and opens a pull request. The interesting shift is what happens once the same system can hold context across sessions, reach past the repo, and keep working while you’re away from the desk.

Original essay
Jason Liu
“Getting the Most Out of Codex”
@jxnlco · 2026
Reformatted for this library
A Reading · No. 05
Companion to No. 04: Hayduk on goal mode
One file · inline CSS · inline SVG
Contents
The spine: a spectrum from being in the loop (steer, queue) to being away from the desk (automate, set a goal), held together by durable threads and a written-down memory.
  • I
    Durable threads
    Persistent workspaces that don’t reset between sessions.
    The substrate
  • II
    Voice, steering & queuing
    Staying close to the work while it unfolds.
    In the loop
  • III
    Tools & reach
    Browser, computer-use, MCP, connectors, skills.
    Beyond the repo
  • IV
    Automations & goals
    Continuing the work while you’re gone.
    Away from the desk
  • V
    The side panel
    Reviewing artifacts beside the conversation that made them.
    Output as control
  • VI
    Shared memory
    Durable context written down outside any one thread.
    The vault
The control spectrum · where the human stands · in the loop → away
[Figure: the control spectrum, autonomy increasing from in the loop to away from the desk. Steering (interrupt: now) · Queuing (line up: next) · Automations (wake: on schedule) · Goals (push: to a finish line). Durable threads: the substrate every mode rides on.]
In the loop · §II
Steering
Interrupt the work now and redirect it.
In the loop · §II
Queuing
Don’t touch what’s running: line up the next task.
Away from the desk · §IV
Automations
Wake on a schedule and continue the thread.
Away from the desk · §IV
Goals
Push toward a finish line a verifier can confirm.
Read left to right. The closer you are, the more you shape the work in real time; the further away, the more the agent carries it alone. All four modes ride on the same durable thread; lose that, and each one rebuilds its context from scratch.
§
I
The substrate · context that survives

A thread that keeps its working context.

Much of the work on a computer is already mediated by code: shell commands, web pages, API calls, exported documents, triggered automations. When those surfaces open up, the agent stops feeling like a narrow coding assistant and starts feeling like a system for getting computer work done.

The first thing that changes is memory. Instead of resetting after each exchange, a thread can keep context, use tools, surface artifacts, and continue across prompts. That continuity is the substrate everything else in this reading is built on. Without it, every other capability would have to rebuild its understanding from scratch each time.

Durable threads
Long-running threads that preserve working context across repeated sessions: persistent workspaces, not short chats.

Pinning is how you keep the useful ones close. The recurring work streams are the obvious candidates: a Chief-of-Staff thread, a release thread, a documentation-review thread, a thread that does nothing but watch the outside world. Each preserves prior decisions, preferences, and context that would otherwise need rebuilding from zero.

Pinned-thread shortcuts make this practical rather than aspirational: the saved threads sit one keystroke away.

⌘1 · ⌘2 · ⌘3 · ⌘4 · ⌘5 · ⌘6 · ⌘7 · ⌘8 · ⌘9

Jump directly into the saved thread

§
II
In the loop · staying close to the work

Voice catches the rough version of a thought.

Voice input is valuable because it captures a thought before it’s compressed into polished prose. It works for the vague starting points that are natural to say but awkward to type.

The canonical example is the half-remembered lead: “I think someone named Ben mentioned this in Slack. I don’t remember the details. Please go look.” For an agent that can search, gather context, and report back, that’s often enough to start. A two- or three-minute thought dump works the same way, and so does a raw transcript: a dictated planning note often beats a tidy summary precisely because it keeps the uncertainty, emphasis, and unfinished lines of thought intact.

Voice becomes far more useful once it’s paired with explicit control over a task that’s already running. There are two such controls, and the whole point is that they do opposite things.

Steering  ·  Queuing
Steering interrupts an in-flight task with new direction before the current step finishes. Queuing adds work to the line without interrupting what’s already running.
Two controls, opposite jobs · now vs next
Steering · changes what it’s doing now
The agent is heading the wrong way and needs a correction before it lands.
Best while you’re annotating a surface in the side panel: interrupt the work mid-step, redirect, let it continue from the new direction.
“make this smaller” · “the spacing between these two feels off” · “this copy is wrong”

Both keep you close to the work while it’s unfolding, which is the defining feature of this end of the spectrum. You haven’t handed the task off; you’re shaping it in real time.
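
The contrast is easy to see in a toy model. The sketch below is not how Codex implements either control; `Thread`, `queue`, `steer`, and `finish_current` are hypothetical names invented for illustration. It only shows the shape: queuing appends to the line, steering changes the task that is running now.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Thread:
    """Toy model of one agent thread: a running task plus a line of pending ones."""

    current: Optional[str] = None
    pending: Deque[str] = field(default_factory=deque)
    log: List[str] = field(default_factory=list)

    def queue(self, task: str) -> None:
        # Queuing: leave the running task alone and add work to the line.
        if self.current is None:
            self.current = task
        else:
            self.pending.append(task)

    def steer(self, direction: str) -> None:
        # Steering: redirect the task that is running right now.
        if self.current is None:
            raise RuntimeError("nothing is running, so there is nothing to steer")
        self.log.append(f"redirected: {self.current} -> {direction}")
        self.current = f"{self.current} ({direction})"

    def finish_current(self) -> None:
        # When the running task completes, the next queued task starts.
        self.log.append(f"done: {self.current}")
        self.current = self.pending.popleft() if self.pending else None
```

Queuing a second task never touches `current`; steering never touches `pending`. That separation is the whole distinction between "next" and "now".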

§
III
Beyond the repo · what it can act on

Reach moves outward in layers.

Once a thread has continuity, the next question is what it can touch. The browser surfaces widen by how much of your real environment they assume, from a sandboxed page in the side panel out to your full signed-in desktop.

The browser surfaces · narrowest → widest reach · 3 layers
$browser
In-app browser, in the side panel. Inspect and annotate a web surface in place. Fits side-panel review of something the agent itself rendered.
@chrome
Signed-in browser state. Chrome-based workflows that depend on your existing logged-in context.
@computer
The desktop GUI. Work that only exists through a graphical app: no API, no page, just the interface a human would click.
MCP servers and connectors extend the same idea into the rest of a workflow. Slack, Gmail, and Calendar matter because many important tasks first appear as a message, an inbox item, or a scheduling problem long before they ever become code. And skills make repeated routines reusable: once a workflow proves useful, package it as a skill so the agent can run it again without relearning it.

Work from anywhere

Reach also unbinds the task from the desk. A job can start on the Mac where the files, permissions, and local setup already live, then continue while you check in from a phone. That matters in small moments: leave the desk while a longer task runs, answer a question from outside, approve the next step, or redirect the thread before you’re back. The local environment stays put; you don’t have to.

§
IV
Away from the desk · work that continues

Two ways to keep going without you.

At the far end of the spectrum, the human steps away entirely. Two mechanisms cover this: a recurring wake-up that returns to a running thread, and a long task with a real finish line the agent can keep pushing toward.

Automations run work on a schedule, and they come in two kinds; the choice between them is about where the work resumes. A scheduled automation starts fresh from a workspace: right for a daily report or a regular repository check. A thread automation returns to an active conversation with its running context still intact.

Thread automations
Heartbeat-style recurring wake-up calls that return to the same thread on a schedule: checking on something, continuing until a condition is met, adjusting cadence over time.

A Chief-of-Staff thread is the clean illustration. It wakes every half hour, does the expensive context-gathering, and leaves the irreversible decision to you.

Chief-of-Staff thread · automation · every 30 min
# runs while you're away: drafts, never sends
Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention.
Help me prioritize what matters most.
If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it.

When you return, the costly part (assembling context) is usually done; you still decide what actually goes out. The same shape fits feedback loops: a thread automation watches PR comments, Google Docs comments, or Slack replies and keeps the surrounding work moving. In an animation workflow, a reviewer drops a video in Slack; when comments arrive, the automation renders an updated version and replies in the same thread, tagging the reviewer. If one integration can’t finish the upload, desktop automation closes the last step through the GUI. The loop spans Slack for feedback, the codebase for rendering, and the desktop for the final hand-off.
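
The heartbeat shape (wake, check, continue until a condition is met, adjust cadence) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the product's scheduler: `heartbeat`, `check`, `is_done`, and the doubling back-off rule are all hypothetical choices made for the example.

```python
import time
from typing import Callable


def heartbeat(
    check: Callable[[], bool],
    is_done: Callable[[], bool],
    interval_s: float = 1800.0,
    max_interval_s: float = 7200.0,
    max_wakes: int = 48,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Wake on a schedule until is_done() holds; return the number of wakes.

    check() does one wake-up's worth of work and returns True if it found
    anything new. Quiet wakes double the interval (capped at max_interval_s);
    a busy wake resets it to the starting interval.
    """
    base = interval_s
    for wake in range(1, max_wakes + 1):
        found_something = check()
        if is_done():
            return wake
        interval_s = base if found_something else min(interval_s * 2, max_interval_s)
        if wake < max_wakes:
            sleep(interval_s)
    return max_wakes
```

Passing `sleep` as a parameter keeps the loop testable without waiting; the `max_wakes` cap is there so an unattended loop always ends, even if the stopping condition never becomes true.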

Goals carry a finish line

The other away-from-desk mechanism is the goal: a longer-running task with a stopping condition the agent can keep working toward. Goals are only as good as their verifier: the difference between a wish and a finish line is whether something other than vibes can tell you it’s done.

What separates a goal from a wish · verify vs hope
Weak goal · no finish line
Implement the plan in this Markdown file.
Nothing here says when it’s done, or whether each step moved closer. The agent has ambition and no signal.
Strong goal · measurable
Migrate this tool from Python to Rust; not done until the unit tests pass.
The outcome, the stopping condition, and the signal of progress are all explicit. Useful verifiers: a test suite, a benchmark, a bug reproduction, a validation matrix, an end-to-end workflow that must keep passing.
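
A verifier in this sense is just a check with a yes-or-no answer that the agent can run repeatedly. The sketch below assumes the verifier is a shell command whose exit status 0 means "done"; `verified`, `pursue_goal`, and the `attempt` callback (standing in for one round of agent work) are hypothetical names for illustration, not part of any Codex API.

```python
import subprocess
from typing import Callable, Optional, Sequence


def verified(command: Sequence[str], timeout_s: float = 600.0) -> bool:
    """Return True when the verifier command exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a verifier that hangs has not confirmed anything
    return result.returncode == 0


def pursue_goal(
    attempt: Callable[[int], None],
    verifier: Sequence[str],
    max_attempts: int = 20,
) -> Optional[int]:
    """Alternate work and verification until the verifier passes.

    Returns the attempt number that passed, or None if the budget ran out.
    """
    for n in range(1, max_attempts + 1):
        attempt(n)  # one round of work toward the goal
        if verified(verifier):
            return n
    return None
```

For the Python-to-Rust example, the verifier might be `["cargo", "test"]`, assuming the project builds with Cargo. The weak goal has nothing to pass as `verifier`, which is exactly the problem.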

Goal mode gets its own full treatment elsewhere in this library; see the editor’s note below for the cross-reference, and No. 04 for the markdown-file scaffolding that makes a long goal run auditable.

Steering interrupts the work in progress. Queuing lines up the next task. Thread automations keep a thread active when you step away. Goals add a concrete finish line the agent can keep working toward.
Jason Liu · the control model, in four moves
§
V
Output as control · review without the handoff

Keep the work beside the conversation.

Instead of exporting an artifact and switching contexts, you review it in place. The output might be code, but it might equally be a deck, a PDF, a browser page, a table, or anything else created along the way.

The side panel · four jobs it does well · inspect · annotate · operate · review
i
Inspect artifacts
Open markdown, spreadsheets, data tables, documents, and slides in place: no export, no context switch.
ii
Annotate what changes
Mark up a deck or PDF beside the thread that produced it. Comments stay inside the working loop instead of becoming a separate handoff.
iii
Operate web surfaces
The in-app browser lets the agent inspect a rendered page, control it, and respond to annotations directly on the surface under review.
iv
Review changes
Inspect, mark up, and revise without breaking the loop: the web becomes both the output and the control surface.

A few surfaces work especially well here: index.html for lightweight static artifacts, Storybook for UI review, Remotion Studio for programmatic animation, browser-based decks for presentations, and data apps for analysis. A single index.html can become a durable interactive artifact with no server required, and a thread automation can refresh it over time, so the thread has something new waiting when you come back.
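
As a rough illustration of the single-file idea, the sketch below renders a small status table into one self-contained index.html with inline CSS and no scripts. The function names, the page layout, and the (name, status) row shape are assumptions made for the example; an automation could call something like this on each wake to refresh the artifact.

```python
import html
from pathlib import Path
from typing import Iterable, Tuple


def render_status_page(title: str, rows: Iterable[Tuple[str, str]]) -> str:
    """Build a self-contained HTML page: inline CSS, no scripts, no server."""
    body = "\n".join(
        f"      <tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in rows
    )
    return f"""<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>{html.escape(title)}</title>
    <style>body {{ font-family: sans-serif; }} td {{ padding: 4px 12px; }}</style>
  </head>
  <body>
    <h1>{html.escape(title)}</h1>
    <table>
{body}
    </table>
  </body>
</html>
"""


def write_status_page(path: Path, title: str, rows: Iterable[Tuple[str, str]]) -> None:
    # Overwrite the file in place so the same path always shows the latest state.
    path.write_text(render_status_page(title, rows), encoding="utf-8")
```

Escaping every value with `html.escape` matters here because the rows may come from messages or comments the agent gathered, not from trusted markup.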

§
VI
The vault · context outside the transcript

Write the durable context down.

Long-running threads get more useful when they share memory outside any one conversation. Important context shouldn’t live only inside a transcript; it should live somewhere the next thread can pick back up.

Shared memory
Durable context stored outside a single thread so future work can resume from something explicit and reviewable.

One durable pattern anchors persistent threads in a plain folder of files (an Obsidian vault, say) that stays easy to inspect, edit, move, and keep for a long time. Store it wherever your workflow already syncs: Git, Dropbox, Drive, cloud storage. The repository holds code; the vault holds the rolling context: who’s involved, what changed, what’s blocked, what needs follow-up, and what would otherwise disappear between sessions.

vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/

# AGENTS.md at the top level
# defines how to update it
# as the agent learns more.
A practical AGENTS.md might say
  • Treat ~/vault as durable work memory.
  • Prefer canonical notes over note sprawl.
  • Route TODOs, people, projects, daily summaries, and scratch notes explicitly.
  • Preserve decisions, blockers, owners, dates, and useful links.
  • If nothing meaningful changed, do not churn the vault.

Don’t copy one exact structure. Teach the agent where durable context should live, what to preserve, and when not to create churn. First-party memory features add a local recall layer for preferences, recurring workflows, and known pitfalls; they complement the written context rather than replacing it. The written vault is the part you can read, diff, and hand to the next thread.
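
The "do not churn" rule is the easiest of these to make concrete. The helper below is a minimal sketch, assuming notes are UTF-8 text files under a vault directory; `update_note` is a hypothetical name, and real routing rules would live in AGENTS.md rather than in code.

```python
from pathlib import Path


def update_note(vault: Path, relative_path: str, new_text: str) -> bool:
    """Write a note only if its content changed; return True if a write happened."""
    note = vault / relative_path
    if note.exists() and note.read_text(encoding="utf-8") == new_text:
        return False  # nothing meaningful changed, so leave the file untouched
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(new_text, encoding="utf-8")
    return True
```

Skipping identical writes keeps modification times and Git diffs meaningful: if a file changed, something about the work actually changed.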

From code outward

Instruction to execution
to artifact review, even
when the work leaves the repo.

Codex still starts from code. But more of the work around code is now reachable through the same system: browser surfaces, desktop control, MCP, automations, and reviewable artifacts. What changes is the control model, and that’s the whole reading.

J.L.
@jxnlco · 2026
Editor’s note

The wide-angle companion to No. 04.

Liu’s essay is the map; Hayduk’s goal-mode piece, archived here as codex-goals.html (No. 04), is the close-up of one region of it. Where this reading covers Goals in a single panel (weak versus strong, plus the list of verifiers), No. 04 expands exactly that material into three rules: a measurable goal, a tight feedback loop, and the three markdown files (PLAN.md, EXPERIMENTS.md, EXPERIMENT_NOTES.md) the agent thinks in while a long goal grinds for hours.

That same trio links onward to Karpathy’s autoresearch (autoresearch.html): the hand-built version of goal mode, with its program.md / results.tsv split. Read together, the three describe one idea at three zoom levels: the spectrum (Liu), the product feature (Hayduk), and the bare-metal loop (Karpathy).

Cross-reference · Hayduk, “Using Codex Goals Effectively”: codex-goals.html (No. 04). See §IV (Automations & goals) above for the seam where the two readings meet.