A reading · on coding agents

Getting the most out of Codex

Most developers meet a coding agent as a thing that edits a repo and opens a pull request. The interesting shift is what happens once the same system can hold context across sessions, reach past the repo, and keep working while you’re away from the desk.

Original essay

Jason Liu

“Getting the Most Out of Codex”
@jxnlco · 2026

Reformatted for this library

A Reading · No. 05

Companion to No. 04: Hayduk on goal mode
One file · inline CSS · inline SVG

Contents

The spine: a spectrum from being in the loop (steer, queue) to being away from the desk (automate, set a goal), held together by durable threads and a written-down memory.

I · Durable threads · The substrate
Persistent workspaces that don’t reset between sessions.

II · Voice, steering & queuing · In the loop
Staying close to the work while it unfolds.

III · Tools & reach · Beyond the repo
Browser, computer-use, MCP, connectors, skills.

IV · Automations & goals · Away from the desk
Continuing the work while you’re gone.

V · The side panel · Output as control
Reviewing artifacts beside the conversation that made them.

VI · Shared memory · The vault
Durable context written down outside any one thread.

The control spectrum · where the human stands · in the loop → away

In the loop · §II

Steering

Interrupt the work now and redirect it.

In the loop · §II

Queuing

Don’t touch what’s running: line up the next task.

Away from the desk · §IV

Automations

Wake on a schedule and continue the thread.

Away from the desk · §IV

Goals

Push toward a finish line a verifier can confirm.

Read left to right. The closer you are, the more you shape the work in real time; the further away, the more the agent carries it alone. All four modes ride on the same durable thread; lose that, and each one rebuilds its context from scratch.

§ I

The first thing that changes is memory. Instead of resetting after each exchange, a thread can keep context, use tools, surface artifacts, and continue across prompts. That continuity is the substrate everything else in this reading is built on. Without it, every other capability would have to rebuild its understanding from scratch each time.

Durable threads

Long-running threads that preserve working context across repeated sessions: persistent workspaces, not short chats.

Pinning is how you keep the useful ones close. The recurring work streams are the obvious candidates: a Chief-of-Staff thread, a release thread, a documentation-review thread, a thread that does nothing but watch the outside world. Each preserves prior decisions, preferences, and context that would otherwise need rebuilding from zero.

Pinned-thread shortcuts make this practical rather than aspirational: the saved threads sit one keystroke away.

⌘1 ⌘2 ⌘3 ⌘4 ⌘5 ⌘6 ⌘7 ⌘8 ⌘9

Jump directly into the saved thread

§ II

The canonical example is the half-remembered lead: “I think someone named Ben mentioned this in Slack. I don’t remember the details. Please go look.” For an agent that can search, gather context, and report back, that’s often enough to start. A two- or three-minute thought dump works the same way, and so does a raw transcript: a dictated planning note often beats a tidy summary precisely because it keeps the uncertainty, emphasis, and unfinished lines of thought intact.

Voice becomes far more useful once it’s paired with explicit control over a task that’s already running. There are two such controls, and the whole point is that they do opposite things.

Steering · Queuing

Steering interrupts an in-flight task with new direction before the current step finishes. Queuing adds work to the line without interrupting what’s already running.

Two controls, opposite jobs · now vs next

Steering · changes what it’s doing now

The agent is heading the wrong way and needs a correction before it lands.

Best while you’re annotating a surface in the side panel: interrupt the work mid-step, redirect, let it continue from the new direction.

“make this smaller” · “the spacing between these two feels off” · “this copy is wrong”

Queuing · changes what happens next

The current step is fine; you just want to line up the task that should follow it.

It doesn’t touch the work in progress. It adds to the queue so the next thing happens automatically once this finishes.

once the work is done, send the preview link to the reviewer in Slack

Both keep you close to the work while it’s unfolding, which is the defining feature of this end of the spectrum. You haven’t handed the task off; you’re shaping it in real time.
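The difference between the two controls can be made concrete with a toy model. The sketch below is not Codex's API; `ThreadControl`, `steer`, `queue`, and `finish_current` are hypothetical names chosen only to show that steering rewrites the in-flight task while queuing appends behind it.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThreadControl:
    """Toy model of one thread: a current task plus a queue of follow-ups."""

    current: str | None = None
    pending: deque[str] = field(default_factory=deque)
    log: list[str] = field(default_factory=list)

    def steer(self, new_direction: str) -> None:
        # Steering replaces the in-flight task; the queue is left untouched.
        self.log.append(f"interrupted: {self.current}")
        self.current = new_direction

    def queue(self, task: str) -> None:
        # Queuing never touches the in-flight task; it only appends.
        self.pending.append(task)

    def finish_current(self) -> None:
        # When the current task completes, the next queued task starts.
        self.log.append(f"finished: {self.current}")
        self.current = self.pending.popleft() if self.pending else None


thread = ThreadControl(current="build the preview page")
thread.queue("send the preview link to the reviewer in Slack")
thread.steer("build the preview page with tighter spacing")
thread.finish_current()
print(thread.current)
```

In this model the queued Slack task survives the steer and starts only once the redirected work finishes, which is the "now vs next" split in miniature.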

§ III

The browser surfaces · narrowest → widest reach · 3 layers

$browser

In-app browser, in the side panel. Inspect and annotate a web surface in place. Fits side-panel review of something the agent itself rendered.

@chrome

Signed-in browser state. Chrome-based workflows that depend on your existing logged-in context.

@computer

The desktop GUI. Work that only exists through a graphical app: no API, no page, just the interface a human would click.

MCP servers and connectors extend the same idea into the rest of a workflow. Slack, Gmail, and Calendar matter because many important tasks first appear as a message, an inbox item, or a scheduling problem long before they ever become code. And skills make repeated routines reusable: once a workflow proves useful, package it as a skill so the agent can run it again without relearning it.

Work from anywhere

Reach also unbinds the task from the desk. A job can start on the Mac where the files, permissions, and local setup already live, then continue while you check in from a phone. That matters in small moments: leaving the desk while a longer task runs, answering a question from outside, approving the next step, or redirecting the thread before you’re back. The local environment stays put; you don’t have to.

§ IV

Automations run work on a schedule. They come in two kinds, and the choice between them is about where the work resumes. A scheduled automation starts fresh from a workspace, which suits a daily report or a regular repository check. A thread automation returns to an active conversation with its running context still intact.

Thread automations

Heartbeat-style recurring wake-up calls that return to the same thread on a schedule: checking on something, continuing until a condition is met, adjusting cadence over time.
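The contrast between the two kinds of automation comes down to what state each run starts with. The sketch below is a toy illustration, not the product's scheduler; `Thread`, `scheduled_run`, and `thread_wakeup` are hypothetical names, and a "tick" stands in for one scheduled firing.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical stand-in for a conversation that accumulates context."""

    context: list[str] = field(default_factory=list)


def scheduled_run(tick: int) -> Thread:
    # Scheduled automation: every run starts from a fresh workspace.
    fresh = Thread()
    fresh.context.append(f"report for tick {tick}")
    return fresh


def thread_wakeup(thread: Thread, tick: int) -> Thread:
    # Thread automation: the same thread is resumed, prior context intact.
    thread.context.append(f"checked in at tick {tick}")
    return thread


long_lived = Thread(context=["user prefers drafts, never auto-send"])
for tick in range(3):
    one_off = scheduled_run(tick)
    thread_wakeup(long_lived, tick)

print(len(one_off.context))     # 1: only what this run produced
print(len(long_lived.context))  # 4: the initial note plus three wake-ups
```

The scheduled run never sees what earlier runs learned, which is fine for a daily report; the resumed thread carries its earlier notes into every wake-up.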

A Chief-of-Staff thread is the clean illustration. It wakes every half hour, does the expensive context-gathering, and leaves the irreversible decision to you.

Chief-of-Staff thread · automation · every 30 min

# runs while you're away: drafts, never sends
Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention. Help me prioritize what matters most. If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it.

When you return, the costly part, assembling context, is usually done; you still decide what actually goes out. The same shape fits feedback loops: a thread automation watches PR comments, Google Docs comments, or Slack replies and keeps the surrounding work moving. In an animation workflow, a reviewer drops a video in Slack, the automation renders an updated version when comments arrive, replies in the same thread tagging the reviewer, and, if one integration can’t finish the upload, desktop automation closes the last step through the GUI. The loop spans Slack for feedback, the codebase for rendering, and the desktop for the final hand-off.
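The draft-but-do-not-send split in the Chief-of-Staff prompt can be read as a small permission rule: the agent may prepare, but the irreversible step needs a human. The sketch below assumes nothing about how Codex or any connector enforces this; `Draft`, `Outbox`, `agent_draft`, and `send` are hypothetical names, and the address is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    recipient: str
    body: str
    approved: bool = False  # only a human flips this


@dataclass
class Outbox:
    drafts: list[Draft] = field(default_factory=list)
    sent: list[Draft] = field(default_factory=list)

    def agent_draft(self, recipient: str, body: str) -> Draft:
        # The agent may research and write, but only into the drafts pile.
        draft = Draft(recipient, body)
        self.drafts.append(draft)
        return draft

    def send(self, draft: Draft) -> None:
        # Sending is the irreversible step, so it requires explicit approval.
        if not draft.approved:
            raise PermissionError("draft has not been approved by a human")
        self.drafts.remove(draft)
        self.sent.append(draft)


outbox = Outbox()
reply = outbox.agent_draft("ben@example.com", "Here is what I found.")
try:
    outbox.send(reply)  # blocked: nobody has approved it yet
except PermissionError as err:
    print("blocked:", err)

reply.approved = True  # the human decision
outbox.send(reply)
print(len(outbox.sent))
```

The expensive part (writing the draft) happens without supervision; the only gate sits on the one action that cannot be undone.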

Goals carry a finish line

The other away-from-desk mechanism is the goal: a longer-running task with a stopping condition the agent can keep working toward. Goals are only as good as their verifier: the difference between a wish and a finish line is whether something other than vibes can tell you it’s done.

What separates a goal from a wish · verify vs hope

Weak goal · no finish line

Implement the plan in this Markdown file.

Nothing here says when it’s done, or whether each step moved closer. The agent has ambition and no signal.

Strong goal · measurable

Migrate this tool from Python to Rust. Not done until the unit tests pass.

The outcome, the stopping condition, and the signal of progress are all explicit. Useful verifiers: a test suite, a benchmark, a bug reproduction, a validation matrix, an end-to-end workflow that must keep passing.
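A goal with a verifier has the shape of a loop that stops on evidence rather than on a feeling. The following is a minimal sketch of that shape, not how goal mode is implemented; `run_goal`, `fix_one_test`, and `all_tests_pass` are hypothetical, and the set of failing tests is a toy stand-in for a real test suite.

```python
from typing import Callable


def run_goal(
    step: Callable[[int], None],
    verifier: Callable[[], bool],
    max_steps: int = 20,
) -> tuple[bool, int]:
    """Repeat `step` until `verifier` passes or the step budget runs out."""
    for attempt in range(1, max_steps + 1):
        step(attempt)
        if verifier():
            return True, attempt
    return False, max_steps


# Toy stand-in for "the unit tests pass": three tests, one fixed per step.
failing_tests = {"test_parse", "test_io", "test_cli"}


def fix_one_test(attempt: int) -> None:
    if failing_tests:
        failing_tests.pop()


def all_tests_pass() -> bool:
    return not failing_tests


done, steps = run_goal(fix_one_test, all_tests_pass)
print(done, steps)  # True 3
```

In real use the verifier would more plausibly run the project's test command and check its exit code. The weak goal from the panel above has no `verifier` to pass in, so the loop could only stop when the step budget runs out.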

Goal mode gets its own full treatment elsewhere in this library: see the editor’s note below for the cross-reference, and No. 04 for the markdown-file scaffolding that makes a long goal run auditable.

“Steering interrupts the work in progress. Queuing lines up the next task. Thread automations keep a thread active when you step away. Goals add a concrete finish line the agent can keep working toward.”

Jason Liu · the control model, in four moves

§ V

The side panel · four jobs it does well · inspect · annotate · operate · review

Inspect artifacts

Open markdown, spreadsheets, data tables, documents, and slides in place: no export, no context switch.

Annotate what changes

Mark up a deck or PDF beside the thread that produced it. Comments stay inside the working loop instead of becoming a separate handoff.

Operate web surfaces

The in-app browser lets the agent inspect a rendered page, control it, and respond to annotations directly on the surface under review.

Review changes

Inspect, mark up, and revise without breaking the loop: the web becomes both the output and the control surface.

A few surfaces work especially well here: index.html for lightweight static artifacts, Storybook for UI review, Remotion Studio for programmatic animation, browser-based decks for presentations, and data apps for analysis. A single index.html can become a durable interactive artifact with no server required, and a thread automation can refresh it over time, so the thread has something new waiting when you come back.
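To show what a serverless, refreshable artifact can look like, here is a small sketch that writes a self-contained index.html with inline CSS. It is an illustration under assumptions: `render_status_page` is a hypothetical helper, the status items are made up, and the file goes to a temporary directory rather than a real workspace.

```python
import html
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def render_status_page(items: list[str], out_path: Path) -> Path:
    """Write a self-contained index.html: inline CSS, no server, no assets."""
    rows = "\n".join(f"    <li>{html.escape(item)}</li>" for item in items)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    page = f"""<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Status</title>
  <style>body {{ font-family: sans-serif; max-width: 40rem; margin: 2rem auto; }}</style>
</head>
<body>
  <h1>Status</h1>
  <p>Last refreshed: {stamp}</p>
  <ul>
{rows}
  </ul>
</body>
</html>
"""
    out_path.write_text(page, encoding="utf-8")
    return out_path


out = render_status_page(
    ["PR review: 2 comments open", "Docs: <draft> ready"],
    Path(tempfile.mkdtemp()) / "index.html",
)
print(out)
```

Calling the function again with new items overwrites the same file, which is all a recurring refresh needs to do; the page opens straight from disk in any browser.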

§ VI

Shared memory

Durable context stored outside a single thread so future work can resume from something explicit and reviewable.

One durable pattern anchors persistent threads in a plain folder of files (an Obsidian vault, say) that stays easy to inspect, edit, move, and keep for a long time. Store it wherever your workflow already syncs: Git, Dropbox, Drive, cloud storage. The repository holds code; the vault holds the rolling context: who’s involved, what changed, what’s blocked, what needs follow-up, and what would otherwise disappear between sessions.

vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/

# AGENTS.md at the top level defines how to update it as the agent learns more.

A practical AGENTS.md might say

Treat ~/vault as durable work memory.
Prefer canonical notes over note sprawl.
Route TODOs, people, projects, daily summaries, and scratch notes explicitly.
Preserve decisions, blockers, owners, dates, and useful links.
If nothing meaningful changed, do not churn the vault.

Don’t copy one exact structure. Teach the agent where durable context should live, what to preserve, and when not to create churn. First-party memory features add a local recall layer for preferences, recurring workflows, and known pitfalls; they complement the written context rather than replacing it. The written vault is the part you can read, diff, and hand to the next thread.
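The "do not churn the vault" rule is easy to state and easy to violate, so a sketch may help. The helper below is hypothetical (`update_note` is not part of any tool named here) and uses a temporary directory in place of ~/vault; it writes a note only when the text actually differs from what is on disk.

```python
import tempfile
from pathlib import Path


def update_note(vault: Path, relative: str, new_text: str) -> bool:
    """Write a note only when its content actually changed.

    Returns True if the file was written, False if it was left alone.
    """
    note = vault / relative
    if note.exists() and note.read_text(encoding="utf-8") == new_text:
        return False  # nothing meaningful changed: do not churn the vault
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(new_text, encoding="utf-8")
    return True


vault = Path(tempfile.mkdtemp()) / "vault"
first = update_note(vault, "projects/release.md", "Blocked on: signing key\n")
second = update_note(vault, "projects/release.md", "Blocked on: signing key\n")
third = update_note(vault, "projects/release.md", "Unblocked: key rotated\n")
print(first, second, third)  # True False True
```

Skipping identical writes keeps file timestamps and version-control diffs meaningful, so a change in the vault signals that something really changed.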

From code outward

Instruction to execution to artifact review, even when the work leaves the repo.

Codex still starts from code. But more of the work around code is now reachable through the same system: browser surfaces, desktop control, MCP, automations, and reviewable artifacts. What changes is the control model, and that’s the whole reading.

J.L.

@jxnlco · 2026

Editor’s note

The wide-angle companion to No. 04.

Liu’s essay is the map; Hayduk’s goal-mode piece, archived here as codex-goals.html (No. 04), is the close-up of one region of it. Where this reading covers Goals in a single panel (weak versus strong, plus the list of verifiers), No. 04 expands exactly that material into three rules: a measurable goal, a tight feedback loop, and the three markdown files (PLAN.md, EXPERIMENTS.md, EXPERIMENT_NOTES.md) the agent thinks in while a long goal grinds for hours.

That same trio links onward to Karpathy’s autoresearch (autoresearch.html): the hand-built version of goal mode, with its program.md / results.tsv split. Read together, the three describe one idea at three zoom levels: the spectrum (Liu), the product feature (Hayduk), and the bare-metal loop (Karpathy).

Cross-reference · Hayduk, “Using Codex Goals Effectively”: codex-goals.html (No. 04). See § IV (Automations & goals) above for the seam where the two readings meet.

Getting the
most out of Codex

A thread that keeps its working context.

Jump directly into the saved thread

Voice catches the rough version of a thought.

Reach moves outward in layers.

Work from anywhere

Two ways to keep going without you.

Goals carry a finish line

Keep the work beside the conversation.

Write the durable context down.

Instruction to execution
to artifact review: even
when the work leaves the repo.

The wide-angle companion to No. 04.
