Format-as-HTML · Comparative study no. 01
Browser automation for AI agents
May 2026 · v1.0
Category: CLI & MCP servers for agents
Tools compared: 4 contenders
Method: Specs · ergonomics · trade-offs
Status: Practitioner review

Driving the browser, four ways.

Vercel's agent-browser, Microsoft's Playwright MCP, Google's Chrome DevTools MCP, and the lighter Browser MCP: four tools that let coding agents drive a real browser, built around different bets about transport (CLI vs MCP), runtime, and how much debugging you actually want in the loop.

~100k · Combined stars · All four repos, GitHub, May 2026.
2 · Transports · CLI invoked from shell, or MCP server speaking to the agent over stdio.
1 · Cross-browser · Only Playwright MCP. Everything else is Chrome-only.
3 · Approaches to auth · Isolated profile, extension bridge, or live-attach to your real Chrome.
§ 01 / Framing

The CLI vs MCP question is the real one.

On the surface, these four tools look interchangeable: each lets an AI agent navigate a page, click an element, fill a form, take a screenshot. Underneath, they make very different bets about how the agent talks to the browser, and that bet is the thing that determines which one you actually want.

Three of the four are MCP servers: long-running processes the agent speaks to over a tool-protocol channel, with each capability registered as a named tool in the agent's prompt. The fourth, Vercel Labs' agent-browser, is a plain CLI: a Rust binary the agent invokes by writing agent-browser click @e2 in a shell, the same way it'd write grep or git.
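
To make the transport split concrete, here is a minimal sketch of the same "click" intent in both shapes. The CLI line is the one quoted above. The MCP side shows the JSON-RPC 2.0 `tools/call` message shape that MCP uses for tool invocation; the tool name `click` and the `ref` argument are hypothetical placeholders, since each server defines its own tool names and schemas.

```python
import json
import shlex


def as_cli_command(action: str, target: str) -> str:
    """Return the shell line an agent would type for agent-browser."""
    return shlex.join(["agent-browser", action, target])


def as_mcp_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Return a JSON-RPC 2.0 'tools/call' request as a string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


print(as_cli_command("click", "@e2"))
# Tool name and argument below are hypothetical, for illustration only.
print(as_mcp_request(1, "click", {"ref": "e2"}))
```

The CLI form needs nothing in the prompt beyond the agent's ability to run a shell; the MCP form presupposes that the tool's name and argument schema were registered with the agent up front.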

The CLI-vs-MCP split is the live argument of early 2026. MCP servers register every tool definition into the agent's context window, so a 29-tool server like Chrome DevTools MCP costs ~18k tokens just to exist in the prompt. A CLI registers nothing: the agent reads --help when it needs to, like a developer. That's why some shops (Perplexity, YC's internal tooling) have been publicly moving back from MCP to CLI for tools that don't need persistent state.
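
A quick back-of-the-envelope check on those numbers, as a sketch. The token and tool counts are this article's estimates, not measurements, and the 200k-token context window is an assumption chosen only for illustration.

```python
# Figures are the article's May 2026 estimates, not measurements.
FULL_TOKENS = 18_000      # Chrome DevTools MCP, full mode
FULL_TOOLS = 29
SLIM_TOKENS = 6_000       # Chrome DevTools MCP, --slim mode
CONTEXT_WINDOW = 200_000  # assumption: a hypothetical 200k-token model


def share_of_window(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window used by tool definitions alone."""
    return tokens / window


per_tool = FULL_TOKENS / FULL_TOOLS
print(f"avg tokens per tool definition: {per_tool:.0f}")
print(f"full mode share of window: {share_of_window(FULL_TOKENS):.1%}")
print(f"slim mode share of window: {share_of_window(SLIM_TOKENS):.1%}")
```

Under these assumptions each tool definition averages roughly 620 tokens, and the full server occupies about 9% of the window before the agent has done anything.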

"Playwright is in the business of driving a browser, and Chrome DevTools MCP is in the business of debugging one."
Steve Kinney, "Playwright vs. Chrome DevTools MCP: Driving vs. Debugging"

Kinney's framing is the cleanest one I've found. Driving is the Playwright/agent-browser axis: deterministic clicks against an accessibility tree, designed to make the user flow work end-to-end. Debugging is the Chrome DevTools axis: performance traces, network waterfalls, source-mapped stack traces, the same data a human developer opens DevTools to see. The four contenders below sort cleanly onto that split, plus a third dimension, session reuse, that determines whether the agent runs in a fresh sandbox or attaches to the browser you're already logged into.

§ 02 / The contenders

Four tools, four bets.

No. 01 · AB · CLI · Rust · driving

agent-browser

Vercel Labs · MIT · agent-browser.dev
★ Stars: 33k · Commands: 50+ · Runtime: Rust · Transport: CLI

A native Rust binary that talks to Chrome via the Chrome DevTools Protocol through a persistent local daemon. The agent calls it like any other shell command: agent-browser open example.com, then agent-browser snapshot -i. The snapshot returns a compact accessibility tree with refs (@e1, @e2) that the agent uses for deterministic targeting. The big practical wins are no Node dependency, a near-instant cold start, and the fact that any agent that can run shell commands (Claude Code, Codex, Cursor, opencode, Copilot) can drive it without an MCP integration.

Pick this when
You want the lightest possible dependency footprint, you're driving the browser (not debugging it), and you want a tool any agent that runs shell can use without MCP wiring.
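
The snapshot-then-target loop described in this card can be sketched from any language that can spawn a process. The example below only uses the commands quoted above (open, snapshot -i, click). The snapshot text is a hypothetical sample, since the real output format may differ; the only thing assumed about it is that refs look like @e1, @e2.

```python
import re
import shutil
import subprocess

BINARY = "agent-browser"
REF_PATTERN = re.compile(r"@e\d+")


def extract_refs(snapshot_text: str) -> list[str]:
    """Collect unique @eN refs in first-seen order."""
    seen: dict[str, None] = {}
    for ref in REF_PATTERN.findall(snapshot_text):
        seen.setdefault(ref, None)
    return list(seen)


def run(*args: str) -> str:
    """Run one agent-browser command and return its stdout."""
    if shutil.which(BINARY) is None:
        raise FileNotFoundError(f"{BINARY} is not on PATH")
    result = subprocess.run(
        [BINARY, *args], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical snapshot text; the real output format may differ.
SAMPLE_SNAPSHOT = """
- link "Home" @e1
- button "Submit" @e2
- textbox "Email" @e3
"""

print(extract_refs(SAMPLE_SNAPSHOT))
```

With the binary installed, a driving step would look like `run("open", "example.com")`, then `extract_refs(run("snapshot", "-i"))`, then `run("click", "@e2")`.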

No. 02 · PW · MCP · Node · driving · cross-browser

Playwright MCP

Microsoft · Apache-2.0 · github.com/microsoft/playwright-mcp
★ Stars: 30k · MCP tools: 25+ · Runtime: Node · Browsers: 3

The most established option, and the only one with real cross-browser support: Chromium, Firefox, and WebKit, plus 143 device emulation profiles for mobile testing. It uses the same accessibility-snapshot approach as agent-browser, delivered as an MCP server: every tool definition lives in the agent's context, which means richer affordances at the cost of more tokens. Connecting to an existing logged-in browser goes through a Playwright MCP Bridge extension rather than Chrome's native remote-debug path, which is workable but takes more setup.

Pick this when
You need real cross-browser testing (Firefox or WebKit), your team already lives in Playwright, or your agent flow is part of a broader Playwright test suite.

No. 03 · CD · MCP · Node · debugging · perf

Chrome DevTools MCP

Chrome team / Google · Apache-2.0 · 0.20.3
★ Stars: 37k · MCP tools: 29 · Runtime: Node · Browsers: 1 (Chrome only)

The debugging-side champion. Ships the same driving primitives as the others, plus a dedicated DevTools surface: performance_start_trace, network waterfall inspection, source-mapped console errors, CrUX field-data integration, and Lighthouse audits. --autoConnect (Chrome 144+) lets the agent attach to your already-running Chrome with explicit user approval: same auth, same cookies, same tabs. The cost: full mode burns ~18k tokens of tool definitions. A --slim mode trims to 3 core tools and ~6k tokens.

Pick this when
The agent's job involves diagnosing what's wrong (perf traces, failed requests, console errors), or you want it operating inside your real, logged-in Chrome session via --autoConnect.
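
For the two npx-launched servers, wiring usually comes down to a small JSON entry in the agent's MCP configuration. The sketch below generates one. The `mcpServers` / `command` / `args` shape is the convention many MCP clients use, but the exact key names and file location depend on the client, so treat them as assumptions. The package names and the --slim and --autoConnect flags are taken from this article; the server labels "playwright" and "chrome-devtools" are arbitrary.

```python
import json


def server_entry(package: str, *flags: str) -> dict:
    """One MCP server launched through npx."""
    return {"command": "npx", "args": [package, *flags]}


def build_config(slim: bool = False, auto_connect: bool = False) -> dict:
    """Build a client config with both servers; flags apply to DevTools MCP."""
    devtools_flags = []
    if slim:
        devtools_flags.append("--slim")
    if auto_connect:
        devtools_flags.append("--autoConnect")
    return {
        "mcpServers": {
            # Server labels are arbitrary names chosen for this sketch.
            "playwright": server_entry("@playwright/mcp"),
            "chrome-devtools": server_entry("chrome-devtools-mcp", *devtools_flags),
        }
    }


print(json.dumps(build_config(slim=True), indent=2))
```

Keeping the flags behind explicit parameters makes the opt-in nature of --autoConnect visible in the config step rather than burying it in a copied snippet.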

No. 04 · BM · MCP · Node + extension · session reuse

Browser MCP

browsermcp.io · MIT · extension-bridge
★ Stars: ~7k · MCP tools: ~15 · Runtime: Node · Browsers: 1 (Chrome only)

The lightweight specialist. Pairs a small MCP server with a browser extension that exposes your current tab to the agent. The whole proposition is to reuse what's already in your browser (your logins, your cookies, the page you're currently on) without spawning a separate Chrome instance or fiddling with remote-debugging ports. It has a smaller tool surface than the Microsoft or Google servers, with no perf or network introspection. Useful when the agent's job is to do something on a page where you're already authenticated.

Pick this when
The flow is fundamentally "act on the tab I'm already looking at" (an authenticated dashboard, an internal tool), and you don't need perf or network depth.
§ 03 / At a glance

The comparison matrix.

| Dimension | agent-browser (Vercel Labs) | Playwright MCP (Microsoft) | Chrome DevTools MCP (Chrome team) | Browser MCP (browsermcp.io) |
| --- | --- | --- | --- | --- |
| Transport | CLI: invoked per command from the shell | MCP server (stdio) | MCP server (stdio) | MCP server + browser extension |
| Runtime | Native Rust binary, no Node required | Node.js (npx @playwright/mcp) | Node.js (npx chrome-devtools-mcp) | Node.js + extension |
| Token cost in prompt | ~0 (no tool defs registered; agent reads --help) | ~12k (25+ tool defs) | ~18k full / ~6k slim | ~5k (smaller surface) |
| Cross-browser | Chromium only | Chromium, Firefox, WebKit + 143 device profiles | Chrome only | Chrome only |
| Page targeting | Accessibility-tree refs (@e1, @e2) | Accessibility-tree snapshots | Accessibility-tree + CDP IDs | Accessibility-tree via extension |
| Perf / debugging tools | No (driving only) | No (driving only) | Yes: performance traces, network, Lighthouse, CrUX | No |
| Reuse your real Chrome session | Persistent profile via daemon | Via --extension bridge | Native via --autoConnect (Chrome 144+) | Native via extension |
| Setup floor | npm i -g or brew install | MCP config in agent + npx | MCP config in agent + npx | MCP config + install extension |
| Works with | Any shell-capable agent: Claude Code, Codex, Cursor, Copilot, Gemini, opencode | MCP-aware agents (Claude Code, Cursor, Copilot, Cline, VS Code) | MCP-aware agents (Claude Code, Cursor, Gemini CLI, Antigravity) | MCP-aware agents |
| License | MIT | Apache-2.0 | Apache-2.0 | MIT |
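
The matrix above can also be treated as data. As a minimal sketch, the snippet below encodes three of its rows (prompt cost in full mode, cross-browser, perf tooling) and filters on hard requirements. The field names and the `shortlist` helper are invented for this illustration, and the token figures are the article's approximations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    transport: str
    prompt_tokens: int   # approximate, full mode, per the matrix
    cross_browser: bool
    perf_tools: bool


TOOLS = [
    Tool("agent-browser", "CLI", 0, False, False),
    Tool("Playwright MCP", "MCP", 12_000, True, False),
    Tool("Chrome DevTools MCP", "MCP", 18_000, False, True),
    Tool("Browser MCP", "MCP", 5_000, False, False),
]


def shortlist(
    max_prompt_tokens: Optional[int] = None,
    need_cross_browser: bool = False,
    need_perf_tools: bool = False,
) -> list[str]:
    """Return the names of tools that satisfy every stated requirement."""
    matches = []
    for tool in TOOLS:
        if max_prompt_tokens is not None and tool.prompt_tokens > max_prompt_tokens:
            continue
        if need_cross_browser and not tool.cross_browser:
            continue
        if need_perf_tools and not tool.perf_tools:
            continue
        matches.append(tool.name)
    return matches


print(shortlist(need_cross_browser=True))
print(shortlist(max_prompt_tokens=6_000))
```

Note that the filter uses full-mode cost, so Chrome DevTools MCP drops out of a 6k budget even though its --slim mode would fit.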
§ 04 / Decision

If this, then that.

Context window matters more than tooling
You're already burning tokens on MCP servers and your agent's tool selection feels noisy.
Pick agent-browser. A CLI registers no tool definitions; the agent learns commands on demand. Same accessibility-tree primitive as Playwright MCP, ~0 prompt overhead, no Node.

You need Firefox or WebKit, period
Cross-browser is the whole reason you're automating the browser in the first place.
Pick Playwright MCP. It's the only one of the four with real Firefox and WebKit support. The cross-browser surface is the moat.

The agent's job is "figure out why this is slow / broken"
You want perf traces, network waterfalls, and source-mapped console errors: DevTools data, not just clicks.
Pick Chrome DevTools MCP. performance_start_trace and friends are the differentiator; nothing else in this set has them. Run with --slim if you only need the driving subset.

The agent operates on your real, logged-in browser
You don't want a fresh sandbox: you want it acting on the dashboard, the issue tracker, the admin tool you're already in.
Pick Chrome DevTools MCP with --autoConnect for the cleanest path (one approval prompt, no extension), or Browser MCP if you want a lighter surface and don't need perf/network depth.

You're driving multiple agents that all need browser access
Claude Code on one machine, Codex on another, all sharing the same automation primitives.
Pick agent-browser. Any agent that can run shell can use it; no per-client MCP integration to maintain. The CLI is the integration.

You want the safest minimum to evaluate
First time wiring a browser into an agent loop; you want the path that fails closed.
Start with Chrome DevTools MCP in default mode (isolated profile, no --autoConnect). It's official, well-documented, and doesn't touch your real session until you opt in.
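
The decision cards above can be read as an ordered rule cascade. The sketch below is one way to write that down. The parameter names are invented, and the priority order (hard requirements first, preferences last) is an assumption of this sketch rather than something the cards specify.

```python
def recommend(
    *,
    needs_firefox_or_webkit: bool = False,
    needs_debugging: bool = False,
    reuse_logged_in_session: bool = False,
    minimize_context: bool = False,
    many_agents: bool = False,
) -> str:
    """Map stated needs to one of the recommendations in the cards."""
    # Hard requirements come first: only one tool satisfies each.
    if needs_firefox_or_webkit:
        return "Playwright MCP"
    if needs_debugging:
        return "Chrome DevTools MCP"
    if reuse_logged_in_session:
        return "Chrome DevTools MCP with --autoConnect"
    # Preferences rather than requirements.
    if minimize_context or many_agents:
        return "agent-browser"
    # Fallback: the conservative starting point for a first evaluation.
    return "Chrome DevTools MCP in default mode"


print(recommend(needs_firefox_or_webkit=True))
print(recommend(minimize_context=True))
print(recommend())
```

Real situations often trip more than one rule; in that case the cascade only reflects the ordering assumed here, and running two tools side by side is a legitimate answer.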
Editorial · for the curious

The interesting question isn't which is best: it's whether the MCP-server era is permanent.

agent-browser is the most architecturally distinct of the four, and the one most worth watching. Not because the Rust binary is faster than Node (that's a small win), but because it's a public bet that a CLI is the right shape for agent tooling, and that MCP's tool-definitions-in-prompt model is the wrong shape for anything that doesn't need persistent state.

If that bet is right, the next year of agent tooling looks more like Unix and less like RPC: small composable binaries the agent learns on demand, rather than always-on servers that consume context just by existing. If it's wrong, MCP wins on developer ergonomics and the token cost gets absorbed by cheaper models. The empirical answer is probably both, for different jobs, but the argument is live.

Project knowledge · related
html-effectiveness-catalog: the visual register and card conventions used in this artifact.
External · cited
Steve Kinney, "Playwright vs. Chrome DevTools MCP: Driving vs. Debugging", source of the framing in §01.
Primary sources
agent-browser.dev · github.com/microsoft/playwright-mcp · github.com/ChromeDevTools/chrome-devtools-mcp · browsermcp.io
Caveat
Star counts and tool counts are accurate as of May 2026 and will drift; the architectural shape of each tool is more stable than the numbers.