A single CLAUDE.md distilled from Karpathy's complaints about how LLMs write code: that the models make wrong assumptions silently, overcomplicate, and touch code they shouldn't. Four behavioral rules, one file, packaged as a Claude Code plugin. Below: the rules, the failure mode each one closes, and one paired wrong-vs-right specimen for every principle: taken straight from the repo's EXAMPLES.md.
In late January 2026, Karpathy posted a thread on X complaining that current models "make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should", and that they "really like to overcomplicate code and APIs, bloat abstractions … implement a bloated construction over 1000 lines when 100 would do."
Forrest Chang's repo turns the complaints into four imperatives in a single CLAUDE.md: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution. Each one names a failure mode from the thread and writes the contract that closes it. The same file ships three ways, as a Claude Code plugin via /plugin install, as a drop-in CLAUDE.md for an existing project, and as a Cursor rule under .cursor/rules/.
This is the same repo Mnimiy treats as the "4-rule origin point" in claude-md.html: that essay argues the four rules are necessary but incomplete for a May 2026 agent ecosystem (multi-step plans, hook cascades, multi-codebase work) and proposes eight more on top. The four below are the foundation that argument sits on. This sheet renders the foundation as a sibling specimen to the html-effectiveness catalog: each principle on its own, with the paired diff that shows what compliance actually looks like.
Don't assume. Don't hide confusion. Surface tradeoffs. Models pick an interpretation silently and run with it. This rule forces the interpretation step into the open before any code is written.
"They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should." The simplest fix is the most disciplined one, refuse to start coding while the interpretation is still ambiguous.
# Adds caching, indexes, AND async # without asking which "faster" # the user actually meant. @lru_cache(maxsize=1000) async def search(query: str) -> List[Result]: #... 200 lines of optimization, # spanning three different goals, # none of them verified to be # the thing the user wanted....
Why it's wrong: "Faster" has at least three meanings, response time, throughput, perceived speed. The model picks one (or all three) without checking, then writes code for the wrong problem.
"Make the search faster" could mean:
Current search takes ~500ms for typical queries. Which aspect matters most for your use case?
Why it's right: The interpretations are named, costed, and laid out before any code is written. The user points at one. Zero rework.
Minimum code that solves the problem. Nothing speculative. Combat the tendency to add abstractions, configurability, and error paths for scenarios that aren't in the request.
The "overcomplicated" code isn't obviously wrong: it follows real design patterns. The problem is timing: complexity is added before it's earned, which makes the code harder to read, harder to test, and harder to delete later.
from abc import ABC, abstractmethod from dataclasses import dataclass class DiscountStrategy(ABC): @abstractmethod def calculate(self, amount)... class PercentageDiscount(DiscountStrategy): def __init__(self, percentage): self.percentage = percentage def calculate(self, amount): return amount * (self.percentage / 100) class FixedDiscount(DiscountStrategy)... @dataclass class DiscountConfig: strategy: DiscountStrategy min_purchase: float = 0.0 max_discount: float = float('inf') class DiscountCalculator: # 30+ lines of setup to do one # multiplication. Nobody asked.
Why it's wrong: A pattern designed for multiple discount types applied to a request for one. The cost is real: readers now have to understand the abstraction to use the function.
def calculate_discount(amount: float, percent: float) -> float: """Calculate discount. percent is 0 to 100.""" return amount * (percent / 100) # Usage: discount = calculate_discount(100.0, 10.0) # → $10 off. Done.
Why it's right: If a second discount type lands later, refactor then. The strategy pattern is two minutes of work once you actually need it, and zero minutes now, because you don't.
Touch only what you must. Clean up only your own mess. When editing existing code, every changed line should trace directly to the user's request, no drive-by reformatting, no "while I'm here" refactors.
The model treats a bug-fix request as a license to "improve" the file: rewrites quote style, adds type hints, reformats whitespace, drops dead-looking code that wasn't actually dead. The diff becomes unreviewable; the bug fix is hidden in noise.
def validate_user(user_data): - # Check email format - if not user_data.get('email'): + """Validate user data.""" + email = user_data.get('email', '').strip() + if not email: raise ValueError("Email required") - if '@' not in user_data['email']: + if '@' not in email or '.' not in email.split('@')[1]: raise ValueError("Invalid email") - # Check username - if not user_data.get('username'): + username = user_data.get('username', '').strip() + if not username: raise ValueError("Username required") + if len(username) < 3: + raise ValueError("Username too short") + if not username.isalnum(): + raise ValueError("Username must be alphanumeric") return True
Why it's wrong: "Improved" email validation past the bug fix. Added username validation that wasn't asked for. Rewrote comments. Added a docstring. The bug fix is buried.
def validate_user(user_data): # Check email format - if not user_data.get('email'): + email = user_data.get('email', '') + if not email or not email.strip(): raise ValueError("Email required") # Basic email validation - if '@' not in user_data['email']: + if '@' not in email: raise ValueError("Invalid email") # Check username if not user_data.get('username'): raise ValueError("Username required") return True
Why it's right: Only the lines that fix empty-email handling were touched. Username code, comments, and style are untouched. Every changed line traces directly to the request.
Define success criteria. Loop until verified. Karpathy's central insight: models are exceptionally good at looping toward a checkable target, but only if you give them one. Turn imperative tasks into verifiable goals.
Weak success criteria ("make it work," "fix the bug," "improve it") force the model to keep asking. Strong criteria: a failing test, a benchmark number, a passing assertion, let it iterate on its own. The principle isn't about testing; it's about specifying.
# Implements full rate limiting
# with Redis, multiple strategies,
# a configuration system, and
# monitoring: in one 300-line
# commit. No clear verification.
No way to tell which piece works. No way to ship incrementally. If something breaks, the bisect surface is the whole feature.
Why it's wrong: One opaque step. The "success criterion" is "the feature exists": too weak to loop on, too coarse to debug.
Plan for rate limiting:
/users and /posts; existing tests still pass./search allows 10/min, /users allows 100/min.Each step independently verifiable and deployable. Start with step 1?
Why it's right: Every step has a concrete pass/fail check. The agent can loop on each one alone. If step 3 breaks, steps 1 to 2 already shipped.
The repo's EXAMPLES.md closes with a one-row-per-principle summary. It's the densest version of the contract, useful to keep open while reviewing a PR an agent wrote, or as a checklist before merging.
| Principle | Anti-pattern | Fix |
|---|---|---|
| Think Before Coding TH · No. 01 | Silently assumes file format, fields, scope. | List assumptions explicitly. Ask for clarification on the ones that branch the work. |
| Simplicity First SI · No. 02 | Strategy pattern for a single discount calculation. | One function until complexity is actually needed. Refactor on the second use, not the first. |
| Surgical Changes SU · No. 03 | Reformats quotes, adds type hints, drops comments, while "fixing" the bug. | Only change lines that fix the reported issue. Match existing style verbatim. |
| Goal-Driven Execution GO · No. 04 | "I'll review the code and make improvements." | "Write test for bug X → make it pass → verify no regressions." Loop on a checkable target. |
The "overcomplicated" examples aren't obviously wrong: they follow design patterns and best practices. The problem is timing: they add complexity before it's earned.From the repo's Key Insight · forrestchang/andrej-karpathy-skills
Inside Claude Code, add the marketplace, then install the plugin. Available across all projects.
/plugin marketplace add forrestchang/andrej-karpathy-skills /plugin install andrej-karpathy-skills@karpathy-skills
CLAUDE.mdFor a new project: single curl, single file:
curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md
The repo ships a Cursor rule at .cursor/rules/karpathy-guidelines.mdc. Open the project in Cursor and the rule activates automatically.
"These guidelines bias toward caution over speed." For trivial work: a typo fix, an obvious one-liner: the full rigor is overkill, and the README says so. The goal is reducing costly mistakes on non-trivial work, not slowing down simple ones. Treat the four principles as the default, not the floor.