Repository

github.com/forrestchang/andrej-karpathy-skills

Primary source

Andrej Karpathy · X · Jan 2026

Packaged by

Forrest Chang · @jiayuan_jy

Form

Claude Code plugin · single CLAUDE.md

Four rules for an
agent that
writes your code.

A single CLAUDE.md distilled from Karpathy's complaints about how LLMs write code: that the models make wrong assumptions silently, overcomplicate, and touch code they shouldn't. Four behavioral rules, one file, packaged as a Claude Code plugin. Below: the rules, the failure mode each one closes, and one paired wrong-vs-right specimen for every principle: taken straight from the repo's EXAMPLES.md.

Principles

File: drop into project root

~75

Lines of behavioral contract

∞

Mergeable with project rules

Provenance · what's being packaged

In late January 2026, Karpathy posted a thread on X complaining that current models "make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should", and that they "really like to overcomplicate code and APIs, bloat abstractions … implement a bloated construction over 1000 lines when 100 would do."

Forrest Chang's repo turns the complaints into four imperatives in a single CLAUDE.md: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution. Each one names a failure mode from the thread and writes the contract that closes it. The same file ships three ways, as a Claude Code plugin via /plugin install, as a drop-in CLAUDE.md for an existing project, and as a Cursor rule under .cursor/rules/.

"LLMs are exceptionally good at looping until they meet specific goals … Don't tell it what to do, give it success criteria and watch it go." Karpathy · the line behind Principle 4

This is the same repo Mnimiy treats as the "4-rule origin point" in claude-md.html: that essay argues the four rules are necessary but incomplete for a May 2026 agent ecosystem (multi-step plans, hook cascades, multi-codebase work) and proposes eight more on top. The four below are the foundation that argument sits on. This sheet renders the foundation as a sibling specimen to the html-effectiveness catalog: each principle on its own, with the paired diff that shows what compliance actually looks like.

No. 01 · TH01

Think before coding.

Don't assume. Don't hide confusion. Surface tradeoffs. Models pick an interpretation silently and run with it. This rule forces the interpretation step into the open before any code is written.

Closes

Silent assumption /
hidden ambiguity

Karpathy's complaint

The models make wrong assumptions on your behalf and just run along with them without checking.

"They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should." The simplest fix is the most disciplined one, refuse to start coding while the interpretation is still ambiguous.

The four sub-rules

a.State your assumptions explicitly. If uncertain, ask rather than guess.
b.If multiple interpretations exist, present them: don't pick silently.
c.If a simpler approach exists, say so. Push back when warranted.
d.If something is unclear, stop. Name what's confusing. Ask.

User request · "Make the search faster."

SPEC-TH-01 · from EXAMPLES.md

×Pick silently

Wrong

# Adds caching, indexes, AND async
# without asking which "faster"
# the user actually meant.

@lru_cache(maxsize=1000)
async def search(query: str) -> List[Result]:
    #... 200 lines of optimization,
    # spanning three different goals,
    # none of them verified to be
    # the thing the user wanted....

Why it's wrong: "Faster" has at least three meanings, response time, throughput, perceived speed. The model picks one (or all three) without checking, then writes code for the wrong problem.

✓Surface the fork

Right

"Make the search faster" could mean:

Faster response time (<100ms instead of 500ms) · DB indexes, query caching. ~2 hrs.
Handle more concurrent searches: async/await, connection pooling. ~4 hrs.
Faster perceived speed: partial results, progressive load. ~3 hrs.

Current search takes ~500ms for typical queries. Which aspect matters most for your use case?

Why it's right: The interpretations are named, costed, and laid out before any code is written. The user points at one. Zero rework.

No. 02 · SI02

Simplicity first.

Minimum code that solves the problem. Nothing speculative. Combat the tendency to add abstractions, configurability, and error paths for scenarios that aren't in the request.

Closes

Premature abstraction /
scope creep

Karpathy's complaint

They really like to overcomplicate code and APIs, bloat abstractions, implement a bloated construction over 1000 lines when 100 would do.

The "overcomplicated" code isn't obviously wrong: it follows real design patterns. The problem is timing: complexity is added before it's earned, which makes the code harder to read, harder to test, and harder to delete later.

The five sub-rules

a.No features beyond what was asked.
b.No abstractions for single-use code.
c.No "flexibility" or "configurability" that wasn't requested.
d.No error handling for impossible scenarios.
e.If you write 200 lines and it could be 50, rewrite it. The test: would a senior engineer call this overcomplicated?

User request · "Add a function to calculate discount."

SPEC-SI-01 · from EXAMPLES.md

×Strategy pattern, 30+ lines of setup

Wrong

from abc import ABC, abstractmethod
from dataclasses import dataclass

class DiscountStrategy(ABC):
    @abstractmethod
    def calculate(self, amount)...

class PercentageDiscount(DiscountStrategy):
    def __init__(self, percentage):
        self.percentage = percentage
    def calculate(self, amount):
        return amount * (self.percentage / 100)

class FixedDiscount(DiscountStrategy)...

@dataclass
class DiscountConfig:
    strategy: DiscountStrategy
    min_purchase: float = 0.0
    max_discount: float = float('inf')

class DiscountCalculator:
    # 30+ lines of setup to do one
    # multiplication. Nobody asked.

Why it's wrong: A pattern designed for multiple discount types applied to a request for one. The cost is real: readers now have to understand the abstraction to use the function.

✓One function, three lines

Right

def calculate_discount(amount: float,
                       percent: float) -> float:
    """Calculate discount. percent is 0 to 100."""
    return amount * (percent / 100)


# Usage:
discount = calculate_discount(100.0, 10.0)
# → $10 off. Done.

Why it's right: If a second discount type lands later, refactor then. The strategy pattern is two minutes of work once you actually need it, and zero minutes now, because you don't.

No. 03 · SU03

Surgical changes.

Touch only what you must. Clean up only your own mess. When editing existing code, every changed line should trace directly to the user's request, no drive-by reformatting, no "while I'm here" refactors.

Closes

Orthogonal damage /
style drift

Karpathy's complaint

They still sometimes change or remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task.

The model treats a bug-fix request as a license to "improve" the file: rewrites quote style, adds type hints, reformats whitespace, drops dead-looking code that wasn't actually dead. The diff becomes unreviewable; the bug fix is hidden in noise.

When editing existing code

a.Don't "improve" adjacent code, comments, or formatting.
b.Don't refactor things that aren't broken.
c.Match existing style, even if you'd do it differently.
d.If you notice unrelated dead code, mention it: don't delete it.
e.Remove orphans your changes created. Don't remove pre-existing dead code unless asked.

User request · "Fix the bug where empty emails crash the validator."

SPEC-SU-01 · from EXAMPLES.md

×Drive-by refactoring

Wrong

  def validate_user(user_data):
-     # Check email format
-     if not user_data.get('email'):
+     """Validate user data."""
+     email = user_data.get('email', '').strip()
+     if not email:
          raise ValueError("Email required")
-     if '@' not in user_data['email']:
+     if '@' not in email or '.' not in email.split('@')[1]:
          raise ValueError("Invalid email")

-     # Check username
-     if not user_data.get('username'):
+     username = user_data.get('username', '').strip()
+     if not username:
          raise ValueError("Username required")
+     if len(username) < 3:
+         raise ValueError("Username too short")
+     if not username.isalnum():
+         raise ValueError("Username must be alphanumeric")

      return True

Why it's wrong: "Improved" email validation past the bug fix. Added username validation that wasn't asked for. Rewrote comments. Added a docstring. The bug fix is buried.

✓Two lines

Right

  def validate_user(user_data):
      # Check email format
-     if not user_data.get('email'):
+     email = user_data.get('email', '')
+     if not email or not email.strip():
          raise ValueError("Email required")

      # Basic email validation
-     if '@' not in user_data['email']:
+     if '@' not in email:
          raise ValueError("Invalid email")

      # Check username
      if not user_data.get('username'):
          raise ValueError("Username required")

      return True

Why it's right: Only the lines that fix empty-email handling were touched. Username code, comments, and style are untouched. Every changed line traces directly to the request.

No. 04 · GO04

Goal-driven execution.

Define success criteria. Loop until verified. Karpathy's central insight: models are exceptionally good at looping toward a checkable target, but only if you give them one. Turn imperative tasks into verifiable goals.

Closes

Vague success /
no verification loop

Karpathy's complaint

Models are exceptionally good at looping until they meet specific goals … Don't tell it what to do, give it success criteria and watch it go.

Weak success criteria ("make it work," "fix the bug," "improve it") force the model to keep asking. Strong criteria: a failing test, a benchmark number, a passing assertion, let it iterate on its own. The principle isn't about testing; it's about specifying.

The transform

a."Add validation" → "Write tests for invalid inputs, then make them pass."
b."Fix the bug" → "Write a test that reproduces it, then make it pass."
c."Refactor X" → "Ensure tests pass before and after."
d.For multi-step tasks, state a brief plan: "[Step] → verify: [check]" on each line. Each step independently verifiable, independently deployable.

User request · "Add rate limiting to the API."

SPEC-GO-01 · from EXAMPLES.md

×All-at-once

Wrong

# Implements full rate limiting
# with Redis, multiple strategies,
# a configuration system, and
# monitoring: in one 300-line
# commit. No clear verification.

No way to tell which piece works. No way to ship incrementally. If something breaks, the bisect surface is the whole feature.

Why it's wrong: One opaque step. The "success criterion" is "the feature exists": too weak to loop on, too coarse to debug.

✓Step-by-step, each verifiable

Right

Plan for rate limiting:

Basic in-memory limit on one endpoint. Verify: test: 100 requests, first 10 succeed, rest get 429. Curl confirms.
Extract to middleware (applied to all endpoints). Verify: limits apply to /users and /posts; existing tests still pass.
Add Redis backend for multi-server. Verify: limit persists across restarts; two app instances share counter.
Per-endpoint config. Verify: /search allows 10/min, /users allows 100/min.

Each step independently verifiable and deployable. Start with step 1?

Why it's right: Every step has a concrete pass/fail check. The agent can loop on each one alone. If step 3 breaks, steps 1 to 2 already shipped.

No. 05§

The four anti-patterns: a reference card.

The repo's EXAMPLES.md closes with a one-row-per-principle summary. It's the densest version of the contract, useful to keep open while reviewing a PR an agent wrote, or as a checklist before merging.

Principle	Anti-pattern	Fix
Think Before Coding TH · No. 01	Silently assumes file format, fields, scope.	List assumptions explicitly. Ask for clarification on the ones that branch the work.
Simplicity First SI · No. 02	Strategy pattern for a single discount calculation.	One function until complexity is actually needed. Refactor on the second use, not the first.
Surgical Changes SU · No. 03	Reformats quotes, adds type hints, drops comments, while "fixing" the bug.	Only change lines that fix the reported issue. Match existing style verbatim.
Goal-Driven Execution GO · No. 04	"I'll review the code and make improvements."	"Write test for bug X → make it pass → verify no regressions." Loop on a checkable target.

§ The thing underneath all four

The "overcomplicated" examples aren't obviously wrong: they follow design patterns and best practices. The problem is timing: they add complexity before it's earned.

From the repo's Key Insight · forrestchang/andrej-karpathy-skills

No. 06 · How to know it's working

The diff tells you
the rules are live.

✓Fewer unnecessary changes in diffs. Only the requested lines appear. The reviewer doesn't have to mentally subtract style noise.
✓Fewer rewrites due to overcomplication. The first version is the version that ships.
✓Clarifying questions come before implementation. Not after the mistake landed.
✓Clean, minimal PRs. No drive-by refactoring or speculative "improvements" sneaking in alongside the actual change.

No. 07 · Install

Three ways to load it.

karpathy-guidelines

v1.0.0 · MIT

A · Recommended

Claude Code plugin

Inside Claude Code, add the marketplace, then install the plugin. Available across all projects.

/plugin marketplace add forrestchang/andrej-karpathy-skills
/plugin install andrej-karpathy-skills@karpathy-skills

B · Per-project

Drop in a `CLAUDE.md`

For a new project: single curl, single file:

curl -o CLAUDE.md https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md

C · Cursor

Project rule, already committed

The repo ships a Cursor rule at .cursor/rules/karpathy-guidelines.mdc. Open the project in Cursor and the rule activates automatically.

⚖

One tradeoff the repo states out loud.

"These guidelines bias toward caution over speed." For trivial work: a typo fix, an obvious one-liner: the full rigor is overkill, and the README says so. The goal is reducing costly mistakes on non-trivial work, not slowing down simple ones. Treat the four principles as the default, not the floor.

Four rules for an agent that writes your code.

Think before coding.

The models make wrong assumptions on your behalf and just run along with them without checking.

Simplicity first.

They really like to overcomplicate code and APIs, bloat abstractions, implement a bloated construction over 1000 lines when 100 would do.

Surgical changes.

They still sometimes change or remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task.

Goal-driven execution.

Models are exceptionally good at looping until they meet specific goals … Don't tell it what to do, give it success criteria and watch it go.

The four anti-patterns: a reference card.

The diff tells youthe rules are live.

Three ways to load it.

Claude Code plugin

Drop in a CLAUDE.md

Project rule, already committed

One tradeoff the repo states out loud.

Four rules for an
agent that
writes your code.

The diff tells you
the rules are live.

Drop in a `CLAUDE.md`