This is the single reference for prompt engineering with Claude's latest models: Opus 4.7, Opus 4.6, Sonnet 4.6, and Haiku 4.5. It covers foundational techniques, output control, tool use, thinking, and agentic systems. Jump to the section that matches your situation.
Prompting Claude Opus 4.7
Response length and verbosity
Opus 4.7 calibrates response length to how complex it judges the task to be, rather than defaulting to a fixed verbosity. This usually means shorter answers on simple lookups and much longer ones on open-ended analysis.
If your product depends on a certain style or verbosity, you may need to tune your prompts. To decrease verbosity, you might add:
prompt:
Provide concise, focused responses. Skip non-essential context, and keep examples minimal.
If you see specific kinds of verbosity (over-explaining, redundant summaries), add targeted instructions to prevent them. Positive examples showing how Claude should communicate with the appropriate level of concision tend to be more effective than negative instructions telling the model what not to do.
Calibrating effort and thinking depth
The effort parameter trades intelligence against token spend: lower effort gives up some capability in exchange for faster responses and lower cost. Start with the new xhigh level for coding and agentic use cases, and use a minimum of high for most intelligence-sensitive work.
In a meaningful change from 4.6, Opus 4.7 respects effort levels strictly, especially at the low end. At low and medium, the model scopes its work to what was asked rather than going above and beyond. This is good for latency and cost, but on moderately complex tasks at low there is some risk of under-thinking.
If you observe shallow reasoning on complex problems, raise effort to high or xhigh rather than prompting around it. If you need to keep effort low for latency, add targeted guidance:
prompt:
This task involves multi-step reasoning. Think carefully through the problem before responding.
The triggering behavior for adaptive thinking is steerable. If the model is thinking more often than you'd like, which can happen with large or complex system prompts, steer it:
prompt:
Thinking adds latency and should only be used when it will meaningfully improve answer quality: typically for problems that require multi-step reasoning. When in doubt, respond directly.
If you are running Opus 4.7 at max or xhigh, set a large max-output-token budget so the model has room to think and act across its subagents and tool calls. Anthropic recommends starting at 64k tokens and tuning from there.
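As a rough sketch of the guidance above, the helper below builds request parameters for a chosen effort level and raises the output budget to 64k at xhigh and max. The `thinking` and `output_config` shapes mirror the migration example elsewhere in this guide. The function name, the `DEFAULT_MAX_TOKENS` value, and the exact list of effort levels are illustrative assumptions, not part of the API.

```python
from typing import Any

# Assumption: 64k output tokens as the starting point for the two highest
# effort levels, per the recommendation above. The lower default is arbitrary.
HIGH_EFFORT_MAX_TOKENS = 64_000
DEFAULT_MAX_TOKENS = 16_000  # hypothetical value, tune for your workload

# Effort levels named in this guide.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request_params(model: str, prompt: str, effort: str = "high") -> dict[str, Any]:
    """Return keyword arguments for a Messages API call at the given effort."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    max_tokens = HIGH_EFFORT_MAX_TOKENS if effort in ("xhigh", "max") else DEFAULT_MAX_TOKENS
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive"},
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }
```

The returned dictionary can be unpacked into a client call, for example `client.messages.create(**build_request_params("claude-opus-4-7", "...", effort="xhigh"))`.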
Tool use triggering
Opus 4.7 tends to use tools less often than 4.6 and to rely more on reasoning. This produces better results in most cases. If you need more tool use, raising effort is a useful lever, particularly in knowledge work: high and xhigh show substantially more tool usage in agentic search and coding. You can also adjust your prompt to explicitly describe when and how the model should use its tools.
User-facing progress updates
Opus 4.7 provides more regular, higher-quality updates throughout long agentic traces. If you've added scaffolding to force interim status messages ("after every 3 tool calls, summarize progress"), try removing it. If the length or contents of the model's user-facing updates aren't well-calibrated to your case, describe what they should look like and provide examples.
More literal instruction following
Opus 4.7 interprets prompts more literally than 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make. The upside is precision and less thrash; it generally performs better for API use cases with carefully tuned prompts, structured extraction, and pipelines where you want predictable behavior. If you need an instruction applied broadly, state the scope explicitly: "Apply this formatting to every section, not just the first one."
Tone and writing style
As with any new model, prose style on long-form writing may shift. Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than 4.6's warmer style. If your product relies on a specific voice, re-evaluate style prompts against the new baseline. For a warmer, more conversational voice:
prompt:
Use a warm, collaborative tone. Acknowledge the user's framing before answering.
Controlling subagent spawning
Opus 4.7 tends to spawn fewer subagents by default. This behavior is steerable. Give explicit guidance about when subagents are desirable:
prompt: coding agent
Do not spawn a subagent for work you can complete directly in a single response (e.g. refactoring a function you can already see).
Spawn multiple subagents in the same turn when fanning out across items or reading multiple files.
Design and frontend defaults
Opus 4.7 has stronger design instincts than 4.6, with a consistent default house style: warm cream backgrounds (~#F4F1EA), serif display type (Georgia, Fraunces, Playfair), italic word-accents, and a terracotta accent. This reads well for editorial, hospitality, and portfolio briefs, but feels off for dashboards, dev tools, fintech, healthcare, or enterprise apps, and it shows up in slide decks as well as web UIs.
This default is persistent. Generic instructions ("don't use cream," "make it clean and minimal") tend to shift the model to a different fixed palette rather than producing variety. Two approaches work reliably:
1. Specify a concrete alternative
The model follows explicit specs precisely:
prompt: concrete brief
Design a desktop landing page for a supplement brand called AEFRM.
The visual direction should come from a cold monochrome atmosphere using pale silver-gray tones that gradually deepen into blue-gray and near-black, similar to a misted metallic surface.
The page should feel sharp and controlled, with a strong sense of structure and restraint.
Use this tonal system across the full page instead of introducing bright accent colors.
Use the uploaded image on the hero design in black and white.
The layout should be built with clear horizontal sections and a centered max-width container. Use 4px corner radius consistently across cards, buttons, inputs, and media frames. Margins should feel generous, with enough empty space around each section so the page breathes.
Typography should use a square, angular sans-serif with wider letter spacing than usual, especially in headings and navigation, so the text feels more engineered and less compressed. Headline text can be large and uppercase, while supporting copy remains short and sparse.
For the structure, start with a hero section containing a strong product statement, one short supporting paragraph, and a clean product placeholder or packshot frame. Below that, add a benefit grid with three or four blocks, then a formulation or ingredients section, and finally a cta.
Buttons should be flat and precise, with subtle hover changes using transition: all 160ms ease out where brightness and border contrast shift slightly rather than using dramatic motion.
Color palette should stay within this range:
#E9ECEC, #C9D2D4, #8C9A9E, #44545B, #11171B.
2. Have the model propose options before building
This breaks the default and gives users control. If you previously relied on temperature for design variety, use this approach instead; it produces meaningfully different directions across runs:
prompt: options first
Before building, propose 4 distinct visual directions tailored to this brief (each as: bg hex / accent hex / typeface, one-line rationale). Ask the user to pick one, then implement only that direction.
Opus 4.7 also requires less frontend-design prompting than earlier models to avoid the "AI slop" aesthetic. This minimal snippet works well alongside the variety techniques above:
system snippet
<frontend_aesthetics>
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white or dark backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character. Use unique fonts, cohesive colors and themes, and animations for effects and micro-interactions.
</frontend_aesthetics>
Interactive coding products
Token usage and behavior differ between autonomous, asynchronous agents with a single user turn and interactive, synchronous agents with multiple turns. Opus 4.7 tends to use more tokens in interactive settings, primarily because it reasons more after user turns. This improves long-horizon coherence and instruction following, but comes with higher token usage.
To maximize both performance and token efficiency in coding products, use xhigh or high effort, add autonomous features like an auto mode, and reduce the number of human interactions required from your users. When limiting required interactions, specify the task, intent, and relevant constraints up front in the first human turn. Ambiguous or underspecified prompts conveyed progressively over multiple turns tend to reduce token efficiency and sometimes performance.
Code review harnesses
Opus 4.7 is meaningfully better at finding bugs than prior models, with both higher recall and higher precision in evals, including 11pp better recall in one of Anthropic's hardest bug-finding evals based on real PRs. However, if your code-review harness was tuned for an earlier model, you may initially see lower recall. This is likely a harness effect, not a capability regression.
When a review prompt says "only report high-severity issues," "be conservative," or "don't nitpick," Opus 4.7 may follow that instruction more faithfully than earlier models. It may investigate the code just as thoroughly, identify the bugs, and then not report findings it judges below your stated bar. Precision typically rises but measured recall can fall, even though the underlying bug-finding ability has improved.
prompt: coverage stage
Report every issue you find, including ones you are uncertain about or consider low-severity. Do not filter for importance or confidence at this stage - a separate verification step will do that. Your goal here is coverage: it is better to surface a finding that later gets filtered out than to silently drop a real bug. For each finding, include your confidence level and an estimated severity so a downstream filter can rank them.
This prompt can be used without an actual second step, but moving confidence filtering out of the finding step often helps. If you do want a single-pass self-filter, be concrete about the bar rather than using qualitative terms, for example, "report any bugs that could cause incorrect behavior, a test failure, or a misleading result; only omit nits like pure style or naming preferences."
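To illustrate the separate verification step, here is a minimal sketch of a downstream filter that ranks and thresholds the findings produced by the coverage stage. The `Finding` fields, the four-level severity scale, and the default thresholds are assumptions made for the example; the guide does not prescribe a schema.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the guide does not prescribe one.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    summary: str
    severity: str      # one of SEVERITY_RANK's keys
    confidence: float  # 0.0 to 1.0, as self-reported by the model


def filter_findings(findings, min_severity="medium", min_confidence=0.5):
    """Keep findings at or above both thresholds, most severe first."""
    floor = SEVERITY_RANK[min_severity]
    kept = [
        f for f in findings
        if SEVERITY_RANK[f.severity] >= floor and f.confidence >= min_confidence
    ]
    # Sort by severity, then confidence, both descending.
    return sorted(kept, key=lambda f: (SEVERITY_RANK[f.severity], f.confidence), reverse=True)


findings = [
    Finding("Off-by-one in pagination loop", "high", 0.9),
    Finding("Variable name shadows builtin", "low", 0.95),
    Finding("Possible race on cache write", "critical", 0.4),
    Finding("Missing null check on user lookup", "medium", 0.7),
]
reported = filter_findings(findings)
```

Keeping the thresholds in the harness rather than in the prompt means the bar can be tuned without re-running the review.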
Computer use
Computer use works across resolutions, up to a new maximum of 2576px / 3.75MP. In testing, 1080p provides a good balance of performance and cost. For cost-sensitive workloads, 720p or 1366×768 are strong lower-cost options. Conduct your own testing; experimenting with effort settings can also help tune behavior.
General principles
Be clear and direct
Claude responds well to clear, explicit instructions. Being specific about your desired output enhances results. If you want "above and beyond" behavior, request it explicitly rather than relying on the model to infer it from vague prompts.
Think of Claude as a brilliant but new employee who lacks context on your norms and workflows. The more precisely you explain what you want, the better the result.
Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they'd be confused, Claude will be too.
Be specific about the desired output format and constraints. Provide instructions as sequential steps using numbered lists when order or completeness matters.
Example: an analytics dashboard
Less effective: Create an analytics dashboard
More effective: Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation.
Add context to improve performance
Providing context or motivation behind your instructions (explaining why a behavior is important) helps Claude better understand your goals and deliver more targeted responses.
Example: formatting preferences
Less effective: NEVER use ellipses
More effective: Your response will be read aloud by a text-to-speech engine, so never use ellipses since the text-to-speech engine will not know how to pronounce them.
Claude is smart enough to generalize from the explanation.
Use examples effectively
Examples are one of the most reliable ways to steer Claude's output format, tone, and structure. A few well-crafted examples (few-shot or multishot prompting) can dramatically improve accuracy and consistency.
When adding examples, make them:
- Relevant: mirror your actual use case closely.
- Diverse: cover edge cases and vary enough that Claude doesn't pick up unintended patterns.
- Structured: wrap examples in <example> tags (multiple in <examples> tags) so Claude can distinguish them from instructions.
Include 3 to 5 examples for best results. You can also ask Claude to evaluate your examples for relevance and diversity, or generate additional ones based on your initial set.
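The tagging convention above can be sketched as a small helper that wraps input/output pairs. The `<input>` and `<output>` subtags and the function name are illustrative choices, not a required format.

```python
def format_examples(examples: list[tuple[str, str]]) -> str:
    """Wrap (input, output) pairs in <example> tags inside one <examples> block."""
    blocks = []
    for example_input, example_output in examples:
        blocks.append(
            "<example>\n"
            f"<input>{example_input}</input>\n"    # subtag names are an assumption
            f"<output>{example_output}</output>\n"
            "</example>"
        )
    return "<examples>\n" + "\n".join(blocks) + "\n</examples>"
```

The resulting string can be placed in the prompt next to the instructions it illustrates.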
Structure prompts with XML tags
XML tags help Claude parse complex prompts unambiguously, especially when your prompt mixes instructions, context, examples, and variable inputs. Wrapping each type of content in its own tag (<instructions>, <context>, <input>) reduces misinterpretation.
Best practices: use consistent, descriptive tag names across your prompts; nest tags when content has a natural hierarchy (documents inside <documents>, each inside <document index="n">).
Give Claude a role
Setting a role in the system prompt focuses Claude's behavior and tone for your use case. Even a single sentence makes a difference:
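python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    # The system prompt sets the role.
    system="You are a helpful coding assistant specializing in Python.",
    messages=[
        {"role": "user", "content": "How do I sort a list of dictionaries by key?"}
    ],
)
# message.content is a list of content blocks; print the text of the first one.
print(message.content[0].text)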
Long-context prompting
When working with large documents or data-rich inputs (20k+ tokens), structure your prompt carefully:
- Put longform data at the top. Place long documents and inputs above your query, instructions, and examples. This can significantly improve performance across all models.
- Structure with XML. Wrap each document in <document> tags with <document_content> and <source> subtags.
- Ground responses in quotes. Ask Claude to quote relevant parts of the documents first, before carrying out its task. This helps cut through noise.
Queries placed at the end of long-context prompts can improve response quality by up to 30% in tests, especially with complex, multi-document inputs.
xml: multi-document
<documents>
<document index="1">
<source>annual_report_2023.pdf</source>
<document_content>
{{ANNUAL_REPORT}}
</document_content>
</document>
<document index="2">
<source>competitor_analysis_q2.xlsx</source>
<document_content>
{{COMPETITOR_ANALYSIS}}
</document_content>
</document>
</documents>
Analyze the annual report and competitor analysis. Identify strategic advantages and recommend Q3 focus areas.
xml: quote grounding
You are an AI physician's assistant. Your task is to help doctors diagnose possible patient illnesses.
<documents>
<document index="1">
<source>patient_symptoms.txt</source>
<document_content>{{PATIENT_SYMPTOMS}}</document_content>
</document>
<document index="2">
<source>patient_records.txt</source>
<document_content>{{PATIENT_RECORDS}}</document_content>
</document>
</documents>
Find quotes from the patient records that are relevant to diagnosing the reported symptoms. Place these in <quotes> tags. Then, based on these quotes, list all information that would help the doctor diagnose the patient's symptoms. Place your diagnostic information in <info> tags.
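The two templates above can be assembled and consumed programmatically. This sketch builds a documents-first prompt with the query at the end, then pulls tagged sections such as `<quotes>` or `<info>` out of a response. Both function names are hypothetical, and the regex assumes the tags are not nested.

```python
import re


def build_documents_prompt(documents: list[tuple[str, str]], query: str) -> str:
    """Place (source, content) documents first, indexed from 1, and the query last."""
    parts = ["<documents>"]
    for index, (source, content) in enumerate(documents, start=1):
        parts += [
            f'<document index="{index}">',
            f"<source>{source}</source>",
            f"<document_content>{content}</document_content>",
            "</document>",
        ]
    parts.append("</documents>")
    return "\n".join(parts) + "\n" + query


def extract_tag(response_text: str, tag: str) -> list[str]:
    """Return the stripped contents of every <tag>...</tag> pair, in order."""
    pattern = re.compile(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL)
    return [match.strip() for match in pattern.findall(response_text)]
```

For the quote-grounding prompt, `extract_tag(response_text, "quotes")` and `extract_tag(response_text, "info")` would separate the supporting quotes from the diagnostic notes.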
Model self-knowledge
If you want Claude to identify itself correctly in your application:
prompt: identity
The assistant is Claude, created by Anthropic. The current model is Claude Opus 4.7.
For LLM-powered apps that need to specify model strings:
prompt: model string
When an LLM is needed, please default to Claude Opus 4.7 unless the user requests otherwise. The exact model string for Claude Opus 4.7 is claude-opus-4-7.
Output and formatting
Communication style and verbosity
The latest models have a more concise, natural style than earlier ones:
- More direct and grounded: fact-based progress reports, not self-celebratory updates.
- More conversational: slightly more fluent and colloquial, less machine-like.
- Less verbose: may skip detailed summaries for efficiency unless prompted.
This means Claude may skip verbal summaries after tool calls, jumping straight to the next action. For more visibility into reasoning:
prompt:
After completing a task that involves tool use, provide a quick summary of the work you've done.
Control the format of responses
A few particularly effective ways to steer output formatting:
- Tell Claude what to do, not what not to do.
  Instead of: "Do not use markdown in your response"
  Try: "Your response should be composed of smoothly flowing prose paragraphs."
- Use XML format indicators.
  Try: "Write the prose sections of your response in <smoothly_flowing_prose_paragraphs> tags."
- Match prompt style to desired output style. If you're seeing steerability issues, mirror the format you want. Removing markdown from your prompt can reduce markdown in the output.
- Use detailed prompts for specific formatting preferences.
prompt: minimize markdown
<avoid_excessive_markdown_and_bullet_points>
When writing reports, documents, technical explanations, analyses, or any long-form content, write in clear, flowing prose using complete paragraphs and sentences. Use standard paragraph breaks for organization and reserve markdown primarily for `inline code`, code blocks (```...```), and simple headings (###, and ###). Avoid using **bold** and *italics*.
DO NOT use ordered lists (1....) or unordered lists (*) unless: a) you're presenting truly discrete items where a list format is the best option, or b) the user explicitly requests a list or ranking
Instead of listing items with bullets or numbers, incorporate them naturally into sentences. This guidance applies especially to technical writing. Using prose instead of excessive formatting will improve user satisfaction. NEVER output a series of overly short bullet points.
Your goal is readable, flowing text that guides the reader naturally through ideas rather than fragmenting information into isolated points.
</avoid_excessive_markdown_and_bullet_points>
LaTeX output
Opus 4.6 defaults to LaTeX for mathematical expressions, equations, and technical explanations. If you prefer plain text:
prompt:
Format your response in plain text only. Do not use LaTeX, MathJax, or any markup notation such as \( \), $, or \frac{}{}. Write all math expressions using standard text characters (e.g. "/" for division, "*" for multiplication, and "^" for exponents).
Document creation
The latest models excel at creating presentations, animations, and visual documents with strong creative flair and instruction following. They produce polished, usable output on the first try in most cases.
prompt:
Create a professional presentation on [topic]. Include thoughtful design elements, visual hierarchy, and engaging animations where appropriate.
Migrating away from prefilled responses
Starting with 4.6 models and Claude Mythos Preview, prefilled responses on the last assistant turn are no longer supported. On Mythos Preview, requests with prefilled assistant messages return a 400 error. Adding assistant messages elsewhere in the conversation is not affected.
Controlling output formatting
Prefills have been used to force JSON, YAML, classification, and other structured outputs. Migration: use the Structured Outputs feature to constrain responses to a schema, or simply ask the model. Newer models reliably match complex schemas when told to, especially with retries.
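The "ask and retry" route can be sketched as a validation loop. Here `call_model` stands in for whatever function sends the prompt and returns the response text, and `REQUIRED_KEYS` is a made-up schema. Both are assumptions for illustration, and this is not the Structured Outputs feature itself.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"label", "confidence"}  # hypothetical schema for illustration


def parse_structured(text: str) -> dict:
    """Parse JSON and check the required keys; raise ValueError otherwise."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        raise ValueError(f"expected an object with keys {sorted(REQUIRED_KEYS)}")
    return data


def request_with_retries(call_model: Callable[[str], str], prompt: str, attempts: int = 3) -> dict:
    """Call the model until its output validates, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_structured(call_model(prompt))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {attempts} attempts") from last_error
```

A stricter variant could feed the validation error back into the next prompt instead of simply retrying.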
Eliminating preambles
Prefills like Here is the requested summary:\n were used to skip introductory text. Migration: give a direct instruction such as "Respond directly without preamble. Do not start with phrases like 'Here is...', 'Based on...', etc." Alternatively, have the model output within XML tags, use structured outputs, or use tool calling. Strip stray preambles in post-processing.
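For the post-processing step, one possible approach is a conservative regex that removes a single introductory line ending in a colon. The list of opener phrases is an assumption and should be adjusted to the preambles you actually observe.

```python
import re

# Hypothetical opener list; extend it with the phrases you actually observe.
PREAMBLE_PATTERN = re.compile(
    r"^\s*(?:here is|here's|here are|based on|sure|certainly)\b[^\n]*:\s*\n+",
    re.IGNORECASE,
)


def strip_preamble(text: str) -> str:
    """Remove one leading line that looks like an introduction ending in a colon."""
    return PREAMBLE_PATTERN.sub("", text, count=1).lstrip()
```

Because the pattern only matches an opening line that ends in a colon, ordinary sentences that happen to start with "Based on" are left alone.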
Avoiding bad refusals
Prefills were used to steer around unnecessary refusals. Migration: Claude is much better at appropriate refusals now. Clear prompting in the user message without prefill should be sufficient.
Continuations
Prefills were used to continue partial completions. Migration: move the continuation to the user message and include the final text of the interrupted response: "Your previous response was interrupted and ended with `[previous_response]`. Continue from where you left off." If there's no UX penalty, retry instead.
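A minimal sketch of that continuation message follows. The helper name and the `tail_chars` cutoff are assumptions; the wording of the message is taken from the template above.

```python
def build_continuation_message(partial_response: str, tail_chars: int = 500) -> dict:
    """Return a user message asking the model to resume an interrupted response."""
    # Quote only the tail of the interrupted text to keep the message short.
    tail = partial_response[-tail_chars:]
    return {
        "role": "user",
        "content": (
            "Your previous response was interrupted and ended with "
            f"`{tail}`. Continue from where you left off."
        ),
    }
```

The returned message is appended as the final user turn, and the continuation is then joined to the partial text on the client side.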
Context hydration and role consistency
Prefills were used to periodically inject refreshed context. Migration: for very long conversations, inject what were previously prefilled-assistant reminders into the user turn. For complex agentic systems, hydrate via tools or during context compaction.
Tool use
Tool usage
The latest models are trained for precise instruction following and benefit from explicit direction. If you say "can you suggest some changes," Claude will sometimes suggest rather than implement, even when making changes is what you intended.
Less effective: "Can you suggest some changes to improve this function?"
More effective: "Change this function to improve its performance." or "Make these edits to the authentication flow."
To make Claude more proactive by default, add to your system prompt:
prompt: default to action
<default_to_action>
By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover any missing details instead of guessing. Try to infer the user's intent about whether a tool call (e.g. file edit or read) is intended or not, and act accordingly.
</default_to_action>
Conversely, if you want the model to be more hesitant:
prompt: conservative
<do_not_act_before_instructions>
Do not jump into implementation or change files unless clearly instructed to make changes. When the user's intent is ambiguous, default to providing information, doing research, and providing recommendations rather than taking action. Only proceed with edits, modifications, or implementations when the user explicitly requests them.
</do_not_act_before_instructions>
Opus 4.5 and 4.6 are more responsive to the system prompt than previous models. If your prompts were designed to reduce under-triggering, these models may now over-trigger. The fix is to dial back aggressive language: where you might have said "CRITICAL: You MUST use this tool when...", use normal prompting like "Use this tool when...".
Optimize parallel tool calling
The latest models excel at parallel tool execution:
- Run multiple speculative searches during research.
- Read several files at once to build context faster.
- Execute bash commands in parallel (which can even bottleneck system performance).
This behavior is easily steerable. To boost parallel tool calling to ~100% or adjust how aggressively the model parallelizes:
prompt: maximum parallelism
<use_parallel_tool_calls>
If you intend to call multiple tools and there are no dependencies between the tool calls, make all of the independent tool calls in parallel. Prioritize calling tools simultaneously whenever the actions can be done in parallel rather than sequentially. For example, when reading 3 files, run 3 tool calls in parallel to read all 3 files into context at the same time. Maximize use of parallel tool calls where possible to increase speed and efficiency. However, if some tool calls depend on previous calls to inform dependent values like the parameters, do NOT call these tools in parallel and instead call them sequentially. Never use placeholders or guess missing parameters in tool calls.
</use_parallel_tool_calls>
prompt: reduce parallelism
Execute operations sequentially with brief pauses between each step to ensure stability.
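On the harness side, parallel tool calls only pay off if the tools are actually executed concurrently. The sketch below runs a batch of independent calls on a thread pool and returns results in request order. The `TOOLS` registry, its two toy tools, and the `id`/`name`/`input` dictionary shape are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: dict[str, Callable[..., Any]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "word_count": lambda text: len(text.split()),
}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute independent tool calls concurrently, preserving request order."""
    def run_one(call: dict) -> dict:
        result = TOOLS[call["name"]](**call["input"])
        return {"id": call["id"], "result": result}

    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        # Executor.map yields results in the order the calls were submitted.
        return list(pool.map(run_one, tool_calls))
```

Capping `max_workers` at a fixed number is one way to avoid the system bottleneck mentioned above when many bash commands arrive at once.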
Thinking and reasoning
Overthinking and excessive thoroughness
Opus 4.6 does significantly more upfront exploration than earlier models, especially at higher effort settings. This often optimizes final results, but the model may gather extensive context or pursue multiple research threads without being prompted. If your prompts previously encouraged thoroughness, you should tune that:
- Replace blanket defaults with targeted instructions. Instead of "Default to using [tool]," say "Use [tool] when it would enhance your understanding of the problem."
- Remove over-prompting. Tools that under-triggered in previous models are likely to trigger appropriately now. "If in doubt, use [tool]" will cause over-triggering.
- Use effort as a fallback. If Claude continues to be overly aggressive, lower effort.
In some cases, Opus 4.6 may think extensively, inflating thinking tokens and slowing responses. To constrain reasoning:
prompt: commit to an approach
When you're deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning. If you're weighing two approaches, pick one and see it through. You can always course-correct later if the chosen approach fails.
Leverage thinking & interleaved thinking
Opus 4.6 and Sonnet 4.6 use adaptive thinking (thinking: {type: "adaptive"}), in which Claude dynamically decides when and how much to think, calibrated by the effort parameter and query complexity. On easier queries that don't require thinking, the model responds directly. Anthropic's evals show adaptive thinking reliably driving better performance than extended thinking.
Use adaptive thinking for workloads that require agentic behavior: multi-step tool use, complex coding, long-horizon loops.
prompt: guide thinking
After receiving tool results, carefully reflect on their quality and determine optimal next steps before proceeding. Use your thinking to plan and iterate based on this new information, and then take the best next action.
To steer adaptive thinking down:
prompt: less thinking
Extended thinking adds latency and should only be used when it will meaningfully improve answer quality - typically for problems that require multi-step reasoning. When in doubt, respond directly.
If you're migrating from extended thinking with budget_tokens:
# Before: extended thinking with a fixed token budget
client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=64000,
    thinking={"type": "enabled", "budget_tokens": 32000},
    messages=[{"role": "user", "content": "..."}],
)

# After: adaptive thinking, calibrated by effort
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "..."}],
)
If you're not using extended thinking, no changes are required. Thinking is off by default when you omit the thinking parameter.
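If many call sites need the same change, the migration can be expressed as a small transformation over request parameters. This is a sketch: the function name is hypothetical, the default effort of "high" is an assumption (the guide gives no mapping from budget_tokens to effort), and the model string is left for the caller to update.

```python
import copy
from typing import Any


def migrate_to_adaptive(params: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a copy of request params with budgeted thinking replaced by adaptive.

    Requests that omit `thinking` are returned unchanged.
    """
    migrated = copy.deepcopy(params)
    thinking = migrated.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}
        # Merge rather than overwrite, in case other output options are set.
        migrated["output_config"] = {**migrated.get("output_config", {}), "effort": effort}
    return migrated
```

Working on a deep copy keeps the original parameters intact, which makes it easy to compare old and new requests side by side during an evaluation.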
- Prefer general instructions over prescriptive steps. A prompt like "think thoroughly" often produces better reasoning than a hand-written step-by-step plan. Claude's reasoning frequently exceeds what a human would prescribe.
- Multishot examples work with thinking. Use <thinking> tags inside few-shot examples to show Claude the reasoning pattern; it will generalize that style.
- Manual CoT as a fallback. When thinking is off, you can still encourage step-by-step reasoning. Use <thinking> and <answer> tags to separate reasoning from output.
- Ask Claude to self-check. Append "Before you finish, verify your answer against [test criteria]." This catches errors reliably, especially for coding and math.
When extended thinking is disabled, Opus 4.5 is particularly sensitive to the word "think" and its variants. Consider alternatives like "consider," "evaluate," or "reason through" in those cases.
Agentic systems
Long-horizon reasoning and state tracking
The latest models excel at long-horizon reasoning with exceptional state tracking. Claude maintains orientation across extended sessions by focusing on incremental progress, making steady advances on a few things at a time rather than attempting everything at once. This capability especially emerges across multiple context windows, where Claude can save state and continue from a fresh window.
Context awareness and multi-window workflows
4.6 and 4.5 models feature context awareness, tracking the remaining context window throughout a conversation. This enables better task execution and context management.
If you're using Claude in an agent harness that compacts context or saves it to external files (like Claude Code), tell Claude:
prompt: context awareness
Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely from where you left off. Therefore, do not stop tasks early due to token budget concerns. As you approach your token budget limit, save your current progress and state to memory before the context window refreshes. Always be as persistent and autonomous as possible and complete tasks fully, even if the end of your budget is approaching. Never artificially stop any task early regardless of the context remaining.
Multi-context-window workflows
- Use a different prompt for the very first context window. Use it to set up a framework (write tests, create setup scripts); use future windows to iterate on a todo-list.
- Have the model write tests in a structured format. Ask Claude to create tests before starting and track them in tests.json. Remind Claude: "It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality."
- Set up quality-of-life tools. Encourage Claude to create setup scripts (e.g. init.sh) to start servers, run tests, and lint. This prevents repeated work in fresh windows.
- Starting fresh vs. compacting. When a window is cleared, consider starting brand new rather than compacting. Claude is excellent at discovering state from the local filesystem. Be prescriptive about how to start: "Call pwd; you can only read and write files in this directory. Review progress.txt, tests.json, and the git logs."
- Provide verification tools. As autonomy grows, Claude needs to verify correctness without human feedback; Playwright MCP or computer-use capabilities help.
- Encourage complete usage of context.
prompt: full context usage
This is a very long task, so it may be beneficial to plan out your work clearly. It's encouraged to spend your entire output context working on the task - just make sure you don't run out of context with significant uncommitted work. Continue working systematically until you have completed this task.
State management best practices
- Use structured formats for state data (e.g. JSON for test results and task status) so the schema is clear.
- Use unstructured text for progress notes (freeform tracking).
- Use git as a log and checkpoint system; the latest models perform especially well with it.
- Emphasize incremental progress.
Example: state tracking
tests.json
{
"tests": [
{ "id": 1, "name": "authentication_flow", "status": "passing" },
{ "id": 2, "name": "user_management", "status": "failing" },
{ "id": 3, "name": "api_endpoints", "status": "not_started" }
],
"total": 200,
"passing": 150,
"failing": 25,
"not_started": 25
}
progress.txt
Session 3 progress:
- Fixed authentication token validation
- Updated user model to handle edge cases
- Next: investigate user_management test failures (test #2)
- Note: Do not remove tests as this could lead to missing functionality
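A harness or setup script can keep the summary counts in a file like tests.json consistent with the test list. The sketch below updates one test's status and recomputes the totals. It assumes the file lists every test (the example above shows only three entries alongside larger totals), and the function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

VALID_STATUSES = ("passing", "failing", "not_started")


def update_test_status(state_path: Path, test_id: int, status: str) -> dict:
    """Set one test's status, recompute the summary counts, and save the file."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    state = json.loads(state_path.read_text())
    for test in state["tests"]:
        if test["id"] == test_id:
            test["status"] = status
            break
    else:
        # Tests are only ever updated here, never added or removed.
        raise KeyError(f"no test with id {test_id}")
    counts = Counter(test["status"] for test in state["tests"])
    state["total"] = len(state["tests"])
    for name in VALID_STATUSES:
        state[name] = counts.get(name, 0)
    state_path.write_text(json.dumps(state, indent=2))
    return state
```

Exposing only an update operation, with no way to delete entries, is one way to back up the "do not remove tests" reminder in code.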
Balancing autonomy and safety
Without guidance, Opus 4.6 may take actions that are hard to reverse or affect shared systems, deleting files, force-pushing, posting externally. To require confirmation for risky actions:
prompt: confirm risky actions
Consider the reversibility and potential impact of your actions. You are encouraged to take local, reversible actions like editing files or running tests, but for actions that are hard to reverse, affect shared systems, or could be destructive, ask the user before proceeding.
Examples of actions that warrant confirmation:
- Destructive operations: deleting files or branches, dropping database tables, rm -rf
- Hard to reverse operations: git push --force, git reset --hard, amending published commits
- Operations visible to others: pushing code, commenting on PRs/issues, sending messages, modifying shared infrastructure
When encountering obstacles, do not use destructive actions as a shortcut. For example, don't bypass safety checks (e.g. --no-verify) or discard unfamiliar files that may be in-progress work.
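Prompt guidance can be paired with a check in the harness that executes Claude's shell commands. The sketch below is one possible shape for such a gate, under stated assumptions: the rule set `RISKY_GIT` and the function `needs_confirmation` are hypothetical, they loosely mirror the examples listed above, and they do not handle every form a command can take (for instance, git global options placed before the subcommand).

```python
import shlex

# Hypothetical rule set: git subcommand -> flags that make it risky.
# An empty set means the subcommand always warrants confirmation.
RISKY_GIT = {
    "push": set(),                        # visible to others
    "reset": {"--hard"},                  # hard to reverse
    "branch": {"-D", "-d", "--delete"},   # deletes branches
    "commit": {"--amend", "--no-verify"}, # rewrites history or skips checks
}


def needs_confirmation(command: str) -> bool:
    """Return True if the command should be confirmed by the user first."""
    if "drop table" in command.lower():
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable input (e.g. unbalanced quotes): ask rather than guess.
        return True
    if not tokens:
        return False
    if tokens[0] == "rm":
        return True
    if tokens[0] == "git" and len(tokens) > 1:
        subcommand, args = tokens[1], set(tokens[2:])
        if subcommand in RISKY_GIT:
            risky_flags = RISKY_GIT[subcommand]
            return not risky_flags or bool(risky_flags & args)
    return False
```

A gate like this is a backstop, not a replacement for the prompt: it catches only the patterns it was written for.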
Research and information gathering
The latest models have exceptional agentic-search capabilities. For optimal results:
- Provide clear success criteria. Define what makes an answer successful.
- Encourage source verification. Ask Claude to cross-check across sources.
- Use a structured approach for complex research.
prompt: structured research
Search for this information in a structured way. As you gather data, develop several competing hypotheses. Track your confidence levels in your progress notes to improve calibration. Regularly self-critique your approach and plan. Update a hypothesis tree or research notes file to persist information and provide transparency. Break down this complex research task systematically.
Subagent orchestration
The latest models have significantly improved native subagent orchestration, recognizing when tasks benefit from delegation and doing so proactively without explicit instruction.
- Have well-defined subagent tools with clear descriptions.
- Let Claude orchestrate naturally.
- Watch for overuse. Opus 4.6 has a strong predilection for subagents and may spawn them where a direct call (e.g. a single grep) is faster.
prompt: when to delegate
Use subagents when tasks can run in parallel, require isolated context, or involve independent workstreams that don't need to share state. For simple tasks, sequential operations, single-file edits, or tasks where you need to maintain context across steps, work directly rather than delegating.
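As an illustration of a well-defined subagent tool with a clear description, here is a sketch of a tool definition in the name / description / input_schema shape used for tools in the Messages API. The tool name `delegate_to_subagent` and its fields are hypothetical; how the tool is actually executed depends on your harness.

```python
# Hypothetical subagent tool definition; the name and fields are illustrative.
SUBAGENT_TOOL = {
    "name": "delegate_to_subagent",
    "description": (
        "Run an independent task in a fresh context and return a short summary. "
        "Use for work that can run in parallel, needs isolated context, or does "
        "not share state with the current conversation. Do not use for "
        "single-file edits, a single search such as one grep, or steps that "
        "depend on context from earlier in this conversation."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "task": {
                "type": "string",
                "description": "Self-contained instructions for the subagent.",
            },
            "expected_output": {
                "type": "string",
                "description": "What the returned summary should contain.",
            },
        },
        "required": ["task"],
    },
}
```

Stating in the description when not to use the tool is one way to counter overuse without removing the tool.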
Chain complex prompts
With adaptive thinking and subagent orchestration, Claude handles most multi-step reasoning internally. Explicit prompt chaining is still useful when you need to inspect intermediate outputs or enforce a specific pipeline structure. The most common pattern is self-correction: generate a draft → have Claude review it against criteria → have Claude refine. Each step is a separate API call so you can log, evaluate, or branch.
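The self-correction pattern can be sketched as three separate calls whose intermediate outputs are all returned for logging or evaluation. This is a minimal sketch under stated assumptions: `self_correct` and `call_model` are hypothetical names, and `call_model` stands in for whatever function sends one prompt to the API and returns the text of the reply.

```python
from typing import Callable


def self_correct(task: str, criteria: str, call_model: Callable[[str], str]) -> dict:
    """Run a draft -> review -> refine chain, one model call per step."""
    # Step 1: generate a draft.
    draft = call_model(f"Complete the following task.\n\n<task>\n{task}\n</task>")

    # Step 2: review the draft against explicit criteria.
    review = call_model(
        "Review the draft against the criteria. List concrete problems.\n\n"
        f"<criteria>\n{criteria}\n</criteria>\n\n<draft>\n{draft}\n</draft>"
    )

    # Step 3: refine the draft using the review.
    final = call_model(
        "Revise the draft to address the review. Return only the revised text.\n\n"
        f"<draft>\n{draft}\n</draft>\n\n<review>\n{review}\n</review>"
    )

    # Every intermediate output is kept so each step can be inspected.
    return {"draft": draft, "review": review, "final": final}
```

Because the model call is passed in, the chain can be exercised with a stub in tests and pointed at a real client in production.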
Reduce file creation in agentic coding
The latest models may create new files for testing and iteration, using files (especially Python scripts) as a temporary scratchpad. This can improve outcomes, but to minimize net new files:
prompt: clean up after
If you create any temporary new files, scripts, or helper files for iteration, clean up these files by removing them at the end of the task.
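To check whether the cleanup instruction was followed, a harness could compare the working directory before and after a task. The following is a sketch only; `snapshot` and `leftover_files` are hypothetical helpers, and `expected` is the set of files the task was supposed to create.

```python
from pathlib import Path
from typing import Iterable


def snapshot(root: Path) -> set[Path]:
    """Record every file under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def leftover_files(
    before: set[Path], after: set[Path], expected: Iterable[Path] = ()
) -> list[Path]:
    """Return files that appeared during the task but were not expected."""
    return sorted(after - before - set(expected))
```

A usage pattern would be to call `snapshot` before handing control to the agent, call it again when the task ends, and flag anything `leftover_files` returns.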
Overeagerness
Opus 4.5 and 4.6 have a tendency to overengineer: creating extra files, adding unnecessary abstractions, building in flexibility that wasn't requested. To minimize:
prompt: minimal solution
Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused:
- Scope: Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability.
- Documentation: Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident.
- Defensive coding: Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs).
- Abstractions: Don't create helpers, utilities, or abstractions for one-time operations. Don't design for hypothetical future requirements. The right amount of complexity is the minimum needed for the current task.
Avoid focusing on passing tests and hard-coding
Claude can sometimes focus too heavily on making tests pass at the expense of general solutions, or use workarounds like helper scripts for complex refactoring instead of standard tools. To ensure robust, generalizable solutions:
prompt: general solution
Please write a high-quality, general-purpose solution using the standard tools available. Do not create helper scripts or workarounds to accomplish the task more efficiently. Implement a solution that works correctly for all valid inputs, not just the test cases. Do not hard-code values or create solutions that only work for specific test inputs. Instead, implement the actual logic that solves the problem generally.
Focus on understanding the problem requirements and implementing the correct algorithm. Tests are there to verify correctness, not to define the solution. Provide a principled implementation that follows best practices and software design principles.
If the task is unreasonable or infeasible, or if any of the tests are incorrect, please inform me rather than working around them. The solution should be robust, maintainable, and extendable.
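To make the distinction concrete, the sketch below contrasts a test-fitted implementation with a general one. The median task and both function names are invented for illustration; they are not taken from any real test suite.

```python
def median_hardcoded(values):
    """Anti-pattern: special-cases the exact inputs the tests happen to use."""
    if values == [1, 3, 2]:
        return 2
    if values == [4, 1, 3, 2]:
        return 2.5
    return None  # silently wrong for every other input


def median(values):
    """General solution: works for any non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Both versions pass the two original test inputs, which is why passing tests alone does not show that a solution is correct.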
Minimizing hallucinations in agentic coding
The latest models are less prone to hallucinations. To reduce them further:
prompt: investigate first
<investigate_before_answering>
Never speculate about code you have not opened. If the user references a specific file, you MUST read the file before answering. Make sure to investigate and read relevant files BEFORE answering questions about the codebase. Never make any claims about code before investigating unless you are certain of the correct answer - give grounded and hallucination-free answers.
</investigate_before_answering>
Capability-specific tips
Improved vision capabilities
Opus 4.5 and 4.6 have improved vision compared to previous Claude models. They perform better on image processing and data extraction, particularly with multiple images in context. These improvements carry over to computer use, where the models interpret screenshots and UI elements more reliably. You can also analyze videos by breaking them into frames.
One technique that has proven effective: give Claude a crop tool or skill. Testing shows consistent uplift on image evals when Claude can "zoom" into relevant regions.
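One way a crop tool might work is to let Claude name a region in normalized coordinates and have the tool convert it to pixels. The helper below is a sketch of that conversion only; `crop_box` is a hypothetical name, and the actual cropping is left to an image library. The returned tuple is in (left, upper, right, lower) order, which is the box format accepted by Pillow's `Image.crop`.

```python
import math


def crop_box(
    width: int, height: int, x0: float, y0: float, x1: float, y1: float
) -> tuple[int, int, int, int]:
    """Convert a normalized region (0.0 to 1.0) into a pixel box."""

    def clamp(value: float) -> float:
        return min(max(value, 0.0), 1.0)

    # Clamp to the image and tolerate corners given in either order.
    left, right = sorted((clamp(x0), clamp(x1)))
    upper, lower = sorted((clamp(y0), clamp(y1)))

    # Round outward so the requested region is never cut short.
    box = (
        int(left * width),
        int(upper * height),
        math.ceil(right * width),
        math.ceil(lower * height),
    )
    if box[2] <= box[0] or box[3] <= box[1]:
        raise ValueError("empty crop region")
    return box
```

Normalized coordinates keep the tool independent of image resolution, so the same call works on a screenshot and on a downscaled copy.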
Frontend design
Opus 4.5 and 4.6 excel at complex, real-world web applications with strong frontend design. Without guidance, models can default to generic patterns: the "AI slop" aesthetic. To create distinctive frontends that surprise and delight, use this system-prompt snippet:
system prompt: frontend aesthetics
<frontend_aesthetics>
You tend to converge toward generic, "on distribution" outputs. In frontend design, this creates what users call the "AI slop" aesthetic. Avoid this: make creative, distinctive frontends that surprise and delight.
Focus on:
- Typography: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics.
- Color & Theme: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes. Draw from IDE themes and cultural aesthetics for inspiration.
- Motion: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions.
- Backgrounds: Create atmosphere and depth rather than defaulting to solid colors. Layer CSS gradients, use geometric patterns, or add contextual effects that match the overall aesthetic.
Avoid generic AI-generated aesthetics:
- Overused font families (Inter, Roboto, Arial, system fonts)
- Clichéd color schemes (particularly purple gradients on white backgrounds)
- Predictable layouts and component patterns
- Cookie-cutter design that lacks context-specific character
Interpret creatively and make unexpected choices that feel genuinely designed for the context. Vary between light and dark themes, different fonts, different aesthetics. You still tend to converge on common choices (Space Grotesk, for example) across generations. Avoid this: it is critical that you think outside the box!
</frontend_aesthetics>
Migration considerations
When migrating to 4.6 models from earlier generations:
- Be specific about desired behavior. Describe exactly what you want in the output.
- Frame instructions with modifiers. Adding modifiers that encourage Claude to increase quality and detail helps. Instead of "Create an analytics dashboard," use "Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation."
- Request specific features explicitly. Animations and interactive elements should be requested when desired.
- Update thinking configuration. 4.6 models use adaptive thinking (thinking: {type: "adaptive"}) instead of manual budget_tokens. Use effort to control depth.
- Migrate away from prefilled responses. Prefills on the last assistant turn are deprecated starting with 4.6.
- Tune anti-laziness prompting. If your prompts previously encouraged thoroughness or aggressive tool use, dial that back. 4.6 models are significantly more proactive and may over-trigger on language tuned for earlier models.
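Two of the items above (thinking configuration and prefills) are mechanical enough to sketch as a transformation on request arguments. This is an illustrative helper, not an official migration tool: `migrate_request` is a hypothetical name, the effort level is supplied by the caller rather than derived from the old budget, and dropping a prefill changes behavior, so whatever the prefill enforced would need to move into the instructions.

```python
def migrate_request(kwargs: dict, effort: str) -> dict:
    """Return a copy of request kwargs adjusted for 4.6-style settings."""
    migrated = dict(kwargs)

    # Manual budget_tokens -> adaptive thinking.
    thinking = migrated.get("thinking") or {}
    if thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}

    # Control depth with effort, keeping any value already set.
    output_config = dict(migrated.get("output_config", {}))
    output_config.setdefault("effort", effort)
    migrated["output_config"] = output_config

    # Drop a prefill on the last assistant turn.
    messages = list(migrated.get("messages", []))
    if messages and messages[-1].get("role") == "assistant":
        messages.pop()
    migrated["messages"] = messages

    return migrated
```

The input dictionary is left untouched, so old and new configurations can be compared side by side during evaluation.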
Migrating from Sonnet 4.5 to Sonnet 4.6
Sonnet 4.6 defaults to an effort level of high, unlike
Sonnet 4.5 which had no effort parameter. Consider
adjusting effort as you migrate. If not explicitly set, you may experience higher
latency at the default.
Use medium effort for most applications and low effort for high-volume or latency-sensitive workloads. Set a large max-output-token budget (64k recommended) at medium or high effort to give the model room to think and act.
When to use Opus 4.7 instead: for the hardest, longest-horizon problems (large-scale code migrations, deep research, extended autonomous work), Opus 4.7 remains the right choice. Sonnet 4.6 is optimized for fast turnaround and cost efficiency.
If you're not using extended thinking
You can continue without it on Sonnet 4.6. Set effort
explicitly. At low effort with thinking disabled, you can
expect similar or better performance relative to Sonnet 4.5 with no extended
thinking.
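python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Thinking disabled, with effort set explicitly
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "disabled"},
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "..."}],
)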
If you're using extended thinking
Extended thinking with budget_tokens is still functional
on Sonnet 4.6 but deprecated. Migrate to adaptive thinking with
effort.
Adaptive thinking is particularly well suited to:
- Autonomous multi-step agents: coding agents, data pipelines, bug finding. Start at high; scale down to medium if latency or tokens are a concern.
- Computer-use agents: Sonnet 4.6 achieved best-in-class accuracy on computer use evaluations using adaptive mode.
- Bimodal workloads (a mix of easy and hard tasks): adaptive thinking skips thinking on simple queries and reasons deeply on complex ones.
python: adaptive
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "..."}],
)
Keeping budget_tokens during migration
A budget around 16k tokens provides headroom for harder problems without runaway usage. This configuration is deprecated and will be removed in a future release.
For coding use cases (agentic coding, tool-heavy workflows, code generation), start with medium effort:
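python: coding
client.messages.create(
    model="claude-sonnet-4-6",
    # Extended thinking expects budget_tokens to be less than max_tokens,
    # so max_tokens is raised here to leave room for the visible response.
    max_tokens=32768,
    thinking={"type": "enabled", "budget_tokens": 16384},
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "..."}],
)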
For chat and non-coding use cases (chat, content generation, search, classification), start with low effort:
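python: chat
client.messages.create(
    model="claude-sonnet-4-6",
    # Extended thinking expects budget_tokens to be less than max_tokens,
    # so max_tokens covers the 16k budget plus 8k for the visible response.
    max_tokens=24576,
    thinking={"type": "enabled", "budget_tokens": 16384},
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "..."}],
)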
An effort level that's wrong for the workload is the most common reason a model feels worse than it is.