Technical Reference · KIMI-BPP / 2026

Best practices for prompts.

Moonshot's official prompt documentation, rendered with real hierarchy, copyable JSON, and a side-by-side comparison instead of a markdown table. The model can't read your mind, so the doc spells out, in nine numbered moves, what to put in the prompt.

Three families: Write clearly · Give references · Decompose
Nine moves: Numbered & copyable
Format: One file · open in browser

Three families,
nine moves.

The original page groups its advice under three headings. Two of them contain several numbered sub-techniques. We've kept the numbering and the verbatim guidance; only the layout is new.

I

The model can't read your mind.

The whole document hangs off this premise. If output is too long, ask for short. If it's too simple, ask for expert. If you don't like the format, show one. Every sub-technique below is a different way to leave less to guesswork.

Verbatim · Moonshot

The model can't read your mind. If the output is too long, you can ask the model to respond briefly. If the output is too simple, you can request expert-level writing. If you don't like the format of the output, show the model the format you'd like to see. The less the model has to guess about your needs, the more likely you are to get satisfactory results.

§ I.1 · Details

Including more details in your request yields more relevant responses.

The cleanest illustration in the entire doc is the diptych below: the same intent, written two ways. The "general request" is what most people type; the "better request" is what someone who's been burned a few times learns to type.

General request (vague)
Excel: How to add numbers in Excel?
Work summary: Work report summary

Better request (targeted)
Excel: How do I sum a row of numbers in an Excel table? I want to automatically sum each row in the entire table and place all the totals in the rightmost column named "Total."
Work summary: Summarize my work records from 2023 in a paragraph of no more than 500 words. List the highlights of each month in sequence and provide a summary of the entire year.
§ I.2 · Role

Ask the model to assume a role.

In the messages field, specify the role the model should adopt in its response. The system message in the example sets identity, language preference, safety posture, and a proper-noun preservation rule: four constraints in one block.

json · system role
{
  "messages": [
    {
      "role": "system",
      "content": "You are Kimi, an artificial intelligence assistant provided by Moonshot AI. You are more proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. At the same time, you will refuse to answer any questions involving terrorism, racism, or explicit violence. Moonshot AI is a proper noun and should not be translated into other languages."
    },
    {
      "role": "user",
      "content": "Hello, my name is Li Lei. What is 1+1?"
    }
  ]
}
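The role pattern can also be assembled in code. This is a minimal sketch, assuming the payload is built in Python before being sent; `build_role_messages` is a hypothetical helper, not part of any Moonshot SDK, and the system text is shortened from the example above.

```python
import json

def build_role_messages(role_prompt: str, user_text: str) -> list[dict]:
    """Return a messages list with the role description in the system slot."""
    return [
        {"role": "system", "content": role_prompt},
        {"role": "user", "content": user_text},
    ]

messages = build_role_messages(
    "You are Kimi, an artificial intelligence assistant provided by Moonshot AI.",
    "Hello, my name is Li Lei. What is 1+1?",
)

# Serialize to the same shape as the JSON example above.
payload = json.dumps({"messages": messages}, ensure_ascii=False, indent=2)
print(payload)
```

Keeping the role text in one argument makes it easy to swap roles without touching the user turn.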
§ I.3 · Delimiters

Use delimiters to distinguish parts of the input.

Triple quotes, XML tags, or section headings: anything that lets the model see "this chunk is different from that chunk." The doc gives two flavors of this: XML tags for distinct articles, and explicit field labels for an abstract+title pair.

json · xml-tagged articles
{
  "messages": [
    {
      "role": "system",
      "content": "You will receive two articles of the same category, separated by XML tags. First, summarize the arguments of each article, then point out which article presents a better argument and explain why."
    },
    {
      "role": "user",
      "content": "<article>Insert article here</article><article>Insert article here</article>"
    }
  ]
}
json · labeled fields
{
  "messages": [
    {
      "role": "system",
      "content": "You will receive an abstract and the title of a paper. The title should give readers a clear idea of the paper's topic and also be eye-catching. If the title you receive does not meet these standards, please suggest five alternative options."
    },
    {
      "role": "user",
      "content": "Abstract: Insert abstract here.\n\nTitle: Insert title here"
    }
  ]
}
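Both delimiter styles are easy to generate programmatically. The sketch below is illustrative only: `tag_articles` and `label_fields` are hypothetical helper names, and escaping the article text is an added assumption (it keeps a stray `</article>` inside the input from closing the tag early), not something the source prescribes.

```python
from xml.sax.saxutils import escape

def tag_articles(articles: list[str], tag: str = "article") -> str:
    """Wrap each article in its own XML-style tag so boundaries are unambiguous."""
    # escape() replaces &, < and > so the input cannot break the tag structure.
    return "".join(f"<{tag}>{escape(text)}</{tag}>" for text in articles)

def label_fields(fields: dict[str, str]) -> str:
    """Render fields as 'Label: value' blocks separated by blank lines."""
    return "\n\n".join(f"{label}: {value}" for label, value in fields.items())

user_content = tag_articles(["First article text", "Second article & more"])
labeled = label_fields(
    {"Abstract": "Insert abstract here.", "Title": "Insert title here"}
)
```

The first result goes into the user message for the two-article comparison; the second reproduces the abstract and title layout.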
The less the model has to guess
about your needs, the more likely
you are to get a satisfactory result.
Moonshot AI · platform.kimi.ai/docs
§ I.4 · Steps

Clearly define the steps needed.

For multi-stage tasks, write the sequence into the system prompt. The example below uses a two-step pipeline: summarize the quoted input, then translate the summary. Each step has an explicit output prefix, which doubles as a parser hook.

json · step sequence
{
  "messages": [
    {
      "role": "system",
      "content": "Respond to user input using the following steps.\nStep one: The user will provide text enclosed in triple quotes. Summarize this text into one sentence with the prefix \"Summary: \".\nStep two: Translate the summary from step one into English and add the prefix \"Translation: \"."
    },
    {
      "role": "user",
      "content": "\"\"\"Insert text here\"\"\""
    }
  ]
}
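Because each step announces itself with a fixed prefix, the reply can be split mechanically. A rough sketch, assuming the model follows the prefixes exactly; `parse_prefixed` is a hypothetical helper and `sample_reply` is made-up text standing in for a real response.

```python
def parse_prefixed(
    reply: str, prefixes: tuple[str, ...] = ("Summary:", "Translation:")
) -> dict[str, str]:
    """Split a reply into sections keyed by prefix (colon removed)."""
    sections: dict[str, str] = {}
    current = None
    for line in reply.splitlines():
        stripped = line.strip()
        for prefix in prefixes:
            if stripped.startswith(prefix):
                current = prefix.rstrip(":")
                sections[current] = stripped[len(prefix):].strip()
                break
        else:
            # Continuation line: append it to the section being read.
            if current is not None and stripped:
                sections[current] = (sections[current] + " " + stripped).strip()
    return sections

# Hypothetical reply, for illustration only.
sample_reply = (
    "Summary: Cet article explique le cycle de l'eau.\n"
    "Translation: This article explains the water cycle."
)
parsed = parse_prefixed(sample_reply)
```

A missing key in the result is a cheap signal that the model skipped a step and the request may need a retry.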
§ I.5 · Few-shot

Show examples of the desired output.

For style or format that's hard to describe explicitly (voice, register, idiomatic phrasing), examples beat description. This is the few-shot pattern, and the doc's example is deliberately minimal:

json · few-shot scaffold
{
  "messages": [
    {
      "role": "system",
      "content": "Respond in a consistent style"
    },
    {
      "role": "user",
      "content": "Insert text here"
    }
  ]
}
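The scaffold above leaves the examples themselves out. One common way to supply them, shown here as an assumption rather than the source's method, is to replay demonstration pairs as earlier user and assistant turns. `build_few_shot` and the example pairs are hypothetical.

```python
def build_few_shot(
    system_prompt: str, examples: list[tuple[str, str]], new_input: str
) -> list[dict]:
    """Place demonstration pairs before the real input as prior turns."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": new_input})
    return messages

# Hypothetical demonstrations of the target style.
examples = [
    ("Describe rain.", "Rain: the sky returning what the sea lent it."),
    ("Describe wind.", "Wind: air in a hurry to be somewhere else."),
]
few_shot = build_few_shot("Respond in a consistent style", examples, "Describe snow.")
```

Two or three pairs are usually enough to show a pattern; each extra pair costs context, so the count is a trade-off rather than a fixed rule.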
§ I.6 · Length

Specify the desired length.

Length can be requested in words, sentences, paragraphs, or bullets, but the doc is candid about a real limitation:

Caveat: direct from the source

"Instructing the model to generate a specific number of words is not highly precise. The model is better at generating output of a specific number of paragraphs or bullet points."

json · length-constrained summary
{
  "messages": [
    {
      "role": "user",
      "content": "Summarize the text within the triple quotes in two sentences, within 50 words. \"\"\"Insert text here\"\"\""
    }
  ]
}
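Since word counts are imprecise, it can help to verify the limit after the fact. The sketch below is an assumption about workflow, not something the source describes; `check_length` is a hypothetical helper, and its sentence splitter is naive (abbreviations such as "e.g." will be miscounted).

```python
import re

def check_length(text: str, max_sentences: int, max_words: int) -> dict:
    """Count sentences and words and report whether both limits are met."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = text.split()
    return {
        "sentences": len(sentences),
        "words": len(words),
        "ok": len(sentences) <= max_sentences and len(words) <= max_words,
    }

report = check_length(
    "Tides rise and fall twice a day. The moon's gravity drives them.",
    max_sentences=2,
    max_words=50,
)
```

When `ok` is false, one option is to re-ask with a structural limit (sentences, paragraphs, or bullets), which the caveat above says the model handles more reliably.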
II

Ground answers in retrieved context.

Where Section I is about how you ask, Section II is about what you bring. The whole technique fits in one paragraph and one prompt, but it is the core pattern behind retrieval-augmented generation (RAG) pipelines built on a Kimi-style API.

Verbatim · Moonshot

If you can provide a model with credible information related to the current query, you can guide the model to use the provided information to answer the question.

The example uses triple quotes as the article delimiter and gives the model an explicit fallback, "I can't find the answer," for when the source doesn't contain the information. That last instruction is the most important part of the prompt: it gives the model a sanctioned way out, which discourages it from inventing an answer the article doesn't support.

§ II.1 · Grounding

Guide the model to use the reference.

json · grounded answer
{
  "messages": [
    {
      "role": "system",
      "content": "Answer the question using the provided article (enclosed in triple quotes). If the answer is not found in the article, write \"I can't find the answer.\""
    },
    {
      "role": "user",
      "content": "<Insert article, each article enclosed in triple quotes>"
    }
  ]
}
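A sketch of how the grounded prompt might be assembled and the fallback detected. The `Question:` label, the helper names, and the exact-match fallback check are all assumptions added for illustration; the source shows only the article placeholder.

```python
FALLBACK = "I can't find the answer."

def build_grounded_messages(articles: list[str], question: str) -> list[dict]:
    """Quote each article in triple quotes and append the question."""
    system = (
        "Answer the question using the provided article (enclosed in triple "
        f'quotes). If the answer is not found in the article, write "{FALLBACK}"'
    )
    quoted = "\n\n".join(f'"""{article}"""' for article in articles)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"{quoted}\n\nQuestion: {question}"},
    ]

def is_fallback(reply: str) -> bool:
    """True when the reply is exactly the fallback sentence (quotes tolerated)."""
    return reply.strip().strip('"') == FALLBACK

grounded = build_grounded_messages(
    ["Tides are caused by the moon."], "What causes tides?"
)
```

Checking for the fallback lets the caller decide what happens next, for example retrieving different articles instead of showing the user a non-answer.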
III

When one prompt won't fit it all.

Three sub-techniques, all addressing the same constraint: a fixed context window. The first routes; the second compresses live conversation; the third recurses over documents too long to read in one pass.

§ III.1 · Routing

Categorize, then send the relevant instructions.

Instead of stuffing one giant system prompt with every conditional branch, classify the query type first, then load only the rules for that branch. The example assumes the user has been routed into the "technical support" lane and unpacks just the troubleshooting playbook.

Based on the classification of the customer query, a more specific set of instructions can be provided to the model. For example, assume the customer needs help with "troubleshooting":

json · category-conditional rules
{
  "messages": [
    {
      "role": "system",
      "content": "You will receive a customer service inquiry that requires technical support. You can assist the user in the following ways:\n\n- Ask them to check if *** is configured.\n  - If all *** are configured but the problem persists, ask for the device model they are using.\n- Now you need to tell them how to restart the device:\n  - If the device model is A, perform ***.\n  - If the device model is B, suggest they perform ***."
    }
  ]
}
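The classify-then-load flow can be sketched as a lookup table keyed by category. Everything here is illustrative: the keyword classifier is a stand-in for what would normally be a model call, and the `billing` and `general` instructions are hypothetical text, not taken from the source.

```python
INSTRUCTIONS = {
    "technical_support": (
        "You will receive a customer service inquiry that requires "
        "technical support."
    ),
    # Hypothetical branches, added only to show the routing.
    "billing": "You will receive a customer service inquiry about billing.",
    "general": "You will receive a general customer service inquiry.",
}

def classify(query: str) -> str:
    """Keyword stand-in for the classification step (a model call in practice)."""
    text = query.lower()
    if any(word in text for word in ("refund", "invoice", "charge")):
        return "billing"
    if any(word in text for word in ("restart", "error", "not working", "configure")):
        return "technical_support"
    return "general"

def route(query: str) -> list[dict]:
    """Build messages that carry only the instructions for the query's category."""
    category = classify(query)
    return [
        {"role": "system", "content": INSTRUCTIONS[category]},
        {"role": "user", "content": query},
    ]

routed = route("My router is not working after the update")
```

Each branch stays short, so a change to one playbook cannot accidentally alter the behavior of another.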
§ III.2 · Dialog summary

For long dialogs, summarize the older turns.

The model's context window is fixed; an ongoing conversation between user and assistant is not. The proposed fix is the same trick every long-running agent system eventually adopts:

Pattern

When input size crosses a threshold, fire a sub-query that summarizes the older turns, replace those turns with the summary as part of the system message, then continue. Alternatively, run the same summarization asynchronously in the background throughout the chat.
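The threshold-triggered variant of this pattern might look like the sketch below. It rests on several assumptions: character count stands in for a real token count, `summarize` is a hypothetical callable (in practice a model sub-query), and the history holds at most one system message.

```python
from typing import Callable

def compress_history(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    max_chars: int = 2000,
    keep_recent: int = 4,
) -> list[dict]:
    """Fold older turns into the system message once the history grows too long."""
    if sum(len(m["content"]) for m in messages) <= max_chars:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    dialog = [m for m in messages if m["role"] != "system"]
    split = max(len(dialog) - keep_recent, 0)
    older, recent = dialog[:split], dialog[split:]
    if not older:
        return messages  # Nothing old enough to compress.
    base = system[0]["content"] if system else ""
    merged = f"{base}\n\nSummary of earlier conversation: {summarize(older)}".strip()
    return [{"role": "system", "content": merged}] + recent

# Demo with a fake summarizer so the sketch runs without a model.
history = [{"role": "system", "content": "You are a helpful assistant."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i} " + "x" * 50}
    for i in range(10)
]
compressed = compress_history(
    history, lambda turns: f"{len(turns)} earlier turns omitted",
    max_chars=200, keep_recent=4,
)
```

Keeping the most recent turns verbatim preserves the immediate thread, while the summary carries the older context at a fraction of the size.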

§ III.3 · Recursive

For long documents, recursively build a summary of summaries.

This is the recursive pattern: to summarize a book, summarize each chapter; aggregate those into partial summaries; aggregate those into a summary of summaries; repeat until the whole work fits in one response. The key refinement, easy to miss on a first read, is what you do when later sections depend on earlier ones:

Verbatim · Moonshot

If understanding later parts requires reference to earlier chapters, then when summarizing a specific point in the book, include summaries of the chapters preceding that point.

That clause turns naïve map-reduce into something with a working memory: earlier summaries are forwarded as context to later summarization passes. Without it, recursive summarization can quietly drop the narrative thread.
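Both halves of the pattern, the forwarded context and the summary of summaries, can be sketched together. This is one possible arrangement under stated assumptions: `summarize(text, context)` is a hypothetical callable standing in for a model call, and `group_size` is an illustrative parameter, not something the source specifies.

```python
from typing import Callable

Summarizer = Callable[[str, str], str]  # (text, context) -> summary

def summarize_sequential(chapters: list[str], summarize: Summarizer) -> list[str]:
    """Summarize chapters in order, forwarding earlier summaries as context."""
    summaries: list[str] = []
    for chapter in chapters:
        context = "\n".join(summaries)  # Everything summarized so far.
        summaries.append(summarize(chapter, context))
    return summaries

def summarize_recursive(
    chapters: list[str], summarize: Summarizer, group_size: int = 3
) -> str:
    """Merge chapter summaries group by group until one summary remains."""
    if not chapters:
        raise ValueError("chapters must not be empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each pass shrinks")
    level = summarize_sequential(chapters, summarize)
    while len(level) > 1:
        groups = [
            "\n".join(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
        level = [summarize(group, "") for group in groups]
    return level[0]
```

In this sketch the forwarded context grows with every chapter; for a very long book it would itself need compressing, for instance by forwarding only a running summary.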

Bottom line · entire doc

Three families, nine numbered moves, and a single sentence underneath all of them.

Be specific. Show what you mean. Give the model the source. If a task is too big for one prompt, route it, compress it, or recurse over it. The doc is short by design: it's a reference, not an essay, and every technique is mechanical enough to drop into a system message tomorrow.