Moonshot's official prompt documentation, rendered with real hierarchy, copyable JSON, and a side-by-side comparison instead of a markdown table. The model can't read your mind, so the doc spells out, in nine numbered moves, what to put in the system prompt instead.
The original page groups its advice under three headings. Inside two of them, several numbered sub-techniques. We've kept the numbering and the verbatim guidance, only the layout is new.
Six sub-techniques. The bulk of the document: covers details, roles, delimiters, steps, examples, and length.
Ground the model's answers in retrieved context. The practical foundation of every RAG pipeline.
Routing, conversation summarization, recursive document summarization, when one prompt can't fit it all.
The whole document hangs off this premise. If output is too long, ask for short. If it's too simple, ask for expert. If you don't like the format, show one. Every sub-technique below is a different way to leave less to guesswork.
The model can't read your mind. If the output is too long, you can ask the model to respond briefly. If the output is too simple, you can request expert-level writing. If you don't like the format of the output, show the model the format you'd like to see. The less the model has to guess about your needs, the more likely you are to get satisfactory results.
The cleanest illustration in the entire doc is the diptych below: the same intent, written two ways. The "general request" is what most people type; the "better request" is what someone who's been burned a few times learns to type.
Add a specified role for the model to use in its response in the messages field. The system message in the example sets identity, language preference, safety posture, and a proper-noun preservation rule, four constraints in one block.
{ "messages": [ { "role": "system", "content": "You are Kimi, an artificial intelligence assistant provided by Moonshot AI. You are more proficient in Chinese and English conversations. You provide users with safe, helpful, and accurate answers. At the same time, you will refuse to answer any questions involving terrorism, racism, or explicit violence. Moonshot AI is a proper noun and should not be translated into other languages." }, { "role": "user", "content": "Hello, my name is Li Lei. What is 1+1?" } ] }
Triple quotes, XML tags, or section headings: anything that lets the model see "this chunk is different from that chunk." The doc gives two flavors of this: XML tags for distinct articles, and explicit field labels for an abstract+title pair.
{ "messages": [ { "role": "system", "content": "You will receive two articles of the same category, separated by XML tags. First, summarize the arguments of each article, then point out which article presents a better argument and explain why." }, { "role": "user", "content": "<article>Insert article here</article><article>Insert article here</article>" } ] }
{ "messages": [ { "role": "system", "content": "You will receive an abstract and the title of a paper. The title should give readers a clear idea of the paper's topic and also be eye-catching. If the title you receive does not meet these standards, please suggest five alternative options." }, { "role": "user", "content": "Abstract: Insert abstract here.\n\nTitle: Insert title here" } ] }
For multi-stage tasks, write the sequence into the system prompt. The example below uses a two-step pipeline: summarize the quoted input, then translate the summary. Each step has an explicit output prefix, which doubles as a parser hook.
{ "messages": [ { "role": "system", "content": "Respond to user input using the following steps.\nStep one: The user will provide text enclosed in triple quotes. Summarize this text into one sentence with the prefix \"Summary: \".\nStep two: Translate the summary from step one into English and add the prefix \"Translation: \"." }, { "role": "user", "content": "\"\"\"Insert text here\"\"\"" } ] }
For style or format that's hard to describe explicitly: voice, register, idiomatic phrasing, examples beat description. This is the few-shot pattern, and the doc's example is deliberately minimal:
{ "messages": [ { "role": "system", "content": "Respond in a consistent style" }, { "role": "user", "content": "Insert text here" } ] }
Length can be requested in words, sentences, paragraphs, or bullets, but the doc is candid about a real limitation:
"Instructing the model to generate a specific number of words is not highly precise. The model is better at generating output of a specific number of paragraphs or bullet points."
{ "messages": [ { "role": "user", "content": "Summarize the text within the triple quotes in two sentences, within 50 words. \"\"\"Insert text here\"\"\"" } ] }
Where Section I is about how you ask, Section II is about what you bring. The whole technique fits in one paragraph and one prompt, but it is the practical foundation of every RAG pipeline ever built on top of a Kimi-style API.
If you can provide a model with credible information related to the current query, you can guide the model to use the provided information to answer the question.
The example uses triple quotes as the article delimiter and gives the model an explicit fallback, "I can't find the answer": for when the source doesn't contain the information. That last instruction is the most important word in the prompt: it's what suppresses hallucinated citations.
{ "messages": [ { "role": "system", "content": "Answer the question using the provided article (enclosed in triple quotes). If the answer is not found in the article, write \"I can't find the answer.\"" }, { "role": "user", "content": "<Insert article, each article enclosed in triple quotes>" } ] }
Three sub-techniques, all addressing the same constraint: a fixed context window. The first routes; the second compresses live conversation; the third recurses over documents too long to read in one pass.
When you'd otherwise stuff one giant system prompt with every conditional branch, instead, classify the query type first, then load only the rules for that branch. The example pretends the user has been routed into the "technical support" lane, and unpacks just the troubleshooting playbook.
# Based on the classification of the customer query, a set of more specific # instructions can be provided to the model. For example, assume the customer # needs help with "troubleshooting." { "messages": [ { "role": "system", "content": "You will receive a customer service inquiry that requires technical support. You can assist the user in the following ways:\n\n- Ask them to check if *** is configured.\n If all *** are configured but the problem persists, ask for the device model they are using.\n- Now you need to tell them how to restart the device:\n = If the device model is A, perform ***.\n - If the device model is B, suggest they perform ***." } ] }
The model's context window is fixed; an ongoing conversation between user and assistant is not. The proposed fix is the same trick every long-running agent system eventually adopts:
When input size crosses a threshold, fire a sub-query that summarizes the older turns. Replace those turns with the summary as part of the system message, then continue. The same compression can run asynchronously across the whole chat.
The recursive pattern. To summarize a book, summarize each chapter; aggregate those into partial summaries; aggregate those into a summary of summaries; repeat until the whole work fits in one response. The key refinement, easy to miss on a first read, is what you do when later sections depend on earlier ones:
If understanding later parts requires reference to earlier chapters, then when summarizing a specific point in the book, include summaries of the chapters preceding that point.
That clause turns naïve map-reduce into something with a working memory, earlier summaries are forwarded as context to later summarization passes. Without it, recursive summarization quietly drops the narrative thread.
Be specific. Show what you mean. Give the model the source. If a task is too big for one prompt, route it, compress it, or recurse over it. The doc is short by design: it's a reference, not an essay, and every technique is mechanical enough to drop into a system message tomorrow.