perf: trim grader prompts ~40% to reduce LLM latency

2026-05-21 15:52:33 +08:00
parent b15e821252
commit c53f26a07e
1 changed file with 40 additions and 67 deletions
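
Reviewer note: all three trimmed prompts still ask the model for the same four JSON fields (score, feedback, should_follow_up, follow_up_question), and follow_up_question is only required when should_follow_up is true. Because the shorter prompts drop the negative examples, it may be worth validating that shape on the caller side. The following is a minimal sketch under that assumption; `GraderResult` and `parseGraderResult` are hypothetical names used for illustration and are not part of this commit.

```typescript
// Hypothetical helper, not part of this commit: checks that a raw model
// reply matches the JSON shape the grader prompts ask for.
interface GraderResult {
  score: number;
  feedback: string;
  should_follow_up: boolean;
  follow_up_question: string | null;
}

function parseGraderResult(raw: string): GraderResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the reply was not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;

  const score = d.score;
  const feedback = d.feedback;
  const followUp = d.should_follow_up;
  const question = d.follow_up_question;

  // score must be a number in the 0-10 range named in the prompt
  if (typeof score !== "number" || score < 0 || score > 10) return null;
  if (typeof feedback !== "string") return null;
  if (typeof followUp !== "boolean") return null;

  if (followUp) {
    // a follow-up was requested, so a non-empty question is required
    if (typeof question !== "string" || question.trim() === "") return null;
    return { score, feedback, should_follow_up: true, follow_up_question: question };
  }
  // no follow-up: normalize the question to null (a choice made in this sketch)
  return { score, feedback, should_follow_up: false, follow_up_question: null };
}
```

A caller could treat a `null` return as a signal to retry or to fall back to a default grade; which of those is appropriate depends on code outside this diff.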
@@ -91,28 +91,23 @@ export const graderNode = async (
    };
  }

-  const systemPromptZh = `你是一位专业的考官。
-请根据以下问题和关键点对用户的回答进行评分。
+  const systemPromptZh = `你是一位考官。请评分并给出反馈。

-重要提示：
-1. **你必须使用以下语言提供反馈：中文 (Simplified Chinese)**。
-2. 如果这是多轮追问，用户消息中会包含多轮回答（"第N轮回答："标记），请综合所有轮次判断用户是否已覆盖关键点。已经在前几轮中回答过的内容，不要追问。
+规则：
+1. 只用中文。
+2. 多轮追问时，用户回答含所有轮次（第N轮回答：标记），综合判断已覆盖内容。

 问题：${currentQuestion.questionText}
-预期的关键点：${currentQuestion.keyPoints.join(', ')}
+关键点：${currentQuestion.keyPoints.join(', ')}

-评估标准：
-1. 准确性：他们是否正确覆盖了关键点？
-2. 完整性：他们是否遗漏了任何重要内容？
-3. 深度：解释是否充分？
+评分标准：准确性、完整性、深度。
+部分正确也给分（5-7分），完全不沾边才0-2分。

-**重要：评分请给部分分数。不完全正确不等于0分——回答方向对、意思接近但不够完整时请给5-7分。完全不沾边才给0-2分。**
-
-请提供：
-1. 0 到 10 的评分。
-2. 建设性的反馈。
-3. 如果回答不完整或不清晰，需要进一步解释，请将 'should_follow_up' 标志设为 true。
-4. follow_up_question：当 should_follow_up 为 true 时必须填写——针对用户尚未覆盖的关键点提问，不得提问已涵盖的内容。false 时填 null。
+返回JSON：
+- score: 0-10
+- feedback: 评语
+- should_follow_up: true/false
+- follow_up_question: 追问（仅true时需要，针对未覆盖的关键点，false时null）

 请以 JSON 格式返回响应：
 {"score":0到10,"feedback":"评语","should_follow_up":true或false,"follow_up_question":"追问或null"}
@@ -121,34 +116,25 @@ export const graderNode = async (
 {"score":6,"feedback":"提到了安全性和性能，未说明依赖关系。","should_follow_up":true,"follow_up_question":"你如何让AI在计划中明确任务依赖关系？"}

 示例（不需追问）：
-{"score":8,"feedback":"回答完整。","should_follow_up":false,"follow_up_question":null}
+{"score":8,"feedback":"回答完整。","should_follow_up":false,"follow_up_question":null}`;

-反面示例（禁止这样做）：
-{"should_follow_up":true,"follow_up_question":"除了这些还有什么？"}
-↑ 用户已列出安全性、性能具体内容，不应再泛泛追问"还有什么"。`;
+  const systemPromptJa = `あなたは試験官です。採点とフィードバックを提供してください。

-  const systemPromptJa = `あなたは専門的な試験官です。
-以下の質問とキーポイントに基づいて、ユーザーの回答を採点してください。
-
-重要事項：
-1. **フィードバックは必ず次の言語で提供してください：日本語**。
-2. 複数回の追質問の場合、ユーザーメッセージには複数ラウンドの回答が含まれます（「第N輪回答：」マーク）。すべてのラウンドを総合して、ユーザーがキーポイントを既にカバーしているか判断してください。前のラウンドで既に回答済みの内容は追質問しないでください。
+ルール：
+1. 日本語のみ使用。
+2. 複数ラウンドの回答は「第N輪回答：」でマークされ、全ラウンドを総合判断。

 質問：${currentQuestion.questionText}
-期待されるキーポイント：${currentQuestion.keyPoints.join(', ')}
+キーポイント：${currentQuestion.keyPoints.join(', ')}

-評価基準：
-1. 正確性：キーポイントを正確に網羅していますか？
-2. 網羅性：重要な内容が欠落していませんか？
-3. 深さ：説明は十分ですか？
+評価基準：正確性、網羅性、深さ。
+部分点可（5〜7点）、見当違いのみ0〜2点。

-**重要：点数は部分点をつけてください。完全に正解でなくても0点ではありません——方向性が合っていて、部分的に正しい場合は5〜7点を与えてください。全く見当違いの場合のみ0〜2点としてください。**
-
-以下を提供してください：
-1. 0 から 10 までのスコア。
-2. 建設的なフィードバック。
-3. 回答が不完全または不明確で、さらなる説明が必要な場合は、'should_follow_up' フラグを true に設定してください。
-4. follow_up_question：should_follow_up が true の場合必須——ユーザーがまだカバーしていないキーポイントに焦点を当て、既に回答済みの内容は質問しないこと。false の場合は null。
+JSON形式：
+- score: 0〜10
+- feedback: 評価
+- should_follow_up: true/false
+- follow_up_question: 追質問（true時のみ、未カバーのポイントに焦点、false時null）

 JSON 形式で回答してください：
 {"score":0から10,"feedback":"評価","should_follow_up":trueかfalse,"follow_up_question":"追質問かnull"}
@@ -157,34 +143,25 @@ JSON 形式で回答してください：
 {"score":6,"feedback":"安全性と性能に言及したが、依存関係が不明。","should_follow_up":true,"follow_up_question":"AIに計画内のタスク依存関係を明示させる方法は？"}

 例（不要）：
-{"score":8,"feedback":"回答は完全。","should_follow_up":false,"follow_up_question":null}
+{"score":8,"feedback":"回答は完全。","should_follow_up":false,"follow_up_question":null}`;

-悪い例：
-{"should_follow_up":true,"follow_up_question":"他に何かありますか？"}
-↑ ユーザーが既に具体的内容を挙げているのに「他に何か」と聞くのは不適切。`;
+  const systemPromptEn = `You are an examiner. Grade and give feedback.

-  const systemPromptEn = `You are an expert examiner. 
-Grade the user's answer based on the following question and key points.
+Rules:
+1. English only.
+2. Multi-round answers are tagged "第N轮回答：". Consider all rounds.

-IMPORTANT: 
-1. **You MUST provide the feedback in English.**
-2. In multi-round follow-ups, user messages contain multiple rounds of answers (marked "第N轮回答：" or "Round N:"). Consider ALL rounds when determining what the user has already covered. Do not ask follow-up questions about content already answered in previous rounds.
+Question: ${currentQuestion.questionText}
+Key points: ${currentQuestion.keyPoints.join(', ')}

-QUESTION: ${currentQuestion.questionText}
-EXPECTED KEY POINTS: ${currentQuestion.keyPoints.join(', ')}
+Criteria: accuracy, completeness, depth.
+Give partial credit (5-7 for partial), 0-2 only for off-target.

-Evaluate:
-1. Accuracy: Did they cover the key points correctly?
-2. Completeness: Did they miss anything important?
-3. Depth: Is the explanation sufficient?
-
-**Important: Give partial credit. Incomplete answers are not 0 — if the direction is right and partially correct, give 5-7. Only give 0-2 for completely off-target answers.**
-
-Provide:
-1. A score from 0 to 10.
-2. Constructive feedback.
-3. A boolean flag 'should_follow_up' if the answer is incomplete or unclear and needs further clarification.
-4. follow_up_question: Required when should_follow_up is true—target key points the user hasn't covered, do not ask about already-answered content. Set to null when false.
+Return JSON:
+- score: 0-10
+- feedback: text
+- should_follow_up: true/false
+- follow_up_question: question (only when true, target uncovered points, null when false)

 Format as JSON:
 {"score":0-10,"feedback":"...","should_follow_up":true|false,"follow_up_question":"question or null"}
@@ -193,11 +170,7 @@ Example (follow-up needed):
 {"score":6,"feedback":"Covered security and performance, missed dependencies.","should_follow_up":true,"follow_up_question":"How would you make the AI clarify task dependencies?"}

 Example (no follow-up):
-{"score":8,"feedback":"Complete answer.","should_follow_up":false,"follow_up_question":null}
-
-Bad example:
-{"should_follow_up":true,"follow_up_question":"Anything else?"}
-↑ User already provided details, vague "anything else" is unacceptable.`;
+{"score":8,"feedback":"Complete answer.","should_follow_up":false,"follow_up_question":null}`;

  let systemPrompt = isZh
    ? systemPromptZh