fix: MC options display, question selection, timeout handling, and grading prompts
@@ -56,7 +56,12 @@ const scoreSummary = Object.entries(scores)
1. **你必须使用以下语言生成报告:中文 (Simplified Chinese)**。
2. **严禁夹杂日文**。即使对话记录中包含日文,报告内容也必须全中文。
3. 报告的第一行必须严格遵守此格式:"LEVEL: [Novice/Proficient/Advanced/Expert]"。
4. 必须保持客观。如果用户没有提供有效的回答或得分为 0,你必须将其识别为 'Novice',并明确指出他们尚未证明其掌握程度。
5. **等级判定必须遵循以下分数阈值**:
- 总体平均分 >= 9 → Expert(专家)
- 总体平均分 >= 7 → Advanced(高级)
- 已通过(有有效回答且得分 > 0)→ Proficient(熟练)
- 未通过(无有效回答或得分为 0)→ Novice(新手)
即使得分很高,也要确保等级与上述阈值匹配。不要随意提高或降低等级。
6. 如果用户明确表示"不知道"或未提供实质内容,不要虚构或幻想优点(如"潜力"或"好奇心")。
7. 专注于对话记录中已证明的事实。

@@ -87,8 +92,13 @@ ${messages
2. **中国語を混ぜないでください**。会話ログに中国語が含まれていても、レポートの内容はすべて日本語で記述してください。
3. レポートの最初の行は、必ず次の形式に従ってください:"LEVEL: [Novice/Proficient/Advanced/Expert]"。
4. 客観的であること。ユーザーが有効な回答を提供しなかった場合、またはスコアが 0 の場合、'Novice' と判定し、習熟度が証明されていないことを明示してください。
5. ユーザーが「わからない」と言ったり、内容を提供しなかった場合に、長所(「ポテンシャル」や「好奇心」など)を捏造しないでください。
6. 会話ログで証明された事実に集中してください。
5. **レベル判定は以下のスコアしきい値に従うこと**:
- 平均スコア >= 9 → Expert
- 平均スコア >= 7 → Advanced
- 合格(有効な回答がありスコア > 0)→ Proficient
- 不合格(有効な回答なし、またはスコア 0)→ Novice
6. ユーザーが「わからない」と言ったり、内容を提供しなかった場合に、長所(「ポテンシャル」や「好奇心」など)を捏造しないでください。
7. 会話ログで証明された事実に集中してください。

各ディメンションスコア:
${dimensionAvg}
@@ -115,8 +125,13 @@ IMPORTANT:
1. **You MUST generate the report strictly in English.**
2. START the report with exactly this format: "LEVEL: [Novice/Proficient/Advanced/Expert]" on the first line.
3. Be OBJECTIVE. If the user provided no valid answers or scores are 0, you MUST identify them as 'Novice' and explicitly state they have NOT demonstrated mastery.
4. DO NOT invent or hallucinate strengths (like 'potential' or 'curiosity') if the user explicitly said "I don't know" or provided no content.
5. Focus on what was PROVEN in the conversation logs.
4. **Level assignment MUST follow these score thresholds**:
- Average score >= 9 → Expert
- Average score >= 7 → Advanced
- Passed (has valid answers with score > 0) → Proficient
- Not passed (no valid answers or score is 0) → Novice
5. DO NOT invent or hallucinate strengths (like 'potential' or 'curiosity') if the user explicitly said "I don't know" or provided no content.
6. Focus on what was PROVEN in the conversation logs.

DIMENSION SCORES:
${dimensionAvg}

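The report prompts in all three languages state the same deterministic thresholds, so the level could also be computed in code and then injected into the prompt or compared with the model's `LEVEL:` line. A minimal TypeScript sketch, assuming the overall average is on the 0-10 scale the grader uses and that "has a valid answer" is tracked as a boolean; `assignLevel` and `levelLine` are hypothetical helpers, not functions from this commit:

```typescript
// Report levels accepted on the first line ("LEVEL: ...").
type Level = "Novice" | "Proficient" | "Advanced" | "Expert";

// Hypothetical helper: apply the prompt's thresholds in code.
// averageScore is assumed to be the overall average on a 0-10 scale.
function assignLevel(averageScore: number, hasValidAnswer: boolean): Level {
  // No valid answer, a non-numeric average, or a zero average is always Novice.
  if (!hasValidAnswer || !Number.isFinite(averageScore) || averageScore <= 0) return "Novice";
  if (averageScore >= 9) return "Expert";
  if (averageScore >= 7) return "Advanced";
  // Passed (valid answer, score > 0) but below the Advanced threshold.
  return "Proficient";
}

// Hypothetical helper: the exact first line the prompts require.
function levelLine(averageScore: number, hasValidAnswer: boolean): string {
  return `LEVEL: ${assignLevel(averageScore, hasValidAnswer)}`;
}

console.log(levelLine(7.5, true)); // LEVEL: Advanced
console.log(levelLine(9.5, false)); // LEVEL: Novice
```

Comparing the model's first line with `levelLine(...)` is one way to catch a report whose level drifts from the thresholds; the prompt text alone cannot enforce that.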
@@ -90,34 +90,83 @@ export const questionGeneratorNode = async (
    .map((q, i) => `Q${i + 1}: ${q.questionText}`)
    .join('\n');

  const systemPromptZh = `你是一个信息提取工具。严格按以下步骤操作。
  const systemPromptZh = `你是一个出题工具。严格按以下规则生成题目。

### 第一步:提取知识点
阅读下方 Human 消息中的【知识库内容】,逐条列出其中包含的所有可考核知识点。
每条以"知识点N:"开头,引用原文语句。如果不足,诚实报告。
每条以"知识点N:"开头,引用原文语句。

### 第二步:从知识点生成考题
仅用第一步提取的知识点生成 1 道题。必须引用知识点编号。
### 第二步:基于知识点出题
仅用第一步提取的知识点生成题目。必须引用知识点编号。
如果知识点数量不足(少于3个),输出空数组 [] 并停止。

### 题型分配规则
每生成 3 道题:
- 第1、4、7...道:选择题(MULTIPLE_CHOICE),占 1/3
- 第2、3、5、6...道:对话简答题(SHORT_ANSWER),占 2/3
严格按照这个顺序循环,不要自行调整比例。

### 出题范围限制
出题内容必须严格限制在知识库范围内。每道题必须有知识点编号引用。
以下情况绝对禁止:
- 使用 LLM 自身知识编题
- 引用知识库中不存在的概念
- 题目内容超出知识库覆盖的主题

### 选择题出题标准
- 必须是场景驱动:描述一个真实工作场景,让用户判断最佳做法
- 四个选项(A/B/C/D),只有一个正确,另外三个要有迷惑性
- 难度:不是考概念背诵,是考实际应用判断
- 正确答案必须附带解析,说明为什么对、错在哪
- 出题依据必须引用第一步提取的知识点编号

### 对话简答题出题标准
- 开放式场景问题,不预设标准答案
- 考察用户的理解深度和表达能力
- 适合多轮追问展开讨论
- 出题依据必须引用第一步提取的知识点编号

### 绝对禁止:
- 禁止使用知识库内容中不存在的任何概念、术语、数据
- 禁止使用你自己的知识
${existingQuestionsText ? `- 禁止与已出题目重复:${existingQuestionsText}` : ''}
- 禁止出纯概念题(如"提示词六要素是什么")
- 禁止出需要记忆具体数据的题
- 禁止使用知识库之外的知识
- 禁止生成与知识库主题无关的题目
${existingQuestionsText ? `- 禁止与已出题目概念重复:${existingQuestionsText}` : ''}

### 输出(纯 JSON 数组):
[
{
"knowledge_points": ["知识点引用"],
"question_text": "基于知识点的题目",
"key_points": ["评分要点"],
"difficulty": "STANDARD|ADVANCED|SPECIALIST",
"dimension": "prompt|llm|ide|devPattern|workCapability",
"basis": "知识库原文"
}
]`;
  // dimension values: prompt=prompt engineering, llm=LLM principles, ide=IDE collaboration, devPattern=development paradigm, workCapability=work capability
### 输出格式(严格遵循)
选择题完整格式:
{
"question_type": "MULTIPLE_CHOICE",
"question_text": "场景描述+问题,不超过120字",
"options": ["A) 选项1", "B) 选项2", "C) 选项3", "D) 选项4"],
"correct_answer": "A",
"judgment": "解析:为什么对、为什么错,不超过200字",
"key_points": ["考核要点", "2-3个"],
"difficulty": "STANDARD",
"dimension": "prompt",
"basis": "知识点N:参考来源"
}

  const systemPromptJa = `あなたは情報抽出ツールです。以下の手順に厳密に従ってください。
对话简答题完整格式:
{
"question_type": "SHORT_ANSWER",
"question_text": "开放式场景问题,不超过120字",
"key_points": ["期望的回答方向", "2-3个"],
"difficulty": "STANDARD",
"dimension": "prompt",
"basis": "知识点N:参考来源"
}

### 输出要求
- 只输出 JSON 数组,不要其他文字
- question_type 必须为 MULTIPLE_CHOICE 或 SHORT_ANSWER
- dimension 只能取以下值之一:prompt、llm、ide、devPattern、workCapability
- 每次生成 1 道题,以 JSON 数组格式输出
- 选择题必须包含全部9个字段:question_type、question_text、options、correct_answer、judgment、key_points、difficulty、dimension、basis
- 对话简答题必须包含全部6个字段:question_type、question_text、key_points、difficulty、dimension、basis
- 每个字段的值不能为空`;

  const systemPromptJa = `あなたは問題作成ツールです。以下の手順に厳密に従ってください。

### 第一歩:知識ポイントの抽出
Human メッセージ内の【ナレッジベース内容】を読み、含まれるすべての評価可能な知識ポイントを箇条書きで抽出。
@@ -126,48 +175,76 @@ Human メッセージ内の【ナレッジベース内容】を読み、含ま
### 第二歩:知識ポイントから問題を作成
第一歩で抽出した知識ポイントのみを使用して 1 問作成。知識ポイント番号を引用すること。

### 問題タイプの割合
3問中、約1問を選択問題、2問を対話式記述問題にしてください。全体で約30%/70%の割合。

### 出題方向
「AI協作スキル」に関する問題:
- プロンプトの書き方(役割、タスク、背景、制約)
- 複数ラウンドの対話テクニック
- AIに先に質問させる方法
- セッション管理(いつ継続、いつ新規)
- よくある間違いと自己チェック
- セキュリティ意識(機密データの取扱い)

### 選択問題の基準
- シナリオ駆動:実務シーンを想定
- 4択(A/B/C/D)、正解は1つ
- 正解には必ず解説を含める

### 対話式記述問題の基準
- オープンクエスチョン、正解なし
- 理解の深さと表現力を評価

### 絶対禁止:
- ナレッジベースに存在しない概念、用語、データの使用
- 自身の知識の使用
${existingQuestionsText ? `- 作成済み問題との重複禁止:${existingQuestionsText}` : ''}
- 暗記問題の禁止
- 知識ベースにない概念の使用禁止
${existingQuestionsText ? `- 既出問題との重複禁止:${existingQuestionsText}` : ''}

### 出力(純粋な JSON 配列):
[
{
"knowledge_points": ["知識ポイント参照"],
"question_text": "知識ポイントに基づく問題",
"key_points": ["採点ポイント"],
"difficulty": "STANDARD|ADVANCED|SPECIALIST",
"dimension": "prompt|llm|ide|devPattern|workCapability",
"basis": "ナレッジベースの原文"
}
]`;
### 出力
JSON 配列のみ出力:
選択問題:{"question_type":"MULTIPLE_CHOICE","question_text":"...","options":["A)...","B)...","C)...","D)..."],"correct_answer":"A","judgment":"...","key_points":["..."],"difficulty":"STANDARD","dimension":"prompt|llm|ide|devPattern|workCapability","basis":"..."}
記述問題:{"question_type":"SHORT_ANSWER","question_text":"...","key_points":["..."],"difficulty":"STANDARD","dimension":"prompt|llm|ide|devPattern|workCapability","basis":"..."}`;

  const systemPromptEn = `You are an information extraction tool. Follow these steps exactly.
  const systemPromptEn = `You are a question generation tool. Follow these steps exactly.

### Step 1: Extract Knowledge Points
Read the knowledge base content in the Human message. List ALL assessable knowledge points found.
Read the knowledge base content in the Human message. List ALL assessable knowledge points.
Each point must start with "KP N:" and quote the source text. If insufficient, honestly report.

### Step 2: Generate Question from Points
Use ONLY the knowledge points from Step 1 to generate 1 question. Must reference KP numbers.

### Absolutely Forbidden:
- Using any concept, term, or data NOT present in the knowledge base content
- Using your own knowledge
${existingQuestionsText ? `- Repeating previous questions: ${existingQuestionsText}` : ''}
### Type Mix
Out of every 3 questions, approximately 1 should be MULTIPLE_CHOICE and 2 should be SHORT_ANSWER (dialogue-style). Roughly 30%/70% split.

### Output (pure JSON array only):
[
{
"knowledge_points": ["KP reference"],
"question_text": "Question based on the knowledge points",
"key_points": ["scoring points"],
"difficulty": "STANDARD|ADVANCED|SPECIALIST",
"dimension": "prompt|llm|ide|devPattern|workCapability",
"basis": "Source text from knowledge base"
}
]`;
### Topics
AI collaboration skills:
- Writing good prompts (role, task, context, constraints)
- Multi-turn iteration techniques
- Letting AI ask clarifying questions first
- Session management (continue vs new window)
- Common mistakes and self-review
- Security awareness (handling sensitive data)

### MC Standards
- Scenario-driven: describe a real work scenario
- 4 options (A/B/C/D), one correct
- Must include judgment explaining why correct/incorrect

### SA Standards
- Open-ended, no predefined answer
- Tests understanding depth and expression

### Forbidden:
- Pure concept recall questions
- Questions requiring memorization of specific data
${existingQuestionsText ? `- Repeating previous question concepts: ${existingQuestionsText}` : ''}

### Output
JSON array only. One question at a time.
MC: {"question_type":"MULTIPLE_CHOICE","question_text":"...","options":["A)...","B)...","C)...","D)..."],"correct_answer":"A","judgment":"...","key_points":["..."],"difficulty":"STANDARD","dimension":"prompt|llm|ide|devPattern|workCapability","basis":"..."}
SA: {"question_type":"SHORT_ANSWER","question_text":"...","key_points":["..."],"difficulty":"STANDARD","dimension":"prompt|llm|ide|devPattern|workCapability","basis":"..."}`;

  // dimension values: prompt=prompt engineering, llm=LLM principles, ide=IDE collaboration, devPattern=development paradigm, workCapability=work capability

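The Chinese generator prompt defines a fixed cycle (positions 1, 4, 7, ... are multiple choice, the rest short answer), while the Japanese and English prompts only ask for roughly one in three. Since each call generates a single question, the model cannot know its position unless it is told. A sketch of computing the expected type in code, assuming the next question's 1-based position is `existingQuestions.length + 1`; `questionTypeFor` is a hypothetical helper:

```typescript
type QuestionType = "MULTIPLE_CHOICE" | "SHORT_ANSWER";

// Hypothetical helper: expected type of the question at a 1-based position,
// following the cycle MC, SA, SA, MC, SA, SA, ...
function questionTypeFor(position: number): QuestionType {
  if (!Number.isInteger(position) || position < 1) {
    throw new RangeError(`position must be a positive integer, got ${position}`);
  }
  // Positions 1, 4, 7, ... leave a remainder of 1 when divided by 3.
  return position % 3 === 1 ? "MULTIPLE_CHOICE" : "SHORT_ANSWER";
}

// Example: with 3 questions already generated, the 4th should be multiple choice.
const existingCount = 3;
console.log(questionTypeFor(existingCount + 1)); // MULTIPLE_CHOICE
```

The result could be stated explicitly in the Human message or compared with the returned `question_type` during validation, so the 1:2 mix no longer depends on the model keeping count.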
@@ -201,6 +278,42 @@ ${existingQuestionsText ? `- Repeating previous questions: ${existingQuestionsTe
    newQuestions = [newQuestions];
  }

  // === Code-level validation: make sure the LLM output conforms to the spec ===
  const VALID_DIMENSIONS = ['prompt', 'llm', 'ide', 'devPattern', 'workCapability'];
  const VALID_TYPES = ['MULTIPLE_CHOICE', 'SHORT_ANSWER'];

  const validatedQuestions = newQuestions.filter((q: any) => {
    const qType = q.question_type;
    const dim = q.dimension?.toString().toLowerCase().trim();
    const errors: string[] = [];

    if (!VALID_TYPES.includes(qType)) errors.push(`invalid question_type: ${qType}`);
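    // `dim` is lowercased above, so compare case-insensitively; a plain includes() would reject 'devPattern' and 'workCapability'
    if (!dim || !VALID_DIMENSIONS.some((d) => d.toLowerCase() === dim)) errors.push(`invalid dimension: ${q.dimension}`);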
    if (!q.question_text || q.question_text.length < 5) errors.push('question_text missing or too short');

    if (qType === 'MULTIPLE_CHOICE') {
      if (!Array.isArray(q.options) || q.options.length < 2) errors.push('options missing or insufficient');
      if (!q.correct_answer) errors.push('correct_answer missing');
      if (!q.judgment) errors.push('judgment missing');
    } else if (qType === 'SHORT_ANSWER') {
      if (!Array.isArray(q.key_points) || q.key_points.length === 0) errors.push('key_points missing');
    }

    if (errors.length > 0) {
      console.warn('[GeneratorNode] Validation failed for question:', errors.join('; '));
      return false;
    }
    return true;
  });

  if (validatedQuestions.length === 0) {
    console.warn('[GeneratorNode] All generated questions failed validation, using existing questions only');
    return { questions: existingQuestions };
  }

  // Keep only the questions that passed validation
  newQuestions = validatedQuestions;

  const dimensionMap: Record<string, string> = {
    // Chinese labels
    '技术能力-提示词': 'prompt',
@@ -228,15 +341,27 @@ ${existingQuestionsText ? `- Repeating previous questions: ${existingQuestionsTe
      inferredDimension = dimensionMap[dimValue] || 'workCapability';
      console.log('[GeneratorNode] Dimension mapping:', { original: q.dimension, mapped: inferredDimension });
    }
    return {

    const qType = q.question_type === 'MULTIPLE_CHOICE' ? 'MULTIPLE_CHOICE' : 'SHORT_ANSWER';
    const base = {
      id: (existingQuestions.length + 1).toString(),
      questionText: q.question_text,
      questionType: 'SHORT_ANSWER',
      keyPoints: q.key_points,
      difficulty: q.difficulty,
      basis: q.basis,
      questionType: qType,
      keyPoints: q.key_points || [],
      difficulty: q.difficulty || 'STANDARD',
      basis: q.basis || '',
      dimension: inferredDimension,
    };

    if (qType === 'MULTIPLE_CHOICE') {
      return {
        ...base,
        options: q.options || [],
        correctAnswer: q.correct_answer || '',
        judgment: q.judgment || '',
      };
    }
    return base;
  });

  const questionsToGenerate = Math.max(1, limitCount - existingQuestions.length);

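The mapping above produces two shapes: a base object for short-answer questions, and the same object plus `options`, `correctAnswer`, and `judgment` for multiple choice. A sketch of typing that as a discriminated union instead of `any`, with a helper that resolves the `correctAnswer` letter to its option text. The interface names, `isMultipleChoice`, `correctOptionText`, and the sample question are hypothetical; the `"A) ..."` option format is taken from the generator prompts:

```typescript
type Dimension = "prompt" | "llm" | "ide" | "devPattern" | "workCapability";

// Fields shared by both question types (mirrors the `base` object above).
interface BaseQuestion {
  id: string;
  questionText: string;
  keyPoints: string[];
  difficulty: string;
  basis: string;
  dimension: Dimension;
}

interface ShortAnswerQuestion extends BaseQuestion {
  questionType: "SHORT_ANSWER";
}

interface MultipleChoiceQuestion extends BaseQuestion {
  questionType: "MULTIPLE_CHOICE";
  options: string[];     // e.g. "A) ...", "B) ..."
  correctAnswer: string; // option letter, e.g. "A"
  judgment: string;
}

type Question = ShortAnswerQuestion | MultipleChoiceQuestion;

// Type predicate that narrows on the `questionType` discriminant.
function isMultipleChoice(q: Question): q is MultipleChoiceQuestion {
  return q.questionType === "MULTIPLE_CHOICE";
}

// Returns the option whose "X)" label matches correctAnswer, or undefined if none does.
function correctOptionText(q: MultipleChoiceQuestion): string | undefined {
  const letter = q.correctAnswer.trim().charAt(0).toUpperCase();
  if (!letter) return undefined;
  return q.options.find((opt) => opt.trim().toUpperCase().startsWith(`${letter})`));
}

// Illustrative sample, not generated from a real knowledge base.
const sample: Question = {
  id: "1",
  questionText: "Which prompt is most likely to get a usable answer?",
  keyPoints: ["states role and task"],
  difficulty: "STANDARD",
  basis: "KP 1",
  dimension: "prompt",
  questionType: "MULTIPLE_CHOICE",
  options: ["A) Vague request", "B) Role, task, context, constraints", "C) One word", "D) No prompt"],
  correctAnswer: "B",
  judgment: "B gives the model the context it needs.",
};

if (isMultipleChoice(sample)) {
  console.log(correctOptionText(sample)); // B) Role, task, context, constraints
}
```

A check such as `correctOptionText(q) !== undefined` could also go in the MULTIPLE_CHOICE branch of the validation filter, which currently only checks that `correct_answer` is non-empty.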
@@ -91,6 +91,72 @@ export const graderNode = async (
      };
    }

  // ── Rule-based grading: use structured followupMapping if available ──
  if (currentQuestion.followupHints) {
    let mapping: any = null;
    if (typeof currentQuestion.followupHints === 'string') {
      try { mapping = JSON.parse(currentQuestion.followupHints); } catch {}
    } else if (typeof currentQuestion.followupHints === 'object') {
      mapping = currentQuestion.followupHints;
    }
    if (mapping && Array.isArray(mapping.branches)) {
      const userAnswerText = typeof lastUserMessage.content === 'string'
        ? lastUserMessage.content : JSON.stringify(lastUserMessage.content);

      // Score based on keyword coverage
      let bestScore = mapping.defaultScore ?? 5;
      let matchedFollowup = mapping.defaultFollowup || '';
      let matchedAll = true;
      const maxFollowUps = mapping.maxFollowups ?? 2;

      for (const branch of mapping.branches) {
        const kws = branch.keywords || [];
        const matchCount = kws.filter((kw: string) => userAnswerText.toLowerCase().includes(kw.toLowerCase())).length;
        if (kws.length > 0 && matchCount >= kws.length * 0.5) {
          const branchScore = branch.score ?? 7;
          if (branchScore > bestScore) bestScore = branchScore;
          if (branch.followup) matchedFollowup = branch.followup;
        } else if (kws.length > 0 && matchCount === 0) {
          matchedAll = false;
        }
      }

      const completionThreshold = mapping.completionThreshold ?? 80;
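      const tooShort = userAnswerText.trim().length < 8;
      // Short "I don't know"-style replies. The length cap must exceed 10 for "don't know"
      // (10 characters) to ever match; 20 also admits "I don't know." (an assumed cutoff).
      const normalizedAnswer = userAnswerText.trim().toLowerCase();
      const saysIDontKnow = normalizedAnswer.length < 20 && (
        normalizedAnswer.includes('不知道') || normalizedAnswer.includes("don't know") || normalizedAnswer.includes('わかりません')
      );
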
      let shouldFollowUp: boolean;
      if (saysIDontKnow || tooShort) {
        shouldFollowUp = false;
        bestScore = Math.min(bestScore, 2);
      } else if (bestScore >= completionThreshold / 10) {
        shouldFollowUp = false;
      } else if (currentFollowUpCount >= maxFollowUps) {
        shouldFollowUp = false;
      } else {
        shouldFollowUp = true;
      }

      const feedbackMessage = new AIMessage(`Score: ${bestScore}/10\n\nFeedback: ${shouldFollowUp ? matchedFollowup : '回答已覆盖关键点。'}`);

      const feedbackHistoryMessages = shouldFollowUp && matchedFollowup
        ? [feedbackMessage, new AIMessage(matchedFollowup)]
        : [feedbackMessage];

      console.log('[GraderNode] Rule grading:', { score: bestScore, shouldFollowUp, matchedAll, followup: matchedFollowup?.substring(0, 60) });

      return {
        feedbackHistory: feedbackHistoryMessages,
        scores: { [currentQuestion.id || currentQuestionIndex.toString()]: bestScore },
        shouldFollowUp,
        followUpCount: shouldFollowUp ? currentFollowUpCount + 1 : 0,
        currentQuestionIndex: shouldFollowUp ? currentQuestionIndex : currentQuestionIndex + 1,
      } as any;
    }
  }

  const systemPromptZh = `你是一位考官。请评分并给出反馈。

规则:
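The rule-based branch reads `followupHints` as an untyped object. Based only on the fields that code accesses (`branches[].keywords`, `score`, `followup`, `defaultScore`, `defaultFollowup`, `maxFollowups`, `completionThreshold`), the expected shape can be sketched as below, together with a parser that returns `null` exactly where graderNode would fall through to LLM grading. The interface names, `parseFollowupMapping`, `branchMatches`, and the example mapping are hypothetical; the defaults in the comments are the ones the code applies:

```typescript
// Shape inferred from the fields graderNode reads; all names here are hypothetical.
interface FollowupBranch {
  keywords?: string[];
  score?: number; // graderNode uses 7 when a matching branch has no score
  followup?: string;
}

interface FollowupMapping {
  branches: FollowupBranch[];
  defaultScore?: number;        // default 5
  defaultFollowup?: string;
  maxFollowups?: number;        // default 2
  completionThreshold?: number; // default 80; compared as threshold / 10 against the 0-10 score
}

// Accepts a JSON string or an object; null means "no usable mapping"
// (the case where graderNode falls back to LLM grading).
function parseFollowupMapping(hints: unknown): FollowupMapping | null {
  let value: unknown = hints;
  if (typeof hints === "string") {
    try {
      value = JSON.parse(hints);
    } catch {
      return null;
    }
  }
  if (typeof value !== "object" || value === null) return null;
  if (!Array.isArray((value as { branches?: unknown }).branches)) return null;
  return value as FollowupMapping;
}

// Same 50% rule as graderNode: a branch matches when at least half of its
// keywords appear in the answer (case-insensitive substring match).
function branchMatches(answer: string, branch: FollowupBranch): boolean {
  const keywords = branch.keywords ?? [];
  if (keywords.length === 0) return false;
  const lower = answer.toLowerCase();
  const hits = keywords.filter((kw) => lower.includes(kw.toLowerCase())).length;
  return hits >= keywords.length * 0.5;
}

// Illustrative mapping (invented content, not from the knowledge base).
const example = parseFollowupMapping(JSON.stringify({
  branches: [
    { keywords: ["context window", "token limit"], score: 8, followup: "How would you shorten the input?" },
  ],
  defaultScore: 4,
}));

if (example) {
  console.log(branchMatches("The context window is limited", example.branches[0])); // true
}
```

Because matching is plain substring search, a paraphrase that avoids every keyword falls back to `defaultScore`, which is the behavior the LLM grading prompt's "意思到位就算覆盖" rule is meant to soften.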
@@ -100,8 +166,10 @@ export const graderNode = async (
问题:${currentQuestion.questionText}
关键点:${currentQuestion.keyPoints.join(', ')}

评分标准:准确性、完整性、深度。
部分正确也给分(5-7分),完全不沾边才0-2分。
评分标准:不要求深度,不要求使用特定术语,只看用户是否理解了概念。
用户理解核心概念就给分。即使没有使用关键点中的原词,只要意思到位就算覆盖。
例如关键点是"上下文窗口有限",用户说"信息太多超过AI处理长度"也是覆盖。
评分原则:往宽了给分,不确定时就给高分。明显正确就给8-10分,部分正确5-7分,完全不沾边才0-2分。

返回JSON:
- score: 0-10

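The LLM grading prompt asks for a JSON reply with a 0-10 `score` (the hunk ends after that line, so the remaining fields are not visible here). Like the generator output, that reply could be checked in code before the score is stored. A minimal sketch, assuming the reply carries `score` and optionally a `feedback` string (the `feedback` field name is an assumption) and may be wrapped in a Markdown code fence; `parseGrade` is a hypothetical helper:

```typescript
interface GradeResult {
  score: number;    // integer in [0, 10]
  feedback: string; // assumed field name; empty if absent
}

// Hypothetical helper: parse and sanity-check the grader's JSON reply.
// Returns null when the reply is not JSON or has no numeric score.
function parseGrade(raw: string): GradeResult | null {
  // Strip an optional Markdown code fence (\x60 is a backtick), with or without a "json" tag.
  const text = raw.trim().replace(/^\x60{3}(?:json)?\s*/i, "").replace(/\s*\x60{3}$/, "").trim();
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const rawScore = obj.score;
  const score =
    typeof rawScore === "number" ? rawScore :
    typeof rawScore === "string" ? parseFloat(rawScore) : NaN;
  if (!Number.isFinite(score)) return null;
  // Clamp into the 0-10 range the prompt asks for, rounding to an integer.
  const clamped = Math.min(10, Math.max(0, Math.round(score)));
  const feedback = typeof obj.feedback === "string" ? obj.feedback : "";
  return { score: clamped, feedback };
}

const reply = "\x60\x60\x60json\n{\"score\": 7.6, \"feedback\": \"Mostly correct\"}\n\x60\x60\x60";
console.log(parseGrade(reply)); // score 8 (7.6 rounded), feedback "Mostly correct"
```

Returning `null` rather than a default score leaves the caller free to retry the LLM call or fall back, instead of silently recording a value the model never produced.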