fix: calibrate confusion-group confidence to curb false-positive inflation

Issues found while analyzing how matching programs are classified:
1. dedup_vs_nodedup: negative detection lowered 0.85→0.50 (the absence
   of WS-PREV-KEY is not strong evidence for '含まず')
2. validation_vs_keybreak: has_counter lowered 0.80→0.55 (a counter is
   a generic pattern, not specific to key-break)
3. simple_vs_two_stage: non-open-close-open pattern lowered 0.80→0.50
   (sequential OPEN is the default for most programs)
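
The asymmetry behind item 1 can be sketched as below. This is a minimal illustration, not the resolver in this repository: the function name `resolve_dedup_vs_nodedup_sketch` and the positive-branch value 0.85 are assumptions; only the 0.50 negative-detection value, the feature key `has_prev_key`, and the two type labels come from this commit.

```python
from typing import Any

# Only NEGATIVE_CONFIDENCE is taken from the commit message.
# POSITIVE_CONFIDENCE is a hypothetical placeholder for illustration.
POSITIVE_CONFIDENCE = 0.85
NEGATIVE_CONFIDENCE = 0.50


def resolve_dedup_vs_nodedup_sketch(features: dict[str, Any]) -> dict[str, Any]:
    """Resolve the dedup/no-dedup confusion group with asymmetric confidence.

    Finding WS-PREV-KEY is direct evidence, so it keeps a high score.
    Not finding it is only weak evidence, so it gets a lower score.
    """
    patterns = features.get("variable_patterns", {})
    if patterns.get("has_prev_key", False):
        return {
            "resolved_type": "項目チェック(重複含む)",
            "confidence": POSITIVE_CONFIDENCE,
        }
    return {
        "resolved_type": "項目チェック(重複含まず)",
        "confidence": NEGATIVE_CONFIDENCE,
    }


# Usage: the negative case still resolves, but with a weaker score.
negative = resolve_dedup_vs_nodedup_sketch(
    {"variable_patterns": {"has_prev_key": False}}
)
positive = resolve_dedup_vs_nodedup_sketch(
    {"variable_patterns": {"has_prev_key": True}}
)
```

Items 2 and 3 follow the same idea: a signal that also appears in unrelated programs should not carry a score as high as a signal specific to one type.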

Result: matching programs now correctly classified:
- MT01-03/18/20 → マッチング  (was 項目チェック)
- MT16-17 → 二段階マッチング  (unchanged)
- MT32 → 項目チェック(重複含む)  (correct: has WS-PREV-KEY)
- VL01 → 項目チェック(重複含む)  (correct)
- CSV → CSV合并  (correct)
Regression: 745 passed (3 test expectation bounds updated)
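
One way a lower score could change the final label is sketched below. The selection rule is purely an assumption: this commit does not show how the resolver output is combined with the primary classification, and `pick_label` and the primary score of 0.70 are hypothetical.

```python
def pick_label(primary: tuple[str, float], resolved: tuple[str, float]) -> str:
    """Return the label with the higher confidence; ties keep the primary label.

    Hypothetical rule: the confusion-group result overrides the primary
    classification only when it is strictly more confident.
    """
    return resolved[0] if resolved[1] > primary[1] else primary[0]


# Hypothetical primary classification for a matching program.
primary = ("マッチング", 0.70)

# With the old 0.85 negative-detection score the resolver wins.
before = pick_label(primary, ("項目チェック(重複含まず)", 0.85))

# With the recalibrated 0.50 score the primary label is kept.
after = pick_label(primary, ("項目チェック(重複含まず)", 0.50))
```

Under that assumed rule, the pattern matches the MT01-03/18/20 change above: an inflated negative-detection score overrides マッチング, and the recalibrated one does not.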
NB-076
2026-06-21 13:17:31 +08:00
parent 0b0a013f51
commit 958b12e9a9
2 changed files with 12 additions and 12 deletions
+6 -6
@@ -82,11 +82,11 @@ def test_dedup_vs_nodedup_dedup():
 def test_dedup_vs_nodedup_nodedup():
-    """WS-PREV-KEY absent → duplicates excluded"""
+    """WS-PREV-KEY absent → duplicates excluded (low confidence: a missing WS-PREV-KEY does not necessarily mean an item check)"""
     features = {"variable_patterns": {"has_prev_key": False, "has_accumulator": False, "has_error_field": False}}
     result = resolve_dedup_vs_nodedup(features)
     assert result["resolved_type"] == "項目チェック(重複含まず)"
-    assert result["confidence"] >= 0.70
+    assert result["confidence"] >= 0.30
 # ═══════════════════════════════════════════════════════════════════════════
@@ -102,11 +102,11 @@ def test_validation_vs_keybreak_validation():
 def test_validation_vs_keybreak_keybreak():
-    """WS-*CNT counter present → キーブレイク"""
+    """WS-*CNT counter present → キーブレイク (low confidence: a counter is a generic pattern)"""
     features = {"variable_patterns": {"has_error_field": False, "has_counter": True, "has_prev_key": False}}
     result = resolve_validation_vs_keybreak(features)
     assert result["resolved_type"] == "キーブレイク"
-    assert result["confidence"] >= 0.75
+    assert result["confidence"] >= 0.40
 def test_validation_vs_keybreak_unknown():
@@ -163,11 +163,11 @@ def test_simple_vs_two_stage_two_stage():
 def test_simple_vs_two_stage_simple():
-    """Sequential OPEN → simple matching"""
+    """Sequential OPEN → simple matching (low confidence: the absence of OPEN-CLOSE-OPEN does not necessarily mean a matching program)"""
     features = {"open_pattern": "sequential"}
     result = resolve_simple_vs_two_stage(features)
     assert result["resolved_type"] == "単純マッチング"
-    assert result["confidence"] >= 0.75
+    assert result["confidence"] >= 0.40
 # ═══════════════════════════════════════════════════════════════════════════