feat: structural matching detection — no KEY variable needed

Add _detect_matching_structure(), which detects matching programs from their
control-flow pattern rather than from variable-naming conventions. It checks
5 structural signals:
1. READ + AT END + EOF pattern
2. PERFORM UNTIL with EOF condition
3. ELSE body with conditional READ (matching core)
4. IF comparing hyphenated fields (cross-file comparison)
5. Multi-file OPEN INPUT
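The commit does not show the body of _detect_matching_structure(), so the
following is only a minimal sketch of how the five signals could be checked
with regular expressions. The function name structural_signals(), the regexes,
and the signal keys are all hypothetical. The sketch assumes upper-cased,
whitespace-normalized COBOL text and ignores comment lines, column areas, and
COPY members.

```python
import re

# Hypothetical regexes, one per structural signal. They run on upper-cased,
# whitespace-normalized text and ignore COBOL comment lines and column areas.
_READ_AT_END_EOF = re.compile(r"\bREAD [^.]*?\bAT END\b[^.]*?EOF")
_PERFORM_UNTIL_EOF = re.compile(r"\bPERFORM\b[^.]*?\bUNTIL\b[^.]*?EOF")
_ELSE_THEN_READ = re.compile(r"\bELSE\b[^.]*?\bREAD\b")
_IF_HYPHEN_FIELDS = re.compile(
    r"\bIF ([A-Z0-9]+(?:-[A-Z0-9]+)+) ?(?:IS )?(?:[=<>]|EQUAL TO|LESS THAN|GREATER THAN) ?"
    r"([A-Z0-9]+(?:-[A-Z0-9]+)+)"
)
# File names after OPEN INPUT, stopping at another open mode, a common verb or a period.
_OPEN_INPUT = re.compile(
    r"\bOPEN INPUT((?: (?!(?:OUTPUT|I-O|EXTEND|OPEN|READ|PERFORM|MOVE|IF)(?![A-Z0-9-]))"
    r"[A-Z0-9][A-Z0-9-]*)+)"
)


def structural_signals(source: str) -> dict[str, bool]:
    """Report which of the five matching signals occur in a COBOL source string."""
    text = " ".join(source.upper().split())
    input_files = sum(len(m.group(1).split()) for m in _OPEN_INPUT.finditer(text))
    return {
        # 1. READ ... AT END ... <EOF flag> within one sentence
        "read_at_end_eof": bool(_READ_AT_END_EOF.search(text)),
        # 2. PERFORM ... UNTIL <EOF flag>
        "perform_until_eof": bool(_PERFORM_UNTIL_EOF.search(text)),
        # 3. a READ that follows ELSE (the matching core advances one file)
        "else_conditional_read": bool(_ELSE_THEN_READ.search(text)),
        # 4. IF comparing two different hyphenated fields; which file each
        #    field belongs to is NOT checked in this sketch
        "if_hyphenated_compare": any(
            m.group(1) != m.group(2) for m in _IF_HYPHEN_FIELDS.finditer(text)
        ),
        # 5. at least two files opened for input
        "multi_file_open_input": input_files >= 2,
    }
```

Because no signal looks at whether a name contains "KEY", a program that
compares CUST-CODE with ORDR-CODE still lights up signal 4.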

Confidence by number of signals present: 5/5 → 0.55, 4/5 → 0.50, 3/5 → 0.40.
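The score table above can be read as a plain lookup keyed on the number of
signals present. This is a sketch; structural_confidence() is a hypothetical
name, and returning None for fewer than 3 signals is an assumption, since the
message only lists the 3/5, 4/5 and 5/5 cases.

```python
from typing import Mapping, Optional

# Confidence per number of signals present, as quoted in the commit message.
SIGNAL_CONFIDENCE = {5: 0.55, 4: 0.50, 3: 0.40}


def structural_confidence(signals: Mapping[str, bool]) -> Optional[float]:
    """Map detected signals to a confidence score.

    Returning None for fewer than 3 signals is an assumption; the commit
    message only lists the 3/5, 4/5 and 5/5 cases.
    """
    return SIGNAL_CONFIDENCE.get(sum(bool(v) for v in signals.values()))
```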

Real-world impact: matching programs with key fields named CUST-CODE
and ORDR-CODE (no '-KEY' in name) are now correctly detected.

Also:
- Rule engine type priority: main types such as マッチング (matching) now
  override secondary types (M:N, DIVIDE) when keyword confidence is low
- has_structural_match is injected into the features dict so the rule engine
  can use it
- matching_vs_keybreak now counts equality IFs, in addition to comparison IFs,
  as matching evidence
- New test: test_structural_matching_no_keyword()
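The type-priority change in the first bullet could look roughly like the
following sketch. Everything here is hypothetical: the Candidate tuple,
pick_type(), and especially the LOW_KEYWORD_CONFIDENCE threshold are not taken
from the commit, and MAIN_TYPES lists only マッチング because the message
elides the other main types with "etc.".

```python
from typing import List, NamedTuple

# Hypothetical type sets: the commit names マッチング as a main type ("etc." means
# others exist) and M:N / DIVIDE as secondary types.
MAIN_TYPES = {"マッチング"}
SECONDARY_TYPES = {"M:N", "DIVIDE"}
LOW_KEYWORD_CONFIDENCE = 0.5  # hypothetical threshold, not taken from the commit


class Candidate(NamedTuple):
    type_name: str
    confidence: float


def pick_type(candidates: List[Candidate], keyword_confidence: float) -> Candidate:
    """Pick the winning type, letting a main type override a weakly supported secondary one."""
    best = max(candidates, key=lambda c: c.confidence)
    if best.type_name in SECONDARY_TYPES and keyword_confidence < LOW_KEYWORD_CONFIDENCE:
        mains = [c for c in candidates if c.type_name in MAIN_TYPES]
        if mains:
            return max(mains, key=lambda c: c.confidence)
    return best
```

With weak keyword evidence, a DIVIDE candidate scored 0.6 yields to a
マッチング candidate scored 0.55; with strong keyword evidence it keeps the win.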

Regression: 764 passed (0 new failures).
This commit is contained in:
NB-076
2026-06-21 15:28:32 +08:00
parent 33762ca959
commit da5d1058e7
4 changed files with 176 additions and 25 deletions
+6 -3
@@ -42,11 +42,14 @@ def resolve_matching_vs_keybreak(features: dict) -> dict:
         evidence.append(f"WS-PREV-KEY 存在 + 累加器存在 + IF 分支 → キーブレイク")
         return {"resolved_type": "キーブレイク", "confidence": 0.85, "evidence": evidence}
-    # Supplementary rule: SELECT file count >= 2 and at least 1 comparison IF → lean toward マッチング
+    # Supplementary rule: SELECT file count >= 2 and at least 1 comparison/equality IF → lean toward マッチング
     # Require an actual KEY-variable comparison (prevents counter comparisons from being misclassified)
+    # or a structural matching signal (variable names lack KEY, but the structure is matching)
     has_key_compare = variable_patterns.get("has_prev_key", False) or features.get("has_key_var", False)
-    if file_count >= 2 and comparison_ifs >= 1 and has_key_compare:
-        evidence.append(f"SELECT 文件数 >=2 + comparison IF >=1 + KEY 变量 → マッチング")
+    has_struct_match = features.get("has_structural_match", False) or features.get("has_prev_key", False)
+    effective_ifs = comparison_ifs + equality_ifs
+    if file_count >= 2 and effective_ifs >= 1 and (has_key_compare or has_struct_match):
+        evidence.append(f"SELECT 文件数 >=2 + IF >=1 + KEY/结构证据 → マッチング")
         return {"resolved_type": "マッチング", "confidence": 0.75, "evidence": evidence}
     # Fallback: cannot determine definitively
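The updated supplementary rule in resolve_matching_vs_keybreak can be restated
as a standalone function to show how the CUST-CODE/ORDR-CODE case now passes.
This is a simplified sketch: the hunk does not show how file_count,
comparison_ifs, equality_ifs, and variable_patterns are derived, so they are
passed in directly, and matching_rule() and its None return for "rule did not
fire" are hypothetical.

```python
from typing import Optional


def matching_rule(features: dict, variable_patterns: dict, file_count: int,
                  comparison_ifs: int, equality_ifs: int) -> Optional[dict]:
    """Standalone restatement of the updated supplementary matching rule."""
    evidence = []
    # Pre-existing requirement: an actual KEY-variable comparison
    has_key_compare = variable_patterns.get("has_prev_key", False) or features.get("has_key_var", False)
    # New alternative: structural matching evidence injected into features
    has_struct_match = features.get("has_structural_match", False) or features.get("has_prev_key", False)
    # Equality IFs now count alongside comparison IFs
    effective_ifs = comparison_ifs + equality_ifs
    if file_count >= 2 and effective_ifs >= 1 and (has_key_compare or has_struct_match):
        evidence.append("SELECT files >= 2 + IF >= 1 + KEY/structural evidence -> マッチング")
        return {"resolved_type": "マッチング", "confidence": 0.75, "evidence": evidence}
    return None  # the real function falls through to later rules and the fallback
```

Under the old condition, a program with two input files, one equality IF
(CUST-CODE = ORDR-CODE) and no KEY variable would not reach this branch; with
has_structural_match set, it resolves to マッチング at 0.75.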