fix: adversarial testing — 4 false positive/negative fixes + comment stripping
COBOL migration expert adversarial testing found 4 real defects:
FIX 1: Comment-stripping in detect_keyword() (FP-2)
- Remove *> inline comments and * comment lines before keyword matching
- Prevents 「マッチング」 (matching) from triggering on WS-KEY in comments
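
A minimal sketch of the comment-stripping idea in FIX 1, not the code in this commit. The helper name `strip_cobol_comments` and the exact rules are assumptions: a line is treated as a full-line comment if its first non-blank character is `*`, or if `*` sits in column 7 after a six-character sequence area; everything from `*>` to the end of a line is dropped.

```python
import re

# Hypothetical helper: name and rules are illustrative assumptions.
_COMMENT_LINE = re.compile(r'^(?:\s*\*|[0-9 ]{6}\*)')  # '*' first, or '*' in column 7
_INLINE_COMMENT = re.compile(r'\*>.*$')                 # '*>' through end of line


def strip_cobol_comments(source: str) -> str:
    """Return the source with comment lines and inline comments removed."""
    kept = []
    for line in source.splitlines():
        if _COMMENT_LINE.match(line):
            continue  # whole line is a comment, drop it
        # Limitation of this sketch: a '*>' inside a string literal is also cut.
        kept.append(_INLINE_COMMENT.sub('', line).rstrip())
    return "\n".join(kept)
```

Running keyword matching on the stripped text means a WS-KEY mentioned only in a comment can no longer contribute a match.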
FIX 2: KEY comparison context validation (FP-1, FP-6)
- Add _matches_key_comparison() — requires WS-KEY variable to appear
NEAR an actual comparison operator (= < >), not just as PIC/VALUE decl
- Same check in _path_rule_engine features via has_key_var injection
- Fix regex bug: character class [=<>\s] changed to [=<>]; the \s matched the whitespace after a PIC declaration
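
The regex bug in FIX 2 can be reproduced in isolation. In the sketch below, `FIXED` is the WS- alternative from the diff; `BUGGY` is an assumed reconstruction of the pre-fix pattern, based only on the "[=<>\s] vs [=<>]" note above.

```python
import re

# Pattern as it appears in the diff (WS- alternative only).
FIXED = re.compile(r'WS-[\w-]*KEY[A-Z0-9-]*\s*[=<>]')
# Assumed pre-fix form: the trailing class also accepts whitespace.
BUGGY = re.compile(r'WS-[\w-]*KEY[A-Z0-9-]*\s*[=<>\s]')

declaration = "05 WS-KEY PIC X(10)."
comparison = "IF WS-KEY = IN-KEY"


def has_key_comparison(pattern: re.Pattern, line: str) -> bool:
    """True if the pattern finds a KEY variable followed by a comparison."""
    return pattern.search(line) is not None
```

With `BUGGY`, `\s*` can match zero characters and the class `[=<>\s]` then consumes the space after `WS-KEY`, so the declaration is reported as a comparison. With `FIXED`, only the real comparison line matches.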
FIX 3: Old-school naming support (FN-1)
- Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence
- Matches K01-KEY, KS-KEY etc. (non-WS- prefix naming convention)
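
A small sketch of how the L1 pattern from FIX 3 behaves under `re.search`. The pattern and the 0.55 confidence are quoted from the message above; the wrapper `match_old_school_key` is hypothetical and only serves to show the matches.

```python
import re

# Pattern and confidence as quoted in the commit message.
OLD_SCHOOL_KEY = re.compile(r'[A-Z]\d{0,2}-\w*KEY')
OLD_SCHOOL_CONFIDENCE = 0.55


def match_old_school_key(line: str):
    """Return (matched_text, confidence), or None if no KEY-style name is found."""
    m = OLD_SCHOOL_KEY.search(line.upper())
    if m is None:
        return None
    return m.group(0), OLD_SCHOOL_CONFIDENCE
```

For `K01-KEY` the whole name is matched. For `KS-KEY` the unanchored search matches the substring `S-KEY`, because the pattern allows only one letter before the optional digits. A `\b`-anchored variant of the same pattern would still match `K01-KEY` but not `KS-KEY`, since there is no word boundary between `K` and `S`.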
FIX 4: mn_output_mode over-matching (FP-6)
- Require IF branches + KEY evidence before returning M:N for file>=3
- matching_vs_keybreak rule 3 now requires has_key_var
New tests: test_adversarial.py — 8 parametrized adversarial tests
Regression: 755 passed (0 new failures)
@@ -156,6 +156,17 @@ def _path_rule_engine(
     # 1. Use the structural features directly as features
     features = dict(structure)
 
+    # Inject has_key_var: whether the source contains an actual KEY comparison
+    # (prevents the matching_vs_keybreak rule from being triggered by counter comparisons)
+    if features.get("source_upper"):
+        import re
+        su = features["source_upper"]
+        features["has_key_var"] = bool(re.search(
+            r'WS-[\w-]*KEY[A-Z0-9-]*\s*[=<>]|'   # WS-KEY = / WS-KEY >
+            r'\b[A-Z]\d{0,2}-[\w-]*KEY\s*[=<>]',  # K01-KEY =
+            su
+        ))
+
     # 2. Run all confusion-group resolvers
     resolved_types: dict[str, str] = {}
     resolved_confidences: dict[str, float] = {}
@@ -570,6 +581,10 @@ def classify_program(cobol_source: str, llm: Any = None) -> dict:
     except Exception as e:
         logger.warning("[pipeline] extract_structure failed: %s", e)
 
+    # Inject the source code for context validation in features (e.g. has_key_var)
+    if structure:
+        structure["source_upper"] = cobol_source.upper()
+
     # ── Step 2: analyze the keyword results and determine the path ──
     keyword_info = _get_best_keyword_match(keyword_matches)
     max_keyword_confidence = keyword_info["confidence"] if keyword_info else 0.0