Add _detect_matching_structure(): detection based on control flow
pattern, not variable naming conventions. Uses 5 structural signals:
1. READ + AT END + EOF pattern
2. PERFORM UNTIL with EOF condition
3. ELSE body with conditional READ (matching core)
4. IF comparing hyphenated fields (cross-file comparison)
5. Multi-file OPEN INPUT
5/5 signals → 0.55, 4/5 → 0.50, 3/5 → 0.40.
Real-world impact: matching programs with key fields named CUST-CODE
and ORDR-CODE (no '-KEY' in name) are now correctly detected.
Also:
- Rule engine type priority: main types (マッチング etc.) override
secondary types (M:N, DIVIDE) when keyword confidence is low
- has_structural_match injected into features so rule engine can use it
- matching_vs_keybreak accepts equality IFs as matching evidence
- New test: test_structural_matching_no_keyword()
Regression: 764 passed (0 new failures).
COBOL migration expert adversarial testing found 4 real defects:
FIX 1: Comment-stripping in detect_keyword() (FP-2)
- Remove *> inline comments and * comment lines before keyword matching
- Prevents 「マッチング」 from triggering on WS-KEY in comments
FIX 2: KEY comparison context validation (FP-1, FP-6)
- Add _matches_key_comparison() — requires WS-KEY variable to appear
NEAR an actual comparison operator (= < >), not just as PIC/VALUE decl
- Same check in _path_rule_engine features via has_key_var injection
- Fix regex bug: [=<>\s] vs [=<>] — \s matched whitespace after PIC decl
FIX 3: Old-school naming support (FN-1)
- Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence
- Matches K01-KEY, KS-KEY etc. (non-WS- prefix naming convention)
FIX 4: mn_output_mode over-matching (FP-6)
- Require IF branches + KEY evidence before returning M:N for file>=3
- matching_vs_keybreak rule 3 now requires has_key_var
New tests: test_adversarial.py — 8 parametrized adversarial tests
Regression: 755 passed (0 new failures)
Issues found through matching program classification analysis:
1. dedup_vs_nodedup: 0.85→0.50 for negative detection (no WS-PREV-KEY
is not strong evidence for '含まず')
2. validation_vs_keybreak: 0.80→0.55 for has_counter (counter is a
generic pattern, not specific to key-break)
3. simple_vs_two_stage: 0.80→0.50 for non-open-close-open pattern
(sequential OPEN is the default for most programs)
Result: matching programs now correctly classified:
- MT01-03/18/20 → マッチング ✅ (was 項目チェック)
- MT16-17 → 二段階マッチング ✅ (unchanged)
- MT32 → 項目チェック(重複含む) ✅ (correct: has WS-PREV-KEY)
- VL01 → 項目チェック(重複含む) ✅ (correct)
- CSV → CSV合并 ✅ (correct)
Regression: 745 passed (3 test expectation bounds updated)
M1: Cache confusion-pair confidences in Path B (eliminate redundant
resolve_confusion_pair re-calls in _path_rule_engine)
M2: Resolve contradictions in Path C instead of hardcoding
resolved_count=0 in _path_llm_assisted
M4: Add DIVIDE_25 to contradiction pair coverage (50-25, 100-25)
and update test_contradiction_pairs_defined to verify all 3 variants