Root cause: IF condition and EVALUATE WHEN parsing swallowed entire
line including THEN-body (e.g. '50 MOVE BIG...' instead of just '50').
Fix:
1. Single-line IF cond_text truncated at COBOL statement-starting keywords
(MOVE/DISPLAY/COMPUTE/ADD/...)
2. Multi-line IF continuation loop also breaks on these keywords (was
missing DISPLAY, READ, WRITE, CLOSE, OPEN, SEARCH, ...)
3. EVALUATE WHEN raw_val truncated at same keyword set
4. All invalid escape sequences fixed by switching to raw strings (Python 3.12 SyntaxWarning)
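Fixes 1-3 amount to cutting the condition or WHEN text at the first statement keyword. A minimal regex sketch follows. The helper name truncate_at_statement and the exact keyword set are hypothetical and are not the parser's real code. The lookarounds also reject hyphens, so COBOL data names such as WS-MOVE-CNT are not cut.

```python
import re

# Hypothetical keyword set: the commit names MOVE/DISPLAY/COMPUTE/ADD/READ/
# WRITE/CLOSE/OPEN/SEARCH; the parser's real list may differ.
STMT_KEYWORDS = (
    "MOVE", "DISPLAY", "COMPUTE", "ADD", "SUBTRACT", "MULTIPLY", "DIVIDE",
    "READ", "WRITE", "CLOSE", "OPEN", "SEARCH", "PERFORM", "CALL",
)

# Whole-word match; the lookarounds also reject hyphens, so COBOL data names
# like WS-MOVE-CNT are not mistaken for a MOVE statement.
_STMT_START = re.compile(r"(?<![\w-])(?:%s)(?![\w-])" % "|".join(STMT_KEYWORDS))


def truncate_at_statement(text: str) -> str:
    """Return the IF condition / WHEN value up to the first statement keyword."""
    match = _STMT_START.search(text)
    return (text[: match.start()] if match else text).strip()


print(truncate_at_statement("50 MOVE BIG TO RESULT"))   # 50
print(truncate_at_statement("WS-MOVE-CNT > 3"))          # WS-MOVE-CNT > 3
```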
Verification:
- IF single-line A>50: A=51 (true) / 12 (false); previously A=01/00
- IF multi-line X>50: X=51 (true) / 12 (false); previously not steered
- EVALUATE WHEN 1/2/OTHER: C=1/2/4; previously C=0/0/0
- IF AND compound: (A<=10, B<20), (A>10, B<20), (A>10, B>=20)
- IF >75: A=76 (true) / 12 (false); previously not steered
R11 tests updated: BUG documentation replaced with real assertions.
13 suites / 0 FAIL.
Co-Authored-By: Claude <noreply@anthropic.com>
BREAKING CHANGE DISCOVERED: generate_data constraint steering is BROKEN
- apply_constraint does not steer field values to satisfy branch conditions
- All generate_data tests now DOCUMENT this as known bug
- Previous tests never caught this because they only checked 'is not None'
What R11 actually verifies:
1. AST structure: IF CondAnd leaves, EVAL WHEN count, CALL params,
SEARCH ALL flag, PERFORM type — verified by attribute equality
2. propagate_assignments: chain values verified (X=100, Y=105, INSPECT ALL L->X)
arithmetic chain ((0+5-2)*3/2 = 4, i.e. 4.5 truncated to an integer)
3. GnuCOBOL: real compilation + execution output captured
HELLO WORLD, IF branch (DISPLAY 01), PERFORM loop (SUM=15)
4. gcov: --coverage compile, run, line rate measurement
5. Exception paths: bad syntax, empty sections, newlines, garbage bytes
6. pipeline: classify result non-empty
7. orchestrator: _done state machine with value assertions
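The expected arithmetic-chain value in item 2 depends on COBOL truncation. The steps are (0+5-2)*3 = 9 and 9/2 = 4.5, and an integer field without ROUNDED stores that result as 4. The sketch below assumes an integer PIC 9 target field. The statement sequence in the comments is illustrative only, because the test program itself is not shown.

```python
from decimal import Decimal, ROUND_DOWN


def store_in_integer_field(value: Decimal) -> Decimal:
    """Drop the fraction, as COBOL does for an integer field without ROUNDED."""
    return value.quantize(Decimal("1"), rounding=ROUND_DOWN)


x = Decimal(0)                       # MOVE 0 TO X
x = store_in_integer_field(x + 5)    # ADD 5 TO X
x = store_in_integer_field(x - 2)    # SUBTRACT 2 FROM X
x = store_in_integer_field(x * 3)    # MULTIPLY 3 BY X      -> 9
x = store_in_integer_field(x / 2)    # DIVIDE 2 INTO X      -> 4.5 stored as 4
print(x)  # 4
```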
Co-Authored-By: Claude <noreply@anthropic.com>
All 58 test cases across 6 roles now passing:
- 65 recorded passes (some tests assert multiple things)
- 0 failures
- All L1 regex patterns verified with proper COBOL source format
- Fixed inline format issues: P() now adds \n after preamble,
P-002 uses chr(10) for proper newlines, CRLF test uses chr(13)+chr(10)
Regression: 767 passed (0 new)
Add _detect_matching_structure(): detects matching programs by control-flow
pattern rather than variable-naming conventions. Uses 5 structural signals:
1. READ + AT END + EOF pattern
2. PERFORM UNTIL with EOF condition
3. ELSE body with conditional READ (matching core)
4. IF comparing hyphenated fields (cross-file comparison)
5. Multi-file OPEN INPUT
Score by signal count: 5/5 → 0.55, 4/5 → 0.50, 3/5 → 0.40.
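The scoring rule can be sketched as a lookup on how many signals are present. StructuralSignals and structural_match_confidence are hypothetical names. The commit only lists scores for 3/5 through 5/5, so returning 0.0 for fewer than three signals is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StructuralSignals:
    read_at_end_eof: bool        # 1. READ + AT END + EOF pattern
    perform_until_eof: bool      # 2. PERFORM UNTIL with EOF condition
    else_conditional_read: bool  # 3. ELSE body with conditional READ
    hyphenated_field_if: bool    # 4. IF comparing hyphenated fields
    multi_file_open_input: bool  # 5. Multi-file OPEN INPUT


# Scores from the commit message; below 3 signals is assumed to mean "no match".
_SCORE_BY_COUNT = {5: 0.55, 4: 0.50, 3: 0.40}


def structural_match_confidence(s: StructuralSignals) -> float:
    count = sum([
        s.read_at_end_eof, s.perform_until_eof, s.else_conditional_read,
        s.hyphenated_field_if, s.multi_file_open_input,
    ])
    return _SCORE_BY_COUNT.get(count, 0.0)
```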
Real-world impact: matching programs with key fields named CUST-CODE
and ORDR-CODE (no '-KEY' in name) are now correctly detected.
Also:
- Rule engine type priority: main types (e.g. マッチング/matching) override
secondary types (M:N, DIVIDE) when keyword confidence is low
- has_structural_match injected into features so rule engine can use it
- matching_vs_keybreak accepts equality IFs as matching evidence
- New test: test_structural_matching_no_keyword()
Regression: 764 passed (0 new failures).
Adversarial testing by a COBOL migration expert found 4 real defects:
FIX 1: Comment-stripping in detect_keyword() (FP-2)
- Remove *> inline comments and * comment lines before keyword matching
- Prevents 「マッチング」 (matching) from triggering on WS-KEY mentions in comments
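Here is a sketch of FIX 1 for fixed-format source. It assumes that '*' or '/' in column 7 marks a comment line and that '*>' starts an inline comment. strip_cobol_comments is a hypothetical helper. It does not handle '*>' appearing inside string literals.

```python
def strip_cobol_comments(source: str) -> str:
    """Remove comment lines and inline comments before keyword matching.

    Assumes fixed-format source: '*' or '/' in column 7 marks a comment line.
    '*>' starts an inline comment; occurrences inside string literals are
    not handled in this sketch.
    """
    kept = []
    for line in source.splitlines():
        if len(line) >= 7 and line[6] in "*/":
            continue  # full comment line
        inline = line.find("*>")
        if inline != -1:
            line = line[:inline]  # drop the inline comment
        if line.strip():
            kept.append(line.rstrip())
    return "\n".join(kept)
```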
FIX 2: KEY comparison context validation (FP-1, FP-6)
- Add _matches_key_comparison(): requires a WS-KEY variable to appear
NEAR an actual comparison operator (= < >), not just in a PIC/VALUE declaration
- Same check in _path_rule_engine features via has_key_var injection
- Fix regex bug: class [=<>\s] narrowed to [=<>]; \s matched the whitespace
after the name in a PIC declaration, so a bare declaration passed as a comparison
FIX 3: Old-school naming support (FN-1)
- Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence
- Matches K01-KEY, KS-KEY, etc. (naming conventions without the WS- prefix)
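Below is a quick check of the FIX 3 pattern, assuming it is applied with re.search (the commit does not say whether search or fullmatch is used). [A-Z] covers only one letter before the optional digits. K01-KEY therefore matches in full, while KS-KEY matches through its S-KEY substring. By the same substring rule, WS- names also match.

```python
import re

# Pattern quoted in FIX 3.
OLD_SCHOOL_KEY = re.compile(r"[A-Z]\d{0,2}-\w*KEY")

for name in ["K01-KEY", "KS-KEY", "WS-KEY", "CUST-CODE"]:
    match = OLD_SCHOOL_KEY.search(name)
    print(name, "->", match.group(0) if match else None)
# K01-KEY -> K01-KEY
# KS-KEY -> S-KEY
# WS-KEY -> S-KEY
# CUST-CODE -> None
```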
FIX 4: mn_output_mode over-matching (FP-6)
- Require IF branches + KEY evidence before returning M:N for file>=3
- matching_vs_keybreak rule 3 now requires has_key_var
New tests: test_adversarial.py — 8 parametrized adversarial tests
Regression: 755 passed (0 new failures)
Refactor _resolve_matching_subtype to use an LLM agent for ambiguous
cases instead of pure static rules:
Architecture (3 layers):
1. Static deterministic rules: M:N→MxN, 1:N (WS-MAST/TRAN-KEY),
二段階 (two-stage), 混合 (mixed); high confidence, no LLM needed
2. LLM agent: ambiguous cases (N:1 vs 1:1, M:N→M vs M:N→N)
- _MATCHING_SUBTYPE_AGENT_PROMPT with 5 subtypes
- Calls existing hina.hina_agent._parse_llm_response for parsing
- Minimum confidence threshold 0.4 to gate low-quality LLM output
3. Fallback: conservative defaults (M:N or 1:1) when LLM unavailable
This follows the original architecture design: agent handles the
hard classification problems that static analysis alone can't resolve.
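The three layers can be sketched as a chain of fallbacks. The 0.4 threshold comes from the commit. SubtypeResult, the injected static_rule/ask_llm callables, the is_mn_candidate feature key and the fallback confidence value are all hypothetical. This is a shape for the control flow, not the real _resolve_matching_subtype.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SubtypeResult:
    subtype: str
    confidence: float
    source: str  # "static", "llm" or "fallback"


MIN_LLM_CONFIDENCE = 0.4   # threshold from the commit message
FALLBACK_CONFIDENCE = 0.3  # hypothetical placeholder value


def resolve_matching_subtype(
    features: dict,
    static_rule: Callable[[dict], Optional[SubtypeResult]],
    ask_llm: Optional[Callable[[dict], Optional[SubtypeResult]]] = None,
) -> SubtypeResult:
    # Layer 1: deterministic rules settle the unambiguous cases.
    result = static_rule(features)
    if result is not None:
        return result
    # Layer 2: ask the LLM agent, but discard low-confidence answers.
    if ask_llm is not None:
        answer = ask_llm(features)
        if answer is not None and answer.confidence >= MIN_LLM_CONFIDENCE:
            return answer
    # Layer 3: conservative default when the LLM is unavailable or unsure.
    default = "M:N" if features.get("is_mn_candidate") else "1:1"
    return SubtypeResult(default, FALLBACK_CONFIDENCE, "fallback")
```

Passing the static rules and the LLM call in as callables keeps each layer testable without a live model.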
Regression: 745 passed (unchanged).
Issues found through matching program classification analysis:
1. dedup_vs_nodedup: 0.85→0.50 for negative detection (absence of WS-PREV-KEY
is not strong evidence for '含まず', i.e. duplicates excluded)
2. validation_vs_keybreak: 0.80→0.55 for has_counter (counter is a
generic pattern, not specific to key-break)
3. simple_vs_two_stage: 0.80→0.50 for non-open-close-open pattern
(sequential OPEN is the default for most programs)
Result: matching programs now correctly classified:
- MT01-03/18/20 → マッチング (matching) ✅ (was 項目チェック, field check)
- MT16-17 → 二段階マッチング (two-stage matching) ✅ (unchanged)
- MT32 → 項目チェック(重複含む) (field check, duplicates included) ✅ (correct: has WS-PREV-KEY)
- VL01 → 項目チェック(重複含む) ✅ (correct)
- CSV → CSV合并 (CSV merge) ✅ (correct)
Regression: 745 passed (3 test expectation bounds updated)
Bug 1: ELSE IF breaks IF false_seq parsing (core.py)
- _parse_if checked self.clean() == 'ELSE' which fails on 'ELSE IF ...'
- Fix: use startswith('ELSE'), reinsert IF portion for recursive parse
- Impact: ALL ELSE IF chains were silently dropped (huge branch loss)
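Here is a toy version of the Bug 1 fix. It assumes a line-based parser with one statement per line. MiniParser and IfNode are hypothetical stand-ins for the parser and AST in core.py, and period-terminated IFs are not handled. An exact == 'ELSE' test never fires on 'ELSE IF ...'. Checking startswith('ELSE') and pushing the IF remainder back for a recursive parse keeps the chain.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class IfNode:
    cond: str
    true_seq: List["Node"] = field(default_factory=list)
    false_seq: List["Node"] = field(default_factory=list)


Node = Union[IfNode, str]


class MiniParser:
    """Toy line-based parser: one statement per line, END-IF closes an IF."""

    def __init__(self, lines: List[str]):
        self.lines = [line.strip() for line in lines]
        self.pos = 0

    def parse_if(self) -> IfNode:
        node = IfNode(cond=self.lines[self.pos][len("IF "):])
        self.pos += 1
        target = node.true_seq
        while self.pos < len(self.lines):
            line = self.lines[self.pos]
            if line == "END-IF":
                self.pos += 1
                break
            if line.startswith("ELSE"):  # an exact == "ELSE" check misses "ELSE IF ..."
                rest = line[len("ELSE"):].strip()
                target = node.false_seq
                if rest.startswith("IF "):
                    # Reinsert the IF portion and parse it as a nested IF.
                    self.lines[self.pos] = rest
                    target.append(self.parse_if())
                else:
                    self.pos += 1
                continue
            if line.startswith("IF "):
                target.append(self.parse_if())
            else:
                target.append(line)
                self.pos += 1
        return node
```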
Bug 2: READ skip loop greedily consumes subsequent statements (core.py)
- READ's AT END / NOT AT END skip loop used bare advance() with no
statement boundary detection
- Fix: add _stmt_boundary regex that stops on IF/PERFORM/READ/etc.
- Impact: everything after first READ was consumed as 'AT END' lines
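Here is a sketch of the Bug 2 boundary check. The keyword list is hypothetical, since the commit only names IF/PERFORM/READ "etc.", and skip_read_clauses is an illustrative helper. With this rule, an AT END body written on its own line (e.g. a MOVE) also counts as a boundary and is left for the statement parser.

```python
import re

# Hypothetical boundary set: the commit names IF/PERFORM/READ "etc.".
# (?![\w-]) keeps data names such as IF-FLAG or READ-CNT from matching.
_STMT_BOUNDARY = re.compile(
    r"^\s*(?:IF|PERFORM|READ|WRITE|MOVE|DISPLAY|COMPUTE|OPEN|CLOSE|CALL|EVALUATE)(?![\w-])"
)


def skip_read_clauses(lines, pos):
    """Advance past AT END / NOT AT END lines following a READ.

    Returns the index of the first line that starts a new statement,
    rather than consuming everything up to the end of the input.
    """
    while pos < len(lines) and not _STMT_BOUNDARY.match(lines[pos]):
        pos += 1
    return pos
```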
Bug 3: _walk() in extract_structure doesn't descend into BrPerform (__init__.py)
- Branch counting _walk() only handled BrIf/BrEval/BrSeq
- IF statements inside PERFORM bodies were never counted
- Fix: add BrPerform.body_seq and BrSearch descent
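Here is a sketch of the Bug 3 fix using minimal stand-in classes. Apart from body_seq, which the commit names, the node fields are assumptions. Counting two outcomes per IF and one per WHEN is a hypothetical convention. The point is that the walker descends into PERFORM and SEARCH bodies instead of returning 0 for them.

```python
from dataclasses import dataclass, field
from typing import List


# Minimal stand-ins for the branch-tree node classes; real fields may differ.
@dataclass
class BrSeq:
    items: list = field(default_factory=list)


@dataclass
class BrIf:
    true_seq: BrSeq = field(default_factory=BrSeq)
    false_seq: BrSeq = field(default_factory=BrSeq)


@dataclass
class BrEval:
    whens: List[BrSeq] = field(default_factory=list)


@dataclass
class BrPerform:
    body_seq: BrSeq = field(default_factory=BrSeq)


@dataclass
class BrSearch:
    whens: List[BrSeq] = field(default_factory=list)


def count_branches(node) -> int:
    if isinstance(node, BrSeq):
        return sum(count_branches(child) for child in node.items)
    if isinstance(node, BrIf):
        # An IF contributes two outcomes plus any nested branches.
        return 2 + count_branches(node.true_seq) + count_branches(node.false_seq)
    if isinstance(node, (BrEval, BrSearch)):
        return len(node.whens) + sum(count_branches(w) for w in node.whens)
    if isinstance(node, BrPerform):
        # Previously missing: IFs inside PERFORM bodies were never counted.
        return count_branches(node.body_seq)
    return 0  # plain statements
```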
Combined impact: matching programs (MT01-33) now correctly report
their branches instead of 0. Full regression: 749 passed (unchanged).
M1: Cache confusion-pair confidences in Path B (eliminate redundant
resolve_confusion_pair re-calls in _path_rule_engine)
M2: Resolve contradictions in Path C instead of hardcoding
resolved_count=0 in _path_llm_assisted
M4: Add DIVIDE_25 to contradiction pair coverage (50-25, 100-25)
and update test_contradiction_pairs_defined to verify all 3 variants
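For M1, here is a sketch of per-program memoization. It assumes resolve_confusion_pair takes a type pair plus a feature dict and returns a confidence. That signature, make_cached_resolver and the feature-independent cache key are all assumptions. Because of that key, a fresh wrapper is needed for each program analyzed.

```python
from typing import Callable, Dict, Tuple

Resolver = Callable[[str, str, dict], float]


def make_cached_resolver(resolve: Resolver) -> Resolver:
    """Memoize confusion-pair confidences for one program's analysis.

    The cache key is the type pair only, so create a fresh wrapper per
    program (features are assumed fixed for the lifetime of the cache).
    """
    cache: Dict[Tuple[str, str], float] = {}

    def cached(type_a: str, type_b: str, features: dict) -> float:
        key = (type_a, type_b)
        if key not in cache:
            cache[key] = resolve(type_a, type_b, features)
        return cache[key]

    return cached
```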