fix: 3 critical parsing bugs found through statement benchmark testing

Bug 1: ELSE IF breaks IF false_seq parsing (core.py)
- _parse_if checked self.clean() == 'ELSE', which fails on 'ELSE IF ...'
- Fix: use startswith('ELSE') and reinsert the IF portion for a recursive parse
- Impact: ALL ELSE IF chains were silently dropped (huge branch loss)

Bug 2: READ skip loop greedily consumes subsequent statements (core.py)
- READ's AT END / NOT AT END skip loop used a bare advance() with no
  statement-boundary detection
- Fix: add a _stmt_boundary regex that stops on IF/PERFORM/READ/etc.
- Impact: everything after the first READ was consumed as 'AT END' lines

Bug 3: _walk() in extract_structure doesn't descend into BrPerform (__init__.py)
- The branch-counting _walk() only handled BrIf/BrEval/BrSeq
- IF statements inside PERFORM bodies were never counted
- Fix: descend into BrPerform.body_seq, and into BrSearch (at_end_seq and
  each WHEN sequence)

Combined impact: matching programs (MT01-33) now report their actual branch
counts instead of 0. Full regression: 749 passed (unchanged).
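The Bug 1 and Bug 2 fixes can be illustrated with a small self-contained model. This is a sketch, not the code in core.py: MiniParser, If, Read, count_branches, the keyword list in _STMT_BOUNDARY and the "two branches per IF" counting rule are all hypothetical. It assumes the source is already cleaned into one upper-cased statement per line and that every IF is closed by its own END-IF. It detects ELSE by comparing the first word rather than using startswith('ELSE'), which has the same effect on 'ELSE IF ...' without also matching longer words.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword list: control-flow statements that end a READ's
# AT END / NOT AT END skip loop (the commit names IF/PERFORM/READ/etc.).
_STMT_BOUNDARY = re.compile(r"(IF|ELSE|END-IF|PERFORM|READ|EVALUATE|SEARCH)\b")


@dataclass
class If:
    cond: str
    true_seq: list = field(default_factory=list)
    false_seq: list = field(default_factory=list)


@dataclass
class Read:
    stmt: str
    skipped: list = field(default_factory=list)  # AT END / NOT AT END lines


class MiniParser:
    """Toy parser over pre-cleaned lines, one COBOL statement per line."""

    def __init__(self, lines):
        self.lines = list(lines)  # copied, because ELSE IF re-inserts a line
        self.pos = 0

    def clean(self):
        if self.pos >= len(self.lines):
            return ""
        return self.lines[self.pos].strip().upper()

    def advance(self):
        self.pos += 1

    def parse_seq(self, stops=frozenset()):
        seq = []
        while self.pos < len(self.lines):
            line = self.clean()
            if line.split(" ", 1)[0] in stops:
                break  # leave the terminator for the caller
            if line.startswith("IF "):
                seq.append(self.parse_if())
            elif line.startswith("READ "):
                seq.append(self.parse_read())
            else:
                seq.append(line)  # plain statement, kept as text
                self.advance()
        return seq

    def parse_if(self):
        node = If(self.clean()[3:].strip())
        self.advance()
        node.true_seq = self.parse_seq({"ELSE", "END-IF"})
        line = self.clean()
        # Bug 1 fix: "ELSE IF ..." must also be recognised as an ELSE.
        if line.split(" ", 1)[0] == "ELSE":
            rest = line[4:].strip()
            self.advance()
            if rest:
                # Re-insert "IF ..." so it is parsed recursively as a nested IF.
                self.lines.insert(self.pos, rest)
            node.false_seq = self.parse_seq({"END-IF"})
        if self.clean() == "END-IF":
            self.advance()
        return node

    def parse_read(self):
        node = Read(self.clean())
        self.advance()
        while self.pos < len(self.lines):
            line = self.clean()
            if line == "END-READ":
                self.advance()
                break
            # Bug 2 fix: stop at the next statement instead of swallowing it.
            if _STMT_BOUNDARY.match(line):
                break
            node.skipped.append(line)
            self.advance()
        return node


def count_branches(seq):
    # Hypothetical rule: every IF contributes two branches (true and false).
    total = 0
    for node in seq:
        if isinstance(node, If):
            total += 2 + count_branches(node.true_seq) + count_branches(node.false_seq)
    return total


if __name__ == "__main__":
    program = ["IF A = 1", "MOVE 1 TO X", "ELSE IF A = 2", "MOVE 2 TO X",
               "ELSE", "MOVE 3 TO X", "END-IF", "END-IF",
               "READ IN-FILE", "AT END MOVE 'Y' TO EOF",
               "IF CNT > 0", "DISPLAY CNT", "END-IF"]
    print(count_branches(MiniParser(program).parse_seq()))  # 6
```

On the sample program, the ELSE IF chain becomes an IF nested in the outer false_seq (4 branches), and the READ stops at "IF CNT > 0" instead of swallowing it, so that IF is still counted (2 more branches).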
2026-06-21 12:52:04 +08:00
parent dbee3b7251
commit 0b0a013f51
2 changed files with 24 additions and 3 deletions
@@ -372,7 +372,7 @@ def extract_structure(cobol_source: str) -> dict:
    file_sec = parse_file_section(preprocessed)
    open_dir = scan_open_statements(proc_div) if proc_div else {}

-    from .models import BrIf, BrEval, BrSeq, BrPerform, Assign, CondAnd, CondOr
+    from .models import BrIf, BrEval, BrSeq, BrPerform, BrSearch, Assign, CondAnd, CondOr

    decision_points = []
    total_branches = 0
@@ -403,6 +403,12 @@ def extract_structure(cobol_source: str) -> dict:
        elif isinstance(node, BrSeq):
            for child in node.children:
                _walk(child, counter)
+        elif isinstance(node, BrPerform):
+            _walk(node.body_seq, counter)
+        elif isinstance(node, BrSearch):
+            _walk(node.at_end_seq, counter)
+            for _, seq in node.when_list:
+                _walk(seq, counter)

    if branch_tree:
        _walk(branch_tree, [0])