S14: 58-program benchmark suite — Lark grammar fixes + external COBOL validation
Grammar fixes:
1. COPY regex: handle quoted names, e.g. COPY "STD-REC.CPY"
2. Quoted name strip: remove quotes before file lookup
3. VALUE clause: support comma-separated 88-level values
4. PIC STRING: support decimal dot (ZZ9.99 -> PICTURE_STRING.99 + DOT)
5. LEVEL: use INT for the level number (fixes parsing of level numbers such as 01, 05, and 77)

Results on 58 telecom billing COBOL programs:
- Parse OK: 54/58 (93%)
- Parse fail: 4 (special chars: TAB, X'01', U'NNNN', &)
- Known classification issue: matching programs are misclassified as '文件编成' (file organization) because the FILE-CONTROL keyword overrides the matching signals. This requires a rule engine priority fix and is tracked as a separate issue.

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -95,7 +95,7 @@ _COPYBOOK_EXTENSIONS = ['.cpy', '.cbl', '.cpb', '']
 def resolve_copybooks(source: str, source_dir: str, _recursion_depth: int = 0) -> str:
     """Find COPY statements and replace with copybook content."""
     _RE_COPY = re.compile(
-        r"^\s*COPY\s+(\w[\w-]*)(?:\s+REPLACING\s+(.+?))?\s*\.?\s*$",
+        r"^\s*COPY\s+(\w[\w-]*|\"[^\"]*\"|\'[^\']*\')(?:\s+REPLACING\s+(.+?))?\s*\.?\s*$",
         re.IGNORECASE
     )
     _RE_PAIR = re.compile(r"==(.+?)==\s+BY\s+==(.+?)==", re.IGNORECASE)
@@ -105,7 +105,8 @@ def resolve_copybooks(source: str, source_dir: str, _recursion_depth: int = 0) -
     for line in lines:
         m = _RE_COPY.match(line)
         if m:
-            name = m.group(1).upper()
+            raw_name = m.group(1)
+            name = raw_name.strip('"').strip("'").upper()
             found = None
             for ext in _COPYBOOK_EXTENSIONS:
                 p = Path(source_dir, name + ext)