S14: 58-program benchmark suite — Lark grammar fixes + external COBOL validation
Grammar fixes:
1. COPY regex: handle quoted names COPY "STD-REC.CPY"
2. Quoted name strip: remove quotes before file lookup
3. VALUE clause: support comma-separated 88-level values
4. PIC STRING: support decimal dot (ZZ9.99 -> PICTURE_STRING.99 + DOT)
5. LEVEL: use INT for level number (fixes 05/01/77 all levels)

Results on 58 telecom billing COBOL programs:
- Parse OK: 54/58 (93%)
- Parse fail: 4 (special chars: TAB, X'01', U'NNNN', &)
- Classification known issue: matching programs misclassified as
  '文件编成' because FILE-CONTROL keyword overrides matching signals
  (requires rule engine priority fix - separate issue)

Co-Authored-By: Claude <noreply@anthropic.com>
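
Fixes 1 and 2 do not appear in the grammar hunks on this page, so the following is only a minimal Python sketch of the idea, not the project's code. The pattern `COPY_RE` and the helper `copybook_name` are hypothetical names, and the assumption is that a copybook name is either a quoted string or a bare COBOL word.

```python
import re

# Hypothetical pattern (not taken from the repository): the copybook name
# may be double-quoted, single-quoted, or a bare COBOL word.
COPY_RE = re.compile(r"""COPY\s+("[^"]+"|'[^']+'|[A-Z0-9-]+)""", re.IGNORECASE)

def copybook_name(statement):
    """Return the copybook name with surrounding quotes removed, or None."""
    m = COPY_RE.search(statement)
    if m is None:
        return None
    raw = m.group(1)
    # Strip one pair of surrounding quotes so the name can be used for file lookup.
    if raw[0] in "\"'":
        return raw[1:-1]
    return raw

print(copybook_name('COPY "STD-REC.CPY".'))
print(copybook_name("COPY STD-REC."))
```

Stripping the quotes in one place means the later file lookup sees the same name whether the source wrote it quoted or bare.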
@@ -7,14 +7,15 @@ FD_SUFFIX: /(?:"[^"]*"|'[^']*'|[^.])*\./
 working_storage: "WORKING-STORAGE" "SECTION" DOT data_item*
 linkage: "LINKAGE" "SECTION" DOT data_item*
 data_item: level_num (NAME | "FILLER") clause* DOT
-level_num: LEVEL
+level_num: INT
 clause: pic_clause | value_clause | occurs_clause | redefines_clause | usage_clause
 | "SYNC" | "SYNCHRONIZED"
 | "JUSTIFIED" "RIGHT"?
 | "BLANK" "WHEN" "ZERO"
 | "GLOBAL" | "EXTERNAL"
 pic_clause: "PIC" "IS"? PICTURE_STRING
-value_clause: "VALUE" "IS"? value_literal+
+value_clause: "VALUE" "IS"? value_list
+value_list: value_literal (","? value_literal)*
 value_literal: INT | SIGNED_NUMBER | STRING | SQSTRING
 | "ZERO" | "ZEROS" | "ZEROES"
 | "SPACE" | "SPACES"
@@ -27,9 +28,9 @@ key_clause: ("ASCENDING" | "DESCENDING") "KEY" "IS"? NAME (","? NAME)*
 indexed_clause: "INDEXED" "BY" NAME (","? NAME)*
 usage_clause: USAGE_VAL
 USAGE_VAL: "COMP" | "COMP-3" | "COMP-5" | "BINARY" | "PACKED-DECIMAL" | "DISPLAY"
-LEVEL: /0[1-9]|[1-4][0-9]|49|77|88/
+LEVEL: /0[1-9]|[0-4][0-9]|49|77|88|[0-9]+/
 NAME: /[A-Z][A-Z0-9-]*/i
-PICTURE_STRING: /[0-9A-Z()+,\-*\/V]+/i
+PICTURE_STRING: /[0-9A-Z()+,\-*\/V]+(?:\.[0-9A-Z()+,\-*\/V]+)?/i
 INT: /[0-9]+/
 DOT: /\./
 %import common.SIGNED_NUMBER
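
The effect of the two terminal changes (`PICTURE_STRING` and `LEVEL`) can be checked in isolation with Python's `re` module. This is a rough sketch rather than a run of the actual Lark lexer: the patterns are copied from the diff, while the helpers `prefix` and `accepts` are illustrative names introduced here. A real lexer also weighs competing terminals such as `INT` and `DOT`, which this does not model.

```python
import re

# Patterns copied from the diff; Lark's /.../i flag becomes re.IGNORECASE,
# and the escaped "\/" is written as a plain "/" (same meaning in a class).
OLD_PIC = re.compile(r"[0-9A-Z()+,\-*/V]+", re.IGNORECASE)
NEW_PIC = re.compile(r"[0-9A-Z()+,\-*/V]+(?:\.[0-9A-Z()+,\-*/V]+)?", re.IGNORECASE)
OLD_LEVEL = re.compile(r"0[1-9]|[1-4][0-9]|49|77|88")
NEW_LEVEL = re.compile(r"0[1-9]|[0-4][0-9]|49|77|88|[0-9]+")

def prefix(pattern, text):
    """Return the text the pattern consumes from the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None

def accepts(pattern, text):
    """True if the pattern matches the whole of `text`."""
    return pattern.fullmatch(text) is not None

# "ZZ9.99." is an edited picture string followed by the terminating period.
print(prefix(OLD_PIC, "ZZ9.99."))  # stops at the decimal point
print(prefix(NEW_PIC, "ZZ9.99."))  # keeps the decimal part, leaves the final period

# Compare which level numbers each LEVEL pattern accepts in full.
for level in ["01", "05", "49", "66", "77", "88"]:
    print(level, accepts(OLD_LEVEL, level), accepts(NEW_LEVEL, level))
```

With the old `PICTURE_STRING` the match ends at the decimal point, whereas the new pattern consumes `ZZ9.99` and leaves only the final period for `DOT`. The old `LEVEL` pattern also rejects 66, the COBOL level number used for RENAMES, which the added `[0-9]+` alternative accepts.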