cobol-java-v3/cobol_testgen/grammar.lark
NB-076 3b150b6c54 S14: 58-program benchmark suite — Lark grammar fixes + external COBOL validation
Grammar fixes:
1. COPY regex: handle quoted names, e.g. COPY "STD-REC.CPY"
2. Quoted name strip: remove the quotes before file lookup
3. VALUE clause: support comma-separated 88-level values
4. PIC STRING: support a decimal point (ZZ9.99. now lexes as PICTURE_STRING "ZZ9.99" + DOT)
5. LEVEL: use INT for the level number (fixes parsing of 01, 05, 77, and all other levels)
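
Fixes 1 and 2 concern COPY statement handling, which lives outside this grammar file. The following is a rough sketch of the idea, assuming a regex-based preprocessor. The pattern `COPY_RE` and the helper `copybook_name` are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical pattern (not the project's actual regex): COPY followed by a
# double-quoted, single-quoted, or bare copybook name.
COPY_RE = re.compile(
    r"""\bCOPY\s+("[^"]+"|'[^']+'|[A-Z0-9][A-Z0-9._-]*)""",
    re.IGNORECASE,
)


def copybook_name(line):
    """Return the copybook name with surrounding quotes removed, or None."""
    m = COPY_RE.search(line)
    if m is None:
        return None
    name = m.group(1)
    if name[0] in "\"'":
        # Quoted form: strip the quotes before any file lookup.
        name = name[1:-1]
    else:
        # Bare form: drop the period that terminates the statement.
        name = name.rstrip(".")
    return name


print(copybook_name('           COPY "STD-REC.CPY".'))  # -> STD-REC.CPY
print(copybook_name("           COPY STD-REC."))        # -> STD-REC
```

Stripping the quotes in one place keeps the later file lookup independent of how the name was written in the source.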

Results on 58 telecom billing COBOL programs:
- Parse OK: 54/58 (93%)
- Parse fail: 4/58 (unsupported special characters and literals: TAB, X'01', U'NNNN', &)
- Known classification issue: matching programs are misclassified as
  '文件编成' (file organization) because the FILE-CONTROL keyword overrides
  the matching signals (requires a rule-engine priority fix; separate issue)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-22 12:31:00 +08:00

start: data_div_content
data_div_content: (file_section | working_storage | linkage)*
file_section: "FILE" "SECTION" DOT (fd | sd)+
fd: "FD" NAME FD_SUFFIX data_item*
sd: "SD" NAME FD_SUFFIX data_item*
FD_SUFFIX: /(?:"[^"]*"|'[^']*'|[^.])*\./
working_storage: "WORKING-STORAGE" "SECTION" DOT data_item*
linkage: "LINKAGE" "SECTION" DOT data_item*
data_item: level_num (NAME | "FILLER") clause* DOT
level_num: INT
clause: pic_clause | value_clause | occurs_clause | redefines_clause | usage_clause
    | "SYNC" | "SYNCHRONIZED"
    | "JUSTIFIED" "RIGHT"?
    | "BLANK" "WHEN" "ZERO"
    | "GLOBAL" | "EXTERNAL"
pic_clause: "PIC" "IS"? PICTURE_STRING
value_clause: "VALUE" "IS"? value_list
value_list: value_literal (","? value_literal)*
value_literal: INT | SIGNED_NUMBER | STRING | SQSTRING
    | "ZERO" | "ZEROS" | "ZEROES"
    | "SPACE" | "SPACES"
    | "HIGH-VALUE" | "HIGH-VALUES"
    | "LOW-VALUE" | "LOW-VALUES"
SQSTRING: /'[^']*'/
redefines_clause: "REDEFINES" NAME
occurs_clause: "OCCURS" INT ("TO" INT)? "TIMES"? ("DEPENDING" "ON" NAME)? key_clause? indexed_clause?
key_clause: ("ASCENDING" | "DESCENDING") "KEY" "IS"? NAME (","? NAME)*
indexed_clause: "INDEXED" "BY" NAME (","? NAME)*
usage_clause: USAGE_VAL
USAGE_VAL: "COMP" | "COMP-3" | "COMP-5" | "BINARY" | "PACKED-DECIMAL" | "DISPLAY"
// NOTE: LEVEL is no longer referenced by any rule; level_num uses INT instead.
LEVEL: /0[1-9]|[0-4][0-9]|49|77|88|[0-9]+/
NAME: /[A-Z][A-Z0-9-]*/i
PICTURE_STRING: /[0-9A-Z()+,\-*\/V]+(?:\.[0-9A-Z()+,\-*\/V]+)?/i
INT: /[0-9]+/
DOT: /\./
%import common.SIGNED_NUMBER
%import common.ESCAPED_STRING -> STRING
%import common.WS
%ignore WS
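
The PICTURE_STRING and FD_SUFFIX terminals depend on regular-expression details that are easy to misread. The following is a minimal sketch that applies the two patterns, copied from the grammar above, to a few sample inputs. It uses only Python's standard `re` module rather than Lark itself. The helper `lex_prefix` and the sample COBOL fragments are illustrative assumptions, not part of the repository.

```python
import re

# Patterns copied from the grammar terminals above.
PICTURE_STRING = re.compile(
    r"[0-9A-Z()+,\-*\/V]+(?:\.[0-9A-Z()+,\-*\/V]+)?", re.IGNORECASE
)
FD_SUFFIX = re.compile(r"""(?:"[^"]*"|'[^']*'|[^.])*\.""")


def lex_prefix(pattern, text):
    """Return the prefix of `text` matched by `pattern`, or None (hypothetical helper)."""
    m = pattern.match(text)
    return m.group(0) if m else None


# A dot followed by more picture characters stays inside the picture string...
print(lex_prefix(PICTURE_STRING, "ZZ9.99."))  # -> ZZ9.99
# ...while the dot that ends the data item is left for the DOT terminal.
print(lex_prefix(PICTURE_STRING, "9(5)."))    # -> 9(5)

# FD_SUFFIX consumes everything up to the first period outside quotes,
# so the dot inside "CUST.DAT" does not end the FD entry early.
fd_text = ' LABEL RECORDS ARE STANDARD VALUE OF FILE-ID IS "CUST.DAT". 01 REC PIC X(80).'
print(lex_prefix(FD_SUFFIX, fd_text))
```

The optional group in PICTURE_STRING only matches when at least one picture character follows the dot, which is why the statement-terminating period stays available for DOT.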