fix: production COBOL program parsing (COPY + OCCURS TO + FD fixes)

Parsing defects in production programs uncovered by adversarial testing, and their fixes:

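Defect 1: COPY statements were never preprocessed (a bug present for 18 months)
  - resolve_copybooks() was called from the main() CLI but never on the extract_structure() path
  - Fix: preprocess() now calls resolve_copybooks() at its start
  - COPY lines that cannot be resolved are removed, so Lark never meets an unrecognized directive inside an FD block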
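The following is a minimal Python sketch of the preprocessing order described above. It makes several assumptions:

- Copybooks are available as an in-memory dict from member name to text.
- Only single-line `COPY name.` statements in free-format source are handled, with no `REPLACING` phrase and no sequence-number columns.
- The `resolve_copybooks`/`preprocess` signatures are hypothetical stand-ins. The project's real ones are not shown in this commit.

```python
import re

# Hypothetical simplified pattern: one "COPY member." per line, case-insensitive,
# free-format source only (no sequence area, no REPLACING phrase).
COPY_RE = re.compile(r"^\s*COPY\s+([A-Z0-9-]+)\s*\.?\s*$", re.IGNORECASE)


def resolve_copybooks(source: str, copybooks: dict[str, str]) -> str:
    """Hypothetical stand-in: inline known copybooks, drop unresolvable COPY lines."""
    out: list[str] = []
    for line in source.splitlines():
        match = COPY_RE.match(line)
        if match is None:
            out.append(line)  # ordinary source line: keep unchanged
            continue
        member = match.group(1).upper()
        if member in copybooks:
            out.extend(copybooks[member].splitlines())  # inline the copybook text
        # Otherwise the COPY line is dropped, so the parser never sees it.
    return "\n".join(out)


def preprocess(source: str, copybooks: dict[str, str]) -> str:
    # Resolve COPY first, so every caller (CLI or extract_structure()) gets it.
    source = resolve_copybooks(source, copybooks)
    # ... any later preprocessing steps would run here ...
    return source


# Hypothetical sample: CRDREC resolves, MISSING does not and is dropped.
src = "FD CARD-FILE.\nCOPY CRDREC.\nCOPY MISSING.\nWORKING-STORAGE SECTION."
books = {"CRDREC": "01 CARD-REC.\n   05 CARD-NO PIC X(16)."}
print(preprocess(src, books))
```

Putting the call inside `preprocess()` rather than in the CLI means any entry point that preprocesses source also resolves COPY. That is the gap Defect 1 describes.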

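Defect 2: the fd rule in the Lark grammar required data_item+ (at least one record)
  - In production programs an FD can pull in its record definitions via COPY
  - Once those COPY lines are removed, the FD has no data_item left and Lark fails with a parse error
  - Fix: change fd to data_item* (zero or more)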

Defect 3: OCCURS 1 TO 100 TIMES (variable-length tables)
  - The grammar supported only OCCURS INT TIMES, not OCCURS 1 TO 100 TIMES
  - Fix: add an optional 'TO' INT part to occurs_clause

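Result: 2 of the 4 production programs now parse successfully (CRDVAL, GENDATA)
  - The remaining 2 (CRDCALC, CRDRPT) still fail because of a limitation in fixed-format continuation-line handling, which this commit does not address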

Full regression: 767 passed (0 new failures)
This commit is contained in:
NB-076
2026-06-21 16:13:58 +08:00
parent cdba324b5a
commit 4be2aae66d
3 changed files with 14 additions and 7 deletions
+3 -3
@@ -1,7 +1,7 @@
start: data_div_content
data_div_content: (file_section | working_storage | linkage)*
file_section: "FILE" "SECTION" DOT (fd | sd)+
-fd: "FD" NAME FD_SUFFIX data_item+
+fd: "FD" NAME FD_SUFFIX data_item*
sd: "SD" NAME FD_SUFFIX data_item*
FD_SUFFIX: /(?:"[^"]*"|'[^']*'|[^.])*\./
working_storage: "WORKING-STORAGE" "SECTION" DOT data_item*
@@ -22,13 +22,13 @@ value_literal: INT | SIGNED_NUMBER | STRING | SQSTRING
| "LOW-VALUE" | "LOW-VALUES"
SQSTRING: /'[^']*'/
redefines_clause: "REDEFINES" NAME
-occurs_clause: "OCCURS" INT "TIMES"? ("DEPENDING" "ON" NAME)? key_clause? indexed_clause?
+occurs_clause: "OCCURS" INT ("TO" INT)? "TIMES"? ("DEPENDING" "ON" NAME)? key_clause? indexed_clause?
key_clause: ("ASCENDING" | "DESCENDING") "KEY" "IS"? NAME (","? NAME)*
indexed_clause: "INDEXED" "BY" NAME (","? NAME)*
usage_clause: USAGE_VAL
USAGE_VAL: "COMP" | "COMP-3" | "COMP-5" | "BINARY" | "PACKED-DECIMAL" | "DISPLAY"
LEVEL: /0[1-9]|[1-4][0-9]|49|77|88/
-NAME: /[A-Z][A-Z0-9-]*/
+NAME: /[A-Z][A-Z0-9-]*/i
PICTURE_STRING: /[0-9A-Z()+,\-*\/V]+/i
INT: /[0-9]+/
DOT: /\./