hangshuo652
|
7fb9304212
|
merge local cobol_testgen improvements into v3 shared modules
- cond.py: SQLCODE/SQLSTATE handling, alphanumeric >/< boundary fix
- output.py: termination tracking, db_input support, _is_field_assigned filter
- coverage.py: mark_from_gcov, THRU support, KeyError protection
- gcov.py: new file (dependency for coverage.py)
- grammar.lark: multi-segment PIC support
- read.py: SQL INCLUDE resolution, DECLARE TABLE parsing, * comment fix
- core.py: SQL parsing, blocked_names, keyword list
- design.py: multi-sentinel, THRU ranges, PERFORM VARYING last iteration
- __init__.py: local main() + v3 API functions, guarded imports
All 6 ZAN programs verified passing through v3 pipeline
|
2026-06-23 22:38:17 +08:00 |
|
NB-076
|
e5ab3baa46
|
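improve: full parsing of 37/37 benchmark programs + O(N) path enumeration + runtime gcov verification
## Core changes
### 1. New PROCEDURE DIVISION parser (procedure_parser.py)
- Line-level state machine replaces the old regex-based BrParser
- Covers: IF/ELSE/END-IF (nested), EVALUATE/WHEN/ALSO,
PERFORM UNTIL/VARYING, READ/AT END/NOT AT END,
SORT/MERGE, GO TO DEPENDING ON
- Before: branches detected in 3/37 programs → now: branches detected in all 37/37
- Speed: ~20 ms per program, pure rule engine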
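### 2. Bridge layer (pipeline_bridge.py)
- New parser is primary; the old parser is kept as a fallback with a 3-second timeout
- Automatically selects whichever result has more branches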
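
A minimal sketch of the bridge idea described above, assuming both parsers are plain callables that return a list of branches. The function name `parse_with_fallback` and its parameters are hypothetical and are not taken from pipeline_bridge.py.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def parse_with_fallback(source, new_parser, old_parser, timeout_s=3.0):
    """Run the new parser, run the old one under a timeout, keep the richer result.

    Hypothetical helper: both parsers take source text and return a list of branches.
    """
    new_branches = new_parser(source)

    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(old_parser, source)
    try:
        old_branches = future.result(timeout=timeout_s)
    except FutureTimeout:
        old_branches = []  # old parser too slow: ignore its result
    except Exception:
        old_branches = []  # old parser crashed: ignore its result
    finally:
        pool.shutdown(wait=False)  # do not block on a still-running old parser

    # Prefer the new parser on ties, since it is the primary one.
    if len(new_branches) >= len(old_branches):
        return new_branches
    return old_branches
```

Note that a thread cannot be killed from outside, so a timed-out old parser keeps running in the background here; a real implementation might use a subprocess instead.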
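### 3. Linear path enumeration (design_mcdc.py)
- Replaces the old Cartesian-product path enumeration (O(2^N)) with independent enumeration per decision point (O(N))
- 28-sysin: 162 branches need only 163 paths (previously this had to be truncated to 60 DP)
- Removes the hard 500-path cap and the 60-DP truncation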
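
The difference between the two enumeration styles can be sketched as follows. This assumes one baseline path plus one dedicated path per branch, which is one reading of the 162 → 163 figure above; the function names and the dict-based path representation are illustrative and are not the ones used in design_mcdc.py.

```python
from itertools import product


def cartesian_paths(decisions):
    """Old style: every combination of outcomes, exponential in len(decisions)."""
    names = list(decisions)
    combos = product(*(decisions[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def linear_paths(decisions):
    """Linear style: one baseline path, then one path per branch.

    Each extra path forces a single branch and leaves every other
    decision at its baseline outcome.
    """
    baseline = {name: outcomes[0] for name, outcomes in decisions.items()}
    paths = [baseline]
    for name, outcomes in decisions.items():
        for outcome in outcomes:
            path = dict(baseline)
            path[name] = outcome
            paths.append(path)
    return paths


# 10 two-way decisions: 2**10 combinations versus 1 + 20 linear paths.
small = {f"D{i}": ["T", "F"] for i in range(10)}
print(len(cartesian_paths(small)), len(linear_paths(small)))
```

Under this scheme, 81 two-way decision points (162 branches) produce 163 paths, while the Cartesian product would need 2^81.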
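### 4. Condition parsing fixes (cond.py)
- NOT operator normalization: X NOT = 5 → X <> 5
- 88-level negation: NOT WS-EOF-Y → parent <> value
- Bare field reference: NOT WS-EOF → WS-EOF <> 'Y'
- Verification: 0 of 1182 IF conditions show NOT contamination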
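
The three rewrites listed above can be sketched like this. It is a simplified illustration for single-comparison conditions only; `normalize_not` and the `level88` mapping (condition name → parent field and value) are hypothetical names, not the cond.py API.

```python
import re

# NOT followed by a relational operator maps to the opposite operator.
_NEGATED_OP = {"=": "<>", ">": "<=", "<": ">="}

_INFIX_NOT = re.compile(r"^(\S+)\s+NOT\s*(=|>|<)\s*(.+)$", re.IGNORECASE)
_PREFIX_NOT = re.compile(r"^NOT\s+(\S+)$", re.IGNORECASE)


def normalize_not(cond, level88=None):
    """Rewrite a simple negated condition into a NOT-free comparison."""
    level88 = level88 or {}
    cond = cond.strip()

    match = _INFIX_NOT.match(cond)
    if match:  # X NOT = 5  ->  X <> 5
        field, op, value = match.groups()
        return f"{field} {_NEGATED_OP[op]} {value.strip()}"

    match = _PREFIX_NOT.match(cond)
    if match:
        name = match.group(1)
        if name in level88:  # NOT WS-EOF-Y  ->  parent <> value
            parent, value = level88[name]
            return f"{parent} <> {value}"
        return f"{name} <> 'Y'"  # bare field: NOT WS-EOF -> WS-EOF <> 'Y'

    return cond  # nothing to normalize
```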
### 5. Constraint field filtering (__init__.py)
- OF qualifier stripping: STD-KEY OF MASTER-REC → STD-KEY
- Subscripted field resolution: WS-ITEM(SUB) → WS-ITEM
- Skip fields that are not in fields_dict (group items/artifacts)
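
A small sketch of the filtering steps above, assuming `fields_dict` is keyed by plain field names. The helper names `base_field_name` and `filter_constraint_fields` are made up for illustration.

```python
import re


def base_field_name(ref):
    """Reduce a COBOL field reference to its bare name."""
    # Drop an OF qualifier: STD-KEY OF MASTER-REC -> STD-KEY
    name = re.split(r"\s+OF\s+", ref.strip(), maxsplit=1, flags=re.IGNORECASE)[0]
    # Drop a trailing subscript: WS-ITEM(SUB) -> WS-ITEM
    name = re.sub(r"\s*\(.*\)\s*$", "", name)
    return name.strip()


def filter_constraint_fields(refs, fields_dict):
    """Keep only references that resolve to a known field, without duplicates."""
    kept = []
    for ref in refs:
        name = base_field_name(ref)
        if name in fields_dict and name not in kept:
            kept.append(name)
    return kept
```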
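### 6. Preprocessor enhancements (read.py)
- VALUE ALL stripping (VALUE ALL '*' → VALUE '*')
- & continuation merging (COBOL multi-line string concatenation)
- PIC decimal point → V conversion (Z(9)9.99. → Z(9)9V99.)
- Missing trailing period is filled in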
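
Two of the rewrites above, VALUE ALL stripping and the PIC decimal point conversion, might look roughly like this as line-level regex passes. The function names are hypothetical and the patterns only cover the simple cases shown in the list; read.py may handle them differently.

```python
import re

_PIC = re.compile(r"(\bPIC(?:TURE)?\s+(?:IS\s+)?)(\S+)", re.IGNORECASE)


def strip_value_all(line):
    """VALUE ALL '*'  ->  VALUE '*'"""
    return re.sub(r"\bVALUE\s+ALL\s+", "VALUE ", line, flags=re.IGNORECASE)


def pic_dot_to_v(line):
    """Turn a decimal point inside a picture string into V.

    Only a '.' followed by another picture character is replaced, so the
    period that ends the entry is left alone.
    """
    def fix(match):
        picture = re.sub(r"\.(?=[0-9A-Za-z(])", "V", match.group(2))
        return match.group(1) + picture

    return _PIC.sub(fix, line)


print(strip_value_all("05 WS-LINE PIC X(10) VALUE ALL '*'."))
print(pic_dot_to_v("05 WS-AMT PIC Z(9)9.99."))
```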
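### 7. Grammar fixes (grammar.lark)
- OCCURS 1 TIME support (previously only TIMES was recognized)
- USAGE IS COMP support (IS is optional)
- $ symbol allowed in PICTURE_STRING
- Support for entries without a NAME clause (clause+)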
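### 8. Flatfile writing (flatfile.py)
- Multi-record FD support (selects the record with the most fields)
- Forced conversion to Path type
- Falls back to a zero-value record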
### 9. Bug fixes
- Empty-list guard in trace_to_root (core.py)
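### 10. Test suites (S16-S21)
- S16: end-to-end run over all benchmark programs
- S17: gcov runtime comparison
- S18/S19: bridge verification
- S20: runtime verification via DISPLAY instrumentation + gcov branch coverage
- S21: verification of the condition parsing fixes
- All 17/17 regression tests pass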
Co-Authored-By: Claude <noreply@anthropic.com>
|
2026-06-22 23:41:22 +08:00 |
|
NB-076
|
bb4a7a2346
|
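fix: classification fix + grammar enhancements + 75/75 regression confirmed
Classification fixes:
- FILE-CONTROL keyword (0.99) incorrectly overrode the matching-detection signals
- Gave the matching-type rule a higher priority in the rule engine so that matching-detection results take precedence
- Injected the has_matching_kw feature so that IF-less matching programs are also recognized
Grammar enhancements:
- LEVEL extended to /[0-9]+/ to cover all COBOL level numbers
- HEX_STRING added to support X'...' hexadecimal literals
- Commas in VALUE clauses are stripped during preprocessing (88-level multiple values)
- COPY regex supports quoted names
Results: internal 75/75, external benchmark 54/58 (93%)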
Co-Authored-By: Claude <noreply@anthropic.com>
|
2026-06-22 13:18:07 +08:00 |
|
NB-076
|
3b150b6c54
|
S14: 58-program benchmark suite — Lark grammar fixes + external COBOL validation
Grammar fixes:
1. COPY regex: handle quoted names COPY "STD-REC.CPY"
2. Quoted name strip: remove quotes before file lookup
3. VALUE clause: support comma-separated 88-level values
4. PIC STRING: support decimal dot (ZZ9.99 -> PICTURE_STRING.99 + DOT)
5. LEVEL: use INT for the level number (fixes all levels, e.g. 05/01/77)
Results on 58 telecom billing COBOL programs:
- Parse OK: 54/58 (93%)
- Parse fail: 4 (special chars: TAB, X'01', U'NNNN', &)
- Classification known issue: matching programs misclassified as
'文件编成' because FILE-CONTROL keyword overrides matching signals
(requires rule engine priority fix - separate issue)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2026-06-22 12:31:00 +08:00 |
|
NB-076
|
4be2aae66d
|
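fix: production-grade COBOL program parsing (COPY + OCCURS TO + FD fixes)
Parsing defects in production programs found by adversarial testing, and their fixes:
Defect 1: COPY statements were never preprocessed (18-month-old bug)
- resolve_copybooks() was called in the main() CLI but never on the extract_structure() path
- Fix: call resolve_copybooks() at the top of preprocess()
- Unresolvable COPY lines are removed (so Lark does not hit an unrecognized directive inside an FD block)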
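Defect 2: the fd rule in the Lark grammar required data_item+ (at least one record)
- In production programs an FD can pull in its record definitions via COPY
- Once the COPY line is removed, the FD contains no data_item and Lark crashes
- Fix: fd changed to data_item* (zero or more)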
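Defect 3: OCCURS 1 TO 100 TIMES (variable-range tables)
- The grammar only supported OCCURS INT TIMES, not OCCURS 1 TO 100 TIMES
- Fix: occurs_clause gains an optional 'TO' INT part
Result: 2 of the 4 production programs now parse successfully (CRDVAL, GENDATA)
- The remaining 2 (CRDCALC, CRDRPT) are still unfixed because of a fixed-format continuation-line limitation
Full regression: 767 passed (0 new failures)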
|
2026-06-21 16:13:58 +08:00 |
|
NB-076
|
dbee3b7251
|
fix: Lark grammar + parse_file_section SD/ASCENDING KEY support
Bug fixes found through statement benchmark testing:
1. grammar.lark: Add ASCENDING/DESCENDING KEY IS + INDEXED BY to
occurs_clause — fixes HINA024 (SEARCH ALL) parsing crash
2. grammar.lark: Add SD (Sort Description) entry type to file_section
— fixes HINA034 (SORT), ST01, ST02 parsing crashes
3. read.py parse_file_section(): Handle SD blocks alongside FD blocks
— enables SORT/MERGE file structure extraction
4 previously crashing files now parse successfully:
- HINA024.cbl (SEARCH ALL): paras=3, files=0
- HINA034.cbl (SORT): paras=1, files=3
- ST01_SORT.cbl: paras=2, files=3
- ST02_MERGE.cbl: paras=1, files=4
Regression: 749 passed (unchanged — classify_program internally caught
the crashes, so tests already 'passed'; real improvement is in data
quality: structure extraction now works for these programs)
|
2026-06-21 12:21:36 +08:00 |
|
hangshuo652
|
bc1d56d1a4
|
feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark
P0.6: gcov infrastructure
P1: extract_structure output expansion (11 new feature fields)
P2: Confusion group rule engine (8 pairs + contradiction + backtrack)
P3: 4-factor confidence calculation + quality gate update
P4: 33+2 COBOL program type test samples (22 files, 7 categories)
P5: parametrized/ test data generation engine
P6: japanese_data.py lookup tables
P7-10: Type-specific test suites (~159 parametrized tests)
P11: Full classification pipeline (classify_program) + orchestrator integration
P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix)
Architecture decisions:
- classification_pipeline/ merged to hina/pipeline/
- parametrized/ as independent module
- japanese_data.py as root-level file
- hina/__all__ only exports classify_program()
Co-Authored-By: Claude <noreply@anthropic.com>
|
2026-06-19 23:51:55 +08:00 |
|