cobol-java-v3

Author	SHA1	Message	Date
hangshuo652	7fb9304212	merge local cobol_testgen improvements into v3 shared modules - cond.py: SQLCODE/SQLSTATE handling, alphanumeric >/< boundary fix - output.py: termination tracking, db_input support, _is_field_assigned filter - coverage.py: mark_from_gcov, THRU support, KeyError protection - gcov.py: new file (dependency for coverage.py) - grammar.lark: multi-segment PIC support - read.py: SQL INCLUDE resolution, DECLARE TABLE parsing, * comment fix - core.py: SQL parsing, blocked_names, keyword list - design.py: multi-sentinel, THRU ranges, PERFORM VARYING last iteration - __init__.py: local main() + v3 API functions, guarded imports All 6 ZAN programs verified passing through v3 pipeline	2026-06-23 22:38:17 +08:00
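NB-076	e5ab3baa46	improve: full parsing of 37/37 benchmark programs + O(N) path enumeration + runtime gcov verification ## Core changes ### 1. New PROCEDURE DIVISION parser (procedure_parser.py) - Line-level state machine replaces the old BrParser regex parser - Covers: IF/ELSE/END-IF (nested), EVALUATE/WHEN/ALSO, PERFORM UNTIL/VARYING, READ/AT END/NOT AT END, SORT/MERGE, GO TO DEPENDING ON - Before: branches detected in 3/37 programs → now: branches detected in all 37/37 - Speed: ~20 ms per program, purely rule-based engine ### 2. Bridge layer (pipeline_bridge.py) - New parser is primary; old parser is the fallback, with a 3-second timeout - Automatically picks the result with more branches ### 3. Linear path enumeration (design_mcdc.py) - Replaces the old Cartesian-product path enumeration (O(2^N)) with independent enumeration per decision point (O(N)) - 28-sysin: 162 branches yield only 163 paths (previously required truncation to 60 DP) - Removes the hard 500-path cap and the 60-DP truncation ### 4. Condition parsing fixes (cond.py) - NOT operator normalization: X NOT = 5 → X <> 5 - 88-level negation: NOT WS-EOF-Y → parent <> value - Bare field reference: NOT WS-EOF → WS-EOF <> 'Y' - Verification: 0 of 1182 IF conditions polluted by NOT ### 5. Constraint field filtering (__init__.py) - OF qualifier stripping: STD-KEY OF MASTER-REC → STD-KEY - Subscripted field resolution: WS-ITEM(SUB) → WS-ITEM - Skips fields not in fields_dict (group items/artifacts) ### 6. Preprocessor enhancements (read.py) - VALUE ALL stripping (VALUE ALL '' → VALUE '') - & continuation merging (COBOL multi-line string concatenation) - PIC decimal point → V conversion (Z(9)9.99. → Z(9)9V99.) - Missing period completion ### 7. Grammar fixes (grammar.lark) - OCCURS 1 TIME support (previously only TIMES was recognized) - USAGE IS COMP support (optional IS) - $ symbol allowed in PICTURE_STRING - Support for entries without a NAME (clause+) ### 8. Flatfile writing (flatfile.py) - Multi-record FD support (picks the record with the most fields) - Path type coercion - Falls back to a zero-valued record ### 9. Bug fixes - Empty-list guard in trace_to_root (core.py) ### 10. Test suite (S16-S21) - S16: end-to-end run over all benchmark programs - S17: gcov runtime comparison - S18/S19: bridge verification - S20: DISPLAY instrumentation runtime verification + gcov branch coverage - S21: condition parsing fix verification - All 17/17 regression tests pass Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 23:41:22 +08:00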
NB-076	6e69dff7a4	fix: 3 bugs confirmed and repaired from honest audit Bug #1: AND compound branch-body MOVE not propagated (HIGH) Root cause: ELSE on same line as false_body, rest of line lost after self.advance(). Fix: reinsert ELSE body text same as ELSE IF does. Result: MOVE 'Y'/'N' TO WS-FLAG correctly propagated, all 3 paths verified (A<=10/B<20=F, A>10/B<20=T, A>10/B>=20=F). Bug #2: Performance — path explosion (25 IFs = 47s, 10000 records) Root cause: BrSeq inner loop combined all paths before capping. Fix: early break at _MAX_PATHS in the combo loop. + _MAX_PATHS reduced from 10000 to 500. Result: 47s/10000rec -> 0.2s/27rec (235x improvement) Bug #3: COPY+REDEFINES parse failure (test-only) Root cause: test code called parse_data_division on full source instead of extract_data_division first. Fixed. Real pipeline (extract_structure -> generate_data) was never affected. Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 11:36:33 +08:00
NB-076	abb283669c	R13: final sweep — EXEC stripping + INSPECT bugfix + more EQ assertions 1. Lark: preprocess strips EXEC CICS/SQL...END-EXEC blocks -> CI01_CICS/DB01_SELECT_UPDATE now parse, 75/75 samples pass 2. propagate_assignments INSPECT TALLYING bugfix: was reading source from count_var (wrong field) instead of asgn['tgt']. Now CNT='005' instead of '003' for len(HELLO)=5. 3. 26 new EQ/falsifiable assertions added (propagate chains, orchestrator state, data_writer, report generator) 4. Hardened: ACCEPT DATE string len check, DataWriter JSON format 16 suites / 0 FAIL. Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 09:37:58 +08:00
NB-076	d8176ea07b	fix: generate_data constraint steering fully repaired Root cause: IF condition and EVALUATE WHEN parsing swallowed entire line including THEN-body (e.g. '50 MOVE BIG...' instead of just '50'). Fix: 1. Single-line IF cond_text truncated at COBOL statement-starting keywords (MOVE/DISPLAY/COMPUTE/ADD/...) 2. Multi-line IF continuation loop also breaks on these keywords (was missing DISPLAY, READ, WRITE, CLOSE, OPEN, SEARCH, ...) 3. EVALUATE WHEN raw_val truncated at same keyword set 4. All raw-string escape sequences fixed (Python 3.12 SyntaxWarning) Verification: - IF single-line A>50: A=51(true)/12(false) previously A=01/00 - IF multi-line X>50: X=51(true)/12(false) previously not steered - EVALUATE WHEN 1/2/OTHER: C=1/2/4 previously C=0/0/0 - IF AND compound: (A<=10,B<20), (A>10,B<20), (A>10,B>=20) - IF >75: A=76(true)/12(false) previously not steered R11 tests updated: BUG documentation replaced with real assertions. 13 suites / 0 FAIL. Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 09:10:21 +08:00
NB-076	0b0a013f51	fix: 3 critical parsing bugs found through statement benchmark testing Bug 1: ELSE IF breaks IF false_seq parsing (core.py) - _parse_if checked self.clean() == 'ELSE' which fails on 'ELSE IF ...' - Fix: use startswith('ELSE'), reinsert IF portion for recursive parse - Impact: ALL ELSE IF chains were silently dropped (huge branch loss) Bug 2: READ skip loop greedily consumes subsequent statements (core.py) - READ's AT END / NOT AT END skip loop used bare advance() with no statement boundary detection - Fix: add _stmt_boundary regex that stops on IF/PERFORM/READ/etc. - Impact: everything after first READ was consumed as 'AT END' lines Bug 3: _walk() in extract_structure doesn't descend into BrPerform (__init__.py) - Branch counting _walk() only handled BrIf/BrEval/BrSeq - IF statements inside PERFORM bodies were never counted - Fix: add BrPerform.body_seq and BrSearch descent Combined impact: matching programs (MT01-33) now correctly report their branches instead of 0. Full regression: 749 passed (unchanged).	2026-06-21 12:52:04 +08:00
hangshuo652	bc1d56d1a4	feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark P0.6: gcov infrastructure P1: extract_structure output expansion (11 new feature fields) P2: Confusion group rule engine (8 pairs + contradiction + backtrack) P3: 4-factor confidence calculation + quality gate update P4: 33+2 COBOL program type test samples (22 files, 7 categories) P5: parametrized/ test data generation engine P6: japanese_data.py lookup tables P7-10: Type-specific test suites (~159 parametrized tests) P11: Full classification pipeline (classify_program) + orchestrator integration P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix) Architecture decisions: - classification_pipeline/ merged to hina/pipeline/ - parametrized/ as independent module - japanese_data.py as root-level file - hina/__all__ only exports classify_program() Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-19 23:51:55 +08:00
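
Note on commit e5ab3baa46 (linear path enumeration): the message contrasts Cartesian-product enumeration, O(2^N), with independent enumeration per decision point, O(N). The log does not show the code in design_mcdc.py, so the following is only a minimal sketch of the general idea, assuming a "baseline path plus one variation per decision" scheme. The function names and the decisions layout are hypothetical.

```python
from itertools import product


def cartesian_paths(decisions):
    """Every combination of outcomes; the count is the product of outcome counts."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


def linear_paths(decisions):
    """One baseline path, plus one path per non-baseline outcome of each decision.

    Assumption: varying a single decision at a time is enough to exercise
    every outcome; this is not necessarily what design_mcdc.py does.
    """
    baseline = {name: outcomes[0] for name, outcomes in decisions.items()}
    paths = [baseline]
    for name, outcomes in decisions.items():
        for outcome in outcomes[1:]:
            # Copy the baseline and flip only this one decision.
            paths.append({**baseline, name: outcome})
    return paths


# Ten independent two-way decisions (hypothetical names).
decisions = {f"IF-{i}": ["T", "F"] for i in range(10)}
print(len(cartesian_paths(decisions)))  # 1024
print(len(linear_paths(decisions)))     # 11
```

With N two-way decisions this sketch produces N + 1 paths. That shape is consistent with the "162 branches yield only 163 paths" figure in the commit message, although the log alone does not confirm that the real implementation counts paths this way.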

7 Commits
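
Note on commit d8176ea07b (constraint steering): the fix described there truncates single-line IF condition text at COBOL statement-starting keywords, so that '50 MOVE BIG...' becomes just '50'. The sketch below illustrates that idea under stated assumptions: the keyword tuple is a partial subset taken from the commit message, and truncate_condition is a hypothetical helper, not the function in the repository.

```python
import re

# Partial keyword list taken from the commit message; the real set is longer.
_STMT_KEYWORDS = (
    "MOVE", "DISPLAY", "COMPUTE", "ADD", "READ",
    "WRITE", "CLOSE", "OPEN", "SEARCH",
)

# A keyword counts only when preceded by whitespace and not followed by a
# letter, digit, or hyphen, so data names such as MOVE-FLAG are left alone.
_STMT_START = re.compile(
    r"\s+(?:THEN\s+)?(?:" + "|".join(_STMT_KEYWORDS) + r")(?![\w-])",
    re.IGNORECASE,
)


def truncate_condition(cond_text):
    """Return the condition part of a single-line IF, dropping the THEN-body."""
    match = _STMT_START.search(cond_text)
    if match is None:
        return cond_text.strip()
    return cond_text[:match.start()].strip()


print(truncate_condition("A > 50 MOVE 'BIG' TO WS-MSG"))  # A > 50
print(truncate_condition("WS-A = 1 AND MOVE-FLAG = 2"))   # unchanged
```

One known limitation of this sketch: a keyword inside a quoted literal (for example 'PLEASE MOVE') would also trigger truncation, so a production version would need to skip over string literals first.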