cobol-java-v3

Author	SHA1	Message	Date
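NB-076	4be2aae66d	fix: production-grade COBOL program parsing (COPY + OCCURS TO + FD) Fixes production-program parsing defects found through adversarial testing: Defect 1: COPY statements were never preprocessed (an 18-month-old bug) - resolve_copybooks() was called in the main() CLI but never on the extract_structure() path - Fix: call resolve_copybooks() at the top of preprocess() - Unresolvable COPY lines are removed (so Lark does not hit an unrecognized directive inside an FD block) Defect 2: the fd rule in the Lark grammar required data_item+ (at least one record) - In production programs an FD can bring in its record definitions via COPY - Once the COPY line was removed, the FD contained no data_item and Lark crashed - Fix: change fd to data_item* (zero or more) Defect 3: OCCURS 1 TO 100 TIMES (variable-range table) - The grammar supported only OCCURS INT TIMES, not OCCURS 1 TO 100 TIMES - Fix: add an optional 'TO' INT part to occurs_clause Result: 2 of 4 production programs now parse successfully (CRDVAL, GENDATA) - The remaining 2 (CRDCALC, CRDRPT) are still unfixed because of the fixed-format continuation-line limitation Full regression: 767 passed (0 new failures)	2026-06-21 16:13:58 +08:00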
NB-076	0b0a013f51	fix: 3 critical parsing bugs found through statement benchmark testing Bug 1: ELSE IF breaks IF false_seq parsing (core.py) - _parse_if checked self.clean() == 'ELSE' which fails on 'ELSE IF ...' - Fix: use startswith('ELSE'), reinsert IF portion for recursive parse - Impact: ALL ELSE IF chains were silently dropped (huge branch loss) Bug 2: READ skip loop greedily consumes subsequent statements (core.py) - READ's AT END / NOT AT END skip loop used bare advance() with no statement boundary detection - Fix: add _stmt_boundary regex that stops on IF/PERFORM/READ/etc. - Impact: everything after first READ was consumed as 'AT END' lines Bug 3: _walk() in extract_structure doesn't descend into BrPerform (__init__.py) - Branch counting _walk() only handled BrIf/BrEval/BrSeq - IF statements inside PERFORM bodies were never counted - Fix: add BrPerform.body_seq and BrSearch descent Combined impact: matching programs (MT01-33) now correctly report their branches instead of 0. Full regression: 749 passed (unchanged).	2026-06-21 12:52:04 +08:00
NB-076	dbee3b7251	fix: Lark grammar + parse_file_section SD/ASCENDING KEY support Bug fixes found through statement benchmark testing: 1. grammar.lark: Add ASCENDING/DESCENDING KEY IS + INDEXED BY to occurs_clause — fixes HINA024 (SEARCH ALL) parsing crash 2. grammar.lark: Add SD (Sort Description) entry type to file_section — fixes HINA034 (SORT), ST01, ST02 parsing crashes 3. read.py parse_file_section(): Handle SD blocks alongside FD blocks — enables SORT/MERGE file structure extraction 4 previously crashing files now parse successfully: - HINA024.cbl (SEARCH ALL): paras=3, files=0 - HINA034.cbl (SORT): paras=1, files=3 - ST01_SORT.cbl: paras=2, files=3 - ST02_MERGE.cbl: paras=1, files=4 Regression: 749 passed (unchanged — classify_program internally caught the crashes, so tests already 'passed'; real improvement is in data quality: structure extraction now works for these programs)	2026-06-21 12:21:36 +08:00
hangshuo652	bc1d56d1a4	feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark P0.6: gcov infrastructure P1: extract_structure output expansion (11 new feature fields) P2: Confusion group rule engine (8 pairs + contradiction + backtrack) P3: 4-factor confidence calculation + quality gate update P4: 33+2 COBOL program type test samples (22 files, 7 categories) P5: parametrized/ test data generation engine P6: japanese_data.py lookup tables P7-10: Type-specific test suites (~159 parametrized tests) P11: Full classification pipeline (classify_program) + orchestrator integration P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix) Architecture decisions: - classification_pipeline/ merged to hina/pipeline/ - parametrized/ as independent module - japanese_data.py as root-level file - hina/__all__ only exports classify_program() Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-19 23:51:55 +08:00
hangshuo652	e2486db510	fix: 3 issues found during real COBOL validation	2026-06-18 16:26:44 +08:00
hangshuo652	097530b036	feat: Phase 1 - cobol_testgen API + quality fields + retry handler	2026-06-18 15:47:35 +08:00

6 Commits
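
Note on Bug 3 in commit 0b0a013f51: the branch-counting fix can be illustrated with a minimal sketch. The class names BrIf, BrSeq and BrPerform and the fields false_seq and body_seq come from the commit message; everything else here (the items and true_seq fields, the count_ifs function, the descend_perform flag) is hypothetical and is not the repository's actual _walk() code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrSeq:
    # "items" is an assumed field name for the statements in a sequence.
    items: List[object] = field(default_factory=list)


@dataclass
class BrIf:
    # "true_seq" is assumed; "false_seq" is named in the commit message.
    true_seq: BrSeq = field(default_factory=BrSeq)
    false_seq: Optional[BrSeq] = None


@dataclass
class BrPerform:
    # "body_seq" is named in the commit message.
    body_seq: BrSeq = field(default_factory=BrSeq)


def count_ifs(node, descend_perform=True):
    """Count IF nodes in a branch tree.

    With descend_perform=False the walker ignores PERFORM bodies, which
    reproduces the behaviour the commit describes as the bug.
    """
    if node is None:
        return 0
    if isinstance(node, BrSeq):
        return sum(count_ifs(child, descend_perform) for child in node.items)
    if isinstance(node, BrIf):
        return (1
                + count_ifs(node.true_seq, descend_perform)
                + count_ifs(node.false_seq, descend_perform))
    if isinstance(node, BrPerform):
        if not descend_perform:
            return 0
        return count_ifs(node.body_seq, descend_perform)
    return 0  # any other statement type contributes no branches


# A PERFORM body holding an IF that itself contains a nested IF.
tree = BrSeq([BrPerform(BrSeq([BrIf(BrSeq([BrIf()]))]))])
fixed_count = count_ifs(tree)                          # walker descends into PERFORM
buggy_count = count_ifs(tree, descend_perform=False)   # walker skips PERFORM bodies
```

In this sketch a program whose IF statements all sit inside PERFORM bodies reports zero branches when the walker does not descend, which matches the "0" the commit message mentions for the matching programs.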