cobol-java-v3

Author	SHA1	Message	Date
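NB-076	bb4a7a2346	fix: classification fix + grammar enhancements + 75/75 regression confirmed Classification fix: - FILE-CONTROL keyword (0.99) incorrectly overrode the matching-detection signals - Gave the matching-type rule engine higher priority so that matching-detection results take precedence - Injected the has_matching_kw feature so that IF-less matching programs are also recognized Grammar enhancements: - LEVEL extended to /[0-9]+/ to cover all COBOL level numbers - HEX_STRING added to support X'...' hexadecimal literals - Commas in VALUE clauses stripped during preprocessing (88-level multiple values) - COPY regex supports quoted names Results: internal 75/75, external benchmark 54/58 (93%) Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 13:18:07 +08:00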
NB-076	3b150b6c54	S14: 58-program benchmark suite — Lark grammar fixes + external COBOL validation Grammar fixes: 1. COPY regex: handle quoted names COPY "STD-REC.CPY" 2. Quoted name strip: remove quotes before file lookup 3. VALUE clause: support comma-separated 88-level values 4. PIC STRING: support decimal dot (ZZ9.99 -> PICTURE_STRING.99 + DOT) 5. LEVEL: use INT for level number (fixes 05/01/77 all levels) Results on 58 telecom billing COBOL programs: - Parse OK: 54/58 (93%) - Parse fail: 4 (special chars: TAB, X'01', U'NNNN', &) - Classification known issue: matching programs misclassified as '文件编成' because FILE-CONTROL keyword overrides matching signals (requires rule engine priority fix - separate issue) Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-22 12:31:00 +08:00
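NB-076	4be2aae66d	fix: production-grade COBOL program parsing: COPY + OCCURS TO + FD Fixes production-program parsing defects found by adversarial testing: Defect 1: COPY statements were never preprocessed (an 18-month-old bug) - resolve_copybooks() was called in the main() CLI but never on the extract_structure() path - Fix: call resolve_copybooks() at the top of preprocess() - Unresolvable COPY lines are removed (so that Lark does not hit an unrecognized directive inside an FD block) Defect 2: the fd rule in the Lark grammar required data_item+ (at least one record) - In production programs an FD can pull in its record definitions via COPY - Once the COPY line was removed, the FD contained no data_item and Lark crashed - Fix: fd changed to data_item* (zero or more) Defect 3: OCCURS 1 TO 100 TIMES (variable-range table) - The grammar supported only OCCURS INT TIMES, not OCCURS 1 TO 100 TIMES - Fix: occurs_clause gains an optional 'TO' INT part Outcome: 2 of the 4 production programs now parse successfully (CRDVAL, GENDATA) - The remaining 2 (CRDCALC, CRDRPT) are not fixed because of a fixed-format continuation-line limitation Full regression: 767 passed (0 new failures)	2026-06-21 16:13:58 +08:00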
NB-076	dbee3b7251	fix: Lark grammar + parse_file_section SD/ASCENDING KEY support Bug fixes found through statement benchmark testing: 1. grammar.lark: Add ASCENDING/DESCENDING KEY IS + INDEXED BY to occurs_clause — fixes HINA024 (SEARCH ALL) parsing crash 2. grammar.lark: Add SD (Sort Description) entry type to file_section — fixes HINA034 (SORT), ST01, ST02 parsing crashes 3. read.py parse_file_section(): Handle SD blocks alongside FD blocks — enables SORT/MERGE file structure extraction 4 previously crashing files now parse successfully: - HINA024.cbl (SEARCH ALL): paras=3, files=0 - HINA034.cbl (SORT): paras=1, files=3 - ST01_SORT.cbl: paras=2, files=3 - ST02_MERGE.cbl: paras=1, files=4 Regression: 749 passed (unchanged — classify_program internally caught the crashes, so tests already 'passed'; real improvement is in data quality: structure extraction now works for these programs)	2026-06-21 12:21:36 +08:00
hangshuo652	bc1d56d1a4	feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark P0.6: gcov infrastructure P1: extract_structure output expansion (11 new feature fields) P2: Confusion group rule engine (8 pairs + contradiction + backtrack) P3: 4-factor confidence calculation + quality gate update P4: 33+2 COBOL program type test samples (22 files, 7 categories) P5: parametrized/ test data generation engine P6: japanese_data.py lookup tables P7-10: Type-specific test suites (~159 parametrized tests) P11: Full classification pipeline (classify_program) + orchestrator integration P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix) Architecture decisions: - classification_pipeline/ merged to hina/pipeline/ - parametrized/ as independent module - japanese_data.py as root-level file - hina/__all__ only exports classify_program() Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-19 23:51:55 +08:00

5 Commits
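
Two of the commits above mention a COPY regex that accepts quoted names (for example COPY "STD-REC.CPY") and a step that strips the quotes before file lookup. The sketch below illustrates one way such matching could work. The pattern, the name COPY_RE, and the helper copybook_name() are hypothetical and are not taken from the repository; the actual regex in the project may differ.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the repository's regex):
# COPY, then a copybook name optionally wrapped in matching quotes,
# then the terminating period.
COPY_RE = re.compile(
    r"""^\s*COPY\s+(?P<q>["']?)(?P<name>[A-Za-z0-9_.\-]+)(?P=q)\s*\.""",
    re.IGNORECASE,
)


def copybook_name(line: str) -> Optional[str]:
    """Return the copybook name with any surrounding quotes removed,
    or None when the line is not a COPY statement."""
    match = COPY_RE.match(line)
    if match is None:
        return None
    # The quotes are captured in a separate group, so the name is
    # already unquoted and can be used directly for file lookup.
    return match.group("name")
```

Capturing the quote character in its own group and requiring the same character after the name rejects mismatched quotes while keeping the returned name ready for a file lookup.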